LLM Cost Calculator
Enter your token usage and compare real monthly costs across 18+ models: GPT-4o, Claude 3.5, Gemini 2.0, Llama 3, DeepSeek, and more.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context window | Your cost ($/month) |
|---|---|---|---|---|
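The "Your cost" column presumably applies both per-million rates to your daily volume. A sketch of that arithmetic, assuming a 30-day month, T total tokens per day, and an input fraction r (0.7 for a 70/30 mix), with p_in and p_out the input and output prices in $/1M tokens:

$$
C_{\text{month}} = 30 \cdot \frac{T}{10^{6}} \left( r\,p_{\text{in}} + (1 - r)\,p_{\text{out}} \right)
$$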
Tip: For most conversational apps with ~70% input / 30% output, the cheapest capable model is usually Gemini 2.0 Flash or DeepSeek V3. For high-output workloads (code generation, content), open-source models served via Groq or Together AI offer the best price/performance.
Frequently Asked Questions
Which LLM is cheapest for a production app?
For most production apps with standard conversational usage (~70% input, 30% output), DeepSeek V3 ($0.14/$0.28 per 1M tokens) and Gemini 2.0 Flash ($0.10/$0.40) are consistently among the cheapest. At 1M tokens/day over a 30-day month, DeepSeek V3 costs roughly $5.50/month vs. roughly $143/month for GPT-4o ($2.50/$10).
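A minimal sketch of the arithmetic behind these figures, assuming a 30-day month and the per-1M-token rates quoted on this page. `monthly_cost` and `PRICES` are illustrative names, not part of any provider SDK.

```python
# Sketch: monthly cost at a fixed daily volume and input/output mix.
# Assumptions: 30-day month; prices are the $/1M-token rates quoted on this page.

PRICES = {
    # model: (input $/1M tokens, output $/1M tokens)
    "DeepSeek V3": (0.14, 0.28),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o": (2.50, 10.00),
}


def monthly_cost(input_price, output_price, tokens_per_day=1_000_000,
                 input_share=0.7, days=30):
    """Return the estimated monthly cost in dollars."""
    millions = tokens_per_day / 1_000_000
    # Weight each rate by its share of the daily token volume.
    daily = millions * (input_share * input_price
                        + (1 - input_share) * output_price)
    return daily * days


# Print models from cheapest to most expensive at the default 70/30 mix.
for model, (p_in, p_out) in sorted(PRICES.items(),
                                   key=lambda kv: monthly_cost(*kv[1])):
    print(f"{model}: ${monthly_cost(p_in, p_out):.2f}/month")
```

Running it prints DeepSeek V3 at $5.46, Gemini 2.0 Flash at $5.70, and GPT-4o at $142.50 per month. Change `tokens_per_day` or `input_share` to match your own workload.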
What's the difference between input and output tokens?
Input tokens are the tokens in your prompt (system prompt + user messages + conversation history). Output tokens are the tokens the model generates. Output tokens are almost always priced higher, often 3-5× the input rate, because generation is computationally more expensive: the model produces output one token at a time, while the prompt is processed in parallel. For most chatbots, input tokens dominate because the system prompt and conversation history are re-sent every turn. For code generation, output tokens dominate.
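To see why input tokens dominate in chat, here is a minimal sketch that totals tokens over a multi-turn conversation in which every turn re-sends the system prompt and the full history. The message sizes and the `conversation_tokens` helper are illustrative assumptions; real apps often truncate or summarize history.

```python
# Sketch: why input tokens dominate in chat. Each turn re-sends the system
# prompt plus the full conversation history as input.
# Assumed sizes (illustrative only): 500-token system prompt,
# 50-token user messages, 150-token replies.


def conversation_tokens(turns, system=500, user=50, reply=150):
    """Return (total_input, total_output) tokens over a conversation."""
    total_in = total_out = 0
    history = 0  # tokens of prior user messages and replies
    for _ in range(turns):
        total_in += system + history + user  # full prompt sent this turn
        total_out += reply
        history += user + reply  # this exchange joins the history
    return total_in, total_out


inp, out = conversation_tokens(turns=10)
print(inp, out, f"input share = {inp / (inp + out):.0%}")
```

With these assumed sizes, a 10-turn conversation sends 14,500 input tokens against 1,500 output tokens, so about 91% of the tokens billed are input.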
How do I estimate tokens before making an API call?
A rough rule of thumb for English: 1 token ≈ 0.75 words, or about 4 characters, so a typical paragraph is ~100 tokens. Use our LLM Token Counter to count tokens exactly with each provider's tokenizer. For production cost estimates, measure your actual prompt and response lengths and multiply by your request volume: mean lengths give the expected bill, while P95 lengths give a conservative budget.
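A minimal sketch of the character-based rule of thumb plus the volume multiplication described above. `estimate_tokens` and `monthly_tokens` are hypothetical helpers, and the request sizes in the example are made-up placeholders; for exact counts, use the provider's tokenizer.

```python
# Sketch: rough token estimate from the ~4 characters/token rule of thumb,
# then a monthly volume estimate. The rule is approximate for English text;
# use the provider's tokenizer for exact counts.
import math


def estimate_tokens(text: str) -> int:
    """Approximate English token count as characters / 4, rounded up."""
    return math.ceil(len(text) / 4)


def monthly_tokens(prompt_tokens, response_tokens, requests_per_day, days=30):
    """Return (input, output) tokens per month for a given per-request size."""
    return (prompt_tokens * requests_per_day * days,
            response_tokens * requests_per_day * days)


print(estimate_tokens("How do I estimate tokens before making an API call?"))
# Hypothetical P95 sizes: 1,200-token prompts, 300-token responses,
# 5,000 requests/day.
print(monthly_tokens(1_200, 300, 5_000))
```

The second call returns `(180000000, 45000000)`: 180M input and 45M output tokens per month, which you can then price with each model's rates.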
Is GPT-4o worth the cost vs. cheaper alternatives?
GPT-4o at $2.50/$10 per 1M tokens costs about 18× more than DeepSeek V3 on input tokens, about 36× more on output tokens, and about 26× more at a 70/30 mix. Whether it's "worth it" depends on your use case. For structured data extraction, complex reasoning, or high-stakes decisions, quality differences can be significant. For simple classification, summarization, or FAQ responses, a cheaper model is often indistinguishable to users. Test your actual prompts on multiple models before committing.
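Because the two models' output premiums differ (4× the input rate for GPT-4o, 2× for DeepSeek V3), a single "N× more expensive" figure depends on the input/output mix. A small sketch using the prices quoted on this page; `blended_price` is an illustrative helper, not a provider API.

```python
# Sketch: how much more GPT-4o costs than DeepSeek V3 depends on the
# input/output mix, because the two models' output premiums differ.
# Prices ($/1M tokens) are the ones quoted on this page.

GPT_4O = (2.50, 10.00)
DEEPSEEK_V3 = (0.14, 0.28)


def blended_price(prices, input_share):
    """Average $/1M tokens for a given fraction of input tokens."""
    p_in, p_out = prices
    return input_share * p_in + (1 - input_share) * p_out


# Compare input-only, the typical 70/30 chat mix, and output-only workloads.
for share in (1.0, 0.7, 0.0):
    ratio = blended_price(GPT_4O, share) / blended_price(DEEPSEEK_V3, share)
    print(f"input share {share:.0%}: GPT-4o is {ratio:.1f}x DeepSeek V3")
```

This prints ratios of about 17.9× (input only), 26.1× (70/30 mix), and 35.7× (output only).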
What are batch API discounts?
OpenAI, Anthropic, and Google all offer batch APIs at roughly a 50% discount vs. real-time pricing. The trade-off is latency: results are returned within 24 hours instead of immediately. For offline workloads (data labeling, content generation, analysis), batch mode can cut costs roughly in half. Prices shown in this calculator are real-time rates; halve them for batch estimates.
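When only part of a workload can wait up to 24 hours, the savings scale with the batch-eligible share. A minimal sketch assuming a flat 50% batch discount (an approximation; check each provider's current batch pricing) and a hypothetical `blended_monthly_cost` helper:

```python
# Sketch: effect of a ~50% batch discount when only part of a workload
# can tolerate up-to-24-hour latency. The 50% figure is an assumption;
# check each provider's current batch pricing.

BATCH_DISCOUNT = 0.5  # assumed: batch rate = 50% of the real-time rate


def blended_monthly_cost(realtime_cost, batch_fraction, discount=BATCH_DISCOUNT):
    """Monthly cost when batch_fraction of the spend moves to the batch API."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    # Real-time share pays full price; batch share pays (1 - discount).
    return realtime_cost * ((1 - batch_fraction)
                            + batch_fraction * (1 - discount))


# Hypothetical: $1,000/month at real-time rates, 60% of it offline work.
print(f"${blended_monthly_cost(1_000, 0.6):,.2f}")
```

For this hypothetical $1,000/month bill with 60% of the spend moved to batch, it prints `$700.00`; moving everything to batch gives the "halve it" estimate.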
How accurate is this pricing data?
Prices are manually maintained as of May 2026. LLM pricing changes frequently: Gemini 2.0 Flash dropped prices in early 2025, and DeepSeek launched at aggressively low prices. Always verify with the official pricing pages (OpenAI, Anthropic, Google), and submit price corrections via GitHub.
LLM Pricing by Use Case
Quick reference: best model choice for common use cases at 1M tokens/day (70% input / 30% output).