Cost Calculator

How LLM API Pricing Works

Token-Based Pricing

Large Language Model (LLM) APIs charge based on tokens -- small chunks of text that models process. A token is roughly 3/4 of a word in English. Most providers charge separately for input tokens (your prompt) and output tokens (the model's response), with output tokens typically costing 2-5x more than input tokens.

Prices are quoted per 1 million tokens. For example, if a model charges $2.50 per 1M input tokens and you send 10 million input tokens in a month, your input cost alone would be $25.00.
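The arithmetic above can be sketched as a small helper. The prices and token volumes here are illustrative, not quotes from any provider:

```python
# Illustrative per-1M-token prices (assumed for this example).
INPUT_PRICE_PER_1M = 2.50    # $ per 1M input tokens
OUTPUT_PRICE_PER_1M = 10.00  # $ per 1M output tokens (output is typically pricier)

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in dollars for a month's token usage."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_1M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_1M)

# 10M input tokens at $2.50/1M is $25.00 of input cost;
# adding 2M output tokens at $10.00/1M brings the total to $45.00.
print(monthly_cost(10_000_000, 2_000_000))  # 45.0
```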

What Affects Your Cost

Your monthly LLM API bill depends on several factors:

  • Model choice: Flagship models like GPT-4o, Claude Opus 4, and Gemini 1.5 Pro deliver the highest quality but cost significantly more than budget alternatives.
  • Input vs. output ratio: Applications that generate long responses (like code generation) have higher output costs, while search or classification tasks are input-heavy.
  • Context window usage: Larger context windows let you include more information but increase input token costs per request.
  • Request volume: High-throughput applications like chatbots or document processing can accumulate millions of tokens daily.
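To see how request volume compounds, it helps to estimate monthly token totals from per-request sizes. The chatbot numbers below are hypothetical:

```python
def estimate_monthly_tokens(requests_per_day: int,
                            input_per_request: int,
                            output_per_request: int,
                            days: int = 30) -> tuple[int, int]:
    """Project (input_tokens, output_tokens) for a month of traffic."""
    return (requests_per_day * input_per_request * days,
            requests_per_day * output_per_request * days)

# A hypothetical chatbot: 5,000 requests/day, 800 input + 300 output tokens each.
inp, out = estimate_monthly_tokens(5_000, 800, 300)
print(inp, out)  # 120,000,000 input and 45,000,000 output tokens per month
```

Even a modest per-request footprint reaches nine-figure token counts at this volume, which is why model choice dominates the bill.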

Choosing the Right Model

Not every task needs a flagship model. Here is a general guide:

  • Flagship models (GPT-4o, Claude Opus 4, Gemini 1.5 Pro): Best for complex reasoning, nuanced writing, and multi-step tasks. Use when quality is critical.
  • Mid-tier models (Claude Sonnet 4, Mistral Large, Command R+): Good balance of capability and cost. Suitable for most production applications.
  • Budget models (GPT-4o-mini, Claude Haiku 3.5, Gemini 2.0 Flash): Ideal for high-volume, simpler tasks like classification, extraction, and basic chat. Often 10-50x cheaper than flagships.
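The 10-50x gap is easiest to see as per-task arithmetic. The two price points below are assumed round numbers, not any provider's actual rates:

```python
# Illustrative per-1M-token prices (assumed, not quoted from any provider).
FLAGSHIP = {"input": 2.50, "output": 10.00}
BUDGET = {"input": 0.15, "output": 0.60}

def cost_per_request(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given per-1M-token prices."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# A classification call: 500 input tokens, 10 output tokens.
flagship = cost_per_request(FLAGSHIP, 500, 10)
budget = cost_per_request(BUDGET, 500, 10)
print(f"flagship: ${flagship * 1_000_000:,.2f} per 1M requests")
print(f"budget:   ${budget * 1_000_000:,.2f} per 1M requests")
```

At these assumed prices the flagship costs about 17x more per request ($1,350 vs. $81 per million classification calls), with no quality benefit for a task a budget model handles well.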

Cost Optimization Tips

  • Prompt caching: Many providers offer cached prompt pricing at a 50-90% discount for repeated prefixes.
  • Batch APIs: OpenAI and Anthropic offer batch processing at roughly a 50% discount for non-real-time workloads.
  • Model routing: Use cheap models for simple tasks and route only complex queries to expensive models.
  • Prompt engineering: Shorter, more efficient prompts reduce input costs without sacrificing quality.
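Model routing can be as simple as a heuristic gate in front of the API call. This is a minimal sketch; the model names and complexity heuristic are placeholders, and production routers typically use a small classifier model instead:

```python
def route_model(prompt: str) -> str:
    """Send simple prompts to a cheap model, complex ones to a flagship.

    The length threshold and keyword list are illustrative placeholders;
    tune them (or replace with a classifier) for a real workload.
    """
    complex_markers = ("analyze", "prove", "refactor", "multi-step")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in complex_markers):
        return "flagship-model"  # placeholder model name
    return "budget-model"        # placeholder model name

print(route_model("Classify this support ticket: login error"))  # budget-model
print(route_model("Analyze the tradeoffs in this architecture")) # flagship-model
```

Since budget models are often 10-50x cheaper, routing even half of the traffic away from the flagship can cut the bill substantially.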
