The problem
Every LLM provider prices their API differently, and prices change fast. Just in 2026:
- o3 dropped from $10 to $2/Mtok input – a 5× price reduction. It's now the same price as gpt-4.1 but with far stronger reasoning. If you wrote it off based on the old price, re-run your estimates.
- GPT-5.5 launched at $5/$30 per Mtok with a 1M token context window. GPT-5.4-nano at $0.20/$1.25 is one of the cheapest capable models available.
- Grok 4.3 (xAI) at $1.25/$2.50 per Mtok with 1M context – output is 6× cheaper than Claude Sonnet at the same tier
- Claude Opus 4.7 repriced: $5/$25 per Mtok (was commonly listed at $15/$75 – the old Opus 3 price) with 1M context. A very different value proposition.
- Gemini 2.5 Flash is $0.30/$2.50 (output reflects thinking tokens) – often mispriced in third-party tools at the old $0.15/$0.60
- The same open-weight models run on Groq, Together, Fireworks, SambaNova, and Cerebras at very different prices – the gap is often 2–5×
There are now 22 major providers and 144 models to track. Comparing them means visiting 22 different pricing pages, doing your own math for your token counts, and updating your spreadsheet every time something changes.
The solution: llm-prices
llm-prices is a zero-dependency Python CLI that bakes in pricing data for all major providers and does the math locally. No API key, no network calls at runtime, stdlib only.
pip install llm-prices
Here's a walkthrough of the six main commands.
1. List: browse all 144 models
$ llm-prices list
Prices as of 2026-05-09. Verify at provider's pricing page.
Model Provider Input/Mtok Output/Mtok Context Notes
------------------------------------------------------------------------------------------
claude-opus-4-7 Anthropic $ 5.0000 $ 25.0000 1000k Most capable Claude; 1M context
claude-sonnet-4-6 Anthropic $ 3.0000 $ 15.0000 1000k Best speed/intelligence balance
gemini-3.1-pro-preview Google $ 2.0000 $ 12.0000 1024k Gemini 3 flagship preview
gemini-2.5-flash Google $ 0.3000 $ 2.5000 1024k Hybrid reasoning; 1M context
gpt-5.5 OpenAI $ 5.0000 $ 30.0000 1025k OpenAI flagship; 1M context
gpt-5.4 OpenAI $ 2.5000 $ 15.0000 1025k More affordable GPT-5; 1M context
gpt-5.4-mini OpenAI $ 0.7500 $ 4.5000 266k Strong mini model
gpt-5.4-nano OpenAI $ 0.2000 $ 1.2500 266k Cheapest GPT-5.4
gpt-5-nano OpenAI $ 0.0500 $ 0.4000 266k Ultra-cheap GPT-5
gpt-4.1-nano OpenAI $ 0.1000 $ 0.4000 1023k Fastest, cheapest GPT-4.1
...
grok-4.3 xAI $ 1.2500 $ 2.5000 1000k Grok flagship; 1M ctx
grok-4.20-reasoning xAI $ 1.2500 $ 2.5000 1953k Grok 4 reasoning; 2M ctx
...
144 model(s) shown.
Filter by provider, search by name, or sort by price:
$ llm-prices list --provider OpenAI --sort input # cheapest first
$ llm-prices list --search grok
$ llm-prices list --provider xAI --markdown
Or export CSV for spreadsheet analysis:
$ llm-prices list --csv > llm_pricing.csv
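The export loads cleanly with the stdlib csv module. A sketch; note that the column names below (model, provider, input_per_mtok, output_per_mtok) are my assumption based on the table layout, not a documented schema, so check the header row of your actual export:

```python
import csv
import io

# Stand-in for the real file; replace io.StringIO(sample) with open("llm_pricing.csv")
sample = """model,provider,input_per_mtok,output_per_mtok
gpt-5.4-nano,OpenAI,0.20,1.25
grok-4.3,xAI,1.25,2.50
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Models with input price under $1/Mtok
cheap = [r["model"] for r in rows if float(r["input_per_mtok"]) < 1.0]
print(cheap)  # ['gpt-5.4-nano']
```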
2. Calc: what will this specific call cost?
$ llm-prices calc gpt-4o --in 10000 --out 2000
Model : gpt-4o (OpenAI)
Tokens : 10,000 in / 2,000 out
Rate : $2.5/Mtok in, $10.0/Mtok out
Cost : $0.0250 in + $0.0200 out = $0.0450 total
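The arithmetic behind calc is two multiplications and a sum; a minimal sketch of the same math (the function is illustrative, not the library's API):

```python
def call_cost(in_tokens: int, out_tokens: int,
              in_per_mtok: float, out_per_mtok: float) -> float:
    """Cost in USD; rates are billed per million tokens ('Mtok')."""
    return in_tokens / 1e6 * in_per_mtok + out_tokens / 1e6 * out_per_mtok

# gpt-4o at $2.50/Mtok in, $10.00/Mtok out, as shown above
total = call_cost(10_000, 2_000, 2.5, 10.0)
print(f"${total:.4f}")  # $0.0450
```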
Use your app's actual token counts to get a concrete cost estimate. JSON output for scripting:
$ llm-prices calc claude-sonnet-4-6 --in 5000 --out 1000 --json
{
"model": "claude-sonnet-4-6",
"provider": "Anthropic",
"input_tokens": 5000,
"output_tokens": 1000,
"input_cost_usd": 0.015,
"output_cost_usd": 0.015,
"total_cost_usd": 0.03
}
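That flat schema makes the output easy to consume from a script with only the stdlib. A sketch that parses a captured --json record (the string below is the output shown above, reflowed):

```python
import json

# Output captured from a `llm-prices calc ... --json` run
raw = '''{"model": "claude-sonnet-4-6", "provider": "Anthropic",
"input_tokens": 5000, "output_tokens": 1000,
"input_cost_usd": 0.015, "output_cost_usd": 0.015, "total_cost_usd": 0.03}'''

record = json.loads(raw)
# Sanity check: input + output should equal the reported total
assert abs(record["input_cost_usd"] + record["output_cost_usd"]
           - record["total_cost_usd"]) < 1e-9
print(f'{record["model"]}: ${record["total_cost_usd"]:.4f} per call')
```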
3. Compare: side-by-side across providers
Which model gives you the best value for a typical agent call (5,000 input / 1,000 output tokens)? Here's how the 2026 frontier models stack up:
$ llm-prices compare gpt-5.5 gpt-5.4 grok-4.3 claude-sonnet-4-6 \
claude-opus-4-7 o3 gemini-3.1-pro-preview --in 5000 --out 1000
Comparison: 5,000 input tokens, 1,000 output tokens
Model Provider Input Output Total
------------------------------------------------------------------------
grok-4.3                xAI        $0.006250  $0.002500  $0.008750
o3                      OpenAI     $0.010000  $0.008000  $0.018000  (2.1x)
gemini-3.1-pro-preview  Google     $0.010000  $0.012000  $0.022000  (2.5x)
gpt-5.4                 OpenAI     $0.012500  $0.015000  $0.027500  (3.1x)
claude-sonnet-4-6       Anthropic  $0.015000  $0.015000  $0.030000  (3.4x)
claude-opus-4-7         Anthropic  $0.025000  $0.025000  $0.050000  (5.7x)
gpt-5.5                 OpenAI     $0.025000  $0.030000  $0.055000  (6.3x)
Cheapest: grok-4.3 at $0.008750
Grok 4.3 is the cheapest frontier model for this workload: $0.0088 vs GPT-5.5 at $0.055. For API-heavy applications, that 6× gap in total cost can determine profitability. But the right model depends on quality, latency, and feature support for your specific task.
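You can sanity-check the ranking by hand from the per-Mtok rates in the list output. A sketch, with rates copied from the table in section 1 rather than fetched from the tool:

```python
# ($/Mtok input, $/Mtok output), copied from the list output above
RATES = {
    "grok-4.3": (1.25, 2.50),
    "gpt-5.4": (2.50, 15.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-5.5": (5.00, 30.00),
}

def total(rates: tuple, in_tok: int = 5_000, out_tok: int = 1_000) -> float:
    """Total USD cost for one call at the given per-Mtok rates."""
    i, o = rates
    return (in_tok * i + out_tok * o) / 1e6

ranked = sorted(RATES, key=lambda m: total(RATES[m]))
print(ranked[0])  # grok-4.3
```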
For open-weight models via inference providers, the gap is even larger:
$ llm-prices compare qwen3-235b deepseek-chat grok-4.3 gpt-5-nano --in 5000 --out 1000
gpt-5-nano     OpenAI    $0.000250  $0.000400  $0.000650
qwen3-235b     Together  $0.001000  $0.000600  $0.001600
deepseek-chat  DeepSeek  $0.001350  $0.001100  $0.002450
grok-4.3       xAI       $0.006250  $0.002500  $0.008750
gpt-5-nano at $0.0007 total for 5k/1k tokens is remarkable: GPT-5 capability tier at sub-cent prices. It's worth evaluating for classification, extraction, and structured output tasks where you don't need full flagship reasoning.
4. Budget: how many calls per dollar?
Given a budget and your typical token counts, which models can you run at scale?
$ llm-prices budget 1.00 --in 2000 --out 500
Budget: $1.0000 | Tokens per call: 2,000 in / 500 out
Model Provider Cost/call Calls
--------------------------------------------------------------
llama-3.1-8b           Groq        $0.0001400    7,142
gemini-2.0-flash       Google      $0.0002500    4,000
qwen3.5-9b             Together    $0.0002750    3,636
gpt-5-nano             OpenAI      $0.0003000    3,333
gpt-4.1-nano           OpenAI      $0.0004000    2,500
...
gpt-4o                 OpenAI      $0.0100000      100
claude-opus-4-7        Anthropic   $0.0225000       44
Budget $1 buys you 7,142 Llama calls or 44 Claude Opus calls. That's the concrete number you need for cost planning.
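Under the hood this is just the budget divided by per-call cost, truncated to whole calls. A sketch (the function name is mine, not the library's; Decimal keeps dollar math exact, where plain floats can floor 2,500 calls down to 2,499):

```python
from decimal import Decimal

def calls_per_budget(budget_usd: str, in_tok: int, out_tok: int,
                     in_rate: str, out_rate: str) -> int:
    """How many whole calls fit in the budget at the given $/Mtok rates."""
    per_call = (in_tok * Decimal(in_rate) + out_tok * Decimal(out_rate)) / 1_000_000
    return int(Decimal(budget_usd) / per_call)

# gpt-4.1-nano at $0.10/$0.40 per Mtok, rates from the list output above
print(calls_per_budget("1.00", 2_000, 500, "0.10", "0.40"))  # 2500
```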
5. Top: find the cheapest models for your workload
Instead of running budget and mentally sorting, top N ranks every model by total cost for your exact token counts and returns the cheapest N:
$ llm-prices top 5 --in 5000 --out 1000
Top 5 cheapest: 5,000 input / 1,000 output tokens
# Model Provider Input Output Total
----------------------------------------------------------------------
1  llama-3.1-8b         Groq      $0.000250  $0.000080  $0.000330
2  gemini-1.5-flash-8b  Google    $0.000188  $0.000150  $0.000338
3  command-r7b          Cohere    $0.000188  $0.000150  $0.000338
4  gpt-5-nano           OpenAI    $0.000250  $0.000400  $0.000650
5  qwen3.5-9b           Together  $0.000500  $0.000150  $0.000650
Narrow it to a single provider, or paste the result into a doc with --markdown:
llm-prices top 3 --provider Anthropic --in 2000 --out 800
llm-prices top 10 --in 5000 --out 1000 --markdown
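What top does can be reproduced with heapq over a price table. A sketch against a hand-copied slice of the pricing data; the dict shape here mirrors the MODELS mapping shown in the library-usage section below, but is my stand-in, not the real dataset:

```python
import heapq

# Small stand-in for llm_prices.MODELS (rates from the output above)
MODELS = {
    "llama-3.1-8b": {"provider": "Groq", "input_per_mtok": 0.05, "output_per_mtok": 0.08},
    "gemini-1.5-flash-8b": {"provider": "Google", "input_per_mtok": 0.0375, "output_per_mtok": 0.15},
    "gpt-4o": {"provider": "OpenAI", "input_per_mtok": 2.5, "output_per_mtok": 10.0},
}

def total(info: dict, in_tok: int = 5_000, out_tok: int = 1_000) -> float:
    """Total USD cost for one call with this model's rates."""
    return (in_tok * info["input_per_mtok"] + out_tok * info["output_per_mtok"]) / 1e6

# Cheapest 2 models for this workload, without sorting the whole table
top2 = heapq.nsmallest(2, MODELS, key=lambda m: total(MODELS[m]))
print(top2)  # ['llama-3.1-8b', 'gemini-1.5-flash-8b']
```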
6. Providers: quick overview
$ llm-prices providers
Available providers:
AI21 (2 models)
Anthropic (8 models)
Bedrock (5 models)
Cerebras (3 models)
Cohere (3 models)
DeepSeek (2 models)
Fireworks (6 models)
Google (9 models)
Groq (7 models)
Mistral (7 models)
OpenAI (21 models)
Perplexity (4 models)
SambaNova (5 models)
Together (7 models)
xAI (4 models)
...
Total: 144 models
Notable pricing stories in 2026:
o3 repriced 5×: dropped from $10/$40 to $2/$8 per Mtok. It's now priced identically to gpt-4.1 but with much stronger multi-step reasoning. Previously one of the most expensive models; now a reasonable choice for complex tasks.
Grok 4.3 (grok-4.3): $1.25 input / $2.50 output per Mtok with a 1M context window. The output price is 6× cheaper than Claude Sonnet 4.6 ($15) at comparable capability claims. The 2M-context grok-4.20-reasoning variant is the same price – the largest context window in the dataset.
GPT-5 nano tier: gpt-5-nano at $0.05/$0.40 per Mtok is the cheapest GPT-5-family model. That's competitive with Groq's Llama 3.1 8B on input price ($0.05 vs $0.05) while potentially offering better capability on instruction-following and structured output tasks.
Gemini 2.5 Flash note: many third-party tools still list this at $0.15/$0.60 – the actual price is $0.30/$2.50, because it's a hybrid reasoning model whose output tokens include thinking tokens. Always verify on Google's official pricing page.
Real-world example: planning a consumer AI app
You're building an app that processes user questions. Each request is ~1,500 input tokens (system prompt + message) and ~300 output tokens.
$ llm-prices budget 100.00 --in 1500 --out 300
On $100 (1,500 in / 300 out per call):
- Groq Llama 3.1 8B: ~1,010,000 calls ($0.000099/call)
- gpt-5-nano: ~513,000 calls ($0.000195/call)
- gpt-5.4-nano: ~148,000 calls ($0.000675/call)
- grok-4.3: ~38,000 calls ($0.002625/call)
- gpt-5.4: ~12,000 calls ($0.008250/call)
- Claude Opus 4.7: ~6,600 calls ($0.015000/call)
- gpt-5.5: ~6,000 calls ($0.016500/call)
The gap between the cheapest and most expensive model here is roughly 170×. For a consumer app at scale, model selection can be the difference between a profitable product and an unprofitable one.
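The same per-call figures turn directly into a monthly bill once you know your traffic. A quick sketch: the per-call cost is grok-4.3's at 1,500/300 tokens from the output above, while the 50,000 requests/day is an assumed volume for illustration:

```python
per_call_usd = 0.002625    # grok-4.3 at 1,500 in / 300 out (from above)
requests_per_day = 50_000  # hypothetical traffic for a consumer app

# 30-day month of steady traffic
monthly = per_call_usd * requests_per_day * 30
print(f"${monthly:,.2f}/month")  # $3,937.50/month
```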
Comparing the same model across inference providers
One often-overlooked optimization: the same open-weight model runs on multiple inference hosts at different prices.
$ llm-prices compare deepseek-reasoner deepseek-r1-together \
deepseek-r1-fw --in 5000 --out 2000
This compares DeepSeek R1 on DeepSeek's direct API ($0.55/$2.19 per Mtok), Together AI ($3.00/$7.00), and Fireworks ($3.00/$7.00) – pick the cheapest option or the one with better availability for your region.
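Programmatically, picking the cheapest host for the same weights is a min() over per-call totals. A sketch using the DeepSeek R1 rates quoted above (the helper is illustrative, not part of llm-prices):

```python
# ($/Mtok input, $/Mtok output) for DeepSeek R1 on each host, as quoted above
HOSTS = {
    "DeepSeek": (0.55, 2.19),
    "Together": (3.00, 7.00),
    "Fireworks": (3.00, 7.00),
}

def total(rates: tuple, in_tok: int = 5_000, out_tok: int = 2_000) -> float:
    """Total USD cost for one call at the given per-Mtok rates."""
    i, o = rates
    return (in_tok * i + out_tok * o) / 1e6

best = min(HOSTS, key=lambda h: total(HOSTS[h]))
print(best, f"${total(HOSTS[best]):.5f}")  # DeepSeek $0.00713
```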
Installation
pipx install git+https://github.com/benbencodes/llm-prices
brew tap benbencodes/tap && brew install llm-prices
pip install llm-prices
No API key needed. No network calls at runtime. Works offline.
Use as a Python library
from llm_prices import MODELS, calculate_cost

# Cost of a single call
result = calculate_cost("gpt-5.4", input_tokens=10_000, output_tokens=3_000)
print(f"${result['total_cost_usd']:.4f}")

# Filter the pricing table: models under $0.30/Mtok input
cheap = {name: info for name, info in MODELS.items()
         if info["input_per_mtok"] < 0.30}

# Look up raw rates directly
for model in ["o3", "gpt-4.1"]:
    m = MODELS[model]
    print(f'{model}: ${m["input_per_mtok"]}/${m["output_per_mtok"]} per Mtok')
Contributing
The pricing data lives in a single Python file (llm_prices/data.py) – a dict mapping model names to prices. If a price is wrong or a new model launched, open a PR with your source cited.
GitHub: benbencodes/llm-prices