LLM API Prices in 2026: GPT-5 Series, Grok 4.3, o3 Repriced 5× Cheaper

The problem

Every LLM provider prices their API differently, and prices change fast. Just in 2026:

o3 dropped from $10 to $2/Mtok input — a 5× price reduction. It's now the same price as gpt-4.1 but with far stronger reasoning. If you wrote it off based on the old price, re-run your estimates.
GPT-5.5 launched at $5/$30 per Mtok with a 1M token context window. GPT-5.4-nano at $0.20/$1.25 is one of the cheapest capable models available.
Grok 4.3 (xAI) at $1.25/$2.50 per Mtok with 1M context — output is 6× cheaper than Claude Sonnet at the same tier
Claude Opus 4.7 repriced: $5/$25 per Mtok (was commonly listed at $15/$75 — the old Opus 3 price) with 1M context. A very different value proposition.
Gemini 2.5 Flash is $0.30/$2.50 (output reflects thinking tokens) — often mispriced in third-party tools at the old $0.15/$0.60
The same open-weight models run on Groq, Together, Fireworks, SambaNova, and Cerebras at very different prices — the gap is often 2–5×

There are now 22 major providers and 144 models to track. Comparing them means visiting 22 different pricing pages, doing your own math for your token counts, and updating your spreadsheet every time something changes.

The solution: llm-prices

llm-prices is a zero-dependency Python CLI that bakes in pricing data for all major providers and does the math locally. No API key, no network calls at runtime, stdlib only.

# install in seconds
pip install llm-prices
# or: pipx install git+https://github.com/benbencodes/llm-prices

Here's a walkthrough of the six main commands.

1. List: browse all 144 models

$ llm-prices list

Prices as of 2026-05-09. Verify at provider's pricing page.

Model                    Provider    Input/Mtok   Output/Mtok  Context  Notes
------------------------------------------------------------------------------------------
claude-opus-4-7          Anthropic   $  5.0000    $ 25.0000    1000k    Most capable Claude; 1M context
claude-sonnet-4-6        Anthropic   $  3.0000    $ 15.0000    1000k    Best speed/intelligence balance
gemini-3.1-pro-preview   Google      $  2.0000    $ 12.0000    1024k    Gemini 3 flagship preview
gemini-2.5-flash         Google      $  0.3000    $  2.5000    1024k    Hybrid reasoning; 1M context
gpt-5.5                  OpenAI      $  5.0000    $ 30.0000    1025k    OpenAI flagship; 1M context
gpt-5.4                  OpenAI      $  2.5000    $ 15.0000    1025k    More affordable GPT-5; 1M context
gpt-5.4-mini             OpenAI      $  0.7500    $  4.5000     266k    Strong mini model
gpt-5.4-nano             OpenAI      $  0.2000    $  1.2500     266k    Cheapest GPT-5.4
gpt-5-nano               OpenAI      $  0.0500    $  0.4000     266k    Ultra-cheap GPT-5
gpt-4.1-nano             OpenAI      $  0.1000    $  0.4000    1023k    Fastest, cheapest GPT-4.1
...
grok-4.3                 xAI         $  1.2500    $  2.5000    1000k    Grok flagship; 1M ctx
grok-4.20-reasoning      xAI         $  1.2500    $  2.5000    1953k    Grok 4 reasoning; 2M ctx
...

144 model(s) shown.

Filter by provider or search by name:

$ llm-prices list --provider Together
$ llm-prices list --search sonar
$ llm-prices list --sort input   # cheapest first

Filter and sort

$ llm-prices list --provider OpenAI --sort input   # cheapest first
$ llm-prices list --search grok
$ llm-prices list --provider xAI --markdown

Or export CSV for spreadsheet analysis:

llm-prices list --csv > llm_pricing.csv

2. Calc: what will this specific call cost?

$ llm-prices calc gpt-4o --in 10000 --out 2000

Model  : gpt-4o (OpenAI)
Tokens : 10,000 in / 2,000 out
Rate   : $2.5/Mtok in, $10.0/Mtok out
Cost   : $0.0250 in + $0.0200 out = $0.0450 total

Use your app's actual token counts to get a concrete cost estimate. JSON output for scripting:

$ llm-prices calc claude-sonnet-4-6 --in 5000 --out 1000 --json
{
  "model": "claude-sonnet-4-6",
  "provider": "Anthropic",
  "input_tokens": 5000,
  "output_tokens": 1000,
  "input_cost_usd": 0.015,
  "output_cost_usd": 0.015,
  "total_cost_usd": 0.03
}
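The arithmetic behind these numbers is just tokens times the per-Mtok rate, divided by one million. Here is a minimal sketch that reproduces the figures above and checks them against the JSON shown. `call_cost` is a hypothetical helper written for illustration, not the tool's own code. The JSON keys are taken from the sample output.

```python
import json


def call_cost(in_rate: float, out_rate: float, in_tokens: int, out_tokens: int) -> dict:
    """Cost of one call in USD, given rates in USD per million tokens (Mtok)."""
    input_cost = in_tokens * in_rate / 1_000_000
    output_cost = out_tokens * out_rate / 1_000_000
    return {
        "input_cost_usd": round(input_cost, 6),
        "output_cost_usd": round(output_cost, 6),
        "total_cost_usd": round(input_cost + output_cost, 6),
    }


# claude-sonnet-4-6 at $3/$15 per Mtok, 5,000 tokens in / 1,000 tokens out
estimate = call_cost(3.0, 15.0, 5_000, 1_000)

# The JSON from the `--json` example above, pasted in as a string
cli_output = """
{ "model": "claude-sonnet-4-6", "provider": "Anthropic",
  "input_tokens": 5000, "output_tokens": 1000,
  "input_cost_usd": 0.015, "output_cost_usd": 0.015, "total_cost_usd": 0.03 }
"""
reported = json.loads(cli_output)

# Compare the hand-computed estimate with the reported figures
matches = all(abs(estimate[key] - reported[key]) < 1e-9 for key in estimate)
print(estimate, "matches CLI output:", matches)
```

In a script you would typically capture the CLI's stdout and pass it to `json.loads` instead of pasting it in.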

3. Compare: side-by-side across providers

Which model gives you the best value for a typical agent call (5,000 input / 1,000 output tokens)? Here's how the 2026 frontier models stack up:

$ llm-prices compare gpt-5.5 gpt-5.4 grok-4.3 claude-sonnet-4-6 \
    claude-opus-4-7 o3 gemini-3.1-pro-preview --in 5000 --out 1000

Comparison: 5,000 input tokens, 1,000 output tokens

Model                    Provider    Input      Output     Total
------------------------------------------------------------------------
grok-4.3                 xAI         $0.006250  $0.002500  $0.008750
o3                       OpenAI      $0.010000  $0.008000  $0.018000  (2.1x)
gemini-3.1-pro-preview   Google      $0.010000  $0.012000  $0.022000  (2.5x)
gpt-5.4                  OpenAI      $0.012500  $0.015000  $0.027500  (3.1x)
claude-sonnet-4-6        Anthropic   $0.015000  $0.015000  $0.030000  (3.4x)
claude-opus-4-7          Anthropic   $0.025000  $0.025000  $0.050000  (5.7x)
gpt-5.5                  OpenAI      $0.025000  $0.030000  $0.055000  (6.3x)

Cheapest: grok-4.3 at $0.008750

Grok 4.3 is the cheapest frontier model for this workload: $0.0088 per call versus $0.055 for GPT-5.5. For API-heavy applications, that roughly 6× gap in total cost (12× on output tokens alone, $2.50 vs $30 per Mtok) can determine profitability. But the right model depends on quality, latency, and feature support for your specific task.
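To see where the multipliers come from, here is a minimal sketch of the ranking logic: compute each model's total, sort ascending, and divide by the cheapest. The `RATES` table and the `compare` function are illustrative stand-ins, not the package's internals. The rates are the ones quoted in this article.

```python
# (input, output) rates in USD per Mtok, as quoted in this article
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "grok-4.3": (1.25, 2.50),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (5.00, 25.00),
    "o3": (2.00, 8.00),
    "gemini-3.1-pro-preview": (2.00, 12.00),
}


def compare(models, in_tokens, out_tokens):
    """Return (model, total_cost, multiple_of_cheapest) tuples, cheapest first."""
    totals = []
    for name in models:
        in_rate, out_rate = RATES[name]
        total = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
        totals.append((name, total))
    totals.sort(key=lambda pair: pair[1])
    cheapest = totals[0][1]
    return [(name, total, total / cheapest) for name, total in totals]


ranking = compare(RATES, 5_000, 1_000)
for name, total, multiple in ranking:
    print(f"{name:<24} ${total:.6f}  ({multiple:.1f}x)")
```

Changing the token mix changes the ranking: models with cheap input but expensive output look worse as the output share grows, so it is worth re-running with your own counts.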

For open-weight models via inference providers, the gap is even larger:

$ llm-prices compare qwen3-235b deepseek-chat grok-4.3 gpt-5-nano --in 5000 --out 1000

gpt-5-nano       OpenAI      $0.000250  $0.000400  $0.000650
qwen3-235b       Together    $0.001000  $0.000600  $0.001600
deepseek-chat    DeepSeek    $0.001350  $0.001100  $0.002450
grok-4.3         xAI         $0.006250  $0.002500  $0.008750

gpt-5-nano at $0.0007 total for 5k/1k tokens is remarkable — GPT-5 capability tier at sub-cent prices. It's worth evaluating for classification, extraction, and structured output tasks where you don't need full flagship reasoning.

4. Budget: how many calls per dollar?

Given a budget and your typical token counts, which models can you run at scale?

$ llm-prices budget 1.00 --in 2000 --out 500

Budget: $1.0000 | Tokens per call: 2,000 in / 500 out

Model              Provider    Cost/call     Calls
--------------------------------------------------------------
llama-3.1-8b       Groq        $0.0001400    7,142
gemini-2.0-flash   Google      $0.0002500    4,000
qwen3.5-9b         Together    $0.0002750    3,636
gpt-4.1-nano       OpenAI      $0.0004000    2,500
...
gpt-4o             OpenAI      $0.0100000      100
claude-opus-4-7    Anthropic   $0.0225000       44

A $1 budget buys about 7,100 Llama calls or 44 Claude Opus calls. That's the concrete number you need for cost planning.
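The calls-per-budget figure is the budget divided by the cost of one call, rounded down. Here is a minimal sketch using `decimal.Decimal` so that the floor division is not thrown off by float rounding. `calls_for_budget` is a hypothetical helper, and the rates are passed as strings by choice to keep them exact.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def calls_for_budget(budget_usd, in_rate, out_rate, in_tokens, out_tokens):
    """Return (cost_per_call, whole_calls) for rates given in USD per Mtok as strings."""
    cost_per_call = (in_tokens * Decimal(in_rate) + out_tokens * Decimal(out_rate)) / MTOK
    whole_calls = int(Decimal(budget_usd) // cost_per_call)
    return cost_per_call, whole_calls


# $1 budget, 2,000 tokens in / 500 tokens out per call
examples = [
    ("gpt-4.1-nano", "0.10", "0.40"),
    ("gpt-4o", "2.50", "10.00"),
]
for name, in_rate, out_rate in examples:
    cost, calls = calls_for_budget("1.00", in_rate, out_rate, 2_000, 500)
    print(f"{name}: ${cost:.7f}/call, {calls:,} calls")
```

Rounding down matters at the expensive end of the table, where a fractional call is a meaningful share of the budget.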

5. Top: find the cheapest models for your workload

Instead of running budget and mentally sorting, top N ranks every model by total cost for your exact token counts and returns the cheapest N:

$ llm-prices top 5 --in 5000 --out 1000

Top 5 cheapest: 5,000 input / 1,000 output tokens

#  Model                 Provider    Input      Output     Total
----------------------------------------------------------------------
1  llama-3.1-8b          Groq        $0.000250  $0.000080  $0.000330
2  gemini-1.5-flash-8b   Google      $0.000188  $0.000150  $0.000338
3  command-r7b           Cohere      $0.000188  $0.000150  $0.000338
4  qwen3.5-9b            Together    $0.000500  $0.000150  $0.000650
5  gemini-1.5-flash      Google      $0.000375  $0.000300  $0.000675

Narrow it to a single provider, or paste the result into a doc with --markdown:

llm-prices top 3 --provider Anthropic --in 2000 --out 800
llm-prices top 10 --in 5000 --out 1000 --markdown
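Conceptually, a top-N query is a partial sort on total cost. Here is a minimal sketch using `heapq.nsmallest` from the standard library. The small `PRICES` table and the `top_n` function are illustrative only. The rates are those quoted or implied by the sample outputs in this article.

```python
import heapq

# (input, output) rates in USD per Mtok; a small illustrative subset
PRICES = {
    "llama-3.1-8b": (0.05, 0.08),
    "qwen3.5-9b": (0.10, 0.15),
    "gpt-4.1-nano": (0.10, 0.40),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}


def total_cost(rates, in_tokens, out_tokens):
    """Total USD cost of one call at the given (input, output) per-Mtok rates."""
    in_rate, out_rate = rates
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000


def top_n(n, in_tokens, out_tokens, prices=PRICES):
    """Return the n cheapest (model, total_cost) pairs for this workload."""
    candidates = (
        (name, total_cost(rates, in_tokens, out_tokens))
        for name, rates in prices.items()
    )
    return heapq.nsmallest(n, candidates, key=lambda pair: pair[1])


cheapest_three = top_n(3, 5_000, 1_000)
for rank, (name, cost) in enumerate(cheapest_three, start=1):
    print(f"{rank}  {name:<20} ${cost:.6f}")
```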

6. Providers: quick overview

$ llm-prices providers

Available providers:
  AI21 (2 models)
  Anthropic (8 models)
  Bedrock (5 models)
  Cerebras (3 models)
  Cohere (3 models)
  DeepSeek (2 models)
  Fireworks (6 models)
  Google (9 models)
  Groq (7 models)
  Mistral (7 models)
  OpenAI (21 models)
  Perplexity (4 models)
  SambaNova (5 models)
  Together (7 models)
  xAI (4 models)

Total: 144 models

Notable pricing stories in 2026:

o3 repriced 5× (o3): dropped from $10/$40 to $2/$8 per Mtok. It's now priced identically to gpt-4.1 but with much stronger multi-step reasoning. Previously one of the most expensive models; now a reasonable choice for complex tasks.

Grok 4.3 (grok-4.3): $1.25 input / $2.50 output per Mtok with a 1M context window. The output price is 6× cheaper than Claude Sonnet 4.6 ($15) at comparable capability claims. The 2M-context grok-4.20-reasoning variant is at the same price — largest context window in the dataset.

GPT-5 nano tier: gpt-5-nano at $0.05/$0.40 per Mtok is the cheapest GPT-5-family model. It matches Groq's Llama 3.1 8B on input price ($0.05 per Mtok each) while potentially offering better capability on instruction-following and structured output tasks.

Gemini 2.5 Flash note: many third-party tools still list this at $0.15/$0.60 — the actual price is $0.30/$2.50 because it's a hybrid reasoning model where output tokens include thinking tokens. Always verify on Google's official pricing page.
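Here is a rough sketch of why thinking tokens matter for estimates. If reasoning tokens are billed at the output rate, the output cost scales with visible plus thinking tokens, not just the text you display. The thinking-token count below is a made-up illustration; measure your own workload. `billed_output_cost` is a hypothetical helper.

```python
def billed_output_cost(visible_tokens, thinking_tokens, out_rate_per_mtok):
    """Output cost in USD when thinking tokens are billed at the output rate."""
    return (visible_tokens + thinking_tokens) * out_rate_per_mtok / 1_000_000


OUT_RATE = 2.50    # gemini-2.5-flash output rate in USD per Mtok, from the text above

visible = 300      # tokens the user actually sees
thinking = 1_200   # hypothetical reasoning overhead, for illustration only

naive = billed_output_cost(visible, 0, OUT_RATE)
actual = billed_output_cost(visible, thinking, OUT_RATE)
print(f"visible only: ${naive:.6f}, with thinking: ${actual:.6f} ({actual / naive:.0f}x)")
```

When estimating with `--out`, pass the total billed output tokens rather than the length of the visible reply.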

Real-world example: planning a consumer AI app

You're building an app that processes user questions. Each request is ~1,500 input tokens (system prompt + message) and ~300 output tokens.

$ llm-prices budget 100.00 --in 1500 --out 300

On $100 (1,500 in / 300 out per call):

Groq Llama 3.1 8B: ~1,010,000 calls ($0.000099/call)
gpt-5-nano: ~513,000 calls ($0.000195/call)
gpt-5.4-nano: ~148,000 calls ($0.000675/call)
grok-4.3: ~38,000 calls ($0.002625/call)
gpt-5.4: ~12,000 calls ($0.008250/call)
Claude Opus 4.7: ~6,700 calls ($0.015000/call)
gpt-5.5: ~6,100 calls ($0.016500/call)

The gap between the cheapest and most expensive model on this list is roughly 170×. For a consumer app, model selection alone can decide whether the unit economics work, whatever your request volume.

Comparing the same model across inference providers

One often-overlooked optimization: the same open-weight model runs on multiple inference hosts at different prices.

$ llm-prices compare deepseek-reasoner deepseek-r1-together \
    deepseek-r1-fw --in 5000 --out 2000

This compares DeepSeek R1 on DeepSeek's direct API ($0.55/$2.19 per Mtok), Together AI ($3.00/$7.00), and Fireworks ($3.00/$7.00) — pick the cheapest option or the one with better availability for your region.
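For a sense of scale, here is a minimal sketch that works out the 5,000 in / 2,000 out cost at each of the three quoted rates. It is hand-computed from the prices in the paragraph above, not captured tool output, and the host labels are just dictionary keys chosen for this example.

```python
# DeepSeek R1 rates quoted above: (input, output) in USD per Mtok
HOSTS = {
    "DeepSeek (direct)": (0.55, 2.19),
    "Together AI": (3.00, 7.00),
    "Fireworks": (3.00, 7.00),
}

IN_TOKENS, OUT_TOKENS = 5_000, 2_000

# Cost of one call on each host
costs = {
    host: (IN_TOKENS * in_rate + OUT_TOKENS * out_rate) / 1_000_000
    for host, (in_rate, out_rate) in HOSTS.items()
}
cheapest = min(costs, key=costs.get)

for host, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{host:<18} ${cost:.6f}  ({cost / costs[cheapest]:.1f}x)")
```

At these rates the direct API is about four times cheaper per call for this token mix, so a third-party host is worth it only if availability or latency matters more than price.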

Installation

# pipx (recommended: installs as isolated tool, works immediately)
pipx install git+https://github.com/benbencodes/llm-prices

# Homebrew (macOS/Linux)
brew tap benbencodes/tap && brew install llm-prices

# pip
pip install llm-prices   # PyPI publish in progress

No API key needed. No network calls at runtime. Works offline.

Use as a Python library

from llm_prices import calculate_cost, MODELS

# Calculate exact cost for a GPT-5.4 call
result = calculate_cost("gpt-5.4", input_tokens=10_000, output_tokens=3_000)
print(f"${result['total_cost_usd']:.4f}")  # $0.0700

# Find all models under $0.30/Mtok input price
cheap = {name: info for name, info in MODELS.items() if info["input_per_mtok"] < 0.30}

# Compare o3 vs gpt-4.1 (same price, different capability)
for model in ["o3", "gpt-4.1"]:
    m = MODELS[model]
    print(f'{model}: ${m["input_per_mtok"]}/${m["output_per_mtok"]} per Mtok')

Contributing

The pricing data lives in a single Python file (llm_prices/data.py) — a dict of model → prices. If a price is wrong or a new model launched, open a PR with your source cited.

GitHub: benbencodes/llm-prices

Support this project

Chain	Address
SOL	kbghHYeBXr2AcYUyvkofHa9sArgkJcKBC6zZhSdao82
Base / ETH / EVM	0x310eEb225245D5A3e1773C5Def30Fe5d0289A1b3
LTC	ltc1q9fwegmfey7njksnmw8p787cz87l2lpf5372p2w
DOGE	DCHKeC2QQQSFVTA49gK44D1bfyv8QSnZyX
BTC	bc1qv0ny3c97lk80qv5v79f52w3hyaqq2ss0zdqp52
TRX / USDT-TRC20	TFaN8RPkgFkWjL5XHfJKRzyDQp2ECskQtH
