The problem
Every LLM provider prices their API differently, and prices change fast. Just in 2026:
- o3 dropped from $10 to $2/Mtok input, a 5× price reduction. It's now the same price as gpt-4.1 but with far stronger reasoning. If you wrote it off based on the old price, re-run your estimates.
- GPT-5.5 launched at $5/$30 per Mtok with a 1M token context window. GPT-5.4-nano at $0.20/$1.25 is one of the cheapest capable models available.
- Grok 4.3 (xAI) at $1.25/$2.50 per Mtok with 1M context: output is 6× cheaper than Claude Sonnet at the same tier
- Claude Opus 4.7 repriced: $5/$25 per Mtok (was commonly listed at $15/$75, the old Opus 3 price) with 1M context. A very different value proposition.
- Gemini 2.5 Flash is $0.30/$2.50 (output reflects thinking tokens), often mispriced in third-party tools at the old $0.15/$0.60
- The same open-weight models run on Groq, Together, Fireworks, SambaNova, and Cerebras at very different prices; the gap is often 2-5×
There are now 44 major providers and 273 models to track. Comparing them means visiting 44 different pricing pages, doing your own math for your token counts, and updating your spreadsheet every time something changes.
The solution: llm-prices
llm-prices is a zero-dependency Python CLI that bakes in pricing data for all major providers and does the math locally. No API key, no network calls at runtime, stdlib only.
Here's a walkthrough of the six main commands.
1. List: browse all 273 models
Filter by provider or search by name:
Filter and sort
Or export CSV for spreadsheet analysis:
2. Calc: what will this specific call cost?
Use your app's actual token counts to get a concrete cost estimate. JSON output for scripting:
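The arithmetic behind a per-call estimate is simple enough to sketch in a few lines. The snippet below is an illustration only, not llm-prices' own code: the function name `call_cost` and the JSON field names are assumptions, and the Grok 4.3 prices are the ones quoted earlier in this post.

```python
import json


def call_cost(input_price, output_price, input_tokens, output_tokens):
    """Cost in USD of one call, given prices in USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Grok 4.3 at $1.25 / $2.50 per Mtok, for a 1,500-in / 300-out call.
# The field names below are hypothetical, not the CLI's actual JSON schema.
estimate = {
    "model": "grok-4.3",
    "input_tokens": 1500,
    "output_tokens": 300,
    "cost_usd": round(call_cost(1.25, 2.50, 1500, 300), 6),
}
print(json.dumps(estimate, indent=2))
```

Input and output tokens are priced separately, so a workload with long outputs can rank models very differently from one with long prompts.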
3. Compare: side-by-side across providers
Which model gives you the best value for a typical agent call (5,000 input / 1,000 output tokens)? Here's how the 2026 frontier models stack up:
For open-weight models via inference providers, the gap is even larger:
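As a rough sketch of what a side-by-side comparison computes, the snippet below prices the 5,000 / 1,000 token agent call for four models using the per-Mtok figures quoted in this post. The `PRICES` dict, the `compare` helper, and the `claude-opus-4.7` key are illustrative assumptions, not the tool's internals.

```python
# USD per million tokens as (input, output), taken from the figures in this post.
# The dict layout and the "claude-opus-4.7" key are assumptions for illustration.
PRICES = {
    "o3": (2.00, 8.00),
    "gpt-5.5": (5.00, 30.00),
    "grok-4.3": (1.25, 2.50),
    "claude-opus-4.7": (5.00, 25.00),
}


def call_cost(prices, input_tokens, output_tokens):
    """Cost in USD of one call for an (input_price, output_price) pair."""
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def compare(models, input_tokens, output_tokens):
    """Return (model, cost, multiple_of_cheapest) rows, cheapest first."""
    costs = sorted((call_cost(PRICES[m], input_tokens, output_tokens), m) for m in models)
    cheapest = costs[0][0]
    return [(model, cost, cost / cheapest) for cost, model in costs]


rows = compare(PRICES, 5_000, 1_000)
for model, cost, multiple in rows:
    print(f"{model:<18} ${cost:.5f}  {multiple:4.1f}x")
```

Showing each cost as a multiple of the cheapest option makes the spread easier to read than raw dollar amounts.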
4. Budget: how many calls per dollar?
Given a budget and your typical token counts, which models can you run at scale?
A $1 budget buys you 8,000 Llama calls or 13 Claude Opus calls. That's the concrete number you need for cost planning.
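A calls-per-budget figure is a floor division of the budget by the per-call cost. The sketch below uses `fractions.Fraction` so that a budget that divides exactly is not rounded down by float error. The `calls_per_budget` helper is hypothetical, and the 10,000 / 1,000 token workload is an assumption (the workload behind the figures above is not stated), though it does yield 13 calls for Claude Opus 4.7 at $5/$25 per Mtok.

```python
from fractions import Fraction


def calls_per_budget(budget_usd, input_price, output_price, input_tokens, output_tokens):
    """Whole number of calls a budget covers; prices are USD per million tokens.

    Budget and prices are passed as strings so Fraction parses them exactly.
    """
    cost = (input_tokens * Fraction(input_price)
            + output_tokens * Fraction(output_price)) / 1_000_000
    return int(Fraction(budget_usd) // cost)


# Claude Opus 4.7 at $5 / $25 per Mtok; the token counts are an assumed workload.
print(calls_per_budget("1", "5", "25", 10_000, 1_000))
```

Plain floats would be fine for rough planning; exact arithmetic only matters when the budget lands on a whole number of calls.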
5. Top: find the cheapest models for your workload
Instead of running budget and mentally sorting, top N ranks every model by total cost for your exact token counts and returns the cheapest N:
Narrow it to a single provider, or paste the result into a doc with --markdown:
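Conceptually, a "top N" ranking is a partial sort by per-call cost. Here is a minimal sketch using `heapq.nsmallest` and a hand-rolled Markdown table. The `PRICES` subset, `top_n`, and `to_markdown` are illustrative names rather than the tool's API, and the prices are the ones quoted in this post.

```python
import heapq

# A small subset of models with USD-per-Mtok (input, output) prices from this post.
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "gpt-5.4-nano": (0.20, 1.25),
    "grok-4.3": (1.25, 2.50),
    "deepseek-v4-flash": (0.112, 0.224),
    "ling-2.6-flash": (0.01, 0.03),
    "lfm-2-24b": (0.03, 0.12),
}


def top_n(n, input_tokens, output_tokens, prices=PRICES):
    """Return the n cheapest (model, cost_per_call) pairs for this workload."""
    def cost(model):
        input_price, output_price = prices[model]
        return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    return [(model, cost(model)) for model in heapq.nsmallest(n, prices, key=cost)]


def to_markdown(rows):
    """Render (model, cost) rows as a Markdown table."""
    lines = ["| Model | Cost per call (USD) |", "| --- | --- |"]
    lines += [f"| {model} | {cost:.6f} |" for model, cost in rows]
    return "\n".join(lines)


rows = top_n(3, 1_500, 300)
print(to_markdown(rows))
```

Ranking by total cost for a specific workload matters because a model with a low input price and a high output price can move up or down the list as the input/output ratio changes.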
6. Providers: quick overview
Notable pricing stories in 2026:
Liquid AI LFM-2 24B (lfm-2-24b): $0.03/$0.12 per Mtok, one of the cheapest capable models in the dataset. Built on Liquid AI's non-transformer "liquid network" architecture (MIT spinout, founded 2023). This is the first alternative-architecture model in our dataset. Available via OpenRouter.
Inception AI Mercury 2 (mercury-2): $0.25/$0.75 per Mtok, 128k context. Uses a diffusion-based language model architecture rather than autoregressive next-token prediction. Founded 2024. A second alternative architecture now in the dataset.
Gemini 3.5 Flash (gemini-3.5-flash): $0.75/$4.50 per Mtok, 1M context. Launched at Google I/O 2026. Output price includes thinking tokens (hybrid reasoning model). 2.5× more expensive on input than Gemini 2.5 Flash but with improved quality.
Qwen3.6 series: new generation from Alibaba. qwen3.6-flash at $0.19/$1.13 per Mtok with 1M context window stands out; qwen3.6-27b is the dense flagship. Qwen3.5 397B MoE at $0.39/$2.34 is the largest Qwen model in the dataset.
o3 repriced 5× (o3): dropped from $10/$40 to $2/$8 per Mtok. It's now priced identically to gpt-4.1 but with much stronger multi-step reasoning. Previously one of the most expensive models; now a reasonable choice for complex tasks.
Grok 4.3 (grok-4.3): $1.25 input / $2.50 output per Mtok with a 1M context window. The output price is 6× cheaper than Claude Sonnet 4.6 ($15) at comparable capability claims. The 2M-context grok-4.20-reasoning variant is at the same price and has the largest context window in the dataset.
GPT-5 nano tier: gpt-5-nano at $0.05/$0.40 per Mtok is the cheapest GPT-5-family model. That's competitive with Groq's Llama 3.1 8B on input price while potentially offering better capability on instruction-following and structured output tasks.
New in v0.1.42: DeepSeek V4, Qwen3.6, Mistral Medium 3.5, Perceptron
DeepSeek V4 Pro: deepseek-v4-pro at $0.435/$0.87 per Mtok with a 1M context window. This is DeepSeek's latest flagship, a 1.6 trillion total parameter MoE model (active subset per inference). V4 continues the pattern of each DeepSeek generation delivering a major capability leap at the same or lower price. The efficient tier, deepseek-v4-flash, runs a 284B-parameter MoE at $0.112/$0.224, among the cheapest capable models in the dataset.
Qwen3.6 series: Four new models from Alibaba's Qwen3.6 generation. qwen3.6-flash at $0.1875/$1.125 with a 1M context window supports text, image, video, and audio input, making it the most capable cheap multimodal model in the dataset. qwen3.6-27b (dense 27B, $0.32/$3.2) and qwen3.6-35b-moe (35B total / 3B active, $0.15/$1.0) fill the mid-tier. The qwen3.6-max-preview at $1.04/$6.24 is the sparse MoE frontier preview. Qwen now has 33 models in the dataset, more than any other single provider.
Mistral Medium 3.5: mistral-medium-3-5 at $1.5/$7.5 per Mtok, 262k context. An update to Medium 3 with text+image input support. Dense 128B architecture. Mistral now has 23 models in the dataset, spanning $0.02/Mtok (Mistral Nemo/Small) to $7.5/Mtok (Medium 3.5).
Moonshot Kimi K2.6: kimi-k2.6 at $0.73/$3.49, 262k ctx. Next-generation multimodal model from Moonshot AI, designed for long-horizon coding and agentic tasks. A direct competitor to Claude Sonnet with an output price roughly 4.3× lower ($3.49 vs. $15 per Mtok).
Perceptron Mk1 (new provider #44): perceptron-mk1 at $0.15/$1.5, 32k context. Perceptron AI specializes in video understanding and embodied AI reasoning: not a general-purpose LLM, but a specialist for robotics and video-analysis workflows. A new category in the dataset alongside Morph's code-editing specialization.
New in v0.1.39: 8 more providers, now 43 total
InclusionAI Ling 2.6 (ling-2.6-flash): $0.01/$0.03 per Mtok, the cheapest input price in the dataset. The full ling-2.6-1t model has 1 trillion parameters (MoE architecture, active subset per token), with a 262k context window at $0.30/$2.50. The Flash tier at $0.01/Mtok makes it effectively free for most workloads. Provider #40.
Xiaomi MiMo V2.5 (mimo-v2.5): $0.40/$2.00 per Mtok with a 1M token context window. Xiaomi (the consumer electronics giant behind Mi phones) open-sourced their MiMo reasoning model series, claiming MiMo V2.5 outperforms GPT-4o on mathematical reasoning benchmarks. The Flash variant at $0.10/$0.30 is significantly cheaper. Provider #36.
Tencent Hunyuan A13B (hunyuan-a13b): $0.14/$0.57 per Mtok, 131k context. Tencent's HunyuanLLM with 13B active parameters (MoE architecture, 80B total). The preview HY3 model at $0.066/$0.26 per Mtok is one of the cheapest Chinese-origin flagship models. Provider #38.
Z.AI GLM 5 series: The GLM 5.x generation models at 202k context: glm-4.7-flash at $0.06/$0.40, glm-5 at $0.60/$1.92. Note: Z.AI is a separate entity from ZhipuAI on OpenRouter, so the dataset now covers both generations (ZhipuAI GLM 4.x and Z.AI GLM 5.x). Provider #37.
Arcee AI Trinity series: US-based startup specializing in model merging and enterprise fine-tuning. arcee-trinity-mini at $0.045/$0.15 is their budget tier; arcee-maestro at $0.90/$3.30 is their reasoning flagship. 262k context on the larger variants. Provider #39.
Morph code editing models: A new category in the dataset. Morph is not a general-purpose LLM provider but a specialized API for applying code diffs in agentic workflows. morph-v3-large at $0.90/$1.90, 262k context. Purpose-built for Claude Code / Cursor-style code application tasks. Provider #43.
Real-world example: planning a consumer AI app
You're building an app that processes user questions. Each request is ~1,500 input tokens (system prompt + message) and ~300 output tokens.
On $100 (1,500 in / 300 out per call):
- Groq Llama 3.1 8B: ~606,000 calls ($0.000165/call)
- gpt-5-nano: ~513,000 calls ($0.000195/call)
- gpt-5.4-nano: ~148,000 calls ($0.000675/call)
- grok-4.3: ~38,000 calls ($0.002625/call)
- gpt-5.4: ~12,000 calls ($0.008250/call)
- Claude Opus 4.7: ~6,700 calls ($0.015000/call)
- gpt-5.5: ~6,100 calls ($0.016500/call)
The gap between the cheapest and the most expensive model on this list is about 100×. For a consumer app, model selection can mean the difference between profitable and unprofitable at any scale.
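The per-call costs in this list follow directly from the per-Mtok prices quoted earlier in the post. The sketch below recomputes them for the five models whose prices are stated and scales each to one million requests. The `plan` helper, its return shape, and the `claude-opus-4.7` key are assumptions for illustration.

```python
# USD per million tokens as (input, output), as quoted earlier in this post.
# Only models whose per-Mtok prices are stated in the text are included.
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "gpt-5.4-nano": (0.20, 1.25),
    "grok-4.3": (1.25, 2.50),
    "claude-opus-4.7": (5.00, 25.00),  # key name is illustrative
    "gpt-5.5": (5.00, 30.00),
}
BUDGET_USD = 100
INPUT_TOKENS, OUTPUT_TOKENS = 1_500, 300


def plan(prices, budget, input_tokens, output_tokens):
    """Map each model to (cost per call in USD, whole calls the budget covers)."""
    result = {}
    for model, (input_price, output_price) in prices.items():
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        result[model] = (cost, int(budget / cost))
    return result


estimates = plan(PRICES, BUDGET_USD, INPUT_TOKENS, OUTPUT_TOKENS)
for model, (cost, calls) in sorted(estimates.items(), key=lambda item: item[1][0]):
    per_million_requests = cost * 1_000_000
    print(f"{model:<18} ${cost:.6f}/call  {calls:>8,} calls  ${per_million_requests:>9,.0f} per 1M requests")
```

Scaling to a fixed request volume is often the more useful view for a consumer app, since traffic, not budget, is usually the given.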
Comparing the same model across inference providers
One often-overlooked optimization: the same open-weight model runs on multiple inference hosts at different prices.
This compares DeepSeek R1 on DeepSeek's direct API ($0.55/$2.19 per Mtok), Together AI ($3.00/$7.00), and Fireworks ($3.00/$7.00). Pick the cheapest option or the one with better availability for your region.
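To make the host comparison concrete, here is a small sketch that ranks the three DeepSeek R1 prices above for one workload and groups hosts that tie. The `rank_hosts` helper, the host keys, and the 5,000 / 1,000 token workload are assumptions chosen for illustration.

```python
from collections import defaultdict

# DeepSeek R1, USD per million tokens as (input, output), as quoted above.
HOSTS = {
    "deepseek": (0.55, 2.19),
    "together": (3.00, 7.00),
    "fireworks": (3.00, 7.00),
}


def rank_hosts(hosts, input_tokens, output_tokens):
    """Return [(cost_per_call, [hosts at that cost])], cheapest first."""
    by_cost = defaultdict(list)
    for host, (input_price, output_price) in hosts.items():
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        by_cost[cost].append(host)
    return [(cost, sorted(names)) for cost, names in sorted(by_cost.items())]


ranking = rank_hosts(HOSTS, 5_000, 1_000)
for cost, names in ranking:
    print(f"${cost:.5f}/call  {', '.join(names)}")
```

Grouping ties keeps the output honest: when two hosts charge the same, the choice comes down to latency, rate limits, or regional availability rather than price.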
Installation
No API key needed. No network calls at runtime. Works offline.
Use as a Python library
Contributing
The pricing data lives in a single Python file (llm_prices/data.py), a dict of model → prices. If a price is wrong or a new model launched, open a PR with your source cited.
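Before opening a PR, a quick sanity check on the entries you touched can catch typos. The sketch below assumes a hypothetical entry shape (`{"input": ..., "output": ...}` in USD per Mtok); the actual schema in llm_prices/data.py may differ, so adapt the field names to what the file really uses.

```python
# Hypothetical shape only: the real schema in llm_prices/data.py may differ.
EXAMPLE_PRICES = {
    "o3": {"input": 2.00, "output": 8.00},
    "grok-4.3": {"input": 1.25, "output": 2.50},
}


def validate(prices):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for model, entry in prices.items():
        for field in ("input", "output"):
            value = entry.get(field)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{model}: {field} price missing or non-numeric")
            elif value < 0:
                problems.append(f"{model}: {field} price is negative")
    return problems


print(validate(EXAMPLE_PRICES))
```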