The problem

Every LLM provider prices its API differently, and prices change fast.

There are now 44 major providers and 273 models to track. Comparing them means visiting 44 different pricing pages, doing your own math for your token counts, and updating your spreadsheet every time something changes.

The solution: llm-prices

llm-prices is a zero-dependency Python CLI that bakes in pricing data for all major providers and does the math locally. No API key, no network calls at runtime, stdlib only.

    # install in seconds
    pip install llm-prices
    # or:
    pipx install git+https://github.com/benbencodes/llm-prices

Here's a walkthrough of the six main commands.


1. List: browse all 273 models

    $ llm-prices list
    Prices as of 2026-05-09. Verify at provider's pricing page.

    Model                   Provider   Input/Mtok  Output/Mtok  Context  Notes
    ------------------------------------------------------------------------------------------
    claude-opus-4-7         Anthropic  $  5.0000   $ 25.0000    1000k    Most capable Claude; 1M context
    claude-sonnet-4-6       Anthropic  $  3.0000   $ 15.0000    1000k    Best speed/intelligence balance
    gemini-3.1-pro-preview  Google     $  2.0000   $ 12.0000    1024k    Gemini 3 flagship preview
    gemini-2.5-flash        Google     $  0.3000   $  2.5000    1024k    Hybrid reasoning; 1M context
    gpt-5.5                 OpenAI     $  5.0000   $ 30.0000    1025k    OpenAI flagship; 1M context
    gpt-5.4                 OpenAI     $  2.5000   $ 15.0000    1025k    More affordable GPT-5; 1M context
    gpt-5.4-mini            OpenAI     $  0.7500   $  4.5000     266k    Strong mini model
    gpt-5.4-nano            OpenAI     $  0.2000   $  1.2500     266k    Cheapest GPT-5.4
    gpt-5-nano              OpenAI     $  0.0500   $  0.4000     266k    Ultra-cheap GPT-5
    gpt-4.1-nano            OpenAI     $  0.1000   $  0.4000    1023k    Fastest, cheapest GPT-4.1
    ...
    grok-4.3                xAI        $  1.2500   $  2.5000    1000k    Grok flagship; 1M ctx
    grok-4.20-reasoning     xAI        $  1.2500   $  2.5000    1953k    Grok 4 reasoning; 2M ctx
    ...
    232 model(s) shown.

Filter by provider or search by name:

    $ llm-prices list --provider Together
    $ llm-prices list --search sonar
    $ llm-prices list --sort input   # cheapest first

Filter and sort

    $ llm-prices list --provider OpenAI --sort input   # cheapest first
    $ llm-prices list --search grok
    $ llm-prices list --provider xAI --markdown

Or export CSV for spreadsheet analysis:

llm-prices list --csv > llm_pricing.csv

2. Calc: what will this specific call cost?

    $ llm-prices calc gpt-4o --in 10000 --out 2000
    Model  : gpt-4o (OpenAI)
    Tokens : 10,000 in / 2,000 out
    Rate   : $2.5/Mtok in, $10.0/Mtok out
    Cost   : $0.0250 in + $0.0200 out = $0.0450 total

Use your app's actual token counts to get a concrete cost estimate. JSON output for scripting:

    $ llm-prices calc claude-sonnet-4-6 --in 5000 --out 1000 --json
    {
      "model": "claude-sonnet-4-6",
      "provider": "Anthropic",
      "input_tokens": 5000,
      "output_tokens": 1000,
      "input_cost_usd": 0.015,
      "output_cost_usd": 0.015,
      "total_cost_usd": 0.03
    }
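The arithmetic behind these numbers is simple enough to check by hand. Here is a minimal sketch of it, assuming prices are quoted per million tokens as in the output above. The function name `call_cost` is hypothetical and is not the tool's own implementation; the returned keys simply mirror the JSON fields shown.

```python
def call_cost(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Cost of one call, given per-million-token (Mtok) rates in USD."""
    input_cost = input_tokens * input_per_mtok / 1_000_000
    output_cost = output_tokens * output_per_mtok / 1_000_000
    return {
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "total_cost_usd": input_cost + output_cost,
    }


# claude-sonnet-4-6 at $3 in / $15 out per Mtok, 5,000 in / 1,000 out
sonnet = call_cost(5_000, 1_000, 3.0, 15.0)
print(f"${sonnet['total_cost_usd']:.4f}")
```

With the rates from the article, this should reproduce the totals in both sample outputs, up to floating-point rounding.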

3. Compare: side-by-side across providers

Which model gives you the best value for a typical agent call (5,000 input / 1,000 output tokens)? Here's how the 2026 frontier models stack up:

    $ llm-prices compare gpt-5.5 gpt-5.4 grok-4.3 claude-sonnet-4-6 \
        claude-opus-4-7 o3 gemini-3.1-pro-preview --in 5000 --out 1000
    Comparison: 5,000 input tokens, 1,000 output tokens

    Model                   Provider   Input      Output     Total
    ------------------------------------------------------------------------
    grok-4.3                xAI        $0.006250  $0.002500  $0.008750
    o3                      OpenAI     $0.010000  $0.008000  $0.018000  (2.1x)
    gemini-3.1-pro-preview  Google     $0.010000  $0.012000  $0.022000  (2.5x)
    gpt-5.4                 OpenAI     $0.012500  $0.015000  $0.027500  (3.1x)
    claude-sonnet-4-6       Anthropic  $0.015000  $0.015000  $0.030000  (3.4x)
    claude-opus-4-7         Anthropic  $0.025000  $0.025000  $0.050000  (5.7x)
    gpt-5.5                 OpenAI     $0.025000  $0.030000  $0.055000  (6.3x)

    Cheapest: grok-4.3 at $0.008750

Grok 4.3 is the cheapest frontier model for this workload: $0.0088 versus $0.055 for GPT-5.5. For API-heavy applications, that roughly 6× gap in total cost (12× on output tokens alone) can determine profitability. But the right model depends on quality, latency, and feature support for your specific task.
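The multipliers in parentheses are each model's total divided by the cheapest total. Here is a rough sketch of that ranking, assuming the per-Mtok rates quoted elsewhere in this article. The `RATES` table and the `compare` function are illustrative stand-ins, not the tool's internals.

```python
# (input $/Mtok, output $/Mtok), copied from the prices quoted in this article.
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "grok-4.3": (1.25, 2.50),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (5.00, 25.00),
    "o3": (2.00, 8.00),
    "gemini-3.1-pro-preview": (2.00, 12.00),
}


def compare(models, input_tokens, output_tokens):
    """Return (model, total_usd, ratio_to_cheapest) rows, cheapest first."""
    totals = []
    for name in models:
        in_rate, out_rate = RATES[name]
        total = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        totals.append((name, total))
    totals.sort(key=lambda row: row[1])
    cheapest = totals[0][1]
    return [(name, total, total / cheapest) for name, total in totals]


for name, total, ratio in compare(RATES, 5_000, 1_000):
    print(f"{name:<24} ${total:.6f}  ({ratio:.1f}x)")
```

Dividing by the cheapest total rather than comparing rates directly keeps the ratio tied to your actual input/output mix, which is why the gap on total cost differs from the gap on output price.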

For open-weight models via inference providers, the gap is even larger:

    $ llm-prices compare qwen3-235b deepseek-chat grok-4.3 gpt-5-nano --in 5000 --out 1000
    qwen3-235b     Together  $0.001000  $0.000600  $0.001600
    deepseek-chat  DeepSeek  $0.001350  $0.001100  $0.002450
    gpt-5-nano     OpenAI    $0.000250  $0.000400  $0.000650
    grok-4.3       xAI       $0.006250  $0.002500  $0.008750

gpt-5-nano at $0.0007 total for 5k/1k tokens is remarkable: GPT-5 capability tier at sub-cent prices. It's worth evaluating for classification, extraction, and structured output tasks where you don't need full flagship reasoning.

4. Budget: how many calls per dollar?

Given a budget and your typical token counts, which models can you run at scale?

    $ llm-prices budget 1.00 --in 2000 --out 500
    Budget: $1.0000 | Tokens per call: 2,000 in / 500 out

    Model             Provider   Cost/call   Calls
    --------------------------------------------------------------
    llama-3.1-8b      Groq       $0.0001250  8,000
    gemini-2.0-flash  Google     $0.0002500  4,000
    qwen3.5-9b        Together   $0.0002750  3,636
    gpt-4.1-nano      OpenAI     $0.0004000  2,500
    ...
    gpt-4o            OpenAI     $0.0100000    100
    claude-opus-4-7   Anthropic  $0.0750000     13

A $1 budget buys you 8,000 Llama calls or 13 Claude Opus calls. That's the concrete number you need for cost planning.
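The call count is the budget divided by the per-call cost, rounded down. Below is a small sketch of that calculation using `decimal.Decimal`, so that exact cases like $1.00 / $0.01 do not fall one call short through binary floating-point error. The function name `calls_for_budget` is hypothetical, and whether the tool itself uses `Decimal` is not stated in this article.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def calls_for_budget(budget_usd, input_tokens, output_tokens,
                     input_per_mtok, output_per_mtok):
    """Whole number of calls a budget covers; pass money values as strings."""
    cost_per_call = (
        Decimal(input_tokens) * Decimal(input_per_mtok)
        + Decimal(output_tokens) * Decimal(output_per_mtok)
    ) / MTOK
    return int(Decimal(budget_usd) // cost_per_call)


# gpt-4o at $2.5 in / $10.0 out per Mtok, 2,000 in / 500 out per call
print(calls_for_budget("1.00", 2_000, 500, "2.5", "10.0"))
```

Passing prices as strings avoids introducing float error before the division happens.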


5. Top: find the cheapest models for your workload

Instead of running budget and mentally sorting, top N ranks every model by total cost for your exact token counts and returns the cheapest N:

    $ llm-prices top 5 --in 5000 --out 1000
    Top 5 cheapest: 5,000 input / 1,000 output tokens

    #  Model                Provider  Input      Output     Total
    ----------------------------------------------------------------------
    1  llama-3.1-8b         Groq      $0.000250  $0.000080  $0.000330
    2  gemini-1.5-flash-8b  Google    $0.000188  $0.000150  $0.000338
    3  command-r7b          Cohere    $0.000188  $0.000150  $0.000338
    4  qwen3.5-9b           Together  $0.000500  $0.000150  $0.000650
    5  gemini-1.5-flash     Google    $0.000375  $0.000300  $0.000675

Narrow it to a single provider, or paste the result into a doc with --markdown:

    llm-prices top 3 --provider Anthropic --in 2000 --out 800
    llm-prices top 10 --in 5000 --out 1000 --markdown
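Conceptually, a top-N query is a partial sort on total cost with an optional provider filter. Here is one possible sketch using `heapq.nsmallest` from the standard library. The `SAMPLE_RATES` table is back-computed from the sample output above and the `top_n` function is hypothetical, so treat both as illustrative rather than as the tool's code.

```python
import heapq

# model -> (provider, input $/Mtok, output $/Mtok), derived from the sample output.
SAMPLE_RATES = {
    "llama-3.1-8b": ("Groq", 0.05, 0.08),
    "gemini-1.5-flash-8b": ("Google", 0.0375, 0.15),
    "command-r7b": ("Cohere", 0.0375, 0.15),
    "qwen3.5-9b": ("Together", 0.10, 0.15),
    "gemini-1.5-flash": ("Google", 0.075, 0.30),
}


def top_n(n, input_tokens, output_tokens, provider=None):
    """Return the n cheapest (model, total_usd) pairs for a given workload."""
    totals = [
        (name, (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000)
        for name, (prov, in_rate, out_rate) in SAMPLE_RATES.items()
        if provider is None or prov == provider
    ]
    return heapq.nsmallest(n, totals, key=lambda row: row[1])


for name, total in top_n(3, 5_000, 1_000):
    print(f"{name:<22} ${total:.6f}")
```

For a dataset of a few hundred models a full sort would be just as fast; `nsmallest` mainly makes the intent explicit.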

6. Providers: quick overview

    $ llm-prices providers
    273 models across 44 providers (data: 2026-05-20)

    Provider      Models  Min Input/Mtok  Min Output/Mtok  Max Context
    ------------------------------------------------------------------
    AI21               2  $0.2000         $0.4000           256k
    Anthropic         13  $0.2500         $1.2500          1000k
    Arcee AI           3  $0.0450         $0.1500           262k
    Baidu              4  $0.0700         $0.2800           131k
    Bedrock            5  $0.0350         $0.1400          1000k
    ByteDance          3  $0.0700         $0.3000           262k
    Cerebras           3  $0.1000         $0.1000           128k
    Cohere             4  $0.0375         $0.1500           256k
    Crusoe             5  $0.2000         $0.2000           262k
    DeepInfra          4  $0.0800         $0.3000          1048k
    DeepSeek           9  $0.1120         $0.2240          1048k
    Fireworks          6  $0.2000         $0.2000           512k
    Google            20  $0.0375         $0.0800          2097k
    Groq              11  $0.0500         $0.0800           262k
    Hyperbolic         5  $0.1200         $0.2500           131k
    IBM                2  $0.0200         $0.1000           131k
    InclusionAI        3  $0.0100         $0.0300           262k
    Inception AI       1  $0.2500         $0.7500           128k
    Inflection         2  $2.5000         $10.0000            8k
    Lambda             4  $0.0500         $0.1000           131k
    Liquid AI          1  $0.0300         $0.1200           128k
    Microsoft          4  $0.0700         $0.1400           131k
    MiniMax            5  $0.1500         $0.9500          1000k
    Mistral           23  $0.0200         $0.0300           262k
    Moonshot           6  $0.2000         $2.0000           262k
    Morph              2  $0.8000         $1.2000           262k
    NVIDIA             4  $0.0400         $0.1600           262k
    Nebius             5  $0.0200         $0.0600           262k
    NousResearch       3  $0.1300         $0.4000           131k
    Novita             4  $0.1800         $0.5900          1048k
    OpenAI            34  $0.0500         $0.3000          1050k
    Perceptron         1  $0.1500         $1.5000            32k
    Perplexity         4  $1.0000         $1.0000           200k
    Qwen              33  $0.0300         $0.1000          1000k
    Reka               1  $0.1000         $0.2000            65k
    SambaNova          5  $0.2000         $0.3500           163k
    StepFun            1  $0.1000         $0.3000           262k
    Tencent            2  $0.0660         $0.2600           262k
    Together           6  $0.1000         $0.1500           512k
    Writer             1  $0.6000         $6.0000          1040k
    Xiaomi             3  $0.1000         $0.3000          1048k
    Z.AI               4  $0.0600         $0.4000           202k
    ZhipuAI            2  $0.1000         $0.1000           131k
    xAI               10  $0.2000         $0.5000          2000k

Notable pricing stories in 2026:

Liquid AI LFM-2 24B (lfm-2-24b): $0.03/$0.12 per Mtok, one of the cheapest capable models in the dataset. Built on Liquid AI's non-transformer "liquid network" architecture (MIT spinout, founded 2023). This is the first alternative-architecture model in our dataset. Available via OpenRouter.

Inception AI Mercury 2 (mercury-2): $0.25/$0.75 per Mtok, 128k context. Uses a diffusion-based language model architecture rather than autoregressive next-token prediction. Founded 2024. A second alternative architecture now in the dataset.

Gemini 3.5 Flash (gemini-3.5-flash): $0.75/$4.50 per Mtok, 1M context. Launched at Google I/O 2026. Output price includes thinking tokens (hybrid reasoning model). 2.5× more expensive on input than Gemini 2.5 Flash but with improved quality.

Qwen3.6 series: new generation from Alibaba. qwen3.6-flash at $0.19/$1.13 per Mtok with 1M context window stands out; qwen3.6-27b is the dense flagship. Qwen3.5 397B MoE at $0.39/$2.34 is the largest Qwen model in the dataset.

o3 price cut 5× (o3): dropped from $10/$40 to $2/$8 per Mtok. It's now priced identically to gpt-4.1 but with much stronger multi-step reasoning. Previously one of the most expensive models; now a reasonable choice for complex tasks.

Grok 4.3 (grok-4.3): $1.25 input / $2.50 output per Mtok with a 1M context window. The output price is 6× cheaper than Claude Sonnet 4.6 ($15) at comparable capability claims. The 2M-context grok-4.20-reasoning variant is at the same price, with one of the largest context windows in the dataset.

GPT-5 nano tier: gpt-5-nano at $0.05/$0.40 per Mtok is the cheapest GPT-5-family model. That's competitive with Groq's Llama 3.1 8B on input price while potentially offering better capability on instruction-following and structured output tasks.

New in v0.1.42: DeepSeek V4, Qwen3.6, Mistral Medium 3.5, Perceptron

DeepSeek V4 Pro: deepseek-v4-pro at $0.435/$0.87 per Mtok with a 1M context window. This is DeepSeek's latest flagship, a 1.6 trillion total parameter MoE model (active subset per inference). V4 continues the pattern of each DeepSeek generation delivering a major capability leap at the same or lower price. The efficient tier, deepseek-v4-flash, runs a 284B-parameter MoE at $0.112/$0.224, among the cheapest capable models in the dataset.

Qwen3.6 series: Four new models from Alibaba's Qwen3.6 generation. qwen3.6-flash at $0.1875/$1.125 with a 1M context window supports text, image, video, and audio input, making it the most capable cheap multimodal model in the dataset. qwen3.6-27b (dense 27B, $0.32/$3.2) and qwen3.6-35b-moe (35B total / 3B active, $0.15/$1.0) fill the mid-tier. The qwen3.6-max-preview at $1.04/$6.24 is the sparse MoE frontier preview. Qwen now has 33 models in the dataset, second only to OpenAI's 34.

Mistral Medium 3.5: mistral-medium-3-5 at $1.5/$7.5 per Mtok, 262k context. An update to Medium 3 with text+image input support. Dense 128B architecture. Mistral now has 23 models in the dataset, spanning $0.02/Mtok (Mistral Nemo/Small) to $7.5/Mtok (Medium 3.5).

Moonshot Kimi K2.6: kimi-k2.6 at $0.73/$3.49, 262k ctx. Next-generation multimodal model from Moonshot AI, designed for long-horizon coding and agentic tasks. A direct competitor to Claude Sonnet at an output price roughly 4× lower ($3.49 vs. $15 per Mtok).

Perceptron Mk1 (new provider #44): perceptron-mk1 at $0.15/$1.5, 32k context. Perceptron AI specializes in video understanding and embodied AI reasoning. It is not a general-purpose LLM, but a specialist for robotics and video-analysis workflows. A new category in the dataset alongside Morph's code-editing specialization.

New in v0.1.39: 8 more providers, now 43 total

InclusionAI Ling 2.6 (ling-2.6-flash): $0.01/$0.03 per Mtok, the cheapest input price in the dataset. The full ling-2.6-1t model has 1 trillion parameters (MoE architecture, active subset per token), with a 262k context window at $0.30/$2.50. The Flash tier at $0.01/Mtok makes it effectively free for most workloads. Provider #40.

Xiaomi MiMo V2.5 (mimo-v2.5): $0.40/$2.00 per Mtok with a 1M token context window. Xiaomi (the consumer electronics giant behind Mi phones) open-sourced their MiMo reasoning model series, claiming MiMo V2.5 outperforms GPT-4o on mathematical reasoning benchmarks. The Flash variant at $0.10/$0.30 is significantly cheaper. Provider #36.

Tencent Hunyuan A13B (hunyuan-a13b): $0.14/$0.57 per Mtok, 131k context. Tencent's HunyuanLLM with 13B active parameters (MoE architecture, 80B total). The preview HY3 model at $0.066/$0.26 per Mtok is one of the cheapest Chinese-origin flagship models. Provider #38.

Z.AI GLM 5 series: The GLM 5.x generation models at 202k context: glm-4.7-flash at $0.06/$0.40, glm-5 at $0.60/$1.92. Note: Z.AI is a separate entity from ZhipuAI on OpenRouter, so the dataset now covers both generations (ZhipuAI GLM 4.x and Z.AI GLM 5.x). Provider #37.

Arcee AI Trinity series: US-based startup specializing in model merging and enterprise fine-tuning. arcee-trinity-mini at $0.045/$0.15 is their budget tier; arcee-maestro at $0.90/$3.30 is their reasoning flagship. 262k context on the larger variants. Provider #39.

Morph code editing models: A new category in the dataset. Morph is not a general-purpose LLM provider but a specialized API for applying code diffs in agentic workflows. morph-v3-large at $0.90/$1.90, 262k context. Purpose-built for Claude Code / Cursor-style code application tasks. Provider #43.


Real-world example: planning a consumer AI app

You're building an app that processes user questions. Each request is ~1,500 input tokens (system prompt + message) and ~300 output tokens.

$ llm-prices budget 100.00 --in 1500 --out 300

On $100 (1,500 in / 300 out per call):

The gap between cheapest and most expensive frontier model is 250×. For a consumer app at scale, the right model selection can mean the difference between profitable and unprofitable at every scale tier.


Comparing the same model across inference providers

One often-overlooked optimization: the same open-weight model runs on multiple inference hosts at different prices.

    $ llm-prices compare deepseek-reasoner deepseek-r1-together \
        deepseek-r1-fw --in 5000 --out 2000

This compares DeepSeek R1 on DeepSeek's direct API ($0.55/$2.19 per Mtok), Together AI ($3.00/$7.00), and Fireworks ($3.00/$7.00). Pick the cheapest option or the one with better availability for your region.


Installation

    # pipx (recommended: installs as an isolated tool, works immediately)
    pipx install git+https://github.com/benbencodes/llm-prices

    # Homebrew (macOS/Linux)
    brew tap benbencodes/tap && brew install llm-prices

    # pip
    pip install llm-prices  # PyPI publish in progress

No API key needed. No network calls at runtime. Works offline.


Use as a Python library

    from llm_prices import calculate_cost, MODELS

    # Calculate exact cost for a GPT-5.4 call
    result = calculate_cost("gpt-5.4", input_tokens=10_000, output_tokens=3_000)
    print(f"${result['total_cost_usd']:.4f}")  # $0.0700

    # Find all models under $0.30/Mtok input price
    cheap = {name: info for name, info in MODELS.items() if info["input_per_mtok"] < 0.30}

    # Compare o3 vs gpt-4.1 (same price, different capability)
    for model in ["o3", "gpt-4.1"]:
        m = MODELS[model]
        print(f'{model}: ${m["input_per_mtok"]}/${m["output_per_mtok"]} per Mtok')

Contributing

The pricing data lives in a single Python file (llm_prices/data.py): a dict mapping each model to its prices. If a price is wrong or a new model launched, open a PR with your source cited.
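Before opening a PR, it can help to sanity-check a new entry. The sketch below assumes each entry is a dict carrying the `input_per_mtok` and `output_per_mtok` keys used in the library example above. The `check_entry` helper and the sample entry are hypothetical, and the real data.py may require additional fields.

```python
REQUIRED_PRICE_KEYS = ("input_per_mtok", "output_per_mtok")


def check_entry(name, info):
    """Return a list of problems found in one model entry (empty if none)."""
    problems = []
    for key in REQUIRED_PRICE_KEYS:
        value = info.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: {key} must be a number")
        elif value < 0:
            problems.append(f"{name}: {key} must not be negative")
    return problems


# Hypothetical entry using the gpt-5-nano prices quoted in this article.
new_entry = {"input_per_mtok": 0.05, "output_per_mtok": 0.40}
print(check_entry("gpt-5-nano", new_entry) or "looks fine")
```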

GitHub: benbencodes/llm-prices