The Cheapest LLM API in 2026 (Ranked)
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| gpt-oss 120B | OpenAI | $0.039 | $0.10 | 131K |
| Mistral Small 24B | Mistral | $0.05 | $0.08 | 33K |
| gpt-oss 20B | OpenAI | $0.03 | $0.14 | 131K |
| MiMo V2.5 | Xiaomi | $0.015 | $0.18 | 1000K |
| Gemma 4 31B | Google | $0.05 | $0.15 | 256K |
| Hunyuan HY3 Preview | Tencent | $0.03 | $0.30 | 256K |
| DeepSeek V4 Flash | DeepSeek | $0.07 | $0.27 | 1000K |
| NVIDIA Nemotron 3 Super | NVIDIA | $0.07 | $0.28 | 1000K |
| GPT-5.4 nano (xhigh) | OpenAI | $0.05 | $0.40 | 400K |
| Llama 4 Scout 17B | Meta | $0.11 | $0.34 | 10000K |
| Qwen3.6 35B A3B | Alibaba | $0.10 | $0.37 | 262K |
| MiMo V2.5 Pro | Xiaomi | $0.04 | $0.58 | 1000K |
| Qwen3.6 Plus | Alibaba | $0.12 | $0.48 | 1000K |
| MiniMax M2.7 | MiniMax | $0.05 | $0.70 | 205K |
| Llama 3.3 70B Instruct | Meta | $0.23 | $0.40 | 131K |
How we rank "cheapest"
Raw input price alone is misleading: most real workloads generate 20–40% as many output tokens as input. We rank by a blended price weighted 70% input and 30% output, which approximates what production chat, RAG, and agent workloads actually consume. If your workload is output-heavy (long-form generation, code), sort by output price instead on the comparison tool.
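The blend can be reproduced in a few lines. The sketch below is an illustration rather than the ranking code behind this page: the `Price` class and the `blended_price` and `rank` helpers are hypothetical names, and the four rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def blended_price(p: Price, input_share: float = 0.7) -> float:
    """Weighted USD price per 1M tokens for a given input/output mix."""
    return input_share * p.input_per_m + (1.0 - input_share) * p.output_per_m


def rank(prices, input_share: float = 0.7):
    """Sort cheapest first by blended price."""
    return sorted(prices, key=lambda p: blended_price(p, input_share))


# A few rows copied from the table above, deliberately out of order.
PRICES = [
    Price("Llama 3.3 70B Instruct", 0.23, 0.40),
    Price("MiMo V2.5", 0.015, 0.18),
    Price("gpt-oss 120B", 0.039, 0.10),
    Price("Mistral Small 24B", 0.05, 0.08),
]

for p in rank(PRICES):
    print(f"{p.name:24s} ${blended_price(p):.4f} per 1M blended tokens")
```

Changing the weight changes the order: with `rank(PRICES, input_share=0.3)`, an output-heavy mix, Mistral Small 24B moves ahead of gpt-oss 120B because its output price is lower.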
Cheap doesn't always mean cheap
A model that costs $0.05/1M tokens but needs 3x more tokens to solve the same task is effectively a $0.15/1M model, so it isn't cheaper than a rival priced at $0.10/1M. For reasoning, coding, and tool use, look at cost per successful task, not cost per token. DeepSeek V4 Flash and Gemini 3.5 Flash currently offer the best intelligence-per-dollar in the sub-$1/1M tier.
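One way to make "cost per successful task" concrete is to divide the cost of a single attempt by the success rate, which assumes failed attempts are simply retried and that attempts are independent. The sketch below uses that assumption; the function name, token counts, and success rates are made up for illustration and are not measurements of any model listed here.

```python
def cost_per_successful_task(
    input_price_per_m: float,   # USD per 1M input tokens
    output_price_per_m: float,  # USD per 1M output tokens
    input_tokens: int,          # tokens sent per attempt
    output_tokens: int,         # tokens generated per attempt
    success_rate: float,        # fraction of attempts that solve the task
) -> float:
    """Expected USD spent per solved task, assuming independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    # On average 1 / success_rate attempts are needed per solved task.
    return cost_per_attempt / success_rate


# Hypothetical comparison: a half-price model that uses 3x the tokens
# and succeeds less often, against a pricier but more efficient one.
cheap = cost_per_successful_task(0.05, 0.05, 6000, 3000, success_rate=0.5)
pricier = cost_per_successful_task(0.10, 0.10, 2000, 1000, success_rate=0.9)

print(f"cheap model:   ${cheap:.6f} per solved task")
print(f"pricier model: ${pricier:.6f} per solved task")
```

With these invented inputs the model with the lower sticker price ends up costing more per solved task, which is the effect described above.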
When the cheapest tier is enough
Use a sub-$0.10/1M model for bulk classification, embedding-style retrieval rewrites, simple extraction, autocomplete, or first-pass routing. Escalate to a $1–$5/1M model only when the cheap tier fails, and log every escalation so you can measure the real cost of "smart enough".
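The escalate-and-log pattern can be sketched as a small router. Everything here is a placeholder: `cheap` and `strong` stand in for whatever client calls you use, `is_acceptable` is whatever check defines "the cheap tier failed" for your task (schema validation, a confidence threshold, a unit test), and the `Escalation` record is a hypothetical shape for the log.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("llm_router")


@dataclass
class Escalation:
    prompt: str
    cheap_answer: str  # what the cheap tier returned before being rejected


def route(
    prompt: str,
    cheap: Callable[[str], str],
    strong: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    escalations: List[Escalation],
) -> str:
    """Try the cheap model first; fall back to the strong one and record it."""
    answer = cheap(prompt)
    if is_acceptable(answer):
        return answer
    escalations.append(Escalation(prompt=prompt, cheap_answer=answer))
    logger.info("escalated prompt (%d chars)", len(prompt))
    return strong(prompt)


def escalation_rate(escalations: List[Escalation], total_requests: int) -> float:
    """Share of requests that needed the expensive tier."""
    return len(escalations) / total_requests if total_requests else 0.0


# Stub models for demonstration only: the cheap stub gives up on long prompts.
def cheap_stub(prompt: str) -> str:
    return "" if len(prompt) > 20 else "label: spam"


def strong_stub(prompt: str) -> str:
    return "label: not spam"


log: List[Escalation] = []
prompts = ["buy now!!!", "a much longer and more ambiguous message"]
answers = [route(p, cheap_stub, strong_stub, bool, log) for p in prompts]
print(answers, f"escalation rate = {escalation_rate(log, len(prompts)):.0%}")
```

Multiplying the escalation rate by the price gap between tiers gives a rough figure for what the fallback costs, which is the number the log exists to produce.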
Related: LLM Price Comparison 2026 — Frontier Models Ranked by Cost per Quality.