Cheapest LLM API in 2026: Top 10 Under $1/M Tokens

In 2026, "cheap" no longer means "bad". A dozen production-grade LLM APIs now ship at under $1 per million tokens blended, with reasoning quality within 5–10 points of GPT-5. If you're still routing every request to a flagship, you're overpaying by 10–50×.

This post ranks the cheapest LLM APIs available today, with quality scores from Artificial Analysis and real-workload math. Live data is in our cheapest LLM ranking, and you can plug your traffic into the cost calculator for an exact monthly figure.

The top 10 cheapest LLM APIs (blended, June 2026)

Sorted by blended cost per 1M tokens (70% input price + 30% output price). Quality is the Artificial Analysis Intelligence Index where available; a dash means no score was available.

Rank	Model	Provider	Blended $/1M	Quality
1	MiMo V2.5	Xiaomi	$0.06	49
2	gpt-oss 20B	OpenAI (open)	$0.06	—
3	gpt-oss 120B	OpenAI (open)	$0.06	—
4	Mistral Small 24B	Mistral	$0.06	—
5	Gemma 4 31B	Google	$0.08	39
6	Hunyuan HY3 Preview	Tencent	$0.11	42
7	DeepSeek V4 Flash	DeepSeek	$0.13	47
8	NVIDIA Nemotron 3 Super	NVIDIA	$0.13	36
9	GPT-5.4 nano	OpenAI	$0.16	44
10	Llama 4 Scout 17B	Meta	$0.18	—

Source: provider pricing pages + Artificial Analysis, June 2026. Live ranking in cheapest LLM.
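
The blended column is a weighted average of input and output prices. A minimal Python sketch of that calculation, assuming prices are quoted per 1M tokens and the 70/30 split applies to token volume; the function name is illustrative and not part of any provider SDK:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.70) -> float:
    """Weighted-average price per 1M tokens for a given input/output mix."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


# Example with the Gemini 3.5 Flash prices quoted later in this post,
# assuming "$0.30/$2.50" means input/output price per 1M tokens.
print(round(blended_price(0.30, 2.50), 2))  # 0.96
```

Output-heavy workloads (long generations, agent loops) shift the mix toward the pricier output side, so it is worth recomputing with your own `input_share` before comparing models.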

What "cheap" actually buys you in 2026

At these prices, 1M tokens costs less than a coffee. Here's what 100M tokens/month, a substantial production workload, costs on five models from the table (blended price × 100):

MiMo V2.5 — $6/month
gpt-oss 20B — $6/month
Mistral Small 24B — $6/month
DeepSeek V4 Flash — $13/month
GPT-5.4 nano — $16/month

For reference, the same 100M tokens costs $405/month on GPT-5.5 and $1,100/month on Claude Opus 4.8. That makes the cheap tier roughly 25× to 180× cheaper, or one to two orders of magnitude.
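
The arithmetic behind these monthly figures is one multiplication. A short sketch, assuming blended prices per 1M tokens; the two flagship prices are back-derived from the monthly totals in this section ($405 and $1,100 per 100M tokens), not taken from a price list:

```python
def monthly_cost(blended_per_m: float, tokens_per_month: int) -> float:
    """Monthly spend in dollars for a blended $/1M-token price."""
    return blended_per_m * tokens_per_month / 1_000_000


TOKENS = 100_000_000  # 100M tokens/month, the workload used in this section

# Cheap-tier prices come from the table; flagship prices are derived
# from the monthly figures quoted in this section (an assumption).
prices = {
    "MiMo V2.5": 0.06,
    "DeepSeek V4 Flash": 0.13,
    "GPT-5.4 nano": 0.16,
    "GPT-5.5": 4.05,
    "Claude Opus 4.8": 11.00,
}

for model, price in prices.items():
    print(f"{model}: ${monthly_cost(price, TOKENS):,.2f}/month")
```

Swap in your own monthly token count to see where the gap between tiers starts to matter for your budget.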

Which cheap model should you actually use?

For frontier-quality reasoning at a discount

DeepSeek V4 Flash ($0.13 blended). Best quality-per-dollar in the cheap tier. Comparable to GPT-5.4 mini on most evals at less than half the price.

For code generation and IDE autocomplete

gpt-oss 20B or Codestral 2508 via Groq/Fireworks. Low time-to-first-token on these hosts makes them well suited to inline completions.

For classification and structured outputs

Mistral Small 24B or Llama 3.1 8B Instruct. Both excel at JSON-mode generation and high-volume tagging where quality variance is acceptable.
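
Because quality varies more on cheap models, it helps to validate every structured response before trusting it. A minimal sketch using only Python's standard library, assuming the model's raw text response is already in hand; the label set and the `label` field are hypothetical:

```python
import json
from typing import Optional

ALLOWED_LABELS = {"billing", "bug", "feature_request", "other"}  # hypothetical


def parse_label(raw_response: str) -> Optional[str]:
    """Return a valid label, or None if the response should be retried or escalated."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    label = data.get("label")
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None  # missing, wrong type, or outside the allowed set
```

A `None` result is a cheap, deterministic signal that the request needs a retry or a stronger model.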

For long-context RAG

Llama 4 Scout 17B (10M context window) or Gemini 3.5 Flash (1M context, $0.30 input / $2.50 output per 1M tokens). Both let you stuff entire codebases or document corpora into context for pennies.
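
To check what a long prompt costs per request, multiply the prompt size by the input price. A rough sketch, assuming the $0.30 figure is the input price per 1M tokens; the 4,096-token output reserve and the function names are illustrative assumptions:

```python
def prompt_cost(prompt_tokens: int, input_per_m: float) -> float:
    """Input-side cost in dollars of sending one prompt."""
    return prompt_tokens * input_per_m / 1_000_000


def fits_context(prompt_tokens: int, context_window: int,
                 reserved_for_output: int = 4_096) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return prompt_tokens + reserved_for_output <= context_window


# Example: a 900k-token corpus on a 1M-context model at $0.30 per 1M input tokens.
tokens = 900_000
print(fits_context(tokens, 1_000_000))      # True
print(round(prompt_cost(tokens, 0.30), 2))  # 0.27
```

The cost applies to every request that resends the corpus, so 1,000 such calls a day is about $270 a day. Caching or narrowing the context still pays off at high volume.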

For EU data residency

Mistral Small 24B hosted in France. Only sub-$0.10 model with full EU data residency.

The hidden cost: quality variance

Cheap models fail more often. The right way to use them is in a cascade: send every request to a cheap model first and escalate to a frontier tier only when confidence is low, so that roughly 80% of traffic is served by the cheap model alone. Average cost drops 60–80% with negligible quality loss. See the cost-cutting playbook for the routing logic.
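
A minimal sketch of such a cascade, assuming each model call returns an answer plus a confidence score between 0 and 1. The `cheap_model` and `frontier_model` callables, the 0.8 threshold, and the confidence score itself are hypothetical placeholders rather than any provider's API, and the cost helper assumes escalated requests pay for both calls:

```python
from typing import Callable, Tuple

ModelFn = Callable[[str], Tuple[str, float]]  # prompt -> (answer, confidence)


def cascade(prompt: str, cheap_model: ModelFn, frontier_model: ModelFn,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model first; escalate when its confidence is below threshold."""
    answer, confidence = cheap_model(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = frontier_model(prompt)
    return answer, "frontier"


def expected_blended_price(cheap_per_m: float, frontier_per_m: float,
                           escalation_rate: float) -> float:
    """Average $/1M tokens when every request hits the cheap model and a
    fraction `escalation_rate` is additionally sent to the frontier model."""
    return cheap_per_m + escalation_rate * frontier_per_m


# Example with figures from this post: DeepSeek V4 Flash at $0.13 blended,
# GPT-5.5 at $4.05 blended (derived from $405 per 100M tokens), 20% escalated.
avg = expected_blended_price(0.13, 4.05, 0.20)
savings = 1 - avg / 4.05
print(f"${avg:.2f}/1M tokens, {savings:.0%} cheaper than frontier-only")
# $0.94/1M tokens, 77% cheaper than frontier-only
```

Under these assumptions a 20% escalation rate lands at about 77% savings, inside the 60–80% range. The savings fall quickly as the escalation rate rises, so the confidence threshold is the main knob to tune.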

The bottom line

The cheapest LLM API isn't a single model — it's a portfolio. Use the sub-$0.20 tier as your default, cache aggressively, batch where you can, and escalate to GPT-5 or Claude Sonnet only when the cheap model actually fails. Most teams that do this cut their LLM bill by 70%+ without measurable quality regression.