The Complete Guide to LLM Token Pricing
Everything you need to know about how LLM providers price tokens — input vs output, caching, batch, and the gotchas.
Token pricing seems simple until you ship to production. Here's the full picture.
Input vs output tokens
Most major providers charge roughly 3–5x more for output tokens than for input tokens. The input prompt is processed in parallel in a single pass, while output tokens are generated one at a time, and each generation step is limited by GPU memory bandwidth.
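To make the asymmetry concrete, here is a minimal sketch of the per-request arithmetic. The prices are hypothetical placeholders (a 5x output multiplier, quoted per million tokens), not any provider's actual rates, and `request_cost` is an illustrative helper name.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_mtok, output_price_per_mtok):
    """Return the dollar cost of one request, given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Hypothetical prices in dollars per million tokens (assumed, not real rates).
HYPOTHETICAL_INPUT_PRICE = 3.00
HYPOTHETICAL_OUTPUT_PRICE = 15.00

cost = request_cost(2_000, 500,
                    HYPOTHETICAL_INPUT_PRICE, HYPOTHETICAL_OUTPUT_PRICE)
print(f"${cost:.4f} per request")
```

With these assumed numbers, output is only 20% of the tokens but more than half of the bill, which is why trimming response length often saves more than trimming the prompt.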
What counts as a token?
A token is roughly 4 characters of English text, or about 0.75 of a word. Code usually yields more tokens per character than prose, and non-Latin scripts can take 2–4x as many tokens for the same content.
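For quick budgeting, the 4-characters rule can be turned into a rough estimator. This is a sketch of the rule of thumb only, not a real tokenizer: `estimate_tokens` and its default ratio are assumptions for illustration, and actual counts should come from the provider's own tokenizer or usage reports.

```python
import math

# Rule-of-thumb ratio for English prose; code and non-Latin text
# will usually need a smaller value (more tokens per character).
CHARS_PER_TOKEN = 4


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count from character length, rounding up."""
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the following support ticket in two sentences."
print(estimate_tokens(prompt))
```

Passing a smaller `chars_per_token` is one way to pad the estimate for code or non-English input.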
Caching discounts
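If you repeat the same system prompt across requests, prompt caching can cut the cost of the cached input tokens by 50–90%. Caches typically match on the prompt prefix, so hit rates matter: put the static portion of your prompt at the top and the per-request content after it.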
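The effect of hit rate on the bill can be sketched as an expected-cost calculation. The price, discount, and token counts below are hypothetical, `cached_input_cost` is an illustrative name, and the model ignores any cache-write surcharge or cache expiry a provider may apply.

```python
def cached_input_cost(static_tokens, dynamic_tokens, hit_rate,
                      price_per_mtok, cached_discount):
    """Expected input cost per request when the static prefix may be cached.

    On a cache hit the static prefix is billed at a discount; on a miss
    it is billed at full price. Dynamic tokens are always full price.
    """
    full_price = price_per_mtok / 1_000_000
    # Expected price multiplier for the static prefix across hits and misses.
    static_multiplier = hit_rate * (1 - cached_discount) + (1 - hit_rate)
    static_cost = static_tokens * full_price * static_multiplier
    dynamic_cost = dynamic_tokens * full_price
    return static_cost + dynamic_cost


# Hypothetical workload: 4,000-token system prompt, 500-token user message,
# $3 per million input tokens, 90% discount on cached tokens.
uncached = cached_input_cost(4_000, 500, 0.0, 3.00, 0.90)
cached = cached_input_cost(4_000, 500, 0.8, 3.00, 0.90)
print(f"savings at 80% hit rate: {1 - cached / uncached:.0%}")
```

Under these assumptions a 90% discount turns into roughly 64% real savings, because misses and the uncached dynamic portion still pay full price.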
Batch API discounts
Batch APIs typically give about 50% off if you can wait up to 24 hours for results. They are a good fit for evals, backfills, and offline jobs.
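For an offline job, the comparison is simple enough to sketch in a few lines. All prices and token counts here are hypothetical, `job_cost` is an illustrative helper, and the flat 50% discount is taken from the figure above rather than from any specific provider's price list.

```python
def job_cost(num_requests, input_tokens, output_tokens,
             input_price_per_mtok, output_price_per_mtok,
             batch_discount=0.0):
    """Total dollar cost of a job made of identical-sized requests."""
    per_request = (input_tokens * input_price_per_mtok
                   + output_tokens * output_price_per_mtok) / 1_000_000
    return num_requests * per_request * (1 - batch_discount)


# Hypothetical eval run: 10,000 requests, 1,500 input and 300 output tokens
# each, at $3 / $15 per million tokens (assumed prices).
sync_total = job_cost(10_000, 1_500, 300, 3.00, 15.00)
batch_total = job_cost(10_000, 1_500, 300, 3.00, 15.00, batch_discount=0.5)
print(f"synchronous: ${sync_total:.2f}, batch: ${batch_total:.2f}")
```

The discount applies to the whole job, so the absolute savings grow with job size, while the cost is latency: anything a user is waiting on stays on the synchronous path.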
Hidden costs
Tool calls, structured outputs, and vision inputs each have their own token math. Always test with real production traffic before committing to a contract.