The Complete Guide to LLM Token Pricing

Token pricing seems simple until you ship to production. Here's the full picture.

Input vs output tokens

Most major providers charge roughly 3–5x more for output tokens than for input tokens. Input tokens are processed in parallel in a single pass, while output tokens are generated one at a time, and each generation step is limited by GPU memory bandwidth.
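To see how the split changes a bill, here is a minimal sketch in Python. The per-million-token prices are made-up placeholders rather than any provider's real rates, and `request_cost` is a hypothetical helper written for this illustration.

```python
# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
INPUT_PRICE_PER_MTOK = 2.00
OUTPUT_PRICE_PER_MTOK = 8.00  # 4x the input price, inside the 3-5x range


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, pricing input and output separately."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Long prompt, short answer (2,500 tokens in total).
print(f"{request_cost(2_000, 500):.4f}")
# Short prompt, long answer (also 2,500 tokens in total).
print(f"{request_cost(500, 2_000):.4f}")
```

Under these assumed prices, the second request costs roughly twice as much as the first even though the total token count is identical, which is why output length is usually the first thing to budget.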

What counts as a token?

A token is roughly 4 characters of English text, or about 0.75 of a word. Code usually produces more tokens per character than prose, and non-Latin scripts can need 2–4x as many tokens for the same content.
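For quick budgeting, the two rules of thumb above can be turned into a rough estimator. This is only a sketch: the function names are hypothetical, the ratios apply to English prose, and billing-grade counts should come from the tokenizer or token-counting tool your provider documents.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # one token is about three quarters of a word


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count from character length (English prose only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate the token count from whitespace-separated words (English prose only)."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "Token pricing seems simple until you ship to production."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```

The two estimates will rarely agree exactly. Treat the gap between them as a reminder that these are heuristics, and expect both to undercount for code and non-Latin text.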

Caching discounts

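If you repeat the same system prompt across requests, prompt caching can cut the cost of the cached input tokens by 50–90%. Hit rates matter: caches typically match on the prompt prefix, so put the static portion at the top and the variable content after it.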
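The effect of hit rate can be made concrete with a small expected-cost model. This is a simplified sketch: the price and discount are hypothetical defaults, `cached_input_cost` is an illustrative helper, and the model ignores any cache-write surcharge, minimum cacheable length, or cache expiry a provider may apply.

```python
def cached_input_cost(
    static_tokens: int,
    dynamic_tokens: int,
    hit_rate: float,
    input_price_per_mtok: float = 2.00,  # hypothetical USD per 1M input tokens
    cache_discount: float = 0.90,        # hypothetical; the guide cites 50-90%
) -> float:
    """Return the expected USD input cost of one request.

    static_tokens is the repeated prefix (for example, the system prompt);
    dynamic_tokens is the per-request content that is never cached.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    per_token = input_price_per_mtok / 1_000_000
    # On a hit the prefix is billed at the discounted rate; on a miss, at full price.
    prefix_multiplier = hit_rate * (1 - cache_discount) + (1 - hit_rate)
    prefix_cost = static_tokens * per_token * prefix_multiplier
    dynamic_cost = dynamic_tokens * per_token
    return prefix_cost + dynamic_cost


for rate in (0.0, 0.5, 0.95):
    print(rate, f"{cached_input_cost(10_000, 500, rate):.5f}")
```

With a 10,000-token prefix, the prefix dominates the input bill, so moving the hit rate matters far more than trimming the dynamic part of the prompt.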

Batch API discounts

Batch APIs typically offer 50% off if you can wait up to 24 hours for results. That makes them a good fit for evals, backfills, and offline jobs.
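A quick way to decide whether a job is worth moving to batch is to price it both ways. The sketch below assumes placeholder prices and a flat 50% discount on both input and output tokens; `job_cost` is a hypothetical helper, so check how your provider actually applies the discount.

```python
# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
INPUT_PRICE_PER_MTOK = 2.00
OUTPUT_PRICE_PER_MTOK = 8.00
BATCH_DISCOUNT = 0.50  # assumed to apply equally to input and output tokens


def job_cost(
    num_requests: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    batch: bool = False,
) -> float:
    """Return the estimated USD cost of a whole job, optionally at the batch rate."""
    input_cost = num_requests * avg_input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = num_requests * avg_output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    total = input_cost + output_cost
    return total * (1 - BATCH_DISCOUNT) if batch else total


# An eval run of 10,000 requests, priced synchronously and as a batch.
print(f"{job_cost(10_000, 1_000, 200):.2f}")
print(f"{job_cost(10_000, 1_000, 200, batch=True):.2f}")
```

The saving scales linearly with job size, so the trade-off is purely about whether the work can tolerate the wait.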

Hidden costs

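Tool calls, structured outputs, and vision inputs each have their own token math: tool definitions and output schemas are typically billed as input tokens, and images are converted to a token count that depends on their size. Always measure with real production traffic before committing to a contract.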