Llama Pricing Calculator (Llama 4 & 3.3)

Meta's Llama family is the cheapest way to run frontier-quality, open-weight models in 2026. Prices below are in USD per 1M tokens and reflect Fireworks-style hosted inference; expect them to vary by about ±20% across providers such as Groq, Together and Replicate. To estimate a monthly bill from your real traffic, see the cost calculator; for hosted alternatives, compare with DeepSeek pricing.
Model | Provider | Input / 1M tokens | Output / 1M tokens | Context window
Llama 4 Scout 17B | Meta | $0.11 | $0.34 | 10M
Llama 3.3 70B Instruct | Meta | $0.23 | $0.40 | 131K
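
As a rough illustration of how the per-1M prices in the table translate into a monthly bill, here is a minimal Python sketch. The model keys, the function names, the 30-day month and the way the ±20% provider spread is applied are assumptions made for this example; they are not taken from any provider's API or billing rules.

```python
# Prices in USD per 1M tokens, copied from the table above.
# The dictionary keys are hypothetical labels, not official model IDs.
PRICES = {
    "llama-4-scout-17b": {"input": 0.11, "output": 0.34},
    "llama-3.3-70b-instruct": {"input": 0.23, "output": 0.40},
}


def monthly_cost(model, input_tokens_per_day, output_tokens_per_day, days=30):
    """Estimate monthly spend in USD from average daily token counts."""
    price = PRICES[model]
    daily = (
        input_tokens_per_day / 1_000_000 * price["input"]
        + output_tokens_per_day / 1_000_000 * price["output"]
    )
    return daily * days


def with_provider_spread(cost, spread=0.20):
    """Return a (low, high) range for the assumed ±20% spread across providers."""
    return cost * (1 - spread), cost * (1 + spread)


if __name__ == "__main__":
    # Example traffic (assumed): 10M input and 2M output tokens per day.
    for model in PRICES:
        cost = monthly_cost(model, 10_000_000, 2_000_000)
        low, high = with_provider_spread(cost)
        print(f"{model}: ${cost:.2f}/month (range ${low:.2f} to ${high:.2f})")
```

Input and output tokens are priced separately, so the estimate is only as good as your guess at the input/output split.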

Hosted vs self-hosted Llama

Hosted inference makes sense up to roughly 50M tokens/day. Above that, self-hosting on H100s or H200s usually beats hosted pricing by 3–5×, but you take on operational complexity. Most teams start hosted, then move the hot path on-prem once volume justifies it.
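
To check the break-even point against your own numbers, a sketch along these lines may help. It assumes self-hosting is a fixed daily cost (GPU rental or amortized hardware plus operations) that you supply yourself, and it ignores whether that hardware can actually serve the volume. The function names and the 20% output share in the usage lines are assumptions for this example only.

```python
def blended_price_per_1m(input_price, output_price, output_share):
    """Blend input and output prices (USD per 1M tokens) by the output share (0..1)."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_price * (1.0 - output_share) + output_price * output_share


def hosted_daily_cost(tokens_per_day, price_per_1m):
    """Hosted spend in USD per day at a given blended price."""
    return tokens_per_day / 1_000_000 * price_per_1m


def break_even_tokens_per_day(self_hosted_daily_usd, price_per_1m):
    """Daily token volume at which hosted spend equals a fixed self-hosted daily cost."""
    if price_per_1m <= 0:
        raise ValueError("price_per_1m must be positive")
    return self_hosted_daily_usd / price_per_1m * 1_000_000


if __name__ == "__main__":
    # Llama 3.3 70B prices from the table; the 20% output share is an assumption.
    blended = blended_price_per_1m(0.23, 0.40, output_share=0.2)
    daily = hosted_daily_cost(50_000_000, blended)
    print(f"Hosted spend at 50M tokens/day: ${daily:.2f}")
    # Pass your own all-in daily self-hosting cost to break_even_tokens_per_day().
```

The result is sensitive to the self-hosted cost you plug in, so treat it as a starting point for a real capacity and cost review.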

Llama 4 Scout's 10M context window

The killer feature of Llama 4 Scout 17B isn't the parameter count; it's the 10M-token context window. That fits an entire mid-size codebase, a full book series, or weeks of conversation history in a single request. At $0.11 per 1M input tokens, processing 1M tokens of context costs 11 cents, and filling the whole 10M-token window costs about $1.10. The same 1M input tokens on GPT-4o would cost $2.50.
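
The arithmetic behind these figures is simple enough to sketch in Python. The two prices come from the paragraph above; the function name is made up for this example, and output tokens are deliberately left out.

```python
SCOUT_INPUT_PER_1M = 0.11   # USD per 1M input tokens, Llama 4 Scout (from the text)
GPT4O_INPUT_PER_1M = 2.50   # USD per 1M input tokens, GPT-4o (from the text)


def context_cost_usd(context_tokens, input_price_per_1m):
    """Input-side cost in USD of sending `context_tokens` in one request."""
    return context_tokens / 1_000_000 * input_price_per_1m


if __name__ == "__main__":
    for tokens in (1_000_000, 10_000_000):
        scout = context_cost_usd(tokens, SCOUT_INPUT_PER_1M)
        print(f"{tokens:,} tokens on Llama 4 Scout: ${scout:.2f}")
    ratio = GPT4O_INPUT_PER_1M / SCOUT_INPUT_PER_1M
    print(f"GPT-4o input price is about {ratio:.1f}x higher per token")
```

This is a per-token price comparison only; it says nothing about whether another model can accept the same amount of context in a single request.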

Related guides

Compare with OpenAI pricing, Mistral pricing, or the cheapest LLM ranking. See how prices have fallen on the LLM pricing history page.

Frequently asked questions