Llama Pricing Calculator (Llama 4 & 3.3)
Meta's Llama family is among the cheapest ways to run frontier-quality, open-weight models in 2026. Prices below reflect Fireworks-style hosted inference; expect them to vary by roughly ±20% across Groq, Together and Replicate. To estimate a monthly bill from your real traffic, see the cost calculator; for hosted alternatives, compare with DeepSeek pricing.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| Llama 4 Scout 17B (Meta) | $0.11 | $0.34 | 10M |
| Llama 3.3 70B Instruct (Meta) | $0.23 | $0.40 | 131K |
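The per-token prices in the table translate into a monthly bill with simple arithmetic. The sketch below is one way to do that estimate; the model keys, the function names and the 30-day month are assumptions made for illustration, and the ±20% spread is the provider variation quoted above, not a measured figure.

```python
# Prices in USD per 1M tokens, copied from the table above.
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES = {
    "llama-4-scout-17b": {"input": 0.11, "output": 0.34},
    "llama-3.3-70b-instruct": {"input": 0.23, "output": 0.40},
}


def monthly_cost(model, input_tokens_per_day, output_tokens_per_day, days=30):
    """Estimate a monthly bill in USD at the listed prices (assumes a 30-day month)."""
    price = PRICES[model]
    daily = (
        input_tokens_per_day * price["input"]
        + output_tokens_per_day * price["output"]
    ) / 1_000_000
    return daily * days


def provider_range(cost, spread=0.20):
    """Apply the rough +/-20% spread across providers mentioned in the text."""
    return cost * (1 - spread), cost * (1 + spread)


if __name__ == "__main__":
    # Hypothetical traffic: 20M input and 5M output tokens per day.
    cost = monthly_cost("llama-3.3-70b-instruct", 20_000_000, 5_000_000)
    low, high = provider_range(cost)
    print(f"${cost:.2f}/month (range ${low:.2f} to ${high:.2f})")
```

With that hypothetical traffic, Llama 3.3 70B Instruct comes to about $198 per month at the listed prices, before any provider-specific discounts or minimums.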
Hosted vs self-hosted Llama
Hosted inference makes sense up to roughly 50M tokens/day. Above that, self-hosting on H100s or H200s usually beats hosted pricing by 3–5×, but you take on operational complexity. Most teams start hosted, then move the hot path on-prem once volume justifies it.
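The rule of thumb above can be written down as a small helper. This is a rough sketch only: the 50M tokens/day threshold and the 3–5× savings factor are taken straight from the paragraph, the function and constant names are hypothetical, and the estimate ignores the operational costs (staff, utilization, redundancy) that the paragraph warns about.

```python
# Rule-of-thumb values from the text, not measured figures.
HOSTED_THRESHOLD_TOKENS_PER_DAY = 50_000_000  # "roughly 50M tokens/day"
SELF_HOST_SAVINGS = (3.0, 5.0)                # "3-5x" cheaper than hosted


def deployment_hint(tokens_per_day, hosted_cost_per_day):
    """Return a coarse hosted vs self-hosted hint for a given daily volume.

    hosted_cost_per_day is the current hosted bill in USD per day.
    """
    if tokens_per_day <= HOSTED_THRESHOLD_TOKENS_PER_DAY:
        return {"recommendation": "hosted", "self_hosted_cost_range": None}

    # Above the threshold, estimate the self-hosted cost as hosted / 5 to hosted / 3.
    low = hosted_cost_per_day / SELF_HOST_SAVINGS[1]
    high = hosted_cost_per_day / SELF_HOST_SAVINGS[0]
    return {
        "recommendation": "consider self-hosting",
        "self_hosted_cost_range": (low, high),
    }


if __name__ == "__main__":
    # Hypothetical workload: 200M tokens/day costing $60/day hosted.
    print(deployment_hint(200_000_000, 60.0))
```

Treat the output as a prompt to run a proper capacity study, not as a decision: real GPU costs depend heavily on how busy the hardware is kept.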
Llama 4 Scout's 10M context window
The killer feature of Llama 4 Scout 17B is not the parameter count but the 10M-token context. That fits an entire mid-size codebase, a full book series, or weeks of conversation history in a single request. At $0.11/1M input, processing 1M tokens of context costs 11 cents. The same 1M input tokens on GPT-4o would cost $2.50.
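The comparison above is a single multiplication, which the following sketch makes explicit. The input prices are the two figures quoted in the paragraph; the dictionary keys and the function name are illustrative assumptions, and output tokens are deliberately left out.

```python
# Input prices in USD per 1M tokens, as quoted in the paragraph above.
# Keys are illustrative labels, not official API model IDs.
INPUT_PRICE_PER_1M = {
    "llama-4-scout-17b": 0.11,
    "gpt-4o": 2.50,
}


def context_cost(model, context_tokens):
    """Input-side cost in USD of sending context_tokens tokens to the model."""
    return context_tokens * INPUT_PRICE_PER_1M[model] / 1_000_000


if __name__ == "__main__":
    for tokens in (1_000_000, 10_000_000):
        scout = context_cost("llama-4-scout-17b", tokens)
        print(f"{tokens:,} tokens on Llama 4 Scout: ${scout:.2f}")
```

By the same arithmetic, filling the whole 10M-token window once costs about $1.10 in input tokens at the listed price.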
Related guides
Compare with OpenAI pricing, Mistral pricing, or the cheapest LLM ranking. See how prices have fallen on the LLM pricing history.