Today's lead · interactive

LLM API Cost Calculator

Drag the sliders to size a request, then compare the cost across 27 models from Claude, GPT, Gemini, Grok, DeepSeek, Mistral, Amazon Nova, Cohere, and Meta Llama — per request and per month, with Batch API and prompt-caching discounts.

How to use this

Set the sliders to a typical request — input tokens (your prompt plus any context), output tokens (the reply), and how many requests you send per month. The table ranks every model by cost per request and per month. Turn on caching if you reuse a prompt prefix, or Batch for async jobs. Apply it by picking the cheapest model that clears your quality bar, and by seeing how much shorter output saves.

Your usage

Input tokens / request 2,000 Output tokens / request 600 Cached input 0%

How do I actually cache?

Caching bills the repeated part of your prompt at roughly 10% of the input price on a hit. Put stable content (system prompt, instructions, context) first and the changing part (the user's question) last. On Anthropic you mark the cached span with cache_control; OpenAI and Google cache long repeated prefixes automatically. Keep timestamps and random IDs out of the top, or you lose the cache. Set the slider to the share of your input that repeats. More tips.

Requests / month Use Batch API (50% off, async)

How does the Batch API work?

Submit many requests as one async job and get results back within (usually) a few hours, for 50% off. Good for evals, backfills, and bulk generation that can wait; not for anything interactive. Each provider has a batches endpoint: you upload the requests and poll for results. Estimate the savings.

Caching and batch are modeled as multipliers; providers may not combine them. Estimates only.

Cost comparison — 27 models

Loading pricing…

Estimated API cost per request and per month by model, filterable by provider and sortable by column.

In the wild · what people are sharing right now

Live from Hacker News and Mastodon — recent, popular posts about the latest models. Links go straight to the source.

Loading the latest…

Ideas to build · what the latest models can do

Real ways the newest models are being used. Pick one to price it or find the right model. We add to this regularly.

Loading ideas…

One planned request vs. many follow-ups

Every follow-up re-sends the whole conversation as input. This shows what that costs — and how much prompt caching claws back.

Model Context re-sent every turn 4,000 Tokens you add per turn 300 Tokens returned per turn 500 Number of turns 8 Prompt caching on (re-sent context cached)

Many follow-ups (no cache)—

Many follow-ups (cached)—

One planned request—

—

Model: each follow-up turn re-sends the context plus all prior turns. The "one planned request" path sends the context once with everything asked up front, for the same total work. Caching prices the re-sent prefix at the model's cache-read rate.

Latest

Loading…

All news →

Head-to-head

Opus 4.8 vs GPT-5.5 vs Gemini 3.1 Pro

The frontier three, priced side by side.

Sonnet 4.6 vs GPT-5.4 vs Gemini 3 Flash

The workhorse tier, where most production traffic lives.

DeepSeek V4 Flash vs Gemini Flash-Lite

Two of the cheapest credible options, head to head.

All comparisons →

From the cheat sheet

Right-size the model

The gap between a budget and a frontier model is often 10x. Use the smallest one that passes your own test.

Cache the repeated prefix

Stable context billed at about a tenth on a cache hit. Keep fixed content first, variable bits last.

One request beats ten

Every follow-up re-sends the whole conversation. Ask once, well.

Full cheat sheet →

Price watch · cheapest by tier

Loading…

Cheapest by blended cost (input + output, per 1M tokens). See how prices have fallen over time on the pricing history.

Token counter (estimate)

Paste text for a quick token estimate, then push it into the calculator.

0 est. tokens

0 characters

Rough heuristic (~4 chars/token). Real tokenization is model-specific — for exact Claude counts use the count_tokens API; OpenAI and Google have their own tokenizers.

Sections · all tools

Model comparison Head-to-head (X vs Y) Model recommender Caching & batch savings Subscription vs API Cost cheat sheet Pricing history AI add-on library RAG & embeddings cost Agent cost