Independent LLM cost intelligence · since 2026

LLMCalculators

Today's lead · interactive

LLM API Cost Calculator

Drag the sliders to size a request, then compare the cost across 27 models from Claude, GPT, Gemini, Grok, DeepSeek, Mistral, Amazon Nova, Cohere, and Meta Llama — per request and per month, with Batch API and prompt-caching discounts.

How to use this

Set the sliders to a typical request — input tokens (your prompt plus any context), output tokens (the reply), and how many requests you send per month. The table ranks every model by cost per request and per month. Turn on caching if you reuse a prompt prefix, or Batch for async jobs. Apply it by picking the cheapest model that clears your quality bar, and by seeing how much shorter output saves.

Your usage

How do I actually cache?

Caching bills the repeated part of your prompt at roughly 10% of the input price on a hit. Put stable content (system prompt, instructions, context) first and the changing part (the user's question) last. On Anthropic you mark the cached span with cache_control; OpenAI and Google cache long repeated prefixes automatically. Keep timestamps and random IDs out of the top, or you lose the cache. Set the slider to the share of your input that repeats. More tips.

How does the Batch API work?

Submit many requests as one async job and get results back within (usually) a few hours, for 50% off. Good for evals, backfills, and bulk generation that can wait; not for anything interactive. Each provider has a batches endpoint: you upload the requests and poll for results. Estimate the savings.

Caching and batch are modeled as multipliers; providers may not combine them. Estimates only.

Cost comparison — 27 models

Loading pricing…

Estimated API cost per request and per month by model, filterable by provider and sortable by column.

In the wild · what people are sharing right now

Live from Hacker News and Mastodon — recent, popular posts about the latest models. Links go straight to the source.

Loading the latest…

Ideas to build · what the latest models can do

Real ways the newest models are being used. Pick one to price it or find the right model. We add to this regularly.

Loading ideas…

One planned request vs. many follow-ups

Every follow-up re-sends the whole conversation as input. This shows what that costs — and how much prompt caching claws back.

Many follow-ups (no cache)
Many follow-ups (cached)
One planned request

Model: each follow-up turn re-sends the context plus all prior turns. The "one planned request" path sends the context once with everything asked up front, for the same total work. Caching prices the re-sent prefix at the model's cache-read rate.

Front page

Latest

Loading…

All news →

Head-to-head

Opus 4.8 vs GPT-5.5 vs Gemini 3.1 Pro

The frontier three, priced side by side.

Sonnet 4.6 vs GPT-5.4 vs Gemini 3 Flash

The workhorse tier, where most production traffic lives.

DeepSeek V4 Flash vs Gemini Flash-Lite

Two of the cheapest credible options, head to head.

All comparisons →

From the cheat sheet

Right-size the model

The gap between a budget and a frontier model is often 10x. Use the smallest one that passes your own test.

Cache the repeated prefix

Stable context billed at about a tenth on a cache hit. Keep fixed content first, variable bits last.

One request beats ten

Every follow-up re-sends the whole conversation. Ask once, well.

Full cheat sheet →

Price watch · cheapest by tier

Loading…

Cheapest by blended cost (input + output, per 1M tokens). See how prices have fallen over time on the pricing history.

Token counter (estimate)

Paste text for a quick token estimate, then push it into the calculator.

0 est. tokens
0 characters

Rough heuristic (~4 chars/token). Real tokenization is model-specific — for exact Claude counts use the count_tokens API; OpenAI and Google have their own tokenizers.

Sections · all tools