Caching & batch savings

Two of the biggest levers on an LLM bill are prompt caching (reused context billed at roughly 10% of the normal input price) and the Batch API (50% off for asynchronous work). Enter your usage to see the difference.

How to use this

Enter your monthly usage and the share of input that repeats across calls. The tool shows what prompt caching and the Batch API would each save compared with paying full price. To act on the result, enable caching on prompts whose prefix you reuse, and route non-urgent jobs such as evals and backfills to the Batch API.

Your usage

Model:
Input tokens / request: 8,000
Output tokens / request: 800
Share of input that repeats (cacheable): 70%
Requests / month:

Caching helps when a large part of the prompt (system instructions, context, examples) repeats across requests. Batch suits work that can wait minutes to hours for a result.
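If you are unsure what to enter for the share of input that repeats, one rough way to estimate it is to measure how much of a sample of your prompts is a shared prefix. The sketch below is only an approximation under stated assumptions: it counts characters rather than tokens, it treats only the prefix common to every sampled prompt as cacheable, and the function name `cacheable_share` and the sample prompts are made up for illustration.

```python
import os


def cacheable_share(prompts: list[str]) -> float:
    """Estimate the repeated share of input as
    (length of the prefix shared by all prompts) / (average prompt length).

    Assumption: character counts stand in for token counts.
    """
    if not prompts:
        return 0.0
    # os.path.commonprefix compares strings character by character.
    shared = len(os.path.commonprefix(prompts))
    average = sum(len(p) for p in prompts) / len(prompts)
    return shared / average if average else 0.0


# Hypothetical sample: the same system instructions followed by a varying question.
system = "You are a support assistant. Answer briefly and cite the help center.\n"
sample = [
    system + "Q: How do I reset my password?",
    system + "Q: Where can I download my invoice?",
]
print(f"Estimated cacheable share: {cacheable_share(sample):.0%}")
```

Run this over a representative sample of real requests rather than two prompts. If the repeated material sits in the middle or at the end of your prompts, this measure will understate it, which is also a hint that moving it to the front may make more of the prompt reusable.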

Monthly cost

The tool reports a monthly total for each of four scenarios:

Standard
With prompt caching
With Batch API
Caching + Batch
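The four monthly totals listed above can be approximated with a few lines of arithmetic. This is a minimal sketch, not the tool's actual formula. It takes the two discounts from the text (cached input at about 10% of the input price, batch at 50% off) and makes three further assumptions: the per-token prices and the request count are placeholders, the two discounts multiply when combined, and any surcharge for writing to the cache is ignored.

```python
from dataclasses import dataclass

CACHE_READ_MULTIPLIER = 0.10  # from the text: cached input billed at ~10% of input price
BATCH_MULTIPLIER = 0.50       # from the text: Batch API is 50% off


@dataclass
class Usage:
    input_tokens: int        # input tokens per request
    output_tokens: int       # output tokens per request
    cacheable_share: float   # share of input that repeats, 0.0 to 1.0
    requests_per_month: int
    input_price: float       # USD per million input tokens (placeholder)
    output_price: float      # USD per million output tokens (placeholder)


def monthly_costs(u: Usage) -> dict[str, float]:
    """Return the monthly cost in USD for each of the four scenarios."""
    millions_of_requests = u.requests_per_month / 1_000_000
    input_cost = u.input_tokens * millions_of_requests * u.input_price
    output_cost = u.output_tokens * millions_of_requests * u.output_price

    # Only the repeated share of the input is billed at the cached rate.
    blended = (1 - u.cacheable_share) + u.cacheable_share * CACHE_READ_MULTIPLIER
    standard = input_cost + output_cost
    caching = input_cost * blended + output_cost
    return {
        "standard": standard,
        "caching": caching,
        "batch": standard * BATCH_MULTIPLIER,
        # Assumption: the batch discount also applies to cached input.
        "caching_and_batch": caching * BATCH_MULTIPLIER,
    }


# The page's default inputs, plus a hypothetical volume and hypothetical prices.
example = Usage(
    input_tokens=8_000,
    output_tokens=800,
    cacheable_share=0.70,
    requests_per_month=100_000,  # hypothetical
    input_price=3.00,            # hypothetical
    output_price=15.00,          # hypothetical
)
for scenario, cost in monthly_costs(example).items():
    print(f"{scenario}: ${cost:,.2f}")
```

With these assumed values the sketch gives about $3,600 standard, $2,088 with caching, $1,800 with batch, and $1,044 with both. Caching only reduces the input side of the bill, so its effect shrinks as output tokens make up more of the total, while the batch discount applies to the whole bill.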