RAG & embeddings cost

A retrieval pipeline has two costs: a one-time charge to embed your documents, then a per-query charge to feed retrieved context through an LLM. Embeddings are cheap; the generation step usually dominates. Here's the split.

How to use this

Set your corpus size, how many queries you run, and how much context each answer uses. It estimates the one-time cost to embed your documents plus the recurring per-query retrieval and generation cost. Apply it by lowering how many chunks you retrieve (top-k) and caching context that comes back often.

Your pipeline

Tokens to index (whole corpus) Embedding price ($ / 1M tokens) Queries / month Retrieved context / query 4,000 Answer tokens / query 500 Generation model Cache the retrieved context (prompt caching)

Embedding price examples: OpenAI small ~$0.02, large ~$0.13; Voyage/Cohere ~$0.10–0.12 per 1M. Set yours above.

Estimated cost

One-time indexing—

Per query—

Monthly (queries)—

Monthly with caching—