Is Prompt Caching the new RAG?

Aug 21, 2024


Recently, Anthropic, the company behind Claude, announced a remarkable new feature called Prompt Caching. This breakthrough makes processing lengthy documents more affordable than ever before, and it has the potential to revolutionize how we handle vast amounts of static information in AI conversations!
Let’s delve into the exciting implications this has for AI applications.

What is Prompt Caching?

Prompt Caching involves storing the system prompt — the static part of the conversation. This system prompt can include substantial content such as entire books, long research papers, or large codebases. Here’s how it works:

  1. The system prompt is cached on the first request, incurring a one-time cost.
  2. Subsequent user queries only process the dynamic user input against this cached context.
  3. This approach dramatically speeds up interactions and reduces costs for repeated queries, as the sketch below shows.
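
Here is a minimal sketch of what this looks like with the Anthropic Python SDK: the large document goes into a system block marked with `cache_control`, and each question arrives as a fresh user message. The file name, model, and questions are placeholders, and the beta header and field names reflect the feature as announced in August 2024, so check Anthropic's documentation for the current form.

```python
# A minimal sketch of caching a large system prompt with the Anthropic
# Python SDK. File name, model, and questions are placeholders; the beta
# header and cache_control field reflect the feature as launched in Aug 2024.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

book_text = open("whole_book.txt").read()  # the large, static context to cache

def ask(question: str):
    return client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        # Prompt Caching launched as a beta, enabled via this header.
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {"type": "text", "text": "Answer questions about the attached book."},
            {
                "type": "text",
                "text": book_text,
                # Marks this block (and the prefix before it) as cacheable.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": question}],
    )

# The first call pays the one-time cache-write cost; later calls within the
# cache lifetime read the cached prefix at a fraction of the input price.
first = ask("Summarize chapter 1.")
second = ask("Who is the main character?")
print(first.usage, second.usage)  # usage reports cache write/read token counts
```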

Key Points About Prompt Caching

  • System Prompt vs. User Input: The system prompt (static, cached) is separate from the user’s input (dynamic, varies with each query).
  • Initial Caching Cost: The first time you cache the system prompt, it costs approximately 25% more than standard input pricing.
  • Subsequent Query Savings: After caching, processing new queries against the cached context costs only about 10% of the standard input token price, as the back-of-the-envelope example below illustrates.
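
To make those percentages concrete, here is a rough cost comparison for repeatedly querying a 100,000-token document. The $3-per-million-token base input price (Claude 3.5 Sonnet at the time of writing), the document size, and the query count are assumptions for illustration; the 25% write surcharge and 10% read rate are the figures above, and per-query user input tokens are ignored since they are small either way.

```python
# Back-of-the-envelope cost comparison for a 100k-token cached system prompt.
# Assumes $3 per million input tokens as the base price (an assumption;
# check current pricing) and 50 queries against the same document.
BASE = 3.00 / 1_000_000          # $ per input token
CACHE_WRITE = BASE * 1.25        # first request: ~25% surcharge to write the cache
CACHE_READ = BASE * 0.10         # later requests: ~10% of base price to read it

doc_tokens, queries = 100_000, 50

without_cache = doc_tokens * BASE * queries
with_cache = doc_tokens * CACHE_WRITE + doc_tokens * CACHE_READ * (queries - 1)

print(f"Without caching: ${without_cache:.2f}")  # roughly $15
print(f"With caching:    ${with_cache:.2f}")     # under $2, about 8x cheaper
```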
