RAG vs CAG
This post introduces two popular augmented generation methods, Cache Augmented Generation (CAG) and Retrieval Augmented Generation (RAG), and compares them along the dimensions that matter most in practice: information flow, latency, scalability, handling of live information, and accuracy.
CAG - Cache Augmented Generation
Information flow
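All available documents are passed to the LLM as context ahead of time. The model processes them in a forward pass, and the resulting key-value (KV) cache, i.e. the attention keys and values computed for every document token, is stored. The model weights themselves are not changed. When a user query arrives, the stored cache is reused, so only the query has to be processed before the response is generated. In essence, the knowledge is cached alongside the model, hence the name.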
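The toy sketch below illustrates the caching idea in pure Python. It is not a real LLM: `embed` is a hypothetical hash-based stand-in for learned key/value projections, and there is a single layer with a single attention head. The documents are prefilled into the cache once; answering a query then processes only the query tokens, yet yields exactly the same attention output as re-reading the documents and the query together.

```python
import hashlib
import math

DIM = 8  # toy vector size (assumption; real models use far larger sizes)

def embed(token: str, role: str) -> list[float]:
    """Hypothetical stand-in for a learned projection: a deterministic
    vector derived from a hash of the token and its role ('q', 'k' or 'v')."""
    digest = hashlib.sha256((role + ":" + token).encode()).digest()
    return [(byte - 128) / 128 for byte in digest[:DIM]]

def attend(query_token: str, keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Single-head scaled dot-product attention for one query token."""
    q = embed(query_token, "q")
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(DIM)]

class ToyKVCache:
    """Holds one key and one value vector per preloaded context token."""
    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []
        self.tokens_processed = 0

    def prefill(self, tokens: list[str]) -> None:
        # The expensive pass: compute and store keys/values for every token.
        for tok in tokens:
            self.keys.append(embed(tok, "k"))
            self.values.append(embed(tok, "v"))
            self.tokens_processed += 1

def answer_with_cache(cache: ToyKVCache, query: list[str]) -> tuple[list[float], int]:
    # Reuse the cached document states; only the query tokens are new work.
    keys = cache.keys + [embed(t, "k") for t in query]
    values = cache.values + [embed(t, "v") for t in query]
    return attend(query[-1], keys, values), len(query)

def answer_without_cache(docs: list[str], query: list[str]) -> tuple[list[float], int]:
    # Baseline: re-process documents and query together for every request.
    fresh = ToyKVCache()
    fresh.prefill(docs + query)
    return attend(query[-1], fresh.keys, fresh.values), fresh.tokens_processed

docs = "the refund window is thirty days from delivery".split()
cache = ToyKVCache()
cache.prefill(docs)  # done once, before any user query arrives

query = "how long is the refund window".split()
out_cached, work_cached = answer_with_cache(cache, query)
out_fresh, work_fresh = answer_without_cache(docs, query)
print(work_cached, work_fresh)  # 6 14
print(out_cached == out_fresh)  # True
```

Real transformer implementations keep such key/value tensors for every layer and every attention head, but the trade-off is the same: pay for the documents once, then reuse that work for every query.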
Latency
Low latency: there is no retrieval step, and because the documents' KV cache is precomputed, only the query tokens have to be processed before generation starts.
Scalability
Limited scalability: the only information store is the cached context, so the total amount of knowledge is bounded by the model's context window, minus the room needed for the user query and the generated answer.
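A quick budget check makes this bound concrete. The sketch assumes document sizes have already been measured in tokens with the model's own tokenizer, and it uses hypothetical numbers for the context window and for the room reserved for the query and the answer.

```python
# Hypothetical numbers, for illustration only.
CONTEXT_WINDOW = 128_000  # tokens the model can attend to (assumed)
RESERVED = 4_000          # tokens kept free for the query and the answer (assumed)

def preload_budget_left(doc_token_counts: list[int],
                        context_window: int = CONTEXT_WINDOW,
                        reserved: int = RESERVED) -> int:
    """Tokens still free after preloading every document.
    A negative result means the corpus does not fit, so CAG cannot hold it all."""
    return context_window - reserved - sum(doc_token_counts)

corpus = [40_000, 40_000, 40_000]
print(preload_budget_left(corpus))             # 4000
print(preload_budget_left(corpus + [10_000]))  # -6000
```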
Live information
Any new or updated information requires the cached context to be rebuilt: the context must be updated and passed through the model again to recompute the cache, which is computationally expensive for a large context. The total amount of information, updates included, also remains limited by the model's context window.
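How much has to be recomputed depends on where the change lands. In a causal (decoder-only) model, each token's cached state depends on the tokens before it, so a cache stays valid only up to the first changed token. The sketch below assumes the preloaded context is available as a token list and counts the tokens that must be passed through the model again: appending a document at the end is the cheap case, while editing an early document invalidates everything after it.

```python
def tokens_to_recompute(old_tokens: list[str], new_tokens: list[str]) -> int:
    """Count tokens whose cached states must be recomputed when the preloaded
    context changes from old_tokens to new_tokens (causal model assumed):
    the cache is reusable only for the longest unchanged prefix."""
    shared = 0
    for old, new in zip(old_tokens, new_tokens):
        if old != new:
            break
        shared += 1
    return len(new_tokens) - shared

old = "policy A says X . policy B says Y".split()
appended = old + "policy C says Z".split()            # new document added at the end
edited = "policy A says W . policy B says Y".split()  # first document changed

print(tokens_to_recompute(old, appended))  # 4
print(tokens_to_recompute(old, edited))    # 6
```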
Primary worker & Accuracy
The primary worker is the LLM itself, so accuracy depends on how useful and relevant the preloaded information is to the user query.
RAG - Retrieval Augmented Generation
Information flow
A separate database stores all available documents. When a user query comes in, the retriever finds the documents most relevant to it. These documents are then passed to the LLM as context, and the final answer is generated.
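The flow above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions: a bag-of-words vector stands in for a neural embedding model, a Python list stands in for the document database, and the prompt format in `build_prompt` is hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts. A real system would call a
    neural embedding model here (hypothetical stand-in)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Retriever: rank stored documents by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    """Pass the retrieved documents to the LLM as context (prompt format is an assumption)."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

documents = [  # stands in for the document database
    "Refunds are accepted within thirty days of delivery.",
    "Shipping to Europe takes five to seven business days.",
    "Gift cards cannot be exchanged for cash.",
]
query = "How long are refunds accepted after delivery?"
top = retrieve(query, documents, k=1)
print(top)  # ['Refunds are accepted within thirty days of delivery.']
print(build_prompt(query, top))
```

Only the retrieved documents enter the prompt, which is why the database can grow far beyond the model's context window.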
Latency
Higher latency: the user query must first be mapped into the embedding space, then the database is searched for relevant documents, and those documents must be processed by the model as part of the prompt. Only then can the model generate an answer.
Scalability
Highly scalable: the main information store is the database, which can hold far more documents than fit in a context window; only the few retrieved documents need to fit into the prompt.
Live information
New information can simply be added to the database; once indexed, it will be fetched by the retriever whenever it is relevant to a query, with no change to the model.
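Adding knowledge is an index write, not a model update. To keep the example short, the sketch below swaps in a simple keyword inverted index (a different retrieval technique from embedding search); `KeywordIndex` is a hypothetical class, not a library API. A document added after the first query is found by the very next query.

```python
import re
from collections import defaultdict

class KeywordIndex:
    """Toy inverted index standing in for the document database."""
    def __init__(self) -> None:
        self.docs: list[str] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, text: str) -> None:
        # Store the document and record which words it contains.
        doc_id = len(self.docs)
        self.docs.append(text)
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(doc_id)

    def search(self, query: str, k: int = 1) -> list[str]:
        # Score each document by how many distinct query words it contains.
        scores: dict[int, int] = defaultdict(int)
        for word in set(re.findall(r"[a-z0-9]+", query.lower())):
            for doc_id in self.postings.get(word, ()):
                scores[doc_id] += 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self.docs[d] for d in ranked[:k]]

index = KeywordIndex()
index.add("Shipping to Europe takes five to seven business days.")
query = "What is the policy for opened items?"
before = index.search(query)
print(before)  # []

index.add("Opened items can be returned within fourteen days.")  # live update
after = index.search(query)
print(after)  # ['Opened items can be returned within fourteen days.']
```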
Primary worker & Accuracy
The retriever plays the main role by fetching relevant documents, so overall accuracy depends on the retriever's quality and on the relevance of the documents it returns: the LLM cannot use information that was never retrieved.
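Because answer quality is capped by what the retriever returns, retrievers are often evaluated on their own with a metric such as recall@k: the fraction of the relevant documents that appear among the top k results. The document IDs and relevance judgments below are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear among the top-k retrieved ones."""
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

# Hypothetical evaluation set: (retriever output in rank order, documents judged relevant)
eval_set = [
    (["d3", "d7", "d1"], {"d3"}),        # recall@2 = 1.0
    (["d2", "d5", "d4"], {"d4", "d9"}),  # recall@2 = 0.0
    (["d8", "d6", "d1"], {"d6", "d1"}),  # recall@2 = 0.5
]
mean_recall = sum(recall_at_k(r, rel, k=2) for r, rel in eval_set) / len(eval_set)
print(mean_recall)  # 0.5
```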