RAG vs CAG | Aditya Sharma

This post introduces two popular augmented generation methods, Cache Augmented Generation (CAG) and Retrieval Augmented Generation (RAG), and compares them across key parameters that influence their practical usage.

CAG - Cache Augmented Generation

Information flow

All the available documents are passed to the LLM as context in a single forward pass. The model's internal state after seeing the documents, specifically the key-value (KV) cache computed by its attention layers, is stored and reused to generate responses to future user queries; the model weights themselves are not changed. In essence, the information is cached in the model's precomputed state, hence the name.
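
To make the caching idea concrete, here is a minimal toy sketch in Python. It assumes the cached state is the attention key-value (KV) cache, as in typical CAG setups, and uses made-up 2-D "embeddings" in place of a real model; `ToyKVCache`, `key_value` and the other names are hypothetical. The documents' keys and values are computed once, and each query only adds its own tokens before attending over everything.

```python
import math

# Toy single-head attention over 2-D vectors. Real models use learned
# projection matrices, many heads and many layers; this only shows the caching idea.

def embed(token: str) -> list[float]:
    # Hypothetical deterministic "embedding" derived from character codes.
    s = sum(ord(c) for c in token)
    return [math.sin(s), math.cos(s)]

def key_value(token: str) -> tuple[list[float], list[float]]:
    # Stand-in for the key and value projections of a real attention layer.
    e = embed(token)
    return e, [2.0 * x for x in e]

def attend(query_vec: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    # Scaled dot-product attention for a single query vector.
    d = len(query_vec)
    scores = [sum(q * k for q, k in zip(query_vec, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(d)]

class ToyKVCache:
    def __init__(self, document_tokens: list[str]):
        # One-time "prefill": compute keys/values for every document token.
        pairs = [key_value(t) for t in document_tokens]
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def answer(self, query_tokens: list[str]) -> list[float]:
        # Only the query tokens get new keys/values; the cached document entries
        # are reused (new lists are built, so the cache itself is not mutated).
        new = [key_value(t) for t in query_tokens]
        keys = self.keys + [k for k, _ in new]
        values = self.values + [v for _, v in new]
        return attend(embed(query_tokens[-1]), keys, values)

def answer_without_cache(document_tokens: list[str], query_tokens: list[str]) -> list[float]:
    # Baseline: recompute keys/values for documents and query on every call.
    pairs = [key_value(t) for t in document_tokens + query_tokens]
    return attend(embed(query_tokens[-1]), [k for k, _ in pairs], [v for _, v in pairs])
```

Because the cache only stores per-token keys and values, reusing it gives the same attention output as recomputing the full context from scratch; the saving is that the document tokens are never reprocessed for a new query.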

Latency

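Low latency, since there is no retrieval step: a user query is passed directly to the model, and because the document context has already been processed and cached, only the query tokens need to be processed before generation starts.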
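
A rough way to see where the latency saving comes from is to count how many tokens the model has to prefill. The sketch below is a simplified cost model (the function name and the numbers are illustrative assumptions): with a reused cache the documents are processed once, while without one every query pays for the whole corpus again.

```python
def prefill_tokens(num_queries: int, doc_tokens: int, query_tokens: int, reuse_cache: bool) -> int:
    """Total tokens pushed through the model's prefill over num_queries queries.

    Simplified cost model: it ignores decoding, and the fact that attention over
    a long cached context still costs more per query token than a short one.
    """
    if reuse_cache:
        # Documents are prefilled once; each query then adds only its own tokens.
        return doc_tokens + num_queries * query_tokens
    # Without reuse, every query re-processes the full document context.
    return num_queries * (doc_tokens + query_tokens)

# Illustrative numbers only: 100 queries of 20 tokens against a 50,000-token corpus.
print(prefill_tokens(100, 50_000, 20, reuse_cache=True))   # 52000
print(prefill_tokens(100, 50_000, 20, reuse_cache=False))  # 5002000
```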

Scalability

Limited scalability: the information store is the cached context, so the amount of information that can be stored is bounded by the model's context window.
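
The context-window bound can be checked with simple arithmetic before committing to CAG. In this hypothetical helper, the document token counts, the 128,000-token window and the reserve for the query and answer are all illustrative assumptions; real numbers depend on the model and its tokenizer.

```python
def remaining_budget(doc_token_counts: list[int], context_window: int, reserved: int) -> int:
    """Tokens left after caching every document and reserving room for query + answer.

    A negative result means the corpus does not fit, so CAG cannot cache all of it.
    """
    return context_window - reserved - sum(doc_token_counts)

# Illustrative sizes; real token counts would come from the model's own tokenizer.
docs = [40_000, 50_000, 30_000]
print(remaining_budget(docs, context_window=128_000, reserved=4_000))             # 4000
print(remaining_budget(docs + [10_000], context_window=128_000, reserved=4_000))  # -6000
```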

Live information

Any new or updated information requires the context to be updated and passed through the model again to rebuild the cache. This is inefficient, since every refresh repeats a significant amount of computation. The amount of information that can be added is also limited by the model's context window.
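
One simple way to keep the cache consistent with the documents is to fingerprint the corpus and recompute whenever the fingerprint changes. The `CorpusCache` class below is a hypothetical sketch, with a stand-in `prefill` function in place of an actual forward pass. It takes the blunt approach of recomputing everything on any change (appending documents at the end could in principle reuse the existing prefix, which this sketch does not attempt).

```python
import hashlib
from typing import Callable, Optional

class CorpusCache:
    """Hypothetical wrapper: recompute the cached state whenever the corpus changes."""

    def __init__(self, prefill: Callable[[list[str]], object]):
        self._prefill = prefill  # stand-in for running the documents through the model
        self._fingerprint: Optional[str] = None
        self._state: object = None
        self.recomputations = 0

    @staticmethod
    def fingerprint(documents: list[str]) -> str:
        h = hashlib.sha256()
        for doc in documents:
            h.update(doc.encode("utf-8"))
            h.update(b"\x00")  # separator, so document boundaries affect the hash
        return h.hexdigest()

    def get(self, documents: list[str]) -> object:
        fp = self.fingerprint(documents)
        if fp != self._fingerprint:
            # Simplest policy: any change triggers a full re-prefill of every document.
            self._state = self._prefill(documents)
            self._fingerprint = fp
            self.recomputations += 1
        return self._state
```

Usage: wrap whatever builds the cache, call `get(documents)` before answering, and watch `recomputations` grow every time the document set is edited.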

Primary worker & Accuracy

The primary worker is the LLM itself, so accuracy depends on how useful and relevant the originally provided information is to the user query.

RAG - Retrieval Augmented Generation

Information flow

A separate database stores all the available documents. When a user query comes in, the retriever finds the documents relevant to it. These documents are then passed to the LLM as context, and the final answer is generated.
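
The retrieve-then-generate flow can be sketched in a few lines of Python. This is a toy version under stated assumptions: a bag-of-words vector and cosine similarity stand in for a learned embedding model and a vector database, the function names are hypothetical, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG systems use a neural text encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query and keep the top k.
    # A real vector database would use an approximate nearest-neighbour index instead.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # The retrieved documents become the LLM's context for this query only.
    lines = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only the context below.\nContext:\n{lines}\nQuestion: {query}"

docs = [
    "The warranty covers parts for two years.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
]
query = "How long is the warranty?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
print(prompt)  # this prompt would then be sent to the LLM (call not shown)
```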

Latency

Higher latency, since the user query must first be mapped into the embedding space and the database then searched for relevant documents. Only after that can the model generate an answer, and it also has to process the retrieved documents as part of its input.
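
The extra latency comes from the stages running strictly in sequence, since each needs the previous stage's output. The hypothetical `timed_rag` helper below times each stage with `time.perf_counter`; the stage functions are placeholders to be swapped for a real encoder, vector store and LLM.

```python
import time
from typing import Callable

def timed_rag(query: str, embed_fn: Callable, search_fn: Callable, generate_fn: Callable):
    """Run the three RAG stages in order and record each stage's wall-clock time."""
    t0 = time.perf_counter()
    query_vec = embed_fn(query)        # 1. map the query into embedding space
    t1 = time.perf_counter()
    docs = search_fn(query_vec)        # 2. search the database (needs step 1's output)
    t2 = time.perf_counter()
    answer = generate_fn(query, docs)  # 3. generate the answer (needs step 2's output)
    t3 = time.perf_counter()
    return answer, {"embed": t1 - t0, "search": t2 - t1, "generate": t3 - t2}

# Placeholder stages; in a real system these would be an encoder, a vector store and an LLM.
answer, timings = timed_rag(
    "How long is the warranty?",
    embed_fn=lambda q: q.lower().split(),
    search_fn=lambda vec: ["The warranty covers parts for two years."],
    generate_fn=lambda q, docs: f"Based on: {docs[0]}",
)
print(timings, "total:", sum(timings.values()))
```

The total is the sum of the three stages; the first two are the overhead that CAG avoids.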

Scalability

Highly scalable, since the main information store is an external database that can hold far more documents than any context window.

Live information

New information can simply be added to the database, and the retriever will fetch it whenever it is relevant to a query.
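
Updating the knowledge base in RAG only touches the new entry. In this toy in-memory index (a hypothetical stand-in for a real vector database, again using bag-of-words vectors), adding a document embeds just that document, and the next search can return it immediately.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a neural encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

class InMemoryIndex:
    """Hypothetical minimal vector store holding (text, embedding) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Only the new document is embedded; existing entries are left untouched.
        self.entries.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

index = InMemoryIndex()
index.add("Shipping takes five business days.")
print(index.search("How long is the warranty?"))  # ['Shipping takes five business days.'] (only entry)
index.add("The warranty covers parts for two years.")  # new information, no rebuild needed
print(index.search("How long is the warranty?"))  # ['The warranty covers parts for two years.']
```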

Primary worker & Accuracy

The retriever plays the main role by fetching relevant documents, so overall accuracy depends on how good the retriever is and how relevant the fetched documents are.
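
Since accuracy hinges on the retriever, a common way to measure it is recall@k: the fraction of the relevant documents that appear in the top k results, averaged over queries. The sketch below assumes labelled relevance judgments exist for a set of test queries; the function name and the document IDs are made up for illustration.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Mean over queries of |relevant docs in the top k| / |relevant docs|.

    retrieved[i] is the ranked list of document IDs returned for query i;
    relevant[i] is the (non-empty) set of IDs labelled relevant for query i.
    """
    per_query = []
    for ranked_ids, gold in zip(retrieved, relevant):
        hits = len(set(ranked_ids[:k]) & gold)
        per_query.append(hits / len(gold))
    return sum(per_query) / len(per_query)

# Hypothetical results for two test queries.
retrieved = [["d1", "d3", "d2"], ["d4", "d5", "d6"]]
relevant = [{"d2"}, {"d4", "d6"}]
print(recall_at_k(retrieved, relevant, k=2))  # 0.25
print(recall_at_k(retrieved, relevant, k=3))  # 1.0
```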
