Retrieval-Augmented Generation (RAG) is an approach where a language model consults an external knowledge source while answering, instead of relying only on what it memorized during training.
At a high level, RAG does two things:
- Retrieves relevant information (documents, notes, web pages, PDFs) from a database or index.
- Generates an answer using both the user’s question and the retrieved information.
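These two steps can be sketched in a few lines of plain Python. Everything here is illustrative: the corpus, the word-overlap scoring (a stand-in for real vector search), and the prompt template are assumptions for the sketch, not any particular library's API, and the final call to a language model is left out.

```python
# Minimal two-step RAG sketch: (1) retrieve, (2) assemble a grounded prompt.
# The scoring function and prompt template are illustrative assumptions.

def retrieve(question, corpus, k=2):
    """Rank passages by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, passages):
    """Combine the user's question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

corpus = [
    "RAG retrieves documents before generating an answer.",
    "The capital of France is Paris.",
    "Transformers use attention mechanisms.",
]
passages = retrieve("What does RAG retrieve?", corpus)
prompt = build_prompt("What does RAG retrieve?", passages)
# In a real system, `prompt` would now be sent to a language model,
# which generates the final answer from the question plus the context.
```

A production retriever would use embeddings and a vector index instead of word overlap, but the shape of the pipeline stays the same: fetch relevant text, then hand it to the model alongside the question.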
This makes answers more:
- Grounded in real data (you can point to the sources)
- Up to date (you can update the data without retraining the model)
- Customizable (you decide what knowledge the model can use)
1. Intuitive Picture of RAG
1.1 RAG as an "Open-Book Exam" for LLMs
A normal language model without retrieval is like a student taking a closed-book exam. Everything they use must already be memorized.
RAG turns this into an open-book exam:
- The model can search through notes, textbooks, or a knowledge base.
- It finds a few relevant passages.
- Then it writes an answer, quoting and summarizing those passages.
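In practice, the "searching through notes" step is usually done by comparing embedding vectors rather than raw words. The toy example below uses hand-made 3-dimensional vectors and cosine similarity; real systems use learned embeddings (hundreds of dimensions) and a vector index, but the nearest-neighbor idea is the same.

```python
# Toy "open-book search": find the note whose vector is most similar
# to the query vector. The vectors here are hand-made assumptions standing
# in for learned embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

notes = {
    "photosynthesis converts light into chemical energy": [0.9, 0.2, 0.0],
    "mitochondria produce ATP": [0.7, 0.3, 0.1],
    "the French Revolution began in 1789": [0.0, 0.1, 0.9],
}
# Pretend embedding of the query "how do plants make energy?"
query_vec = [0.8, 0.2, 0.1]

best = max(notes, key=lambda text: cosine(notes[text], query_vec))
```

The passage with the highest similarity is what gets handed to the model as context in the generation step.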