Retrieval-Augmented Generation (RAG) is an approach where a language model consults an external knowledge source while answering, instead of relying only on what it memorized during training.
At a high level, RAG does two things:
- Retrieves relevant information (documents, notes, web pages, PDFs) from a database or index.
- Generates an answer using both the user’s question and the retrieved information.
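These two steps can be sketched in a few lines of plain Python. Everything here is illustrative: the corpus, the word-overlap scoring (a stand-in for real vector search), and the prompt template are assumptions for the sketch, not any particular library's API, and the final call to a language model is left out.

```python
# Minimal two-step RAG sketch: (1) retrieve, (2) assemble a grounded prompt.
# The scoring function and prompt template are illustrative assumptions.

def retrieve(question, corpus, k=2):
    """Rank passages by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, passages):
    """Combine the user's question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

corpus = [
    "RAG retrieves documents before generating an answer.",
    "The capital of France is Paris.",
    "Transformers use attention mechanisms.",
]
passages = retrieve("What does RAG retrieve?", corpus)
prompt = build_prompt("What does RAG retrieve?", passages)
# In a real system, `prompt` would now be sent to a language model,
# which generates the final answer from the question plus the context.
```

A production retriever would use embeddings and a vector index instead of word overlap, but the shape of the pipeline stays the same: fetch relevant text, then hand it to the model alongside the question.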
This makes answers more:
- Grounded in real data (you can point to the sources)
- Up to date (you can update the data without retraining the model)
- Customizable (you decide what knowledge the model can use)
1. Intuitive Picture of RAG
1.1 RAG as an "Open-Book Exam" for LLMs
A normal language model without retrieval is like a student taking a closed-book exam. Everything they use must already be memorized.
RAG turns this into an open-book exam:
- The model can search through notes, textbooks, or a knowledge base.
- It finds a few relevant passages.
- Then it writes an answer, quoting and summarizing those passages.
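In practice, the "searching through notes" step is usually done by comparing embedding vectors rather than raw words. The toy example below uses hand-made 3-dimensional vectors and cosine similarity; real systems use learned embeddings (hundreds of dimensions) and a vector index, but the nearest-neighbor idea is the same.

```python
# Toy "open-book search": find the note whose vector is most similar
# to the query vector. The vectors here are hand-made assumptions standing
# in for learned embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

notes = {
    "photosynthesis converts light into chemical energy": [0.9, 0.2, 0.0],
    "mitochondria produce ATP": [0.7, 0.3, 0.1],
    "the French Revolution began in 1789": [0.0, 0.1, 0.9],
}
# Pretend embedding of the query "how do plants make energy?"
query_vec = [0.8, 0.2, 0.1]

best = max(notes, key=lambda text: cosine(notes[text], query_vec))
```

The passage with the highest similarity is what gets handed to the model as context in the generation step.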