Retrieval-Augmented Generation in Plain Terms
How retrieval grounds outputs and where it can still fail.
Key takeaways
- RAG ties an LLM to external knowledge for grounded answers.
- Retrieval quality determines generation quality.
- RAG reduces hallucinations but introduces new failure modes.
- Treat the knowledge base as a living dependency.
Retrieval-Augmented Generation (RAG) is a technique that connects a Large Language Model to an external knowledge source. It allows the model to generate answers that are grounded in specific, up-to-date, or private information, rather than relying solely on its static training data.
What does RAG actually solve?
RAG solves the gap between a frozen model and the current information a task requires. This page is for builders who need grounded answers from private or changing knowledge, and the important shift is treating retrieval quality as part of the product instead of an optional add-on.
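In practice, being clear about the boundaries between stages (what goes into the index, what gets retrieved, and what reaches the prompt) reduces downstream errors more than late-stage tuning.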
Act I: The fundamentals
Retriever + generator
A standard LLM’s knowledge is frozen at the end of its training. It knows nothing about events that have happened since, nor does it have access to your company’s private documents. When asked about things outside its training data, it may produce “hallucinations”: confident but factually incorrect answers.
RAG addresses this by combining two systems:
- A Retriever: A search system that can find relevant information from a specified knowledge base (like a collection of documents, a database, or a website).
- A Generator: A standard LLM that takes the retrieved information and uses it to synthesize a human-readable answer.
The retriever’s job is to find the right puzzle pieces; the generator’s job is to assemble them.
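The split between the two roles can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `keyword_retriever`, `stub_generator`, and `answer` are hypothetical names, the retriever ranks by shared words instead of embeddings, and the generator is a stand-in for a real LLM call.

```python
from typing import Callable, List

# Hypothetical type aliases for the two roles described above.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def keyword_retriever(docs: List[str]) -> Retriever:
    """Build a toy retriever that ranks documents by words shared with the query."""

    def retrieve(query: str) -> List[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(d.lower().split())), d) for d in docs]
        # Keep only documents that share at least one word, best match first.
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in matches]

    return retrieve


def stub_generator(question: str, context: List[str]) -> str:
    """Stand-in for an LLM call: it only reports the context it was given."""
    if not context:
        return "No supporting context found."
    return f"Answer to {question!r} based on: " + " | ".join(context)


def answer(question: str, retriever: Retriever, generator: Generator) -> str:
    """Compose the two roles: find the pieces, then assemble them."""
    return generator(question, retriever(question))
```

Because the two roles only meet in `answer`, either one can be swapped out (a vector search for the retriever, a hosted model for the generator) without touching the other.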
Act II: The modern paradigm
The RAG pipeline
The standard RAG pipeline works as follows:
- Indexing: An external knowledge base (e.g., PDFs, web pages, Notion docs) is broken into chunks, and each chunk is converted into a numerical embedding. These embeddings are stored in a vector database.
- Retrieval: When a user asks a question, the question is also converted into an embedding. The vector database is searched for the document chunks with the most similar embeddings. These are the “retrieved documents.”
- Augmentation: The original question and the retrieved documents are combined into a new, augmented prompt. The prompt might look something like this:
"Given the following context documents, please answer the user's question. Context: [retrieved documents]. Question: [original question]."
- Generation: This augmented prompt is sent to an LLM, which generates an answer based on the provided context.
This process puts the external data directly in front of the model, so its answer can draw on that data and not just on its internal training.
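The first three steps can be sketched end to end with the standard library alone. Treat this as a toy under stated assumptions: `embed` uses word counts where a real system would call a learned embedding model, the in-memory list stands in for a vector database, and all function names are hypothetical. The generation step is left out because it depends on whichever LLM API you use.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 8) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: List[str], size: int = 8) -> List[Tuple[str, Counter]]:
    """Indexing: chunk every document and store each chunk with its embedding."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index: List[Tuple[str, Counter]], question: str, k: int = 2) -> List[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def augment(question: str, retrieved: List[str]) -> str:
    """Augmentation: combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(retrieved)
    return (
        "Given the following context documents, please answer the user's question. "
        f"Context: {context}. Question: {question}"
    )
```

A typical call chain is `index = build_index(documents)`, then `augment(question, retrieve(index, question))`; the resulting string is what would be sent to the LLM.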
Act III: Principles in practice
Failure modes and data quality
RAG is a powerful technique, but it is not a magic bullet. Its effectiveness depends heavily on the quality of the retriever and of the data it searches. If the retriever fails to find the correct documents, the generator will not have the information it needs to produce a correct answer. This is the principle of “garbage in, garbage out.”
Common failure modes include:
- Poor data quality: The knowledge base contains inaccurate or outdated information.
- Chunking problems: Documents are split in ways that separate related ideas, making it hard to retrieve full context.
- Retrieval mismatch: The user’s question is phrased in a way that does not match the language of the documents, so the semantic search returns irrelevant chunks or misses the relevant ones.
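One common way to soften the chunking problem is to let neighboring chunks overlap, so an idea that straddles a boundary still appears whole in at least one chunk. The sketch below assumes word-based chunks; `chunk_with_overlap` and its parameters are hypothetical, and suitable sizes depend on your documents.

```python
from typing import List


def chunk_with_overlap(text: str, size: int, overlap: int) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

Overlap trades a larger index for better odds of retrieving full context; it does not fix splits that ignore document structure such as headings or tables.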
Therefore, building a RAG system is not just about connecting an LLM to a database. It is about carefully curating the knowledge base, optimizing the retrieval process, and implementing checks to handle cases where no relevant information is found.
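A check for the "no relevant information" case can be as simple as a similarity threshold applied before the prompt is built. The following is a sketch under assumptions: the retriever is taken to return (chunk, score) pairs, the 0.3 cutoff is an arbitrary placeholder that would need tuning against real queries, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

# Reply the application can use when nothing relevant was retrieved.
NO_CONTEXT_REPLY = "I could not find relevant information in the knowledge base."


def select_context(
    scored_chunks: List[Tuple[str, float]], min_score: float = 0.3, k: int = 3
) -> List[str]:
    """Keep at most k chunks whose similarity score meets the threshold, best first."""
    relevant = [pair for pair in scored_chunks if pair[1] >= min_score]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in relevant[:k]]


def build_prompt(
    question: str,
    scored_chunks: List[Tuple[str, float]],
    min_score: float = 0.3,
    k: int = 3,
) -> Optional[str]:
    """Return an augmented prompt, or None when no chunk is relevant enough."""
    context = select_context(scored_chunks, min_score, k)
    if not context:
        return None  # caller should answer with NO_CONTEXT_REPLY, not call the LLM
    joined = "\n".join(context)
    return (
        "Given the following context documents, please answer the user's question. "
        f"Context: {joined}. Question: {question}"
    )
```

Returning `None` makes the empty case explicit, so the application can decline to answer when retrieval comes back empty and the model is not left to guess.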
For related systems context, see Systems 001: Foundations and From Prompt to Production. For implementation references and evaluation patterns, use the Retrieval and Grounding Evaluation Kit.
What this changes in practice
Instead of just prompting a model, you must first ensure it has access to the right information by building a reliable retrieval system.