Probabilities, Not Truth
Why AI models sound confident even when they are wrong, and why hallucination is a feature of probabilistic systems, not a bug.
Key takeaways
- Large Language Models (LLMs) are probabilistic, not deterministic. They do not know what is true; they know what is likely.
- Hallucination is not a malfunction; it is the model doing exactly what it was trained to do (predict the next token) without a grounding in reality.
- Confidence is a style, not a metric of accuracy. A model can be 100% confident and 100% wrong.
- To build reliable systems, you must treat model outputs as proposals, not facts.
If you ask a database for the capital of France, it looks up the row labeled “France” and returns “Paris.” If the row doesn’t exist, it returns an error.
If you ask an AI model for the capital of Mars, it doesn’t look up a fact. It calculates which words are statistically most likely to follow the phrase “The capital of Mars is…” based on the science fiction and speculative texts it has read. It might confidently answer “Elon City” or “Olympus Mons.”
This is the fundamental nature of the technology: it is a probability engine, not a truth engine.
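The contrast between the two requests can be sketched in a few lines of Python. This is a toy illustration, not how a real model is implemented: the `NEXT_TOKEN_WEIGHTS` table and its numbers are invented for the example, and `generate` is a hypothetical stand-in for sampling from a model.

```python
import random

# A lookup either finds the row or fails loudly.
CAPITALS = {"France": "Paris"}

def lookup_capital(place: str) -> str:
    # Raises KeyError when the row does not exist (for example "Mars").
    return CAPITALS[place]

# Hypothetical continuation weights, invented for illustration only.
NEXT_TOKEN_WEIGHTS = {
    "The capital of Mars is": {
        "Olympus Mons": 0.5,
        "Elon City": 0.3,
        "unknown": 0.2,
    },
}

def generate(prompt: str, rng: random.Random) -> str:
    # Picks a continuation in proportion to its weight.
    # Nothing here checks whether the continuation is true.
    weights = NEXT_TOKEN_WEIGHTS[prompt]
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]
```

Calling `lookup_capital("Mars")` raises an error, while `generate("The capital of Mars is", random.Random())` always returns something fluent. The second function has no code path for "I have no such row."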
Why do models sound true when they are only probable?
Models sound true because they are trained to continue authoritative language patterns, not to verify reality. This page is for readers who need a safer mental model for using AI in products and decisions, and the practical shift is to treat outputs as proposals that must be grounded, checked, or constrained.
Act I: The fundamentals
The Confidence Trick
Why do models sound so confident even when they are wrong?
Because they are trained on human writing, and humans usually write confidently. The model is mimicking the style of an authoritative answer.
- Training Data: Wikipedia, textbooks, news articles (all authoritative styles).
- Model Behavior: Predicts the next token that fits that authoritative style.
If you ask a question, the most probable continuation is a confident answer, not a hesitant one. The model isn’t “feeling” confident; it is simply completing the pattern of a confident expert.
Hallucination Is a Feature, Not a Bug
We call it “hallucination” when the model makes things up. But to the model, there is no difference between “recalling a fact” and “inventing a fact.” Both are just predicting the next token.
- Recall: “The capital of France is [Paris].” (High probability because “Paris” almost always follows that phrase in the training text).
- Hallucination: “The first person on Mars was [John Boone].” (High probability because in the science fiction the model read, this was true).
The model is designed to be creative and fluent. The same mechanism that allows it to write a fictional story (a feature) allows it to invent a legal precedent (a bug). You cannot easily turn off one without killing the other.
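A minimal sketch of why recall and invention are the same operation, assuming a toy "model" that only counts which continuation followed which context. The three-line corpus is made up for the example, and `train` and `predict` are hypothetical helpers, far simpler than a real language model.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus that mixes reference text and fiction.
CORPUS = [
    ("The capital of France is", "Paris"),          # reference text
    ("The capital of France is", "Paris"),          # reference text
    ("The first person on Mars was", "John Boone"), # fiction
]

def train(pairs):
    # Count how often each continuation follows each context.
    counts = defaultdict(Counter)
    for context, continuation in pairs:
        counts[context][continuation] += 1
    return counts

def predict(counts, context):
    # Return the most frequent continuation and its relative frequency.
    # The stored data has no "is this true?" field to consult.
    continuation, n = counts[context].most_common(1)[0]
    total = sum(counts[context].values())
    return continuation, n / total

model = train(CORPUS)
print(predict(model, "The capital of France is"))      # ('Paris', 1.0)
print(predict(model, "The first person on Mars was"))  # ('John Boone', 1.0)
```

Both calls run the same code and come back with the same score. Whether the source was an encyclopedia or a novel is not recorded anywhere the prediction step can see.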
Act II: The modern paradigm
Accuracy vs Trustworthiness
In traditional software, we expect 100% accuracy. If a calculator says 2+2=5, it is broken.
In AI, we cannot expect 100% accuracy. Instead, we must design for trustworthiness.
- Accuracy: Getting the right answer.
- Trustworthiness: Knowing when you might be wrong.
Current models are high-accuracy but low-trustworthiness. They are often right, but they don’t reliably signal when they are wrong: a wrong answer arrives in the same confident style as a right one. This makes them dangerous if used without supervision.
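One way to make the distinction measurable is to score the two properties separately. The sketch below uses invented records of the form (stated confidence, was the answer correct). The numbers are illustrative rather than measurements of any model, and `confidence_gap` is a hypothetical helper that treats "knowing when you might be wrong" as "confidence drops on wrong answers."

```python
# Each record: (confidence the system expressed, whether the answer was right).
# Invented data for illustration only.
RECORDS = [(0.9, True), (0.9, True), (0.9, True), (0.9, False)]

def accuracy(records: list[tuple[float, bool]]) -> float:
    # Fraction of answers that were correct.
    return sum(1 for _, correct in records if correct) / len(records)

def mean_confidence(records: list[tuple[float, bool]], correct: bool) -> float:
    # Average stated confidence over the right (or the wrong) answers.
    chosen = [conf for conf, was_correct in records if was_correct == correct]
    return sum(chosen) / len(chosen)

def confidence_gap(records: list[tuple[float, bool]]) -> float:
    # Positive gap: confidence is lower on wrong answers, so it carries signal.
    # Gap near zero: confidence tells you nothing about correctness.
    return mean_confidence(records, True) - mean_confidence(records, False)
```

On this toy data accuracy is 0.75 and the gap is approximately zero. That is the "often right, no warning when wrong" profile described above. The helper assumes the records contain at least one right and one wrong answer.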
Act III: Principles in practice
The goal is not to eliminate uncertainty, but to design around it. If the system is probabilistic, then your workflows must be explicit about verification, constraints, and fallback paths.
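As a rough sketch of what "explicit about constraints and fallback paths" can look like in code, the function below accepts a model's output only if it falls inside a fixed set of allowed values, and routes everything else to review. The label set, the `needs_human_review` marker, and the `call_model` parameter are all hypothetical. `call_model` stands in for whatever client your system uses.

```python
from typing import Callable

# Hypothetical set of outputs the downstream system can handle.
ALLOWED_LABELS = {"refund", "shipping", "other"}
FALLBACK = "needs_human_review"

def classify(ticket: str, call_model: Callable[[str], str]) -> str:
    # Treat the model output as a proposal, not a result.
    proposal = call_model(ticket).strip().lower()
    # Constraint: only accept values from the allowed set.
    if proposal in ALLOWED_LABELS:
        return proposal
    # Fallback path: anything unexpected goes to a person.
    return FALLBACK
```

With a stub such as `lambda text: "Refund"`, the function returns `"refund"`. With a stub that returns free-form prose, it returns `"needs_human_review"` and the prose never reaches downstream code.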
For related systems context, see Systems 001: Foundations and From Prompt to Production. For grounding and retrieval patterns, see Retrieval-Augmented Generation in Plain Terms.
What this changes in practice
Since we cannot rely on the model to be truthful, we must change how we build systems around it:
- Grounding (RAG): Don’t ask the model to “know” things. Give it a document and ask it to “extract” things. This steers it toward the provided context instead of whatever its training data made likely, though it can still misread or go beyond that context.
- Verification: Never use an LLM for a high-stakes decision without a verification step (either a human review or a code-based check).
- Citations: Ask the model to cite its sources, then check them. If it can’t point to where it found the information in your provided text, or the passage it points to isn’t actually there, it is likely hallucinating.
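The citation check in the last bullet can be partly automated. A minimal sketch, assuming you ask the model to return a supporting quote alongside its answer: `quote_is_grounded` is a hypothetical helper that confirms the quote appears in the document you supplied, ignoring case and whitespace differences.

```python
def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so line breaks don't matter.
    return " ".join(text.lower().split())

def quote_is_grounded(quote: str, document: str) -> bool:
    # True only if the non-empty quote appears in the provided document.
    q = normalize(quote)
    return bool(q) and q in normalize(document)

document = "Refunds are issued within 14 days of a return being received."
print(quote_is_grounded("refunds are issued within 14 days", document))  # True
print(quote_is_grounded("refunds are issued within 30 days", document))  # False
```

This catches invented quotes, not every error. A real quote can still be attached to a conclusion it does not support, so high-stakes answers still need the human or code-based verification step.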
Treat the model like a brilliant but prone-to-exaggeration intern. You give them the task, but you check their work.