Why do AI models hallucinate?

Anthropic's 5-minute explainer on why LLMs make things up, what Anthropic is doing about it, and how to catch it as a user.

Short Anthropic Academy explainer that frames hallucinations as what happens when the model's drive to be helpful wins out over honesty about not knowing. The core thesis: LLMs are trained on next-token prediction, and when you ask about something obscure the model still tries to answer, so the failure mode is confident wrongness, not silence.
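
The next-token framing is easy to see in miniature. Here's a toy sketch (hypothetical vocabulary and scores, nothing from the video or from Anthropic's code): sampling always produces some token, so an honest "I don't know" only shows up if it out-scores the guesses.

```python
import numpy as np

# Toy illustration: a next-token model always emits *some* token, because
# sampling from a softmax distribution never returns "nothing".
rng = np.random.default_rng(0)

vocab = ["1912", "1923", "1931", "I don't know"]
logits = np.array([2.1, 1.9, 1.7, 0.3])  # thin training signal: the guesses score high

# Softmax turns raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

answer = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(2))), "->", answer)
# Right and wrong answers come out of the same confident-looking process;
# "I don't know" is just another token that has to beat the guesses.
```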

Key learnings

  • Hallucinations are confident, not uncertain. A wrong answer looks identical to a right one. As models get better, hallucinations get rarer, so users check less, and the ones that slip through do more damage.
  • Root cause is the next-word game. LLMs learn "what usually comes next" from internet text. Obscure topics have thin training signal, so the model guesses to stay helpful. Jordan's analogy: a friend who's proud of knowing every book and would rather fake it than admit a gap.
  • Anthropic's mitigation is explicit IDK training. During training they reward honest uncertainty and test Claude against thousands of gotcha prompts (obscure facts, niche topics, questions whose true answer is "I don't know"), tracking how often Claude hedges correctly vs. asserts falsely (a toy version of that bookkeeping is the first sketch after this list).
  • High-risk categories. Specific stats or citations, niche or very recent topics, not-widely-known people or places, exact dates/names/numbers. Default to skepticism in these cases.
  • User tactics that work. Ask for sources, then ask the model to verify that the sources actually support the claim. Tell the model upfront that it's okay if it doesn't know. Ask how confident it is. For any answer that feels off, start a new chat and ask that fresh instance to find errors (the second sketch after this list wires these into an API call).
  • Cross-reference for anything critical. Treat the model as a confident intern, not a source of truth.
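
On the IDK-training bullet: Anthropic's actual harness isn't shown in the video, so this is only a guess at what the bookkeeping could look like, with a crude keyword check standing in for whatever real hedge detection they use.

```python
from typing import Callable, List

# Toy sketch, NOT Anthropic's harness: on prompts whose true answer is
# "I don't know", any non-hedged reply counts as a false assertion.
HEDGE_MARKERS = ("i don't know", "i'm not sure", "not certain", "no reliable record")

def hedge_rate(prompts: List[str], model_answer: Callable[[str], str]) -> float:
    """Fraction of unanswerable prompts where the model hedges instead of asserting."""
    hedged = sum(
        any(marker in model_answer(p).lower() for marker in HEDGE_MARKERS)
        for p in prompts
    )
    return hedged / len(prompts)

# Hypothetical gotcha prompts whose honest answer is "I don't know".
gotchas = [
    "What did Ada Lovelace eat for breakfast on 3 March 1843?",
    "Quote the third sentence of a letter that was never published.",
]
# rate = hedge_rate(gotchas, model_answer=my_model)  # false-assertion rate on this set is 1 - rate
```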
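
And the user tactics, wired into a minimal sketch against the Anthropic Python SDK; the model name, prompt wording, and placeholder question are my own assumptions, not from the video.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # assumed model ID; use whatever model you have access to

question = "In what year did <some obscure, thinly documented event> happen?"  # placeholder

# Tactics: permission to say "I don't know", ask for sources, ask for confidence.
first = client.messages.create(
    model=MODEL,
    max_tokens=500,
    system=("It's okay to say you don't know. If you do answer, cite your sources "
            "and state how confident you are that they actually support the claim."),
    messages=[{"role": "user", "content": question}],
)
answer = first.content[0].text

# Tactic: a fresh conversation reviews the answer, instead of the original
# instance grading its own work.
review = client.messages.create(
    model=MODEL,
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": ("Another assistant gave this answer. List any claims that look "
                    "unsupported, overconfident, or likely hallucinated:\n\n" + answer),
    }],
)

print(answer)
print("--- fresh-instance review ---")
print(review.content[0].text)
```

Even with all of that, the cross-referencing for anything critical still happens outside the model.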

Why it matters for me

Every Kaltura customer conversation where AI hallucination comes up (and it comes up in every AI-adjacent deal) wants a crisp, honest framing. This video gives one: "training teaches models to be helpful, which sometimes wins over being truthful, and here's how we push back." Good to borrow the IDK framing and the "confident intern" stance when explaining trust boundaries in our own AI features.
