What Everyone Gets Wrong About How LLMs Actually Work

I spent five years writing production software. In that time I developed a reliable heuristic: the people with the strongest opinions about a technology are usually the ones who have not read the source code, the paper, or the documentation. This applies nowhere more aggressively than to large language models.

Here is the single most common misunderstanding, stated plainly: LLMs do not retrieve information. They generate text. This distinction sounds pedantic until you trace out how much follows from it.

What an LLM actually does

A large language model is trained on text. During training, it learns statistical relationships between tokens — roughly, pieces of words. At inference time, when you ask it a question, it does not search a database. It predicts the most statistically likely continuation of the input sequence, given everything it learned during training.
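The mechanism can be sketched with a deliberately tiny stand-in. Real models learn a neural network over subword tokens, not word-bigram counts, but the shape of the process is the same: count (or learn) which tokens tend to follow which, then at inference time emit the statistically likely continuation. Everything here is a hypothetical toy, not any production model's code.

```python
from collections import Counter, defaultdict

# "Training" data: a tiny corpus, split into word-level tokens.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased a mouse ."
).split()

# "Training": record how often each token follows each other token.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    # "Inference": no database lookup, no search — just the most
    # common continuation observed during training.
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" ("cat" follows "the" most often)
print(predict_next("sat"))  # -> "on"
```

The prediction is purely statistical: ask about "the" and you get "cat" not because the model knows anything about cats, but because that pairing dominated the counts.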

This is why LLMs “hallucinate.” They are not lying. They are not retrieving a wrong fact from a database. They are generating a plausible-sounding continuation. If the training data contains patterns where the question you asked is typically followed by a certain type of answer, the model will generate that type of answer regardless of whether the specific content is accurate. It is doing exactly what it was designed to do. The design has limitations.
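The same toy setup shows the hallucination mechanism directly. Below, a bigram counter (hypothetical, for illustration only) is trained exclusively on true sentences, then prompted with a question-shaped input it has never seen. Because "is" was always followed by a capital city during training, the model confidently emits one — fluent in form, wrong in fact.

```python
from collections import Counter, defaultdict

# Train on true sentences only.
training = (
    "the capital of france is paris . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
).split()

following = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    following[prev][nxt] += 1

def continue_prompt(prompt_words, n=1):
    # Greedily extend the prompt with the most likely next token(s).
    out = list(prompt_words)
    for _ in range(n):
        out.append(following[out[-1]].most_common(1)[0][0])
    return out

# Australia never appears in training, but the pattern "is <city>" did,
# so the model completes the pattern anyway.
print(continue_prompt("the capital of australia is".split()))
# -> ['the', 'capital', 'of', 'australia', 'is', 'paris']
```

Nothing malfunctioned here. The model generated the most plausible continuation of the pattern it learned — which is exactly the failure mode described above, scaled down.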

Why this matters for how you use them

If you treat an LLM as a search engine with better prose, you will consistently over-trust its factual outputs and under-utilise its genuine strengths. The genuine strengths are: generating structured text, reasoning through problems step by step when prompted correctly, transforming content from one form to another, summarising, drafting.

The weaknesses are: factual accuracy on specific claims (especially recent events, precise numbers, obscure topics), reliability on tasks requiring genuine logical consistency across long contexts, and anything where being confidently wrong is worse than admitting uncertainty.

The abstraction is the loan

Every abstraction you use without understanding costs you debugging time when it breaks. LLMs are now deeply embedded in production systems, in developer tooling, in writing workflows. Most of the people using them have a model of how they work that is wrong in precisely the ways that cause the most expensive failures.

Read the paper. Not the press release. The paper is Attention Is All You Need (Vaswani et al., 2017). It is not easy reading, but it describes what the technology actually is. The demo always works. The mental model only works if you built it on the right foundation.
