Is AI Smarter Than We Think, or Just Luckier?

When AI suddenly solves a complex physics problem, is it reasoning or pattern matching? The grokking phenomenon suggests the answer is stranger than either.

The Feynman Question

Richard Feynman drew a sharp line between knowing the name of something and understanding it. You can memorize that a bird is called a "thrush" in English and a "Drossel" in German and still know absolutely nothing about the bird itself.

Real understanding is different from fluent labeling.

This distinction haunts every conversation about AI capability. When a large language model solves a complex physics problem, writes elegant code, or produces a nuanced legal analysis — is it understanding, or is it the world's most sophisticated pattern matching? Is the model a Feynman, or is it just very good at naming birds in every language?

The Grokking Phenomenon

Recent research on a phenomenon called "grokking" suggests the answer is stranger than either camp expected.

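In typical machine learning, a model trains on data and improves gradually, with performance on training data and on held-out data rising more or less together. In grokking, something different happens. The model memorizes the training data early on while its performance on held-out examples stays poor, often for a long stretch. Then, long after it seemed to have stopped learning, held-out performance climbs sharply and the model generalizes. The effect was first reported in small neural networks trained on algorithmic tasks such as modular arithmetic.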

It goes from "reciting answers it has seen" to "understanding the underlying pattern" in what looks like a phase transition.
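One way to make this delayed jump concrete is to measure the gap between the step at which a model fits its training data and the step at which it generalizes. The sketch below is a minimal illustration, not a standard metric: the function names, the 0.99 threshold, and the accuracy curves are all assumptions invented for the example, and the numbers are not taken from any real training run.

```python
from typing import Optional, Sequence


def first_step_reaching(
    steps: Sequence[int], accuracy: Sequence[float], threshold: float
) -> Optional[int]:
    """Return the first logged step whose accuracy is at or above threshold."""
    for step, acc in zip(steps, accuracy):
        if acc >= threshold:
            return step
    return None  # the curve never reached the threshold


def grokking_delay(
    steps: Sequence[int],
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # arbitrary cutoff chosen for this sketch
) -> Optional[int]:
    """Steps between fitting the training set and generalizing to held-out data.

    Returns None if either curve never reaches the threshold.
    """
    fit_step = first_step_reaching(steps, train_acc, threshold)
    gen_step = first_step_reaching(steps, val_acc, threshold)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Made-up curves for illustration only, not measurements from a real run.
steps = [100, 1_000, 10_000, 100_000, 1_000_000]
train_acc = [0.40, 1.00, 1.00, 1.00, 1.00]  # memorized by step 1,000
val_acc = [0.01, 0.02, 0.03, 0.35, 1.00]    # generalizes only at step 1,000,000

print(grokking_delay(steps, train_acc, val_acc))
```

A large delay is the signature described above: the training curve looks finished long before the held-out curve moves. If the held-out curve never crosses the threshold, the function returns None, which is what a model that only memorized would look like in this picture.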

The analogy to human learning is striking. Students often memorize formulas before understanding them. Then, sometimes after a period of apparent stagnation, the underlying structure clicks. The formulas stop being arbitrary sequences and start being expressions of a deeper logic.

What Grokking Means for AI Reasoning

If grokking is real generalization — and the evidence is growing that it is — then the "AI is just pattern matching" dismissal is too simple. Pattern matching does not explain why a model would develop new capabilities long after memorizing its training data.

Something more appears to be happening during that extended training period: the model seems to reorganize its internal representations, moving from surface-level correlations toward the underlying structure of the problem.

But the "AI truly understands" claim is also too strong. Grokking is fragile. It happens for some problems and not others. It depends on training dynamics in ways that are not yet well understood. And nothing in the model's outputs signals whether it is in a "memorized" state or a "grokked" state: it can answer with the same apparent confidence in both regimes.

The Effectiveness Lens

For practitioners, the question "does AI really understand?" may be less useful than "under what conditions does AI reason reliably?" Grokking research suggests three things:

Training duration matters more than we thought. Models may appear converged while still developing deeper capabilities. Stopping training at apparent convergence might leave genuine understanding on the table.

Evaluation is harder than we thought. A model that performs well on benchmarks might be memorizing rather than generalizing. The difference shows up only on problems the model has not already seen: held-out examples, and especially problems from outside the training distribution.

Luck plays a role. The line between a model that groks and one that merely memorizes may come down to training dynamics — learning rate, data ordering, random initialization. The same architecture can end up in fundamentally different capability regimes.
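The evaluation point can be illustrated with a toy sketch. Everything here is a hypothetical stand-in chosen for the example: the task (adding two numbers modulo 97), the names Memorizer, RuleLearner, make_split, and accuracy, and the 50/50 split. Neither "model" is a trained network. The RuleLearner has the rule written in by hand, so the sketch shows only why training-set scores cannot separate the two behaviors, not how a real model comes to learn the rule.

```python
import random

P = 97  # modulus for the toy task: predict (a + b) % P


def make_split(seed: int = 0, train_fraction: float = 0.5):
    """Shuffle all (a, b) pairs and split them into training and held-out sets."""
    pairs = [(a, b) for a in range(P) for b in range(P)]
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


class Memorizer:
    """Lookup table: recalls answers seen in training, guesses 0 otherwise."""

    def __init__(self, train_pairs):
        self.table = {(a, b): (a + b) % P for a, b in train_pairs}

    def predict(self, a: int, b: int) -> int:
        return self.table.get((a, b), 0)


class RuleLearner:
    """Hand-written stand-in for a model that has captured the underlying rule."""

    def predict(self, a: int, b: int) -> int:
        return (a + b) % P


def accuracy(model, pairs) -> float:
    """Fraction of pairs for which the model predicts the correct sum mod P."""
    correct = sum(model.predict(a, b) == (a + b) % P for a, b in pairs)
    return correct / len(pairs)


train_pairs, held_out_pairs = make_split()
memorizer = Memorizer(train_pairs)
rule_learner = RuleLearner()

for name, model in [("memorizer", memorizer), ("rule learner", rule_learner)]:
    train_score = accuracy(model, train_pairs)
    held_out_score = accuracy(model, held_out_pairs)
    print(f"{name}: train={train_score:.3f} held-out={held_out_score:.3f}")
```

Both stand-ins score perfectly on the pairs used for training. Only the held-out pairs tell them apart, where the memorizer is right only when its default guess of 0 happens to match the answer.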

Perhaps AI is neither smarter than we think nor just luckier. Perhaps it is something we do not yet have the right word for — a form of intelligence that is real but alien, capable but brittle, understanding but in a way that looks nothing like human understanding.

The honest answer to the Feynman question: we do not know yet. And that uncertainty itself is important information.