Two Thousand Pages, One Moving Median: Keeping Data-Driven Prose True

A Sentence That Was True When I Wrote It

I published a sentence with a number in it: the median negotiated price for a common outpatient procedure in Colorado. It was correct. I had checked it against the file. The sentence was true.

It is not true anymore. Nobody edited it. The data moved underneath it.

What happened in between: we deepened our coverage of the state, from 59 hospitals to 70, pulling in rural facilities scattered across Colorado and more of the Denver metro than we had before. None of that was a mistake. It was the work going right. But a median is a statement about a population, and we had just changed the population. The middle moved. The spread around it moved with it. And a sentence sitting on a page — "the median is X" — became a sentence about a world that no longer existed.

This is the part of building on data that nobody puts on the slide.

The Maintenance Cost of the Messy-Data Advantage

The pitch for what I have called the Messy-Data Advantage is correct. Your unglamorous, badly-formatted, multi-gigabyte data is a moat, because making it legible is genuinely hard and most people will not do the work. Hospital price files are a clean example. Every hospital in the country is legally required to publish its negotiated rates; the files are public; and they are almost completely unusable — two to four gigabytes each, in three incompatible formats, in schemas that drift every regulatory revision. Turn that into a number a patient can read and you have built something a competitor cannot trivially copy.

All true. Here is what the pitch leaves out.

The moment you win — the moment you turn raw data into a sentence a human can read — you take on an obligation you did not have before. You have promised that the sentence is true. And data does not hold still. It refreshes, it grows, it gets corrected. Every sentence you wrote on top of it is a claim a future version of the data can falsify without telling you.

We now have roughly two thousand pages of analytical prose sitting on top of about 495.9 million records. A few months ago it was 480 pages. Each page exists to make some slice of the data legible — this procedure, this metro, this comparison — and each one makes claims in prose: a median, a spread, a ranking, a "roughly seven times more expensive." Every claim is a hostage to the next ingest.

That is the maintenance bill on the moat. Not a one-time cost you pay to make the data legible — a recurring cost you pay to keep it legible, due every time the data changes. For a national, recurring pipeline, that is constantly.

Legibility Is Not a State You Reach

I used to treat the Information layer of the IRDA framework — Information, Recommendation, Decision, Action — as a job with a finish line. Make the data legible. Get it from "two-gigabyte file that crashes a browser" to "number a human can read." Then move up the stack to recommendations and decisions, because Information is handled.

The Information layer is not "make it legible." It is "keep it legible and true" — two different commitments, and the second has no finish line.

"Make it readable" is a transformation: it happens once per piece of data and then it is done. "Keep it true" is a discipline: it has to happen every time the underlying data moves, across everything you have already published. The first scales with how much data you ingest. The second scales with how much you have said about the data — which, if you are doing the legibility work well, grows faster than the data itself, because one dataset spawns many pages of prose.

The trap is that the two feel identical when you are small. With five pages of analysis, "keep it true" is just "re-read the five pages when the data changes." Care covers it. So you never build the muscle, and you carry the assumption — legibility is a state you reach — straight into the regime where it is false. By the time you have two thousand pages, the assumption has become a liability, and you find out the way I found out: a true sentence became a false one and nobody noticed, because nobody re-reads two thousand pages against a moving dataset.

Why Care Does Not Scale

The instinct, when you discover a published number has gone stale, is to resolve to be more careful. Proofread before each refresh. Spot-check the medians. Have someone review the changed pages.

The instinct fails twice.

It fails on volume, because two thousand pages times every refresh is not a proofreading task a human can do. Even sampling, you check a few dozen claims out of thousands, and the one that went stale is the one your sample missed. Care has a throughput, and the throughput is far below what the page count and the refresh cadence demand.

It fails worse on silence. A stale number does not announce itself. It looks exactly like a fresh one — same font, same confidence, same place on the page. The reader cannot tell the difference, and neither can you on a casual pass. A failure that is both high-volume and silent gives you no signal. Everything looks fine right up until a reader checks a number against the source and finds you were confidently incorrect.

So "be more careful" is not a smaller version of the right answer. It is the wrong category of answer. The right answer is not a human disposition. It is machinery.

A Guard That Refuses to Lie

What we built sits in the publishing pipeline and does one job: it re-derives every cited number directly from the live data on every build, and refuses to publish when the prose and the data disagree.

A claim in the prose is not free text that happens to contain a number; it is tied to the query that produced it — this procedure, this metro, this statistic. At build time, the guard re-runs that query against the current data and compares the result to what the page says. If the page says the median is one thing and the freshly-derived median is another, the build goes red. The page does not ship. I find out before a reader does.

Two design choices generalize past my particular problem.

First, the guard re-derives rather than remembers. It would have been easier to store "the median was X when we wrote this" and check the stored value. That checks nothing — it confirms the prose matches a frozen snapshot, which is precisely the stale state I am trying to detect. You have to compute the current truth fresh, every build, from the same data the product serves. A check against a remembered value is a check against the past. I want a check against the present.

Second, the guard ships with its own known-bad test. This is a discipline I carry over from the data-validation side of the same system: a check you have never watched fail is not a check. So the prose guard has a deliberately-wrong fixture — a page that cites a number the data does not support — and the guard must flag it. If that fixture ever passes, the guard is broken, and every green build it ever produced is worthless. A guard you have not seen bite is a guard-shaped object. You have to prove it catches the thing it exists to catch.

The result is a small inversion in who has to be vigilant. Before, the writing was trusted and the burden was on me to notice when it lapsed. Now the writing is suspect by default, and the burden is on the data to vouch for it on every build. Write a number the data will not back, and you do not get to publish it. The system embarrasses me in private, on a schedule, whether I am paying attention or not — which is exactly where you want to be embarrassed.

The Uncomfortable Generalization

Here is the part that travels past hospital prices.

Most data-driven writing is trusted because nobody re-checks it, not because it is still true. The dashboard screenshot in the quarterly deck, the "our data shows" sentence in the strategy memo, the benchmark in the pitch — they are believed because they were true once, when someone responsible looked, and because re-looking is expensive and nobody has the appetite. The trust is real. The basis for it has usually expired.

This is a governance problem before it is a writing problem. An organization that decides on prose claims about data almost never has a mechanism that ties those claims back to the live data and fails loudly when they drift. The claim and the data are coupled at the moment of authorship and never again. They are free to wander apart, and they do, and the wandering stays invisible until someone external runs the check you stopped running.

The fix is not a culture of carefulness. Carefulness does not scale and it fails silently, for exactly the reasons it failed me. The fix is to make the coupling permanent and automatic: treat every number in your prose as a live query against the source, re-evaluated on a schedule, with the authority to flag itself when it goes stale. That is more work to build than a proofreading habit. It is the only version still working at two thousand pages.

What I Do Not Yet Know

The guard catches a specific, tractable failure: a cited number that no longer matches a re-derivable query. That is the easy half, and I want to be honest that it is the easy half.

The hard half is the prose that carries no number but is still a claim about the data — "the gap is driven by contracting structure, not negotiating power," or "rural patients are the ones the metro view leaves out." Those sentences go just as stale as a median when the data shifts, and no query re-derives them. They are interpretations, and what falsifies an interpretation is not a mismatched value; it is a changed shape in the distribution that the words no longer describe. I do not have a guard for that. I am not sure one is possible, or whether the honest version is a periodic human re-reading of the interpretive claims specifically — a far smaller set than the full two thousand pages — triggered when the guard detects that the numbers under them have moved enough to matter.

So the question I would genuinely like to compare notes on: if you run a publication, a research function, or a product that puts claims about live data in front of people, what is your mechanism for the sentences that are about the data but contain no number to check? I have a partial answer for the numbers. For the interpretations, I mostly have the uncomfortable awareness that they are out there, aging, and that I have not yet built the thing that tells me when they have aged out.

→ See the companion piece on the validation side of the same system: When You're More Correct Than the Source of Truth.

→ See the full framework: The Decision Effectiveness Framework.

Two Thousand Pages, One Moving Median

A Sentence That Was True When I Wrote It

The Maintenance Cost of the Messy-Data Advantage

Legibility Is Not a State You Reach

Why Care Does Not Scale

A Guard That Refuses to Lie

The Uncomfortable Generalization

What I Do Not Yet Know

Related

When You're More Correct Than the Source of Truth

The Berkshire Test for AI: A Compounding Diagnostic for Leaders Allocating Capital to AI

The Tower of Babel in Your Boardroom