Mayo Clinic's Outcome-Labeled Corpus: AI-ECG and the Compounding Test

The Tracing the Cardiologist Could Not Yet Read

A 64-year-old man walks into a Mayo Clinic outpatient clinic in 2018 for a routine pre-operative workup before knee surgery. The standard 10-second 12-lead ECG comes back unremarkable to the reading cardiologist. Sinus rhythm. Normal axis. No ST changes. No measurable abnormality the textbook trains a physician to flag. The patient has no history of palpitations or syncope that would steer the workup toward a cardiac source. The pre-op clearance moves forward.

A deep-learning model developed on roughly 650,000 standard ECGs from prior Mayo patients reads the same tracing and returns a probability score that the patient has, or will soon develop, paroxysmal atrial fibrillation (a rhythm disorder that comes and goes and is hard to catch on a single visit). The score is well above the threshold that Attia and colleagues calibrated in their 2019 Lancet study. Six months later the patient presents to a community ER with a transient ischemic attack, a mini-stroke. The atrial fibrillation that had been intermittent and clinically silent at the pre-op visit is now visible on telemetry. The stroke risk had been measurable from the sinus-rhythm tracing taken half a year earlier; what was missing was the reader who could see it.

That scenario, in its broad shape, is what Mayo Clinic's AI-ECG program was built to address. The program is a clean case of AI applied at the right bottleneck in a non-commercial setting. Mayo is a non-profit; its incentives are patient outcome and clinical reputation, not earnings per share. The case belongs in this series for a specific reason: the compounding mechanism is structural, not financial, and Mayo demonstrates it. A non-profit can build an AI moat that for-profit hospital systems and pure-tech entrants cannot replicate, because the moat is shaped by the work the institution was already doing, not by the capital deployed in the AI project.

Where Diagnostic Latency Lives

The bottleneck in the clinical management of hidden cardiac conditions is diagnostic latency. By the time a patient develops symptoms a cardiologist will recognize, the underlying pathology has typically been advancing for months or years. Atrial fibrillation is the canonical case. A substantial share of strokes attributable to AF occur in patients with no prior diagnosis at the time of the stroke, because the rhythm was paroxysmal and undocumented on any prior ECG. Cardiac amyloidosis, the protein-misfolding disease that infiltrates the cardiac muscle, presents an even sharper version of the same problem: average disease duration before recognition has been measured in years across multiple cohorts.

Throughput in clinical cardiology is not limited by imaging capacity, by cardiologist availability, or by drug-treatment efficacy at the point of diagnosis. It is limited by the time window between when a disease first becomes detectable in routinely collected data and when a human reader has enough clinical signal to act on it. Every other step in the workflow can be optimized at the margins. Patient outcome on a hidden cardiac condition is still gated by the latency at the diagnostic step.

Mayo picked this bottleneck because it is where patient outcome is decided for a class of cardiac disease the cardiovascular practice sees thousands of times a year. The AI investment landed where the operational pain was already concentrated.

How AI-ECG Works

The AI lever is a convolutional neural network applied to the standard 10-second 12-lead ECG waveform. The architecture is unremarkable by 2026 standards. What is remarkable is the data the model was trained on: decades of Mayo ECG tracings, each paired with the diagnoses, procedures, and clinical outcomes that arrived for that same patient in the months and years after the tracing was recorded. The Mayo Cardiovascular Research Center ECG Core Laboratory holds a digital ECG archive that runs back to the 1990s. The labels are not the cardiologist's read of the tracing at the time. The labels are what eventually happened to the patient.

The 2019 atrial-fibrillation paper made the architecture concrete. Attia and colleagues, in The Lancet, developed a deep-learning model using 649,931 normal sinus-rhythm ECGs from 180,922 patients, split into training, validation, and test sets. The question they asked: could the model identify patients with atrial fibrillation documented elsewhere in their record, even from an ECG that itself showed only normal sinus rhythm? The reported area under the receiver operating characteristic curve (AUC, a standard discrimination measure for a binary classifier, where 1.0 is perfect and 0.5 is a coin flip) on the held-out test set was 0.87, with sensitivity of 79.0% and specificity of 79.5% at the chosen threshold. That is a screening performance no human cardiologist achieves on a sinus-rhythm tracing alone. The signal the model reads is not the rhythm in the recording window. The likely explanation is that it reads the structural and electrical fingerprint of an atrium that has remodeled under episodic fibrillation between visits.
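For readers who want the three reported metrics made concrete, here is a minimal sketch of how AUC, sensitivity, and specificity are computed from model scores and outcome labels. The scores, labels, and the 0.5 threshold below are invented toy values for illustration only; they are not data from the Lancet study, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win. This pairwise form is slow on large data
    but makes the meaning of the number explicit.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values (assumed): 1 = patient later shown to have the condition.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.7, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]

if __name__ == "__main__":
    print("AUC:", auc(toy_scores, toy_labels))
    print("sens, spec:", sensitivity_specificity(toy_scores, toy_labels, 0.5))
```

AUC summarizes ranking quality across every possible threshold, while sensitivity and specificity describe one operating point. That is why the paper reports both: the threshold is a clinical choice layered on top of the model.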

The 2021 cardiac-amyloidosis paper, by Grogan and colleagues in Mayo Clinic Proceedings, extends the same architecture to a disease where the diagnostic latency is even more punishing. The paper reports that the AI-ECG model identified patients with confirmed cardiac amyloidosis from standard ECG tracings with discrimination performance well above clinical baselines. Critically, it was able to flag the disease in tracings recorded years before the eventual amyloidosis diagnosis was entered in the medical record. The model is reading a condition the cardiologist, looking at the same tracing on the same day, could not yet see. The label that trained the model came from the patient's eventual diagnosis. The prediction the model now makes runs the same loop in the opposite temporal direction.

The clinical decision boundary stays with the cardiologist. The model produces a probability score; the physician interprets it in the context of the patient's other findings and decides whether to order confirmatory imaging, a rhythm monitor, or a genetic workup. The lever moves the diagnostic-latency curve to the left, surfacing candidates the workflow would otherwise have missed.
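The division of labor described above can be sketched in a few lines. This is an illustrative assumption about how such a boundary might be encoded, not a description of Mayo's deployment: the `triage` function, the `Triage` record, and the threshold value are all hypothetical. The point of the sketch is what the code does not do: it never orders a test, it only surfaces a candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triage:
    score: float   # model probability, between 0 and 1
    flagged: bool  # True means "show this tracing to a cardiologist"
    note: str      # human-readable routing hint, never an order


def triage(score: float, threshold: float) -> Triage:
    """Turn a model score into a review flag; the clinical decision stays human."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    flagged = score >= threshold
    note = (
        "surface to cardiologist for review in clinical context"
        if flagged
        else "no AI-ECG flag; continue routine care"
    )
    return Triage(score=score, flagged=flagged, note=note)


if __name__ == "__main__":
    # 0.5 is an assumed threshold for illustration, not a published value.
    print(triage(0.62, threshold=0.5))
    print(triage(0.10, threshold=0.5))
```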

The Four Conditions

Condition	Score (0–4)	Evidence
Proprietary data origin	4	Mayo's multi-decade ECG archive, linked at the patient level to longitudinal clinical outcomes only Mayo observes, is structurally inaccessible to any institution without an equivalent integrated record system and an equivalent twenty-year horizon of careful data discipline.
Self-labeling workflow	4	Every patient diagnosis that eventually arrives in the medical record labels the prior ECG tracings for that patient, so the work of clinical care itself produces the training labels for the next model cycle without a separate labeling budget.
Decreasing marginal cost	3	The model-development infrastructure, the labeling pipeline, and the clinical-deployment surface are now amortized across multiple disease targets, so each additional condition (after atrial fibrillation came amyloidosis, ejection-fraction estimation, hypertrophic cardiomyopathy screening) is cheaper to train than the last; clinical-validation cadence prevents a true commodity-cost curve.
Defensible asymmetry	4	The asymmetry lives in the longitudinal label structure plus the embedded clinical context that interprets the score, neither of which a competitor can acquire by buying the same model from a vendor; replication requires both the corpus and the institutional reading practice that gives the score its action.

Condition 1 — Proprietary data origin. The ECG itself is a commodity signal collected on standard machines available in every cardiology practice in the world. What is not commodity is the linkage. The ability to attach a specific patient's 1998 sinus-rhythm tracing to that same patient's 2003 atrial-fibrillation diagnosis, and 2007 stroke, and 2014 heart-failure admission, inside one integrated medical record, under one governance regime, across millions of patients. James Currier's analysis of data network effects names the condition directly: the data has to be generated where the model is being trained. Mayo's ECG archive satisfies the test by construction. A competitor with a state-of-the-art neural network but no equivalent longitudinal corpus cannot reproduce the result by purchasing a dataset. The dataset does not exist outside Mayo's record system.

Condition 2 — Self-labeling workflow. The label is the diagnosis the patient eventually receives. Every clinic visit, every imaging study, every procedure that confirms or excludes a cardiac diagnosis updates the label set for that patient's prior ECG tracings, retrospectively. This is the AI factory pattern Iansiti and Lakhani named in Competing in the Age of AI, translated into clinical practice. The work of patient care produces the labels for the next model cycle without a separate annotation team. The contrast case is the dermatology or radiology model that requires hand-curated labels from a small panel of expert readers. There the labeling pipeline becomes the binding cost, and the model improves only as fast as the panel can be funded. Mayo's pipeline does not have that cost structure.
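The self-labeling loop can be sketched as a small retrospective join: each stored tracing takes its label from whatever diagnosis later lands in the same patient's record. This is a simplified illustration under stated assumptions, not the cohort definition used in the published studies. The record types, the `label_ecgs` function, the condition string, and the example dates are all hypothetical.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Tuple


class Ecg(NamedTuple):
    patient_id: str
    recorded: date


class Diagnosis(NamedTuple):
    patient_id: str
    condition: str
    diagnosed: date


def label_ecgs(
    ecgs: List[Ecg], diagnoses: List[Diagnosis], condition: str
) -> List[Tuple[Ecg, int, Optional[int]]]:
    """Label each tracing from the patient's later record.

    Returns (ecg, label, lead_days): label 1 if the patient's first diagnosis
    of `condition` came after the tracing, label 0 if no such diagnosis exists
    yet. Tracings recorded on or after the diagnosis are skipped, because they
    are not "prior" tracings.
    """
    first_dx: Dict[str, date] = {}
    for dx in diagnoses:
        if dx.condition != condition:
            continue
        known = first_dx.get(dx.patient_id)
        if known is None or dx.diagnosed < known:
            first_dx[dx.patient_id] = dx.diagnosed

    labeled: List[Tuple[Ecg, int, Optional[int]]] = []
    for ecg in ecgs:
        dx_date = first_dx.get(ecg.patient_id)
        if dx_date is None:
            labeled.append((ecg, 0, None))
        elif dx_date > ecg.recorded:
            labeled.append((ecg, 1, (dx_date - ecg.recorded).days))
    return labeled


if __name__ == "__main__":
    archive = [Ecg("p1", date(1998, 5, 1)), Ecg("p2", date(2001, 3, 1))]
    record: List[Diagnosis] = []
    print(label_ecgs(archive, record, "atrial_fibrillation"))  # both unlabeled-positive so far

    # Routine care adds a diagnosis; rerunning the join relabels the old tracing.
    record.append(Diagnosis("p1", "atrial_fibrillation", date(2003, 5, 1)))
    print(label_ecgs(archive, record, "atrial_fibrillation"))
```

The second call is the whole argument in miniature: nobody annotated the 1998 tracing, yet its label changed because clinical care continued. A production pipeline would also need censoring rules and follow-up windows, which this sketch omits.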

Condition 3 — Decreasing marginal cost per cycle. The first AI-ECG model required a heavy lift: building the data pipeline, the labeling discipline, the clinical-validation protocol, and the integration into the cardiology read workflow. The marginal cost of the next models (amyloidosis, left-ventricular ejection-fraction estimation, hypertrophic-cardiomyopathy screening) is materially lower because the infrastructure already exists. Clinical validation does set a floor: every new indication requires its own validation study, peer-reviewed publication, and in many cases its own regulatory clearance. The Brynjolfsson, Rock, and Syverson J-curve framing is the correct one. Mayo paid the early-period cost; the compounding now sits on the rising side of the curve.

Condition 4 — Defensible asymmetry. The asymmetry is the hardest condition to argue precisely, and it is the one where Mayo's case is most distinctive. A competitor can train a convolutional network on ECG waveforms. A vendor can sell an FDA-cleared AI-ECG product into other health systems, and several already do. What neither competitor nor vendor can replicate is the combination of three things. The longitudinal outcome-labeled corpus. The clinical reading practice that interprets the model's score in the context of the full patient record. And the institutional-trust position that lets a Mayo cardiologist act on a score the model returned, with appropriate downstream workup, while other systems are still negotiating the workflow with their risk committees. The asymmetry is in the operational substrate the model is embedded in. That substrate is what Carlota Perez would call deployment-phase advantage: the embedded knowledge of the incumbent, retrofitted with the new technology, beats the pure-tech entrant whose only asset is the model.

A clean reading of the four conditions: 4, 4, 3, 4. The third condition is the one that drags, because clinical validation is genuinely slow, and properly so. The model is making decisions about human patients. A rapid-iteration culture that bypassed validation would forfeit the institutional-trust position that makes Condition 4 hold.

The Easier Wrong Choice

The wrong place. Imagine a Mayo Clinic that, in the late 2010s, had read the same market signals about generative AI in healthcare and routed its flagship AI bet through a different door. The candidate is not a strawman. It is an Epic-integrated ambient AI scribe of the Nuance DAX type, deployed across the clinical practice. The product would transcribe and structure physician-patient encounters in real time, reducing the documentation burden every health system in the country complained about.

Why it would have looked attractive. The reasons were defensible. Physician burnout from documentation work was a measurable problem with a measurable cost. Early DAX pilots at peer institutions were reporting twenty-percent reductions in documentation time, qualitative improvements in physician satisfaction, and a clear path to a board-presentable ROI story inside one fiscal year. The vendor was Microsoft. The integration was native to Epic, the EHR Mayo already runs. The procurement story was clean.

The failure mechanics. The failure mechanics are the ones Luke Sernau named in We Have No Moat, And Neither Does OpenAI. A capability available to every Epic customer is not a Mayo capability. Within twelve months of broad rollout, the same DAX integration is running at Cleveland Clinic, Johns Hopkins, Geisinger, and several hundred mid-market Epic shops. Documentation time falls by roughly the same twenty percent at all of them. The competitive position is unchanged. The vendor captures the margin the cost savings produce. None of the productivity gain feeds a labeled-outcome loop Mayo uniquely owns. The labels generated by the scribe are documentation tokens, not diagnostic ground truth. The corpus that grows is the corpus of physician dictations, which the vendor sees across every customer.

The time to failure. Twelve months from launch to commodity. The diagnostic-latency curve, the bottleneck Mayo's AI investment was supposed to act on, is unchanged. The multi-decade ECG-outcome corpus continues to sit unused, because the institutional AI attention has been spent on the scribe.

The early-warning signal. A careful observer could have seen the failure in advance by asking where the lever acts. Ambient scribing acts on the documentation step, a real cost center but not the bottleneck on patient outcome. The scribe's outputs do not loop back as labels for the diagnostic decisions the same physicians are making. A capability that runs identically at every customer in the segment is a productivity feature, not a moat.

What Mayo Teaches

The general lesson is that the moat lives in the labeling structure of the work, not in the algorithm and not in the brand. Mayo's institutional brand is significant; the brand alone would not have produced the AI-ECG results. What produced them was the discipline of maintaining a clinical record system in which every diagnostic conclusion eventually labeled the prior diagnostic data for the same patient, across decades, under one governance regime. The model exploited a labeling regime that already existed. The labeling regime was the asset.

The non-profit point is worth stating directly. Mayo is a non-profit. The AI program was built to reduce diagnostic latency, not to maximize shareholder returns. The compounding mechanism worked anyway, because the four conditions are operational, not financial. A for-profit hospital system without the same longitudinal data discipline would have the wrong substrate even with a larger AI budget. A non-profit with the discipline has the right substrate. The Berkshire Test does not care about the tax status of the operator. It cares about whether the work generates its own labels and whether those labels are structurally inaccessible to the competition.

What You Can Do

If you run an organization where the bottleneck is fast diagnosis of an event you already observe repeatedly, the Mayo pattern is the one to study. Ask whether your operation has been quietly producing the labels for an AI model you have not yet built, and whether those labels are linked to the right input data at the patient, customer, or transaction level. Many institutions, in my experience, have the substrate and have not noticed.

If you are evaluating an AI vendor pitch where the proposed lever is a product the vendor will sell to every comparable institution in your segment, run Sernau's test. The capability will be available to every competitor within the budget cycle. The investment may still be worth making for the cost-savings story; it will not produce competitive distinctiveness, and labeling the deployment as a moat is a category error.

One closing observation from practice: the institutions that have built compounding AI loops in the past decade are almost all institutions that were already disciplined about their data before AI made the discipline pay. The discipline produced the substrate. The model harvested it. A leader looking at this pattern in 2026 has a choice that has nothing to do with picking the right model architecture and everything to do with whether the operation, today, is generating the labeled outcomes a future model would need to compound on.

Back to the framework: The Berkshire Test for AI.

Continue the series: Progressive's Risk-Selection Flywheel — the textbook case where all four conditions hold strongly. Deere's Physical-World Data Loop — physical-world labels at the row-crop spraying step. Mastercard's Network of Labeled Outcomes — chargebacks as the label substrate, with a candid asterisk on the Visa peer.
