The Makegoods QA framework that protected $20M in client billings
A makegood is what you owe a client when delivery falls short of what was sold. At small volumes it is an occasional embarrassment. At scale, undetected shortfalls become a steady, invisible transfer of money out of the business. Here is how we stopped it — and what the fix reveals about quality at scale.
The problem did not announce itself. That is what made it dangerous. Across a very large volume of campaigns, the gap between what had been promised and what was actually delivered was surfacing late — often after invoicing, sometimes after the client noticed first. Each instance carried a makegoods liability and, worse, a small withdrawal from the client's trust. Individually, none of them looked like a crisis. Added together, across the volume we were running, they represented millions of dollars a year leaking out of the business through a hole nobody could see.
The instinct in that situation is to check harder. Add reviewers. Inspect more campaigns before they invoice. That instinct is wrong, and understanding why is the whole lesson.
Why "check everything" fails at scale
Manual, end-of-line inspection has two fatal properties at volume. First, it scales linearly with the work — double the campaigns and you double the checking cost — so it becomes the bottleneck precisely when the business is succeeding. Second, it provides no proof of coverage. You can inspect two thousand campaigns and still have no defensible answer to "are we catching the failures that cost us the most?" because the inspection is reacting to whatever the reviewer happens to notice.
You cannot inspect your way out of a scale problem. You have to design the inspection so that it scales sub-linearly and proves its own coverage.
What we needed was not more checking. It was a framework — a deliberate design that placed the right check at the right point, prioritised by financial exposure, and produced evidence of what it had covered.
Three steps, placed by where errors are still correctable
The framework that worked had three steps, and the design principle behind all of them was the same: catch each kind of error at the point where it is both most likely to occur and still cheap to correct. A shortfall caught while a campaign is in flight can be fixed. The same shortfall caught after invoicing is a makegood. The placement is the entire value.
The three steps spread verification across the delivery cycle so that, by the time a campaign reached invoicing, the failure modes that produced makegoods had already been caught and corrected upstream. Each step had a defined, standardised thing it verified and a defined way it scored the result, which is what made coverage provable rather than assumed. (I have written the framework up as a standalone reference; this piece is about why it worked.)
Prioritise by exposure, not by volume
A subtle but critical decision: the framework did not treat every campaign equally. It concentrated effort where the financial exposure was greatest — the campaigns and failure modes that, if they slipped, cost the most. This is the opposite of inspecting everything to the same depth, and it is what let the framework run at the real volume of the operation without becoming the bottleneck.
This is risk-weighting, and it is uncomfortable for teams raised on "check everything equally." But equal checking is itself a choice — a choice to spend the same scrutiny on a campaign with trivial exposure as on one with enormous exposure. Risk-weighting just makes the allocation deliberate.
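As a rough illustration of making that allocation deliberate, the sketch below maps each campaign's exposure to a review tier. The `Campaign` fields, the tier names, and the threshold values are all hypothetical; the article does not say how the real framework set its depths.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    exposure: float  # billings at risk if delivery falls short (illustrative figure)


def review_depth(exposure: float, light_below: float, deep_above: float) -> str:
    """Map financial exposure to a review tier. Thresholds are assumptions."""
    if exposure >= deep_above:
        return "deep"
    if exposure >= light_below:
        return "standard"
    return "light"


# Made-up campaigns, purely to show the allocation.
campaigns = [
    Campaign("A", 500_000),
    Campaign("B", 40_000),
    Campaign("C", 2_000),
]

plan = {
    c.name: review_depth(c.exposure, light_below=10_000, deep_above=250_000)
    for c in campaigns
}
print(plan)  # {'A': 'deep', 'B': 'standard', 'C': 'light'}
```

The point of writing the thresholds down is that the allocation becomes something the team can argue about and adjust, rather than something each reviewer decides implicitly.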
Close the loop back into delivery
The final move is the one that turns a quality framework from a net into a system. Every defect the framework caught was traced back to its cause in the delivery process, and recurring causes were removed at source. Without this, a QA framework simply re-catches the same failures forever — a permanent tax. With it, the framework gets lighter over time, because the failure modes it was built to catch stop occurring.
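One minimal way to picture the loop, assuming each catch is logged with a root-cause label: count catches per cause and flag the ones that keep coming back. The field name `root_cause`, the example labels, and the threshold of three are hypothetical.

```python
from collections import Counter


def recurring_causes(catches, min_occurrences=3):
    """Return (cause, count) pairs for causes seen at least min_occurrences times,
    most frequent first. These are the candidates for removal at source."""
    counts = Counter(c["root_cause"] for c in catches)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_occurrences]


# Made-up catch log; the labels are illustrative only.
catches = [
    {"id": 1, "root_cause": "stale rate card"},
    {"id": 2, "root_cause": "manual date entry"},
    {"id": 3, "root_cause": "stale rate card"},
    {"id": 4, "root_cause": "stale rate card"},
    {"id": 5, "root_cause": "manual date entry"},
]

print(recurring_causes(catches))  # [('stale rate card', 3)]
```

A cause that appears once is corrected; a cause that clears the threshold is a process fix waiting to be made.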
$20M+ in client billings protected
3-step staged verification framework
Standard adopted across the function
The result was more than twenty million dollars in client billings protected, but the number understates the real change. An invisible, recurring leak became a controlled, measured process. The business could now answer, with evidence, the question it could not answer before: are we catching the delivery failures that cost us the most? The framework also became the standard QA approach for the function, which is the truest sign that it worked: it stopped being a project and became simply how the work is done.
The transferable lesson
Most operations have a leak like this. It is rarely as visible as makegoods; it might be rework, or churn traced to delivery quality, or margin lost to errors caught too late. The pattern is always the same: a cost that is real and recurring but distributed thinly enough across high volume that no single instance forces action.
The fix is never to look harder. It is to design where you look — stage the verification across the flow, weight it by exposure, prove the coverage, and feed every catch back into removing its cause. Do that, and you convert a silent loss into a number you can see, manage, and steadily shrink.
A worked example of the staging logic
Strip away the specifics of any one operation and the staging logic is easy to see. Picture any delivery process that takes an order, configures it, runs it, and then bills for what was delivered. Each of those phases introduces a distinct family of errors, and each family has a moment when it is cheap to catch and a later moment when it has hardened into a liability.
At the order stage, the errors are errors of understanding. What the client agreed to and what got written into the system diverge. A figure is transcribed wrong, a condition is dropped, an assumption goes unstated. Caught here — before anything is built on top of it — the cost of correction is a conversation and an edit. Caught after delivery, the same error is a shortfall against a promise nobody can now meet.
At the configuration stage, the errors are errors of translation. The order was understood correctly, but the set-up that should deliver it does not. A parameter is off, a component is mismatched, a dependency is missed. Caught here, the fix is a reconfiguration before the work runs. Caught later, it is delivered work that has to be redone or compensated.
At the delivery stage, the errors are errors of execution. The set-up was right, but something drifted while the work ran. Caught while the work is still in flight, there is room to correct course. Caught only at billing, there is no room left at all — the period is over, the spend is committed, and the gap is a makegood.
The point of the worked example is not the particular phases. It is the shape. Every error family has an upstream point where it is correctable and a downstream point where it is a liability, and those two points are usually far apart. The entire design problem is to move the check from the second point to the first.
A defect has a half-life. It starts cheap and gets more expensive the longer it survives undetected. Good quality design is the deliberate shortening of that half-life.
How to build it: a step-by-step
If you are standing this up in your own operation, the sequence matters as much as the content. Here is the order I use.
1. Map where the money actually leaks. Before designing a single check, find the failure modes that produce the liability. Pull the last year of shortfalls, rework, or whatever your version of the leak is, group them by failure mode, and sort the modes by total cost: how often each occurs multiplied by what a typical instance costs. You are looking for the short list of failure modes that account for most of the money. There is almost always a short list. This is the target.
2. Trace each costly failure mode to its origin. For each one, ask where in the flow it is introduced. Not where it is noticed — where it is born. The two are usually different, and the gap between them is the half-life you are trying to cut.
3. Place a check just downstream of each origin. The check goes immediately after the point where the error enters, not at the end of the line. It verifies one specific thing, and it scores the result against a written standard so that two reviewers reach the same verdict. Resist the urge to make each check do everything; a check that verifies one thing well is worth more than a vague review that gestures at the whole.
4. Weight the depth by exposure. Not every check deserves the same rigour. The failure modes that cost the most get the most thorough verification; the trivial ones get a light touch or none. This is what keeps the framework affordable at real volume.
5. Record what each check covered. Every check produces evidence — what it looked at, what it found, what it passed. This is the part that turns assumed coverage into proven coverage, and it is the part teams most often skip.
6. Feed every catch back into the process. When a check catches a recurring cause, the cause gets removed at source, not just corrected this once. This is what makes the framework lighter over time instead of a permanent tax.
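Step 1 in the list above is the most mechanical, so here is a minimal sketch of it: total the cost per failure mode, rank the modes, and keep the shortest list that accounts for a chosen share of the money. The 80% cut-off, the failure-mode names, and the figures are illustrative assumptions, not data from the framework described here.

```python
from collections import defaultdict


def rank_failure_modes(incidents, share=0.8):
    """incidents: iterable of (failure_mode, cost) pairs, one per incident.
    Returns the top modes, by total cost, that together reach `share` of the total."""
    totals = defaultdict(float)
    for mode, cost in incidents:
        totals[mode] += cost  # summing per incident gives frequency x average cost

    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(totals.values())

    short_list, running = [], 0.0
    for mode, cost in ranked:
        # Stop once the modes already listed cover the target share.
        if grand_total and running / grand_total >= share:
            break
        short_list.append((mode, cost))
        running += cost
    return short_list


# Made-up incident log.
incidents = [
    ("under-delivery", 50_000),
    ("wrong targeting", 30_000),
    ("under-delivery", 40_000),
    ("late start", 5_000),
    ("order typo", 1_000),
]

for mode, cost in rank_failure_modes(incidents):
    print(f"{mode}: {cost:,.0f}")
```

With this sample log, two of the four modes account for well over 80% of the cost, which is the "short list" the first step is looking for.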
Common mistakes, and how to avoid them
Most attempts to fix a leak like this fail in predictable ways. I have made some of these myself.
Adding reviewers instead of redesigning. The most common response, and the most expensive. More inspection at the same wrong point buys a little coverage at linear cost and fixes nothing structural. The leak persists; it just costs more to monitor. Avoid it by treating headcount as the last lever, not the first.
Checking everything to the same depth. Equal scrutiny feels fair and is quietly wasteful. It spends the same effort on a trivial-exposure item as on a high-exposure one, which means it under-protects what matters and over-protects what does not. Risk-weight deliberately, even though it feels uncomfortable to inspect some things less.
Treating every defect as a one-off. If each catch is corrected in isolation and never traced to its cause, the framework re-catches the same failures forever. The fix is the feedback loop — recurring causes get designed out, not re-corrected.
Scoring by narration. "This looks a bit off" is not a score. It is an opinion, and opinions do not aggregate into a defensible coverage number. Force every check to a binary against a written standard, then count. The discipline is dull and it is the whole point.
Placing the check by convenience, not by correctability. Teams tend to put checks where it is easy to look — usually at the end, where the output is assembled. That is exactly the most expensive place. The check belongs where the error is still cheap to fix, even if that point is more awkward to instrument.
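The "binary against a written standard, then count" discipline described under scoring by narration can be sketched as follows. The written standard here (booked quantity must equal signed quantity), the field names, and the `CheckResult` record are hypothetical; the shape is what matters: one check, one yes-or-no verdict, and a stored note of what was examined.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CheckResult:
    check: str
    item_id: str
    passed: bool   # binary verdict against the written standard
    observed: str  # what the check looked at, kept as evidence


def run_check(
    name: str,
    standard: Callable[[dict], bool],
    describe: Callable[[dict], str],
    items: Iterable[dict],
) -> list[CheckResult]:
    """Apply one written standard to every item and record the evidence."""
    return [
        CheckResult(name, item["id"], bool(standard(item)), describe(item))
        for item in items
    ]


# Illustrative orders; the standard is an assumption for this sketch.
orders = [
    {"id": "c-1", "signed_qty": 1000, "booked_qty": 1000},
    {"id": "c-2", "signed_qty": 5000, "booked_qty": 500},
]

results = run_check(
    "order_qty_matches",
    standard=lambda o: o["signed_qty"] == o["booked_qty"],
    describe=lambda o: f"signed={o['signed_qty']} booked={o['booked_qty']}",
    items=orders,
)

pass_rate = sum(r.passed for r in results) / len(results)
print(pass_rate)  # 0.5
```

Because every verdict is a boolean with evidence attached, two reviewers running the same check reach the same result, and the results add up to a coverage number.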
What to measure
A framework like this should be held to its own numbers. Track a small set, and watch them over time rather than in isolation.
- Escaped defect rate. The share of costly failures that reach the client despite the framework. This is the headline measure of whether the staging is catching what it should. It should fall and then stay low.
- Catch point. For each defect, where it was caught relative to where it was born. Pushing catches upstream is the whole game; if catches are creeping back toward billing, the design is drifting.
- Cost of the leak. The total liability the framework exists to prevent, tracked period over period. This is the number leadership cares about, and the one that justifies the framework's existence.
- Coverage. The proportion of high-exposure work that actually passed through the intended checks. Coverage gaps are where the next expensive surprise lives.
- Framework weight. The effort the framework consumes. A healthy framework gets lighter as recurring causes are designed out. If its cost is rising while the leak is flat, you are inspecting, not fixing.
The measure that matters most is the one most teams never track: where defects are caught relative to where they were born. Everything good about this approach shows up in that single number moving upstream.
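Three of these measures reduce to simple arithmetic once each defect is logged with where it was born and where it was caught. The sketch below assumes the four phases from the worked example plus a final "client" stage for escapes; the stage list, field names, and sample records are illustrative, not the framework's actual schema.

```python
# Ordered stages; "client" means the defect escaped the framework entirely.
STAGES = ["order", "configuration", "delivery", "billing", "client"]


def escaped_defect_rate(defects):
    """Share of logged defects that reached the client."""
    if not defects:
        return 0.0
    return sum(d["caught_at"] == "client" for d in defects) / len(defects)


def catch_distance(defect):
    """Number of stages a defect survived before it was caught (0 = caught at origin)."""
    return STAGES.index(defect["caught_at"]) - STAGES.index(defect["born_at"])


def mean_catch_distance(defects):
    if not defects:
        return 0.0
    return sum(catch_distance(d) for d in defects) / len(defects)


def coverage(high_exposure_items):
    """Share of high-exposure items that passed through their intended checks."""
    if not high_exposure_items:
        return 0.0
    return sum(i["checked"] for i in high_exposure_items) / len(high_exposure_items)


# Made-up records.
defects = [
    {"born_at": "order", "caught_at": "order"},
    {"born_at": "order", "caught_at": "billing"},
    {"born_at": "configuration", "caught_at": "client"},
    {"born_at": "delivery", "caught_at": "delivery"},
]
items = [{"checked": True}, {"checked": True}, {"checked": False}, {"checked": True}]

print(escaped_defect_rate(defects))  # 0.25
print(mean_catch_distance(defects))  # 1.5
print(coverage(items))               # 0.75
```

A falling mean catch distance is the "moving upstream" signal: defects are being caught closer to where they were introduced.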
Where to start
You do not need a transformation programme to begin. You need one number and one honest afternoon.
Start by quantifying the leak. Pull your last year of shortfalls — or rework, or churn traced to delivery, or whatever your distributed loss is — and put a total cost on it. The number is usually larger than anyone has admitted, because no single instance ever forced the reckoning. That total is your mandate.
Then sort those instances by cost and find the short list of failure modes that dominate. Pick the single most expensive one. Trace it to where it is born. Place one scored check just downstream of that origin, weight it to the exposure, and record what it covers. Run it for a few cycles and watch whether that failure mode stops reaching the client.
That is the whole method, proven on one failure mode before you scale it. When the first check works — and it will, because the logic is sound — you extend it to the next costliest mode, and the next. The framework grows the way it should: one risk-weighted, evidence-producing check at a time, each one feeding its catches back into a process that gets quietly more reliable. The silent leak becomes a managed number. And the question the business could never answer — are we catching the failures that cost us the most — finally has evidence behind it.