
02 Industry

Advertising & agencies

Agency operations run at a volume that does not forgive a weak quality model. When delivered campaigns miss their guarantees, the agency owes makegoods, and undetected shortfalls become a steady, invisible loss. This is the work I led at Publicis Groupe, across 500+ clients and USD 750M+ in media spend.

An agency sells a promise — reach, impressions, viewability, a delivered schedule — and then has to make that promise true across thousands of line items, dozens of markets and a calendar that never stops. The economics are unforgiving in a specific way: when delivery falls short of what was guaranteed, the agency owes a makegood, free inventory handed over to settle the gap. A makegood is margin given away after the fact for a problem that was usually visible, in the data, days before anyone acted on it. At volume, those small, late corrections compound into a real and recurring drag on profitability.

The harder version of the problem is the shortfall nobody catches. A campaign under-delivers, no one notices before the invoice goes out, and either the client finds it — which costs trust as well as money — or it simply becomes an unbilled loss absorbed quietly into the numbers. Multiply a small leakage rate across hundreds of clients and the figure stops being a rounding error. This is the gap a quality model is supposed to close, and the reason manual, end-of-cycle checking fails: there is too much to inspect, the checks are not placed where the money actually leaks, and coverage cannot be proven when a client asks what was actually reviewed.

The approach I bring is staged QA built to run at volume, not heroic checking that collapses under it. At a global advertising network, a three-step makegoods QA framework protected more than USD 20 million in client billings — not by inspecting everything, but by placing the heaviest verification exactly where exposure was greatest and catching shortfalls while they were still correctable. The same discipline lifted a quality score from 95% to 99% across more than 2,000 campaigns and 450 clients. The point is consistency that holds across teams and markets, and coverage you can prove rather than assert.

What tends to break

  • Delivery shortfalls surface late — after invoicing, sometimes after the client finds them.
  • Quality varies across teams and markets, with no provable coverage.
  • Margin leaks through rework nobody can see or quantify.
  • Manual checking can’t keep up with the volume.

How I help

  • Architect staged QA placed where errors are most likely and still correctable.
  • Standardise what’s inspected and scored so coverage is provable and comparable.
  • Weight verification by financial exposure so it runs at real volume.
  • Feed every catch back into delivery to remove recurring causes at source.

Sound familiar?

01

Makegoods are a recurring, painful line.

02

You can’t prove what your QA actually covers.

03

Quality scores differ by team and nobody fully trusts them.

Proof in this sector

USD 20M+

A three-step makegoods QA framework that protected USD 20M+ in client billings

Read the case

The fit

Business Excellence & Quality Audits

A clear, evidenced read on how good your operation really is — and where it will break.

In depth

The operating detail for this sector.

Makegoods are a measurement failure before they are a cost

A makegood almost never comes out of nowhere. The under-delivery was sitting in the pacing data well before it became a settlement — the signal was there; the system that should have caught it was not. So I treat makegoods as a measurement problem first and a commercial one second. The question is not "how do we negotiate these down" but "why are we finding them after the period closes instead of mid-flight, when a campaign can still be corrected?" Move the detection forward and a large share of makegoods simply stop happening, because the shortfall is fixed while there is still time to deliver against the guarantee. What remains is smaller, expected, and budgeted for — not a recurring surprise eroding the margin every cycle.
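To make "move the detection forward" concrete, here is a minimal sketch of a mid-flight pacing check. It is an illustration, not the framework itself: it assumes delivery is expected to pace linearly across the flight, and the `LineItem` fields, the function names and the 5% tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LineItem:
    name: str
    guaranteed: int  # impressions guaranteed for the full flight
    delivered: int   # impressions delivered so far
    start: date      # first day of the flight
    end: date        # last day of the flight (inclusive)


def projected_shortfall(item: LineItem, today: date) -> int:
    """Project end-of-flight delivery at the current daily rate and return
    the impressions still missing against the guarantee (0 if none)."""
    total_days = (item.end - item.start).days + 1
    elapsed = min(max((today - item.start).days + 1, 0), total_days)
    if elapsed == 0:
        return 0  # flight has not started, so there is nothing to project
    projected = item.delivered / elapsed * total_days
    return max(0, round(item.guaranteed - projected))


def flag_mid_flight(items, today, tolerance=0.05):
    """Return (name, shortfall) for items projected to miss their guarantee
    by more than `tolerance`, expressed as a fraction of the guarantee."""
    flagged = []
    for item in items:
        gap = projected_shortfall(item, today)
        if gap > item.guaranteed * tolerance:
            flagged.append((item.name, gap))
    return flagged
```

Run daily against live pacing data, a check of this kind surfaces the gap while the flight still has days left to recover. A real schedule is rarely linear, so the projection step would need to reflect the planned delivery curve.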

Staged QA placed where the money leaks

Inspecting everything equally is how a quality programme dies — there is far too much volume, so checks get skipped under pressure and coverage becomes a fiction. The alternative is staged QA: verification placed deliberately at the points in the delivery chain where errors are both most likely and still correctable, and weighted by financial exposure so the heaviest scrutiny lands where the largest billings are at risk. A high-value, guarantee-heavy campaign gets more checking than a small, low-risk one. This is exactly the logic behind the three-step makegoods framework that protected more than USD 20 million in billings: not more checking everywhere, but the right checking in the few places that actually move money, designed to run at full volume without collapsing.
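As an illustration of weighting verification by exposure, the sketch below ranks campaigns by billings at risk and assigns full review to the ones that together cover a chosen share of total exposure, with sampled review for the rest. The exposure formula (billings multiplied by the share of the campaign under guarantee), the 80% cut-off and the field names are assumptions made for this example, not the rules of the three-step framework.

```python
def assign_check_depth(campaigns, full_share=0.8):
    """campaigns: list of dicts with 'name', 'billings', 'guaranteed_share'.
    Returns {name: 'full' | 'sample'}."""
    # Hypothetical exposure measure: billings that sit under a guarantee.
    exposure = {c["name"]: c["billings"] * c["guaranteed_share"] for c in campaigns}
    total = sum(exposure.values())
    depth = {}
    covered = 0.0
    ranked = sorted(exposure.items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ranked:
        # Full review until the chosen share of total exposure is covered.
        needs_full = total > 0 and covered < full_share * total
        depth[name] = "full" if needs_full else "sample"
        covered += value
    return depth
```

The design choice is that the review budget follows the money: a handful of large, guarantee-heavy campaigns absorb the deep checks, and the long tail is sampled rather than skipped.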

Provable coverage, not assertable coverage

There is a question every agency eventually faces from a serious client: what does your QA actually cover? "We check everything" is not an answer that survives scrutiny, and increasingly it is not one clients accept. A defensible quality model standardises what is inspected and how it is scored, so coverage is explicit, comparable across teams and markets, and demonstrable on request. That shift — from confidence to evidence — changes the commercial conversation. It lets you show a prospective client a real, scored baseline rather than a reassurance, and it turns quality from a soft claim into something closer to a competitive asset. In a market where clients audit their agencies, provable coverage is worth more than an impressive description of effort.

Closing the loop so the same defect stops recurring

Catching a shortfall is necessary but not sufficient. If every catch is a one-off save and nothing feeds back into how delivery works, the same class of error returns next cycle and the QA team runs to stand still. The valuable move is to treat each catch as data: cluster the defects, find the recurring root — a trafficking step that is error-prone, a handoff with no clear owner, a market that consistently mis-paces — and design that cause out of the delivery process itself. This is what separates a quality function from a safety net. The net catches falls; the function stops people falling. Over time the verification load actually drops, because the upstream defects that generated it are no longer being created.
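Treating each catch as data can start very simply. The sketch below counts catches by delivery stage and cause and returns the combinations that recur. The `stage` and `cause` fields and the threshold of three are hypothetical; a real defect log would need a shared taxonomy before counts like these mean anything.

```python
from collections import Counter


def recurring_causes(catches, min_count=3):
    """catches: list of dicts with 'stage' and 'cause'.
    Returns [((stage, cause), count)] for causes seen at least
    `min_count` times, most frequent first."""
    counts = Counter((c["stage"], c["cause"]) for c in catches)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

The output is a short list of candidates for redesign, which is the point: the recurring entries are the ones worth designing out of the process, while the one-offs stay with the checks.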

Why quality drifts across teams and markets

Agency work is delivered by many hands in many places, and quality almost always varies across them — not because some teams are careless, but because each one carries its own tacit standard of what good means and what counts as a defect. Without a shared, explicit standard, you cannot compare a team in one market to a team in another, and you cannot tell whether a low score reflects worse work or stricter marking. The first job is to define quality precisely enough that it means the same thing everywhere it is measured. Only then is a score honest, and only then can you see the real pattern — which teams need support, which markets carry the most exposure, and where the systemic failure modes actually sit beneath the local variation.

When better QA is not the answer

Sometimes the makegood problem is not a quality problem at all, and I will say so. If campaigns are being sold with guarantees the available inventory cannot realistically deliver, no amount of downstream checking will fix it — the shortfall is engineered at the point of sale, and the honest correction is upstream, in how deals are structured and what is promised. Likewise, if the volume is low and bespoke, heavy staged QA is over-engineering; a lighter touch will do. The discipline earns its place where delivery runs at genuine scale, guarantees are real and frequent, and shortfalls are leaking margin invisibly. Building an elaborate quality machine on top of a sales or inventory problem just adds cost without closing the gap.

Questions

Common questions.

By catching under-delivery while a campaign is still running, not after the period closes. A makegood is almost always a shortfall that was visible in the pacing data days before it became a settlement — the signal was there; the system to act on it was not. Move detection forward, placing checks where exposure is greatest, and a large share of makegoods stop happening because the campaign is corrected while it can still deliver against its guarantee. What remains is smaller and budgeted for. This is the logic of the framework that protected more than USD 20 million in client billings.

Staged QA places verification deliberately at the points in the delivery chain where errors are most likely and still correctable, weighted by financial exposure so the heaviest scrutiny lands on the largest billings at risk. Checking everything equally sounds rigorous but fails in practice — the volume is too high, so checks get skipped under pressure and coverage becomes a fiction. Staged QA is designed to run at full volume without collapsing: more scrutiny on a high-value, guarantee-heavy campaign, less on a small low-risk one. It is the right checking in the few places that move money, not more checking everywhere.

By standardising what is inspected and how it is scored, so coverage is explicit, comparable across teams and markets, and demonstrable on request. "We check everything" does not survive a serious client audit, and increasingly clients will not accept it. A defensible model lets you show a real, scored baseline rather than a reassurance — which shifts the conversation from confidence to evidence. In a market where clients audit their agencies, provable coverage becomes a competitive asset: you can demonstrate quality rather than describe effort, and a prospective client sees proof instead of a promise.

Usually because each team carries its own tacit sense of what good means and what counts as a defect — not because some are careless. Without a shared, explicit standard, you cannot compare one market to another, and you cannot tell whether a low score reflects worse work or stricter marking. The fix is to define quality precisely enough that it means the same thing everywhere it is measured. Only then is the score honest and the real pattern visible: which teams need support, which markets carry the most exposure, and where the systemic failure modes sit beneath the local variation.

Not by asking people to be more careful. At 95% across more than 2,000 campaigns and 450 clients, the residual defects are not random — they cluster into a handful of systemic failure modes. The work was to segment those remaining defects, find the recurring root of each, and design that cause out of the delivery process rather than inspecting harder for it. The last few points of quality are the hardest precisely because they require fixing the system, not the symptom. Once the systemic causes are removed, the score moves and, crucially, holds, instead of drifting back when attention shifts.

Yes — it is designed to make them stronger, not to replace them. The quality layer sits across delivery, feeding every catch back so recurring causes are removed at source, which over time lowers the checking burden on the teams themselves. The aim is a quality function the agency owns, with a clear standard, provable coverage and a cadence that holds the gain — not a permanent dependency on me. Much of the value is in transferring the framework so your own people run it after I have gone, with the systemic defects already designed out of their workflow.

Sometimes it is the deals, and I will say so. If campaigns are sold with guarantees the available inventory cannot realistically deliver, no amount of downstream checking will fix it — the shortfall is engineered at the point of sale, and the honest correction is upstream in how deals are structured and what is promised. Building an elaborate quality machine on top of a sales or inventory problem just adds cost without closing the gap. Staged QA earns its place where delivery runs at real scale, guarantees are frequent, and shortfalls are leaking margin invisibly — not where the gap is created before delivery even begins.

A verification platform measures delivery; it does not decide what happens because of what it measures. Without staged checks placed where money leaks, clear ownership of who acts when pacing slips, and a loop that feeds catches back into delivery, a platform produces accurate data that problems still flow straight past. The tooling is useful, but the makegood leak is a governance and process problem far more often than a tooling one. This work installs the operating system around the data — where to check, who acts, and how recurring causes get designed out — which is what actually protects the margin.