How to set up a Business Excellence function from scratch

Ashish Kumar Agnihotri · 14 January 2026 · 14 min read

Most companies do not decide to build a Business Excellence function. They back into one — usually after a quality failure large enough to reach the board. By then the conversation is about blame, not design. This is how to build it deliberately instead, in the order that actually works.

I have stood up quality and excellence functions inside operations running at the scale of thousands of campaigns and hundreds of clients. The mistake I see most often is treating Business Excellence as a team you hire rather than a system you design. You can hire ten quality analysts and still have no Business Excellence function — because the function is not the people. It is the standard, the measurement against it, and the governance that acts on what the measurement finds.

Here is the sequence I use.

Start with the standard, not the staffing

The first question is never "who do we hire?" It is "what does good look like, in terms specific enough to score?" Until you can answer that, a quality team has nothing to measure against and will default to catching whatever it personally notices — which is inconsistent by definition.

A standard is not a values statement. It is a concrete, written definition of an acceptable output for each kind of work you deliver, specific to your clients and your risk. For a campaign delivery operation, that might be a precise specification of what "delivered as sold" means, line by line. For a finance process, it is the set of conditions an approval must satisfy. The test of a good standard is simple: two different reviewers, applying it to the same piece of work, reach the same verdict.
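One way to hold a standard to that test is to write each line as a named check that can only return pass or fail. The sketch below is illustrative, not a prescribed format: the `StandardLine` type, the `DELIVERED_AS_SOLD` list, and the fields `client`, `agreed_budget`, and `booked_budget` are all hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StandardLine:
    """One line of the written standard: a name plus a yes/no check."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical lines for a "delivered as sold" standard (illustrative only).
DELIVERED_AS_SOLD = [
    StandardLine(
        "required fields present",
        lambda w: all(w.get(k) is not None
                      for k in ("client", "agreed_budget", "booked_budget")),
    ),
    StandardLine(
        "booked budget matches agreed budget",
        lambda w: w.get("booked_budget") == w.get("agreed_budget"),
    ),
]


def score(work: dict, standard: list[StandardLine]) -> dict[str, bool]:
    """Return a pass/fail verdict per line, with no free-text judgement."""
    return {line.name: bool(line.check(work)) for line in standard}


def passes(work: dict, standard: list[StandardLine]) -> bool:
    """A piece of work meets the standard only if every line passes."""
    return all(score(work, standard).values())
```

Because every line is a yes/no check, two reviewers applying it to the same input cannot reach different verdicts. Lines that still need human judgement are the ones to sharpen.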

Measure honestly, even when it is uncomfortable

Once the standard exists, you measure against it — and the first measurement is almost always worse than leadership expects. That is the point. A Business Excellence function that produces flattering numbers in its first quarter is not measuring; it is reassuring.

Three principles make the measurement trustworthy:

Sample the real work, not a showcase. Pull the sample at random from live output, not from the pieces a team chooses to submit. The gap between those two is itself a finding.
Score, do not narrate. A defect either breaches the standard or it does not. Replace "this could be better" with a pass-or-fail verdict against the standard, then count.
Trace every defect to a cause in the system. A defect is not "an analyst made a mistake." It is "the process allowed a mistake to reach the output." The cause lives in the design, and that is where it gets fixed.

The output of this stage is a quality score that is honest, comparable over time, and — crucially — decomposed by failure mode. You are not trying to produce one number. You are trying to learn which few failure modes produce most of the defects, because that is where remediation pays.

Separate the systemic few from the long tail

Every operation has two kinds of defect. There is a long tail of one-offs — genuinely random, expensive to chase, and largely unavoidable. And there are a small number of systemic failure modes that recur because something in the process design invites them. These two require completely different responses, and conflating them is how excellence programmes waste their first year.

The last few points of quality are never an effort problem. They are a design problem, and only a design response moves them.

In one enterprise operation, the residual defects at a 95% quality score turned out to be dominated by a handful of systemic causes. Once those were designed out — not inspected out, designed out — the score moved to 99% across more than two thousand campaigns, and it held, because the gain was built into the controls rather than carried by extra vigilance.

Build controls into the flow, not onto the end

The instinct of a young quality function is to add a final inspection gate: check everything before it ships. This feels rigorous and is quietly disastrous at scale. A terminal gate catches defects at the most expensive possible moment, after all the work is done, and its cost scales linearly with volume, so it becomes the bottleneck the moment you grow.

The better design places lightweight checks at the points in the flow where a given error is both most likely to occur and still cheap to correct. This is the difference between inspecting quality and building it in. A control early in the flow that prevents a class of defect is worth far more than a thorough inspection that catches it at the end.
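As a rough sketch of what "in the flow" can mean, the example below places one cheap check before configuration and one before execution, so a bad order is stopped before any work is built on it. The step names, the `agreed_budget` field, and the `ControlFailure` exception are hypothetical, chosen only to illustrate the shape.

```python
class ControlFailure(Exception):
    """Raised when an in-flow check stops work before the next step."""


def check_order(order):
    # Hypothetical early control: catch a missing or non-positive budget
    # at intake, before anything has been configured.
    budget = order.get("agreed_budget")
    if budget is None or budget <= 0:
        raise ControlFailure("order: agreed_budget missing or not positive")


def check_configuration(order, config):
    # Hypothetical mid-flow control: the set-up must match the order as written.
    if config.get("budget") != order.get("agreed_budget"):
        raise ControlFailure("configuration: budget does not match the order")


def run_flow(order, configure, execute):
    """Run the flow with checks placed where errors are still cheap to fix."""
    check_order(order)                  # nothing has been built yet
    config = configure(order)
    check_configuration(order, config)  # nothing has been run yet
    return execute(config)
```

A terminal gate would run both checks only after `execute`, when the defect has already consumed the full cost of the work.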

Wire in governance, or the gains will not hold

Here is the step most quality functions skip, and the reason most excellence programmes regress within a year: governance. Measurement tells you the score. Controls hold the line within a process. Governance is what makes the organisation act when the score moves — and act through clear decision rights rather than another meeting.

Concretely, governance means three things:

Thresholds. A defined level at which a movement in a metric triggers a response, not a vague sense that "we should look into that."
Owners. A named person accountable for each metric, empowered to act, not merely to report.
Escalation. A path for when a threshold is breached and the owner needs authority they do not have.

With those in place, the quality score stops being a number that gets admired in a monthly deck and becomes a number that the organisation steers by.
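Thresholds, owners, and escalation can be written down as data rather than left as convention. The following is a minimal sketch under assumptions: `GovernanceRule`, `respond`, the 0.97 threshold, and the role names in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRule:
    metric: str
    threshold: float   # minimum acceptable value for the metric
    owner: str         # named person accountable and empowered to act
    escalate_to: str   # who holds the authority the owner may lack


def respond(rule, value, owner_has_authority=True):
    """Decide who acts when a metric is read against its threshold."""
    if value >= rule.threshold:
        return ("ok", None)
    if owner_has_authority:
        return ("act", rule.owner)
    return ("escalate", rule.escalate_to)


# Usage with made-up values:
rule = GovernanceRule("quality_score", 0.97, "Head of Delivery", "COO")
decision = respond(rule, 0.95)
```

The point of the structure is that every breach resolves to a named person. There is no branch that returns "discuss at the next meeting".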

The order matters more than the speed

If you take one thing from this: build in this order — standard, measurement, controls, governance — and resist the temptation to jump ahead. Teams that hire analysts before they have a standard get inconsistency. Teams that build controls before they have measurement fortify the wrong points. Teams that skip governance watch their gains evaporate.

A Business Excellence function built in the right order is not a cost centre that polices the business. It is the system that lets the business scale without quality becoming the thing that breaks first. That is the whole point: to make quality structural, so that growth — the thing everyone actually wants — does not quietly erode the standard of the work along the way.

A worked example of writing a standard

The standard is where everything begins, and it is the step most teams rush. So make it concrete. Suppose you deliver a recurring service with several moving parts — an order is taken, work is configured, work is run, and a result is reported back to the client. A standard for that service is not a sentence about excellence. It is a line-by-line definition of what an acceptable output looks like at each step, written so that two reviewers reach the same verdict.

For the order step, the standard specifies exactly what must be captured and what each captured item must satisfy — every required field present, every figure matching what was agreed, every condition recorded. A reviewer can check each line and mark it pass or fail without interpretation.

For the configuration step, the standard specifies what a correct set-up looks like for the order as written — the parameters that must match, the dependencies that must be present, the conditions that must hold before work begins. Again, each line is checkable.

For the result step, the standard specifies what "delivered as sold" means in concrete terms — the output reconciles to the order, the conditions agreed were met, the report says what actually happened. No line relies on the reviewer's mood.

The test is the one that matters: hand the same piece of work to two reviewers with this standard, and they reach the same verdict. If they disagree, the standard is not yet specific enough, and that is the work to do before anything else. A standard you can argue about is a standard problem wearing the costume of a quality problem.
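The two-reviewer test is easy to run mechanically once verdicts are recorded line by line. The sketch below assumes each reviewer's verdicts are stored as a mapping from standard line to pass/fail; the function names are hypothetical, and the rate is raw agreement with no correction for agreement by chance.

```python
def disputed_lines(verdicts_a, verdicts_b):
    """Lines of the standard where two reviewers reached different verdicts."""
    if verdicts_a.keys() != verdicts_b.keys():
        raise ValueError("reviewers must score the same lines of the standard")
    return sorted(k for k in verdicts_a if verdicts_a[k] != verdicts_b[k])


def agreement_rate(verdicts_a, verdicts_b):
    """Share of lines on which both reviewers agree (raw agreement)."""
    if not verdicts_a:
        raise ValueError("no lines scored")
    disputed = disputed_lines(verdicts_a, verdicts_b)
    return 1 - len(disputed) / len(verdicts_a)
```

The disputed lines are the useful output: each one names a part of the standard that is not yet specific enough.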

A standard is not a statement of values. It is a written definition specific enough that two reviewers, applying it to the same work, reach the same verdict. Anything vaguer is an opinion with a job title.

A worked example of separating the systemic few from the long tail

Once you measure honestly against the standard, the defects sort themselves into two piles, and the whole future of the function depends on telling them apart.

Imagine the first honest measurement returns a quality score below where leadership hoped, with a list of defects attached. Resist the urge to read the list as a flat catalogue of mistakes. Decompose it by failure mode — group the defects by what actually went wrong — and a pattern almost always appears. A small number of failure modes account for most of the defects. The rest is a long tail of genuine one-offs.

The two piles demand opposite responses. The long tail is random, expensive to chase, and largely unavoidable; trying to eliminate it burns the function's first year for almost no return. The systemic few are different. They recur because something in the process design invites them — a step where it is easy to make the same error, a handoff that loses the same information, a point where the standard is easy to miss. These are not effort problems. They are design problems, and only a design response moves them.

The discipline is to ignore the tail and attack the systemic few — to find the handful of causes that produce most of the defects and design them out at source, rather than inspecting harder everywhere. Designed out, those failure modes stop occurring, and the gain holds because it is built into the process rather than carried by extra vigilance. That is the difference between a score that improves and stays improved, and one that drifts back the moment attention moves on.
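Sorting defects into the two piles can start as a simple count. In the sketch below, a failure mode is treated as systemic once it recurs at least `min_recurrence` times in the sample; that cut-off of 3 is an arbitrary assumption for illustration, and in practice the judgement also depends on whether a cause in the process design can be named.

```python
from collections import Counter


def split_systemic_and_tail(failure_modes, min_recurrence=3):
    """Split observed failure modes into recurring causes and one-offs.

    failure_modes: one entry per defect, naming what went wrong.
    Returns (systemic counts, tail counts, systemic share of all defects).
    """
    counts = Counter(failure_modes)
    systemic = {m: n for m, n in counts.items() if n >= min_recurrence}
    tail = {m: n for m, n in counts.items() if n < min_recurrence}
    total = sum(counts.values())
    systemic_share = sum(systemic.values()) / total if total else 0.0
    return systemic, tail, systemic_share
```

A high systemic share says there is design work to do. A low one suggests the remaining defects are mostly tail and that more inspection will buy little.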

Common mistakes, and how to avoid them

Building this function wrong follows a predictable script. I have seen each of these, and some I have had to unwind.

Hiring analysts before there is a standard. The most common error. Without a standard, a quality team catches whatever it personally notices, which is inconsistent by definition — two analysts disagree, and neither is wrong, because there is nothing to be right against. Write the standard first. The team has nothing to measure until you do.

Producing flattering first numbers. A function whose first quarter looks reassuring is not measuring; it is performing. The first honest measurement should be worse than leadership expects — that gap is the entire reason the function exists. Sample real work at random, score against the standard, and report what you find.

Building controls before measurement. Controls placed before you know which failure modes dominate fortify the wrong points. You end up with rigorous checks on things that rarely fail and nothing where the real leak is. Measure first, find the systemic few, then place controls where they pay.

Adding a terminal inspection gate. The young function's instinct is to check everything before it ships. This catches defects at the most expensive possible moment, and its cost scales linearly with volume, so it becomes the bottleneck the instant you grow. Build checks into the flow where errors are still cheap to correct, not onto the end.

Skipping governance. The reason most excellence programmes regress within a year. Measurement tells you the score and controls hold a process, but without thresholds, owners, and escalation, nothing makes the organisation act when the score moves. The gains evaporate quietly. Wire in governance or watch the work undo itself.

What to measure

A Business Excellence function should be held to a small, honest set of numbers — and the numbers should be decomposed, not aggregated into a single comforting figure.

Quality score against the standard. The headline measure, scored pass or fail against the written standard on a random sample of real work. It must be honest, comparable over time, and trusted enough to act on.
Defects by failure mode. Never just one number. The decomposition is the point — it tells you which few failure modes produce most of the defects, which is where remediation actually pays.
Systemic versus tail share. How much of the defect load comes from recurring systemic causes versus genuine one-offs. This tells you whether there is design work to do or whether you are at the irreducible floor.
Recurrence after remediation. Whether a failure mode you designed out actually stops occurring. A cause that recurs after remediation was inspected out, not designed out — the fix did not hold.
Action closure in the operating review. The share of governance actions opened that actually get closed. A review that does not close actions is theatre, and this number is how you catch it.
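The last two measures in this list are the least often computed, so here is a small sketch of both. It assumes defects are logged as (date, failure mode) pairs, remediation dates are recorded per failure mode, and each governance action carries a boolean `closed` flag; all of these shapes and names are hypothetical.

```python
from datetime import date


def recurred_after_remediation(defects, remediated_on):
    """Failure modes that still occur after the date they were designed out.

    defects: list of (date, failure_mode) pairs.
    remediated_on: mapping of failure_mode to its remediation date.
    """
    return sorted({
        mode for day, mode in defects
        if mode in remediated_on and day > remediated_on[mode]
    })


def action_closure_rate(actions):
    """Share of opened governance actions that were actually closed."""
    if not actions:
        return None  # nothing opened, so there is no rate to report
    return sum(1 for a in actions if a["closed"]) / len(actions)
```

Any failure mode returned by the first function is a fix that did not hold, and a falling closure rate is an early sign that the review is becoming theatre.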

Do not chase a single quality number. Chase the decomposition. One score tells you how you are doing; the breakdown by failure mode tells you what to fix — and the second is the only one that pays.

Where to start

Do not begin by hiring, and do not begin by buying a tool. Begin by writing down what good looks like for one kind of work you deliver.

Take a single process and define an acceptable output for it, line by line, specifically enough that two reviewers would score the same piece of work the same way. Test it exactly that way — hand the same output to two people and see if they agree. Where they do not, the standard needs sharpening. That sharpening is the foundational work, and it is worth doing slowly.

Once the standard holds, measure against it. Pull a random sample of real, live work, not a showcase, and score each item pass or fail against the standard. Expect the result to be uncomfortable; that discomfort is the function proving it measures rather than reassures. Then decompose the defects by failure mode, find the one systemic cause that produces the most, and design it out at source. Place a lightweight control in the flow to prevent it recurring, and wire in the one thing that makes the gain hold: a threshold, an owner, and a short operating review that closes its own actions.

That is the whole function in miniature — standard, measurement, controls, governance — proven on one process before you scale it. Build it in that order and resist jumping ahead. When it works on the first process, you extend it to the next, and the function grows the way it should: as a system that makes quality structural, so the business can scale without the standard of its work quietly eroding underneath the growth.
