Roughly 30% of the budget on a typical AI engagement does not pay for the system the buyer commissioned. It pays for coordination: status meetings, scoping artifacts, handoffs, change orders, tool sprawl, and decisions that never get made. That is the AI agency tax. It is the largest line item nobody puts on the SOW, and the one the legacy agency model is structurally unable to cut.
This piece quantifies the tax, decomposes it into six named cost lines, and gives a method to bring it down to roughly 10%, the realistic floor for any engagement where two organisations have to stay aligned. It is a companion to the AI agency manifesto, which argues for the operating model this piece prices.
The 30% number
The 30% figure is the floor of a six-line decomposition, derived by applying three primary research findings to the specific shape of an AI engagement.
Capers Jones’ Software Engineering Best Practices (McGraw-Hill, 2010) puts coordination overhead at roughly 25–35 percent of total effort on small software teams of five to ten people, and over 50 percent on teams larger than 200. AI engagements look like small teams by headcount and large teams by coordination shape — they touch buyer-side legal, security, compliance, data, infra, product, and executive stakeholders, plus the agency’s own delivery, ML, and PM functions. Large-team coordination on a small-team payroll.
Microsoft’s 2023 Work Trend Index found knowledge workers spend roughly 57 percent of working hours on communication and coordination versus 43 percent on creation. On a billable engagement, every one of those communication hours is paid for.
The Standish Group’s CHAOS report has held a baseline software project failure rate around 30 percent for two decades. The cause is rarely “engineering was hard.” It is misaligned scope, missed requirements, stakeholder churn — three names for coordination failure.
Stack those on the AI engagement shape — small headcount, many stakeholders, continuous re-decisioning — and 30 percent is the conservative case. Engagements run on a 2018 SOW-and-PMO template routinely go higher.
Why AI engagements run higher coordination tax
Traditional software engagements assume scope is decided up front and execution proceeds. That assumption broadly held for CRUD applications and integrations.
It does not hold for AI work. AI engagements force continuous re-decisioning across at least five axes that cannot be locked at SOW signing:
- Model choice. Frontier model cost-per-token, latency, and capability shift roughly every six weeks. A model picked at SOW is rarely the right model at launch.
- Retrieval design. RAG architecture (chunking, embedding model, reranker, lexical-vector mix) is determined by the actual content and queries, neither of which is fully knowable at SOW.
- Eval bar. “Good enough” is a moving curve, co-designed with operators running the system in shadow mode.
- Latency and cost budget. Token spend at scale dwarfs build cost in many production agents and is rarely modelled accurately at SOW.
- Failure-mode taxonomy. Failure modes are emergent, surfaced only by running real traffic against real evals.
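To see why the cost budget resists being fixed at SOW, a back-of-envelope token model helps. The sketch below multiplies request volume by tokens per request and a per-million-token price. Every number in the usage lines is a hypothetical placeholder, not a quote for any real model or provider.

```python
def monthly_token_spend(requests_per_day: float,
                        input_tokens: float,
                        output_tokens: float,
                        input_price_per_m: float,
                        output_price_per_m: float,
                        days: int = 30) -> float:
    """Estimate monthly token spend in dollars.

    Prices are dollars per million tokens. All inputs are assumptions
    supplied by the caller; nothing here reflects a real price list.
    """
    per_request = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical production agent with retrieval-heavy prompts.
spend = monthly_token_spend(
    requests_per_day=50_000,
    input_tokens=6_000,       # prompt plus retrieved chunks (assumed)
    output_tokens=500,        # typical answer length (assumed)
    input_price_per_m=3.0,    # hypothetical $ per million input tokens
    output_price_per_m=15.0,  # hypothetical $ per million output tokens
)
print(f"${spend:,.0f} per month")  # $38,250 per month
```

Every term moves after signing: a different chunking strategy changes input tokens, a model swap changes both prices, and real traffic changes request volume. That is why the number is rarely modelled accurately at SOW.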
Every time reality contradicts the SOW on any of these axes, the contradiction becomes coordination work: change orders, replanning, status calls, scoping rewrites, partner escalations, legal redlines. None of that produces a system. All of it is billable.
This is the core argument BCG made in its 2024 Build for the Future analysis: roughly 70 percent of AI deployments that fail to scale do so for people-and-process reasons, not technical ones. McKinsey’s 2024 State of AI survey lands in the same place — organisations capturing measurable AI value are the ones that restructured operating models alongside the deployment.
The six cost lines
| Line | Share | Primary driver |
|---|---|---|
| 1. Status | 8–12% | Recurring meetings whose output is “we are still aligned” |
| 2. Scoping / RFP | 4–7% | SOW writing, change orders, replanning |
| 3. Handoff | 5–8% | Person-to-person and team-to-team context loss |
| 4. Scope-creep dispute | 5–10% | Defending or attacking the original SOW |
| 5. Tool sprawl | 2–4% | Switching between Jira, Linear, Slack, Notion, Loom |
| 6. Judgement debt | 4–6% | Decisions deferred because the meeting is full of status |
| Total | 28–47% | Sum of the six lines |
The 30% headline is the lower bound (28%) rounded up to a round number. The upper bound, close to half the budget, is what high-friction enterprise engagements actually run.
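The totals row is just the column sums. A few lines of Python make the arithmetic checkable and make it easy to rerun with one engagement's own estimates in place of the bands from the table.

```python
# Share bands (low, high) as percentages of engagement budget,
# copied from the six-line table.
COST_LINES = {
    "Status": (8, 12),
    "Scoping / RFP": (4, 7),
    "Handoff": (5, 8),
    "Scope-creep dispute": (5, 10),
    "Tool sprawl": (2, 4),
    "Judgement debt": (4, 6),
}

low = sum(lo for lo, _ in COST_LINES.values())
high = sum(hi for _, hi in COST_LINES.values())
print(f"Total coordination tax: {low}-{high}%")  # Total coordination tax: 28-47%
```

Replacing each band with a single per-engagement estimate is the quickest way to see which line dominates a given contract.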
Cost line 1: Status (8 to 12 percent)
Status meetings confirm that everyone is still aligned without changing anything. The classic Monday-Wednesday-Friday cadence costs each practitioner roughly two hours a week restating what everyone already knows, plus another two hours preparing for it. On a billable engagement those four hours are charged, and they produce no artifact the buyer can use.
DeMarco and Lister, in Peopleware (3rd edition, 2013), put the post-interruption recovery cost at roughly fifteen minutes for any focused task. A 30-minute status meeting therefore consumes 30 minutes of clock time plus 15 of pre-meeting wind-down plus 15 of recovery: an hour, not half an hour, of effective practitioner time. Multiply by meeting frequency, add preparation time, and status alone absorbs 8–12% of every attendee's week, and therefore of the budget, before anyone has discussed scope.
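The per-meeting arithmetic generalises to any cadence. The sketch below, assuming a 40-hour week and the 15-minute wind-down and recovery figures above, computes the share of one practitioner's week a status cadence consumes. The function and parameter names are illustrative.

```python
def status_share(meetings_per_week: int,
                 meeting_minutes: int = 30,
                 wind_down_minutes: int = 15,
                 recovery_minutes: int = 15,
                 prep_minutes_per_week: int = 0,
                 week_minutes: int = 40 * 60) -> float:
    """Fraction of one practitioner's week consumed by status meetings.

    Each meeting costs its clock time plus a wind-down before it and a
    recovery period after it. Every attendee pays the same cost, so the
    fraction is the same for any team size; total hours scale with headcount.
    """
    per_meeting = meeting_minutes + wind_down_minutes + recovery_minutes
    total = meetings_per_week * per_meeting + prep_minutes_per_week
    return total / week_minutes


# Monday-Wednesday-Friday cadence, 40-hour week assumed.
print(f"{status_share(3):.1%}")                             # 7.5%
print(f"{status_share(3, prep_minutes_per_week=120):.1%}")  # 12.5%
```

Without preparation the cadence costs 7.5% of the week; with two hours of preparation it costs 12.5%. The 8–12% band sits between those two cases.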
The reduction move is not “fewer meetings.” It is replacing status with demos. A weekly Friday demo against the eval set is a meeting that produces an artifact: the buyer sees the system improve in numbers, the agency stops billing for coordination that produces nothing. See AI Project Management and Collaboration for the cadence comparison.
Cost line 2: Scoping and RFP (4 to 7 percent)
The SOW gets written twice — once during sales, once again as “kickoff scope refinement” within the first two weeks — and rewritten at every change order. On a six-month engagement with three change orders, cumulative scoping cost is typically 4–7%.
The cause is the deterministic-software assumption that scope can be fixed at signing, an assumption AI work breaks repeatedly at the model-choice, retrieval-design, and eval-bar level.
The reduction move is to fix scope at the artifact level. Instead of “the system shall summarise Tier 1 tickets with X, Y, Z fields,” contractual scope is “the system passes Eval Set v1, v2, v3 at the threshold defined in the rubric.” The eval set evolves, the threshold evolves, but the contractual progress measure does not need rewriting. See the 7 commitments every AI dev agency should make in writing for the contractual surface.
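A minimal sketch of what "passes Eval Set vN at the threshold" can mean mechanically, assuming each eval case reduces to pass or fail under the rubric. `EvalResult` and `milestone_accepted` are hypothetical names, not the API of any particular eval framework; rubrics with graded scores would need a different aggregation.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    case_id: str
    passed: bool


def milestone_accepted(results: list[EvalResult], threshold: float) -> bool:
    """Contractual acceptance: pass rate on the eval set meets the threshold.

    The eval set and the threshold can change between versions (v1, v2, v3);
    this check does not, so the contractual wording never needs rewriting.
    """
    if not results:
        return False  # an empty eval set cannot demonstrate progress
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate >= threshold


# Hypothetical Eval Set v2: 87 of 100 cases pass.
eval_v2 = [EvalResult(f"case-{i}", i < 87) for i in range(100)]
print(milestone_accepted(eval_v2, threshold=0.87))  # True
print(milestone_accepted(eval_v2, threshold=0.90))  # False
```

The design choice is that the contract names the check, not the cases: adding cases to the eval set raises the bar without reopening the SOW.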
Cost line 3: Handoff (5 to 8 percent)
Handoff cost is the price of context loss when work moves between people. The 2018 agency model is built around handoffs: account executive to delivery PM to ML engineer to MLOps to support. Each handoff loses context proportional to the gap between sender and receiver.
GitHub’s Octoverse reports consistently show that the highest-velocity engineering teams have the fewest handoffs per change. The agency translation: engagements where the senior practitioner who scoped the work also ships it run dramatically lower handoff cost than engagements where each function is staffed separately.
The reduction move is to collapse the role stack. A senior practitioner running scoping, code, eval design, and the weekly demo eliminates four handoff boundaries. The economics only work if practitioners are senior enough to do all four credibly, which is why high-tax agencies whose business model depends on leverage (many junior staff billed under each senior) cannot adopt it.
Cost line 4: Scope-creep dispute (5 to 10 percent)
Scope-creep dispute is the most expensive line item, and it is almost entirely produced by fixed-bid pricing on AI work. Fixed-bid pushes the agency to defend the original SOW against discovered reality, converting every reality-driven scope adjustment into a contractual negotiation.
Capers Jones’ data on requirements churn, roughly 1–4% per month on stable work and higher on novel work, implies that on a six-month AI engagement 6–25% of the original scope will need to change just to stay accurate to the problem. On fixed-bid, every one of those changes becomes a negotiation, fought out in dispute meetings, partner escalations, and change-order legal time.
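The 6–25% range comes from accumulating a monthly churn rate over six months. Whether the rate applies to the original scope (linear) or to the already-changed scope (compounded) is an assumption the churn figures do not pin down, so this sketch computes both.

```python
def churn_linear(monthly_rate: float, months: int) -> float:
    """Cumulative change if each month alters a fixed share of the ORIGINAL scope."""
    return monthly_rate * months


def churn_compounded(monthly_rate: float, months: int) -> float:
    """Cumulative change if each month's churn applies to the CURRENT scope."""
    return (1 + monthly_rate) ** months - 1


for rate in (0.01, 0.04):
    print(f"{rate:.0%}/month over 6 months: "
          f"linear {churn_linear(rate, 6):.1%}, "
          f"compounded {churn_compounded(rate, 6):.1%}")
# 1%/month over 6 months: linear 6.0%, compounded 6.2%
# 4%/month over 6 months: linear 24.0%, compounded 26.5%
```

The 6–25% range sits between the two readings. Either way these rates describe stable work, so novel AI work should be expected to land higher.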
The reduction move is pricing that rewards eval-bar progress, not feature checkboxes. “System passes Eval Set v2 at 87% threshold” is testable. “System has feature X” routinely degrades into “what counts as feature X.” See hidden costs of AI development for the broader catalogue.
Cost line 5: Tool sprawl (2 to 4 percent)
A typical engagement runs status in Slack, tickets in Jira, docs in Confluence, video in Loom, design in Figma, and code review in GitHub. Atlassian’s State of Teams surveys put time lost to context-switching across these surfaces at 2–3 hours per knowledge worker per week. The reduction move is to commit, at SOW, to one source of truth per artifact: one ticket tracker, one decision log, one demo channel, one eval dashboard.
Cost line 6: Judgement debt (4 to 6 percent)
Judgement debt is the most underpriced line item and the most distinctively AI-native. It is the cost of decisions that have not yet been made, usually because the meeting cadence is full of status. Until someone makes them, the system makes them quietly in production, in the form of bad answers it should have refused.
Three deferrals recur in 2026 AI engagements:
- Eval bar is “we will define it later.” System ships, generates ten thousand low-quality answers, eval bar gets defined retroactively at higher cost.
- Human-in-the-loop boundary is “we will iterate on it.” System ships with no boundary, generates a Tier 1 incident, boundary gets defined under crisis pressure.
- Failure-mode taxonomy is “we will catalogue them as they appear.” System ships, the long tail is undocumented, customer support carries the cost of every novel failure.
Each is a deferred decision converted into production cost. The reduction move is a weekly forced-decision cadence: any decision unresolved beyond two weeks gets escalated to a 30-minute decision meeting with a written ADR (architecture decision record) as the only output. See AI development milestones and payments for tying ADR cadence to milestone acceptance.
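The two-week escalation rule is simple enough to run against a decision log. A minimal sketch, assuming each open decision records the date it was raised; `OpenDecision` and `due_for_decision_meeting` are hypothetical names, and the returned list is the agenda for the 30-minute decision meeting, each item to be closed with an ADR.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ESCALATION_AGE = timedelta(weeks=2)  # "unresolved beyond two weeks"


@dataclass
class OpenDecision:
    title: str
    opened: date
    resolved: bool = False


def due_for_decision_meeting(decisions: list[OpenDecision],
                             today: date) -> list[OpenDecision]:
    """Return unresolved decisions older than the escalation age, oldest first."""
    overdue = [d for d in decisions
               if not d.resolved and today - d.opened > ESCALATION_AGE]
    return sorted(overdue, key=lambda d: d.opened)


# Hypothetical decision log.
backlog = [
    OpenDecision("Eval bar for Tier 1 summaries", date(2026, 1, 5)),
    OpenDecision("Human-in-the-loop boundary", date(2026, 1, 19)),
    OpenDecision("Reranker choice", date(2026, 1, 2), resolved=True),
]
today = date(2026, 1, 23)
for d in due_for_decision_meeting(backlog, today):
    print(f"Escalate: {d.title} (open {(today - d.opened).days} days)")
# Escalate: Eval bar for Tier 1 summaries (open 18 days)
```

Exactly fourteen days does not trigger escalation; the rule fires only once a decision is strictly older than two weeks.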
Four moves to a 10% tax
A 10 percent coordination overhead is the realistic floor where two organisations stay aligned. Below that, alignment breaks down. Above 30 percent, the engagement is paying for the legacy operating model. Four moves bring the number from 30 to 10.
Move 1: Collapse the PMO. Replace the delivery PM layer with a senior practitioner who runs scoping, code, eval design, and the weekly demo. Compresses lines 1, 3, and 6 simultaneously. Junior-staffed agencies cannot do this because their unit economics depend on leverage.
Move 2: Ship, don’t status. Replace recurring status with a weekly Friday demo against the eval set. Highest-leverage single move; cuts line 1 by roughly two-thirds.
Move 3: Fix scope at the artifact level. Define contractual scope as eval thresholds, not feature checkboxes. Compresses lines 2 and 4, the two biggest dollar lines.
Move 4: Run an AI-native operating cadence. Three artifacts (eval set, ADR log, runbook) are committed at SOW and updated continuously. The weekly cadence is a demo on Friday, an ADR review on Monday, and async work the rest of the week.
These moves do not require new technology. They require the agency to stop running engagements on a 2018 deterministic-software template, and the buyer to stop buying engagements on that template. Both sides have to change for the math to work — the broader argument in the AI agency manifesto.
What buyers should put in the SOW
Buyers fund the legacy model by signing legacy contracts. Three additions to a standard SOW cut the tax structurally:
- Name the artifacts. Alongside the system, the agency produces an eval set, an ADR log, and a runbook, each delivered at every milestone, not just at the end.
- Make the cadence a demo. The contractual weekly meeting is a demo against the eval set, not a status report. If the practitioner cannot demo, the meeting is cancelled — the right outcome.
- Define scope at the artifact level. Milestone acceptance is “Eval Set v2 passes at threshold X,” not “feature Y is implemented.”
Three lines in the SOW cut roughly 20 percent of total engagement cost. That is not a process improvement; it is a different operating model written into the contract.
Frequently asked questions
What is the AI agency tax?
The share of an AI engagement budget that pays for coordination — status, scoping artifacts, handoffs, scope-creep, tool sprawl, and unfinished decisions — rather than for the system the buyer is paying for. On engagements run on the legacy SOW-and-PMO model it is roughly 30%.
Why is the AI agency coordination cost higher than for normal software?
AI engagements force continuous re-decisioning of model choice, retrieval design, eval bar, latency budget, and failure-mode taxonomy. The legacy model assumes scope is decided at the SOW. Every time reality contradicts the SOW the contradiction becomes coordination work — change orders, replanning, status syncs, scoping rewrites — none of which produces a system.
Where does the 30% number come from?
It is roughly the floor of a six-line decomposition: status (8–12%), scoping (4–7%), handoff (5–8%), scope-creep (5–10%), tool sprawl (2–4%), judgement debt (4–6%). The low ends sum to 28%, rounded to 30%. The figure is consistent with Capers Jones’ 25–35% coordination cost on small software teams in Software Engineering Best Practices, applied to AI engagements, which have small-team headcount but large-team coordination shape.
Can the AI agency overhead be eliminated entirely?
No. Some coordination is load-bearing. The realistic floor with an AI-native operating model is around 10%. Below that, buyer and agency stop being able to stay aligned in any meaningful sense.
How do I tell if an agency is running a low-tax operating model?
Ask three questions: who runs the engagement and does that person write code; what is the weekly cadence; what artifacts will the agency commit to produce alongside the system. “Delivery PM, Wednesday status, the system you asked for” is high-tax. “Senior practitioner, Friday demo against your eval set, eval set + ADR log + runbook” is low-tax.
Does fixed-bid pricing reduce the AI agency tax?
Usually not, often the opposite. Fixed-bid pushes the agency to defend the original scope against discovered reality, converting scope discovery into scope-creep disputes — the most expensive line. Pricing that rewards hitting eval-bar milestones outperforms pure fixed-bid for AI work.
How is judgement debt different from technical debt?
Technical debt is code or architecture you took shortcuts on, paid back in refactor time. Judgement debt is decisions you have not made yet, usually because the meeting cadence is full of status. In AI specifically, deferred decisions about the eval bar, failure modes, or the human-in-the-loop boundary become decisions the system itself makes in production — bad answers it should have refused. The cost is paid by users, not by the engineering backlog.
What should buyers put into the SOW to cut the tax?
Three additions. Name the artifacts the agency must produce — eval set, ADR log, runbook — alongside the system. Make the weekly cadence a demo, not a status. Define scope at the artifact level so “the system passes more of the eval set” is the contractual progress measure, not “the system has feature X.”
How long does it take to see the tax come down on a running engagement?
Roughly four to six weeks from the first cadence change. Replacing one weekly status meeting with one weekly demo against an eval set produces measurable change within the first month or so: practitioners feel it first, the buyer sees it as change orders become rare, and finance reports it within a quarter.
Does the 30% tax apply to internal AI teams as well?
Partially. Internal teams pay coordination cost in the same six lines, but in headcount and calendar time rather than billable hours, which makes it less visible. The structural lessons apply identically, although internal teams find it harder to renegotiate their own operating model than a buyer and an agency find it to renegotiate an external SOW.
Key takeaways
- Roughly 30% of an AI engagement budget is the coordination tax: status, scoping, handoff, scope-creep, tool sprawl, judgement debt.
- The number is conservative. High-friction enterprise engagements run closer to 45%.
- The tax is structurally produced by the legacy SOW-and-PMO model, not by AI being hard.
- The realistic floor is roughly 10%. Four moves get there: collapse the PMO, ship-don’t-status, fix scope at the artifact level, run AI-native operating cadence.
- Buyers fund the legacy model by signing legacy contracts. Three SOW additions — named artifacts, demo cadence, artifact-level scope — change the math.
The agency tax is not an inevitability. It is a contract you signed.
Arthur Wandzel