Enterprise Software 13 min read

Why "We Have 200 AI Engineers" Is the Worst Pitch an AI Agency Can Make

When an AI agency opens with “we have 200 AI engineers,” the next sentence is: “and we are going to charge you for the coordination cost of all of them.” Headcount was a credible signal in 2014. In 2026, with frontier models doing the work of multiple engineers per seat, it is a confession that the agency does not understand its own production function. A 200-person AI org has 19,900 communication pairs. A 12-person studio has 66. The ratio is roughly 300 to 1, and that is before you count that each of the 12 engineers ships with a fleet of AI agents the larger org cannot deploy without diluting its margin model.

This piece lays out the math, the case studies, and the operational reasons “we have 200 AI engineers” is the worst opener a 2026 AI agency can lead with, plus what a buyer should listen for instead.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

The Headcount Pitch Used to Work. It Doesn’t Anymore.

For two decades, “we have N engineers” was a reasonable opener. Headcount stood in for capacity, and capacity was the binding constraint. The buyer who needed 30 engineers for a year-long migration picked the agency that could field them in four weeks.

That world ended in the eighteen-month window between mid-2024 and late 2025. Three things happened at once. First, frontier models crossed from autocomplete to autonomous task execution: Anthropic’s Claude Sonnet 4.5 hit 77.2% on SWE-Bench Verified in an agent harness (Anthropic, 2025), and OpenAI’s GPT-5-class models tracked a similar curve. Second, agentic coding harnesses became boring infrastructure: Claude Code, Cursor background agents, Codex CLI, and Copilot Workspace turned a senior engineer into a tech lead managing two to four AI workers in parallel. Third, the supervision ratio inverted: a 2022 senior engineer was a worker, while a 2026 senior engineer is a supervisor.

“We have 200 AI engineers” describes an asset whose unit economics deteriorate quarter over quarter. The buyer who needed 30 engineers in 2022 needs four senior engineers, twelve agent seats, and an eval suite in 2026. Selling them headcount is selling stranded inventory.

Brooks’s Law, Restated for the AI Era

Fred Brooks’s 1975 observation has not aged a day: adding people to a late software project makes it later. The reason is communication overhead. The number of pairwise communication channels in a team of n people is n(n-1)/2. That formula is the entire problem.

Team size    Communication pairs    Ratio vs. 12-person studio
5            10                     0.15×
12           66                     1×
25           300                    4.5×
50           1,225                  18.6×
100          4,950                  75×
200          19,900                 301×

The overhead scales quadratically. A 200-person AI org has 19,834 more potential coordination conversations than a 12-person studio. It does not have 16.7 times more shipping capacity; it has roughly the same shipping capacity for any given feature, plus 19,834 additional ways for that feature to stall in a Slack thread.
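The figures above can be reproduced with a few lines of Python. This is a minimal sketch of the n(n-1)/2 arithmetic only; the function names and the `STUDIO_SIZE` constant are illustrative choices, not part of any library.

```python
def communication_pairs(n: int) -> int:
    """Pairwise communication channels in a team of n people: n(n-1)/2."""
    if n < 0:
        raise ValueError("team size must be non-negative")
    return n * (n - 1) // 2


STUDIO_SIZE = 12  # the baseline team size used throughout this article


def ratio_vs_studio(n: int) -> float:
    """How many times more pairs a team of n has than the 12-person baseline."""
    return communication_pairs(n) / communication_pairs(STUDIO_SIZE)


# Extra channels a 200-person org carries compared with the studio.
extra_pairs = communication_pairs(200) - communication_pairs(STUDIO_SIZE)

# Headcount multiple, for contrast with the pair-count multiple.
headcount_multiple = 200 / STUDIO_SIZE

for size in (5, 12, 25, 50, 100, 200):
    print(f"{size:>4}  {communication_pairs(size):>7,}  {ratio_vs_studio(size):>7.2f}x")
```

Headcount grows by a factor of about 16.7 between the two team sizes, while the pair count grows by a factor of about 300. That gap is the quadratic term at work.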

Squads, pods, and two-pizza teams attempt to fence off this quadratic explosion by structuring the org as many small communication graphs. The fence works inside engineering. It does not work across the seams: account management, design, security review, legal, the standing architecture committee. The 200-person agency has all of those seams; the 12-person studio has none.
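A rough way to see both halves of that claim is to count channels under a pod structure. The sketch below is illustrative only: the pod size of five and the rule of one channel per pod per seam function are assumptions made for the example, not measurements, and a raw channel count says nothing about how expensive each seam conversation is.

```python
def communication_pairs(n: int) -> int:
    """Pairwise communication channels in a group of n people: n(n-1)/2."""
    return n * (n - 1) // 2


def intra_pod_pairs(headcount: int, pod_size: int) -> int:
    """Pairs that remain when headcount is split into pods (last pod may be smaller)."""
    full_pods, remainder = divmod(headcount, pod_size)
    return full_pods * communication_pairs(pod_size) + communication_pairs(remainder)


HEADCOUNT = 200
POD_SIZE = 5  # assumption: a small two-pizza-style pod

# Seam functions named in the text above.
SEAM_FUNCTIONS = [
    "account management",
    "design",
    "security review",
    "legal",
    "architecture committee",
]

flat = communication_pairs(HEADCOUNT)              # everyone can talk to everyone
inside_pods = intra_pod_pairs(HEADCOUNT, POD_SIZE)  # the fence working as intended

# Assumption: each pod keeps one channel open to each seam function.
pod_count = HEADCOUNT // POD_SIZE
seam_channels = pod_count * len(SEAM_FUNCTIONS)

print(f"flat org:      {flat:,} pairs")
print(f"inside pods:   {inside_pods:,} pairs")
print(f"seam channels: {seam_channels:,} (cross-functional, outside any pod)")
```

Under these assumptions the pods cut engineering-internal pairs from 19,900 to 400, but every pod still carries channels that cross the fence. A studio with no seam functions carries none of them.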

Buyers who have lived through a Big Four AI engagement know the symptom: a four-engineer pod doing the actual work, supported by an account director, a delivery manager, a solutions architect, two principals on rotation, three offshore engineers, and a partner who shows up for steering committees. The pod ships. Everyone else is overhead. The buyer pays for all of them.

The Supervision-Ratio Collapse

The arithmetic worsens with AI tooling, because the modern unit of capacity is no longer “an engineer”; it is “an engineer plus their fleet of agents.” Anthropic reported material productivity gains pairing Claude with skilled engineers in real codebases (Anthropic, “Claude 3.5 Sonnet”). By 2026 the practice has matured: a senior engineer running Claude Code with three or four parallel agents on separate branches is normal at the small studios shipping real work.

The leverage does not distribute evenly across team sizes, for two reasons.

Supervision quality degrades with seniority dilution. Running three agents in parallel requires reading three diverging branches, judging which is correct, and merging. A 12-person senior studio can deploy 36 to 48 agent workers with negligible review-quality loss. A 200-person agency where the median engineer is a mid-level five years out of school cannot; the supervision floor is too low to catch subtle correctness errors before production.

AI tools eat the work that headcount monetized. A 200-person agency built its margin model on billing mid-level engineers at senior rates. Mid-level work is exactly what AI tools replaced most thoroughly. The senior engineers (partners and architects) are the smallest fraction of the org and are not on the project full-time. The 12-person studio is only senior engineers; the AI fleet replaces mid-level work the studio rarely had.

In a 12-person studio, every engineer supervises AI agents. In a 200-person agency, a small fraction supervises and the rest compete with them.

Where The AI Leverage Sits

The agency-category claim that AI tools lift all boats is not supported by the data. GitHub’s Octoverse 2025 reported small teams (under 25 contributors) saw substantially higher per-developer code-shipping growth than large teams when AI tools were introduced. Small teams compound the leverage; large teams dissipate it through the same coordination overhead Brooks described in 1975.

Four reasons the leverage concentrates at small scale:

  1. Tool adoption is cultural, and small teams move on culture in days. A 12-person studio standardizes on Claude Code Tuesday and ships with it Friday. A 200-person agency runs procurement, security, and a tooling-standards committee through a six-week pilot first.
  2. Small teams own their evals. A 12-person studio writes the eval suite, ties it to CI, and trusts it. A 200-person agency outsources eval design to a QA practice that has not pair-programmed with the model.
  3. Senior engineers read agent output critically; mid-level engineers cannot. A senior engineer spots a subtly broken concurrency pattern in a Claude branch in 90 seconds. A second-year engineer cannot. The 12-person studio is staffed entirely with the former.
  4. Small teams write less code on purpose. AI leverage shows up first as fewer lines of code per feature shipped. Small teams converge on this. Large teams, billing by engineer-month, structurally cannot.
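To make “ties it to CI” in point 2 concrete, here is a minimal sketch of an eval gate. Everything in it is hypothetical: the exact-match scoring, the toy cases, the stand-in model, and the 90% threshold are placeholders, not a description of any particular studio’s suite.

```python
from typing import Callable, Sequence


def pass_rate(cases: Sequence[tuple[str, str]], predict: Callable[[str], str]) -> float:
    """Fraction of (prompt, expected) cases where predict(prompt) equals expected."""
    if not cases:
        raise ValueError("eval suite has no cases")
    passed = sum(1 for prompt, expected in cases if predict(prompt) == expected)
    return passed / len(cases)


def gate(rate: float, threshold: float) -> int:
    """Exit code for CI: 0 if the suite meets the threshold, 1 otherwise."""
    return 0 if rate >= threshold else 1


# Hypothetical eval cases and a toy stand-in for the system under test.
CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
]

TOY_ANSWERS = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; gets one of the three cases wrong."""
    return TOY_ANSWERS.get(prompt, "")


rate = pass_rate(CASES, toy_model)
exit_code = gate(rate, threshold=0.90)  # hypothetical shipping threshold
print(f"pass rate {rate:.1%}, exit code {exit_code}")
```

In a real pipeline the script would end by passing `exit_code` to `sys.exit`, so a drop below the threshold fails the build instead of landing in a report nobody reads.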

What Small Teams Are Already Shipping

Small AI-native studios ship work that buyers a year ago would have scoped to 30-person teams for nine months. A 10-engineer agency routinely ships the following in a quarter:

  • A production retrieval system over a six-million-document corpus, with an eval suite in CI, p95 under 600ms, and a documented dollars-per-query model.
  • A multi-step agent for a regulated workflow (claims triage, contract redlining, support escalation) with measured AI-with-human accuracy and a postmortem trail.
  • A model-router abstraction that lets the buyer swap providers without rewriting application code, plus a cross-provider benchmarking harness.
  • A migration from a 2024 prompt-engineering scaffold to a 2026 agent harness, with documented before-and-after evals.

The same work scoped to a 200-person agency costs three to five times as much, takes four to six months longer, and ships with a deck instead of a dashboard. The margin model requires billing many engineers, which requires finding work for them, which requires expanding scope, which requires adding coordinators. The spiral compounds. We cover the same dynamic in Inside the AI agency operating system: how a 12-person studio out-ships a 50-person team.

The Margin Model Hidden In The Headcount Pitch

The headcount pitch is also a financial tell. A 200-person AI agency has a fixed cost base: senior partners, mid-level engineers, account managers, office, recruiting, partner draws. That base is covered monthly by selling engineer-months at a markup. The revenue model requires finding a buyer for those engineer-months whether or not the work needs them.

That is why the pitch is structurally dishonest in 2026: it describes the agency’s inventory, not the buyer’s need. Anthropic and OpenAI pricing has collapsed the cost of a unit of engineering work by roughly an order of magnitude in two years. The 200-person agency cannot pass that compression through without imploding its P&L. So it doesn’t; it bills as if 2022 ratios still hold.

The 12-person studio has no partner-draw overhead and no inventory pressure. It bills the new unit economics directly: fewer engineers, more leverage, lower price, higher margin per shipped feature.

The deeper version lives in Small AI agency vs. Large firm: which one ships better AI in 2026?, and the anti-headcount thesis is the spine of The AI Agency Manifesto: What an AI Dev Partner Should Be in 2026.

What To Listen For Instead

Listen for the agency to volunteer five things, ideally without prompting:

  1. Team size on this engagement, by name and seniority. Not firm headcount. Not the bench. The humans who will commit code. If the number is over seven, ask what the others do.
  2. Supervision ratio. How many AI agents does each senior engineer run concurrently? If the agency has not measured this, the agents are running them.
  3. Eval suite ownership. Who writes the evals, where do they live, what threshold did the last project ship against? An agency that cannot answer in 30 seconds has not internalized the practice.
  4. Migration plan to the buyer’s in-house team. Honest agencies plan for their own replacement. Headcount agencies cannot afford to.
  5. Line-item breakdown. Inference, engineering, account management, partner draws, infrastructure. If engineering is under 60% of the bill, the buyer is buying a coordination tax.
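The 60% check in point 5 is simple enough to run on any quote. The sketch below uses a made-up monthly bill; the dollar amounts are hypothetical and only the line-item categories and the 60% threshold come from the list above.

```python
ENGINEERING_FLOOR = 0.60  # threshold from point 5 above


def engineering_share(line_items: dict[str, float]) -> float:
    """Engineering spend as a fraction of the total bill."""
    total = sum(line_items.values())
    if total <= 0:
        raise ValueError("bill total must be positive")
    return line_items.get("engineering", 0.0) / total


# Hypothetical monthly bill in USD; the amounts are illustrative only.
quote = {
    "inference": 8_000,
    "engineering": 55_000,
    "account_management": 15_000,
    "partner_draws": 17_000,
    "infrastructure": 5_000,
}

share = engineering_share(quote)
pays_coordination_tax = share < ENGINEERING_FLOOR

print(f"engineering share: {share:.0%}")
print("coordination tax" if pays_coordination_tax else "engineering-led bill")
```

On this example bill, engineering is 55% of the total, so the quote falls on the wrong side of the line.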

None of these questions is about firm size. All of them are about how the agency converts buyer money into shipped, eval-protected, model-agnostic code.

Frequently Asked Questions

Why is “200 AI engineers” specifically the wrong number to pitch?

Any headcount-led pitch is wrong in 2026 because it sells inventory, not capability. 200 is a particularly bad number: large enough that quadratic communication overhead dominates, small enough that the firm has not built the offshoring economics that would justify the headcount. A 200-engineer AI org is usually a 12-person studio’s worth of senior shipping capacity surrounded by 188 people keeping the billing model running.

Isn’t a bigger team safer for risk-averse procurement?

No. Bigger teams correlate with longer engagements, more change orders, and higher coordination overhead, all of which are themselves risks. The risk-aversion argument was credible when the binding constraint was “can the vendor staff this at all?” In 2026 the constraint is “can the vendor ship eval-protected production code without dragging the buyer’s org into 19,900 communication pairs?” The smaller senior team is the lower-risk option on every axis except the procurement-checkbox one.

Don’t large agencies have AI tools too?

They have the licenses. Licenses are not the leverage. Leverage comes from a culture of senior engineers running agent fleets, owning evals, and integrating into the buyer’s environment. That culture exists in 12-person studios because it is how they ship. It does not exist in 200-person agencies because the org chart, margin model, and seniority distribution all push against it.

How does Brooks’s law apply when agents do some of the coding?

It applies more strongly, not less. Brooks’s law is about coordinating humans. Adding AI agents to a small senior team does not add humans, so the n(n-1)/2 cost stays low. Adding agents to a large agency adds nothing useful because the bottleneck was rarely coding capacity; it was supervision, integration, and review, all of which still require human coordination.

What is the right team size for an AI development engagement in 2026?

For most enterprise pilots: two to five senior engineers, a contributing tech lead, an embedded designer for client-facing work, zero account managers. Six total is the median for a quarter-long pilot that ships. Over eight starts paying coordination tax; over twelve is a different engagement.

What about agencies that pitch “200 AI engineers” but staff small pods?

Ask which pod is on this project, by name, and what the other 188 engineers bill against. If the answer is “they’re on other engagements,” the buyer is hiring a 6-person pod inside a 200-person cost structure, and the cost structure shows up in the rate card. The honest pitch is “we have a 6-person pod and we’ll bill you for 6 people.”

Doesn’t the 200-engineer agency have more specialty depth?

Specialty depth is now cheap to access ad hoc. A 12-person studio can subcontract a security review, an LLM red-team, or a compliance lawyer for a week each, for a fraction of what carrying those specialties on the bench costs. The “everyone in-house” pitch is a holdover from when specialty hiring was slow.

How can I tell if an agency is small-and-senior?

Three checks. Ask partners how many lines of code they committed in the last 90 days; AI-native partners answer in seconds. Ask to see a recent eval suite, in a real repo, with thresholds and CI. Ask who is on call at 3 a.m.; a small senior team has a name; a 200-person agency has a process. The full vetting playbook is in our field guide to evaluating an AI agency in under 90 minutes.

Is “we have 200 AI engineers” ever reasonable?

Narrowly: for a true platform engagement with parallel workstreams, where the buyer has program-management capacity to absorb the coordination cost, and where the rate card reflects AI-era unit economics. Roughly one in 50.

Closing

The headcount pitch is the last credible 2022-era artifact in active use. Frontier models compressed engineering work by an order of magnitude. Agent harnesses inverted the supervision ratio. Brooks’s law did not soften; it became more expensive, because every additional human now competes with three or four AI workers for the senior engineer’s attention.

A buyer in 2026 does not need 200 AI engineers. The buyer needs four to six senior engineers running an agent fleet, owning the evals, integrating into the buyer’s environment, and billing the new unit economics directly. The 12-person studio wins because small is the only shape that converts the leverage into shipped code without dissipating it through 19,900 coordination pairs.

When the next vendor opens with a headcount number, divide by twelve and ask them to explain the gap. The honest ones will. The others will pivot to logos.

Arthur Wandzel, CEO, SFAI Labs

Last Updated: May 21, 2026
