More than half of the AI development agencies operating in May 2026 will be closed, sold at distress prices, or pivoted out of agency work by Q4 2027. This is a structural call, not a recession call. Five compounding pressures — LLM commoditization, model unification, agent-template economics, the foundation-model gateway shift, and a senior-engineer talent reset — are arriving simultaneously, on the same cost structure, in the same 18-month window.
The pattern is not new. Web agencies consolidated between 2008 and 2014. Mobile agencies consolidated between 2014 and 2019. Each cycle followed Clayton Christensen’s disruption shape almost verbatim, and the AI cycle is running roughly 3x faster than either. The instructive question is no longer whether the market compresses. It is which archetypes survive, which do not, and what an AI agency founder should do in the next 90 days. This is a contrarian read on a market most people are still calling “early innings.”
The thesis in one paragraph
The default AI agency in 2026, a 10–30 person team selling discovery, prototyping, and integration work across any vertical, was priced for a market structure that no longer exists. Token prices have collapsed roughly 100x in 24 months. Frontier labs ship enterprise-grade products into the same buying centers the agency tier sells into. Open-source agent templates eliminate the 60% of agency work that was undifferentiated scaffolding. Hyperscaler gateways absorb procurement at the platform layer. Senior AI engineers, the agency’s only durable margin lever, are being bid away by labs and product companies. None of these forces is fatal alone. Together, on a 12–24 month clock, they remove the conditions that made the generalist mid-tier agency viable. What survives is what an AI agency should have been all along, a discussion taken up in the AI Agency Manifesto.
Consolidation is not new, but the clock is faster
Services-firm consolidation is one of the most studied phenomena in modern strategy. Clayton Christensen, Dina Wang, and Derek van Bever’s Consulting on the Cusp of Disruption (HBR, October 2013) maps the pattern: bottom-of-market commodity work automates first, top-of-market relationship work survives on access and trust, and the middle gets crushed.
| Cycle | Boom | Compression onset | Steady state | Duration |
|---|---|---|---|---|
| Web agencies | 1998–2007 | 2008 | 2014 | ~6 years |
| Mobile agencies | 2010–2013 | 2014 | 2019 | ~5 years |
| Data/ML agencies | 2014–2018 | 2019 | 2022 | ~3 years |
| AI/LLM agencies | 2023–2025 | 2026 | 2027–2028 | 12–24 months |
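The durations in the table can be compared directly. The sketch below is only a back-of-envelope check of the "roughly 3x faster" framing; the year values are read off the table, and the AI/LLM row is a forecast range, not an observed duration.

```python
# Compression durations in years, read off the table above.
# The AI/LLM figure is a forecast range (12-24 months), not an observation.
PRIOR_CYCLES = {"web": 6.0, "mobile": 5.0, "data_ml": 3.0}
AI_RANGE_YEARS = (1.0, 2.0)  # (fast case, slow case)


def speedup(prior_years: float, ai_years: float) -> float:
    """How many times shorter the AI compression is than a prior cycle."""
    return prior_years / ai_years


for name, years in PRIOR_CYCLES.items():
    slow = speedup(years, AI_RANGE_YEARS[1])  # AI cycle takes the full 24 months
    fast = speedup(years, AI_RANGE_YEARS[0])  # AI cycle takes only 12 months
    print(f"{name}: {slow:.1f}x to {fast:.1f}x faster")
```

On these numbers the "roughly 3x" figure corresponds to the slow end of the forecast (24 months) against the web and mobile cycles; a 12-month compression would be closer to 5x or 6x.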
The compression is faster because the pressures arrive faster. Foundation-model price/performance moves on a 6–9 month cadence rather than the 24–36 month enterprise-software cadence that governed prior cycles. Bain’s Global Technology Report 2024 and McKinsey’s State of AI 2024 both document a step-change in enterprise GenAI adoption — McKinsey reports 65% of organizations using generative AI regularly in 2024, roughly double 2023. Adoption is up; agency-tier capture of that adoption is down. Layoffs.fyi tracked roughly 263,000 tech layoffs in 2023 and 152,000+ in 2024, a meaningful share of them senior AI/ML engineers being absorbed by labs and product companies — the exact talent layer agencies depend on.
Pressure 1: LLM commoditization
The price of frontier-class model output has fallen roughly 100x in 24 months. GPT-4 launched in March 2023 at $30 input / $60 output per million tokens. By late 2024, frontier-class small models (Claude 3.5 Haiku, GPT-4o mini, Gemini 1.5 Flash) priced between $0.15 and $0.50 per million tokens for comparable quality on most production tasks. By mid-2025, sub-$0.10/M-token tiers were normal.
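The "roughly 100x" figure can be sanity-checked from the prices quoted in the paragraph above. This is a rough sketch, assuming the comparison is made against GPT-4's launch input price; the paragraph does not say whether the later figures are input or output prices.

```python
# Prices in USD per million tokens, taken from the paragraph above.
GPT4_LAUNCH_INPUT = 30.00         # March 2023 (assumed basis for comparison)
SMALL_MODEL_RANGE = (0.15, 0.50)  # late 2024, low and high end
MID_2025_TIER = 0.10              # upper bound of the "sub-$0.10" tier


def fold_reduction(old_price: float, new_price: float) -> float:
    """Return how many times cheaper new_price is than old_price."""
    return old_price / new_price


low_fold = fold_reduction(GPT4_LAUNCH_INPUT, SMALL_MODEL_RANGE[1])   # vs $0.50
high_fold = fold_reduction(GPT4_LAUNCH_INPUT, SMALL_MODEL_RANGE[0])  # vs $0.15
mid_2025_fold = fold_reduction(GPT4_LAUNCH_INPUT, MID_2025_TIER)     # vs $0.10

print(f"late 2024: {low_fold:.0f}x to {high_fold:.0f}x cheaper")
print(f"mid-2025:  at least {mid_2025_fold:.0f}x cheaper")
```

Under that assumption the late-2024 range works out to about 60x to 200x, which brackets the 100x headline number.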
Two consequences for agencies. First, model usage is no longer a line item the customer needs help optimizing — the “we’ll cut your inference bill 70%” pitch was real in 2023 and is now too small to justify a consultant. Second, the architectures agencies charged for (prompt caching, semantic routing, prompt compression, multi-model fallback, structured outputs) are now SDK flags in the Anthropic, OpenAI, and Google SDKs. When the underlying input commoditizes, the value moves up the stack. Agencies whose deliverable was effectively “a working integration with a foundation model” are now selling a generic, and generic services do not sustain $250–$400/hour blended rates.
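To give a sense of how thin one of these layers is, here is a minimal sketch of multi-model fallback. It is illustrative only: `complete_with_fallback`, `flaky_primary`, and `stable_secondary` are hypothetical names, the providers are plain Python callables, and no vendor SDK is called or implied.

```python
from typing import Callable, Sequence

# A "provider" is any callable that takes a prompt and returns text.
# Real SDK clients would have to be wrapped to fit this shape.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for two model backends.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
print(used, text)
```

A production version would add timeouts, retries, and logging, but the core pattern is a loop and a try/except, which is why it is hard to bill as a standalone deliverable.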
Pressure 2: Model unification
In 2023–2024, multi-model orchestration was a real engineering problem. Each lab had distinct strengths and incompatible APIs. Agencies built routing layers, eval frameworks, and abstraction libraries on top of the divergence, and billed for them.
By 2025 the divergence was closing on three fronts: capability surface, API surface, and tool-use surface. OpenAI, Anthropic, and Google all ship long context, multimodal input (including vision), native tool use, and structured output in their flagship models. The OpenAI-style chat completions schema is the de facto interchange format. Agent frameworks (LangGraph, OpenAI Agents SDK) normalize orchestration. Abstraction libraries have converged on a small set (LiteLLM, Vercel AI SDK, providers’ own SDKs). Buyers do not need an agency to build a fourth one. The “AI orchestration platform” pitch, a major agency revenue source through 2024, has lost most of its premium. For deeper context, see the 7 commitments every AI dev agency should make in writing.
Pressure 3: Agent-template economics
A representative AI agency in 2024 sold roughly the same shape of project repeatedly: a chat interface, a retrieval layer over customer documents, an agent loop with three to seven tools, eval harnesses, deployment scripts, observability. Inside the firm this was scaffolding. Outside it was billed as a $250K–$400K bespoke engagement.
By 2026 most of that scaffolding is templated and free. OpenAI Agents SDK ships chat, tools, retrieval, and tracing as a single primitive. LangGraph templates cover the common multi-agent patterns end-to-end. Anthropic Skills and Computer Use ship complete agentic workflow primitives. Vercel AI SDK ships streaming UI, tool-use UX, and evaluation. Microsoft Copilot Studio ships a no-code agentic builder for Office-tenant customers. The marginal effort to stand up a competent agent on customer data has dropped from “8 weeks of two engineers” to “2 days of one engineer plus a Friday afternoon eval pass.” Buyers have essentially zero willingness to pay for the 60% of agency work that was scaffolding. The remaining 40% (genuine domain integration, evaluation, security, and operations) does not support a 30-person firm at 2024 pricing.
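The arithmetic behind that paragraph can be worked through in a few lines. This is a sketch under stated assumptions: a 5-day work week, the "Friday afternoon" counted as half a day, and the 60/40 split applied directly to the $250K–$400K engagement price quoted earlier (the text states the split for work, not for price).

```python
WORKDAYS_PER_WEEK = 5  # assumption

# "8 weeks of two engineers" versus "2 days of one engineer plus a Friday afternoon"
before_days = 8 * WORKDAYS_PER_WEEK * 2  # engineer-days
after_days = 2 * 1 + 0.5                 # half a day for the eval pass (assumption)
effort_ratio = before_days / after_days

# Apply the 40% differentiated share to the engagement price range (assumption).
ENGAGEMENT_RANGE = (250_000, 400_000)
DIFFERENTIATED_SHARE = 0.40
billable_range = tuple(round(p * DIFFERENTIATED_SHARE) for p in ENGAGEMENT_RANGE)

print(f"effort: {before_days} -> {after_days} engineer-days ({effort_ratio:.0f}x less)")
print(f"defensible engagement value: ${billable_range[0]:,} to ${billable_range[1]:,}")
```

On these assumptions, scaffolding effort falls by about 32x, and the part of a typical engagement that still commands a price is roughly $100K to $160K.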
Pressure 4: The foundation-model gateway shift
Procurement is moving directly to the labs and to hyperscaler-managed AI platforms. Three observable patterns:
- Direct enterprise contracts (OpenAI, Anthropic, Google Gemini Enterprise) that bypass the systems-integrator and agency tiers.
- Hyperscaler managed platforms (AWS Bedrock, Azure AI Foundry, Google Vertex AI) that bundle multi-model access, retrieval, agent orchestration, evaluation, and governance behind a single contract a CIO already has signed.
- First-party products from the labs themselves: ChatGPT Enterprise, Projects, GPTs, Assistants, Agents Builder; Claude Projects, Skills, Files, Computer Use; Gemini Enterprise, NotebookLM, Code Assist; Copilot Studio, Foundry. Each occupies space the agency tier was selling into 6–12 months earlier.
A foundation lab building first-party products that compete with the agencies it nominally partners with is the textbook channel conflict signal that ends services-cycle expansion. It is what Salesforce did to its early ISVs in 2010–2014 and what AWS did to its launch-partner consultancies in 2017–2020. For agencies whose value proposition was “we’ll integrate a foundation model into your stack,” the gateway has moved. For how this changes the in-house decision, see build AI in-house vs. outsource.
Pressure 5: The senior-engineer talent reset
Through 2023, the binding constraint on AI agency growth was senior engineers. The talent layer has reset on three fronts. Frontier labs and product companies have bid up the senior-engineer floor — OpenAI, Anthropic, Google DeepMind, Meta, xAI, and a long tail of well-funded startups now pay total comp between $700K and $5M+ for engineers an agency would have hired at $250K–$400K in 2023. Coding agents (Claude Code, Cursor, GitHub Copilot Workspace, Devin) have roughly doubled to quadrupled senior throughput on integration work while compressing junior marginal productivity, moving the workable senior-to-junior ratio from 1:4 to 1:1–2. And Layoffs.fyi shows displaced senior AI/ML engineers being absorbed by product companies and labs, not by agencies — net flow is out of the agency tier.
This is the most existential of the five pressures because it removes the agency’s only durable margin lever. An agency cannot win a price war against a $0.05 token, but it could historically win on senior engineering. The talent layer used to be a moat. It is now a bidding contest the agency cannot win. For staffing implications, see AI development agency vs. in-house team.
Which agency archetypes survive
Five archetypes are well-positioned to grow through the compression:
- Vertical specialists with proprietary data or distribution. A firm with exclusive customer access in a regulated vertical (healthcare claims, legal discovery, energy trading, defense) trades on relationships and data assets, not generic AI engineering. The labs have neither the access nor the risk appetite to compete here directly.
- Embedded operating partners. Fractional CTO, Head of AI, or Head of Applied AI engagements at a flat retainer for 6–24 months. The value is judgment, not throughput.
- Original-product agencies. Firms using services revenue to fund a wedge product that will eventually replace services as primary revenue — the Basecamp / 37signals model applied to AI.
- Compliance and regulated-industry boutiques. HIPAA, FedRAMP, SOC 2 Type II, ISO 42001, EU AI Act, NIST AI RMF, OCC model risk management. The buyer cannot replace them with an SDK call, and the labs cannot replace them without owning the audit posture.
- Small top-tier teams (3–7 senior engineers, $1M+ ARR per head) doing frontier-difficulty work. Long-context document automation, novel agent architectures, multi-modal applications, custom post-training, on-device inference. They compete on bench depth, and their pricing looks like product pricing even when they bill as services.
What unifies all five is that none of them sells generic AI engineering. Each sells a domain asset, a trust-and-judgment relationship, productized IP, regulatory posture, or frontier-difficulty depth, and none of these is on the foundation lab’s six-month roadmap.
Which agency archetypes do not survive
Three archetypes face the worst exposure:
- Generalist mid-tier agencies (10–30 people, all-vertical, all-stack, $250–$400/hour blended). This is the modal AI agency in 2026 and the most exposed quadrant; the “we do AI for any vertical” pitch has no defensible position against any of the five pressures.
- Pure prompt-engineering shops. The category never had a moat, and as frontier models need less hand-tuned prompting, most bespoke prompt-engineering work has disappeared.
- Reseller-plus-glue agencies. These are firms whose value was being a channel for OpenAI / Anthropic / Azure OpenAI plus thin integration; the labs and hyperscalers now sell direct, the integration is templated, and the channel margin has compressed past viability.
Gut check for any specific firm: are the foundation labs visibly competing with it from above (first-party products), the hyperscalers from below (managed platforms), and open-source templates from the side (free scaffolding)? If yes on all three, the firm is in the squeezed middle. For supply- and demand-side mapping, see the 12-person studio operating model and how to choose an AI development agency.
What an AI agency founder should do in the next 90 days
Three forcing decisions:
- Pick one of the five surviving archetypes and commit: vertical, embedded partner, product-funded, compliance, or frontier-difficulty. The middle is the only position that is definitively gone.
- Productize the recurring 60%. Anything built more than three times (eval harnesses, RAG scaffolding, agent boilerplate, deployment templates) becomes an internal product or open-source artifact, not a billable workstream.
- Choose smaller-and-premium or larger-and-GTM, and avoid the middle. Either go to 5 senior engineers at $1M+ ARR per head doing frontier work, or scale to 80+ with a real GTM motion, productized IP, and a vertical wedge. The 10–30 person all-purpose agency has the worst structural odds in the consolidation.
This is unfashionable advice in May 2026 because the headline is still “AI services market booming.” Both can be true. Total enterprise AI spend is up. Agency-tier capture of that spend is being structurally reallocated to foundation labs, hyperscaler platforms, and a small set of differentiated specialists. The AI agency category is not going away. The default 2024 version of it is.
Frequently asked questions
Will most AI agencies really not survive the next 18 months?
Yes, directionally. More than half of the AI development agencies operating in May 2026 will be closed, sold at distress prices, or pivoted out of agency work by Q4 2027. The five structural pressures (LLM commoditization, model unification, agent-template economics, foundation-model gateway shift, talent reset) compound on the same cost structure simultaneously. Generalist mid-tier agencies are the most exposed; vertical specialists, embedded partners, and product-company hybrids will grow.
What is LLM commoditization and why does it matter for agencies?
Price-per-token for frontier-quality model output has fallen roughly 100x in 24 months — from $30/$60 per million tokens for GPT-4 in March 2023 to $0.15–$0.50 for frontier-class small models in late 2024 and sub-$0.10/M-token by mid-2025. Model usage is no longer a line item the customer needs help optimizing, and the architectures (caching, routing, compression) that agencies charged for are now SDK flags.
Is this just like the web or mobile agency consolidation?
Structurally yes, but roughly 3x faster. Web agency consolidation took roughly 6 years (2008–2014). Mobile agencies took roughly 5 years (2014–2019). The AI version is unfolding on a 12–24 month clock. The pattern is identical: bottom commodity work automates first, top survives on relationships and bespoke depth, middle gets crushed. HBR’s “Consulting on the Cusp of Disruption” (Christensen, Wang, van Bever, October 2013) applies directly.
Which AI agencies are safe?
Five archetypes are well-positioned: vertical specialists with proprietary data or distribution; embedded operating partners (fractional CTO/Head-of-AI); original-product agencies using services revenue to fund a wedge product; compliance and regulated-industry boutiques (HIPAA, FedRAMP, EU AI Act, NIST AI RMF); and small top-tier teams (3–7 senior engineers, $1M+ ARR per head) doing frontier-difficulty work.
Which AI agencies are most exposed?
Three groups: generalist mid-tier agencies (10–30 people, all-vertical, all-stack); pure prompt-engineering shops; and reseller-plus-glue agencies whose value was integrating OpenAI / Anthropic / Azure OpenAI into customer stacks. The first group is the modal AI agency today and the most exposed quadrant.
Won’t enterprise demand for AI keep the market growing?
Enterprise AI demand is growing — McKinsey’s State of AI 2024 reports 65% of organizations using generative AI regularly, roughly double 2023. But procurement gravity has shifted toward direct foundation-model relationships and hyperscaler managed services (Bedrock, Azure AI Foundry, Vertex AI). Total AI spend is up; agency-tier capture of that spend is down. The two facts are connected, not contradictory.
What should an AI agency founder do in the next 90 days?
Audit which of the five surviving archetypes your firm is closest to and commit to one. Identify the recurring 60% of your work that isn’t differentiated IP and start productizing it. Decide whether you are getting smaller (3–7 senior engineers, premium billing) or larger (80+ with real GTM and a vertical wedge). Avoid the middle.
Are foundation-model labs really competing with their channel partners?
Observably, yes. OpenAI ships ChatGPT Enterprise, Projects, GPTs, Assistants, and Agents Builder. Anthropic ships Claude Projects, Skills, Files, and Computer Use. Google ships Gemini Enterprise, NotebookLM, and Code Assist. Microsoft ships Copilot Studio and Foundry. Each occupies space the agency tier was selling into 6–12 months earlier. Agencies that reposition above the lab’s product line will survive; agencies that don’t will be selling against their own supplier.
Will services-firm consolidation look like roll-ups, or attrition?
Mostly attrition, with a small layer of acqui-hires. Historical services-consolidation cycles (Bain Global Technology Report, McKinsey services studies) show formal M&A accounts for roughly 15–25% of firm exits during compression phases. The rest is quiet attrition: founders return to operating roles, contractors disperse, customers in-source the work, employees take direct roles at labs and product companies. Expect the AI cycle to follow this shape, accelerated by foundation-lab acqui-hires.
What is the single best leading indicator that compression has started for a specific agency?
Net senior-engineer flow. If the firm is losing senior engineers to labs and product companies faster than it is replacing them, and replacement candidates are demanding 1.5–3x the comp the firm offered 18 months ago, the compression has already started for that firm regardless of pipeline strength. Pipeline is a lagging indicator. Senior-engineer flow is a leading one.
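The indicator described above can be written down as a simple check. This is a hypothetical sketch: the `SeniorFlow` fields and the `compression_started` function are illustrative names, and the 1.5x default threshold is taken from the low end of the 1.5–3x range in the answer.

```python
from dataclasses import dataclass


@dataclass
class SeniorFlow:
    """Senior-engineer movement over a trailing window (illustrative fields)."""
    departures: int               # seniors lost to labs and product companies
    hires: int                    # senior replacements actually hired
    offered_comp_18mo_ago: float  # what the firm offered 18 months ago
    asked_comp_now: float         # what replacement candidates ask today


def compression_started(flow: SeniorFlow, comp_multiple_threshold: float = 1.5) -> bool:
    """True when seniors leave faster than they are replaced AND
    replacement comp asks have risen past the threshold multiple."""
    net_outflow = flow.departures > flow.hires
    comp_multiple = flow.asked_comp_now / flow.offered_comp_18mo_ago
    return net_outflow and comp_multiple >= comp_multiple_threshold


# Example: 4 departures, 2 hires, asks up from $300K to $500K (about 1.67x).
example = SeniorFlow(departures=4, hires=2,
                     offered_comp_18mo_ago=300_000, asked_comp_now=500_000)
print(compression_started(example))
```

Both conditions have to hold together, mirroring the "and" in the answer: net outflow alone, or rising comp asks alone, does not trip the check.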
Arthur Wandzel