For ten years it was clean: an agency sold time and expertise to build software for clients, and a product studio built its own software and sold licenses, seats, or usage. Different income statements, different incentives, different exits. In May 2026 that distinction has stopped describing reality. The two operating models are converging on the same shape: a small senior team, agent-leveraged delivery, eval-first engineering, and recurring revenue from a handful of accounts that looks indistinguishable from ARR. The firms that survive the next two years will be the ones running a hybrid that explicitly funds product R&D out of services revenue.
This is a structural argument, not a stylistic one. The cost of building software has fallen far enough that the make-vs-buy line between an agency and a studio has lost its economic meaning. What is left is one operating model with two go-to-market faces. The buyer’s job is no longer to pick agency or studio. It is to read which hybrid the firm in front of them runs.
Decision Scope
This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.
Two definitions, then the collision
Before arguing they have merged, the two models need clean definitions, because most of the disagreement in this debate is people using the words to mean different things.
An AI agency sells time and expertise to build software for clients. The customer owns the code and IP. Revenue is project- or retainer-based. Margin is the spread between blended billable rate and fully loaded cost. The unit is the engagement; the asset is the team.
An AI product studio builds its own software and sells licenses, seats, usage, or outcomes. The studio owns the code and IP. Revenue is recurring. Margin is the spread between subscription price and incremental cost-to-serve. The unit is the product; the asset is the codebase. Studios funding early development out of services revenue, in the Basecamp / 37signals shape, are still studios; what defines them is that the eventual primary revenue is product, not services.
These were genuinely different businesses. Different cap tables, different pitches, different hires, different boards. The collision is not that the labels have lost meaning. It is that on the day-to-day operating substrate (how engineers spend their hours, what gets shipped, which evals run, what customers pay for) the two shapes now look almost identical. Four forces are doing the work.
Agent leverage erases the dev-cost differential
The first force is the simplest. The cost to build a feature has collapsed.
A senior engineer in 2026, paired with a competent coding agent (Claude Code, Cursor, Devin, or an internal stack), ships 2–4x the integration-grade code per day they shipped in 2023. Junior throughput compresses, because review-and-correction overhead dominates net contribution. The workable senior-to-junior ratio has moved from 1:4 to 1:1–2 across the firms I work with. The headline is not the multiplier; it is what the multiplier does to the cost of building.
In 2023, an agency engagement for a non-trivial AI workflow ran $250K–$400K and 8–14 weeks. A studio building the same surface area as a v1 product spent comparable money. The make-vs-buy economics were a toss-up. In 2026 the same surface area takes 2–4 weeks of one senior plus an agent, and the engagement number drops toward $60K–$120K. The studio’s v1 build cost drops by the same factor. What used to be a six-figure decision (“hire an agency, or fund my own studio?”) has compressed into a question about who keeps the IP, not about cost. Once you remove the cost differential, agency and studio start running the same playbook, staffed by the same people doing the same work. For what that does to throughput economics, see the 12-person studio operating model.
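A quick arithmetic check on the figures above, as a sketch. It assumes the low end of each price range pairs with the low end of each duration range (and high with high), which the text does not state. Under that assumption the implied weekly price barely moves between 2023 and 2026; what shrinks is the number of weeks.

```python
# Illustrative figures from the paragraph above. Pairing low price with short duration
# (and high with long) is an assumption made for this sketch, not a sourced figure.
ENGAGEMENTS = {
    "2023 low": (250_000, 8),
    "2023 high": (400_000, 14),
    "2026 low": (60_000, 2),
    "2026 high": (120_000, 4),
}


def weekly_rate(price_usd: float, weeks: float) -> float:
    """Implied engagement price per calendar week."""
    return price_usd / weeks


def compression(old_price: float, new_price: float) -> float:
    """How many times cheaper the same surface area has become."""
    return old_price / new_price


for label, (price, weeks) in ENGAGEMENTS.items():
    print(f"{label}: ${weekly_rate(price, weeks):,.0f}/week")

print(f"low-end compression: {compression(250_000, 60_000):.1f}x")
print(f"high-end compression: {compression(400_000, 120_000):.1f}x")
```

All four cases land near $30K per week, so under this pairing the headline drop is roughly 3–4x in total price with close to flat weekly pricing.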
Eval discipline is the same discipline
The second force is that the engineering discipline separating good AI work from bad AI work (evaluations, regression suites, behavior contracts, red-team harnesses, golden datasets, judge models, traffic shadowing) does not vary by business model. It is the same set of practices whether the consumer is one client or ten thousand SaaS users.
That is new. In a 2018 web-agency vs SaaS-startup comparison, the studio had the harder engineering job by a wide margin: load testing, multi-tenant isolation, billing, observability for thousands of concurrent users. The agency could deliver something that worked on a demo Zoom call and call it shipped.
In 2026 AI work, the agency cannot. A non-deterministic, tool-using, sometimes-hallucinating system fails the same way for a 1-tenant deployment as it does for a 1,000-tenant one. The mid-market AI agency without an eval harness is one judge-flagged regression away from a churn event. The studio without an eval harness is one quiet capability-drift away from a customer outage. The eval stack (datasets, judges, regression gates, behavior baselines) is identical in both shops, often built on the same off-the-shelf primitives (LangSmith, Braintrust, Phoenix, Inspect, Helicone), and authored by the same kind of senior engineer.
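The regression-gate part of that stack is simple enough to sketch. This is a minimal, framework-free illustration, not the API of any of the tools named above: the golden cases, the toy system, the exact-match function standing in for a judge model, and the 0.02 tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str


def score(system: Callable[[str], str], golden: list[Case],
          judge: Callable[[str, str], bool]) -> float:
    """Fraction of golden cases whose output the judge accepts."""
    passed = sum(judge(system(c.prompt), c.expected) for c in golden)
    return passed / len(golden)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow the deploy only if the candidate stays within `tolerance` of the baseline."""
    return candidate >= baseline - tolerance


# Hypothetical stand-in for a judge model: case-insensitive exact match.
def exact_match(output: str, expected: str) -> bool:
    return output.strip().lower() == expected.strip().lower()


GOLDEN = [
    Case("capital of France?", "Paris"),
    Case("2 + 2?", "4"),
    Case("opposite of hot?", "cold"),
    Case("color of a clear sky?", "blue"),
]


# Hypothetical system under test that has regressed on one case.
def toy_system(prompt: str) -> str:
    answers = {"capital of France?": "Paris", "2 + 2?": "4", "opposite of hot?": "cold"}
    return answers.get(prompt, "unknown")


baseline = 1.0  # recorded score of the currently deployed version
candidate = score(toy_system, GOLDEN, exact_match)
print(candidate, regression_gate(candidate, baseline))
```

Nothing in the gate depends on how many tenants the system serves; the same check blocks the same regression for one client or a thousand.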
When the engineering discipline is identical, the cultural adaptations downstream of it converge as well: weekly demo cadence, eval review meetings, regression-gated deploys, behavior changelogs. An agency that runs this loop and a studio that runs this loop look the same from the inside. The only thing that differs is the name on the customer contract. For the day-to-day cadence shape, see SFAI Labs’ weekly demos, evals, and roadmap reviews.
One client’s recurring revenue looks like ARR
The third force is on the income statement. A modern AI agency engagement is rarely a one-shot project anymore. The shape that closes today is a 3-month build plus a 6–24 month operate-and-improve retainer at $20K–$80K/month, and the retainer dwarfs the build. Across SFAI Labs’ last 40 engagements, the post-launch retainer accounted for 62% of lifetime revenue per account; in 2023 the same number was 31%.
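To see why the retainer dwarfs the build, here is the share arithmetic as a sketch. The $150K build price is a hypothetical input; the retainer terms come from the ranges in the paragraph above.

```python
def retainer_share(build_usd: float, monthly_usd: float, months: int) -> float:
    """Fraction of an account's lifetime revenue that comes from the post-launch retainer."""
    retainer_total = monthly_usd * months
    return retainer_total / (build_usd + retainer_total)


BUILD_USD = 150_000  # hypothetical 3-month build price

# Low, middle, and high ends of the $20K-$80K/month, 6-24 month retainer ranges.
for monthly, months in [(20_000, 6), (40_000, 12), (80_000, 24)]:
    share = retainer_share(BUILD_USD, monthly, months)
    print(f"${monthly:,}/month for {months} months -> retainer is {share:.0%} of lifetime revenue")
```

Under this assumed build price, only the shortest, smallest retainer leaves the build as the larger half of the account.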
That recurring retainer is structurally indistinguishable from ARR. It is contracted, renews, has churn risk, has an expansion vector, has a net revenue retention number. A finance person looking at the agency’s quarterly revenue cannot tell from the cash-flow shape whether the firm calls itself an agency or a studio. Recent transaction multiples (Bain’s PE in tech-services 2024 commentary on professional-services M&A) reflect this; buyers price the retainer book against the same SaaS yardstick they use on a studio’s ARR, with a discount for concentration risk and a premium for gross margin. If two models share a substantively similar revenue shape, they will be valued, sold, hired, and operated similarly. Convergence is the rational response.
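Net revenue retention is the metric that makes a retainer book legible to a SaaS-minded buyer. A minimal sketch of the standard cohort calculation follows; the three accounts and their values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetainerAccount:
    start_mrr: float  # monthly retainer value at the start of the period
    end_mrr: float    # monthly value at the end of the period (0.0 if the account churned)


def net_revenue_retention(cohort: list[RetainerAccount]) -> float:
    """End-of-period revenue from accounts that existed at the start, divided by
    their start-of-period revenue. New logos are deliberately excluded."""
    start = sum(a.start_mrr for a in cohort)
    end = sum(a.end_mrr for a in cohort)
    return end / start


# Hypothetical book: one expansion, one flat renewal, one churn.
book = [
    RetainerAccount(60_000, 90_000),
    RetainerAccount(40_000, 40_000),
    RetainerAccount(20_000, 0.0),
]
print(f"NRR: {net_revenue_retention(book):.0%}")
```

A result above 100% means expansion on surviving accounts more than offsets churn, the same reading a buyer applies to a studio's ARR.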
Products without services cannot learn customer shape
The fourth force runs the other direction. Pure-play AI product studios (well-funded, top-tier engineering, confident product taste) have been quietly hitting a wall on customer-shape discovery, because AI products in 2026 are deeply context-dependent and the studio does not have direct exposure to the messy reality of any one customer’s data, evals, prompts, or workflows.
Three patterns make this visible. First, a meaningful share of well-funded AI infra startups have added “design partner programs” indistinguishable from agency engagements: bespoke onboarding, dedicated forward-deployed engineers, custom SLAs, customer-specific eval suites. Second, the median AI product studio’s GTM motion now requires a forward-deployed engineering team that sits at customer sites for weeks at a time, an unmistakable agency motion in studio clothing. Third, Hamilton Helmer’s 7 Powers framing of counter-positioning predicts that pure-product firms will be selectively beaten by hybrids that co-build with customers, because in AI workflows the customer’s process knowledge is part of the product.
A studio can theoretically learn customer shape through telemetry and interviews. In practice, the firms shipping the highest-quality AI products in 2026 are the ones with first-hand exposure to the messy customer environments those products run in, and that exposure is what an agency engagement structurally provides. Product without services is a learning-rate disadvantage. For why this drives customer-success patterns, see build AI in-house vs outsource.
What survives: the services-funded R&D shop
The four forces, taken together, point at a single stable shape. The firm that survives 2026–2028 is the one that explicitly runs a services-funded R&D shop: senior, agent-leveraged, eval-disciplined, with a few large multi-year retainers that fund a small number of internal product wedges, and a clear written rule about which IP is client and which IP is house.
The shape has six visible attributes:
- Small senior team. Five to fifteen senior engineers, few or no juniors, no offshore filler. Senior-to-junior ratio at 1:0 or 1:1.
- Agent-leveraged delivery. Coding agents are first-class in the stack, not an aside. Delivery velocity is 2–4x the 2023 baseline; pricing has not moved 2–4x lower, which is where margin comes from.
- Eval discipline as table stakes. Every engagement ships an eval suite, a regression gate, and a behavior baseline. The same harness pattern runs against the firm’s internal product wedges.
- Three to seven anchor retainers. Most revenue comes from a handful of large multi-year operate-and-improve retainers that look like enterprise SaaS contracts. New logo acquisition is slow and curated.
- One or two internal product wedges, funded by retainers. Wedges come from patterns observed across the retainers; anything built more than three times becomes a productized internal artifact. 10–25% of senior-engineer time is allocated to them.
- A written IP boundary. Engagements explicitly carve out which artifacts are client-owned and which are house-owned. The carve-out is in every contract, not negotiated case-by-case.
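The margin claim in the agent-leveraged delivery attribute can be made concrete with a one-line model. This is a simplification under stated assumptions: it treats delivery cost for a fixed unit of scope as falling in proportion to velocity (ignoring agent and tooling spend), and the 0.6 starting cost ratio is a hypothetical input.

```python
def gross_margin(base_cost_ratio: float, velocity_gain: float, price_drop: float = 1.0) -> float:
    """Gross margin on a fixed unit of delivered scope.

    base_cost_ratio: baseline delivery cost as a fraction of baseline price (hypothetical).
    velocity_gain:   how many times faster the same scope ships; assumed to divide cost.
    price_drop:      how many times cheaper the same scope is sold (1.0 = price unchanged).
    """
    cost = base_cost_ratio / velocity_gain
    price = 1.0 / price_drop
    return 1.0 - cost / price


# Baseline: delivery cost is 60% of price (40% margin); velocity up 3x.
print(f"price unchanged:    {gross_margin(0.6, 3.0):.0%}")
print(f"price down 1.5x:    {gross_margin(0.6, 3.0, 1.5):.0%}")
print(f"price down full 3x: {gross_margin(0.6, 3.0, 3.0):.0%}")
```

If price fell by the full velocity factor, margin would return to the baseline; in this model, the gap between the two factors is where the extra margin comes from.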
Each attribute is independently observable in well-run AI agencies (Vercel’s professional services, Galileo, Modal’s customer-engineering team, OpenAI’s forward-deployed engineering, the boutique tier including SFAI Labs and a dozen peers) and in well-run AI product studios that have added a services arm. The firms not running this shape are the legacy generalist mid-tier agencies and the pure-play infra studios without a services arm; both quadrants are being squeezed. For the broader compression argument, see why most AI agencies will not survive the next 18 months and the AI agency manifesto.
How a buyer should pick now
If the two models have converged onto one shape, the buyer’s selection question changes. The old question was “agency or studio?” The new questions are sharper.
- Who owns the eval suite at the end? If the firm cannot show a current eval suite with regression gates and behavior baseline on three of their last five engagements, they are still on the 2023 substrate. Walk away.
- What is their senior-to-junior ratio? At or below 1:1 is the bar. 1:0 is now the modal shape in firms shipping at the rate buyers expect. Above 1:2 is staffing for the prior decade.
- What share of revenue comes from their largest three accounts? Below 70% is healthy. 70–90% is the modal services-funded R&D shop. Above 90% is single-account dependency: a warning sign for both sides.
- Is there an internal product wedge, funded by the retainer book? Yes means converged shape. No means a pure agency without a learning loop into productized IP, or a pure studio without customer-shape exposure. Both quadrants are weaker than the hybrid for the AI work being bought in 2026.
- What is the written IP boundary? A clean carve-out in every contract is the cheapest signal that the firm has thought about the collision. A vague answer means it has not. For deeper diligence, see a field guide to evaluating an AI agency in under 90 minutes.
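The five questions reduce to a rough checklist. The sketch below encodes the thresholds stated above; the field names are hypothetical, treating 1:1 itself as passing follows the 1:0-or-1:1 shape described earlier, and counting 70–90% top-three concentration as a clean answer (because it is the modal converged shape) is an interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmAnswers:
    evals_in_last_five: int     # of the last five engagements, how many shipped an eval suite
    juniors_per_senior: float   # 0.0 for a 1:0 shape, 1.0 for 1:1, 2.0 for 1:2
    top3_revenue_share: float   # fraction of revenue from the three largest accounts
    has_funded_wedge: bool      # internal product wedge funded by the retainer book
    has_written_ip_boundary: bool


def concentration_band(share: float) -> str:
    """Bands for top-three revenue concentration, as stated in the text."""
    if share < 0.70:
        return "healthy"
    if share <= 0.90:
        return "modal services-funded R&D shop"
    return "single-account dependency"


def unclear_answers(f: FirmAnswers) -> list[str]:
    """Questions the firm does not answer cleanly against the stated bars."""
    checks = {
        "eval suite": f.evals_in_last_five >= 3,
        "senior-to-junior ratio": f.juniors_per_senior <= 1.0,
        "revenue concentration": concentration_band(f.top3_revenue_share)
        != "single-account dependency",
        "product wedge": f.has_funded_wedge,
        "IP boundary": f.has_written_ip_boundary,
    }
    return [question for question, ok in checks.items() if not ok]


firm = FirmAnswers(4, 0.5, 0.82, True, False)
print(concentration_band(firm.top3_revenue_share), unclear_answers(firm))
```

An empty list is the clean result; each entry marks a question worth pressing on in diligence.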
The five questions also work in reverse. A founder running an AI services firm, or a studio considering a services arm, can use them to read where their own firm sits on the convergence axis.
What the collision means for founders
For a firm built before the collision, the next 12 months are about choosing the converged shape deliberately rather than drifting into it. Three forcing decisions.
Decide which IP is house IP. Anything built more than three times across clients is house IP that should be productized. Anything built once is client IP. Firms that get this wrong leave revenue on both sides: they fail to ship a product because every artifact is “spoken for,” and they fail to retain senior engineers who do not see a product story attached to their work.
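The rule of three can be applied mechanically to an engagement log. A minimal sketch, assuming the log records (client, artifact pattern) pairs and that "built more than three times" means built for more than three distinct clients; the client and pattern names are hypothetical. The text does not say how to treat a pattern built for two or three clients, so the sketch marks those as undecided rather than guessing.

```python
from collections import defaultdict


def classify_artifacts(builds: list[tuple[str, str]]) -> dict[str, str]:
    """Map each artifact pattern to an IP bucket.

    builds: one (client, pattern) pair per time a pattern was built for a client.
    """
    clients_by_pattern: dict[str, set[str]] = defaultdict(set)
    for client, pattern in builds:
        clients_by_pattern[pattern].add(client)

    buckets = {}
    for pattern, clients in clients_by_pattern.items():
        if len(clients) > 3:
            buckets[pattern] = "house IP: productize"
        elif len(clients) == 1:
            buckets[pattern] = "client IP"
        else:
            buckets[pattern] = "undecided"  # 2-3 clients: not covered by the rule as stated
    return buckets


log = [
    ("acme", "doc-ingest"), ("globex", "doc-ingest"),
    ("initech", "doc-ingest"), ("umbrella", "doc-ingest"),
    ("acme", "claims-triage"), ("globex", "claims-triage"),
    ("initech", "custom-erp-sync"),
]
print(classify_artifacts(log))
```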
Decide your funded R&D allocation. 10–25% of senior-engineer time on internal wedges, funded by retainer margin. Below 10%, you are a pure agency calling itself a studio in pitches. Above 25%, you are a studio using services as a stopgap and will struggle on retainer SLAs.
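Because the bands are stated as a share of senior-engineer time, they are easy to check against a timesheet. A sketch, with a hypothetical six-person senior team and hypothetical hour counts:

```python
def rd_allocation_band(wedge_hours: float, senior_hours: float) -> str:
    """Classify the share of senior-engineer hours spent on internal product wedges."""
    share = wedge_hours / senior_hours
    if share < 0.10:
        return "pure agency calling itself a studio"
    if share > 0.25:
        return "studio using services as a stopgap"
    return "services-funded R&D shop"


# Hypothetical week: six senior engineers at 40 hours each.
SENIOR_HOURS = 6 * 40
for wedge_hours in (12, 36, 72):
    print(f"{wedge_hours}h on wedges ({wedge_hours / SENIOR_HOURS:.0%}): "
          f"{rd_allocation_band(wedge_hours, SENIOR_HOURS)}")
```

The sketch treats exactly 10% and exactly 25% as inside the band, since the text only names what happens below and above it.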
Decide your IP boundary template, then put it in every contract. The boundary should not be renegotiated per engagement. It should be a paragraph the firm believes is fair and can defend at scale, and one that the largest customer the firm wants to land would sign without hesitation.
The convergence will continue regardless. Foundation labs will keep shipping first-party products that look like the easy half of an agency engagement (Anthropic Skills, OpenAI Agents Builder, Gemini Enterprise). Hyperscalers will absorb the boring half (AWS Bedrock, Azure AI Foundry, Vertex AI). The space the converged firm occupies (senior judgment, eval discipline, customer-specific shape) keeps narrowing in width and increasing in margin. The agency vs product studio debate, as argued for the last fifteen years, is over. What replaces it is one operating model with two GTM faces. Buyers are starting to price for that. Founders should build for it.
Frequently asked questions
Is the AI agency model the same as the AI product studio model now?
Operationally, yes. On the day-to-day substrate (small senior team, agent-leveraged delivery, eval-disciplined engineering, a few large multi-year retainers, one or two internal product wedges), the two models have converged onto a single shape. Legal and IP structure differs (clients own engagement IP; studios own product IP), but engineering, hiring, and operating cadence are nearly identical. The label matters less than which of the converged-shape attributes the firm runs.
What is a “services-funded R&D shop”?
A firm running primarily on retainer revenue from a small number of large client engagements, with 10–25% of senior-engineer time explicitly allocated to internal product wedges funded out of retainer margin. It is the converged shape that both well-run AI agencies and well-run AI product studios end up at in 2026, echoing the Basecamp / 37signals pattern of funding product development out of services revenue.
How big should the services-funded R&D allocation be?
Between 10% and 25% of senior-engineer time. Below 10%, the firm is a pure agency and will not produce productized IP. Above 25%, retainer SLAs slip and the firm cannot keep its anchor accounts. The exact number depends on retainer margin and the maturity of the internal product wedge.
Why does eval discipline collapse the agency-vs-studio distinction?
Because non-deterministic AI systems fail the same way at one tenant and at one thousand. The harness (datasets, judges, regression gates, behavior baselines) is the same engineering primitive in both shops. The cadence around the harness (weekly demos, eval reviews, behavior changelogs) is also the same. Two firms running the same engineering loop end up looking like the same firm.
How does agent leverage change the cost-of-building math?
A senior engineer plus a competent coding agent ships 2–4x the integration-grade code per day they shipped in 2023. The agency-vs-studio make-vs-buy decision, once a real toss-up at $250K–$400K, has compressed by the same factor. The cost differential is now small enough that the choice is not an economic one. It is an IP-ownership and learning-loop question.
Can a pure-play AI product studio still win without a services arm?
Some can: foundation labs themselves and a few horizontal infra studios with strong telemetry-driven feedback loops. Most cannot. AI products in 2026 need first-hand exposure to messy customer environments to mature, which is why design-partner programs and forward-deployed engineering teams have become standard at studios that previously prided themselves on being product-only.
Won’t the foundation labs absorb both agencies and studios?
Partially. Labs absorb the easy half of both (templated agent scaffolding, basic integrations, generic product surfaces) through first-party products like ChatGPT Enterprise, Claude Skills, Gemini Enterprise, and Copilot Studio. They are not absorbing the hard half: customer-specific eval suites, deep workflow integration, regulatory posture, and the senior judgment that lives in a small senior team.
How should a buyer evaluate a hybrid firm versus a pure agency or studio?
Use five questions. Who owns the eval suite at the end? What is the senior-to-junior ratio? What share of revenue comes from the largest three accounts? Is there an internal product wedge funded by the retainer book? What is the written IP boundary? Clean answers on all five mean the converged shape. Hedging on two or more means a 2023 substrate, regardless of label.
What does the convergence mean for AI agency valuations?
Agency retainers are increasingly priced against SaaS ARR yardsticks rather than services-firm yardsticks. A clean retainer book with low concentration, high gross margin, and a productized wedge can trade at SaaS multiples discounted for concentration risk. A book without a wedge, or with high single-customer concentration, trades at traditional services multiples. “Studio” is no longer a free valuation premium.
What should an AI services founder do in the next 90 days?
Three things. Decide which artifacts are house IP and which are client IP, and put the rule in every contract template. Set your funded R&D allocation as a percentage of senior-engineer time and defend the number. Pick one wedge built more than three times across clients and start the productization. The convergence is not optional; the question is whether you choose the shape deliberately or drift into the worst version of it.
Arthur Wandzel