The Build-vs-Buy Framework, Revisited for the Agent Era

The classic build-vs-buy framework, designed for stable software with a clear vendor boundary, produces wrong answers for AI agents. The agent stack composes a model layer, a framework layer, an orchestration layer, a tool registry, an eval suite, and an on-call surface. Each layer has its own sourcing logic, and the framework that worked for SaaS-era decisions collapses them into one verb that fits none. This piece names the four new axes the agent era has added: orchestration depth, tool registry control, eval ownership, and on-call ownership. Together they explain why most “we bought an agent platform” decisions look defensible at signing and indefensible six months in. This is the build-vs-buy framework as it applies to the systems deployed in 2026.

This is a spoke under the AI build-vs-buy-vs-hire decision matrix for 2026. The matrix names eight principles governing AI sourcing across the full capability stack; this piece operationalizes them for the specific case of multi-step AI agents, where the classic framework fails most loudly.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

Why the classic framework breaks for agents
Axis 1: Orchestration depth
Axis 2: Tool registry control
Axis 3: Eval ownership
Axis 4: On-call ownership
The default sourcing shape for production agents
When the classic framework still applies
Frequently asked questions
Key takeaways

Why the classic framework breaks for agents

The classic build-vs-buy framework was written for software with a stable feature surface and a clear vendor boundary. You decided whether to build a CRM or buy Salesforce. Features were known on day one and largely the same on day 730. The vendor boundary was a contract, an API, an SLA. The verb resolved a procurement decision that held for years.

AI agents have neither property. The feature surface drifts with most releases because the underlying model drifts: a prompt that produced a clean tool call in March produces a hallucinated argument in June, after a model update the vendor rarely warned you about. The vendor boundary is fuzzy because most agent platforms are stacks (model, framework, orchestration, tools, eval), and the platform vendor often controls only one or two of those layers while pretending to control many of them.

Applying a binary verb to that stack produces decisions that look correct on the contract and incorrect in the production incident. The org “bought” an agent platform but is still on-call for behavioral regressions. The org “built” an agent but the framework underneath churns with most releases. Neither verb describes what the org owns.

The fix is not to abandon build-vs-buy. It is to apply the verb to layers, not stacks, and to weigh four axes the classic framework rarely had to consider: orchestration depth, tool registry control, eval ownership, and on-call ownership.

Axis 1: Orchestration depth

Orchestration depth is how much of the agent’s control flow is determined by the org’s logic versus by the framework vendor’s defaults.

A shallow orchestration agent uses the vendor’s default planner, retry policy, tool-routing, and error handling. The agent’s behavior is whatever the framework decided. When the framework changes defaults (and frameworks do, regularly), agent behavior changes without the org touching code.

A deep orchestration agent has org-specific planners, custom retry semantics, tool-routing tied to internal routing tables, and explicit error handling for the failure modes the org cares about. The framework provides primitives; the org composes orchestration logic on top.
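As a rough illustration of "the framework provides primitives; the org composes orchestration logic on top," here is a minimal sketch of org-owned retry semantics. The framework's tool call is stood in for by a plain callable, and `RetryPolicy`, `call_tool_with_policy`, and both error classes are hypothetical names, not any framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Type


class TransientToolError(Exception):
    """Hypothetical: a failure worth retrying (timeout, rate limit)."""


class PermanentToolError(Exception):
    """Hypothetical: a failure retrying cannot fix (bad arguments)."""


@dataclass(frozen=True)
class RetryPolicy:
    """The org's retry rules, kept in the org's repo rather than vendor defaults."""
    max_attempts: int = 3
    retry_on: Tuple[Type[Exception], ...] = (TransientToolError,)


def call_tool_with_policy(tool: Callable[[], str], policy: RetryPolicy) -> str:
    """Run one tool call under the org's policy.

    Errors listed in policy.retry_on are retried up to max_attempts;
    any other exception propagates immediately.
    """
    last_error: Optional[Exception] = None
    for _ in range(policy.max_attempts):
        try:
            return tool()
        except policy.retry_on as err:
            last_error = err
    raise RuntimeError(
        f"gave up after {policy.max_attempts} attempts"
    ) from last_error
```

Because the policy is an explicit object in the org's code, a framework upgrade that changes its own default retry behavior does not silently change what the agent does.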

Three rules of thumb:

Shallow (under 200 lines of org-specific control flow): buying the platform is feasible. Vendor defaults will not bite hard enough to matter.
Moderate (200 to 1,500 lines): buy the rails, build the orchestration on top. Modal case.
Deep (1,500+ lines, custom planners, bespoke routing): orchestration is unambiguously a build, owned in a repo the org controls.
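The three rules of thumb above can be written down as a small helper, useful mainly for making the thresholds explicit in a sourcing review. This is a sketch under stated assumptions: the function name and return strings are hypothetical, the line counts are the article's illustrative heuristics, and the boundary value of exactly 1,500 lines is treated as deep.

```python
SHALLOW_MAX_LINES = 200   # below this: shallow
DEEP_MIN_LINES = 1500     # at or above this: deep (boundary choice is an assumption)


def orchestration_verdict(control_flow_lines: int,
                          custom_planner_or_routing: bool = False) -> str:
    """Map org-specific control-flow size to a default sourcing verdict."""
    if control_flow_lines < 0:
        raise ValueError("control_flow_lines must be non-negative")
    # Custom planners or bespoke routing push the agent into the deep band
    # regardless of line count.
    if custom_planner_or_routing or control_flow_lines >= DEEP_MIN_LINES:
        return "build the orchestration"
    if control_flow_lines < SHALLOW_MAX_LINES:
        return "buying the platform is feasible"
    return "buy the rails, build the orchestration on top"
```

For example, `orchestration_verdict(800)` lands in the moderate band, while `orchestration_verdict(800, custom_planner_or_routing=True)` is treated as deep.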

The error pattern: orgs buy “agent platforms” expecting deep orchestration included, then discover the platform’s depth tops out at the demo. Six months in, the team is fighting the platform to do what a 2,000-line orchestration layer would have done from day one.

Axis 2: Tool registry control

The tool registry is the set of functions the agent can call: internal APIs (billing, customer-data), MCP servers, third-party connectors (Slack, GitHub), and proprietary internal tools.

Control of the registry means who decides which tools the agent can see, who versions them, who handles auth scopes, who retires deprecated tools, who writes tool-specific eval cases. This is operational control, not procurement control.

Buying the tool registry means accepting the vendor’s catalog as authoritative. You can extend through plugin slots, but the slot is the vendor’s API; most proprietary tools then live downstream of the vendor’s plugin model and break when that model changes.

Building the tool registry means the org owns the catalog. You add proprietary tools without waiting on a vendor roadmap, version them on your cadence, and audit auth scopes against your security policy. The trade-off is maintenance, which is a real ongoing cost for an enterprise with 50+ internal services.

Default verdict for production agents: build the registry, buy specific OAuth-heavy commodity connectors (Slack, GitHub, Linear), and rarely let the vendor’s catalog define what the agent can do. The proprietary tools are the agent’s value, and they are what competitors cannot copy by buying from the same vendor.
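To make "the org owns the catalog" concrete, here is a minimal sketch of an org-owned registry covering visibility, versioning, auth scopes, and retirement. `ToolSpec`, `ToolRegistry`, and the tool and scope names in the usage note are hypothetical; a real registry would also need persistence and audit logging.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str                      # versioned on the org's cadence
    required_scopes: FrozenSet[str]   # audited against the org's security policy
    handler: Callable[..., object]
    deprecated: bool = False


class ToolRegistry:
    """Org-owned catalog: decides which tools the agent can see."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def retire(self, name: str) -> None:
        # Keep the record for auditing, but hide the tool from the agent.
        self._tools[name] = replace(self._tools[name], deprecated=True)

    def visible_tools(self, granted_scopes: FrozenSet[str]) -> List[str]:
        """Names of live tools whose required scopes are all granted."""
        return sorted(
            name
            for name, spec in self._tools.items()
            if not spec.deprecated and spec.required_scopes <= granted_scopes
        )
```

A tool registered as `ToolSpec("billing.refund", "1.2.0", frozenset({"billing:write"}), handler)` would only be visible to an agent session granted `billing:write`, and disappears from `visible_tools` once `retire("billing.refund")` is called.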

Axis 3: Eval ownership

Eval ownership is who maintains the test suite that decides whether the agent is shipping correctly. The eval suite is the contract between intended and observed behavior; whoever owns it owns the agent’s quality bar.

Vendor eval suites test the framework’s capabilities: does the planner work, does the retry policy do what the docs say, does tool-routing pick correctly under reference conditions. These test the platform, not the agent.

Org eval suites test the agent’s actual job: does the support agent resolve the 200 worst tickets, does the research agent retrieve relevant documents at depth 3, does the ops agent execute its 30 procedures. These are workload-specific and almost never expressible in the vendor’s eval framework, because that framework was built for the vendor’s universe of customers.

Buying eval, that is, accepting the vendor’s “your agent is working correctly” signal, sources the wrong test against the wrong question. We named this in “Stop paying AI agencies for documentation, pay them for evals”: the eval suite is the deliverable that survives the engagement. It cannot be outsourced wholesale.
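A workload-specific eval can be very small and still be owned by the org. The sketch below is plain Python and does not use or imitate any eval tool's API; `EvalCase`, `run_eval`, and the stub agent in the usage note are hypothetical, and real checks would usually be richer than a single predicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    passes: Callable[[str], bool]   # org-defined check on the agent's output


def run_eval(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, object]:
    """Run every case through the agent and report pass rate and failures."""
    failed = [c.case_id for c in cases if not c.passes(agent(c.prompt))]
    total = len(cases)
    pass_rate = (total - len(failed)) / total if total else 0.0
    return {"pass_rate": pass_rate, "failed": failed}
```

A support-agent suite might hold one `EvalCase` per historically difficult ticket, each with a check such as "the reply mentions the refund policy," and gate releases on the resulting `pass_rate`.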

Right verbs: build (when the org has eval-fluent engineers) or hire (when it doesn’t, with explicit transfer-of-ownership). Rarely buy. Promptfoo and Inspect are scaffolding, not products that own the eval.

Axis 4: On-call ownership

The fourth axis, and the most often missed in agent procurement: who carries the pager when the agent breaks at 3 AM.

In the SaaS era, on-call meant infrastructure on-call. Vendor pager covered uptime, latency, platform errors; org pager covered application-layer issues. Clean boundary.

In the agent era, agents fail in modes the vendor has rarely seen. Model drift after a provider’s quiet update. A tool deprecation the agent’s prompt hard-coded. A prompt regression after a framework version bump. A retrieval pipeline returning irrelevant chunks because the embedding model changed. None are infrastructure failures. Many present as the agent doing the wrong thing in production.

A vendor on-call team can fix their platform. They cannot fix your agent’s behavior, because they do not understand your workload, your tools, or your eval suite. Buying an agent platform buys a SaaS contract in which the org is still on-call for behavioral failures the vendor cannot debug. Most orgs do not staff for it and discover the gap at the first production incident; vendor support replies “cannot reproduce; provide prompt and trace” while the agent produces wrong outputs at production volume.

Right answer: on-call ownership lives with the org for any production agent that matters. Vendor covers their layer; org covers the rest. Train at least two engineers on the full failure surface. Document the runbook.
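One way to start the runbook is to write the pager split down as code. This is a minimal sketch, assuming the failure categories named in this section; the labels, `Owner`, and `pager_owner` are hypothetical, and anything unrecognized defaults to the org, following "vendor covers their layer; org covers the rest."

```python
from enum import Enum


class Owner(Enum):
    VENDOR = "vendor"
    ORG = "org"


# Infrastructure-style failures the vendor's on-call can fix.
VENDOR_LAYER = frozenset({"uptime", "latency", "platform_error"})

# Behavioral failures listed above; the vendor cannot debug these.
BEHAVIORAL = frozenset({
    "model_drift",
    "tool_deprecation",
    "prompt_regression",
    "retrieval_drift",
})


def pager_owner(failure_kind: str) -> Owner:
    """Return who carries the pager for a given failure category."""
    if failure_kind in VENDOR_LAYER:
        return Owner.VENDOR
    # Behavioral failures, and anything not yet classified, stay with the org.
    return Owner.ORG
```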

The default sourcing shape for production agents

Combine the four axes with the model and framework layers from the classic stack and you get a default sourcing shape:

Layer	Default verb	Why
Foundation model	Buy	Commoditized; per-call selection at runtime
Agent framework	Buy	Commodity rails; do not reinvent
Orchestration logic	Build	Where actual agent behavior lives
Tool registry	Build (buy commodity connectors)	Proprietary tools are the moat
Eval suite	Build or hire	Workload-specific; rarely the vendor’s
On-call	Own	Vendor cannot debug your agent’s behavior

This is the inverse of the classic SaaS default of “buy the platform and own the configuration.” For agents, the default is “buy the rails and own the running system.” Orgs that start from this shape and argue exceptions ship faster than orgs that start from a blank framework.
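"Start from this shape and argue exceptions" can be made mechanical by encoding the default table as data and listing where a proposal departs from it. A minimal sketch; the layer keys, verb strings, and `sourcing_exceptions` are hypothetical names for the rows of the table above.

```python
from typing import Dict, Tuple

DEFAULT_SOURCING: Dict[str, str] = {
    "foundation_model": "buy",
    "agent_framework": "buy",
    "orchestration_logic": "build",
    "tool_registry": "build",        # commodity connectors are still bought
    "eval_suite": "build_or_hire",
    "on_call": "own",
}


def sourcing_exceptions(proposed: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {layer: (default_verb, proposed_verb)} for every departure.

    Layers missing from the proposal are assumed to follow the default.
    """
    return {
        layer: (default, proposed[layer])
        for layer, default in DEFAULT_SOURCING.items()
        if layer in proposed and proposed[layer] != default
    }
```

A proposal of `{"orchestration_logic": "buy", "on_call": "own"}` yields a single exception, `{"orchestration_logic": ("build", "buy")}`, which is the item the team then has to justify.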

When the classic framework still applies

The four agent-era axes apply meaningfully only to multi-step agents: systems with control flow, tool calling, multi-turn behavior, and a feature surface that drifts with the model. They do not apply to single-step LLM features (“summarize this ticket,” “extract entities from this PDF”). For those, the classic framework still works with minor amendments around model drift.

The transition comes when the system starts making sequential decisions or calling tools. As soon as control flow enters, the four axes start to matter. The build-buy-or-fine-tune frame for foundation model choices handles the model layer in isolation; this piece picks up where that one stops.

Frequently asked questions

Why does the classic build-vs-buy framework break down for AI agents?

It was designed for software with a stable feature surface and a clear vendor boundary. AI agents have neither. The feature surface drifts with most releases because the model drifts; the vendor boundary is fuzzy because most agent platforms compose model, framework, and tool layers that can be sourced independently. The binary verb produces decisions that look defensible on the contract and indefensible six months in.

What is orchestration depth and why does it matter?

How much of the agent’s control flow is determined by the org’s logic versus the framework’s defaults. Shallow orchestration tolerates buy; deep orchestration resists buy because drift in vendor defaults changes the agent’s behavior in production.

Why is tool registry control a distinct sourcing axis?

The tool registry is the set of functions the agent can call. Buying means accepting the vendor’s catalog as authoritative; building means the org owns the catalog and can add proprietary tools without waiting on a vendor roadmap. For most production agents, the proprietary tools are the actual value.

What does eval ownership mean?

Who maintains the test suite that decides whether the agent is shipping correctly. Vendor eval tests the framework; org eval tests the agent’s job. Building is default; hiring (through an eval-fluent agency) is the answer when the org lacks eval engineers but cannot defer the suite. Rarely buy.

What is on-call ownership?

Who carries the pager when the agent breaks at 3 AM. Agents fail in modes the vendor has rarely seen: model drift, tool deprecation, prompt regression after a framework upgrade. A vendor on-call team can fix their platform; they cannot fix your agent’s behavior.

Should the orchestration framework be bought or built?

Bought. LangGraph, AutoGen, OpenAI Agents SDK, and Vercel AI SDK are commodity rails. The buy is at the framework layer; the build is the orchestration logic running on those rails. Confusing the two is the most common sourcing error in the agent era.

How does the agent era change the answer for the model layer?

It does not. Foundation models were a buy in 2023 and remain a buy in 2026. What changed is that the model is selected per agent step rather than per organization, moving the buy decision from architecture time to runtime.

Does this framework apply to single-step LLM features?

No. The four axes apply only to multi-step agents. Single-step LLM features have low orchestration depth, no tool registry, simpler eval, and a shallow on-call surface. The classic framework still works for them with minor amendments around model drift.

What is the right sourcing shape for a typical production agent?

Buy the model and framework rails, build the orchestration logic, build the tool registry (buying commodity connectors), build or hire the eval suite, and own the on-call rotation. This is the inversion of the classic SaaS default.

When does an off-the-shelf agent platform make sense?

When orchestration is shallow, the tool registry is small and standard, eval requirements are generic, and the vendor can credibly own on-call. That describes generic productivity agents (calendar assistants, simple research bots), not the agents most enterprises are deploying in 2026.

Key takeaways

The classic build-vs-buy framework was designed for stable software with clear vendor boundaries; agents have neither, and the binary verb produces wrong sourcing decisions.
Four new axes resolve agent sourcing: orchestration depth, tool registry control, eval ownership, on-call ownership.
Default verbs: buy model and framework rails, build orchestration logic and tool registry, build or hire the eval suite, own on-call.
Most common error: buying an agent platform expecting deep orchestration included, then discovering platform depth tops out at the demo.
Single-step LLM features are still served by the classic framework; the four axes matter when control flow, tool calling, and multi-turn behavior enter the picture.

The build-vs-buy verb is still useful in 2026; it just has to be applied per layer rather than per stack. Orgs that update their sourcing playbook around the four new axes ship agents that hold up under production pressure. Orgs that apply the classic framework unchanged keep signing platform contracts that cover the wrong layers.

