Most AI build-vs-buy frameworks were written for a 2022 cost surface that no longer exists. Inference is 80 percent cheaper, model upgrades arrive on a quarterly cadence, eval engineering is a third of project cost, and the SaaS layer above the model has consolidated into vertical agents that are credible alternatives to a custom build. The 2018 decision tree (build for differentiation, buy for commodity) routes most AI questions to the wrong leaf. This piece replaces that tree with one calibrated to 2026 economics: six branching questions, each tied to a measurable signal, each producing a defensible build, buy, or hybrid answer.
The argument is anchored in the AI project economics manifesto: if evaluation is the unit of account, then make-or-buy is a question about who pays the eval bill, not who writes the code.
Why the 2018 decision tree fails in 2026
The classic build-vs-buy heuristic has three branches. Build if it is core differentiation. Buy if it is commodity. Hybrid if you can buy the platform and build the configuration. That tree assumed a stable cost surface: the cost of buying did not move much between the decision and the launch, the cost of building was a one-time engineering bill, and the maintenance line was a known function of headcount.
None of those assumptions hold for AI work. The cost of buying drops most quarters as falling vendor inference costs pass through to vendor pricing. The cost of building carries a permanent eval-engineering tax that the 2018 framework does not name. The maintenance line is dominated by re-evaluation against new models, which is structurally invisible to an annual operating budget.
The result is a tree that produces confident answers to the wrong question. We see this fail in two characteristic ways. First, organizations build custom systems that ship on top of a SaaS-equivalent that becomes 60 percent cheaper twelve months later, locking in negative ROI. Second, organizations buy SaaS for problems with sufficient workload-specificity that the buy-side eval bar is unreachable, locking in chronic accuracy debt that the vendor cannot prioritize.
A 2026 tree fixes both failure modes by replacing the “differentiation” axis with a measurable signal: the workload-specificity of the eval set. We discuss the broader cost-side picture in the AI project budget anti-patterns piece, where eight of the ten anti-patterns trace back to a misfired make-or-buy decision.
What changed under the hood
Four cost-surface shifts make the 2018 tree obsolete.
Inference prices keep falling. Frontier model token prices fell roughly 80 percent between 2023 and 2026, with no sign of a floor. Any “build” decision justified by inference savings relative to a managed alternative needs to model the next 12 months of price drops, not the spot price.
Model upgrades happen quarterly. Frontier labs ship non-trivial model upgrades three to five times a year. Each upgrade is a re-eval event that costs the build side two to four engineering weeks. The buy side absorbs that cost into the vendor subscription.
Eval engineering is 30 to 40 percent of build cost. The eval line (test sets, harness, regression triage, threshold-locking) runs about a third of total project cost. SaaS alternatives ship with vendor-curated evals. Make-or-buy is partly “who writes and maintains the eval suite.”
Vertical SaaS has matured into a real alternative. Where 2022 procurement faced a binary choice between custom build and a bare API, 2026 procurement faces a third option: vertical AI products with packaged evals, observability, and domain-tuned prompts.
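Two of these shifts reduce to arithmetic worth keeping on hand. The Python sketch below annualizes the roughly 80 percent price drop and multiplies out the yearly re-eval burden on the build side. It assumes that 2023 to 2026 counts as three full years and that the decline compounds at a constant rate; both are simplifications, and the function names are illustrative rather than part of any framework.

```python
def annualized_decline(total_drop: float, years: float) -> float:
    """Constant yearly decline that compounds to `total_drop` over `years`."""
    if not 0.0 <= total_drop < 1.0 or years <= 0:
        raise ValueError("total_drop must be in [0, 1) and years must be positive")
    return 1.0 - (1.0 - total_drop) ** (1.0 / years)


def reeval_weeks_per_year(upgrades: tuple, weeks_each: tuple) -> tuple:
    """(low, high) engineering weeks per year spent re-evaluating after upgrades."""
    return upgrades[0] * weeks_each[0], upgrades[1] * weeks_each[1]


# Figures from the text: ~80% drop, 3-5 upgrades a year, 2-4 weeks per upgrade.
# Assumption: the 2023-2026 window is treated as three years.
yearly = annualized_decline(total_drop=0.80, years=3)
low, high = reeval_weeks_per_year(upgrades=(3, 5), weeks_each=(2, 4))

print(f"Implied yearly price decline: {yearly:.1%}")
print(f"Build-side re-eval burden: {low} to {high} engineering weeks per year")
```

Under those assumptions the implied decline is a little over 40 percent per year, and the build side carries 6 to 20 engineering weeks of re-evaluation annually.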
The 2026 decision tree, six branches
The tree below is the operating version we use with buyers. Walk it top to bottom, hit the first leaf that fires, and stop.
Branch 1: Is the workload public-internet-typical? If yes, buy. The frontier models are trained on the open web and consumer SaaS layers above them inherit that fluency. Building a custom system to summarize public news articles or draft generic marketing copy is the most reliable way to lose money in 2026. The vendor eval suite covers the workload because the workload is what the vendors evaluated against.
Branch 2: Does a vertical AI vendor exist with eval transparency that meets your bar? If yes and the eval bar is met, buy. The eval-transparency check matters: a vertical vendor without a published or audit-accessible eval set is functionally a black box, and the make-or-buy decision cannot be made against an unmeasured comparator. We discuss what an enforceable eval audit looks like in the AI agency reference call piece, which adapts directly to vendor diligence.
Branch 3: Does the workload depend on proprietary data that cannot leave your perimeter? If yes and a vertical vendor does not offer a same-perimeter deployment, build (or hybrid). The data-residency constraint dominates other considerations because it removes most buy options from the menu. Note that “buy” can still mean buying a managed inference layer (Bedrock, Azure OpenAI) inside the perimeter; that is a hybrid leaf.
Branch 4: Is the cost-per-action a hard ceiling that the buy alternative cannot meet? If yes, build. Some workloads (high-volume internal automation, embedded inference at scale) have economics that vertical SaaS pricing cannot match because the SaaS price has to support marketing, sales, and a margin layer the build path does not. Use the cost-per-query framework to compute the ceiling rigorously before declaring it.
Branch 5: Is the workload-specificity high enough that vendor evals will under-measure your accuracy bar? If yes, build (or hybrid). Workload-specificity is the 2026 replacement for “differentiation”; it asks whether your inputs and outputs are far enough from the public-internet-typical case that vendor accuracy claims systematically over-state actual performance. If 30 percent of your eval set produces failures the vendor’s eval would mark as passes, the buy path under-delivers and the build path is the only honest answer.
Branch 6: Default leaf. If none of the above fired, hybrid. Buy the model and the platform; build the eval suite, the orchestration, the domain-specific prompts, and the observability. This leaf catches the majority of 2026 enterprise AI workloads. We are not contrarians; most decisions terminate here for good economic reasons, which we explore in the build vs outsource piece.
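The first-leaf-that-fires rule can be written down as an ordered chain of checks. The Python sketch below is one possible encoding, not the authors' tooling: the `Workload` field names are hypothetical, and each boolean stands in for a measurement that has to be made against the written eval set and the cost model before the tree is run.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical field names; each answers one branch question.
    public_internet_typical: bool = False          # Branch 1
    transparent_vendor_meets_bar: bool = False     # Branch 2
    data_must_stay_in_perimeter: bool = False      # Branch 3
    vendor_offers_same_perimeter: bool = False     # Branch 3
    buy_breaks_cost_ceiling: bool = False          # Branch 4
    vendor_evals_under_measure: bool = False       # Branch 5


def decide(w: Workload) -> tuple:
    """Walk the branches top to bottom and return (branch, leaf) for the first that fires."""
    if w.public_internet_typical:
        return 1, "buy"
    if w.transparent_vendor_meets_bar:
        return 2, "buy"
    if w.data_must_stay_in_perimeter and not w.vendor_offers_same_perimeter:
        return 3, "build or hybrid"
    if w.buy_breaks_cost_ceiling:
        return 4, "build"
    if w.vendor_evals_under_measure:
        return 5, "build or hybrid"
    return 6, "hybrid"  # default leaf


# A workload with perimeter-bound data and no same-perimeter vendor.
print(decide(Workload(data_must_stay_in_perimeter=True)))
# A workload where nothing fires lands on the default leaf.
print(decide(Workload()))
```

Order matters in this encoding: a workload that is both public-internet-typical and over the cost ceiling stops at Branch 1, which is what "hit the first leaf that fires, and stop" prescribes.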
Worked examples on each branch
Branch 1: buy. A mid-market e-commerce team summarizes incoming support emails. The workload is public-internet-typical. Three vertical vendors offer support-summary products at $0.01–$0.02 per email with eval transparency. A build alternative would ship at parity-minus accuracy. Decision: buy.
Branch 2: buy. A B2B SaaS company wants AI contract review. Two vendors publish eval results on clause classification with F1 above 0.92. The internal eval bar is 0.88. Decision: buy. The published eval clears the bar, and a build would replicate published work at 6x the cost.
Branch 3: perimeter-hybrid. A regulated financial services firm has credit documents that cannot leave the AWS GovCloud boundary. No vendor offers same-perimeter deployment. Decision: hybrid. Build orchestration and evals; buy managed inference inside the perimeter.
Branch 4: build. A high-volume logistics workflow processes 40 million inference calls per day. Vertical SaaS at $0.001 per call would cost $14.6M per year. A custom build on a router architecture brings cost-per-action below $0.0002, saving roughly $11.7M annually after eval and ops. Decision: build.
Branch 5: build. A specialty insurance underwriter has claims documents with terminology that diverges from public training data. Vendors with vertical insurance products score 0.71 on the firm’s internal eval; the bar is 0.85. The vendor eval (0.91 on their public set) over-states by 20 points. Decision: build.
Branch 6: hybrid. A mid-market healthcare SaaS wants AI clinical-note drafting. The workload is partially public (general medical knowledge) and partially specific (their EHR shape). Vendors meet the public-knowledge bar but fail the EHR-specific bar by 8 points. Decision: hybrid. Buy the model and platform; build the EHR adapters and the clinical-note eval suite.
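The figures in the Branch 4 and Branch 5 examples can be checked in a few lines. The Python sketch below reproduces them under two assumptions: a 365-day year, and a build-side $0.0002 per action that is fully loaded (eval and ops included), which is how the "after eval and ops" savings figure is read here. Because the text says the build lands below $0.0002, the savings computed at that price are a lower bound.

```python
DAYS_PER_YEAR = 365  # assumption: the workflow runs every day of the year


def annual_cost(calls_per_day: int, cost_per_call: float) -> float:
    """Yearly spend for a fixed daily call volume and a flat per-call cost."""
    return calls_per_day * cost_per_call * DAYS_PER_YEAR


# Branch 4: logistics workflow, 40 million calls per day.
calls_per_day = 40_000_000
buy_annual = annual_cost(calls_per_day, 0.001)     # vertical SaaS price
build_annual = annual_cost(calls_per_day, 0.0002)  # upper bound on build cost
savings = buy_annual - build_annual

print(f"Buy:     ${buy_annual / 1e6:.1f}M per year")
print(f"Build:   ${build_annual / 1e6:.1f}M per year (at most)")
print(f"Savings: ${savings / 1e6:.1f}M per year (at least)")

# Branch 5: insurance underwriter, scores on a 0-1 scale.
vendor_public, vendor_internal, bar = 0.91, 0.71, 0.85
overstatement_points = round((vendor_public - vendor_internal) * 100)
shortfall_points = round((bar - vendor_internal) * 100)

print(f"Vendor eval over-states by {overstatement_points} points")
print(f"Vendor misses the internal bar by {shortfall_points} points")
```

The Branch 5 numbers show why the internal eval set has to exist before the decision: the vendor's published 0.91 clears the 0.85 bar, while the same product measured on the firm's own set falls 14 points short of it.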
The hybrid leaf is the most common answer
In our engagements across 2024–2026, the hybrid leaf catches roughly 60 percent of make-or-buy decisions. That number is not a hedge; it is structural. The model layer has commoditized, the orchestration layer has not, and the eval suite is the most workload-specific component of an AI system.
The economic logic is straightforward. The model is cheaper to rent than build. The orchestration platform (retrieval, tool use, agent loops) is sometimes cheaper to rent and sometimes cheaper to build, depending on workload-specificity. The eval suite is almost always cheaper to build because nobody else has your test set. The observability layer is almost always cheaper to rent because the same metrics matter across organizations. The hybrid leaf reflects exactly this decomposition.
The trap to avoid: treating “hybrid” as “indecision wrapped in jargon.” A hybrid decision needs explicit ownership boundaries: which component each party owns, which contract governs each line, which eval threshold each component must clear. We discuss the contract mechanics in the AI project pricing models piece; the eval-threshold pricing model is the structural fit for the hybrid leaf.
How to operationalize the decision
Three practices turn the tree from a slide into a procurement gate.
Run the tree against a written eval set, not a written feature list. The tree is calibrated to eval coverage and eval bars. If the make-or-buy meeting starts before the eval set exists, the tree cannot fire correctly because there is nothing to score the buy alternative against. Build the eval set first, then run the tree. Yes, the eval set takes two weeks. The bad decisions cost two quarters.
Re-run the tree every two quarters. A buy decision in 2026 is not a permanent buy decision. Inference prices fall, vertical vendors ship, and the tree’s leaves move. Set a re-evaluation cadence in the contract. We discuss the structural reason in the AI project compounding return piece: year-two economics dominate year-one for any decision that compounds.
Pre-commit the kill rule on build branches. A build decision needs the 30-day kill rule attached as a governance gate. Any build that does not clear day 30 against its eval trajectory should revert to the hybrid or buy leaf without renegotiation. The tree is only as good as the buyer’s willingness to act on its outputs.
Frequently asked questions
What is the single biggest difference between a 2018 and a 2026 make-or-buy tree?
The axis labeled “differentiation” in 2018 is replaced by “workload-specificity measured against a written eval set” in 2026. Differentiation was a narrative; workload-specificity is a number. The narrative supported either answer; the number forces one.
How do falling inference prices change the tree?
They strengthen the buy and hybrid leaves and weaken the build leaf, especially for workloads where a vendor exists today and the build-path savings rely on inference pricing that the vendor will eventually pass through. A build justified at 2026 spot prices but not at projected 2027 prices is a build that should not start.
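One way to apply that test is to compare the build-side cost per action against the vendor price projected forward rather than the spot price. The Python sketch below is a deliberately simple version: the 40 percent annual decline is a hypothetical input, the decline is assumed to compound smoothly, and the build cost is held flat, which is conservative against the build path.

```python
def projected_price(spot_price: float, annual_decline: float, months: int = 12) -> float:
    """Vendor price after `months`, assuming a smooth compounding decline."""
    return spot_price * (1.0 - annual_decline) ** (months / 12)


def build_still_justified(build_cost: float, vendor_spot: float,
                          annual_decline: float, months: int = 12) -> bool:
    """True if the build undercuts the vendor's projected price, not just today's."""
    return build_cost < projected_price(vendor_spot, annual_decline, months)


vendor_spot = 0.001      # per-action vendor price today
assumed_decline = 0.40   # hypothetical: 40% cheaper a year from now

# Cheap enough to survive the projected price drop.
print(build_still_justified(0.0002, vendor_spot, assumed_decline))
# Beats the spot price but not the projected one: a build that should not start.
print(build_still_justified(0.0008, vendor_spot, assumed_decline))
```

The second case is the one the answer above warns about: the build looks justified against today's price and stops being justified once the vendor passes the decline through.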
How do model upgrades affect the decision?
Build paths absorb the full re-evaluation cost of every model upgrade: two to four engineering weeks per upgrade, three to five upgrades per year. Buy paths push that cost to the vendor. The build economics need to carry an explicit re-eval line; if they do not, the build is mispriced from day one.
How does the eval set change the make-or-buy answer?
It is the make-or-buy answer. Buy is right when vendor eval coverage meets your bar on your eval set. Build is right when it does not. Hybrid is right when vendor coverage meets the bar on part of the eval set and not the rest.
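That rule can be sketched as a comparison of vendor scores against your bar on each slice of the eval set. In the Python below, the slice names, the per-slice structure, and the clinical-note numbers are hypothetical illustrations; the contract-review and insurance scores are the ones from the worked examples. A slice the vendor has not been scored on is treated as failing, on the reasoning that an unmeasured comparator cannot clear a bar.

```python
def leaf_from_eval(vendor_scores: dict, bars: dict) -> str:
    """Map vendor performance on your eval slices to buy, build, or hybrid."""
    if not bars:
        raise ValueError("write the eval set first: no slices, no decision")
    # A missing vendor score counts as a miss for that slice.
    passed = [name for name, bar in bars.items()
              if vendor_scores.get(name, 0.0) >= bar]
    if len(passed) == len(bars):
        return "buy"      # vendor meets the bar everywhere
    if not passed:
        return "build"    # vendor meets the bar nowhere
    return "hybrid"       # vendor meets the bar on part of the set


# Contract review: published F1 of 0.92 against an internal bar of 0.88.
print(leaf_from_eval({"clause_classification": 0.92}, {"clause_classification": 0.88}))

# Specialty insurance: 0.71 on the internal eval against a bar of 0.85.
print(leaf_from_eval({"claims_documents": 0.71}, {"claims_documents": 0.85}))

# Clinical notes (hypothetical numbers): passes one slice, misses the other by 8 points.
print(leaf_from_eval({"medical_knowledge": 0.90, "ehr_shape": 0.77},
                     {"medical_knowledge": 0.85, "ehr_shape": 0.85}))
```

This reduces the tree to its eval-driven core and leaves out the data-residency and cost-ceiling branches, which can override the result.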
What about the differentiation argument: is it dead?
Differentiation is a real concept; it is just not the right axis for an AI make-or-buy decision in 2026. The right axis is workload-specificity, and most “differentiation” arguments dissolve when you measure them. If your workload is high specificity, build is differentiation. If it is not, “build for differentiation” is a vanity project.
How does data residency interact with the tree?
It overrides everything below it. If the workload’s data cannot leave the perimeter and no vendor offers a same-perimeter deployment, the tree skips to the build or perimeter-hybrid leaf regardless of what the cost-per-action says. Compliance is not a cost line; it is a precondition.
Does the hybrid leaf usually require a vertical SaaS?
No. Hybrid can mean buy-the-model-build-the-orchestration, buy-the-platform-build-the-evals, or buy-the-managed-inference-build-the-app. The decomposition matters more than the label.
What if our internal team strongly prefers build?
Run the tree on the eval numbers and the cost numbers, not the team preference. Internal preference for build is a real organizational signal but it is not a make-or-buy criterion. If the tree says hybrid and the team says build, the conversation is about team development, not about the right answer for the workload. We address the related discipline in the AI project sunk-cost piece.
How does this connect to in-house vs agency hiring?
The tree is upstream of that question. Once it produces “build” or the build half of “hybrid,” the in-house vs agency vs hybrid TCO comparison decides who builds. The trees compose.
Key takeaways
- The 2018 build-vs-buy tree fails in 2026 because the cost surface moves quarterly and “differentiation” is the wrong axis. Replace it with workload-specificity measured against a written eval set.
- Six branches: public-internet-typical (buy), vertical vendor with eval transparency (buy), data residency hard constraint (build/perimeter-hybrid), cost-per-action ceiling (build), workload-specificity beyond vendor eval coverage (build), default (hybrid).
- The hybrid leaf catches roughly 60 percent of 2026 decisions because the model layer has commoditized but the eval and orchestration layers have not.
- Run the tree against a written eval set, re-run it every two quarters, and pre-commit the 30-day kill rule on build branches.
- Make-or-buy is a question about who pays the eval bill, not who writes the code. The economics in the manifesto make this concrete.
Arthur Wandzel