The AI Hybrid Playbook: Which 30% to Keep In-House When an Agency Owns 70%

The 30/70 hybrid is the dominant AI engagement shape in 2026: the agency owns build velocity, and the in-house team owns the parts that can’t be safely outsourced. The structure works when the 30 percent is the right 30 percent and fails when it isn’t. The wrong 30 percent is “the parts the agency couldn’t fit in scope,” which produces a hybrid where the in-house team owns leftover plumbing while the agency owns the moat. The right 30 percent is non-negotiable: the eval test set, the prompt registry, the model-routing config, the on-call ownership for AI incidents, and the IP defining what defensibility the AI feature is supposed to produce. Those five assets are the org’s leverage; outsourcing any of them is a structural error that surfaces 12 to 18 months later as architectural lock-in. This piece names what to keep, what to outsource, and the contract clauses that enforce the split when both parties have an incentive to drift.

This is a spoke under the AI build-vs-buy-vs-hire decision matrix for 2026. The matrix’s eighth principle is that the default verb is compose: buy the rails, build the moat, hire the judgment. This piece is the operational shape of the compose verb when an agency is the buy partner.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

Why the hybrid shape exists in 2026
The five assets that must stay in-house
Asset 1: the eval test set
Asset 2: the prompt registry
Asset 3: the model-routing config
Asset 4: on-call ownership
Asset 5: the IP defining defensibility
What the agency owns: the 70 percent
Contract clauses that enforce the split
Frequently asked questions
Key takeaways

Why the hybrid shape exists in 2026

Three years ago the AI build-vs-buy choice was binary at the org level. Either the org built the AI capability with internal hires, or the org bought it through an agency or vendor. The binary worked because the AI surface was small enough that one party could plausibly own all of it.

In 2026 the surface is too large for either pure shape to be optimal. A pure in-house build has the org learning every part of the agent stack, the model-routing layer, the eval discipline, the integration plumbing, the agent orchestration, and the deployment pipeline simultaneously, at a moment when the org has at most one or two AI engineers and a roadmap that needs to ship. A pure outsource has the org giving up the parts of the stack that compound into defensibility (the eval set, the routing config, the prompt registry) in exchange for fast initial delivery and permanent lock-in.

The 30/70 hybrid resolves the tension by splitting the surface along a defensibility axis rather than along an effort axis. The agency does the parts where their velocity advantage is real and the IP is not load-bearing. The in-house team owns the parts where the IP is load-bearing and the velocity tradeoff is worth paying. The split is not 30/70 because 30 percent is a magic number; it is 30/70 because the load-bearing IP layer happens to be roughly that fraction of the surface in most enterprises. Some engagements run 25/75, some run 40/60; the boundary is determined by the assets, not the percentages.

The hybrid is the dominant shape because it composes. It can scale up the agency side when build velocity matters; it can scale down when the in-house team has absorbed enough capacity to take over. The pure shapes do not compose; pure in-house cannot easily reintroduce an agency without re-buying context, and pure outsource cannot easily insource without re-buying capability. The composability makes the hybrid the durable shape.

The five assets that must stay in-house

The five in-house assets are not negotiable. Outsourcing any of them produces architectural lock-in that surfaces months later as “we want to switch agencies but we can’t, because the eval set is theirs” or “we want to migrate models but we can’t, because the routing config is in their codebase.” The lock-in is the agency’s leverage on renewal. It is not produced by malice; it is the structural consequence of where the load-bearing IP lives.

The five assets are: the eval test set, the prompt registry, the model-routing config, the on-call ownership for AI incidents, and the IP defining what defensibility the AI feature is supposed to produce. Each is load-bearing for a different reason; together they are the irreducible in-house surface that the hybrid is built around.

Asset 1: the eval test set

The eval test set is the single most load-bearing asset in any AI engagement. The eval set is what tells you whether the AI feature is working, whether a model swap is safe, whether a prompt change is a regression, and whether the agency’s deliverable meets the contract. Whoever owns the eval set owns the answer to “is this good enough to ship.”

If the agency owns the eval set, three things happen. First, the agency grades its own work; the conflict of interest is structural and unavoidable. Second, switching agencies becomes a multi-month project because the new agency cannot inherit the old agency’s eval set without rebuilding it from scratch. Third, the org cannot independently evaluate model alternatives, because the eval set is the only ground truth and the agency controls it.

The in-house ownership of the eval set is non-negotiable. The senior in-house AI engineer owns the eval set; the agency may contribute evals but cannot own them. The contract specifies that all eval data, scoring rubrics, and test cases are the org’s IP and remain in the org’s repos. The agency runs evals against the in-house set; they do not own the set itself.
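As a rough illustration of "the agency runs evals against the in-house set," the sketch below scores a candidate build against org-owned cases and applies an org-owned ship threshold. The `EvalCase` shape, the exact-match rubric, and the 0.9 threshold are assumptions made for the example, not the conventions of any particular eval tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalCase:
    """One org-owned test case; lives in the org's repo (hypothetical shape)."""
    case_id: str
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Illustrative rubric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_gate(
    cases: Iterable[EvalCase],
    candidate: Callable[[str], str],
    threshold: float = 0.9,  # assumed ship bar, set by the in-house owner
) -> tuple[float, bool, list[str]]:
    """Score a candidate (e.g. the agency's build) against the org's cases.

    Returns (mean score, passed, ids of failing cases).
    """
    scores = {c.case_id: exact_match(candidate(c.prompt), c.expected) for c in cases}
    if not scores:
        raise ValueError("eval set is empty; nothing to grade against")
    mean = sum(scores.values()) / len(scores)
    failing = [case_id for case_id, score in scores.items() if score < 1.0]
    return mean, mean >= threshold, failing


def agency_build(prompt: str) -> str:
    """Stand-in for the agency's deliverable; a real one would call a model."""
    return "yes" if "10 days" in prompt else "no"


if __name__ == "__main__":
    org_cases = [
        EvalCase("refund-01", "Can I get a refund after 30 days?", "no"),
        EvalCase("refund-02", "Can I get a refund after 10 days?", "yes"),
    ]
    print(run_gate(org_cases, agency_build))
```

The point of the shape is that the cases, the rubric, and the threshold are all arguments the org supplies; the agency only supplies `candidate`.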

The detail on eval discipline is in the case for buying your AI evaluation stack but building your evaluator; the evaluator (the eval set, the rubrics, the threshold logic) is build, the evaluation stack (the test runner, the dashboard, the regression detector) is buy.

Asset 2: the prompt registry

The prompt registry is the org’s accumulated knowledge about how to talk to the foundation models it depends on. Every prompt in production is a small piece of intellectual capital: a phrasing that worked, a structure that improved consistency, an example set that calibrated the model. The registry is the compounding asset.

If the agency owns the prompt registry, the org’s AI capability is rented. When the engagement ends, the prompts walk out the door. The org may have copies but does not have the version history, the experimental rationale, the eval coverage, or the institutional memory of why each prompt is the way it is. The next agency or in-house team starts from scratch.

The in-house ownership of the prompt registry is operational, not legal. The legal IP transfer can be handled in the contract; the operational reality is that prompts written by the agency in the agency’s tools migrate poorly. The fix is to require the agency to author prompts directly in the org’s prompt registry from day one; same tooling, same versioning, same eval coverage. The agency contributes commits to the org’s registry rather than maintaining a parallel registry that gets transferred at engagement end.
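One way to picture a registry that keeps the version history, rationale, and eval coverage together is sketched below. It is an in-memory toy under stated assumptions: the `PromptRegistry` and `PromptVersion` names, the `agency:`/`inhouse:` author convention, and the rule that a commit without rationale or eval coverage is rejected are all hypothetical; a real registry would be backed by the org's own repo and tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    version: int
    text: str
    author: str                      # e.g. "agency:alice" or "inhouse:bob" (assumed convention)
    rationale: str                   # why this change was made
    eval_case_ids: tuple[str, ...]   # org-owned eval cases that cover this prompt


class PromptRegistry:
    """Append-only, org-owned prompt store (in-memory sketch)."""

    def __init__(self) -> None:
        self._history: dict[str, list[PromptVersion]] = {}

    def commit(self, name: str, text: str, author: str,
               rationale: str, eval_case_ids: list[str]) -> PromptVersion:
        # Refuse changes that would lose the institutional memory.
        if not rationale.strip():
            raise ValueError("every prompt change needs a recorded rationale")
        if not eval_case_ids:
            raise ValueError("every prompt change needs eval coverage")
        versions = self._history.setdefault(name, [])
        entry = PromptVersion(len(versions) + 1, text, author,
                              rationale, tuple(eval_case_ids))
        versions.append(entry)
        return entry

    def latest(self, name: str) -> PromptVersion:
        return self._history[name][-1]

    def history(self, name: str) -> list[PromptVersion]:
        # Return a copy so callers cannot rewrite history.
        return list(self._history[name])
```

Agency and in-house authors commit through the same `commit` path, so nothing needs to be "transferred" at engagement end.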

Asset 3: the model-routing config

The model-routing config is the runtime brain of the AI feature. It decides which foundation model gets which request, what fallbacks fire when, and how cost-quality tradeoffs are made for each capability. The routing config is the embodiment of every routing decision the project has accumulated.

If the agency owns the routing config, the org’s runtime cost and quality are agency-controlled. Switching the routing means switching the agency. Optimizing the routing for the org’s specific cost or latency goals requires agency cycles the org is paying for. Migrating to a new model, a new provider, or a new agentic framework requires the agency’s cooperation.

The in-house ownership of the routing config is the most important runtime ownership decision in the hybrid. The config is data, not code (per the principle in the AI hire trap piece), so transferring ownership is operationally easy; the file lives in the org’s infrastructure, deploys through the org’s pipeline, and is editable by the org’s engineers. The agency may propose routing changes; the org’s senior AI engineer approves and merges them.
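To make "the config is data, not code" concrete, here is a minimal sketch in which the routing table is a plain data structure and a small resolver walks primary-then-fallbacks. The capability names, model names, cost ceilings, and the `validate` and `pick_model` helpers are all placeholders invented for the example.

```python
from typing import Callable, Mapping

# Placeholder capabilities, model names, and cost ceilings; not real products or prices.
ROUTING_CONFIG: dict[str, dict] = {
    "summarize": {
        "primary": "small-model",
        "fallbacks": ["large-model"],
        "max_cost_usd": 0.01,
    },
    "contract-review": {
        "primary": "large-model",
        "fallbacks": [],
        "max_cost_usd": 0.25,
    },
}


def validate(config: Mapping[str, Mapping]) -> None:
    """Reject malformed routes before they deploy through the org's pipeline."""
    for capability, route in config.items():
        missing = {"primary", "fallbacks", "max_cost_usd"} - set(route)
        if missing:
            raise ValueError(f"{capability}: missing keys {sorted(missing)}")
        if route["primary"] in route["fallbacks"]:
            raise ValueError(f"{capability}: primary repeated in fallbacks")


def pick_model(
    config: Mapping[str, Mapping],
    capability: str,
    is_available: Callable[[str], bool],
) -> str:
    """Return the first available model: primary first, then fallbacks in order."""
    route = config[capability]
    for model in [route["primary"], *route["fallbacks"]]:
        if is_available(model):
            return model
    raise RuntimeError(f"no available model for capability {capability!r}")
```

Under this shape, an agency proposal to change routing is a diff to `ROUTING_CONFIG` that the in-house engineer reviews and merges; the resolver code does not change.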

Asset 4: on-call ownership

On-call ownership for AI incidents is operationally non-negotiable for a single reason: the org is the entity that gets paged at 3am when the AI feature breaks in production. The agency cannot reliably take a 3am page for an org’s customers because the agency does not own the customer relationship and cannot be liable for customer-facing downtime in the same way the org is.

The hybrid contract often allows the agency to provide on-call support during business hours or on a best-effort basis after hours. That is fine. What is not fine is the org delegating primary on-call to the agency; that delegation moves customer SLA accountability to a vendor whose incentives and SLAs do not align with the org’s customer relationships.

In-house on-call also enforces operational competence. If the in-house team is on-call, they have to understand the AI feature’s failure modes, eval signals, and triage paths. That competence is what prevents the AI hire trap from concentrating in a single engineer; it also prevents the agency engagement from creating an in-house team that does not know how the AI feature works in production.

Asset 5: the IP defining defensibility

The IP defining what defensibility the AI feature is supposed to produce is the strategic asset of the hybrid. It is the answer to: “if a competitor showed up with an identical AI feature tomorrow, what would we still have that they wouldn’t?”

The defensibility IP usually consists of: the proprietary data the AI is trained on or grounded against, the domain-specific eval set that defines “good,” the user interaction patterns that produce the moat, and the integration depth into the org’s existing systems. Some of these are tangible (data, evals); some are conceptual (interaction patterns, integration depth).

If the agency owns any part of the defensibility IP, the org has rented its moat. The agency can bring the same expertise to a competitor next quarter. The contract can have all the IP-transfer language it wants; if the operational artifacts that produce the moat live in agency infrastructure or agency processes, the IP transfer is a legal fiction.

The in-house ownership of defensibility IP is the strategic test of the hybrid. Four questions to ask: does the proprietary data flow through the org’s pipeline before any agency tool sees it? Does the eval set encode the org’s specific quality bar in the org’s repos? Are the interaction patterns documented in the org’s design system? Is the integration depth maintained by the org’s platform engineers? If the answer to any of those is no, the defensibility IP is at risk.

What the agency owns: the 70 percent

The 70 percent the agency owns is everything that produces a velocity advantage without producing defensibility: build velocity, agent orchestration scaffolding, integration plumbing, deployment pipeline setup, observability tooling integration, and the boring infrastructure that an agency has built fifteen times before and an in-house team has built zero times.

The agency’s velocity advantage is real. They have done this before. They have templates, tooling, code generators, and operational playbooks that the in-house team would need 6 to 12 months to build from scratch. The hybrid captures that velocity advantage on the parts where it does not compromise defensibility.

The agency also owns the ramp on capabilities the in-house team will eventually need to absorb. Agent orchestration patterns, eval pipeline construction, model router design; these are capabilities the in-house team will eventually own, but the agency can ship working versions while the in-house team is still hiring. The hybrid uses the agency to shorten the time-to-shipped while in-house capacity grows.

The detail on what specifically tends to be in the 70 percent is in the AI buy trap piece: the inverse of what cannot be outsourced is the menu of what can be, and the 70 percent is that menu.

Contract clauses that enforce the split

The 30/70 split survives only if the contract enforces it. Three clauses are non-negotiable.

Clause 1: IP and artifact location. All evals, prompts, routing configs, and runbooks are authored in the org’s repos and infrastructure. The agency’s contributions are commits to org-owned repos, not deliveries from agency-owned repos. The clause specifies the location, the access model, and the standard that all engagement artifacts must live in org infrastructure before going to production.

Clause 2: knowledge transfer cadence. The agency commits to a calendared knowledge-transfer cadence, typically biweekly during the engagement, plus a structured offboarding document at the end. The cadence has named deliverables (architecture review docs, runbook updates, on-call training sessions) that produce in-house artifacts rather than relying on the agency’s internal documentation.
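Clause 1 can be checked mechanically if the engagement keeps a manifest of where each artifact lives. The sketch below is one possible audit, assuming a hypothetical manifest that maps artifact kind to repo URL and a hypothetical org host `git.org.example`; neither comes from a real contract or tool.

```python
from urllib.parse import urlparse

# Assumed values for illustration only.
ORG_HOSTS = {"git.org.example"}
REQUIRED_KINDS = {"evals", "prompts", "routing_config", "runbooks"}


def audit_artifacts(manifest: dict[str, str]) -> list[str]:
    """Return a list of findings; an empty list means the clause holds."""
    findings = []
    # Every required artifact kind must have a recorded location.
    for kind in sorted(REQUIRED_KINDS - set(manifest)):
        findings.append(f"{kind}: no location recorded")
    # Every recorded location must be on org-owned infrastructure.
    for kind, url in sorted(manifest.items()):
        host = urlparse(url).hostname
        if host not in ORG_HOSTS:
            findings.append(f"{kind}: hosted on {host}, not org infrastructure")
    return findings
```

Run as a pre-production check, a non-empty result is the signal that an artifact has drifted into agency-owned infrastructure.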

Clause 3: replaceability test. The contract specifies a replaceability test that runs at month 6 of the engagement: can the in-house team take over the agency’s deliverables in 30 days if the agency disappeared? The test is run as a tabletop exercise; if the answer is no, the engagement has produced lock-in and the contract triggers a remediation plan. The clause is the structural counterweight to the agency’s natural incentive to make themselves indispensable.
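The tabletop exercise can be recorded in a structured way so that a failed test produces a concrete remediation list. The sketch below assumes each agency deliverable gets a named in-house owner and an agreed takeover estimate, and that deliverables are taken over in parallel, so each one is checked against the 30-day window on its own; the `Deliverable` shape and that parallel-takeover reading are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

TAKEOVER_WINDOW_DAYS = 30  # from the clause: take over within 30 days


@dataclass(frozen=True)
class Deliverable:
    name: str
    inhouse_owner: Optional[str]   # named person who would take it over, if any
    est_takeover_days: int         # estimate agreed during the tabletop exercise


def replaceability_gaps(deliverables: list[Deliverable]) -> list[str]:
    """Return the gaps that feed the remediation plan; empty means the test passes."""
    gaps = []
    for d in deliverables:
        if d.inhouse_owner is None:
            gaps.append(f"{d.name}: no in-house owner named")
        elif d.est_takeover_days > TAKEOVER_WINDOW_DAYS:
            gaps.append(
                f"{d.name}: estimated {d.est_takeover_days} days, "
                f"over the {TAKEOVER_WINDOW_DAYS}-day window"
            )
    return gaps
```

Each returned gap maps to one line item in the remediation plan the contract triggers.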

The contract sophistication is the dividing line between hybrids that work and hybrids that produce 18-month lock-in. Sophisticated buyers write the three clauses on day one. Unsophisticated buyers write a generic SOW and discover at month 14 that they cannot leave.

Frequently asked questions

What if the agency objects to in-house owning the eval set?

The objection is a yellow flag. A serious AI agency understands that in-house eval ownership is the structural requirement for any engagement that is not staff augmentation, and they support it because it makes their work more defensible (they can point to in-house eval scores as proof of quality). An agency that objects is signaling that their engagement model depends on owning the eval set; that is a structural mismatch with the hybrid shape and the engagement should not proceed.

Is 30/70 a fixed split, or does it shift over time?

It shifts. The typical engagement starts at 20/80 (more agency, less in-house) in the first quarter while in-house capacity is growing, moves to 30/70 by the second or third quarter as in-house absorbs the load-bearing assets, and ends at 50/50 or higher in-house by month 12 to 18 if the engagement is succeeding. The percentages are descriptive of the trajectory, not prescriptive of any single moment.

How do we handle the on-call boundary when the agency is in another time zone?

The on-call boundary is a hard line on customer SLA. If the agency is in another time zone, they may be the secondary on-call for AI-specific issues during their business hours, but the in-house team is primary 24/7. The boundary is enforced through alert routing: pages route to in-house first, with the agency as escalation rather than primary. This is non-negotiable; outsourcing primary on-call is the leading indicator that the engagement is staff augmentation rather than a hybrid.
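A minimal sketch of that routing rule follows, assuming the agency's staffed hours are expressed in UTC and that the responder names are placeholders; a real setup would express the same policy in whatever paging tool the org already uses.

```python
from datetime import datetime, timezone

# Assumed for illustration: agency is staffed 01:00-09:59 UTC on weekdays.
AGENCY_HOURS_UTC = range(1, 10)


def escalation_chain(now: datetime) -> list[str]:
    """In-house is always paged first; the agency is escalation only when staffed."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    chain = ["inhouse-primary", "inhouse-secondary"]
    is_weekday = now_utc.weekday() < 5  # Monday is 0, Sunday is 6
    if is_weekday and now_utc.hour in AGENCY_HOURS_UTC:
        chain.append("agency-escalation")
    return chain
```

Note that no branch ever puts the agency ahead of the in-house responders; the time zone only decides whether the agency appears in the chain at all.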

What if the in-house team isn’t experienced enough to own the five assets at engagement start?

Most aren’t. The engagement is the mechanism for the in-house team to grow into the ownership. Day one, the agency may operate the assets while the in-house team observes. By month three, the in-house team should be the primary author with the agency reviewing. By month six, the in-house team owns and the agency contributes. The handoff is calendared as part of the contract, not assumed as a natural emergence.

How do we evaluate which agency can run the hybrid shape vs which is staff augmentation in disguise?

The diagnostic is the contract. Ask the agency what their default contract looks like. If it says “we own the AI stack, you own the integration with your existing systems,” the agency is staff augmentation. If it says “we contribute to your AI stack, you own the load-bearing assets, here are the knowledge-transfer milestones,” the agency is built for the hybrid shape. The detail on agency evaluation is in the field guide to evaluating an AI agency in under 90 minutes.

What happens at engagement end? How do we know we’re not stuck?

The replaceability test answers it. If the test passes (the in-house team can operate the AI stack in 30 days without agency help), the engagement has produced a competency, and the agency can leave cleanly. If the test fails, the contract triggers remediation: an extension where the agency’s primary deliverable is closing the gap that produced the test failure, not new feature work. The test is the safety mechanism that prevents engagement extension from drifting into permanent lock-in.

Is the 30/70 hybrid better than full in-house?

For most organizations in 2026, yes. Full in-house requires the org to build the agent orchestration layer, the deployment pipeline, the observability integration, and the routing infrastructure simultaneously while shipping product. Most orgs do not have the engineering capacity for that. The hybrid trades a controlled amount of agency dependency for a velocity multiplier that gets the AI feature shipped while in-house capacity grows. The detail is in build AI in-house vs outsource.

How does this interact with the AI hire trap?

Directly. The five in-house assets in this piece are the same assets that, when concentrated in a single engineer, produce the AI hire trap. The hybrid mitigates the trap by giving the in-house engineer a senior agency engineer to pair with, structured artifact production, and a forcing function (the replaceability test) that requires the artifacts to be in-house-operable.

Can we run the hybrid with multiple agencies?

Rarely well. Two agencies on the same AI stack tend to produce coordination overhead that exceeds either agency’s velocity contribution. The exception is when the agencies are in disjoint domains (one on AI feature X, one on AI feature Y), with the in-house team owning the shared infrastructure. The default recommendation is one agency per AI workstream.

What’s the smallest engagement size where the hybrid makes sense?

The hybrid requires enough engineering surface area that a 30/70 split is meaningful. Engagements under $200K total typically don’t warrant the contract sophistication; the agency runs as a focused contractor on a specific deliverable rather than as a hybrid partner. The hybrid economics start to make sense around $500K and become structurally appropriate at $1M+ where the agency engagement is multi-quarter and load-bearing.

Key takeaways

The 30/70 hybrid is the dominant AI engagement shape in 2026 because it composes: agency velocity scales up when build pressure is high and scales down as in-house absorbs capacity, and the engagement boundary moves cleanly. The split works when the 30 percent is the right 30 percent: the eval test set, the prompt registry, the model-routing config, the on-call ownership, and the IP defining defensibility. Each is a load-bearing asset; outsourcing any of them produces architectural lock-in.

The agency owns the 70 percent that produces velocity without producing defensibility: build scaffolding, agent orchestration, integration plumbing, deployment pipeline, observability tooling. The agency’s velocity advantage is real and worth capturing; the hybrid is designed to capture it without surrendering the load-bearing assets.

The contract is what makes the split survive. Three clauses are non-negotiable: IP and artifact location (in org infrastructure), knowledge transfer cadence (calendared and tracked), and the replaceability test at month six (can the in-house team take over in 30 days). Sophisticated buyers write these on day one; unsophisticated buyers discover at month 14 that the absence of these clauses produces lock-in.

The hybrid is not a permanent state. The trajectory is from 20/80 (agency-heavy) at engagement start to 50/50 or higher in-house by month 12 to 18. The five in-house assets are the structural anchor that makes the trajectory possible; they are the part of the engagement that is in-house from day one and that grows in-house capacity around them. The rest of the engagement is the velocity supplement that fades as the anchor strengthens.
