
The AI agency contract anti-patterns clause-by-clause

Most AI agency contracts in 2026 were drafted before AI agencies existed. Procurement teams pull a 2018 software-services MSA template, add a “machine learning” addendum, and ship. The result is a document that fails to address the things that matter (eval verification, prompt and weights ownership, inference markup, model deprecation handling) and over-addresses things that no longer matter, like color-of-the-button approval workflows and on-site visit budgets. The remedy is not a from-scratch rewrite; it is to replace eight specific anti-pattern clauses with language that reflects how AI delivery works.

This piece walks through each anti-pattern clause by clause. For each, it gives the typical language, why it fails, and the replacement that makes the contract honest. The framing throughout is the forward-deployed AI dev partner standard from the manifesto: the contract should make eval-gated delivery, continuous IP transfer, and observable artifacts contractually enforceable rather than aspirational. None of these replacements requires a specialist law firm; they require a procurement lead who has read this list and a counterparty who is operating in 2026 rather than 2024.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

Anti-pattern 1: “deliverables to be agreed in good faith”

Typical language. “Specific deliverables, milestones, and acceptance criteria shall be agreed in good faith between the parties from time to time during the engagement.”

Why it fails. The clause makes deliverables unenforceable. “Good faith” is a courtroom standard, not a delivery standard, and “from time to time” is a euphemism for “we will figure it out.” In practice, the agency uses the clause to shift scope toward whatever they happen to be staffed for that month, and the client uses it to demand additions without re-pricing. Both sides invoke it when convenient, neither side enforces it when needed, and the engagement runs on email-thread negotiation in lieu of contractual structure.

Replacement. “Deliverables for each two-week increment shall be specified in a written Sprint Charter, committed to the client repository at docs/sprint-charters/, naming (a) the feature or system increment, (b) the eval cases added or modified, (c) the eval-delta target, and (d) the demo date. Sprint Charters are executed via written acknowledgment by both the Agency Lead and the Client Engineering Lead, and become binding on execution. Deliverables not specified in a Sprint Charter are not contractually owed.” The replacement makes scope an artifact, the artifact lives in the repo, and the artifact is enforceable.
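As a rough illustration of what "scope as an artifact" could look like in practice, the sketch below checks that a charter names items (a) through (d) and carries both acknowledgments before it is treated as binding. The `SprintCharter` class, its field names, and the signer labels are hypothetical; the clause does not prescribe a file format or tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class SprintCharter:
    """Hypothetical in-memory form of a file under docs/sprint-charters/."""
    increment: str                       # (a) feature or system increment
    eval_cases: List[str]                # (b) eval cases added or modified
    eval_delta_target: Optional[float]   # (c) eval-delta target
    demo_date: Optional[date]            # (d) demo date
    acknowledged_by: Set[str] = field(default_factory=set)


# Assumed labels for the two roles named in the clause.
REQUIRED_SIGNERS = {"agency_lead", "client_engineering_lead"}


def charter_problems(charter: SprintCharter) -> List[str]:
    """Return reasons the charter is not yet binding; an empty list means binding."""
    problems = []
    if not charter.increment.strip():
        problems.append("missing (a) feature or system increment")
    if not charter.eval_cases:
        problems.append("missing (b) eval cases added or modified")
    if charter.eval_delta_target is None:
        problems.append("missing (c) eval-delta target")
    if charter.demo_date is None:
        problems.append("missing (d) demo date")
    missing = REQUIRED_SIGNERS - charter.acknowledged_by
    if missing:
        problems.append("missing acknowledgment: " + ", ".join(sorted(missing)))
    return problems


# Illustrative charter: complete, but only one side has acknowledged it.
draft = SprintCharter(
    increment="Retrieval reranker",
    eval_cases=["evals/rerank_001"],
    eval_delta_target=0.03,
    demo_date=date(2026, 7, 14),
    acknowledged_by={"agency_lead"},
)
print(charter_problems(draft))
```

A check like this could run in CI on every change under docs/sprint-charters/, so an unacknowledged or incomplete charter is visible before work starts against it.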

Anti-pattern 2: auto-renewal without a performance gate

Typical language. “This Agreement shall automatically renew for successive twelve-month terms unless either party provides written notice of non-renewal at least 90 days prior to the end of the then-current term.”

Why it fails. Auto-renewal converts an active engagement decision into a passive one. The client has to remember, in month nine, to evaluate whether the agency is still earning the contract, and most clients do not. The agency knows this and underweights the renewal pressure. By the time the client notices the engagement has decayed (eval scores stagnant, post-mortems unwritten, senior engineer rolled off), the auto-renewal has already triggered and the client is locked in for another year.

Replacement. “This Agreement renews for successive twelve-month terms only upon written confirmation by both parties no fewer than 60 days prior to the end of the then-current term. Renewal confirmation shall reference the Performance Review Memo (defined below), produced jointly by the parties no fewer than 90 days prior to term end, evaluating (a) eval-gate pass rate over the trailing two quarters, (b) cost-per-call trajectory, (c) post-mortem count and quality, and (d) the agreed roadmap for the renewal term. Absent affirmative written renewal, this Agreement terminates at the end of the then-current term.” The replacement makes renewal an active decision tied to evidence.
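The two notice windows in the replacement clause are easy to misplace on a calendar, so a small date calculation may help. This is a minimal sketch that assumes calendar days and a hypothetical term end of December 31, 2026; the function names are illustrative and not part of the clause.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def renewal_deadlines(term_end: date) -> Dict[str, date]:
    """Latest dates for the memo and the confirmation, counted in calendar days."""
    return {
        "performance_review_memo_due": term_end - timedelta(days=90),
        "renewal_confirmation_due": term_end - timedelta(days=60),
    }


def renews(term_end: date,
           memo_date: Optional[date],
           confirmation_date: Optional[date]) -> bool:
    """True only if both artifacts exist and both landed on time.

    Absent affirmative renewal the agreement terminates, so a missing
    memo or a missing confirmation returns False.
    """
    if memo_date is None or confirmation_date is None:
        return False
    deadlines = renewal_deadlines(term_end)
    return (memo_date <= deadlines["performance_review_memo_due"]
            and confirmation_date <= deadlines["renewal_confirmation_due"])


term_end = date(2026, 12, 31)  # hypothetical term end
print(renewal_deadlines(term_end))
print(renews(term_end, date(2026, 9, 28), date(2026, 10, 30)))
```

The default outcome in this sketch is termination, which mirrors the point of the clause: silence ends the contract instead of extending it.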

Anti-pattern 3: vague “work product” without weights and prompts

Typical language. “All work product, deliverables, and code authored by Agency in the course of this engagement shall be the sole property of Client upon final payment, subject to Agency’s retained rights in its underlying tools, methodologies, and frameworks.”

Why it fails. “Underlying tools, methodologies, and frameworks” is the loophole. Agencies use it to retain prompt templates (“methodology”), eval scaffolding (“framework”), and even fine-tuned weights (“underlying tools”). The client thinks they own the system; the agency thinks they own the system’s most valuable components. When the engagement ends, the client cannot operate the system because the agency has lawyered out the parts that matter. This is the contract architecture of vendor lock-in, and it is the most common single failure mode in AI agency MSAs.

Replacement. “Work product transferred to Client at each milestone payment includes, without limitation: (a) all source code authored or modified during the engagement; (b) all production prompts, including system prompts, tool definitions, and prompt templates, in the registry committed to the client repository; (c) all eval cases, eval scaffolding, and eval-suite source code; (d) all fine-tuned model weights, adapters, LoRAs, embedding indexes, and cached embeddings produced during the engagement; (e) all infrastructure-as-code, observability configuration, runbooks, and post-mortems; (f) all synthetic training data and any data labeling produced during the engagement. Agency retains no rights to (a) through (f) other than a non-exclusive license to use anonymized aggregate metrics for case studies, exercisable only with written client consent.” The replacement enumerates the categories that matter and forecloses the loophole.

Anti-pattern 4: no kill clause

Typical language. “Either party may terminate this Agreement for material breach upon thirty (30) days written notice, provided the breaching party has failed to cure such breach within the notice period.”

Why it fails. Termination-for-cause requires the client to prove material breach, which is hard, slow, and adversarial. Most AI agency engagements that go wrong do not produce a textbook breach; they produce a quarter of slow underperformance, an eval suite that never quite gets built, and a senior engineer who somehow shows up only at quarterly reviews. Without a kill clause, the client is stuck waiting for an actionable breach while the budget burns. The 30-day cure period further extends the runway during which the agency can produce just enough activity to defeat the breach claim.

Replacement. “In addition to termination for cause, Client may terminate this Agreement for convenience or for unsatisfactory performance upon fifteen (15) days written notice. In the event of such termination, (a) Agency shall be paid for work completed through the termination date on a pro-rata basis against the active Sprint Charter; (b) Agency shall transfer all work product per the Work Product clause within ten (10) business days; (c) Client owes no early-termination fee. Agency may terminate for non-payment after fifteen (15) days notice and an opportunity to cure.” The replacement gives the client a real exit and prices it cleanly.
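The clause does not say how "pro-rata against the active Sprint Charter" is computed, so the parties should write the formula down. One possible reading, sketched below, prorates the sprint fee by calendar days elapsed in the sprint. The fee, the dates, and the calendar-day basis are all assumptions chosen for illustration.

```python
from datetime import date


def pro_rata_fee_cents(sprint_fee_cents: int,
                       sprint_start: date,
                       sprint_end: date,
                       termination_date: date) -> int:
    """Fee owed for the active sprint, prorated by calendar days (inclusive).

    Assumption: the fee accrues evenly per calendar day of the sprint.
    Amounts are integer cents to avoid floating-point rounding.
    """
    total_days = (sprint_end - sprint_start).days + 1
    if total_days <= 0:
        raise ValueError("sprint_end must not precede sprint_start")
    last_billable = min(termination_date, sprint_end)
    elapsed_days = max((last_billable - sprint_start).days + 1, 0)
    return sprint_fee_cents * elapsed_days // total_days


# Hypothetical two-week sprint priced at $42,000, terminated on day 7 of 14.
owed = pro_rata_fee_cents(
    sprint_fee_cents=4_200_000,
    sprint_start=date(2026, 7, 6),
    sprint_end=date(2026, 7, 19),
    termination_date=date(2026, 7, 12),
)
print(owed)  # 2100000 cents, i.e. $21,000
```

Other bases are equally defensible, for example business days or the share of charter eval cases already passing; what matters is that the contract picks one.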

Anti-pattern 5: undisclosed inference markup

Typical language. “Agency shall procure model API services on Client’s behalf and pass through such costs at Agency’s standard rates plus reasonable handling charges.”

Why it fails. “Standard rates plus reasonable handling charges” is the architecture of inference arbitrage. The agency holds the API keys with OpenAI, Anthropic, or Google, charges the client a per-token rate that is typically 30-100% above the underlying provider rate, and books the difference as margin. The markup is rarely disclosed in the contract; it is sometimes denied when raised in conversation. Over a 12-month engagement at production load, the markup compounds into a six- or seven-figure invisible expense, and the client has no contractual basis to challenge it because they signed away the right to know.

Replacement. “Client shall hold all model provider API credentials directly and shall be the contracting party with model providers including but not limited to Anthropic, OpenAI, and Google. Agency shall configure infrastructure to use Client credentials and shall not procure model services on Client’s behalf. To the extent Agency uses any model API services for development, evaluation, or testing on Client’s behalf, such usage shall be billed to Client at the underlying provider rate with no markup, and Agency shall provide monthly itemized usage reports. Inference markup of any kind constitutes a material breach.” The replacement closes the arbitrage and makes it bright-line.
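The itemized monthly report is what makes the no-markup rule checkable. The sketch below compares what the agency billed against the provider's own charge for each line and totals any difference. The `UsageLine` fields, the model labels, and the dollar figures are invented for illustration; a real report would be reconciled against the provider's invoice.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class UsageLine:
    """One row of a hypothetical monthly itemized usage report."""
    model: str
    billed_usd: Decimal     # what the agency charged the client
    provider_usd: Decimal   # what the provider charged for the same usage


def markup_findings(lines: List[UsageLine]) -> List[Tuple[str, Decimal]]:
    """Return (model, overcharge) for every line billed above provider cost."""
    findings = []
    for line in lines:
        overcharge = line.billed_usd - line.provider_usd
        if overcharge > 0:
            findings.append((line.model, overcharge))
    return findings


def total_overcharge(lines: List[UsageLine]) -> Decimal:
    """Sum of all markup found in the report."""
    return sum((amount for _, amount in markup_findings(lines)), Decimal("0"))


# Illustrative report: one line passed through at cost, one marked up by 50%.
report = [
    UsageLine("model-a", Decimal("1500.00"), Decimal("1000.00")),
    UsageLine("model-b", Decimal("800.00"), Decimal("800.00")),
]
print(markup_findings(report))
print(total_overcharge(report))
```

Under the replacement clause any non-empty result is a material breach, so the check is binary; the totals matter only for sizing the refund.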

Anti-pattern 6: no eval verification clause

Typical language. No language at all. The contract is silent on evaluation methodology, eval-gate enforcement, and acceptance criteria. Acceptance is governed by a “Client shall not unreasonably withhold” clause that ties to no measurable artifact.

Why it fails. Silence on evaluation means acceptance is opinion-driven, which means agency and client argue about whether the system is “ready” without a number to point to. The agency can claim acceptance based on demo polish; the client can withhold acceptance based on intuition. Both positions are defensible and unfalsifiable. The contract should make evaluation a contractually defined function of an artifact, not a function of negotiation.

Replacement. “Acceptance of each Sprint Charter deliverable shall be governed by the Eval Suite committed to the client repository at evals/. Each Sprint Charter shall define (a) the eval cases added or modified, (b) the threshold the deliverable must meet, and (c) the measurement window. The deliverable is accepted upon the eval suite producing a passing run on the agreed threshold over the agreed window, evidenced by CI logs committed to the repository. The eval suite shall be authored jointly, with eval cases reviewed by a named Client Domain Expert. Modifications to eval cases or thresholds shall be governed by the Change Order Process.” The replacement converts acceptance into a checked artifact.
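To make "acceptance as a checked artifact" concrete, here is a minimal sketch of an acceptance rule. It adopts one possible reading of the clause: every CI eval run inside the measurement window must meet the agreed pass-rate threshold. The `EvalRun` record and that interpretation are assumptions; each Sprint Charter would define its own threshold and window.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class EvalRun:
    """Hypothetical summary of one CI run of the suite under evals/."""
    run_date: date
    passed_cases: int
    total_cases: int


def is_accepted(runs: List[EvalRun],
                threshold: float,
                window_start: date,
                window_end: date) -> bool:
    """Accept only if every run in the window meets the pass-rate threshold.

    No runs in the window means no evidence, so the deliverable is not accepted.
    """
    in_window = [r for r in runs if window_start <= r.run_date <= window_end]
    if not in_window:
        return False
    return all(
        r.total_cases > 0 and r.passed_cases / r.total_cases >= threshold
        for r in in_window
    )


# Illustrative window: two runs at 92% and 96% against a 90% threshold.
runs = [
    EvalRun(date(2026, 7, 15), passed_cases=46, total_cases=50),
    EvalRun(date(2026, 7, 16), passed_cases=48, total_cases=50),
]
print(is_accepted(runs, 0.90, date(2026, 7, 15), date(2026, 7, 17)))
```

A looser rule, such as the final run or the window average meeting the threshold, is also workable; the charter should state which rule applies so that the CI logs settle the question.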

Anti-pattern 7: “industry-standard security” with no OWASP LLM Top 10 reference

Typical language. “Agency shall maintain industry-standard information security practices, including but not limited to encryption in transit and at rest, role-based access controls, and standard secure coding practices.”

Why it fails. “Industry-standard” was a defensible standard for general software in 2018; it is meaningless for AI systems in 2026. The threat model is different. Prompt injection, training-data poisoning, model-output exfiltration, jailbreak persistence, and tool-call abuse are AI-specific failure modes that no traditional software MSA addresses. An agency can be perfectly compliant with SOC 2, ISO 27001, and “industry-standard” practices while shipping a system that fails an OWASP LLM Top 10 audit on day one.

Replacement. “Agency shall maintain security practices addressing both general application security (aligned with OWASP Top 10 2024) and AI-specific security risks (aligned with OWASP LLM Top 10 2025), including but not limited to: prompt injection mitigation, sensitive information disclosure prevention, training data poisoning safeguards, supply chain integrity for model and embedding artifacts, output handling and downstream sanitization, excessive agency limitations on tool calls, model denial-of-service protections, and unbounded consumption controls. Agency shall produce, no later than the first production deploy, a written Threat Model document mapping the system’s surfaces to the OWASP LLM Top 10 categories, with mitigations or accepted-risk dispositions for each. The Threat Model shall be committed to the client repository at docs/threat-model.md and updated at every production-affecting change.” The replacement names the framework and the artifact.
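A Threat Model is only useful as a contractual artifact if its coverage can be checked. The sketch below looks for a mitigation or accepted-risk disposition, with a supporting note, for each of the eight risks named in the clause. The risk labels are copied from the clause wording and are not the official OWASP category list; the dictionary layout is a hypothetical stand-in for whatever structure docs/threat-model.md uses.

```python
from typing import Dict, List

# The eight risks named in the replacement clause (labels are this sketch's own).
CLAUSE_RISKS = (
    "prompt injection",
    "sensitive information disclosure",
    "training data poisoning",
    "supply chain integrity",
    "output handling",
    "excessive agency",
    "model denial-of-service",
    "unbounded consumption",
)

ALLOWED_DISPOSITIONS = {"mitigated", "accepted-risk"}


def uncovered_risks(threat_model: Dict[str, Dict[str, str]]) -> List[str]:
    """Return clause risks lacking a valid disposition or an explanatory note."""
    gaps = []
    for risk in CLAUSE_RISKS:
        entry = threat_model.get(risk)
        if (entry is None
                or entry.get("disposition") not in ALLOWED_DISPOSITIONS
                or not entry.get("note", "").strip()):
            gaps.append(risk)
    return gaps


# Illustrative threat model that covers six of the eight risks.
example_model = {
    risk: {"disposition": "mitigated", "note": "see section for " + risk}
    for risk in CLAUSE_RISKS[:5]
}
example_model["excessive agency"] = {
    "disposition": "accepted-risk",
    "note": "tool calls are read-only in this release",
}
print(uncovered_risks(example_model))
```

Run against every production-affecting change, a check like this turns the update obligation in the clause into something a reviewer can verify in seconds.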

Anti-pattern 8: no model-deprecation handling

Typical language. Almost universally absent. Contracts treat model selection as a one-time decision at engagement start, with no provision for what happens when the underlying provider deprecates a model, ships a breaking version, or retires an endpoint.

Why it fails. Model deprecations are not exceptional events; they are routine. Anthropic, OpenAI, and Google deprecate models on rolling 12- to 18-month windows, sometimes faster. A contract that does not specify who pays for the migration work, who runs the regression evals, and who is on the hook if the new model degrades production produces a stand-off the moment a deprecation arrives. The client demands the agency migrate at no cost (“the system was supposed to work”); the agency demands a change order (“the original model is still operational, technically”); the migration delays past the deprecation date and the system breaks.

Replacement. “Where a model provider deprecates a model used in the production system, or ships a version change that materially alters production behavior (as detected by the Eval Suite), Agency shall, within ten (10) business days of public deprecation notice or eval-detected behavior change: (a) propose a migration path with target model, eval delta projection, and cost-per-call delta projection; (b) execute the migration to maintain or exceed the active eval threshold; (c) update the prompt registry, eval suite, and Threat Model accordingly. Migration work prior to active production use is included in the active retainer. Migration work after production deploy that introduces materially new functionality shall be governed by the Change Order Process; pure-equivalence migrations shall not be subject to additional charge.” The replacement names the trigger, the response window, and the pricing rule.
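The ten-business-day response window is the enforceable part of this clause, so it helps to be able to compute it. This minimal sketch counts Monday through Friday only and ignores public holidays, which is an assumption the contract should settle explicitly; the notice date is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, business_days: int) -> date:
    """Return the date that falls `business_days` weekdays after `start`.

    Assumption: weekends are skipped, public holidays are not.
    The start date itself is not counted.
    """
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def proposal_on_time(notice_date: date, proposal_date: date) -> bool:
    """True if the migration proposal arrived within ten business days of notice."""
    return proposal_date <= add_business_days(notice_date, 10)


notice = date(2026, 6, 1)  # hypothetical public deprecation notice (a Monday)
print(add_business_days(notice, 10))
print(proposal_on_time(notice, date(2026, 6, 12)))
```

The same helper covers the ten-business-day work product transfer deadline in the kill clause.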

How to use this list in a contract review

Print the eight anti-patterns and the eight replacements. In the contract review meeting, walk the eight clauses one at a time. For each: read the agency’s existing language, read the anti-pattern, and propose the replacement. The agency’s response is the data point.

Agencies operating at the manifesto standard will accept seven or eight of the replacements without significant pushback, because the replacements describe how they already operate. They may negotiate Anti-pattern 5 (inference markup) if their delivery model genuinely depends on holding API keys for development environments; that is a legitimate negotiation that can land at “Agency holds keys for the first 14 days, transfers to Client by end of week 3.” Agencies operating below that standard will resist three or four of the replacements, particularly Anti-pattern 1 (Sprint Charters), Anti-pattern 3 (work product enumeration), Anti-pattern 5 (inference markup), and Anti-pattern 7 (OWASP LLM Top 10). The pattern of resistance is the contract review’s most diagnostic output: it tells you which parts of the engagement the agency intends to run loosely, before you have signed anything.

For the broader contract architecture, the AI agency contract negotiation guide covers the rest of the MSA. The eight anti-patterns above are the ones specific to AI delivery, and they are the ones a procurement template from 2018 will get wrong by default.

Frequently Asked Questions

What is the worst clause to leave in an AI agency contract?

The ‘work product, subject to Agency’s retained rights in underlying tools, methodologies, and frameworks’ clause. The phrase ‘underlying tools, methodologies, and frameworks’ is the loophole agencies use to retain prompt templates as ‘methodology,’ eval scaffolding as ‘framework,’ and even fine-tuned weights as ‘underlying tools.’ The replacement enumerates every category (source code, production prompts, eval cases, fine-tuned weights, adapters, LoRAs, embedding indexes, cached embeddings, infrastructure-as-code, observability config, runbooks, post-mortems, synthetic training data) and forecloses the loophole. This is the single most consequential clause to fix because it determines whether the engagement produces an asset the client owns or an asset the agency rents.

How should AI agency deliverables be specified in the contract?

Through written Sprint Charters committed to the client repository every two weeks, naming the feature or system increment, the eval cases added or modified, the eval-delta target, and the demo date. Sprint Charters are executed by written acknowledgment from both the Agency Lead and the Client Engineering Lead and become contractually binding on execution. Deliverables not specified in a Sprint Charter are not contractually owed. This replaces the common ‘deliverables to be agreed in good faith from time to time’ language, which is unenforceable because both parties invoke it when convenient and neither side enforces it when needed.

Should AI agency contracts auto-renew?

No. Auto-renewal converts an active engagement decision into a passive one and locks clients into another year before they notice the engagement has decayed. The replacement is affirmative renewal: written confirmation by both parties no fewer than 60 days prior to term end, referencing a Performance Review Memo produced jointly 90 days prior, evaluating eval-gate pass rate over two trailing quarters, cost-per-call trajectory, post-mortem count and quality, and the agreed roadmap. Absent affirmative renewal, the agreement terminates. The shape of the renewal decision should be evidence-driven, not calendar-driven.

What is inference markup and why is it a contract anti-pattern?

Inference markup is the practice of an agency holding model API credentials with Anthropic, OpenAI, or Google, charging the client a per-token rate above the underlying provider rate (typically 30 to 100 percent), and booking the difference as margin. The markup is rarely disclosed, and over a 12-month engagement at production load it compounds into a six- or seven-figure invisible expense. The contract replacement requires the client to hold all model provider credentials directly and forbids agency procurement of model services on the client’s behalf at any markup, with itemized monthly usage reports. Inference markup of any kind should be defined as material breach.

How should an AI agency contract handle evaluation acceptance?

Acceptance should be governed by the eval suite committed to the client repository, not by negotiation. Each Sprint Charter defines the eval cases added or modified, the threshold the deliverable must meet, and the measurement window. The deliverable is accepted when CI produces a passing run at the agreed threshold over the agreed window, evidenced by logs committed to the repo. The eval suite is authored jointly, with cases reviewed by a named Client Domain Expert, and modifications go through the Change Order Process. This converts acceptance from opinion-driven to artifact-driven and makes the contract enforceable through evidence.

What does ‘industry-standard security’ miss for AI systems?

AI-specific failure modes: prompt injection, training data poisoning, sensitive information disclosure, supply chain integrity for model and embedding artifacts, output handling and downstream sanitization, excessive agency limitations on tool calls, model denial-of-service, and unbounded consumption. An agency can be SOC 2 and ISO 27001 compliant while failing the OWASP LLM Top 10 on day one. The replacement language requires alignment with both OWASP Top 10 2024 (general application security) and OWASP LLM Top 10 2025 (AI-specific risks), plus a written Threat Model committed to the client repo at the first production deploy and updated at every production-affecting change.

Why does an AI agency contract need a model-deprecation clause?

Because Anthropic, OpenAI, and Google deprecate models on rolling 12- to 18-month windows, and the contract must specify who pays for the migration, who runs the regression evals, and who is on the hook if the new model degrades production. Without the clause, deprecation triggers a stand-off: the client demands free migration (‘the system was supposed to work’), the agency demands a change order (‘the model is technically still operational’), and the migration slips past the deprecation date. The replacement requires agency to propose a migration path within 10 business days of public deprecation notice, execute the migration to maintain the eval threshold, and treat pure-equivalence migrations as included in the active retainer.

What kill clause should be in an AI agency MSA?

Termination for convenience or unsatisfactory performance with 15 days written notice, in addition to traditional termination for cause. On termination the agency is paid pro-rata against the active Sprint Charter, transfers all work product per the Work Product clause within 10 business days, and the client owes no early-termination fee. The reason is that most AI engagements that go wrong do not produce a textbook material breach; they produce a quarter of slow underperformance, an eval suite that never quite gets built, and a senior engineer who only shows up at quarterly reviews. The traditional 30-day cure period extends the runway during which the agency can produce just enough activity to defeat the breach claim.

Will most AI agencies accept these contract replacements?

Agencies operating at the forward-deployed standard will accept seven or eight of the eight replacements without significant pushback, because the replacements describe how they already operate. They may negotiate the inference-markup clause if their delivery model genuinely depends on holding development-environment API keys, which can land at ‘Agency holds keys for the first 14 days, transfers to Client by end of week 3.’ Agencies operating below that standard will resist three or four of the replacements, particularly Sprint Charters, work-product enumeration, inference markup, and OWASP LLM Top 10. The pattern of resistance during contract review is the most diagnostic output: it surfaces which parts of the engagement the agency intends to run loosely before any signature.

How long should the AI agency contract review take with these eight anti-patterns?

About 90 minutes for a first-pass review and another two to three hours for redline negotiation. The structure is to walk the eight anti-patterns sequentially: read the agency’s existing language, read the anti-pattern, propose the replacement, log the agency’s response. Agencies aligned on six or more replacements out of eight clear the review on the first session. Agencies aligned on four or fewer typically need a second session and a partner-level escalation, which is itself diagnostic; the partner who can move on these clauses is the partner who runs eval-gated engagements. The total cost of the review is one working day; the cost of skipping it can be a six-figure inference markup invisible until the system is in production.

Last Updated: Jun 2, 2026

Arthur Wandzel
