Enterprise Software 19 min read

The AI agency change-order playbook: how to keep scope honest

The single most predictive marker of whether an AI agency engagement will run hot in month four is whether the change-order process was set up in week two. Engagements without a change-order process accumulate scope verbally: a feature added in a Slack thread, a model swap requested in a demo, an eval threshold quietly tightened to please a stakeholder. Each accumulation is small. By month four they have compounded into a system that bears no resemblance to the original Sprint Charter, an inference bill that nobody projected, and an agency-client relationship that runs on the assumption that “we will figure it out at the end of the engagement.” That assumption rarely survives the figuring out.

The remedy is a five-step change-order process, codified in the MSA and operated weekly. It treats every deviation from the eval-bound scope as a discrete event: triggered, sized, priced, approved, logged. The process is not bureaucratic; done correctly, it adds about 20 minutes per change to a healthy engagement and saves a quarter's worth of cleanup at the end. This piece walks through the five steps, then covers the trap that defeats most change-order processes (the “absorbed favor”) and how to refuse it. The framing is downstream of the forward-deployed AI dev partner standard described in the manifesto: if eval-gated PRs are the unit of work and Sprint Charters are the unit of scope, change orders are the only legitimate way to modify either.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

Step 1: trigger (name the deviation)

A change order is triggered by any deviation from the eval-bound scope of the active Sprint Charter. The trigger has to be defined precisely, because the most common failure mode is the unnamed deviation: the moment when one party assumes they have changed the scope and the other party assumes the scope is unchanged. The deviation is named when one of three signals appears.

A new eval case category. A stakeholder asks for behavior that is not covered by an existing eval case (“the system should now also handle multi-language inputs,” “the agent should be able to escalate to a human when confidence is below 0.7,” “the response should never include URLs”). The new behavior is not in the active Sprint Charter; it is a deviation. Even if the agency engineer agrees in the moment that “yeah, that’s a small change, we can do it,” the deviation has been triggered and a change order is owed.

A model or routing-layer change. Any change to the production model, fallback model, retrieval strategy, embedding model, or re-ranker that was not specified in the active Sprint Charter is a deviation. Model changes have downstream eval and cost implications that usually exceed the surface estimate. The agency’s instinct will be to absorb the change as a “small refactor”; the discipline is to refuse the absorption and trigger the change-order process even when the change is genuinely small.

A scope expansion or threshold change. Any addition to the deliverable list (“can we also add a small UI for the eval dashboard?”) or any modification to the eval threshold (“can we tighten faithfulness from 0.80 to 0.85 for this milestone?”) is a deviation. The threshold-change case is the one most often mishandled: a tightened threshold can require substantial additional work to clear, and absorbing it without a change-order leaves the agency working uncompensated against an invisible new bar.

The trigger step ends when the deviation is named in writing (typically a Slack message in a dedicated #change-orders channel) within 24 hours of when it surfaced. Naming is the only thing that matters at this step. The size, the price, and the approval all come later.
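To make the trigger step concrete, here is a minimal Python sketch that records a deviation against the three signals above and checks the 24-hour naming rule. The `Signal`, `Deviation`, and `is_named_on_time` names are hypothetical, and the sketch assumes the timestamp of the #change-orders post is what counts as “named in writing.”

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Signal(Enum):
    """The three signals that trigger a change order."""
    NEW_EVAL_CATEGORY = "new eval case category"
    MODEL_OR_ROUTING_CHANGE = "model or routing-layer change"
    SCOPE_OR_THRESHOLD_CHANGE = "scope expansion or threshold change"


NAMING_WINDOW = timedelta(hours=24)


@dataclass
class Deviation:
    summary: str
    signal: Signal
    surfaced_at: datetime                # when the request first appeared (thread, demo, review)
    named_at: Optional[datetime] = None  # when it was posted in #change-orders (assumption)

    def is_named_on_time(self) -> bool:
        """True only if the deviation was named in writing within 24 hours of surfacing."""
        if self.named_at is None:
            return False
        return self.named_at - self.surfaced_at <= NAMING_WINDOW


escalation = Deviation(
    summary="Escalate to a human when confidence is below 0.7",
    signal=Signal.NEW_EVAL_CATEGORY,
    surfaced_at=datetime(2026, 4, 14, 15, 0),
    named_at=datetime(2026, 4, 15, 9, 30),
)
print(escalation.is_named_on_time())  # True: named 18.5 hours after it surfaced
```

An unnamed deviation (`named_at` left as `None`) always fails the check, which mirrors the point that naming is the only thing that matters at this step.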

Step 2: sizing (re-estimate eval, dev, and cost-per-call impact)

Once triggered, the change-order is sized along three dimensions. Sizing is done by the senior engineer doing the work, not by the project manager, because the project manager systematically undersizes the eval-impact and cost-per-call dimensions.

Eval impact. What new eval cases need to be authored? What threshold changes are implied? What regression risks are created for existing eval cases? The output is a written estimate: “Adding multi-language support requires authoring 30 new eval cases (10 per target language), modifying 14 existing English-only cases to assert language-detection behavior, and creating a new threshold-decomposition (per-language faithfulness, with a global aggregate). Risk: the existing English faithfulness threshold may regress 2-4 points until the multi-language prompts are stable; recovery in 2-3 sprints.”

Dev hours impact. How many engineer-hours, at which seniority levels, will the change consume? The estimate is in named-engineer hours: “30 hours of senior engineer time for the routing-layer changes, 24 hours of mid-level engineer time for the eval scaffolding, 8 hours of senior engineer review time for the integration.” Hour estimates that round to “a couple of weeks” are not estimates; they are intentions.

Cost-per-call impact. What does the change do to the per-request cost profile? Adding a verification call against a second model raises cost-per-call. Switching to a more capable model raises cost-per-call. Tightening retrieval to top-10-with-rerank raises cost-per-call. The cost-per-call delta has to be estimated and fed into the renewal-period budget projection, because a 15% per-call increase compounds into a meaningful annual cost. The output is a single number: “Cost-per-call delta: +$0.004, projected to +$3,100/month at production load.”
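The memo's single cost number is the per-call delta multiplied by production volume. The example above does not state its volume; 775,000 calls per month is simply the figure that makes +$0.004 per call come out to +$3,100 per month, so treat it as an assumption. The helper names are illustrative.

```python
def monthly_cost_delta(per_call_delta_usd: float, calls_per_month: int) -> float:
    """Project a per-call cost change to a monthly figure at a given request volume."""
    return per_call_delta_usd * calls_per_month


def annual_cost_delta(per_call_delta_usd: float, calls_per_month: int) -> float:
    """Annualize the monthly projection for the renewal-period budget."""
    return 12 * monthly_cost_delta(per_call_delta_usd, calls_per_month)


# Assumed production load; substitute the engagement's measured volume.
CALLS_PER_MONTH = 775_000

monthly = monthly_cost_delta(0.004, CALLS_PER_MONTH)
annual = annual_cost_delta(0.004, CALLS_PER_MONTH)
print(f"Cost-per-call delta: +$0.004, projected to +${monthly:,.0f}/month at production load.")
print(f"Renewal-period impact: +${annual:,.0f}/year")
```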

The three dimensions are written into a Change Order Memo, typically a one-page markdown document at docs/change-orders/2026-04-15-multi-language.md. The Memo is the input to the pricing step. Sizing without the Memo is sizing in conversation, which is sizing without accountability.
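A Change Order Memo can be generated from structured sizing data so that every memo carries the same three sections. This is a sketch: the `ChangeOrderMemo` class, its fields, and the section headings are assumptions, and the `docs/change-orders/<date>-<slug>.md` pattern is inferred from the single example path above.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass
class ChangeOrderMemo:
    slug: str                    # short name for the change, e.g. "multi-language"
    trigger_date: date
    eval_impact: str             # new cases, threshold changes, regression risk
    dev_hours: dict[str, float]  # named role -> estimated hours
    cost_per_call_delta_usd: float
    projected_monthly_usd: float

    @property
    def path(self) -> PurePosixPath:
        # Follows the docs/change-orders/<date>-<slug>.md pattern.
        return PurePosixPath("docs/change-orders") / f"{self.trigger_date.isoformat()}-{self.slug}.md"

    def to_markdown(self) -> str:
        hours = "\n".join(f"- {role}: {h:g} hours" for role, h in self.dev_hours.items())
        return (
            f"# Change Order: {self.slug}\n\n"
            f"Triggered: {self.trigger_date.isoformat()}\n\n"
            f"## Eval impact\n\n{self.eval_impact}\n\n"
            f"## Dev hours impact\n\n{hours}\n\n"
            f"## Cost-per-call impact\n\n"
            f"Cost-per-call delta: +${self.cost_per_call_delta_usd:.3f}, "
            f"projected to +${self.projected_monthly_usd:,.0f}/month at production load.\n"
        )


memo = ChangeOrderMemo(
    slug="multi-language",
    trigger_date=date(2026, 4, 15),
    eval_impact="30 new eval cases (10 per target language); 14 English-only cases modified.",
    dev_hours={"senior (routing layer)": 30, "mid-level (eval scaffolding)": 24, "senior (review)": 8},
    cost_per_call_delta_usd=0.004,
    projected_monthly_usd=3100,
)
print(memo.path)  # docs/change-orders/2026-04-15-multi-language.md
```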

Step 3: pricing (rate card plus risk premium)

Pricing the change order has two components. The base is the agency’s published rate card: senior engineer at one rate, mid-level at another, principal-level review time at a third. The base price is the dev-hours estimate multiplied by the rate card.
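The base price is a straight sum of hours times rates. The rates below are placeholders, not market figures, and `base_price` is a hypothetical helper; the hours come from the multi-language sizing example (30 senior hours plus 8 senior review hours, and 24 mid-level hours).

```python
# Placeholder rate card (USD per hour); substitute the agency's published rates.
RATE_CARD = {
    "principal": 300.0,
    "senior": 225.0,
    "mid-level": 160.0,
}


def base_price(hours_by_level: dict[str, float], rate_card: dict[str, float]) -> float:
    """Base price = sum over seniority levels of estimated hours x hourly rate."""
    missing = sorted(set(hours_by_level) - set(rate_card))
    if missing:
        raise ValueError(f"no rate-card entry for: {missing}")
    return sum(hours * rate_card[level] for level, hours in hours_by_level.items())


estimate = {"senior": 30 + 8, "mid-level": 24}
print(base_price(estimate, RATE_CARD))  # 38 * 225 + 24 * 160 = 12390.0
```

Refusing to price a level that has no rate-card entry keeps an unpublished rate from slipping into a memo.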

The premium is the risk price. AI changes are not deterministic; a “30-hour change” may turn into a 60-hour change because an unexpected eval regression surfaces. The risk premium prices the variance. A reasonable risk premium for AI changes is 15-30% on top of the base estimate, scaled by the size of the change and the riskiness of the dimension. A change that touches the routing layer (high downstream regression risk) carries a higher premium than a change that adds an eval case (low risk).
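One way to make “scaled by the size of the change and the riskiness of the dimension” mechanical is a per-dimension starting rate plus a size bump, capped at the top of the 15-30% band. The specific rates, the 80-hour cutoff, and the dimension names are hypothetical; only the band and the ordering (routing layer above eval cases) come from the text.

```python
# Hypothetical starting premium per dimension, inside the 15-30% band.
PREMIUM_BY_DIMENSION = {
    "eval-case": 0.15,      # adding eval cases: low downstream regression risk
    "threshold": 0.20,
    "model": 0.25,
    "routing-layer": 0.30,  # routing changes: highest downstream regression risk
}
LARGE_CHANGE_HOURS = 80     # hypothetical cutoff for a "large" change
SIZE_BUMP = 0.05
MAX_PREMIUM = 0.30


def premium_rate(dimension: str, estimated_hours: float) -> float:
    """Premium rate scaled by riskiness and size, never above 30%."""
    rate = PREMIUM_BY_DIMENSION[dimension]
    if estimated_hours > LARGE_CHANGE_HOURS:
        rate += SIZE_BUMP
    return min(rate, MAX_PREMIUM)


def risk_premium(base_usd: float, dimension: str, estimated_hours: float) -> float:
    """Dollar premium on top of the rate-card base."""
    return round(base_usd * premium_rate(dimension, estimated_hours), 2)


print(risk_premium(12_390, "routing-layer", 62))  # 3717.0 (30% of base)
print(risk_premium(2_000, "eval-case", 10))       # 300.0 (15% of base)
```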

The pricing Memo also includes the cost-per-call delta as a separately disclosed line item. This is critical: cost-per-call deltas are passed to the client’s inference budget directly (under a contract that prohibits inference markup, per the AI agency contract negotiation guide), but the client should see them at change-order time so they can decide whether the per-call cost is worth the new behavior. A change that adds a verification call may improve faithfulness by 4 points and add $5,000/month at production load; that may or may not be worth it, but the decision belongs to the client.

The output of the pricing step is a single document, the Change Order Pricing Memo, with three numbers: the dev-hours estimate at rate card, the risk premium, and the cost-per-call delta projection. The Memo is sent to the client decision-maker for the approval step.
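The separation the Pricing Memo requires can be enforced in code: the billable change-order price is base plus premium, while the cost-per-call projection travels alongside and is never added to it. `PricingMemo` and its fields are hypothetical, and the figures are illustrative (a $12,390 base from placeholder rates, a 30% premium, and the +$3,100/month projection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingMemo:
    memo_path: str
    base_usd: float                   # dev-hours estimate at rate card
    risk_premium_usd: float           # priced variance on top of the base
    cost_per_call_monthly_usd: float  # passed to the client's inference budget, no markup

    @property
    def change_order_price_usd(self) -> float:
        # Only base + premium are billed; the cost-per-call delta is disclosed, not folded in.
        return self.base_usd + self.risk_premium_usd

    def summary(self) -> str:
        return "\n".join([
            f"Change order: {self.memo_path}",
            f"Dev hours at rate card: ${self.base_usd:,.2f}",
            f"Risk premium: ${self.risk_premium_usd:,.2f}",
            f"Change-order price: ${self.change_order_price_usd:,.2f}",
            f"Cost-per-call delta (inference budget, separate): +${self.cost_per_call_monthly_usd:,.2f}/month",
        ])


memo = PricingMemo("docs/change-orders/2026-04-15-multi-language.md", 12_390, 3_717, 3_100)
print(memo.summary())
```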

Step 4: approval (written sign-off, not verbal absorption)

Approval is the step that defeats most change-order processes. The temptation is to handle approval verbally (a Slack thumbs-up, a “yeah let’s do it” in the architecture review) because the change feels small and the friction of a written approval feels excessive. The temptation is wrong. Verbal approval produces engagements that, six months in, contain twenty changes that nobody has fully tracked, with disputed pricing, disputed eval impact, and disputed authorship of the change.

Written approval requires three things. The named decision-maker on the client side (typically the Engineering Lead or the Product Owner, with budget authority above a defined threshold) acknowledges the Change Order Pricing Memo by replying in writing. The reply names the change order (using its memo path), confirms the price, and authorizes the work to begin. The agency does not start the work until the written authorization is in.
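The rule that work does not start until written authorization is in can be expressed as a single gate. This sketch assumes the reply has already been parsed into fields; `ApprovalReply` and `may_start_work` are hypothetical names, and matching the decision-maker by name is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalReply:
    author: str                 # who replied in writing
    memo_path: str              # change order named by its memo path
    confirmed_price_usd: float  # price the reply confirms
    authorizes_work: bool       # explicit go-ahead to begin


def may_start_work(reply: Optional[ApprovalReply], decision_maker: str,
                   memo_path: str, quoted_price_usd: float) -> bool:
    """Work starts only on a written reply from the named decision-maker that
    names the memo, confirms the quoted price, and authorizes the work."""
    if reply is None:
        return False  # a thumbs-up or a verbal "yes" is not a written reply
    return (
        reply.author == decision_maker
        and reply.memo_path == memo_path
        and abs(reply.confirmed_price_usd - quoted_price_usd) < 0.01
        and reply.authorizes_work
    )


path = "docs/change-orders/2026-04-15-multi-language.md"
reply = ApprovalReply("Engineering Lead", path, 16_107.00, True)
print(may_start_work(reply, "Engineering Lead", path, 16_107.00))  # True
```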

Above a budget threshold defined in the MSA (typically $5,000 or $10,000 in dev-hours pricing), written approval also requires CFO or finance-team acknowledgment. Below the threshold, the engineering lead can approve unilaterally. The threshold ensures that small changes do not bog down in finance review while large changes get the right level of scrutiny.
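Routing by threshold is a one-line rule. This sketch assumes the threshold is compared against the dev-hours price and that “above” means strictly greater; `required_approvers` is a hypothetical helper, and the $10,000 default is one of the two example figures, not a recommendation.

```python
def required_approvers(dev_hours_price_usd: float, threshold_usd: float = 10_000) -> list[str]:
    """Who must acknowledge a change order in writing, given the MSA threshold."""
    approvers = ["engineering lead"]
    if dev_hours_price_usd > threshold_usd:
        approvers.append("finance")  # CFO or finance team
    return approvers


print(required_approvers(8_000))                       # ['engineering lead']
print(required_approvers(12_390))                      # ['engineering lead', 'finance']
print(required_approvers(8_000, threshold_usd=5_000))  # ['engineering lead', 'finance']
```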

The 24-hour rule applies here too. The Pricing Memo should be issued within 24 hours of the trigger, and the approval should land within 48 hours. Faster cycles are fine; slower cycles signal that the process is not running, in which case the engagement is sliding back into verbal-approval mode. Once it slides, recovering the discipline mid-engagement is hard.
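A weekly check can flag cycles that are sliding before they slide. The sketch assumes both windows are measured from the trigger (the text leaves the reference point for the 48 hours implicit); `cycle_flags` and the flag strings are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

MEMO_WINDOW = timedelta(hours=24)      # pricing memo due within 24h of the trigger
APPROVAL_WINDOW = timedelta(hours=48)  # approval due within 48h (assumed: of the trigger)


def cycle_flags(triggered_at: datetime, memo_issued_at: Optional[datetime],
                approved_at: Optional[datetime], now: datetime) -> list[str]:
    """Return the deadlines this change order has missed as of `now`."""
    flags = []
    # An event that has not happened yet is judged against the current time.
    if (memo_issued_at or now) - triggered_at > MEMO_WINDOW:
        flags.append("pricing memo late")
    if (approved_at or now) - triggered_at > APPROVAL_WINDOW:
        flags.append("approval late")
    return flags


t = datetime(2026, 4, 15, 9, 0)
print(cycle_flags(t, t + timedelta(hours=20), t + timedelta(hours=40), now=t + timedelta(days=5)))  # []
print(cycle_flags(t, t + timedelta(hours=30), None, now=t + timedelta(hours=60)))
# ['pricing memo late', 'approval late']
```

A rejected change order never gets an approval time, so a real check would skip rows already marked rejected.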

Step 5: logging (the change-order register, visible to both sides)

Every change order is logged in a single register, visible to both sides, updated in real time. The register is a markdown table at docs/change-orders/register.md in the client repository, with one row per change order, columns for memo path, trigger date, approval date, pricing, and status (proposed / approved / in-progress / shipped / rejected).
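A small renderer keeps the register's columns and status vocabulary consistent. The column headings, the `RegisterRow` class, the use of “-” for a missing approval date, and the two example rows (including the eval-dashboard memo path) are invented for illustration; the five statuses are the ones listed above.

```python
from dataclasses import dataclass

STATUSES = ("proposed", "approved", "in-progress", "shipped", "rejected")


@dataclass
class RegisterRow:
    memo_path: str
    trigger_date: str   # ISO date, e.g. "2026-04-15"
    approval_date: str  # ISO date, or "" if not (yet) approved
    price_usd: float
    status: str

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}; expected one of {STATUSES}")


def render_register(rows: list[RegisterRow]) -> str:
    """Render rows as the markdown table kept at docs/change-orders/register.md."""
    lines = [
        "| Memo | Triggered | Approved | Price (USD) | Status |",
        "|---|---|---|---:|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r.memo_path} | {r.trigger_date} | {r.approval_date or '-'} "
            f"| {r.price_usd:,.0f} | {r.status} |"
        )
    return "\n".join(lines) + "\n"


rows = [
    RegisterRow("docs/change-orders/2026-04-15-multi-language.md", "2026-04-15", "2026-04-16", 16_107, "in-progress"),
    RegisterRow("docs/change-orders/2026-04-20-eval-dashboard-ui.md", "2026-04-20", "", 6_400, "proposed"),
]
print(render_register(rows))
```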

The register matters because it is the single source of truth for cumulative change-order activity. At any moment, both sides can pull up the register and see how many change orders have been approved this quarter, how much they total, how the cost-per-call profile has drifted from the original Sprint Charter, and which changes are still in flight. The register replaces the alternative: each side maintaining its own internal tracker, with periodic reconciliation that usually discovers two or three discrepancies.
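Because the register is structured, the questions in the paragraph above can be answered by a query rather than a reconciliation meeting. This sketch works over rows represented as plain dicts; the field names, the shortened placeholder memo paths, and the per-row `monthly_cost_delta_usd` (needed to measure drift, but not one of the listed register columns) are assumptions.

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """(year, quarter number) for a date."""
    return d.year, (d.month - 1) // 3 + 1


def register_summary(rows: list[dict], today: date) -> dict:
    """Answer the standing register questions for the current quarter."""
    this_quarter = quarter_of(today)
    approved_now = [
        r for r in rows
        if r["approval_date"] is not None and quarter_of(r["approval_date"]) == this_quarter
    ]
    return {
        "approved_this_quarter": len(approved_now),
        "approved_total_usd": sum(r["price_usd"] for r in approved_now),
        # Drift from the original Sprint Charter: cost-per-call deltas already shipped.
        "cost_per_call_drift_usd_per_month": sum(
            r["monthly_cost_delta_usd"] for r in rows if r["status"] == "shipped"
        ),
        "in_flight": [r["memo_path"] for r in rows if r["status"] in ("approved", "in-progress")],
    }


rows = [
    {"memo_path": "co-001.md", "approval_date": date(2026, 4, 16), "price_usd": 16_107,
     "monthly_cost_delta_usd": 3_100, "status": "in-progress"},
    {"memo_path": "co-002.md", "approval_date": date(2026, 3, 2), "price_usd": 5_200,
     "monthly_cost_delta_usd": 400, "status": "shipped"},
    {"memo_path": "co-003.md", "approval_date": None, "price_usd": 9_800,
     "monthly_cost_delta_usd": 0, "status": "rejected"},
]
print(register_summary(rows, today=date(2026, 5, 1)))
```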

The register also produces the artifact for the renewal conversation. When the engagement comes up for renewal, the register is the document both sides read first. It tells the truth about how the engagement evolved, separate from the narrative either side wants to tell. Engagements that produced 14 well-tracked, well-priced change orders renew on stronger terms than engagements that produced 30 verbally absorbed, untracked changes, even if the underlying work was identical.

The absorbed-favor trap and how to refuse it

The most insidious failure mode in change-order processes is the absorbed favor. The pattern is familiar: a stakeholder asks for “one small thing.” The agency engineer, wanting to be helpful and not wanting to seem petty, says “no problem, we’ll fold it in.” The stakeholder thanks them. The change ships. No change order is logged.

The favor is absorbed in the engineer’s head as goodwill banked. By month three, the engineer has absorbed twelve such favors, none of them tracked, many of them now part of the production system. Two compound effects play out. First, the engineer has burned dev-hours that nobody has compensated for, which shows up in the agency’s margin and eventually in the partner’s pressure on the engineer to “accelerate delivery,” that is, to absorb fewer favors and produce more billable output. Second, the client has come to expect the favor cadence, so when the engineer eventually pushes back on a thirteenth favor, the relationship cools.

The trap defeats good engineers because the immediate cost of saying no feels higher than the long-run cost of saying yes. The remedy is institutional: the change-order process must be the engineer’s defense, not their burden. The script is short. “That’s a great idea; let me write it up as a change order. I’ll have the sizing memo to you tomorrow morning, and assuming approval, we can have it in by the end of next sprint.” The phrasing is friendly, the process is invoked, and the absorption is refused.

The script works because it does not refuse the change; it refuses the absorption. The stakeholder who genuinely needs the change will say “great, send me the memo.” The stakeholder who was testing whether the process would hold will hesitate, which is exactly the test the process is supposed to surface. Either way, the engineer has done the right thing for both sides: preserved the agency’s margin discipline and preserved the client’s visibility into what is happening to their engagement.

The harder version of the absorbed-favor trap is the partner-level absorption. Sometimes the agency CEO, in a quarterly review with the client CEO, agrees to a change in real time and tells the engineering team to “just absorb it, we’re playing the long game on this account.” This is the most expensive version because it overrides the process at the executive level and signals to the team that the process is optional. The remedy is to log the partner-level absorption as a zero-priced change order in the register. The work is done, the register reflects it, and the next renewal conversation is honest about the cumulative absorptions.
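Logging a partner-level absorption works like logging any other change order, with the price set to zero. This sketch keeps an estimated-hours field and a free-text note on each row (both assumptions; the text does not list them as register columns) so the renewal conversation can see how much unpriced work the absorptions represent. `log_absorption`, `absorption_totals`, and the memo paths are hypothetical.

```python
def log_absorption(register: list[dict], memo_path: str, trigger_date: str,
                   estimated_hours: float, absorbed_by: str) -> dict:
    """Append a zero-priced, approved change order for work absorbed at partner level."""
    row = {
        "memo_path": memo_path,
        "trigger_date": trigger_date,
        "approval_date": trigger_date,  # the executive agreement is treated as the approval
        "price_usd": 0.0,
        "estimated_hours": estimated_hours,
        "status": "approved",
        "note": f"absorbed by {absorbed_by}",
    }
    register.append(row)
    return row


def absorption_totals(register: list[dict]) -> tuple[int, float]:
    """Count zero-priced (absorbed) change orders and the hours they represent."""
    absorbed = [r for r in register if r["price_usd"] == 0 and r["status"] != "rejected"]
    return len(absorbed), sum(r["estimated_hours"] for r in absorbed)


register = [{"memo_path": "co-001.md", "trigger_date": "2026-04-15", "approval_date": "2026-04-16",
             "price_usd": 16_107, "estimated_hours": 62, "status": "shipped"}]
log_absorption(register, "co-007.md", "2026-06-10", 24, "agency CEO")
print(absorption_totals(register))  # (1, 24)
```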

What the change-order rhythm looks like in practice

A healthy AI engagement at month four typically has three to six change orders logged per month, with average pricing in the $4,000–$15,000 range. Two-thirds of triggered changes get approved; the other third get rejected on review (the cost-per-call delta turns out to be too high, or the eval impact reveals that the change requires more sprint capacity than is available). The rejected ones are the most valuable evidence that the process is working; a process that approves everything is not a process.
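The rejection rate is easy to compute from the register's status column, and a rate stuck at zero is the warning sign described above. The five-decision minimum before raising the warning is an arbitrary, hypothetical choice, and `process_health` is an illustrative name.

```python
def process_health(statuses: list[str], min_decisions: int = 5) -> dict:
    """Summarize decided change orders and flag a process that approves everything."""
    decided = [s for s in statuses if s != "proposed"]  # "proposed" = still awaiting a decision
    rejected = decided.count("rejected")
    rate = rejected / len(decided) if decided else 0.0
    return {
        "decided": len(decided),
        "rejection_rate": rate,
        "approves_everything": len(decided) >= min_decisions and rejected == 0,
    }


month = ["shipped", "rejected", "approved", "in-progress", "rejected", "shipped", "proposed"]
print(process_health(month))  # 6 decided, 2 rejected: rate 1/3, no warning
```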

The change-order register at month four typically shows a clear pattern. The first month had two or three small changes as the team calibrated. The second month had a larger change as a stakeholder request reshaped a meaningful piece of the system. The third month had a series of small changes that compounded into a measurable cost-per-call delta, prompting a conversation about whether the system was on the right architectural trajectory. The fourth month has fewer changes, because the architecture has stabilized and the eval discipline is producing a system that does what it should without constant adjustment.

By month six, the change-order register is the document that defines the engagement. Every claim either side makes about scope, cost, or quality can be checked against it. The register is also the document that, in the unfortunate case of a contentious renewal or termination, settles the disputes that would otherwise consume weeks of legal time. Engagements with healthy change-order discipline almost never have contentious terminations, because the register has already prevented the misalignments that produce contention. The discipline is dull to operate and invaluable in retrospect, which is the signature of most operating practice that is worth doing in the first place.

Frequently Asked Questions

What are the five steps of an AI agency change-order process?

Trigger, sizing, pricing, approval, logging. Trigger names the deviation in writing within 24 hours of surfacing. Sizing produces a written Change Order Memo with eval impact (new cases, threshold changes, regression risks), dev-hours impact (named-engineer hours by seniority), and cost-per-call delta. Pricing applies the rate card plus a 15-30% risk premium scaled by change size and riskiness, with the cost-per-call delta as a separately disclosed line item. Approval requires written sign-off from the named client decision-maker, with CFO acknowledgment above a defined threshold. Logging records the change in the register at docs/change-orders/register.md visible to both sides.

What triggers a change order in an AI engagement?

Any deviation from the eval-bound scope of the active Sprint Charter. Three specific signals: a new eval case category (behavior not covered by existing eval cases, like multi-language input or human escalation), a model or routing-layer change (any modification to the production model, fallback model, retrieval strategy, embedding model, or re-ranker not specified in the Sprint Charter), or a scope expansion or threshold change (added deliverables or modified eval thresholds). Even small changes that the engineer agrees to absorb in the moment must trigger the process; the absorption is the failure mode the process exists to prevent.

What is the absorbed-favor trap in AI agency engagements?

The pattern where a stakeholder asks for ‘one small thing,’ the agency engineer says ‘no problem, we’ll fold it in,’ and no change order is logged. By month three, twelve such favors have been absorbed, none tracked, many in the production system. The agency has burned uncompensated dev-hours, which shows up in margin pressure and eventually in the partner pressuring the engineer to absorb fewer favors. The client has come to expect the favor cadence, so when the engineer eventually pushes back, the relationship cools. The trap defeats good engineers because saying no in the moment feels worse than saying yes, but the long-run cost of yes compounds.

How do I refuse the absorbed-favor trap without damaging the client relationship?

The script is short and friendly: ‘That’s a great idea; let me write it up as a change order. I’ll have the sizing memo to you tomorrow morning, and assuming approval, we can have it in by the end of next sprint.’ The phrasing does not refuse the change; it refuses the absorption. Stakeholders who genuinely need the change will say ‘great, send me the memo.’ Stakeholders who were testing whether the process would hold will hesitate, which is exactly the test the process is supposed to surface. Either response is the right outcome, and the engineer has preserved both the agency’s margin discipline and the client’s visibility into what is happening to the engagement.

How should AI agency change orders be priced?

Two components. The base is the agency’s published rate card: senior, mid-level, and principal-level engineer hours each at their respective rates. The premium is the risk price; a 15-30% markup on the base estimate, scaled by change size and downstream risk. Changes that touch the routing layer carry higher premiums than changes that add eval cases. The cost-per-call delta is disclosed as a separate line item, not folded into the price, because it passes to the client’s inference budget directly under a no-markup contract and the client should see it at change-order time to decide whether the per-call cost is worth the new behavior.

Why should change orders be approved in writing rather than verbally?

Verbal approval produces engagements that six months in contain twenty changes nobody has fully tracked, with disputed pricing, disputed eval impact, and disputed authorship. Written approval requires the named client decision-maker to acknowledge the Change Order Pricing Memo in writing, naming the change order by memo path and confirming the price. The agency does not start the work until written authorization is in. Above a defined budget threshold (typically $5,000 or $10,000), CFO or finance-team acknowledgment is also required. The threshold ensures small changes do not bog down in finance review while large changes get the right level of scrutiny.

What is the change-order register and where should it live?

A markdown table at docs/change-orders/register.md in the client repository, with one row per change order and columns for memo path, trigger date, approval date, pricing, and status (proposed, approved, in-progress, shipped, rejected). The register is the single source of truth for cumulative change-order activity, visible to both sides at all times. It replaces the alternative of each side maintaining internal trackers with periodic reconciliation that usually discovers discrepancies. The register is also the artifact both sides read first at renewal; it tells the truth about how the engagement evolved, separate from any narrative either side wants to tell.

How many change orders should a healthy AI engagement have per month?

Three to six per month at month four, with average pricing in the $4,000-$15,000 range and a one-third rejection rate. The rejected change orders are the most valuable evidence that the process is working; a process that approves everything is not a process. The pattern over the engagement typically shows two or three small changes in month one as the team calibrates, a larger change in month two as a stakeholder request reshapes a meaningful piece of the system, a series of small compounding changes in month three that prompts an architectural conversation, and fewer changes in month four as the architecture stabilizes and the eval discipline matures.

What happens if the agency partner overrides the change-order process at the executive level?

This is the partner-level absorption: the agency CEO agrees to a change in a quarterly review with the client CEO and tells the engineering team to ‘just absorb it, we’re playing the long game.’ It is the most expensive version of the absorbed-favor trap because it overrides the process at the executive level and signals that the process is optional. The remedy is to log the partner-level absorption as a zero-priced change order in the register. The work is done, the register reflects it, and the next renewal conversation is honest about the cumulative absorptions. The discipline is to keep the register accurate even when the price is zero.

How does the change-order register affect renewal negotiations?

It is the document both sides read first. Engagements that produced 14 well-tracked, well-priced change orders renew on stronger terms than engagements that produced 30 verbally absorbed, untracked changes, even if the underlying work was identical, because the tracked engagement is legible and the untracked engagement is contestable. By month six, every claim either side makes about scope, cost, or quality can be checked against the register. In contentious terminations or renewals, the register settles disputes that would otherwise consume weeks of legal time. Engagements with healthy change-order discipline almost never have contentious terminations because the register has already prevented the misalignments that produce contention.

Last Updated: Jun 2, 2026

Arthur Wandzel

SFAI Labs helps companies build AI-powered products that work. We focus on practical solutions, not hype.
