Annual budgets were built for software whose cost curve is roughly flat after launch. AI projects do not have a flat cost curve. They have a step function with quarterly resets: roughly every 90 days, a new model upgrade or eval-bar shift can move the unit economics by 20 to 40 percent. Funding such a project on an annual approval cycle means the budget is decided on data that is provably stale by month four, then defended for eight more months while the project either over-spends or under-invests against the world it lives in. Quarterly milestone funding is not a fancier annual budget. It is the only cadence that matches the rhythm of how AI work progresses.
This piece argues the case in full: why annual is the wrong instrument, what quarterly milestone funding looks like in practice, and what changes for finance, procurement, and engineering when an organization makes the switch. It is a spoke of the AI project economics manifesto, which establishes evaluation as the unit of account; this piece extends the thesis to the funding cadence that unit of account demands.
Why annual budgets misprice AI work
The annual software budget is one of the most stable instruments in enterprise finance. Engineering FTE, infra, license cost, a small contingency. Approve once, draw down monthly, true up quarterly, refresh next year. It works for CRUD software because the cost shape is smooth: build cost during the year, mostly-flat run cost after launch. The biggest discontinuity is a major version upgrade, and even that is typically pre-planned.
AI projects break every assumption inside that template.
First, the cost shape is not smooth. Eval engineering, the discipline that produces correctness in AI work, is roughly 30 to 40 percent of project cost and is concentrated in two pulses: an initial test set and harness build, and a series of regression triages triggered by model upgrades. The cost is lumpy and the lumps are not on the calendar.
Second, the cost is reactive to events that originate outside the buyer. When Anthropic ships Claude 4.7, when OpenAI rolls 5.1, when Google updates Gemini, the buyer’s AI project takes on two to four engineering weeks of re-evaluation work it did not budget for. We covered this in detail in the AI agency tax piece; the tax compounds when the funding cycle cannot absorb these external events.
Third, the value side is also lumpy. AI value is not a steady drip of monthly cost savings. It is binary above an eval threshold and zero below it, then steps up again when the next eval threshold clears. Funding the project on a steady annual drip is funding it on the wrong shape of return.
Fourth, and this is the part finance teams underestimate, the kill criteria are different. CRUD software rarely deserves to be killed mid-year; it works or it does not, and you fix bugs. AI software produces a clear “kill, restart, or scale” signal every 90 days when you compare the eval curve and unit cost trajectory against the locked threshold. Annual funding throws away that signal, because the budget is committed regardless of what the 90-day data says.
Annual budgets misprice AI work the way fixed-bid pricing misprices it. We argued the contract analog of this case in the AI agency annual contract piece; this piece is the buyer-side version. The funding instrument and the contract instrument have to update together.
What quarterly milestone funding means
“Quarterly funding” is a phrase finance teams have heard before and often associate with quarterly reforecasts: the same annual budget with three slow conversations layered on top. That is not what we mean.
Quarterly milestone funding is structurally different. It has four properties:
- Each quarter has a named eval-bar milestone. Not “build the agent.” Not “ship feature X.” A specific eval-set version with a specific weighted threshold. Q1’s milestone might be “agent passing eval-set v1 at >= 0.78 weighted score on the 240-prompt enterprise test set, with cost-per-completion below $0.04.” Q2’s is the next bar.
- The next quarter’s funding is gated on the current quarter’s milestone. Pass: the next 90 days are funded. Fail: the next 90 days are funded only with a structured restart plan, or the project moves to wind-down.
- A model-upgrade reserve is allocated separately. Roughly 8 to 12 percent of annualized AI ops spend is reserved for re-evaluation work triggered by frontier model upgrades. The reserve sits outside the quarterly milestone budget so that an upgrade does not contaminate the milestone signal.
- Inference and observability are continuous lines, not quarterly approvals. They scale with usage and are forecasted as COGS, drawing down from a pool the CFO refreshes quarterly based on actual run-rate, not from quarterly milestone budgets.
The result is a funding instrument that has the rigor of an annual budget at the planning layer and the responsiveness of a milestone-based contract at the execution layer. It does not chase every model release; it does respond to the eval data every 90 days.
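The gating rule in the second property can be written down as a small decision function. What follows is a minimal sketch, not a prescribed implementation: the names `Decision` and `gate_decision` are hypothetical, and it assumes the milestone outcome has already been reduced to a pass or fail.

```python
from enum import Enum


class Decision(Enum):
    """Possible outcomes of a quarterly funding gate (names are illustrative)."""
    FUND = "fund the next 90 days"
    RESTART = "fund the next 90 days against a structured restart plan"
    WIND_DOWN = "move the project to wind-down"


def gate_decision(milestone_hit: bool, has_restart_plan: bool) -> Decision:
    """Apply the pass/fail rule for releasing the next quarterly tranche.

    Pass: the next quarter is funded.
    Fail: funded only if a structured restart plan exists, otherwise wind-down.
    """
    if milestone_hit:
        return Decision.FUND
    if has_restart_plan:
        return Decision.RESTART
    return Decision.WIND_DOWN


# A missed milestone with a restart plan on the table.
print(gate_decision(milestone_hit=False, has_restart_plan=True).value)
```

The model-upgrade reserve and the inference pool are deliberately absent from this function: they are funded outside the milestone budget, so they do not enter the gate.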
The four quarterly gates
Inside a year of AI work, a quarterly milestone budget has a natural four-gate structure. Each gate has its own kill criteria, its own evidence, and its own funding decision.
Q1: The “trajectory” gate. Q1 funds the test set construction, the eval harness build, and the first generation of the system. The Q1 milestone is a passable eval score at a defensible threshold, with the eval suite and harness operational. The kill criterion: no eval suite, no eval curve, or a trajectory clearly stuck below the threshold. Q1 is the cheapest quarter to kill in.
Q2: The “regression” gate. Q2 funds the first model upgrade re-eval, the first regression triage, and the move from initial threshold to “deployable threshold.” The Q2 milestone is “system holds at threshold for at least one full model upgrade cycle, with regressions triaged within the SLA.” Kill criterion: regression rate above tolerance and trajectory not closing the gap. Q2 is where the project either earns the right to deploy or earns a structured restart.
Q3: The “production” gate. Q3 funds the production deployment, observability instrumentation, and the maintenance retainer ramp. The Q3 milestone is “system in production with traffic at threshold, observability operating, retainer signed.” Kill criterion: no production traffic, no observability, no retainer. Q3 is where the project becomes either a deployed asset or a sunsetting prototype.
Q4: The “compounding” gate. Q4 funds the platform work that turns this project into the bootstrap for the next: eval library extraction, prompt registry consolidation, agent skill packaging, observability harness reuse. The Q4 milestone is “platform assets identified, packaged, and reused on at least one named successor project.” Kill criterion: nothing reusable, no successor identified. Q4 is the gate that distinguishes platform work from one-off feature work.
Across the four gates the cost curve is not smooth, but the funding decisions are predictable: every 90 days the question “do we fund the next quarter, restart it, or wind it down” is answered against named criteria, with named evidence, by named decision-makers.
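One way to see why Q1 is the cheapest quarter to kill in is to total the money released up to each gate. The sketch below is illustrative only: `funds_released` is a hypothetical helper, the tranche amounts are made-up numbers, and a failed gate is simplified to a stop (the structured-restart path is not modeled).

```python
def funds_released(tranches: list[float], gates_passed: list[bool]) -> float:
    """Total funding released so far under gated quarterly tranches.

    tranches[0] (Q1) is released up front. Each later tranche is released
    only if every earlier gate passed. gates_passed[i] is the outcome of the
    gate at the end of quarter i + 1; gates not yet reached are simply absent.
    """
    released = tranches[0]
    for tranche, passed in zip(tranches[1:], gates_passed):
        if not passed:
            break  # simplification: a failed gate stops further releases
        released += tranche
    return released


# Hypothetical tranche sizes for Q1..Q4 under a 12-month ceiling of 1,000,000.
tranches = [200_000.0, 250_000.0, 300_000.0, 250_000.0]

for label, outcomes in [
    ("killed at Q1 gate", [False]),
    ("killed at Q2 gate", [True, False]),
    ("killed at Q3 gate", [True, True, False]),
    ("all gates passed", [True, True, True]),
]:
    print(label, funds_released(tranches, outcomes))
```

Under an annual approval the full ceiling is committed regardless of which of these rows describes the project; under gated tranches the exposure is whatever was released before the failing gate.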
How to structure the milestones
Three rules for milestone construction make the difference between a quarterly budget that produces signal and one that produces theater.
Rule 1: Each milestone references an eval set by name and version. “Pass eval-set-v1.3 at weighted score >= 0.82” is a real milestone. “Improve agent quality” is not. If finance cannot read the milestone and tell whether it has been hit, the milestone has not been written.
Rule 2: Each milestone names a unit cost. Eval threshold in isolation is not a milestone; it is a quality target without a cost ceiling. “Pass eval-set-v1.3 at >= 0.82 with cost-per-completion ≤ $0.04” is the full milestone. The two are co-optimized; trading 10 percent of eval score for a 60 percent cut in cost is an explicit decision, not a hidden one.
Rule 3: Each quarter’s milestone references the prior quarter’s threshold. The Q2 milestone is not “0.85” in a vacuum; it is “improve from Q1’s locked 0.78 to >= 0.84 on the same or expanded eval set.” This forces the funding signal to be a delta against a baseline, not an absolute number that can be gamed by changing the test set.
When the milestones are constructed this way, the quarterly funding decision is largely mechanical: the eval report and unit cost dashboard sit next to the milestone, the numbers either hit or they do not, and the next 90 days are funded or they are not. The decision still requires judgment (a near-miss that came with regression-triage work in flight is different from a flat trajectory), but the judgment is now applied to evidence, not to vibes.
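The three rules translate naturally into a milestone record and a check against the quarter's eval report. This is a minimal sketch under stated assumptions: `Milestone`, `Report`, `check`, and `milestone_hit` are hypothetical names, the figures combine the illustrative numbers from the rules above, and an "expanded" eval set is simplified to an exact name-and-version match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    eval_set: str          # Rule 1: eval set by name and version
    min_score: float       # weighted-score threshold for this quarter
    max_unit_cost: float   # Rule 2: cost-per-completion ceiling, in dollars
    baseline_score: float  # Rule 3: prior quarter's locked threshold


@dataclass(frozen=True)
class Report:
    eval_set: str
    score: float
    unit_cost: float


def check(m: Milestone, r: Report) -> dict[str, bool]:
    """Return each criterion separately so a miss is attributable."""
    return {
        "eval_set_matches": r.eval_set == m.eval_set,
        "score_hit": r.score >= m.min_score,
        "unit_cost_hit": r.unit_cost <= m.max_unit_cost,
        "improved_on_baseline": r.score > m.baseline_score,
    }


def milestone_hit(m: Milestone, r: Report) -> bool:
    return all(check(m, r).values())


# Illustrative Q2 milestone: from a locked 0.78 to >= 0.84 at <= $0.04.
q2 = Milestone("eval-set-v1.3", min_score=0.84, max_unit_cost=0.04,
               baseline_score=0.78)
print(check(q2, Report("eval-set-v1.3", score=0.85, unit_cost=0.05)))
```

Returning the per-criterion results, rather than a single boolean, is what lets a reviewer distinguish a quality miss from a cost miss or from a report run against the wrong eval set.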
What changes for finance, procurement, and engineering
Finance. The annual budget template gets retired for AI work and replaced with a four-gate quarterly template. The CFO approves a 12-month ceiling and four quarterly tranches, each gated on the prior quarter’s milestone. The model-upgrade reserve and the inference/observability pool are separated out as continuous lines. The forecasting motion changes: instead of a single annual variance review, the CFO runs a 90-day variance review, which is a smaller, faster conversation against named milestones. We unpack the broader budget shape in the AI project economics manifesto.
Procurement. SOW language shifts from feature-list deliverables with payment milestones at calendar dates to eval-threshold milestones with payment milestones tied to those thresholds. Quarterly tranches are written as continuation options, not termination clauses, with default-renew on milestone hit. Pass-through inference clauses are standard. Maintenance retainer language is drafted once and reused, with SLAs around eval freshness and regression triage.
Engineering. The roadmap is structured around four quarterly milestones, each of which has an eval-bar and unit-cost target. The eval suite is a first-class artifact, with a senior engineer owning it the way a senior owns the build pipeline. Sprint planning still happens, but the unit of governance is the quarter, not the sprint. Engineering leaders who insist on this discipline produce a clean cost curve and a clean eval curve. Engineering leaders who let the quarterly milestone slide into “we’ll catch up next quarter” produce the under-budgeted trajectory the annual budget hides.
The pattern across the three: the quarterly cadence is not a finance-only artifact. It is the operating cadence of the project, and it has to be coherent with the work it governs.
Common objections, answered
“Quarterly funding will produce short-termism.” The opposite is true. A 6-month payback rule produces short-termism; it kills exactly the projects that compound. We argued this in the payback paradox piece. Quarterly milestone funding combined with a Q4 compounding gate explicitly rewards platform work; it is the staged structure that protects compounding investments from the single-gate ROI rule.
“This is just stage-gate funding with new labels.” Stage-gate funding traditionally uses scope-based or feature-based gates: “did we ship X?” Quarterly milestone funding uses eval-bar and unit-cost gates: “did the system reach this threshold at this cost?” The distinction is the same one that separates feature-cost economics from evaluation-cost economics. The labels are different because the underlying work is different.
“Engineering will spend more time defending the milestone than building.” Only if the milestones are written as theater. Real milestones are mechanical to evaluate: run the eval suite, read the cost dashboard, compare to threshold. The defense is the report. The narrative is short. If engineering is spending weeks defending a milestone, the milestone is wrong, not the cadence.
“Our procurement cycle cannot move that fast.” This is the most honest objection. Many procurement organizations cannot turn around an SOW amendment in 90 days, which means quarterly tranches written as continuation options must be drafted into the original SOW so that no amendment is required to renew. The procurement work is upfront, in the original contract, not at every gate. Buyers who do this once benefit from it on every subsequent project; buyers who do not pay the agency tax of mid-cycle renegotiation.
Frequently asked questions
Why is annual budgeting wrong for AI projects?
Because AI projects do not have a smooth cost curve or a smooth value curve. Eval engineering produces lumpy cost concentrated around test set construction and regression triage. Frontier model upgrades introduce 8 to 16 weeks per year of re-evaluation work that an annual budget cannot absorb. Eval-bar effects produce binary value steps, not steady monthly cost savings. Funding all of this on a single annual approval means the budget is committed against assumptions that are stale by month four and indefensible by month eight.
What is “quarterly milestone funding” exactly?
A funding instrument that approves a 12-month ceiling and releases four 90-day tranches, each gated on a named eval-bar and unit-cost milestone the prior quarter delivered. Pass: next quarter funds. Fail: structured restart or wind-down. A separate model-upgrade reserve and a continuous inference/observability pool sit outside the milestone budget. The result is annual planning rigor with quarterly execution responsiveness.
How is this different from quarterly business reviews?
QBRs are typically retrospective conversations with no funding consequence. Quarterly milestone funding is prospective: the next 90 days fund or do not based on whether named milestones hit. The QBR can still happen; the funding decision is now connected to evidence rather than narrative.
What are the four quarterly gates?
Q1 trajectory gate (eval suite operational, eval curve rising). Q2 regression gate (system holds at threshold across a model upgrade cycle). Q3 production gate (production traffic at threshold with observability and retainer). Q4 compounding gate (platform assets identified, packaged, reused on a named successor). Each has named criteria and named kill conditions.
How big should the model-upgrade reserve be?
Roughly 8 to 12 percent of annualized AI ops spend, sized to a planning horizon of three to five non-trivial frontier model upgrades per year. The reserve sits outside the quarterly milestone budget. If the model cadence slows, the reserve rolls forward; if it accelerates, the project is not under-budgeted.
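As a worked example of the sizing rule, the sketch below computes the reserve band and a rough per-upgrade allowance. The function names and the 1,000,000 spend figure are hypothetical; only the 8 to 12 percent band and the three-to-five upgrade horizon come from the text.

```python
def upgrade_reserve(annual_ai_ops_spend: float,
                    low_pct: float = 0.08,
                    high_pct: float = 0.12) -> tuple[float, float]:
    """Low and high bounds of the model-upgrade reserve."""
    return annual_ai_ops_spend * low_pct, annual_ai_ops_spend * high_pct


def per_upgrade_allowance(reserve: float, expected_upgrades: int) -> float:
    """Reserve available per frontier model upgrade, split evenly."""
    if expected_upgrades <= 0:
        raise ValueError("expected_upgrades must be positive")
    return reserve / expected_upgrades


# Hypothetical annualized AI ops spend of 1,000,000.
low, high = upgrade_reserve(1_000_000)
print(f"reserve band: {low:,.0f} to {high:,.0f}")

# Tightest case: low end of the band spread over five upgrades.
print(f"tightest per-upgrade allowance: {per_upgrade_allowance(low, 5):,.0f}")
```

Checking the tightest case (smallest reserve, most upgrades) against the expected cost of one re-evaluation cycle is a quick way to test whether the low end of the band is realistic for a given project.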
What if a project misses a quarterly milestone narrowly?
Treat the near-miss as a structured restart, not a kill. The next quarter’s funding is released against a 30-day plan that names what is in flight and what threshold it will hit. The point of the gate is to force a deliberate decision, not to optimize for binary kills. A near-miss with regression-triage work in flight is different from a flat trajectory; the gate documents the difference.
How does this work for fixed-scope contracts?
It does not, cleanly. Fixed-scope contracts assume the scope is decided up front and defended; quarterly milestone funding assumes the scope adapts to what the eval data shows. Buyers running fixed-scope contracts against AI work pay the agency tax we decomposed in the coordination cost piece: paying for misalignment instead of for software. Quarterly milestone funding is structurally compatible with eval-threshold pricing, not with fixed-scope.
Should every AI project use quarterly milestone funding?
No. Productivity-substitution AI work (narrow automation of high-volume, low-stakes tasks with a clean human baseline) can stay in a simpler annual lane because the cost and value curves are flatter for that class. Capability-expanding, platform-building, and downside-risk projects belong on the quarterly cadence because their cost and value shapes require it.
How does this relate to staged payback?
Staged payback (90-day, 12-month, 24-month gates) is the long-horizon question; quarterly milestone funding is the short-horizon execution discipline that produces the evidence the staged payback gates evaluate. The two are joined at the eval suite: the eval data feeds both the quarterly funding decision and the staged payback assessment.
Key takeaways
- Annual budgets misprice AI work because the cost curve is lumpy, the value curve is binary, and frontier model upgrades introduce 8 to 16 weeks of unbudgeted re-evaluation per year.
- Quarterly milestone funding releases four 90-day tranches against named eval-bar and unit-cost milestones, with a separate model-upgrade reserve and continuous COGS pool.
- The four gates (Q1 trajectory, Q2 regression, Q3 production, Q4 compounding) each have named kill criteria and named evidence, not vibes.
- Each milestone references an eval set by name and version, names a unit cost ceiling, and is expressed as a delta against the prior quarter’s locked threshold.
- Finance retires the annual template, procurement writes quarterly tranches as continuation options in the original SOW, and engineering treats the eval suite as a first-class artifact.
- Quarterly funding does not produce short-termism; it is the structure that protects compounding investments from a 6-month payback rule.
- Productivity-substitution AI work can stay annual; capability-expanding and platform work must move quarterly.
The annual budget was an instrument for a software economy whose costs were smooth and whose value arrived steadily. AI work has neither shape. Funding it on the wrong cadence is the most expensive form of mispricing an organization can commit, because the surprise compounds for eleven months before the next approval gate forces a conversation.
Arthur Wandzel