AI projects have a distinctive multi-year cost curve that legacy software project budgeting does not capture. Year one is expensive: build, eval discipline, observability stack, prompt registry, and senior judgment compounding into the system architecture. Year two should drop 40 to 60 percent because most of year one’s spend was foundational rather than recurring: the eval suite is built, the observability stack is installed, the senior architectural decisions are locked. What stays is regression triage, model-upgrade re-evals, retainer maintenance, and inference. Total inference may rise if usage grows, but on a per-unit basis it typically falls. CFOs who budget year two as “year one minus 10 percent” overpay; CFOs who budget it as “year one minus 50 percent and call it done” underpay and lose the system. This piece decomposes the cost curve and gives a defensible year-two budget for any year-one AI project.
This is a spoke under the AI project economics manifesto. The manifesto argues that AI economics requires a multi-year framing rather than a one-time-build framing. The cost curve is that framing operationalized: the budget shape an AI project has across years one, two, and three.
The shape of the curve
A representative $500K year-one AI project, run with discipline, decomposes roughly as follows across three years.
| Category | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Engineering build | $260K | $20K | $20K |
| Eval suite construction | $60K | $15K | $10K |
| Observability stack setup | $25K | $5K | $5K |
| Prompt registry build | $15K | $5K | $5K |
| Regression triage | $50K | $50K | $55K |
| Model-upgrade re-evals | $30K | $40K | $45K |
| Retainer & maintenance | $40K | $80K | $80K |
| Inference (scaled with usage) | $20K | $35K | $50K |
| Total | $500K | $250K | $270K |
Year two is roughly 50 percent of year one. Year three rises slightly from year two because inference scales with usage and model-upgrade work compounds, but does not return to year one because the build and foundational lines are largely locked.
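As a sanity check on the table, a short Python sketch can encode the eight lines and compute the column totals. It also computes each line's year-two cost as a share of its year-one cost. The dictionary simply transcribes the table in $K. The names `COST_CURVE`, `year_totals`, and `retention` are illustrative and not part of any tool.

```python
# The representative $500K project, transcribed from the table above ($K):
# (year 1, year 2, year 3) per cost line.
COST_CURVE: dict[str, tuple[int, int, int]] = {
    "Engineering build": (260, 20, 20),
    "Eval suite construction": (60, 15, 10),
    "Observability stack setup": (25, 5, 5),
    "Prompt registry build": (15, 5, 5),
    "Regression triage": (50, 50, 55),
    "Model-upgrade re-evals": (30, 40, 45),
    "Retainer & maintenance": (40, 80, 80),
    "Inference (scaled with usage)": (20, 35, 50),
}


def year_totals(curve: dict[str, tuple[int, int, int]]) -> tuple[int, ...]:
    """Sum each year's column across all cost lines."""
    return tuple(sum(line[year] for line in curve.values()) for year in range(3))


def retention(curve: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Year-two cost of each line as a fraction of its year-one cost."""
    return {name: y2 / y1 for name, (y1, y2, _y3) in curve.items()}


if __name__ == "__main__":
    y1, y2, y3 = year_totals(COST_CURVE)
    print(f"Totals ($K): year 1 = {y1}, year 2 = {y2}, year 3 = {y3}")
    print(f"Year 2 / year 1 = {y2 / y1:.0%}, year 3 / year 1 = {y3 / y1:.0%}")
    for name, share in retention(COST_CURVE).items():
        print(f"  {name}: year 2 is {share:.0%} of year 1")
```

On these numbers the four foundational lines retain roughly 8 to 33 percent of their year-one cost. The four recurring lines retain 100 percent or more.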
The curve is universal across well-run mid-tier AI projects in 2026, with variance in the magnitude of the year-one to year-two drop driven by how much of year one was foundational (more foundational = bigger drop) versus how much was usage-driven (more usage = smaller drop, because usage continues into year two).
What follows: the lines that fall, the lines that stay, the lines that may rise, and how to compose the year-two budget defensibly.
What falls 70 to 90 percent
Engineering build. Year-one build is the bulk of the project: discovery, feature implementation, integration, hardening. Year two does not have a build. It has feature extensions and minor system evolution, typically 8 to 12 percent of year-one build cost. The resulting reduction of roughly 90 percent is the largest single line drop in the curve.
Eval suite construction. Year one builds the test set (200 to 2,000 inputs), the harness, the scoring rubrics, and the threshold-locking process. Year two adds new test cases as the system evolves and the eval bar rises, but does not rebuild. Typical year-two cost is 20 to 25 percent of year one: the maintenance and incremental-expansion fraction of the original construction cost.
Observability stack setup. Year one installs the observability tooling, instruments the system, configures dashboards, sets up alerts, integrates with on-call. Year two pays subscription fees and small incremental instrumentation work. Typical year-two cost is 20 percent of year one; most of the spend was setup labor rather than recurring infrastructure.
Prompt registry build. Year one stands up the registry, defines the schema, integrates with deployment, builds the audit trail. Year two adds entries and refactors fragments, no rebuild. Typical year-two cost is 30 percent of year one.
The pattern: year-one foundational work drops 70 to 90 percent in year two because the foundation is built once. CFOs who budget year two without recognizing the foundational vs recurring split overpay 30 to 50 percent on year-two budgets.
What stays roughly flat
Regression triage. Engineering judgment on red eval runs is a continuous activity that does not decrease with system maturity; it grows slightly because the eval suite grows, the system gets more capable, and more sophisticated regressions emerge. Year-two triage is typically 100 to 110 percent of year-one triage on a normalized basis (some adjustment up because suite size grew, some down because the team got faster at triage).
Inference (per-unit cost). Per-request inference cost typically falls 30 to 50 percent year over year as token prices drop. Total inference cost, however, depends on usage growth. On a successful product, total inference goes up because 2x or 3x usage growth outpaces per-unit decay. Moderate growth (1.5 to 2x) against moderate decay (30 to 40 percent) nets out to roughly 0.9x to 1.4x of year-one spend, which is the flat-ish case. The table’s $20K to $35K line (1.75x) assumes somewhat stronger growth, for example 2.5x usage against 30 percent decay.
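Netting usage growth against per-unit decay is a single multiplication. The sketch below makes three assumptions:

- Per-unit decline is expressed as a fraction (0.30 for 30 percent).
- Total spend scales linearly with request volume at the new per-unit price.
- The 2.5x and 30 percent pair is one illustrative pair that reproduces the table's $35K, not a figure from the text.

The function name is hypothetical.

```python
def total_inference_multiple(usage_growth: float, per_unit_decline: float) -> float:
    """Year-over-year multiple on total inference spend.

    usage_growth: this year's request volume over last year's (2.0 = doubled).
    per_unit_decline: fractional drop in cost per request (0.30 = 30 percent cheaper).
    Assumes spend scales linearly with volume at the new per-unit price.
    """
    if usage_growth < 0 or not 0.0 <= per_unit_decline < 1.0:
        raise ValueError("usage_growth must be >= 0 and per_unit_decline in [0, 1)")
    return usage_growth * (1.0 - per_unit_decline)


if __name__ == "__main__":
    year_one_inference = 20.0  # $K, from the representative table
    scenarios = {
        "moderate growth, low end": (1.5, 0.40),
        "moderate growth, high end": (2.0, 0.30),
        "stronger growth (illustrative pair)": (2.5, 0.30),
    }
    for label, (growth, decline) in scenarios.items():
        multiple = total_inference_multiple(growth, decline)
        print(f"{label}: {multiple:.2f}x -> ${year_one_inference * multiple:.1f}K")
```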
Project management overhead. A retainer engagement still has weekly check-ins, quarterly reviews, retainer governance. The PM line is smaller in year two than year one in absolute terms, but as a fraction of total cost it stays roughly flat at 7 to 10 percent.
The pattern: the categories that scale with operations (continuous quality enforcement, governance) stay roughly flat year over year. The categories that scale with one-time builds (foundational construction) collapse.
What may rise
Model-upgrade re-evaluations. Frontier model providers ship 3 to 5 non-trivial upgrades per year. Each upgrade triggers a 2- to 4-week re-eval cycle. Year two often has more re-evals than year one because year one’s build silently absorbed two of them into engineering hours. Year-two re-eval cost is typically 130 to 150 percent of year-one re-eval cost as a standalone line.
Inference at successful scale. A product that succeeds (3 to 10x usage growth) sees inference rise materially in year two even with per-unit decay. A product that fails sees inference shrink. The shape of the inference line in year two is therefore the most reliable read of product success. Flat or shrinking inference on a product that is supposed to be growing is a leading indicator of trouble.
Retainer scope. Some teams expand retainer scope in year two as the system matures and the buyer asks for more capability rather than just maintenance. The retainer line in year two is typically 200 percent of year one’s (because year one’s retainer ran only 90 days), but it should be roughly 90 to 100 percent of full-year retainer terms.
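Multiplying release cadence by cycle length shows why re-evals behave like a standing workload rather than an occasional task. This sketch assumes one cycle per upgrade and counts cycles as if they never overlap, which is a simplifying assumption. The function name is hypothetical.

```python
def reeval_weeks(upgrades_per_year: int, weeks_per_cycle: float, years: int = 1) -> float:
    """Weeks spent in re-eval cycles, assuming one cycle per upgrade and no overlap."""
    return upgrades_per_year * weeks_per_cycle * years


if __name__ == "__main__":
    low = reeval_weeks(upgrades_per_year=3, weeks_per_cycle=2)
    high = reeval_weeks(upgrades_per_year=5, weeks_per_cycle=4)
    print(f"Re-eval load per year: {low:.0f} to {high:.0f} weeks")
    # Upgrade count over a three-year production life at 3 to 5 upgrades per year.
    print(f"Upgrades over three years: {3 * 3} to {5 * 3}")
```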
The pattern: the categories tied to the external substrate (frontier model release cadence) and to product success (usage scale) can rise in year two. CFOs who budget year two without modeling these two pressures undershoot when products succeed and overshoot when products stall.
Year-two budget formula
A defensible year-two budget formula for any year-one AI project:
Year-2 budget = (Year-1 build × 12%) + (Year-1 eval × 25%) + (Year-1 observability × 20%) + (Year-1 prompt registry × 30%) + (Year-1 regression triage × 105%) + (Year-1 model-upgrade re-eval × 140%) + (Full-year retainer at typical retainer rate) + (Year-1 inference × usage growth × (1 − per-unit price decline)).
Run on the $500K year-one project above, the formula produces approximately $265K. This assumes a full-year retainer of $80K and inference at the table’s $35K. The result is within 10 percent of the table’s $250K year-two figure. Most of the gap comes from the 12 percent build coefficient ($31K) against the table’s $20K build line.
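The formula translates directly into code. The sketch below:

- Hard-codes the line coefficients from the formula.
- Takes the full-year retainer as an input rather than deriving it.
- Reads the inference term as usage growth times (1 − per-unit price decline), which is one interpretation of the decay term.

`YearOneCosts` and `year_two_budget` are illustrative names. The 2.5x growth and 30 percent decline in the example are assumptions chosen to reproduce the table’s $35K inference line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearOneCosts:
    """Year-one cost lines in $K (fields mirror the cost table)."""
    build: float
    eval_suite: float
    observability: float
    prompt_registry: float
    regression_triage: float
    model_upgrade_reevals: float
    inference: float


# Year-two coefficients from the formula, keyed by YearOneCosts field name.
COEFFICIENTS = {
    "build": 0.12,
    "eval_suite": 0.25,
    "observability": 0.20,
    "prompt_registry": 0.30,
    "regression_triage": 1.05,
    "model_upgrade_reevals": 1.40,
}


def year_two_budget(
    y1: YearOneCosts,
    full_year_retainer: float,
    usage_growth: float,
    per_unit_decline: float,
) -> float:
    """Apply the line-by-line year-two formula; returns $K."""
    scaled_lines = sum(getattr(y1, name) * coef for name, coef in COEFFICIENTS.items())
    inference = y1.inference * usage_growth * (1.0 - per_unit_decline)
    return scaled_lines + full_year_retainer + inference


if __name__ == "__main__":
    table_year_one = YearOneCosts(
        build=260, eval_suite=60, observability=25, prompt_registry=15,
        regression_triage=50, model_upgrade_reevals=30, inference=20,
    )
    budget = year_two_budget(
        table_year_one, full_year_retainer=80, usage_growth=2.5, per_unit_decline=0.30
    )
    print(f"Year-two budget: ${budget:.1f}K")
```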
A simpler heuristic that works at the planning stage: Year-2 budget = Year-1 total budget × 50% to 60%. The heuristic is correct on average and wrong in specific cases. High-usage-growth products see year two closer to 65 percent of year one; stable-usage products see it closer to 40 percent. It is still the right starting estimate for finance review of a multi-year AI engagement.
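The planning heuristic is even simpler to express. This sketch maps the three usage profiles named in the text to shares of the year-one total. The profile labels are hypothetical. The single-point shares for the stable and high-growth cases are the text’s “closer to” figures, not hard bounds.

```python
# Shares of the year-one total budget, per the planning heuristic: (low, high).
HEURISTIC_SHARES: dict[str, tuple[float, float]] = {
    "stable_usage": (0.40, 0.40),
    "typical": (0.50, 0.60),
    "high_usage_growth": (0.65, 0.65),
}


def heuristic_year_two(year_one_total: float, profile: str = "typical") -> tuple[float, float]:
    """Return a (low, high) planning range for the year-two budget in the same units."""
    low_share, high_share = HEURISTIC_SHARES[profile]
    return year_one_total * low_share, year_one_total * high_share


if __name__ == "__main__":
    for profile in HEURISTIC_SHARES:
        low, high = heuristic_year_two(500.0, profile)
        print(f"{profile}: ${low:.0f}K to ${high:.0f}K")
```

On the $500K example the typical band is $250K to $300K, which brackets the table’s year-two figure at its low end.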
A third framing that connects to the manifesto’s economics: Year-2 budget = Year-1 budget minus the foundational work, with the recurring lines priced for a full year. In the representative $500K project, the foundational lines total $360K in year one. These are build (including first-time discovery), eval suite construction, observability setup, and prompt registry build. Year two does not pay that again, only a $45K maintenance residue on those lines. It does pay the recurring lines (triage, retainer, inference, re-evals). These grow from $140K in year one to $205K in year two as the retainer runs a full year, re-evals accumulate, and usage grows. Residue plus recurring is $250K, the same year-two figure the table shows.
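The foundational-versus-recurring split can be checked directly against the table. This sketch groups the eight lines into the foundational and recurring sets named in the text and sums each group for years one and two. The constant and function names are illustrative.

```python
# (year 1, year 2) in $K, transcribed from the representative table.
FOUNDATIONAL = {
    "Engineering build": (260, 20),
    "Eval suite construction": (60, 15),
    "Observability stack setup": (25, 5),
    "Prompt registry build": (15, 5),
}
RECURRING = {
    "Regression triage": (50, 50),
    "Model-upgrade re-evals": (30, 40),
    "Retainer & maintenance": (40, 80),
    "Inference (scaled with usage)": (20, 35),
}


def group_totals(lines: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum a group of cost lines for year one and year two."""
    return sum(y1 for y1, _ in lines.values()), sum(y2 for _, y2 in lines.values())


if __name__ == "__main__":
    f1, f2 = group_totals(FOUNDATIONAL)
    r1, r2 = group_totals(RECURRING)
    print(f"Foundational: ${f1}K -> ${f2}K")
    print(f"Recurring:    ${r1}K -> ${r2}K")
    print(f"Year-two total: ${f2 + r2}K")
```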
Implications for contract structure
Three direct implications for how AI project contracts should be written.
Separate the build SOW from the year-one retainer. Year one is genuinely two engagements; a build engagement (high cost, foundational) and a retainer engagement (steady-state cost, ongoing). Bundling them into a single SOW makes the year-two budget conversation harder because the build cost is mixed with retainer cost in the buyer’s mind. Separating them lets the buyer see clearly what is foundational and will not recur, versus what is steady-state and will.
Sign the year-one retainer at project kickoff, not at launch. Retainer terms are easier to negotiate before the build team has leverage, which it gains once the buyer depends on the system it has built. Buyers who push the retainer to “after we see how the build goes” pay 15 to 30 percent more for the same retainer terms because they negotiate from a weaker position. The retainer paradox piece covers retainer pricing structure.
Index the year-two retainer to eval-suite metrics, not to hours. Eval-suite progression (number of test cases, threshold movement, regression remediation rate) is the clearest measure of retainer value. Indexing the retainer to hours produces the wrong incentives (more hours, regardless of outcome). Indexing it to eval-suite metrics produces the right incentives (better quality, regardless of effort).
The cost curve also implies that buyers should plan the year-two budget at year-one kickoff, not at year-one launch. If the buyer waits until year-one launch to plan year two, three things happen:
- the team’s engagement winds down between launch and the year-two start;
- eval discipline degrades during the gap;
- year-two work begins with regressions to clean up.

Continuous engagement at the lower year-two rate is structurally cheaper than gap-and-restart.
The year-three baseline
Year three is roughly equal to year two on a well-run AI project, with two adjustments. First, inference continues to scale with usage growth, often outpacing per-unit decay if the product has succeeded. Second, model-upgrade work compounds because each year’s frontier upgrades stack. A product that has been in production for three years has weathered roughly 9 to 15 frontier model upgrades, and the discipline of re-eval after each is now a fixed feature of the system’s operating cost.
The year-three steady-state baseline is roughly 45 to 55 percent of year-one cost. That is the level at which the project plateaus indefinitely, barring major scope changes. CFOs planning AI projects on a 3- to 5-year horizon should use the year-three baseline as the long-run cost, not the year-one cost.
The compounding effect that the NPV trap piece describes (capability premium, optionality value) extends across the cost curve as well. A team that has shipped its first AI project has lower year-one foundational costs on the second project, because the eval discipline, observability stack, and senior judgment carry over. The cost curve therefore compounds across a portfolio of projects, not just within a single project: the team’s third AI project has a year-one budget that is 70 to 80 percent of the first project’s year-one budget at the same scope.
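To see why the year-three baseline, not year one, should anchor a multi-year plan, compare two projections. One plateaus at year three; the other naively repeats year one. The sketch uses the table’s $500K, $250K, and $270K and assumes no scope changes after year three. The function name is hypothetical.

```python
def projected_costs(year_one: float, year_two: float, baseline: float, horizon: int) -> list[float]:
    """Annual costs over `horizon` years, holding the year-three baseline flat thereafter."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 year")
    return ([year_one, year_two] + [baseline] * max(0, horizon - 2))[:horizon]


if __name__ == "__main__":
    for horizon in (3, 5):
        curve = projected_costs(500.0, 250.0, 270.0, horizon)
        naive = 500.0 * horizon
        print(f"{horizon}-year plan: ${sum(curve):.0f}K on the curve vs ${naive:.0f}K if year one repeats")
```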
Frequently asked questions
Why does year-2 spend drop 40 to 60 percent specifically?
Because foundational work (engineering build, eval suite construction, observability stack setup, prompt registry build) collapses to roughly 10 to 30 percent of its year-one cost in year two. In the representative $500K project, those four lines are $360K of year-one cost (about 70 percent) and fall to $45K in year two. The recurring lines grow from $140K to $205K over the same period, which offsets part of the drop. The net is a 50 percent decline, in the middle of the 40 to 60 percent band.
What happens if usage grows faster than expected?
Inference rises materially, and so do the usage-sensitive lines: model-upgrade re-evals on a larger eval suite and regression triage on more user-facing surface area. A product seeing 5x or more usage growth in year two can see year-two cost approach 70 percent of year one rather than 50 percent. The growth case is the most common reason year-two spend comes in above budget.
What happens if the system is barely used?
Year-two cost drops further, to 30 to 40 percent of year one. Inference shrinks, the retainer can be stepped down, and eval suite expansion slows. The low-use case is often misjudged in the opposite direction: buyers who provisioned the year-two retainer at the full year-one-equivalent rate end up paying for capacity they do not use. Right-sizing the retainer to actual production load is a year-two optimization that frees 10 to 20 percent of budget.
Should year-two retainer be the same monthly rate as year-one retainer?
Typically yes for the first 90 days of year two, with a re-base at the 90-day mark based on actual production load and incident rate. Year-one retainer (after the build’s first 90-day post-launch period) is often the right size for year two because the system is still maturing. A retainer that drops too quickly leaves the team thin during the year-two regression cycle.
How does this compare to traditional software cost curves?
Traditional enterprise software drops more steeply after the build: year-two maintenance is typically 15 to 20 percent of year-one build. An AI project’s year two runs 40 to 60 percent of year one because the recurring lines (triage, re-evals, retainer, inference) are a larger fraction of total cost than they are in traditional software. The curve shape is genuinely different, not just a scaled version of the same shape.
What’s the right way to communicate the curve to a CFO?
Present a 3-year decomposed budget at year-one kickoff. Show the foundational lines collapsing to roughly 10 to 30 percent of their year-one cost in year two. Show the recurring lines staying roughly flat. Show the variable lines (inference) tied to a usage projection. The CFO sees the multi-year shape clearly and budgets accordingly. Withholding year two until year-one launch is the most common failure mode.
How does the cost curve interact with the seven TCO lines?
The seven TCO lines are the categories the curve traces across years. Test set construction, prompt registry, and observability are foundational lines that collapse year over year. Regression triage, inference, model-upgrade re-eval, and retainer are recurring lines that stay flat or rise. The seven lines are the “what”; the cost curve is the “when.”
What’s the year-three baseline for a $500K year-1 project?
Approximately $230K to $270K, depending on usage growth and frontier upgrade cadence. This is the steady-state long-run cost the project will run at indefinitely barring major scope changes. CFOs planning on 3- to 5-year horizons should use this as the long-run baseline, not the year-one figure.
Does this apply to in-house AI teams or only agency engagements?
Both. The curve shape reflects the work that needs to happen, not the entity doing it. An in-house team has a different cost basis (burdened FTE rather than agency rates) but the same shape: foundational work in year one collapses, and recurring work in year two stays flat. The 40 to 60 percent year-two drop holds for any well-run AI project regardless of vendor structure.
Key takeaways
- AI projects have a distinctive cost curve: year-one is heavy because of foundational work; year-two should drop 40 to 60 percent because the foundation does not recur.
- Foundational lines (build, eval suite construction, observability setup, prompt registry build) collapse to roughly 10 to 30 percent of their year-one cost in year two.
- Recurring lines (regression triage, inference, model-upgrade re-eval, retainer) stay roughly flat or grow modestly with usage.
- Year-three baseline is approximately equal to year-two and is the steady-state long-run cost. CFOs on 3- to 5-year horizons should plan against this baseline rather than year-one.
- Contract structure should separate the build SOW from the year-one retainer, sign retainer at kickoff rather than launch, and index year-two retainer to eval-suite metrics rather than hours.
The cost curve is universal across well-run AI projects in 2026. CFOs and portfolio owners who budget against it produce defensible multi-year forecasts; those who treat year two as “year one minus 10 percent” or “year one minus 50 percent and we are done” miss in opposite directions and lose either money or the system. The 40 to 60 percent year-two drop is the right planning assumption.
Arthur Wandzel