Most AI projects pay six taxes the original budget did not name. They are not scope creep. They are not vendor opportunism. They are recurring, predictable, monthly cost lines that arrive whether anyone budgeted for them or not: eval-construction tax, model-upgrade re-eval tax, regression-triage tax, prompt-registry curation tax, cost-spike alert tax, and post-launch on-call tax. Together they run $20,000 to $80,000 per month on a serious enterprise engagement, and they are the single most reliable cause of AI project budget surprise in 2026. This piece names each tax, sizes it, explains why it stays hidden, and shows how to budget for it upfront.
It is a spoke under the AI project economics manifesto, which argues that AI economics has shifted from feature cost to evaluation cost, and that finance teams need a different decomposition than the 2018 software template provides.
Why these costs stay hidden
The 2018 enterprise software budget template has six categories: engineering, infrastructure, data, security, support, and a thin contingency line. It was built for systems whose behavior was deterministic, whose performance was a function of code quality, and whose failure modes were exceptions rather than distribution shifts.
AI projects break the template in a specific way. Their behavior is non-deterministic. Their performance is a function of model choice, prompt quality, retrieval quality, and the underlying training data of the foundation model, none of which are visible in the budget template. Their failure modes are not exceptions; they are silent regressions in a continuously shifting input distribution.
The six taxes below are what fills the gap between “the system runs” and “the system is trustworthy enough to ship.” They were not in the 2018 template because the work did not exist as a discipline in 2018. They are not change orders or scope creep. They are line items the original budget did not have a category for.
The fix is mechanical: add the six lines to the budget template, size each one against the project’s profile, and assign owners. The cost of doing this is paperwork. The cost of not doing it is the six taxes surfacing as scope creep between months three and nine.
Tax 1: Eval-construction tax
What it is. The cost of building and maintaining the evaluation suite: test set construction, scoring rubric design, harness wiring into CI, threshold-locking, and the dashboards that visualize eval performance to engineering and the buyer. Detailed in the hidden cost of AI evals.
Typical cost. $8,000 to $20,000 per month on a serious enterprise engagement, weighted to the early phase of the project (months one through four) and tapering through steady state.
Why hidden. The 2018 template had a single “QA” line, sized for unit tests and integration tests, not for eval engineering. Eval suites are not unit tests; they require domain expertise to author, rubric design to score, and senior engineering to integrate. Vendors bidding against the legacy template roll the work into “engineering” and surface it later as a change order.
How to budget upfront. Add an “eval engineering” line to the budget at 10 to 15 percent of build cost in months one to four, dropping to 4 to 6 percent at steady state. Name an eval-engineering owner distinct from the feature engineering lead. Verify a draft test set exists by kickoff, not by month three.
Tax 2: Model-upgrade re-eval tax
What it is. Three to five times per year, frontier model providers (Anthropic, OpenAI, Google DeepMind) ship non-trivial upgrades. Each upgrade requires re-running the full eval suite on the new model, triaging the regressions (typically 5 to 15 percent of test cases shift behavior), adjusting prompts and retrieval to the new model’s quirks, and re-locking thresholds. That is two to four engineering weeks per major upgrade.
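The re-run-and-triage step can be sketched as a diff between two eval runs. This is a minimal illustration, assuming each run is a mapping from test-case ID to pass/fail; the names `EvalDiff` and `diff_eval_runs` and the sample data are hypothetical and not taken from any particular eval harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalDiff:
    regressed: tuple[str, ...]  # passed on the old model, fails on the new one
    improved: tuple[str, ...]   # failed on the old model, passes on the new one
    shift_rate: float           # share of shared cases whose outcome changed


def diff_eval_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> EvalDiff:
    """Compare pass/fail outcomes for the cases present in both runs."""
    shared = sorted(baseline.keys() & candidate.keys())
    regressed = tuple(c for c in shared if baseline[c] and not candidate[c])
    improved = tuple(c for c in shared if not baseline[c] and candidate[c])
    changed = len(regressed) + len(improved)
    shift_rate = changed / len(shared) if shared else 0.0
    return EvalDiff(regressed, improved, shift_rate)


# Hypothetical data: 20 cases, all green on the current model.
baseline = {f"case-{i:02d}": True for i in range(20)}
candidate = dict(baseline)
candidate["case-03"] = False  # two cases regress on the upgraded model
candidate["case-17"] = False

result = diff_eval_runs(baseline, candidate)
print(result.regressed)   # ('case-03', 'case-17')
print(result.shift_rate)  # 0.1
```

A 10 percent shift rate, as in this made-up run, would sit inside the 5 to 15 percent band quoted above; the triage work starts from the `regressed` list.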
Typical cost. $4,000 to $12,000 per month annualized. The work is concentrated in 2 to 3 burst weeks three to five times per year, but it is sized monthly because the budget needs to absorb it without surprise.
Why hidden. Most budgets treat it as “future work” and never name it. Vendors do not surface model-upgrade re-eval as a separate line because it would force a conversation about post-launch retainer scope that vendors prefer to defer. Buyers do not request it as a line because they assume “the system works once it ships.”
How to budget upfront. Add a “model-upgrade re-eval reserve” of 6 to 10 percent of annualized retainer cost, with a 14-day SLA from major model release to re-eval report. Reference the upgrade cadence explicitly in the SOW: “the retainer covers up to four model-upgrade re-evals per year.”
Tax 3: Regression-triage tax
What it is. When the eval suite goes red (a score drops, a previously passing test case fails, a new test surfaces a failure), somebody has to decide why and what to do. That means reading reasoning traces, comparing against the last green run, hypothesizing the cause (prompt change, retrieval change, model drift, content drift), validating the hypothesis, and deciding whether to fix forward, roll back, or accept the regression. This is engineering judgment, not running a script.
Typical cost. $6,000 to $18,000 per month at steady state; 1 to 2 days per week of senior engineering time on a serious system. Higher during model-upgrade weeks; higher during major feature launches.
Why hidden. The closest thing in the 2018 template is “bug triage,” which is sized for deterministic systems where the cost per triage is small. AI regression triage is closer to incident response than to bug triage, and the cost per triage reflects the senior engineering judgment required.
How to budget upfront. Add a “regression triage” line at 10 to 14 percent of total project cost across a 12-month engagement. Name a triage process: who looks at red eval runs, on what cadence, with what authority to ship or block. A 48-hour SLA on triage is typical and reasonable; 7-day SLAs are how serious regressions go uninvestigated.
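A triage SLA is only useful if something checks it. The sketch below shows one way to flag a red eval run that has gone untriaged past the 48-hour mark; the function name `triage_overdue` and the timestamps are illustrative assumptions, not part of any specific tool.

```python
from datetime import datetime, timedelta, timezone

TRIAGE_SLA = timedelta(hours=48)  # the 48-hour SLA suggested above


def triage_overdue(went_red_at: datetime, now: datetime,
                   sla: timedelta = TRIAGE_SLA) -> bool:
    """Return True when a red eval run has gone untriaged longer than the SLA."""
    return now - went_red_at > sla


# Hypothetical red run, checked 30 and 50 hours later.
went_red = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
print(triage_overdue(went_red, went_red + timedelta(hours=30)))  # False
print(triage_overdue(went_red, went_red + timedelta(hours=50)))  # True
```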
Tax 4: Prompt-registry curation tax
What it is. Production AI systems run on dozens to hundreds of prompts: system prompts, retrieval prompts, tool-use prompts, judge prompts, error-recovery prompts. Each is a piece of operational code that needs versioning, testing, and a curation cadence. Without a prompt registry, prompts drift across files, environments break silently, and a one-line change in a system prompt can cause a 15-percent eval drop nobody can attribute.
Typical cost. $2,000 to $6,000 per month; usually 0.5 to 1 day per week of senior engineering time at steady state, larger during major prompt revisions.
Why hidden. Prompts feel like configuration, not code, so they end up in random files, Notion pages, or environment variables. The cost of not having a registry is invisible until the third time someone re-implements a known-bad prompt because the previous version was not tracked.
How to budget upfront. Add a “prompt registry curation” line at 1 to 3 percent of build cost, ongoing at steady state. Use one of the prompt-management primitives (LangSmith Hub, Braintrust, Promptfoo registry, or a typed in-repo registry) and name an owner.
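For the typed in-repo option, a registry can be very small. The following is a minimal sketch under stated assumptions: the class names, the `support.system` prompt, and the owner label are all hypothetical, and a real registry would also persist versions and tie them to eval runs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    text: str
    owner: str

    @property
    def digest(self) -> str:
        """Short content hash, useful for attributing an eval drop to a prompt change."""
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]


class PromptRegistry:
    """In-memory registry that keeps every version of every named prompt."""

    def __init__(self) -> None:
        self._versions: dict[str, list[PromptVersion]] = {}

    def register(self, name: str, text: str, owner: str) -> PromptVersion:
        history = self._versions.setdefault(name, [])
        if history and history[-1].text == text:
            return history[-1]  # unchanged text does not create a new version
        entry = PromptVersion(name, len(history) + 1, text, owner)
        history.append(entry)
        return entry

    def latest(self, name: str) -> PromptVersion:
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> PromptVersion:
        return self._versions[name][version - 1]


registry = PromptRegistry()
v1 = registry.register("support.system", "You are a support assistant.", owner="eval-eng")
v2 = registry.register("support.system", "You are a concise support assistant.", owner="eval-eng")
same = registry.register("support.system", "You are a concise support assistant.", owner="eval-eng")
print(v1.version, v2.version, same.version)  # 1 2 2
print(registry.get("support.system", 1).text)  # You are a support assistant.
```

Keeping old versions retrievable is the point: when an eval score drops, the previous prompt text and its digest are available for comparison or rollback.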
Tax 5: Cost-spike alert tax
What it is. AI inference cost is a stochastic variable, not a deterministic one. A prompt-injection attack, a misconfigured retry loop, a model that decides to emit a 12,000-token response, or a downstream service spamming requests can each multiply daily cost by 5 to 50x. Without alerting, the bill arrives at the end of the month and the conversation is forensic. With alerting, the cost is capped at hours, not weeks.
Typical cost. $1,000 to $4,000 per month for tooling, alerting, on-call rotation coverage, and the engineering time to investigate spikes when they fire. The cost of not budgeting it: a single uncapped spike can run six figures.
Why hidden. No 2018 budget had a “cost-spike alert” line because deterministic infrastructure does not produce stochastic cost spikes. Cloud cost has been variable since 2010, but AI cost variance is meaningfully higher and the long-tail spike risk is meaningfully larger. Per-call cost on the new line is detailed in why AI inference cost is the new database cost line.
How to budget upfront. Add a “FinOps and cost alerting” line at 1 to 2 percent of inference spend. Wire daily and weekly cost alerts on per-feature, per-model, and per-tenant axes. Set a hard kill-switch threshold and document who can pull it.
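A daily check along one axis might look like the sketch below. The 3x alert multiple and the $5,000 daily kill-switch cap are illustrative assumptions rather than recommended values, and the tenant names and spend figures are made up.

```python
from statistics import mean

ALERT_MULTIPLE = 3.0       # assumption: alert at 3x the trailing daily average
KILL_SWITCH_USD = 5_000.0  # assumption: hard daily cap per axis, in USD


def classify_daily_spend(trailing: list[float], today: float) -> str:
    """Return 'kill', 'alert', or 'ok' for one axis (feature, model, or tenant)."""
    if today >= KILL_SWITCH_USD:
        return "kill"
    baseline = mean(trailing) if trailing else 0.0
    if baseline > 0 and today >= ALERT_MULTIPLE * baseline:
        return "alert"
    return "ok"


# Hypothetical per-tenant spend: seven trailing days, then today's total.
spend = {
    "tenant-a": ([120.0] * 7, 130.0),
    "tenant-b": ([80.0] * 7, 400.0),     # 5x the trailing average
    "tenant-c": ([200.0] * 7, 6_200.0),  # past the hard cap
}
for tenant, (trailing, today) in spend.items():
    print(tenant, classify_daily_spend(trailing, today))
# tenant-a ok
# tenant-b alert
# tenant-c kill
```

The same function can be run per feature and per model; what the "kill" result triggers, and who is allowed to act on it, is the part that needs documenting.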
Tax 6: Post-launch on-call tax
What it is. The cost of operating the AI system after launch: incident response when an eval threshold breaches in production, prompt-injection investigation, hallucination remediation when a customer surfaces a bad output, capacity coordination on demand spikes, and the slow accumulation of operational improvements that come from running the system rather than building it. Distinct from feature work; closer to SRE for non-deterministic systems.
Typical cost. $5,000 to $20,000 per month on a serious enterprise engagement, scaling with traffic and risk profile. Higher in regulated industries; lower for internal-facing systems.
Why hidden. The 2018 template had a “support” line sized for ticketing and break-fix on deterministic systems. Operating a non-deterministic system is a different discipline with different cadence and different escalation paths.
How to budget upfront. Add a “post-launch operations” line as a named retainer at 25 to 40 percent of build cost annualized. Reference the AI project cost curve: year-two operational spend should be 40 percent below year-one build spend, but it should not be zero.
How to budget the six taxes upfront
The six taxes total 25 to 45 percent of total project cost over a 12-month engagement. The exact percentage depends on the project’s risk profile, traffic, regulatory exposure, and how aggressively the system is upgraded across model release cycles. The variance across projects is meaningful. The answer to “is this in the budget at all” is binary.
Five concrete moves.
One. Add the six lines to the budget template: eval-construction, model-upgrade re-eval, regression triage, prompt registry, cost-spike alert, post-launch operations. Size each against the project’s profile, with explicit ranges rather than point estimates.
Two. Name an owner per line. Eval engineering owner. FinOps owner. Operations owner. Lines without named owners fall to whoever is least busy that week, which is the most expensive form of staffing.
Three. Reference the line items in the SOW and the maintenance retainer. Buyers should be able to read the contract and see the six taxes named. Vendors should be able to read the contract and see what they are committing to deliver against.
Four. Re-baseline quarterly. The taxes are not static; they shift with model release cadence, with feature launches, with traffic growth. A quarterly re-baseline against actuals catches drift before it becomes a budget surprise.
Five. Treat the lines as planned investment, not contingency. Contingency budgets get cut first when a line runs over. Planned investment lines get re-baselined and defended. The taxes are predictable enough to plan for, not random enough to absorb in contingency.
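The first two moves can be sketched as a small budget template in code. The monthly ranges are the ones quoted in each tax's "Typical cost" paragraph; the mapping of lines to the three owner roles is an assumption made for illustration, and the names `TaxLine`, `monthly_range`, and `unowned` are hypothetical. Summing the quoted ranges gives $26,000 to $80,000 per month, slightly above the $20,000 floor of the headline range at the low end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxLine:
    name: str
    owner: str     # placeholder role; replace with a named person
    low_usd: int   # monthly low end, from the "Typical cost" paragraphs
    high_usd: int  # monthly high end


# Owner assignments below are illustrative assumptions.
BUDGET_LINES = [
    TaxLine("eval-construction", "eval engineering owner", 8_000, 20_000),
    TaxLine("model-upgrade re-eval", "eval engineering owner", 4_000, 12_000),
    TaxLine("regression triage", "eval engineering owner", 6_000, 18_000),
    TaxLine("prompt registry", "eval engineering owner", 2_000, 6_000),
    TaxLine("cost-spike alert", "FinOps owner", 1_000, 4_000),
    TaxLine("post-launch operations", "operations owner", 5_000, 20_000),
]


def monthly_range(lines: list[TaxLine]) -> tuple[int, int]:
    """Sum the low and high ends across all lines."""
    return sum(l.low_usd for l in lines), sum(l.high_usd for l in lines)


def unowned(lines: list[TaxLine]) -> list[str]:
    """Names of lines that still lack an owner."""
    return [l.name for l in lines if not l.owner.strip()]


print(monthly_range(BUDGET_LINES))  # (26000, 80000)
print(unowned(BUDGET_LINES))        # []
```

Re-running `monthly_range` against actuals each quarter is one lightweight way to do the re-baseline described in move four.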
The six-tax framework is decomposable, defensible, and auditable. It is also how serious AI engagements have been budgeted since the discipline of eval engineering matured. The buyers who got the bill anyway in 2024 and 2025 are the ones who used the 2018 template. The buyers who plan against the six-tax framework in 2026 are the ones who pay the same money as a planned line item rather than as a recurring dispute.
Frequently asked questions
Are these six taxes universal or do some projects skip some?
All six show up on serious enterprise AI projects in 2026. Smaller projects (sub-$100K) can sometimes absorb tax 4 (prompt registry) and tax 5 (cost-spike alert) into general engineering, because the prompt count is small enough and the inference spend is bounded enough. The other four (eval-construction, model-upgrade re-eval, regression triage, post-launch on-call) are universal. A project that claims it does not pay one of those four is a project where someone else is paying it on the project’s behalf, usually invisibly.
How does this differ from the AI agency tax?
The AI agency tax is the 30 percent coordination overhead that appears on engagements running on legacy SOW templates. The six hidden taxes here are the work that overhead is paying for. Same money, two views. The agency tax view is “we are paying 30 percent more than the budget said.” The six-tax view is “we are paying for eval engineering, model upgrades, regression triage, prompt registry, cost alerting, and operations; the budget did not name them, but the work is real.”
Can a project skip the prompt registry tax by using LLM-managed prompts?
No. LLM-managed prompts (a prompt that another model authors or revises) shift the cost from prompt authoring to prompt evaluation, which lands on the eval-construction tax. The work does not disappear; it changes shape. The registry curation tax is the cost of versioning, testing, and rolling out prompt changes; that cost exists whether prompts are human-authored or model-authored.
Is the cost-spike alert tax a recurring monthly cost?
Yes, in two pieces. Recurring tooling cost (FinOps platform, cost dashboard, alerting infrastructure) is small but ongoing. Recurring engineering cost is the on-call rotation that investigates spikes when they fire. On a system processing meaningful production traffic, expect 1 to 4 spike investigations per quarter; each takes 4 to 12 senior engineering hours. The cost is small per-incident; the cost of not having the rotation is one uncapped spike landing as a six-figure surprise.
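The engineering-time piece can be bounded with simple arithmetic from the figures above. The sketch below multiplies the quoted ranges and leaves hourly rates out, since the piece does not state one; the function name `investigation_hours` is hypothetical.

```python
def investigation_hours(spikes_per_quarter: tuple[int, int] = (1, 4),
                        hours_per_spike: tuple[int, int] = (4, 12)) -> tuple[int, int]:
    """Low and high bounds on senior engineering hours per quarter."""
    low = spikes_per_quarter[0] * hours_per_spike[0]
    high = spikes_per_quarter[1] * hours_per_spike[1]
    return low, high


quarterly = investigation_hours()
annual = (quarterly[0] * 4, quarterly[1] * 4)
print(quarterly)  # (4, 48)
print(annual)     # (16, 192)
```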
How does this relate to the eval cost decomposition?
Tax 1 (eval-construction), tax 2 (model-upgrade re-eval), and tax 3 (regression triage) are three of the four sub-lines decomposed in the hidden cost of AI evals. The fourth sub-line in that piece (eval harness build) is part of tax 1 here. The framings are complementary: the eval cost decomposition is the deep dive into the largest single category; the six-tax framework is the wider set of unbudgeted lines a finance team needs to add.
What’s the right way to communicate these taxes to a CFO?
Show the six lines, with size ranges, and the cost of not budgeting each. CFOs respond to two things: predictability (a planned line beats a surprise change order) and decomposition (an auditable line beats an unattributed cost). The six-tax framework offers both. Reference decoding AI project TCO for the longer CFO-facing TCO breakdown.
Can the post-launch on-call tax be outsourced to the agency that built the system?
Yes, and it is the most common arrangement. The maintenance retainer covers post-launch operations as a named scope. The cost ratio of “build agency does ops” to “internal team does ops” runs roughly 1.2x to 1.5x; the agency premium is real but small, and the agency has system context the internal team would have to build from scratch. For year one, build-agency operations is usually the right call. For year two and beyond, the question is whether the internal team has built enough operational maturity to take ownership.
What’s the single highest-ROI move from this list?
Naming the eval-engineering owner. The single most reliable cause of AI project failure in 2026 is the absence of a named eval-engineering owner; the role does not have to be a separate hire, but the role has to be named. Without a named owner, eval engineering becomes “everyone’s job” which is “no one’s job” which is the project surfacing eval gaps as scope creep in month four.
Key takeaways
- Six recurring monthly taxes are absent from the 2018 software budget template but real on most 2026 AI projects: eval-construction, model-upgrade re-eval, regression triage, prompt registry, cost-spike alert, post-launch operations.
- Together they run $20,000 to $80,000 per month on a serious enterprise engagement, totaling 25 to 45 percent of project cost over a 12-month engagement.
- The taxes are not scope creep or vendor opportunism. They are line items the original budget did not have a category for, surfacing later because the work is real and the discipline matured after the template was last updated.
- The fix is paperwork: add the six lines to the budget template, size each against the project’s profile, name an owner per line, reference them in the SOW and retainer, re-baseline quarterly.
- Treat the lines as planned investment, not contingency. Contingency gets cut; planned lines get defended. The six taxes are predictable enough to plan, not random enough to absorb.
The six taxes are not hidden adversarially. They are hidden categorically: by templates that do not name them, by RFPs that do not request them, by buyers who do not own them. The fix is naming.
Arthur Wandzel