Inference cost in 2026 occupies the same line on the P&L that database cost occupied in 2014: a confused, fast-growing, under-attributed expense the CFO has not yet decided whether to call infra or COGS. The 2010s SaaS cohort answered that question (wrong at first, then right), and the answer reshaped how most mature SaaS companies sized engagement margins, set pricing, and budgeted infrastructure for a decade. Companies budgeting AI inference in 2026 against a 2018 OpEx mental model are mispricing their gross margin by the same kind of structural error.
This piece argues a single thesis. Inference is COGS, not infra. Naming it that way changes how it gets attributed, budgeted, and constrained, and the companies that name it correctly first will hold an 8 to 12 point gross-margin advantage over the ones that do not for the next three years. The argument extends the AI project economics manifesto’s principle that inference is a pass-through line, not agency margin, into the buyer’s own P&L.
The 2014 database analogy
In 2010, database cost was a rounding error on the AWS bill: one line in the infra stack, next to load balancers and S3. Nobody attributed it to a feature because the database serving the feed was the same database serving billing.
By 2014 the situation had changed. RDS bills had gone from 2 percent of revenue to 12 percent of revenue at companies whose product surface had not changed proportionally. SaaS had become read-heavy enough that database cost was a function of customer behavior, not engineering headcount. Two customers on the same plan could differ by 50× in the load they generated. The line item that was infra in 2010 was, by 2014, behaving exactly like a unit-cost-of-goods-sold input.
Mature companies (Stripe, Shopify, Atlassian) answered first. They moved database cost from infra to cost of revenue, attributed it per feature, set per-call SLAs engineering had to defend, and stood up FinOps practices. Companies that did this in 2014 had 8 to 12 points more gross margin in 2018 than the ones that left database cost in infra.
Inference cost in 2026 looks identical. It is on the infra line. It is growing 5× faster than the rest of the infra budget. Per-feature attribution is patchy. Engineering teams do not have a cost-per-call number they defend. Finance teams have no FinOps practice auditing inference spend. The CFO has not decided whether inference is infra or COGS. That decision is the largest unforced gross-margin lever sitting in front of most AI-powered companies right now.
Why inference is structurally COGS
The textbook test for whether a cost is COGS or OpEx is whether it varies with units of revenue. Inference cost passes that test on three structural properties.
Per-call, not per-deployment. A feature costing $0.012 per invocation costs $12,000 at one million calls and $1.2M at one hundred million. The cost is a direct function of customer use. Compare with a Postgres instance that costs $4,000 a month whether nobody or 100,000 customers query it. The first is COGS. The second is infra.
High dispersion per customer. Two customers paying the same subscription can generate inference costs that differ by 30×, depending on how they use the AI features. Power users running 200 queries a day cost orders of magnitude more than light users. This is the dispersion pattern the 2014 SaaS cohort discovered with database cost, except sharper. A company that fails to attribute inference per customer is signing customers whose contribution margin is, in some cases, negative.
Volatile. A model upgrade can cut input-token cost 40 percent overnight. A prompt regression can spike output-token cost 3× in a release. A retrieval index change can double or halve the cost of every call going through it. Infra cost does not behave this way. COGS cost does.
A cost line that varies per call, per customer, and per release belongs in cost of revenue, attributed to the feature, with a unit cost the engineering team owns. Calling it infra hides exactly the information the company needs to govern it.
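The first two properties can be made concrete with a few lines of arithmetic. The sketch below reuses the $0.012 per-call cost, the $4,000 flat instance fee, and the 200-queries-a-day power user from the text; the $50 plan price and the helper names are hypothetical, chosen only to illustrate the shape of the problem.

```python
COST_PER_CALL = 0.012       # from the text: dollars per invocation
FLAT_INSTANCE_FEE = 4_000   # from the text: Postgres instance, dollars per month
PLAN_PRICE = 50.0           # hypothetical subscription price, dollars per month


def inference_cost(calls: float, cost_per_call: float = COST_PER_CALL) -> float:
    """Variable cost: a direct function of usage, so it behaves like COGS."""
    return calls * cost_per_call


def contribution_margin(calls_per_day: float, days: int = 30,
                        plan_price: float = PLAN_PRICE) -> float:
    """Monthly plan revenue minus that customer's inference cost."""
    return plan_price - inference_cost(calls_per_day * days)


# Per-call cost grows with volume; the flat instance fee does not.
for calls in (1_000_000, 100_000_000):
    print(f"{calls:,} calls: inference ${inference_cost(calls):,.0f} "
          f"vs flat instance ${FLAT_INSTANCE_FEE:,}")

# Two customers on the same plan, 30x apart in usage.
for label, per_day in (("power user", 200), ("light user", 200 / 30)):
    print(f"{label}: margin ${contribution_margin(per_day):,.2f} per month")
```

Under these assumed numbers the power user costs about $72 a month to serve against a $50 plan, which is the negative-margin customer that stays invisible while inference sits on the infra line.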
What changes when the CFO treats inference as COGS
The accounting reclassification is one line in the chart of accounts. The operational consequences are six.
Gross margin gets honest. A SaaS company at 78 percent gross margin that has been running inference cost as infra is, in many cases, running 64 to 70 percent once inference is reattributed. The number is uncomfortable but real. The company that surfaces it first stops over-investing in features that destroy margin; see “decoding AI project TCO: 7 cost lines most CFOs miss” for the line items finance teams routinely under-budget.
Pricing decisions become defensible. When inference is infra, “should we charge more for the AI features” gets answered by gut. When inference is COGS attributed per feature, the model is mechanical: cost-per-call × expected volume gives the cost to serve, and dividing that by one minus the desired contribution margin gives the price floor. Pricing decisions stop being theological.
Engineering priorities shift. Prompt optimization moves from slack-time work to highest-leverage engineering. A 30 percent token-count reduction on a feature serving ten million calls a month is a six-figure annual saving. The team starts treating it that way.
Customer cohorts get repriced. Bottom-quartile customers who consume the most inference for the lowest revenue become legible as the negative-margin segment they were always going to be. Either pricing changes, plan structure changes, or the segment churns.
Contracts include cost-per-call SLAs. The vendor delivering AI work can no longer hide behind “model costs change” and pass through whatever bill the provider writes. The engagement carries a cost-per-call ceiling with a structured renegotiation clause when foundation-model prices materially shift. This is the discipline that “the case against fixed-price AI development contracts” describes from the agency side.
FinOps becomes a function, not a hobby. Somebody owns inference spend the way somebody owns AWS spend at scale. The discipline that grew up around cloud spend in the 2018-to-2022 cycle ports directly to inference, except faster, because the dollar amounts are bigger and the cost surface is more volatile.
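Two of the consequences above, the price floor and the value of prompt optimization, reduce to arithmetic. The sketch below is one way to write it down, not a prescribed pricing model. It assumes contribution margin is defined as (price − cost) / price, so the floor is cost divided by one minus the target margin, and it assumes cost per call scales linearly with token count. The $0.012 per-call cost is borrowed from the earlier example; the 1,000 calls and 70 percent target are hypothetical.

```python
def price_floor(cost_per_call: float, expected_calls: float,
                target_margin: float) -> float:
    """Lowest price that still yields target_margin on inference cost alone.

    Assumes margin = (price - cost) / price, hence price = cost / (1 - margin).
    """
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return cost_per_call * expected_calls / (1 - target_margin)


def annual_token_saving(calls_per_month: float, cost_per_call: float,
                        reduction: float) -> float:
    """Annual saving, assuming cost scales linearly with token count."""
    return calls_per_month * 12 * cost_per_call * reduction


# Hypothetical customer: 1,000 calls a month, 70 percent target margin.
floor = price_floor(cost_per_call=0.012, expected_calls=1_000, target_margin=0.70)
print(f"price floor: ${floor:,.2f} per month")

# From the text: 30 percent fewer tokens on ten million calls a month.
saving = annual_token_saving(10_000_000, cost_per_call=0.012, reduction=0.30)
print(f"annual saving: ${saving:,.0f}")
```

At the assumed $0.012 per call, the 30 percent reduction works out to roughly $432,000 a year, consistent with the six-figure saving described above.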
Per-feature attribution: the operational change
The first operational change is request tagging. Every inference call carries a feature_id, a customer_id, and a model_version tag at minimum. Inference spend rolls up by feature daily. Without this, every other change here is a wish.
The pattern is the same one that worked for database query attribution in the 2014 cohort: wrap every call with a context object carrying the feature attribution; aggregate at the gateway or observability layer; bill cost back to the feature owner. Three traps.
Embedding cost is its own line. A retrieval-augmented feature has two cost surfaces: embedding cost on indexing, and inference cost on generation. Tagging only the generation call hides 15 to 25 percent of feature cost. Both surfaces roll up under the same feature_id.
Eval cost is part of the COGS line. Running the eval suite that gates production traffic is not infra overhead; it is part of the cost of operating the feature, like database backups in classical SaaS. “The hidden cost of AI evals: where 35 percent of project budget goes” makes the broader case.
Multi-tenant features need customer-level attribution. Tagging by feature is the floor. Tagging by feature and customer is the ceiling; without it, the negative-margin-customer pattern stays invisible.
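The tagging pattern and the three traps can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the `CallTag` and `CallRecord` types, the `kind` field, and the `rollup` helper are hypothetical names, and a real system would attach the tag at the gateway and aggregate in the observability layer rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallTag:
    """Context object carried by every inference call."""
    feature_id: str
    customer_id: str
    model_version: str
    kind: str  # hypothetical label: "generation", "embedding", or "eval"


@dataclass(frozen=True)
class CallRecord:
    tag: CallTag
    cost_usd: float


def rollup(records, key):
    """Sum cost by any projection of the tag (feature, customer, or both)."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record.tag)] += record.cost_usd
    return dict(totals)


records = [
    CallRecord(CallTag("support_agent", "acme", "model-v1", "generation"), 0.030),
    CallRecord(CallTag("support_agent", "acme", "model-v1", "embedding"), 0.006),
    CallRecord(CallTag("support_agent", "acme", "model-v1", "eval"), 0.004),
    CallRecord(CallTag("support_agent", "globex", "model-v1", "generation"), 0.010),
]

# Embedding and eval calls land under the same feature_id as generation.
by_feature = rollup(records, lambda t: t.feature_id)
# Feature-and-customer is the view that exposes negative-margin customers.
by_feature_customer = rollup(records, lambda t: (t.feature_id, t.customer_id))
print(by_feature)
print(by_feature_customer)
```

Keeping the rollup key as a function of the tag means the same records answer both the feature-owner question and the customer-margin question without re-instrumenting the calls.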
Cost-per-call SLAs: the engineering change
Once attribution is in place, every AI feature has a cost-per-call number. The next discipline is making that number a constraint engineering defends, not a metric engineering watches.
Set a target cost-per-call as part of the feature spec, the same way latency p95 is part of the spec. A feature that ships at $0.018 when the spec said $0.012 is not done; it is over-budget on a constraint the team agreed to. Remediation is the same as for any failed acceptance criterion: prompt simplification, model downgrade where evals allow, retrieval pruning, output-length caps, caching.
A real example. A 2026 support agent ships at $0.038 against a $0.012 target, missing it by roughly 3×. Output-token cost is 70 percent of the bill because verbose response style is unconstrained. A prompt change capping responses at 300 tokens for routine queries drops cost to $0.014. Routing simpler queries to a cheaper model closes the gap. Cost lands at $0.011, on-spec, evals unchanged. Two engineer-weeks, $400,000 annualized saved. None of this happens if cost-per-call is not a constraint.
Corollary: features that cannot meet their cost-per-call target should not ship.
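Treating cost-per-call as an acceptance criterion can be as simple as a check that runs alongside the latency check. The sketch below is a minimal illustration using the support-agent numbers from the example; `CostSpec` and the helper functions are hypothetical names, and the call counts are invented to reproduce the per-call figures in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostSpec:
    feature_id: str
    target_usd_per_call: float


def measured_cost_per_call(total_cost_usd: float, calls: int) -> float:
    """Average cost per call over a measurement window."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return total_cost_usd / calls


def overrun(spec: CostSpec, total_cost_usd: float, calls: int) -> float:
    """Measured cost as a multiple of the target (above 1.0 is over budget)."""
    return measured_cost_per_call(total_cost_usd, calls) / spec.target_usd_per_call


def meets_cost_spec(spec: CostSpec, total_cost_usd: float, calls: int) -> bool:
    """Acceptance check: the feature is not done if this returns False."""
    return measured_cost_per_call(total_cost_usd, calls) <= spec.target_usd_per_call


spec = CostSpec("support_agent", target_usd_per_call=0.012)

# At launch: $380 over 10,000 calls is $0.038 per call.
print(meets_cost_spec(spec, 380.0, 10_000), round(overrun(spec, 380.0, 10_000), 2))
# After the output cap and routing change: $110 over 10,000 calls is $0.011.
print(meets_cost_spec(spec, 110.0, 10_000), round(overrun(spec, 110.0, 10_000), 2))
```

Wiring a check like this into the release gate is what turns the number from a dashboard metric into a constraint.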
FinOps for inference: the org change
A company that treats inference as COGS needs a function whose job is to optimize inference spend the way a FinOps team optimizes AWS spend. Five activities: weekly inference cost reviews with feature owners surfacing the top three growing cost surfaces; a model-mix policy routing traffic across providers based on cost-per-quality with eval evidence; quarterly prompt-token audits on highest-volume features; cost-per-call trend tracking against locked SLAs with drift escalation; and a model-version registry so the cost impact of foundation-model migrations is visible in advance, not after the bill.
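Two of those activities, the weekly review and the drift escalation, are easy to prototype once the attribution data exists. The sketch below assumes weekly spend and cost-per-call figures are already rolled up by feature; the function names, the feature names, the dollar figures, and the `tolerance` parameter are all hypothetical.

```python
def top_growing(previous: dict[str, float], current: dict[str, float],
                n: int = 3) -> list[tuple[str, float]]:
    """Features ranked by week-over-week growth in spend, in dollars."""
    growth = {f: current[f] - previous.get(f, 0.0) for f in current}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:n]


def sla_drift(latest_cost_per_call: dict[str, float],
              locked_sla: dict[str, float],
              tolerance: float = 0.0) -> list[str]:
    """Features whose latest cost-per-call exceeds the locked SLA."""
    return sorted(
        feature
        for feature, cost in latest_cost_per_call.items()
        if feature in locked_sla and cost > locked_sla[feature] * (1 + tolerance)
    )


last_week = {"search": 1_000.0, "support": 2_000.0, "summarize": 500.0, "classify": 100.0}
this_week = {"search": 1_800.0, "support": 2_100.0, "summarize": 900.0, "classify": 90.0}

# Agenda for the weekly review: the three fastest-growing cost surfaces.
print(top_growing(last_week, this_week))

# Escalation list: anything running above its locked cost-per-call SLA.
print(sla_drift({"support": 0.014, "search": 0.010},
                {"support": 0.012, "search": 0.012}))
```

Ranking by absolute dollar growth rather than percentage keeps the review focused on the surfaces that move the bill; a team could reasonably choose the other ranking for early-stage features.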
The team is small (one to three people in a 200-person engineering org) and the leverage is high. Taking 10 percent off a $4M annualized inference spend pays for the function ten times over in year one. By 2028 most AI-powered SaaS companies will have one. By 2030 the ones that did not stand it up early will be 8 to 12 points behind on gross margin, exactly the way the 2014 database cohort split.
The migration timeline
The migration from inference-as-infra to inference-as-COGS is a 12 to 18 month journey done deliberately and a 36 to 48 month journey done by drift. Three phases.
Phase 1, months 0 to 4: attribution. Stand up request tagging, roll up cost by feature and customer, surface cost-per-call to engineering. Deliverable: a cost dashboard the CFO and VP Engineering both look at on Monday review. No optimization yet; the goal is visibility.
Phase 2, months 4 to 12: SLAs and FinOps. Set cost-per-call targets for the top 10 features by spend. Stand up the FinOps function. Run the first model-mix, prompt-token, and retrieval-index audits. Reclassify inference on the chart of accounts from infra to cost of revenue. Deliverable: honest gross margin and a function that owns it.
Phase 3, months 12 to 18: pricing and contracts. Reprice features whose contribution margin is negative. Restructure agency contracts to carry cost-per-call SLAs with structured renegotiation on foundation-model price moves; see “the AI project cost curve: why year 2 spend should drop 40 percent” for the trajectory the year-two contract should price against. Deliverable: a unit-economics model defensible to investors, customers, and procurement.
Companies that finish phase 3 by mid-2027 hold a margin advantage for the next three years that the rest of the cohort spends those three years trying to close.
Frequently asked questions
Why is inference better classified as COGS than infra?
Because inference cost varies per call, per customer, and per release. Those are the textbook properties of cost of revenue, not infrastructure. A cost that scales with units of revenue belongs in COGS.
How is the 2014 database cost analogy similar?
In both cases an infrastructure line item became driven by customer usage and started to behave as cost of revenue. In the database case, the companies that reclassified it first held 8 to 12 points more gross margin, a gap the rest of the cohort took years to close.
What does per-feature attribution require?
Request tagging at the inference gateway with feature_id, customer_id, and model_version at minimum. Roll up daily into a cost warehouse. Include embedding cost and eval inference cost under the same feature.
What is a reasonable cost-per-call SLA?
It depends on the feature. A simple classifier might have a $0.001 SLA; a multi-step agent with retrieval might have $0.04. The discipline of having one in the spec matters more than the number; treat it as an acceptance criterion the same as latency p95.
How big a FinOps function do I need?
One to three people for a 200-person engineering org. The leverage is high: saving 10 percent on a $4M inference bill pays for the function ten times over.
How does this interact with foundation-model price changes?
A model price drop flows through to the customer’s bill the way an AWS price drop flowed through in 2018, rather than being absorbed as agency margin. The cost-per-call SLA carries a structured renegotiation clause: if the underlying model price changes by more than 15 percent, the SLA gets re-baselined.
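As a sketch, the clause can be expressed as a threshold test. The 15 percent trigger comes from the answer above; scaling the SLA in proportion to the model price is an assumption made here for illustration, since how the new baseline is set is a matter for the contract. The function names and example prices are hypothetical.

```python
def needs_rebaseline(baseline_price: float, new_price: float,
                     threshold: float = 0.15) -> bool:
    """True when the model price moved by more than threshold, up or down."""
    if baseline_price <= 0:
        raise ValueError("baseline_price must be positive")
    return abs(new_price - baseline_price) / baseline_price > threshold


def rebaselined_sla(sla_per_call: float, baseline_price: float,
                    new_price: float, threshold: float = 0.15) -> float:
    """New cost-per-call SLA after a price move.

    Assumption: the SLA scales proportionally with the model price.
    Moves inside the threshold leave the SLA unchanged.
    """
    if not needs_rebaseline(baseline_price, new_price, threshold):
        return sla_per_call
    return sla_per_call * new_price / baseline_price


# Hypothetical: model price falls from $10.00 to $6.00 per million tokens.
print(needs_rebaseline(10.0, 6.0), rebaselined_sla(0.012, 10.0, 6.0))
# A 10 percent move stays inside the clause; the SLA holds.
print(needs_rebaseline(10.0, 9.0), rebaselined_sla(0.012, 10.0, 9.0))
```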
Does this apply to internal AI teams or only agency engagements?
Both. Internal teams pay the same per-call cost on a different invoice. The reclassification, attribution, SLAs, and FinOps function are operating disciplines, not contracting disciplines.
What is the first thing to do this quarter?
Tag every inference call with feature_id and customer_id. Roll up cost by feature on a daily dashboard. Surface cost-per-call to engineering. Do nothing else until visibility is there.
How does this connect to the AI project economics manifesto?
The manifesto names inference as a pass-through line in the agency engagement. This piece extends that into the buyer’s own P&L: the same logic that prevents agency markup on inference requires the buyer to attribute inference as COGS internally.
Key takeaways
- Inference cost in 2026 sits on the infra line the way database cost sat on the infra line in 2014. The reclassification to COGS is the largest unforced gross-margin lever in front of AI-powered companies.
- Inference passes the COGS test on three properties: it varies per call, per customer, and per release. A cost with those properties does not belong in infra.
- The operational consequences of reclassification are honest gross margin, defensible pricing, prompt-optimization as priority engineering work, customer cohort repricing, cost-per-call SLAs in vendor contracts, and a FinOps function that owns the line.
- The migration is 12 to 18 months done deliberately and 36 to 48 months done by drift. Phase 1 is attribution. Phase 2 is SLAs and FinOps. Phase 3 is pricing and contracts.
- Companies that finish the migration by mid-2027 will hold an 8 to 12 point gross-margin advantage over the cohort that does not, identical to the 2014 database split. The window is open now.
Arthur Wandzel