The AI cost-per-action framework: a unit-economics model that survives model upgrades

The standard “$ per request” measure for AI features breaks the moment a model ships, a prompt changes, or a retrieval index gets re-tuned, which in 2026 is roughly every six weeks. Cost-per-action is a unit-economics primitive that survives those events because it is defined around the user-facing outcome, not the underlying call graph. This piece argues that finance and engineering teams should standardize on cost-per-action as the durable unit of AI economics, and shows how to decompose it into the six lines that drive it.

The framework is the operational layer underneath the AI project economics manifesto’s principle that evaluation is the unit of account. Manifesto principles need accounting primitives. Cost-per-action is the primitive.

Why “$ per request” breaks

Cost-per-request divides the inference bill by API calls. Easy to compute. The problem is the number tracks the call graph, not the outcome the customer paid for. Three failures.

A “request” is not stable across releases. A 2026 support agent that handled a query in one LLM call last quarter handles it in three this quarter: intent classification, RAG, self-critique. The user action (one resolved query) is unchanged. Cost-per-request drops 60 percent because each call is shorter, while cost-per-action rises 20 percent because there are now three calls. Cost-per-request tells finance the system got cheaper. Cost-per-action tells finance the truth.

A “request” hides retrieval and embedding cost. A RAG call has three surfaces: embedding on indexing, retrieval on each query, generation on the answer. Counting only generation understates real cost by 15 to 30 percent and surfaces later as an unexplained gross-margin gap.

A “request” is not what the buyer paid for. A buyer of an AI sales assistant is not paying for API calls. They are paying for qualified leads, drafted emails, booked meetings. The unit the buyer values does not appear on the invoice.
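
The first failure is easy to check by hand. The sketch below uses made-up per-call prices (assumptions, not measurements) to show how the two metrics can move in opposite directions when one call becomes three shorter ones.

```python
# Illustrative per-call costs in dollars; the figures are assumptions, not measurements.
last_quarter_calls = [0.030]                 # one LLM call per resolved query
this_quarter_calls = [0.012, 0.012, 0.012]   # intent classification, RAG, self-critique


def cost_per_request(calls):
    """Average cost of a single API call."""
    return sum(calls) / len(calls)


def cost_per_action(calls):
    """Total cost of every call needed to resolve one query."""
    return sum(calls)


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


request_change = pct_change(cost_per_request(last_quarter_calls),
                            cost_per_request(this_quarter_calls))
action_change = pct_change(cost_per_action(last_quarter_calls),
                           cost_per_action(this_quarter_calls))

print(f"cost-per-request: {request_change:+.0f}%")
print(f"cost-per-action:  {action_change:+.0f}%")
```

With these inputs the per-request number falls while the per-action number rises, even though the user did exactly the same thing in both quarters.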

Defining the action

An action, in this framework, is one user-facing AI feature invocation that produces a deliverable the customer paid for. The definition has three required properties.

Customer-facing. The action is named in language the buyer would recognize. “One support query resolved.” “One sales email drafted.” “One contract clause classified.” Not “one LLM call” or “one prompt invocation.” If the buyer would not see the action in their workflow, it is the wrong unit.

Outcome-bounded. The action has a completion condition. The action ends when the deliverable is produced, not when one specific call returns. A multi-step agent invocation that retrieves, generates, self-critiques, and revises is one action, not four. The completion condition should be unambiguous enough to count from logs.

Workload-stable. The action’s definition does not change when the implementation changes. If the engineering team refactors a one-call feature into a three-call agent next quarter, the action count is unchanged because the user did the same thing. The implementation moves; the unit does not.

Worked example. An AI sales assistant has three actions: “researched a lead” (one row appended to the CRM), “drafted an outreach email” (one row in the drafts table), “scheduled a meeting” (one accepted invitation). Each has a completion condition checkable from logs. Each is named in workflow language. Each is stable across implementations.
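
One way to make “checkable from logs” concrete is to pair each action with the log event that marks its completion. The following is a minimal sketch under that assumption; the `ActionDef` type, the event names, and the log shape are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionDef:
    name: str              # workflow language the buyer would recognize
    completion_event: str  # log event that marks the deliverable as produced


# Hypothetical manifest for the sales-assistant example; event names are assumptions.
MANIFEST = [
    ActionDef("researched a lead", "crm_row_appended"),
    ActionDef("drafted an outreach email", "draft_row_created"),
    ActionDef("scheduled a meeting", "invitation_accepted"),
]


def count_actions(log_events, manifest=MANIFEST):
    """Count completed actions; intermediate events such as model calls are ignored."""
    by_event = {a.completion_event: a.name for a in manifest}
    return Counter(by_event[e["event"]] for e in log_events if e["event"] in by_event)


logs = [
    {"event": "llm_call"},
    {"event": "llm_call"},
    {"event": "crm_row_appended"},
    {"event": "llm_call"},
    {"event": "draft_row_created"},
    {"event": "draft_row_created"},
]
counts = count_actions(logs)
```

Because the count keys on completion events rather than model calls, adding or removing `llm_call` entries leaves `counts` unchanged, which is the workload-stable property in practice.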

Defining actions surfaces a frequent finding: features advertised as “one feature” are three or four actions with different cost profiles. A support agent that does intent classification, retrieval, generation, and self-critique is four actions with cost ratios ranging 1× to 12×. Lumping them as “one query” hides the optimization surface.

The six-line decomposition

Once the action is defined, cost-per-action decomposes into six standardized lines. Standardizing the decomposition is the move that makes cost-per-action portable across teams, vendors, and audit cycles.

1. Input tokens. Tokens fed to every model call inside the action: system prompt, user message, retrieved context, prior turns. Quoted in dollars per action at average input length. For RAG features this is typically the largest line, because retrieved context dwarfs the user message.

2. Output tokens. Tokens generated by every model call inside the action. Quoted in dollars per action at the eval-set 95th-percentile output length, not the average; the output-length tail is where surprises live.

3. Embedding cost amortized. Cost of embedding the retrieval corpus, divided by the number of actions over the re-index interval. A 10M-vector index re-embedded monthly that serves 5M actions/month carries roughly $0.0002 per action of amortized embedding cost.

4. Retrieval cost. Per-query cost of the vector store at the 95th percentile of retrieved chunks. Many teams skip this, assuming retrieval is “free” once the index exists; at scale the retrieval line is 5 to 15 percent of total cost-per-action and worth surfacing.

5. Eval cost amortized. Cost of running the eval suite that gates production traffic, divided by the actions served between eval runs. A weekly eval costing $400 against 500K actions/week is $0.0008 per action. This is the line most often missing from cost-per-action models, and the line that, once visible, justifies the eval-engineering budget.

6. Observability cost amortized. Cost of the trace storage, dashboard infrastructure, and alerting that operate the feature, divided by actions served. Typically 8 to 15 percent of total cost-per-action. Treating it as overhead rather than COGS produces under-instrumented systems whose first regression is invisible until a customer reports it; see why AI inference cost is the new database cost line for why these amortizations belong in cost of revenue rather than infra.

The six lines sum to the total cost-per-action. The standardization is what makes the number portable: a buyer comparing two vendors gets the same six lines from both, with the same definitions, and can audit each line independently. A vendor that refuses to break the cost into these six lines is hiding something.
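
The decomposition maps naturally onto a small record type. The sketch below is one possible shape, not a prescribed schema: the field names are assumptions, the eval figure comes from the example above, the $1,000 re-embedding cost is simply the value implied by $0.0002 across 5M actions, and the remaining lines are invented for illustration.

```python
from dataclasses import dataclass, fields


def amortize(period_cost, actions_in_period):
    """Spread a periodic cost (re-index, eval run, tooling) over the actions it supports."""
    return period_cost / actions_in_period


@dataclass(frozen=True)
class CostPerAction:
    input_tokens: float
    output_tokens: float
    embedding_amortized: float
    retrieval: float
    eval_amortized: float
    observability_amortized: float

    def total(self):
        """Sum of the six lines, in dollars per action."""
        return sum(getattr(self, f.name) for f in fields(self))

    def shares(self):
        """Each line as a fraction of the total."""
        t = self.total()
        return {f.name: getattr(self, f.name) / t for f in fields(self)}


breakdown = CostPerAction(
    input_tokens=0.0110,                              # assumed
    output_tokens=0.0040,                             # assumed
    embedding_amortized=amortize(1_000, 5_000_000),   # $0.0002; re-embed cost is implied, not stated
    retrieval=0.0015,                                 # assumed
    eval_amortized=amortize(400, 500_000),            # $0.0008, weekly eval example
    observability_amortized=0.0020,                   # assumed
)

print(f"total: ${breakdown.total():.4f} per action")
```

Keeping `shares()` next to `total()` makes it straightforward to compare a feature's retrieval and observability lines against the percentage ranges quoted above.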

Why this survives model upgrades

Three properties make cost-per-action survive the events that break cost-per-request.

Model-independent. Migrating from Claude 4.5 to Claude 4.7 changes input prices, output prices, latency, reasoning behavior. It does not change “one resolved support query.” Cost-per-action recomputes against new prices on the same denominator; YoY trend lines stay valid.

Implementation-independent. Refactoring a one-call feature into an agent loop changes the call graph but not the deliverable. Cost-per-action surfaces the refactor’s cost impact honestly without distorting the trend.

Decomposition isolates the moving line. When a model halves input-token price, line 1 drops and lines 2-6 are unchanged. When an upgrade increases output verbosity 30 percent, line 2 rises and the others stay flat. The decomposition tells the CFO exactly which line moved and why. Cost-per-request absorbs all of it into one opaque number.

This is the property that lets the framework outlive any specific 2026 model. Foundation-model prices will fall 60 to 80 percent over four years, agent frameworks will rewrite themselves, retrieval architectures will shift to hybrid. Cost-per-action survives because it is defined against the customer’s deliverable, not against any of those moving parts.
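
The line-isolation property can be illustrated by diffing two breakdowns taken before and after a migration. This is a sketch with hypothetical per-action costs; the line names are assumptions chosen to mirror the six-line decomposition.

```python
LINES = ("input_tokens", "output_tokens", "embedding_amortized",
         "retrieval", "eval_amortized", "observability_amortized")


def moved_lines(before, after, tolerance=1e-9):
    """Return {line: dollar change} for every line that changed between two breakdowns."""
    return {
        line: after[line] - before[line]
        for line in LINES
        if abs(after[line] - before[line]) > tolerance
    }


# Hypothetical per-action costs in dollars before a model migration.
before = {
    "input_tokens": 0.0110, "output_tokens": 0.0040, "embedding_amortized": 0.0002,
    "retrieval": 0.0015, "eval_amortized": 0.0008, "observability_amortized": 0.0020,
}
# The new model halves the input-token price; nothing else changes.
after = {**before, "input_tokens": before["input_tokens"] / 2}

delta = moved_lines(before, after)
print(delta)
```

Only the input-token line appears in `delta`, which is the kind of report a finance reviewer can read without knowing anything about the call graph.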

Setting thresholds the CFO can defend

The unit by itself is not enough. The CFO needs cost-per-action thresholds, set at the level of the feature, that the engineering team has to defend against.

The mechanical way to set the threshold. Start with the customer’s per-action revenue contribution. Subtract the share reserved for the desired contribution margin (e.g., 65 percent). The remainder is the cost-per-action ceiling. Lock it in the feature spec. Make it an acceptance criterion the same as latency p95 and eval threshold.

A worked example. An AI sales assistant priced at $200/seat/month, with an expected workload of 800 actions/seat/month, gives $0.25 per action of revenue contribution. Targeting 65 percent contribution margin sets the cost-per-action ceiling at $0.0875. The engineering team’s spec includes a $0.0875 cost-per-action target, and a feature shipping at $0.12 is over budget by 37 percent and must be optimized before launch. This conversation does not happen unless the threshold is in the spec.
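
The arithmetic in the worked example can be reproduced in a few lines. The function names below are illustrative; the inputs are the figures from the example.

```python
def cost_ceiling(price_per_seat, actions_per_seat, target_margin):
    """Cost-per-action ceiling implied by a target contribution margin."""
    revenue_per_action = price_per_seat / actions_per_seat
    return revenue_per_action * (1 - target_margin)


def overage_pct(actual_cost, ceiling):
    """How far above the ceiling a measured cost-per-action sits, in percent."""
    return (actual_cost - ceiling) / ceiling * 100


ceiling = cost_ceiling(price_per_seat=200, actions_per_seat=800, target_margin=0.65)
over = overage_pct(0.12, ceiling)
print(f"ceiling ${ceiling:.4f}, over budget by {over:.0f}%")
# prints: ceiling $0.0875, over budget by 37%
```

Keeping the ceiling as a function of price, workload, and margin, rather than a hard-coded constant, means a pricing or workload change recomputes it automatically.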

Three rules for setting useful thresholds.

Set them at the action level, not the feature level. A feature with three actions has three thresholds, because the three actions have different cost profiles. Aggregating to a feature-level number averages away the surface where optimization happens.

Reset them after every model migration. When the model price moves, the threshold moves. The discipline is to recompute the contribution-margin math against the new model price and reset the threshold inside the same release cycle. Letting old thresholds drift produces zombie SLAs that no longer reflect the economics.

Tie them to procurement. Vendor contracts that deliver AI work should carry the threshold as a contractual commitment, with structured renegotiation when foundation-model prices materially shift; see the case against fixed-price AI development contracts for the contracting consequence.

Implementation: from spec to dashboard in 30 days

Week 1: action definition. Engineering and product agree on the action list per feature with completion conditions checkable from logs. Eight to twelve actions across a typical product surface. Output: an action manifest YAML in the repo.

Week 2: instrumentation. Wrap every inference call with a context carrying action_id and feature_id. Aggregate at the gateway. Roll the six lines up daily. Output: a cost warehouse returning cost-per-action by feature on demand.

Week 3: thresholds. Compute the contribution-margin math per action. Lock thresholds in the feature specs. Add to CI as numerical assertions. Output: every feature ships with a cost-per-action target.

Week 4: dashboard and review cadence. A weekly review surfacing actions trending toward threshold, breached actions, and actions whose cost shifted on a model migration. The motion runs on existing observability; only the discipline is new.

Common failure modes

Three patterns to avoid.

Defining actions too coarsely. “One agent invocation” is the same trap as “one request.” If the agent does four user-distinguishable things in one invocation, that is four actions, not one. Coarse action definitions hide exactly the optimization surface the framework exists to expose.

Skipping the amortized lines. Cost-per-action without the eval and observability lines understates real cost by 20 to 35 percent. The understatement surfaces as a margin surprise three quarters later. The discipline is to include the lines from day one even when they look small. The hidden cost of AI evals: where 35 percent of project budget goes is the broader story behind the amortized eval line.

Computing once and not maintaining. A cost-per-action number computed at launch and never recomputed is wrong by month four. The framework only works if the rollup runs daily, the thresholds get reset on model migrations, and the weekly review surfaces drift. Static cost-per-action numbers are decorative.

Frequently asked questions

What is cost-per-action and how is it different from cost-per-request?

Cost-per-action is the total cost of one user-facing AI feature invocation that produces a deliverable the customer paid for. Cost-per-request counts API calls. The two diverge whenever an action calls multiple times, when retrieval and embedding costs are non-trivial, or when the implementation changes without the deliverable changing.

How do I define an action in my product?

An action must be customer-facing, outcome-bounded, and workload-stable. Name it in language the buyer would recognize. Give it an unambiguous completion condition checkable from logs. Make sure refactoring the implementation does not change the count. Most products have eight to twelve actions across the surface.

What are the six lines in the decomposition?

Input tokens, output tokens, amortized embedding cost, retrieval cost, amortized eval cost, and amortized observability cost. The standardization makes cost-per-action portable across teams and vendors. A vendor that refuses to break out these six lines is hiding something.

Why do amortized eval and observability costs belong in cost-per-action?

Because both are structurally required to operate the feature in production, not optional overhead. A feature whose eval suite stops running is not a working feature. A feature whose observability is dark is one regression away from a customer-reported bug. Treating them as overhead produces under-instrumented systems.

How does this survive model upgrades?

The action definition does not change when the model changes, the implementation changes, or the prices change. The six-line decomposition isolates which line moves under each event. Year-over-year trend lines stay valid because the denominator is stable.

How do I set a cost-per-action threshold?

Start with per-action revenue contribution. Subtract the share reserved for the desired contribution margin. The remainder is the cost-per-action ceiling. Lock it in the feature spec. Make it an acceptance criterion the same as latency p95 and eval pass rate.

Should the cost-per-action threshold change after a model migration?

Yes. When the underlying model price moves, the contribution-margin math shifts and the threshold needs to be recomputed inside the same release cycle. Letting old thresholds drift produces zombie SLAs that no longer reflect the economics.

Does this apply to internal AI teams or only agency engagements?

Both. Internal teams use the same framework with the same six lines. The threshold conversation moves from procurement to product-finance, but the unit and the decomposition are identical.

How long does implementation take?

Roughly 30 days for a small product. Week 1: action definition. Week 2: instrumentation. Week 3: thresholds. Week 4: dashboard and review cadence. The motion runs on existing observability; only the discipline is new.

How does this connect to the AI project economics manifesto?

The manifesto names evaluation as the unit of account at the project level. Cost-per-action is the operational primitive at the feature level: the framework that makes manifesto-level principles enforceable in the daily P&L.

Key takeaways

  • Cost-per-request is a brittle metric because it tracks the call graph, not the customer’s deliverable. Cost-per-action is the unit-economics primitive that survives model upgrades, implementation refactors, and prompt rewrites.
  • An action is customer-facing, outcome-bounded, and workload-stable. Defined right, it produces a count that does not change when the implementation does.
  • Cost-per-action decomposes into six standardized lines: input tokens, output tokens, amortized embedding, retrieval, amortized eval, amortized observability. The standardization makes the number portable across vendors and audit cycles.
  • The framework survives model upgrades because the action definition is independent of the underlying model and the six-line decomposition isolates which line moves under each event.
  • Set thresholds at the action level, reset them after model migrations, tie them to procurement contracts. A feature without a cost-per-action threshold in the spec is mispriced by default.

Last Updated: Jun 9, 2026

Arthur Wandzel
