Most AI agency contracts in 2026 still treat “work product” the way 2014 web-development MSAs did: the developer commits a file, the file is delivered, IP transfers on payment. That model assumes the deliverable is deterministic source code. It is not. On a modern AI engagement the deliverable is a portfolio of artifacts (fine-tuned weights, adapter checkpoints, RAG embedding indexes, prompt registries, eval datasets), none of which a 2014-style work-product clause cleanly covers.
The “Y problem” is the gap between what the IP clause names and what the system depends on to run. Buyers sign an MSA that confidently transfers “the Software.” Three quarters later, at offboarding, they discover that the LoRA adapter sits in the agency’s Hugging Face workspace, the eval set lives in a private W&B project, and the prompt library is in someone’s Cursor sidebar. Every artifact is reproducible. None of them is technically theirs.
This piece is a spoke under the AI agency manifesto. The manifesto names what an AI dev partner should be; this piece names the contractual surface area where ownership has to be enforceable, artifact class by artifact class.
Why standard work-product clauses fail on AI engagements
A standard MSA clause reads: “All deliverables created by Agency, including all source code, designs, and documentation, shall be Work Product owned by Client upon creation, and Agency hereby assigns all right, title, and interest therein.” That covers what a developer types. AI engagements produce three classes of artifact that nobody types:
- Computed artifacts. Weights are produced by a training loop, not authored. The act of “creation” is an optimization run. Whether a computed checkpoint qualifies as a deliverable “created by Agency” is unclear.
- Derivative artifacts. A fine-tuned model is a derivative of a base model under a separate license: the Llama 3 Community License, the Mistral Research License, OpenAI / Anthropic fine-tuning policies, or the Gemma terms. The agency cannot assign more rights than it received.
- Embedded data artifacts. RAG embeddings and eval datasets contain buyer data in transformed form. Whether they are “work product” or “client data” routes through the DPA, not the IP clause.
The result is the most common form of soft lock-in in 2026: the buyer technically owns the system but cannot re-create the artifacts that make it work without the agency’s continued cooperation. The fix: enumerate the six artifact classes below and write specific clauses against each.
Artifact 1: Model weights and the derivative-status trap
Weights come in three flavors on an agency engagement: full fine-tunes, LoRA / QLoRA adapters, and distilled student models. None is assigned cleanly by a generic work-product clause, because the clause does not contemplate a base model the agency does not own. A clean weight-ownership clause has to do four things:
- Declare that any parameters trained on Client Data are Client Work Product.
- Specify the deliverable format: a Hugging Face checkpoint, a Modal volume, or an S3 bucket with safetensors files. Not “we will figure it out at offboarding.”
- Require the training-run configuration: base-model identifier and version, hyperparameters, dataset manifest hash, framework version, random seeds.
- Disclose the upstream license. If the base is Llama 3, the agency cannot assign rights it does not have.
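To make the third requirement concrete, here is a minimal sketch of a training-run configuration record a buyer could require alongside the weights. The field names, the `TrainingRunConfig` class, and the `dataset_manifest_hash` helper are hypothetical illustrations, not a standard schema; the hash assumes the agency supplies a SHA-256 digest per dataset file.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def dataset_manifest_hash(file_digests: dict) -> str:
    """Order-independent SHA-256 over a {file path: per-file sha256 hex} mapping."""
    # Canonical JSON (sorted keys, fixed separators) so the same dataset always hashes the same.
    canonical = json.dumps(file_digests, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class TrainingRunConfig:
    # Hypothetical field names, not a standard schema.
    base_model_id: str            # base-model identifier, e.g. a Hugging Face repo id
    base_model_revision: str      # pinned version or commit of that base model
    framework_version: str        # training framework and version string
    dataset_manifest_sha256: str  # output of dataset_manifest_hash()
    random_seed: int
    hyperparameters: dict = field(default_factory=dict)

    def missing_fields(self) -> list:
        """Names of fields left empty; an acceptance review would reject these."""
        return [name for name, value in asdict(self).items() if value in ("", None, {})]
```

The design choice is that an empty field is a delivery defect the buyer can point to, rather than something discovered at offboarding.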
The derivative-status trap catches buyers who skip the fourth point. They sign a clause that says they own “all weights.” Counsel later reads the Llama 3 Community License and discovers that redistribution requires preserving the original license and naming, and that the 700M-MAU threshold triggers a separate Meta license. The clause did not lie; it could not give what it promised.
Artifact 2: Training data and licensing chain
If an agency fine-tunes on the buyer’s data, the training corpus itself is an artifact with a license history. Three ways data shows up, each routing through a different contract surface:
- Client-provided data. Covered by the DPA. The fine-tune produces a derivative, and the IP clause has to make that derivative Work Product even though the agency typed nothing.
- Public datasets. Common Crawl, The Pile, FineWeb, Wikipedia dumps. Each has its own license. The agency should disclose what entered the training mix and under what terms.
- Synthetic data from another model. Outputs from GPT, Claude, or Gemini APIs. 2026 OpenAI and Anthropic terms restrict using outputs to train competing foundation models; that constraint flows to the buyer.
The clause should require a Training Data Manifest: dataset name, source, license, retention status, and whether any portion was generated by a third-party API. Without it the buyer cannot answer the transparency question behind EU AI Act Article 53: “Where did this model come from?”
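A minimal sketch of what one manifest row and a first-pass review could look like, assuming the manifest is delivered as structured records rather than prose. `ManifestEntry`, `review_manifest`, and the field names are hypothetical; they simply mirror the five fields listed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ManifestEntry:
    # One row per dataset that entered any training run (hypothetical field names).
    name: str
    source: str
    license: str
    retention_status: str                   # e.g. "retained", "deleted after training"
    generated_by_api: Optional[str] = None  # provider name if model-generated, else None


def review_manifest(entries: List[ManifestEntry]) -> Tuple[List[str], List[str]]:
    """Return (datasets with no stated license, datasets generated by a third-party API)."""
    unlicensed = [e.name for e in entries if not e.license.strip()]
    synthetic = [e.name for e in entries if e.generated_by_api]
    return unlicensed, synthetic
```

Both lists are questions for counsel, not verdicts: an unlicensed row needs a license answer, and a synthetic row needs a check against the originating provider’s terms.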
Artifact 3: RAG embeddings and the vector-store ownership question
RAG systems convert the buyer’s documents into vector embeddings stored in Pinecone, Weaviate, pgvector, or Qdrant. Embeddings are produced by a model (text-embedding-3-large, Voyage, Cohere, or an open-source embedder) and encode buyer content in transformed form.
Two questions usually go unanswered. First, are the embeddings Client Data or Work Product? Both. The numbers are derived from client documents and should travel with the client, but the agency made design choices (chunking strategy, embedder choice, metadata schema). Treat the index configuration as Work Product and the index contents as Client Data; both must be portable. Second, can the embeddings be regenerated? Embedding models drift: text-embedding-ada-002 was deprecated, and its replacements produce different vectors. The agency should document the embedding-model version contractually and commit to a re-indexing runbook.
The clause should require: (a) export of the vector index in a portable format (parquet, jsonl, native dump) at every milestone; (b) chunking and embedding configuration as code in the buyer’s repo; (c) a stated embedding-model version with a migration plan on deprecation.
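A minimal sketch of requirements (a) and (c), assuming a jsonl export whose first line carries the index configuration and whose remaining lines carry one `{id, vector, metadata}` record each. The layout, `EmbeddingConfig`, and the function names are hypothetical; it uses only the standard library, since parquet would need a third-party package.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EmbeddingConfig:
    # Index configuration (Work Product); hypothetical field names.
    embedding_model: str      # pinned model name/version that produced the vectors
    dimensions: int
    chunk_size_tokens: int
    chunk_overlap_tokens: int


def dump_index_jsonl(config: EmbeddingConfig, records: list) -> str:
    """Config on the first line, then one {id, vector, metadata} record per line."""
    lines = [json.dumps({"config": asdict(config)})]
    for rec in records:
        if len(rec["vector"]) != config.dimensions:
            raise ValueError(f"record {rec['id']}: {len(rec['vector'])} dims, expected {config.dimensions}")
        lines.append(json.dumps({"id": rec["id"], "vector": rec["vector"], "metadata": rec.get("metadata", {})}))
    return "\n".join(lines) + "\n"


def load_index_jsonl(text: str):
    """Inverse of dump_index_jsonl: returns (EmbeddingConfig, list of records)."""
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return EmbeddingConfig(**rows[0]["config"]), rows[1:]


def needs_reindex(config: EmbeddingConfig, available_models: set) -> bool:
    """True when the pinned embedder is no longer offered, so the re-indexing runbook applies."""
    return config.embedding_model not in available_models
```

Keeping the configuration inside the export means the contents (Client Data) never travel without the settings (Work Product) needed to interpret or regenerate them.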
Artifact 4: Prompt libraries as work product
Prompts are the modern equivalent of source code, and most contracts in 2026 still treat them as scratch text. A production agentic system can carry hundreds: system messages, tool-use schemas, retrieval templates, function-calling specs, routing rules, judge prompts. They are buyer-specific work product regardless of who typed them.
Three failure modes appear when prompts are not contractually claimed:
- The prompts sit in a private notebook or Cursor workspace, so the buyer’s repo will not redeploy without them.
- The agency claims “template” rights over generic patterns and uses that as cover for buyer-tuned variants.
- The prompts live in a vendor SaaS such as LangSmith, Humanloop, or Langfuse under an agency account, and when the relationship ends, so does access.
The clause should require a Prompt Registry: every prompt the production system uses, version-controlled in the buyer’s repository, with a redeployment drill that proves the system runs from the buyer’s repo with only the buyer’s API keys. Genuinely generic patterns can sit on a Pre-Existing IP schedule with a perpetual royalty-free license to the buyer.
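One piece of the redeployment drill can be automated: checking that every prompt the production config references actually exists in the buyer’s repository. This sketch assumes a hypothetical layout of one `.txt` file per prompt under a prompts directory, with the prompt ID being the relative path without the suffix; `load_registry`, `prompt_version`, and `missing_prompts` are illustrative names.

```python
import hashlib
from pathlib import Path


def load_registry(root: str, suffix: str = ".txt") -> dict:
    """Map prompt IDs (path relative to root, suffix removed) to prompt text."""
    base = Path(root)
    return {
        p.relative_to(base).with_suffix("").as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(base.rglob(f"*{suffix}"))
    }


def prompt_version(text: str) -> str:
    """Short content hash, so any edit to a prompt shows up as a new version in review."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def missing_prompts(referenced_ids, registry: dict) -> list:
    """Prompt IDs the production config references but the buyer's repo does not contain."""
    return sorted(set(referenced_ids) - set(registry))
```

A non-empty result from `missing_prompts` is exactly the first failure mode above: the system cannot redeploy from the buyer’s repo alone.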
For broader contract structure, see AI agency contract negotiation on layering MSA, SOW, DPA, and NDA.
Artifact 5: Eval datasets, the most undervalued asset
The eval dataset is the one artifact that decides whether the system works. It is also the artifact most likely to be missing from contracts. A modern eval suite has four layers: held-out test sets with labels, rubric definitions and judge prompts, scoring functions and pass/fail thresholds, and execution scripts. Together they let a buyer answer “Did this milestone deliver what we paid for?” and “Has anything regressed since last week?” When the suite walks out with a senior engineer or stays on agency infrastructure, both questions become unanswerable.
The contractual cure is to treat eval artifacts as their own deliverable category: all four layers delivered at every milestone; an unrestricted, perpetual, royalty-free license to re-run against any successor system; and a reproducibility test where the buyer’s engineer pulls the suite, runs it locally against the latest checkpoint, and reproduces the score within rounding error before the milestone is paid.
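The reproducibility test reduces to a comparison the buyer can script. A minimal sketch, assuming scores are reported per metric, that “within rounding error” is expressed as an absolute tolerance (0.005 here is an arbitrary placeholder the SOW would set), and that higher scores are better. The function names are hypothetical.

```python
import math


def reproduced(reported: dict, rerun: dict, abs_tol: float = 0.005) -> dict:
    """Per metric: does the buyer's local rerun match the agency-reported score within abs_tol?"""
    return {
        metric: metric in rerun and math.isclose(score, rerun[metric], rel_tol=0.0, abs_tol=abs_tol)
        for metric, score in reported.items()
    }


def milestone_accepted(reported: dict, rerun: dict, thresholds: dict, abs_tol: float = 0.005) -> bool:
    """Accept only if every reported score reproduces AND every rerun score clears its floor."""
    if not all(reproduced(reported, rerun, abs_tol).values()):
        return False
    # Assumes higher is better for every thresholded metric.
    return all(rerun.get(metric, float("-inf")) >= floor for metric, floor in thresholds.items())
```

Checking reproduction before thresholds matters: a score the buyer cannot reproduce says nothing about whether the milestone delivered.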
Eval transparency makes the rest of the framework enforceable. With evals you can verify a delivered weight artifact does what the agency claims; without them, weight ownership is paperwork.
Artifact 6: Fine-tune derivative status under base-model licenses
Every fine-tune sits downstream of a base model with its own license. The 2026 landscape splits four ways:
- Permissive open weights. Apache 2.0 (Mistral 7B, Mixtral 8x7B, most Qwen2 sizes, Pythia), MIT (Microsoft Phi-3). Derivatives assign cleanly to the buyer.
- Custom community licenses. Llama 3 / 3.1 / 3.2 Community License, Gemma Terms. Derivatives must preserve license terms, attribution, and the acceptable-use policy. Llama also requires a separate license from Meta for licensees above 700M monthly active users.
- Research-only or non-commercial. Mistral Research License, Falcon-180B acceptable-use restrictions, some Qwen vision releases. Not usable as a commercial fine-tune base without a separate license.
- Hosted-only fine-tunes. OpenAI tuned models, Anthropic fine-tuning when offered, Google Vertex tuned models. The buyer receives an API endpoint, not weights.
The cleanest clause acknowledges the category. If the base is Llama 3, the buyer owns the adapter and may redistribute under Llama Community License terms. If the base is a hosted OpenAI fine-tune, the buyer gets a contractual access right and the agency commits to provide training data and configuration sufficient to recreate the tune elsewhere.
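The four categories can be encoded as a lookup that generates the Upstream License Disclosure line for each Trained Artifact. This is a sketch of the categorization described above, not legal advice; `BaseLicense`, `DeliverableTerms`, `TERMS`, and `disclosure_line` are hypothetical names, and the condition strings paraphrase this article rather than any license text.

```python
from dataclasses import dataclass
from enum import Enum


class BaseLicense(Enum):
    PERMISSIVE = "permissive"    # e.g. Apache 2.0, MIT
    COMMUNITY = "community"      # e.g. Llama Community License, Gemma Terms
    RESEARCH = "research"        # research-only or non-commercial
    HOSTED_ONLY = "hosted-only"  # provider-hosted fine-tunes


@dataclass(frozen=True)
class DeliverableTerms:
    weights_delivered: bool
    commercial_use_without_extra_license: bool
    conditions: tuple  # what flows through to, or substitutes for, the buyer's ownership


TERMS = {
    BaseLicense.PERMISSIVE: DeliverableTerms(True, True, ("keep upstream license and notices",)),
    BaseLicense.COMMUNITY: DeliverableTerms(True, True, (
        "ship upstream license", "preserve attribution",
        "comply with acceptable-use policy", "check scale thresholds such as Llama's 700M MAU")),
    BaseLicense.RESEARCH: DeliverableTerms(True, False, ("separate commercial license required before production use",)),
    BaseLicense.HOSTED_ONLY: DeliverableTerms(False, True, (
        "contractual access right to the endpoint",
        "training data and configuration sufficient to recreate the tune elsewhere")),
}


def disclosure_line(artifact: str, category: BaseLicense) -> str:
    """One line of the Upstream License Disclosure for a single Trained Artifact."""
    terms = TERMS[category]
    weights = "weights delivered" if terms.weights_delivered else "no weights, endpoint access only"
    conditions = "; ".join(terms.conditions)
    return f"{artifact}: {category.value}; {weights}; conditions: {conditions}"
```

Forcing every Trained Artifact through one of four categories is the point: an artifact that fits none of them is a disclosure the agency has not yet made.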
For the operational mechanics, see LLM fine-tuning services for custom models on base-model selection and training-run hygiene.
The clause language a buyer should insist on
A composite clause covering the six artifact classes, suitable for redlining into an existing MSA:
AI Work Product. In addition to all other Work Product, the following artifacts (“AI Work Product”) shall be owned exclusively by Client upon creation, and Agency hereby irrevocably assigns all right, title, and interest therein, subject only to upstream third-party licenses disclosed under Section [X]:
(a) Trained Artifacts: model weights, adapter weights, LoRAs, distilled models, and fine-tuned embeddings trained or derived using Client Data, with training-run configuration sufficient to reproduce the artifact.
(b) Training Data Manifest: each dataset used in any training run, its source, license, retention status, and whether any portion was generated by a third-party model API.
(c) Retrieval Artifacts: the vector index of Client Data in portable format, chunking and embedding configuration as code in Client’s repository, embedding-model version, and a re-indexing runbook.
(d) Prompt Artifacts: every prompt, system message, tool-use schema, retrieval template, function-calling specification, routing rule, and runtime configuration required for production, version-controlled in Client’s repository.
(e) Evaluation Artifacts: held-out test sets, rubrics, judge prompts, pass/fail thresholds, and execution scripts, with an unrestricted perpetual royalty-free license to re-run against any successor system.
(f) Upstream License Disclosure: for each Trained Artifact, the base-model license category (permissive, community, research, hosted-only) and any restrictions that flow through to Client.
Agency shall deliver each category at every Milestone in the format specified in the SOW. Pre-Existing IP retained by Agency shall be listed on Schedule [Y], with Client granted a perpetual royalty-free worldwide license to use it within the Deliverables.
Six artifact classes, one license-disclosure carve-out, milestone-level delivery, perpetual license on reusable patterns.
How this fits into the MSA / SOW / DPA stack
The composite clause lives in the MSA. Three other instruments align around it:
- SOW. Names artifact formats (Hugging Face vs. native checkpoint, parquet vs. native vector dump, GitHub vs. GitLab repo), sets milestone delivery dates, and specifies the redeployment drill.
- DPA. Routes Client Data, including data embedded in Trained Artifacts and Retrieval Artifacts, through a sub-processor and data-residency schedule. GDPR Article 28 and EU AI Act Article 26 obligations attach here.
- Acceptable Use / License Disclosure schedule. Lists upstream model licenses (Llama 3 Community, Mistral Apache 2.0, OpenAI / Anthropic fine-tuning terms) and the constraints they impose on Client.
For the buyer-side framework these clauses operationalize, see the 7 commitments every AI dev agency should make in writing.
Frequently asked questions
What if the agency uses a hosted fine-tune (OpenAI, Anthropic, Vertex) where weights cannot be delivered?
The clause has to acknowledge that. The buyer gets a contractual access right, the training data and configuration sufficient to recreate the tune elsewhere, and a portability commitment. Under OpenAI’s 2026 fine-tuning terms the customer owns inputs and outputs and the tuned model is private to their org, but weights are not delivered, and the contract has to say so explicitly.
Does the buyer own a Llama 3 fine-tune?
The buyer owns the adapter or fine-tune they paid to train, subject to Llama 3 Community License pass-through obligations: preserve attribution, ship the license, comply with Meta’s acceptable-use policy, and obtain a separate Meta license above 700M MAU. The agency cannot assign more rights than it received. The clause should disclose all four constraints.
How are RAG embeddings handled under GDPR?
Embeddings derived from personal data are themselves personal data under most readings of GDPR Article 4, because retrieval links them back to identifiable individuals. Deletion requests flow into the vector store, retention windows apply, and cross-border transfer rules attach. The DPA carries the mechanics, but the IP clause should treat embeddings as both Work Product (configuration) and Client Data (contents) so the DPA has something to bind to.
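Deletion requests can only flow into the vector store if the index metadata records whose data each chunk contains, which is one reason the metadata schema belongs in the Work Product configuration. A minimal in-memory sketch, assuming a hypothetical `subject_ids` list written into each record’s metadata at indexing time; a real deployment would pass the removed IDs to its vector store’s own delete operation and apply the same erasure to exports and backups.

```python
def erase_subject(records: list, subject_id: str):
    """Split index records into (kept, removed_ids) for one data subject's erasure request.

    Assumes each record's metadata carries a hypothetical "subject_ids" list; without
    some such field, the chunks derived from that person's data cannot be located.
    """
    kept, removed_ids = [], []
    for rec in records:
        if subject_id in rec.get("metadata", {}).get("subject_ids", []):
            removed_ids.append(rec["id"])
        else:
            kept.append(rec)
    return kept, removed_ids
```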
Can an agency keep ownership of generic prompt patterns?
Yes, if it is honest about which ones. A router pattern, generic summarizer, or internal eval harness reused across clients can sit on a Pre-Existing IP schedule with a perpetual royalty-free license to the buyer. What an agency cannot defensibly retain is a prompt iterated against the buyer’s domain data and user feedback. The line is “abstracted pattern” vs. “client-tuned artifact.”
What about synthetic training data generated by GPT or Claude?
It carries the originating provider’s terms. As of 2026, both OpenAI and Anthropic restrict using outputs to train competing foundation models; neither prohibits using them as fine-tuning data for a downstream task model in most cases, but specifics vary by API and product line. The Training Data Manifest is the place to surface this.
Why is the eval dataset the most contentious artifact in practice?
It is the cheapest to withhold and the most expensive to recreate. A held-out test set with rubric and judge prompts represents months of labeling. Without it the buyer cannot verify regressions, run a credible vendor migration, or meet EU AI Act post-market-monitoring obligations. Agencies without eval discipline resist hardest because they do not have the artifact to deliver, only spreadsheets and tribal memory.
How do these clauses interact with the EU AI Act?
Article 26 assigns obligations to deployers of high-risk systems (use in line with the instructions for use, human oversight, monitoring of operation, log retention). Article 53 imposes transparency obligations on providers of general-purpose AI models, including a summary of the content used for training. Article 25 governs responsibilities along the AI value chain between providers and deployers. Without the AI Work Product clause the buyer cannot meet these obligations, because the artifacts live with the agency.
What is the single most important thing to verify before signing?
Run the redeployment drill at signature, not at offboarding: ask the agency to describe, in five sentences, exactly what would land in the buyer’s hands for each of the six artifact classes if the engagement ended next week. If the answer requires a meeting, the artifacts do not exist yet.
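If the agency answers in writing, the drill’s shape can be checked mechanically. A minimal sketch, assuming answers arrive keyed by the six artifact classes (the keys below are hypothetical labels) and using the five-sentence budget per class; the sentence counter is deliberately naive and will over-count on version numbers or URLs containing periods.

```python
import re

# Hypothetical keys for the six artifact classes named in this article.
ARTIFACT_CLASSES = ("weights", "training_data", "rag_embeddings", "prompts", "evals", "base_model_license")


def sentence_count(text: str) -> int:
    """Naive count: splits on runs of . ! ? and ignores empty fragments."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def drill_gaps(answers: dict, max_sentences: int = 5) -> dict:
    """Flag artifact classes with no written answer or an answer over the sentence budget."""
    gaps = {}
    for cls in ARTIFACT_CLASSES:
        text = answers.get(cls, "").strip()
        if not text:
            gaps[cls] = "no answer"
        elif sentence_count(text) > max_sentences:
            gaps[cls] = "over budget"
    return gaps
```

The check only catches missing or rambling answers; whether an answer names real, deliverable artifacts still takes a human reading it.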
Key takeaways
- Standard work-product clauses cover what a developer types. AI engagements produce computed, derivative, and embedded data artifacts outside that frame.
- The six artifact classes (weights, training data, RAG embeddings, prompts, evals, and fine-tune derivative status) each need their own contractual treatment.
- Base-model licenses (Llama 3 Community, Mistral Apache 2.0, OpenAI / Anthropic fine-tune terms, Gemma) flow through to the buyer; disclose constraints rather than overpromise.
- Eval datasets are the most diagnostic artifact: without them, weight ownership is paperwork.
- Run the redeployment drill at signature, not at offboarding. Five sentences per artifact class.
Related reading
- The AI Agency Manifesto, the pillar this contractual surface enforces.
- The 7 Commitments Every AI Dev Agency Should Make in Writing, the buyer-side framework operationalized here.
- AI Agency Contract Negotiation: Key Terms to Include, on the broader MSA / SOW / DPA / NDA structure.
- LLM Fine-Tuning Services for Custom Models, on the operational discipline behind the Trained Artifacts clause.
- AI Agency Vetting Checklist for CTOs; pre-signature diligence.
Arthur Wandzel