LLM performance drops sharply after 60-70% of the advertised context window is consumed — not gradually, but in sudden cliffs (Chroma Research, 2025). Most developers never notice because they blame the AI for “getting dumber” mid-session. The fix is straightforward: treat context as a finite resource, not an infinite scroll. This guide covers when to continue a conversation, when to start fresh, and how to engineer your context so the AI stays sharp across sessions.
Context Rot and Why Your AI Gets Worse Mid-Session
Every message you send, every file you reference, and every response the AI generates consumes tokens from a fixed-size context window. Once that window fills up, the model doesn’t crash — it degrades. Chroma Research evaluated 18 state-of-the-art models, including GPT-4.1, Claude 4, and Gemini 2.5, and found that reliability decreases as inputs grow longer, even on simple tasks like retrieval and text replication.
This degradation has a name: context rot. A Stanford study showed that adding just 4,000 tokens of irrelevant context causes LLM accuracy to drop from 70-75% down to 55-60%. For coding tasks, the effect compounds — models start making confident statements about code they barely read, suggest refactors that miss edge cases, and loop on bugs they introduced earlier in the conversation.
The “lost in the middle” phenomenon makes it worse. Models pay close attention to information at the beginning and end of their context window, but the middle gets fuzzy. A critical architecture decision you discussed 40 messages ago may as well not exist.
| Model | Advertised Window | Reliable Window | Drop-off Behavior |
|---|---|---|---|
| Claude 4.5 Sonnet | 200K tokens | ~130K tokens (65%) | Sudden degradation |
| GPT-4.1 | 1M tokens | ~600-700K tokens | Gradual then sharp |
| Cursor (practical) | Varies | 70K-120K tokens | Auto-trims older files |
| Claude Code | 200K tokens | ~130K, then auto-compact | Managed compression |
The Decision Framework: Continue, Compact, or Clear
Every AI coding session reaches a fork. The wrong choice costs you time — either repeating context you shouldn’t have lost, or fighting an AI that’s drowning in stale information. Here’s the data-backed decision framework.
Continue the Conversation
Stay in the current session when:
- You’re iterating on the same feature or file — the AI has accumulated useful context about your code structure, naming conventions, and the problem you’re solving
- Context utilization is below 50% — you have headroom before degradation
- You’re mid-task — breaking a multi-step operation (refactoring, test writing, debugging) loses the thread of what’s been tried and decided
- The debugging history matters — if the AI needs to know what approaches failed, that history is valuable context
Compact the Conversation
Summarize and continue when:
- Context hits 70-75% utilization — this is the sweet spot. Waiting until 85-90% risks losing important details during compression
- You’re continuing the same task but the conversation is long — /compact in Claude Code achieves ~50% token reduction while preserving critical decisions
- You need to shed noise but keep decisions — twenty messages of back-and-forth about a CSS bug can compress to “fixed the layout issue by switching to grid; the component uses flexbox for the inner container”
In Claude Code, you can guide what gets preserved: /compact focus on the database schema decisions and migration plan. This targeted compression keeps the context relevant.
Clear and Start Fresh
Reset completely when:
- You’re switching to an unrelated task — context from Task A leaks into Task B, causing the AI to reference irrelevant code or apply the wrong patterns
- The AI contradicts its own earlier decisions — this signals the context is too noisy for coherent reasoning
- Debugging exceeds ~20 messages without resolution — at this point, the accumulated failed approaches pollute the context more than they inform it. Start fresh with a clean problem statement
- You completed a task and are starting the next — each task deserves a clean slate with focused context
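The fork above can be reduced to a small heuristic. A sketch, assuming you can observe or estimate utilization — the 70% cutoff and the stalled-debugging signal are the rough figures from this guide, not tool-defined constants:

```python
def next_action(utilization: float, same_task: bool,
                debugging_stalled: bool = False) -> str:
    """Continue/compact/clear heuristic using this guide's rough thresholds.

    utilization: fraction of the context window consumed (0.0-1.0).
    same_task: whether the next request continues the current task.
    debugging_stalled: ~20+ debugging messages with no resolution.
    """
    if not same_task or debugging_stalled:
        return "clear"      # unrelated task or polluted context: start fresh
    if utilization >= 0.70:
        return "compact"    # compress before the 85-90% danger zone
    return "continue"       # headroom remains: keep the session going
```

The key design choice is that task boundaries override utilization: even a near-empty window is the wrong place to continue an unrelated task, because context bleed matters more than token count.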
Tool-Specific Context Strategies
Claude Code
Claude Code provides the most explicit context management of any AI coding tool. Three built-in mechanisms handle different scenarios:
/compact summarizes the conversation history into a condensed version, achieving roughly 50% token reduction. Since v2.0.64, compaction is instant — no waiting. You can customize what gets preserved by adding instructions: /compact keep the API route structure decisions.
/clear wipes all conversation context and starts from zero. Use this when switching tasks entirely.
Auto-compact triggers automatically at 75-92% context utilization, compressing the conversation without interrupting your flow. Claude Code handles this transparently in the background.
CLAUDE.md is the persistence layer. This file loads automatically at the start of every session with higher adherence than user prompts. Store universal project rules here — architecture decisions, coding conventions, file structure. Keep it focused: task-specific instructions belong in linked files (architecture.md, learnings.md), not in the root CLAUDE.md.
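A minimal root CLAUDE.md along these lines might look like the following — the rules, file names, and commands are illustrative, not from any specific project, and the `@file` import syntax should be checked against your Claude Code version’s documentation:

```
# CLAUDE.md

## Conventions
- TypeScript strict mode; no `any`
- Tests live next to source files as `*.test.ts`

## Architecture
- See @architecture.md for service boundaries
- See @learnings.md for past decisions and gotchas

## Workflow
- Run the test suite before declaring a task done
```

Note what is absent: no task-specific instructions, no long prose. Those live in the linked files and load only when relevant.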
Memory files in the .claude/ directory persist across sessions. Claude Code writes learnings, patterns, and project-specific knowledge here automatically, building an external memory that survives conversation resets.
Claude Code also uses roughly 5.5x fewer tokens than Cursor for equivalent tasks because it plans upfront and then executes, reducing the back-and-forth that inflates context.
Cursor
Cursor uses a layered context architecture:
| Layer | Mechanism | Persistence | When It Loads |
|---|---|---|---|
| 1 | .cursorrules / .mdc files | Permanent | Every session |
| 2 | Notepads | Permanent | When @referenced |
| 3 | @Docs | External | When @referenced |
| 4 | @Files / @Codebase | Project | When @referenced |
| 5 | Conversation | Session | Always |
Best practices for Cursor context management:
- Keep .cursorrules under 500 lines with one concern per file. Split large specs into multiple composable .mdc files
- Name Notepads descriptively — “Auth_Rules” and “API_Guidelines” rather than “Notes1”. Include @File references inside Notepads to create reusable context bundles
- Be surgical with @mentions — reference specific files with @Files instead of letting Cursor auto-detect. Close editor tabs you don’t need; open tabs contribute to context noise
- Start a new chat after ~20 messages or when switching tasks. Cursor’s chat sessions default to ~20,000 tokens, and it silently trims older context to maintain responsiveness
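To illustrate the one-concern-per-file idea, a small .mdc rule might look like this. A sketch assuming Cursor’s frontmatter fields (`description`, `globs`, `alwaysApply`) — verify the exact schema against your Cursor version’s rules documentation, and the rules themselves are hypothetical:

```
---
description: API route conventions
globs: src/api/**/*.ts
alwaysApply: false
---

- Every route handler validates input before touching the database
- Return errors as a structured object, never raw exceptions
```

Because the glob scopes the rule to `src/api/`, it costs no context in sessions that never touch those files.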
AGENTS.md — The Cross-Tool Standard
AGENTS.md is an emerging standard that works across Claude Code, Cursor, GitHub Copilot, and other agentic tools. A single markdown file at your repository root provides consistent instructions regardless of which tool your team uses. This prevents the fragmentation of maintaining separate CLAUDE.md and .cursorrules files with overlapping content.
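A root-level AGENTS.md might look like the following — the project layout, package names, and commands are illustrative, not part of the standard itself:

```
# AGENTS.md

## Project
Monorepo: `apps/web` (frontend), `packages/api` (backend)

## Rules for agents
- Run the test suite after any change to `packages/api`
- Never edit generated files under `packages/api/dist`
- Prefer small, reviewable diffs over sweeping refactors
```

The format is deliberately plain markdown with no tool-specific syntax, which is what lets multiple agents consume the same file.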
From Prompt Engineering to Context Engineering
MIT Technology Review marked 2025 as the year developers shifted “from vibe coding to context engineering.” The distinction matters: prompt engineering optimizes what you ask; context engineering optimizes everything surrounding the question — memory, tools, retrieved data, and conversation history.
Five strategies form the foundation:
- Selection — Include only files and context relevant to the current task. A Stanford study showed that irrelevant context actively harms performance; more is not better
- Compression — Summarize long conversation histories rather than carrying every message. Use /compact in Claude Code or start a new chat with a summary in Cursor
- Ordering — Place critical information at the beginning and end of your context. The “lost in the middle” problem means information buried in the middle of long contexts gets less attention
- Isolation — Separate unrelated tasks into different sessions. Context bleed between tasks is one of the most common productivity killers
- Format optimization — Structure context with headers, bullet points, and tables. LLMs parse structured text faster and more accurately than prose
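Four of the five strategies can be combined when assembling a prompt by hand. A sketch — the section layout and the decision to restate the task at the end are illustrative applications of the ideas above, not a prescribed algorithm:

```python
def assemble_context(task: str, files: dict[str, str],
                     relevant: set[str], history_summary: str) -> str:
    """Build a context payload using selection, compression, ordering, format.

    Selection: include only files named in `relevant`.
    Compression: prior conversation arrives as `history_summary`, not raw logs.
    Ordering: task statement first, task restated last ("lost in the middle").
    Format: headers and fenced blocks instead of prose.
    """
    parts = [f"## Task\n{task}"]                             # critical info up front
    parts.append(f"## Prior decisions\n{history_summary}")   # compressed history
    for name in sorted(relevant & files.keys()):             # selection only
        parts.append(f"## {name}\n```\n{files[name]}\n```")
    parts.append(f"## Reminder\n{task}")                     # repeat at the end
    return "\n\n".join(parts)
```

The fifth strategy, isolation, has no code: it is the decision to call this once per task rather than accumulating everything into one payload.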
Anthropic’s 2026 Agentic Coding Trends Report found that developers use AI in 60% of their work but fully delegate only 0-20% of tasks. The remaining 80-100% requires active oversight — and the quality of that oversight depends on how well you manage the context window.
Frequently Asked Questions
How do I know when my context window is getting full?
Claude Code displays context utilization as a percentage and auto-compacts at 75-92%. Cursor doesn’t show utilization directly, but you’ll notice slower responses, repeated suggestions, or the AI forgetting earlier decisions. As a rule, compact or restart after 15-20 substantive messages or when responses start degrading in quality.
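When a tool doesn’t expose utilization, a rough estimate still helps. A sketch using the common ~4-characters-per-token rule of thumb — real tokenizers vary by model, language, and code density, so treat the result as a gauge, not a measurement:

```python
def estimate_utilization(messages: list[str], window_tokens: int = 200_000) -> float:
    """Approximate context utilization from conversation text.

    Uses the rough heuristic of ~4 characters per token for English text;
    code and non-English text tokenize differently, so this is only a gauge.
    """
    est_tokens = sum(len(m) for m in messages) / 4
    return est_tokens / window_tokens

def should_compact(messages: list[str], window_tokens: int = 200_000) -> bool:
    """True once the estimate passes the ~70% compaction threshold."""
    return estimate_utilization(messages, window_tokens) >= 0.70
```

Pasting your conversation export into `messages` gives a quick sanity check before a long refactoring request.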
Should I put everything in CLAUDE.md or keep it minimal?
Keep CLAUDE.md minimal — only universally applicable instructions that every session needs. Task-specific details, architecture documentation, and project context belong in linked files (architecture.md, learnings.md, conventions.md). CLAUDE.md loads into every session, so bloating it wastes context on irrelevant instructions for any given task.
Does a bigger context window mean I don’t need to manage context?
Bigger windows don’t eliminate context rot. Chroma Research tested models with 1M+ token windows and found the same degradation pattern — just shifted to a higher absolute number. A model with a 1M token window becomes unreliable around 600-700K tokens. Larger windows make it easier to degrade output quality without noticing, because the failure is silent.
What’s the difference between /compact and /clear in Claude Code?
/compact summarizes the conversation into a condensed version and continues with that summary as context — preserving key decisions and progress while reducing token count by ~50%. /clear wipes everything and starts a blank session. Use /compact when continuing the same task; use /clear when switching to something unrelated.
How do I carry context between sessions when I close and reopen my editor?
Use persistent instruction files: CLAUDE.md (Claude Code), .cursorrules/.mdc (Cursor), or AGENTS.md (cross-tool). For project state, maintain a scratchpad or progress file that summarizes current work-in-progress. Claude Code’s memory files in .claude/ persist automatically across sessions, building cumulative project knowledge without manual effort.
Key Takeaways
- Context rot is measurable — LLMs degrade at 60-70% of advertised window capacity, with sudden drops rather than gradual decline
- Compact at 70-75%, not 85-90% — earlier compression preserves more useful context and gives the model headroom for complex operations
- One task per conversation — separating tasks prevents context bleed, the most common source of AI “hallucination” in coding sessions
- Use persistent instruction files (CLAUDE.md, .cursorrules, AGENTS.md) to avoid repeating project context every session
- Context engineering is the new productivity lever — MIT Technology Review identified the shift from prompt engineering to context engineering as the defining trend of 2025-2026
Building with AI tools? SFAI Labs practices context engineering across every client project — we manage the AI so you can focus on the product. Talk to us about AI development, automation, and custom solutions.
Nenad Radovanovic