Context engineering beats a bigger context window

Claude Code now runs up to a 1M-token context window on flagship models — but a bigger window does not fix a cluttered one. Every turn resends the entire conversation, and answer quality degrades before the hard limit is ever reached. The lever that matters is context engineering, not window size.

Definitions

Context window — the maximum number of tokens a model can consider at once, covering the system prompt, the full conversation history, tool results, and the model’s own replies.

Context engineering — deliberately curating what enters that window each turn, instead of trusting a larger window to absorb the clutter.

The window got bigger; the failure mode didn’t

Anthropic’s current paid-plan limits (verified against Anthropic support, June 2026):

Surface	Models	Context limit (tokens)
Claude chat (paid)	Opus 4.8 / 4.7 / 4.6, Sonnet 4.6	500K
Claude chat (paid)	other models	200K (~500 pages)
Claude Code	Opus 4.8 / 4.7 / 4.6	1M
Claude Code	Sonnet 4.6	1M

On Pro, the 1M window for Opus and Sonnet 4.6 in Claude Code requires enabling usage credits (usage-based Enterprise plans are exempt).

A 1M window sounds like it ends the problem. It doesn’t, for two reasons.

Cause: every turn replays, and the middle gets lost

Replay. An agent turn is not just your latest prompt. The model re-reads the entire state every time: prior messages, its own previous responses, tool outputs, file reads, and any docs it fetched. A single large file read or doc fetch keeps costing tokens for the rest of the session.
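The replay cost can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not a description of how Claude Code bills or caches: it assumes every turn resends the full history, ignores prompt caching and the system prompt, and uses made-up turn sizes (2,000 tokens per turn, plus one 50,000-token file read). The function name `cumulative_input_tokens` is hypothetical.

```python
def cumulative_input_tokens(turn_sizes):
    """Sum the input tokens sent over a session in which every turn
    resends all prior content (simplified, hypothetical model)."""
    history = 0  # tokens currently held in the conversation
    total = 0    # input tokens summed across all turns
    for size in turn_sizes:
        history += size   # this turn's new content joins the history
        total += history  # the whole history is sent again this turn
    return total


# Ten turns that each add 2,000 tokens (assumed figure).
small_turns = [2_000] * 10

# The same session, except turn 2 also pulls in a 50,000-token file read.
with_big_read = small_turns.copy()
with_big_read[1] += 50_000

print(cumulative_input_tokens(small_turns))    # 110000
print(cumulative_input_tokens(with_big_read))  # 560000
```

In this toy model the single file read in turn 2 is resent on nine turns, adding 450,000 input tokens to a session that would otherwise total 110,000.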

Lost in the middle. Recall is strongest at the very start and very end of the window and weakest in between. So a fact buried mid-context can be ignored while the token count is still far under the cap — quality drops before the hard limit, not at it.

The result: throughput and accuracy fall as the window fills with low-signal content, regardless of how high the ceiling is.

Solution: engineer the context

Symptom — the agent forgets earlier instructions, repeats work, or references files it was never shown. Cause — a bloated, low-density window. Solution — four moves:

Keep CLAUDE.md lean. It loads at session start and costs tokens every turn. Put only universally applicable facts in it; pass task-specific detail in the prompt. See keep CLAUDE.md to universal instructions.
/compact proactively. Summarize the conversation before it sprawls, and append an instruction telling the summarizer what to preserve (for example, the auth flow you’re mid-change on).
/clear between tasks. When you switch to unrelated work, reset the conversation to a near-empty window (only the system prompt and CLAUDE.md remain) instead of dragging the old context along.
Plan outside, paste the result in. Do exploratory back-and-forth somewhere cheap, then inject only the final plan — not the dead ends that produced it.

The two commands, as typed at the Claude Code prompt (/compact accepts optional instructions on what to preserve; /clear takes no arguments):

/compact focus on the auth flow I'm mid-change on
/clear
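The effect of compacting early can be sketched with a toy simulation. It runs under stated assumptions and is not Claude Code's actual compaction logic: the 50,000-token trigger, the 5,000-token summary size, the per-turn growth, and the function `peak_occupancy` are all hypothetical, and in practice you invoke /compact yourself.

```python
def peak_occupancy(turn_sizes, compact_at=None, summary_size=0):
    """Return the largest number of tokens held in the window during a session.

    compact_at:   occupancy at which the history is replaced by a summary
                  (None means the session never compacts).
    summary_size: tokens the summary occupies after compaction.
    """
    occupancy = 0
    peak = 0
    for size in turn_sizes:
        occupancy += size
        peak = max(peak, occupancy)
        if compact_at is not None and occupancy >= compact_at:
            occupancy = summary_size  # history collapses to the summary
    return peak


# Forty turns that each add 5,000 tokens (assumed figure).
session = [5_000] * 40

print(peak_occupancy(session))                                         # 200000
print(peak_occupancy(session, compact_at=50_000, summary_size=5_000))  # 50000
```

Under these assumed numbers the uncompacted session climbs to 200,000 tokens, while the compacted one never holds more than 50,000, which keeps recent instructions closer to the edges of the window.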

Push heavy reads into a subagent

The highest-leverage move for large reads is delegation. A subagent runs in its own context window and returns only a summary to the main agent. Reading an MCP server’s API reference or a pile of source files can cost tens of thousands of tokens: in one demo run, an API planner reading Stripe’s docs through the Context7 MCP server consumed ~54K tokens (illustrative of the scale, not a benchmark). Done in the main window, that clutter persists for the rest of the session; done in a subagent, the parent receives just the conclusion.

Reach for it when a task needs a large doc or many files read but the main loop only needs the answer. A subagent is also where you drop to a cheaper, faster tier — Haiku 4.5 reads the pile while Opus or Sonnet keeps the plan — so the context isolation and the token savings compound. See subagent context isolation and pick the right Claude tier.
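The saving from delegation can be estimated with the same kind of back-of-the-envelope model. This is a rough sketch, not a measurement: it reuses the ~54K-token read from the demo run above, while the 1,000-token summary, the 20 remaining turns, and the function `replayed_tokens` are assumptions chosen for illustration.

```python
def replayed_tokens(read_tokens, summary_tokens, remaining_turns, delegate):
    """Estimate the tokens a heavy read costs over the rest of a session.

    delegate=False: the read sits in the main window and is resent every turn.
    delegate=True:  a subagent pays for the read once in its own window and
                    the main window carries only the returned summary.
    """
    if delegate:
        return read_tokens + summary_tokens * remaining_turns
    return read_tokens * remaining_turns


inline = replayed_tokens(54_000, 1_000, remaining_turns=20, delegate=False)
delegated = replayed_tokens(54_000, 1_000, remaining_turns=20, delegate=True)

print(inline)     # 1080000
print(delegated)  # 74000
```

The subagent still spends the full read once; the difference is that the main window stops paying for it on every later turn.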

Bottom line

The window will keep growing. The discipline that makes agents reliable — context engineering — does not change with it. Treat the window as a budget to spend deliberately, not a bucket to fill.