How infinite conversations fit into finite context. The secret to long-running coding sessions.
← Back to Main | ← Conversation Loop
Language models have a fixed context window (e.g., 200K tokens), but coding sessions can run for hours: reading files, editing code, running tests, iterating. Without memory management, the conversation would hit that limit and break.
Claude Code solves this with context compaction: intelligently shrinking the conversation history while preserving what matters.
flowchart TD
START["Every API response"] --> COUNT["Count cumulative<br/>input tokens"]
COUNT --> CHECK{"tokens > budget?<br/>(default: 200K)"}
CHECK -->|"No"| CONTINUE["✅<br/>Continue normally"]
CHECK -->|"Yes"| IDENTIFY["Identify messages<br/>to compress"]
IDENTIFY --> PRESERVE["Preserve recent N<br/>messages (configurable)"]
PRESERVE --> SUMMARIZE["Generate XML summary<br/>of removed messages"]
SUMMARIZE --> INJECT["Inject summary as<br/>system context message"]
INJECT --> STRIP["Strip old messages<br/>from history"]
STRIP --> RESUME["Resume conversation<br/>with compacted context"]
style START fill:#3b82f6,color:#fff
style CONTINUE fill:#22c55e,color:#fff
style RESUME fill:#22c55e,color:#fff
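The decision loop in the flowchart reduces to two small helpers: a budget check and a split of the history into "summarize" and "keep" halves. This is a minimal sketch with hypothetical names, not the actual Claude Code internals:

```rust
/// Illustrative default; the real budget is configurable (see below).
const DEFAULT_TOKEN_BUDGET: u32 = 200_000;

/// The flowchart's decision node: compact only once cumulative
/// input tokens exceed the budget.
fn should_compact(cumulative_input_tokens: u32, budget: u32) -> bool {
    cumulative_input_tokens > budget
}

/// Split history into (messages to summarize, recent messages to keep).
/// `preserve_recent` is the configurable N from the flowchart.
fn split_for_compaction<T>(messages: &[T], preserve_recent: usize) -> (&[T], &[T]) {
    let cut = messages.len().saturating_sub(preserve_recent);
    messages.split_at(cut)
}
```

Keeping the split generic over `T` means the same helper works whether messages are plain strings or richer structs.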
graph TB
subgraph "BEFORE Compaction"
direction TB
M1["🟢 System Prompt"]
M2["👤 User: Read config.rs"]
M3["🤖 Assistant: Sure, reading..."]
M4["🔧 Tool: read_file → 200 lines"]
M5["🤖 Assistant: I see the issue..."]
M6["🔧 Tool: edit_file → applied"]
M7["👤 User: Now fix the tests"]
M8["🤖 Assistant: Looking at tests..."]
M9["🔧 Tool: read_file → test.rs"]
M10["🤖 Assistant: Here's the fix..."]
end
subgraph "AFTER Compaction"
direction TB
N1["🟢 System Prompt"]
N2["📝 Summary: Previously read config.rs,<br/>found and fixed a bug in line 42"]
N7["👤 User: Now fix the tests"]
N8["🤖 Assistant: Looking at tests..."]
N9["🔧 Tool: read_file → test.rs"]
N10["🤖 Assistant: Here's the fix..."]
end
M1 -.->|"Kept"| N1
M2 -.->|"Summarized"| N2
M3 -.->|"Summarized"| N2
M4 -.->|"Summarized"| N2
M5 -.->|"Summarized"| N2
M6 -.->|"Summarized"| N2
M7 -.->|"Kept"| N7
M8 -.->|"Kept"| N8
M9 -.->|"Kept"| N9
M10 -.->|"Kept"| N10
┌──────────────────────────────────────────────┐
│ CompactionConfig                             │
├──────────────────────────────────────────────┤
│ max_estimated_tokens: u32 (default: 200K)    │
│ preserve_recent_messages: u32 (configurable) │
├──────────────────────────────────────────────┤
│ Env override:                                │
│ CLAUDE_CODE_AUTO_COMPACT_INPUT_TOKENS=150000 │
└──────────────────────────────────────────────┘
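The configuration above can be sketched as a small Rust struct. The field names follow the diagram; the parsing helper, the 200K fallback, and the preserve-recent default of 5 are illustrative assumptions, not the actual implementation:

```rust
use std::env;

/// Shape matches the config diagram above.
struct CompactionConfig {
    max_estimated_tokens: u32,
    preserve_recent_messages: u32,
}

/// Parse an override value, falling back to the 200K default when the
/// variable is unset or not a number.
fn parse_budget(raw: Option<&str>) -> u32 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(200_000)
}

impl CompactionConfig {
    fn from_env() -> Self {
        // CLAUDE_CODE_AUTO_COMPACT_INPUT_TOKENS overrides the default budget.
        let raw = env::var("CLAUDE_CODE_AUTO_COMPACT_INPUT_TOKENS").ok();
        CompactionConfig {
            max_estimated_tokens: parse_budget(raw.as_deref()),
            preserve_recent_messages: 5, // assumed default; configurable in reality
        }
    }
}
```

Keeping the parse logic in a pure helper makes the fallback behavior testable without touching the process environment.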
Before compaction can trigger, the system needs to know how many tokens have been used. Here's how it estimates:
flowchart LR
A["Each message"] --> B["Count characters<br/>in all content blocks"]
B --> C["Apply ratio:<br/>~4 chars = 1 token"]
C --> D["Sum across<br/>all messages"]
D --> E["Compare to budget"]
Note: This is an estimate. The actual token count returned by the API (in the usage field) is used for cost tracking, but the character-based estimate drives compaction decisions so that no extra API call is needed.
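Under that heuristic, the estimator is little more than a character count. A sketch, assuming a flat 4-characters-per-token ratio and string content blocks:

```rust
/// Rough token estimate used for compaction decisions:
/// sum characters across all content blocks, then divide by ~4.
fn estimate_tokens(messages: &[&str]) -> u32 {
    let chars: usize = messages.iter().map(|m| m.chars().count()).sum();
    (chars / 4) as u32
}
```

Counting `chars()` rather than bytes keeps the estimate stable for non-ASCII content, though any fixed ratio is only an approximation.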
When old messages are removed, a structured summary replaces them:
<conversation_summary>
The user asked to read config.rs and fix a bug.
The assistant read the file (200 lines), identified an
off-by-one error on line 42, and applied an edit to fix it.
The fix was verified with cargo check (0 errors).
</conversation_summary>
This summary is injected as a system-role message so the model knows what happened before, even though the detailed messages are gone.
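A minimal sketch of that injection step, with a hypothetical Message type (the real message shape is internal to Claude Code):

```rust
#[derive(Debug)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug)]
struct Message {
    role: Role,
    content: String,
}

/// Wrap the generated summary in the XML tags shown above and prepend
/// it to the preserved messages as a system-role message.
fn inject_summary(summary: &str, kept: Vec<Message>) -> Vec<Message> {
    let wrapped = format!("<conversation_summary>\n{summary}\n</conversation_summary>");
    let mut out = vec![Message { role: Role::System, content: wrapped }];
    out.extend(kept);
    out
}
```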
sequenceDiagram
participant RT as 🧠 Runtime
participant API as 🌐 API
participant C as 📦 Compactor
Note over RT: Turn 15 - tokens approaching 200K
RT->>API: Send messages (198K tokens)
API-->>RT: Response + usage: 201K input tokens
RT->>RT: Check: 201K > 200K budget ⚠️
RT->>C: Trigger compaction
C->>C: Identify turns 1-10 for removal
C->>C: Generate summary of turns 1-10
C->>C: Preserve turns 11-15
C->>C: Inject summary message
C-->>RT: Compacted history (~80K tokens)
Note over RT: TurnSummary.auto_compaction = true
RT->>API: Next request with compacted context
API-->>RT: Continues naturally
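Putting the sequence together: a toy compact function over a plain string history, with a placeholder summary line standing in for the model-generated one. All names are illustrative:

```rust
/// End-to-end sketch of the compaction sequence: split off the oldest
/// messages, replace them with a summary, keep the recent tail.
fn compact(history: Vec<String>, preserve_recent: usize) -> Vec<String> {
    let cut = history.len().saturating_sub(preserve_recent);
    let (old, recent) = history.split_at(cut);
    // Placeholder: the real summary is generated by the model from `old`.
    let summary = format!(
        "<conversation_summary>\nSummarized {} earlier messages.\n</conversation_summary>",
        old.len()
    );
    let mut out = Vec::with_capacity(recent.len() + 1);
    out.push(summary);
    out.extend_from_slice(recent);
    out
}
```

Compacting 10 messages with `preserve_recent = 4` yields 5 entries: one summary plus the last four messages.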
Users can also trigger compaction manually via the /compact slash command:
flowchart LR
A["User types /compact"] --> B["Force compaction<br/>regardless of token count"]
B --> C["Same compaction flow"]
C --> D["Refreshed context<br/>with summary"]
This is useful when:
- The conversation feels "confused" by old context
- You want to start a new subtask within the same session
- You're working on a different part of the codebase
Context Window: 200K tokens
┌────────────────────────┬─────────────────────────────┐
│ System Prompt          │ ~2K tokens                  │
├────────────────────────┼─────────────────────────────┤
│ Compacted Summary      │ ~1K tokens (after compact)  │
├────────────────────────┼─────────────────────────────┤
│ Recent Messages        │ ~50K-150K tokens            │
├────────────────────────┼─────────────────────────────┤
│ Tool Definitions       │ ~5K tokens                  │
├────────────────────────┼─────────────────────────────┤
│ Available for Response │ Remaining                   │
└────────────────────────┴─────────────────────────────┘
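The "Available for Response" row is simply the window minus the fixed overheads. A one-line sketch, using the illustrative figures from the layout above:

```rust
/// Tokens left for the model's response after subtracting the fixed
/// overheads (system prompt, summary, recent messages, tool definitions).
fn available_for_response(window: u32, overheads: &[u32]) -> u32 {
    window.saturating_sub(overheads.iter().sum())
}
```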
| Aspect | Detail |
|---|---|
| Trigger | Cumulative input tokens exceed budget (default 200K) |
| What's removed | Oldest messages beyond preserve_recent count |
| What's kept | System prompt, summary, recent N messages |
| Summary format | XML-wrapped natural language summary |
| Manual trigger | /compact slash command |
| Config override | CLAUDE_CODE_AUTO_COMPACT_INPUT_TOKENS env var |
| Impact | Transparent to user; conversation continues seamlessly |
- Tool System → The tools that generate all those tokens
- Session Management → How conversations persist across restarts