Automatically trim old tool results to keep agent context within token limits.
As agents run long tasks, tool results accumulate in the conversation history. Large tool outputs — file reads, API responses, search results — can consume most of the context window, leaving little room for new reasoning.
Context pruning trims these old tool results in-memory before each LLM request, without touching the persisted session history. It uses a two-pass strategy:
- Soft trim — truncate oversized tool results to head + tail, dropping the middle.
- Hard clear — if the context is still too full, replace entire tool results with a short placeholder.
Context pruning is distinct from session compaction. Compaction permanently summarizes and truncates conversation history. Pruning is non-destructive: the original tool results remain in the session store and are never modified — only the message slice sent to the LLM is trimmed.
Pruning is opt-in — it only runs when `mode: "cache-ttl"` is set on the agent. The flow:

```
history → limitHistoryTurns → sanitizeHistory → LLM
```

Note: `pruneContextMessages(PruneStage)` is not part of the main pipeline above. It runs opt-in and separately, only when `mode: "cache-ttl"` is set. The diagram above reflects the standard history preparation path.
Before each LLM call, GoClaw:
- Counts tokens in all messages using the tiktoken BPE tokenizer (falls back to a `chars / 4` heuristic when tiktoken is unavailable).
- Calculates the ratio `totalTokens / contextWindowTokens`.
- If the ratio is below `softTrimRatio`, the context is small enough and no pruning is needed.
- Pass 0 (per-result guard): any single tool result exceeding 30% of the context window is force-trimmed before the main passes begin.
- If the ratio meets or exceeds `softTrimRatio`, soft trim eligible tool results (Pass 1).
- If the ratio still meets or exceeds `hardClearRatio` after soft trim, and prunable chars exceed `minPrunableToolChars`, hard clear the remaining tool results (Pass 2).
Protected messages: The last `keepLastAssistants` assistant messages and all tool results after them are never pruned. Messages before the first user message are also protected.
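The pass-selection logic above can be sketched roughly as follows. All names (`PruneConfig`, `decidePasses`) are illustrative, not GoClaw's actual API, and the sketch collapses the post-soft-trim ratio recalculation into a single decision:

```go
package main

import "fmt"

// PruneConfig mirrors the threshold fields described in this page.
type PruneConfig struct {
	SoftTrimRatio        float64
	HardClearRatio       float64
	MinPrunableToolChars int
}

// decidePasses returns which pruning passes would run for a given
// token ratio and prunable character count. Hypothetical helper,
// shown only to make the threshold logic concrete.
func decidePasses(ratio float64, prunableChars int, cfg PruneConfig) []string {
	var passes []string
	if ratio < cfg.SoftTrimRatio {
		return passes // context is small enough; no pruning
	}
	passes = append(passes, "soft-trim") // Pass 1
	if ratio >= cfg.HardClearRatio && prunableChars > cfg.MinPrunableToolChars {
		passes = append(passes, "hard-clear") // Pass 2
	}
	return passes
}

func main() {
	cfg := PruneConfig{SoftTrimRatio: 0.25, HardClearRatio: 0.5, MinPrunableToolChars: 50000}
	fmt.Println(decidePasses(0.1, 0, cfg))     // []
	fmt.Println(decidePasses(0.3, 80000, cfg)) // [soft-trim]
	fmt.Println(decidePasses(0.6, 80000, cfg)) // [soft-trim hard-clear]
}
```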
Soft trim keeps the beginning and end of a long tool result, dropping the middle.
A tool result is eligible for soft trim when its character count exceeds softTrim.maxChars.
The trimmed result looks like:
```
<first 3000 chars of tool output>
...
<last 3000 chars of tool output>
[Tool result trimmed: kept first 3000 chars and last 3000 chars of 38400 chars.]
```
Media tool protection: Results from read_image, read_document, read_audio, and read_video receive a higher soft trim budget (headChars=4000, tailChars=4000) because their content is an irreplaceable description generated by a dedicated vision/audio provider. Re-generating it would require another LLM call. Media tool results are also exempt from hard clear — they are never replaced with the placeholder.
The agent retains enough context to understand what the tool returned without consuming the full output.
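A minimal sketch of the head-plus-tail truncation, assuming byte-based slicing (adequate for ASCII output; a real implementation would respect rune boundaries). The function name is hypothetical:

```go
package main

import "fmt"

// softTrim keeps the head and tail of a long tool result and drops
// the middle, appending a note like the one shown above. Sketch only;
// GoClaw's real implementation may differ in details.
func softTrim(s string, maxChars, head, tail int) string {
	if len(s) <= maxChars {
		return s // not eligible: under the soft trim threshold
	}
	return fmt.Sprintf("%s\n...\n%s\n[Tool result trimmed: kept first %d chars and last %d chars of %d chars.]",
		s[:head], s[len(s)-tail:], head, tail, len(s))
}

func main() {
	long := make([]byte, 38400)
	for i := range long {
		long[i] = 'x'
	}
	out := softTrim(string(long), 6000, 3000, 3000)
	fmt.Println(len(out) < len(long)) // true: result is far shorter than the input
}
```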
Hard clear replaces the entire content of old tool results with a short placeholder string. It runs as a second pass only if the context ratio is still too high after soft trim.
Hard clear processes prunable tool results one by one, recalculating the ratio after each replacement, and stops as soon as the ratio drops below hardClearRatio.
A hard-cleared tool result becomes:
```
[Old tool result content cleared]
```
This placeholder is configurable. Hard clear can also be disabled entirely.
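The stop-early loop could look roughly like this, using a crude `chars / 4` token estimate in place of the real tokenizer; all names are illustrative:

```go
package main

import "fmt"

// hardClear replaces prunable tool results with a placeholder one by
// one (oldest first), recomputing the context ratio after each
// replacement and stopping once it drops below hardClearRatio.
func hardClear(results []string, otherTokens, window int, hardClearRatio float64, placeholder string) []string {
	estimate := func() int {
		t := otherTokens
		for _, r := range results {
			t += len(r) / 4 // crude stand-in for real token counting
		}
		return t
	}
	for i := range results {
		if float64(estimate())/float64(window) < hardClearRatio {
			break // ratio is low enough; stop clearing
		}
		results[i] = placeholder
	}
	return results
}

func main() {
	big := string(make([]byte, 40000))
	out := hardClear([]string{big, big, "small"}, 10000, 32000, 0.5, "[Old tool result content cleared]")
	// The two oldest results are cleared; the loop stops before the third.
	fmt.Println(out[2]) // small
}
```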
Context pruning is opt-in. To enable it, set mode: "cache-ttl" in the agent config.
```json
{
  "contextPruning": {
    "mode": "cache-ttl"
  }
}
```

All other fields have sensible defaults and are optional.
```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "keepLastAssistants": 3,
    "softTrimRatio": 0.25,
    "hardClearRatio": 0.5,
    "minPrunableToolChars": 50000,
    "softTrim": {
      "maxChars": 6000,
      "headChars": 3000,
      "tailChars": 3000
    },
    "hardClear": {
      "enabled": true,
      "placeholder": "[Old tool result content cleared]"
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `mode` | (unset — pruning disabled) | Set to `"cache-ttl"` to enable pruning. Omit or leave empty to keep pruning off. |
| `keepLastAssistants` | `3` | Number of recent assistant turns to protect from pruning. |
| `softTrimRatio` | `0.25` | Trigger soft trim when context fills this fraction of the context window. |
| `hardClearRatio` | `0.5` | Trigger hard clear when context fills this fraction after soft trim. |
| `minPrunableToolChars` | `50000` | Minimum total chars in prunable tool results before hard clear runs. Prevents aggressive clearing on small contexts. |
| `softTrim.maxChars` | `6000` | Tool results longer than this are eligible for soft trim. |
| `softTrim.headChars` | `3000` | Characters to keep from the start of a trimmed tool result. |
| `softTrim.tailChars` | `3000` | Characters to keep from the end of a trimmed tool result. |
| `hardClear.enabled` | `true` | Set to `false` to disable hard clear entirely (soft trim only). |
| `hardClear.placeholder` | `"[Old tool result content cleared]"` | Replacement text for hard-cleared tool results. |
Enable pruning with defaults:

```json
{
  "contextPruning": {
    "mode": "cache-ttl"
  }
}
```

Trigger earlier and keep less context per tool result:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "softTrimRatio": 0.2,
    "hardClearRatio": 0.4,
    "softTrim": {
      "maxChars": 2000,
      "headChars": 800,
      "tailChars": 800
    }
  }
}
```

Disable hard clear entirely, so only soft trim runs:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "hardClear": {
      "enabled": false
    }
  }
}
```

Use a custom placeholder for hard-cleared results:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "hardClear": {
      "placeholder": "[Tool output removed to save context]"
    }
  }
}
```

Context pruning and memory consolidation serve complementary roles: pruning manages live context during a session; consolidation manages long-term recall across sessions.
- Within a session: pruning trims tool results → keeps the LLM context lean.
- On `session.completed`: `episodic_worker` summarizes → L1 episodic memory.
- After ≥5 episodes: `dreaming_worker` promotes → L0 long-term memory.
Key distinction: pruning never touches the persisted session store. Once a session completes, the consolidation pipeline (not pruning) takes over and determines what is worth keeping long-term. This means:
- Pruned tool results are still visible to `episodic_worker` via the session store when it reads messages for summarization.
- Content that was hard-cleared from live context is still summarized into episodic memory on session completion — nothing is permanently lost by pruning.
- For content that has been promoted to episodic or long-term memory by `dreaming_worker`, the auto-injector re-surfaces it as concise L0 abstracts at the start of the next turn. This replaces the need to keep bulky tool results alive in context.
Once the consolidation pipeline has promoted a body of knowledge to L0 (via dreaming) or L1 (via episodic), you can allow pruning to be more aggressive for that agent. The agent will not lose information — it will be re-injected from memory rather than carried forward in raw session history.
- No session data is modified. Pruning only affects the message slice passed to the LLM. The original tool results remain in the session store.
- Recent context is always preserved. The last `keepLastAssistants` assistant turns and their associated tool results are never touched.
- Soft-trimmed results still provide signal. The agent sees the beginning and end of long outputs, which usually contain the most relevant information (headers, summaries, final lines).
- Hard-cleared results may cause repeated tool calls. If an agent can no longer see a tool result, it may re-run the tool to recover the information. This is expected behavior.
- Context window size matters. Pruning thresholds are ratios of the actual model context window. Agents configured with larger context windows will prune less aggressively.
Pruning never triggers
Confirm that `mode` is set to `"cache-ttl"` — pruning is opt-in and disabled by default. Also confirm that `contextWindow` is set on the agent — pruning needs a token count to calculate ratios.
Agent re-runs tools unexpectedly
Hard clear removes tool result content entirely. If the agent needs that content, it will call the tool again. Lower `hardClearRatio` or increase `minPrunableToolChars` to delay hard clear, or disable it with `hardClear.enabled: false`.
Trimmed results cut off important content
Increase `softTrim.headChars` and `softTrim.tailChars`, or raise `softTrim.maxChars` so fewer results are eligible for trimming.
Context still overflows despite pruning being enabled
Pruning only acts on tool results. If long user messages or system prompt components dominate the context, pruning will not help. Consider session compaction or reducing the system prompt size.
GoClaw now uses the tiktoken BPE tokenizer for accurate token counting instead of the legacy chars / 4 heuristic. This matters especially for CJK content (Vietnamese and Chinese characters), where the heuristic significantly underestimates token usage. With tiktoken enabled, all pruning ratios are calculated against actual token counts rather than character estimates.
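The legacy fallback is easy to sketch; whether GoClaw counts bytes or runes is an assumption here. The key weakness is that a single CJK character is often one or more BPE tokens on its own, so dividing the character count by four can undercount several-fold:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens is the legacy chars/4 fallback described above,
// counting runes (an assumption). Shown only to illustrate why it
// misbehaves on CJK text; the real path uses a BPE tokenizer.
func estimateTokens(s string) int {
	return utf8.RuneCountInString(s) / 4
}

func main() {
	// 17 runes / 4 ≈ 4 "tokens", but a BPE tokenizer typically
	// produces many more for Vietnamese text like this.
	fmt.Println(estimateTokens("xin chào thế giới"))
}
```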
Before normal pruning passes begin, any single tool result that exceeds 30% of the context window is force-trimmed. This catches outlier outputs (e.g., a massive file read or API response) even when the overall context ratio is still below softTrimRatio. The trimmed result keeps a 70/30 head/tail split.
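Assuming roughly four characters per token for the character budget (an assumption, not a documented constant), the guard's 30% limit and 70/30 split work out as:

```go
package main

import "fmt"

// guardBudget computes the character budget for the per-result guard:
// any tool result over 30% of the context window is force-trimmed
// with a 70/30 head/tail split. Names and the ~4 chars/token factor
// are illustrative.
func guardBudget(contextWindowTokens int) (limitChars, headChars, tailChars int) {
	limitChars = contextWindowTokens * 4 * 30 / 100 // 30% of window, in chars
	headChars = limitChars * 70 / 100               // keep 70% from the head
	tailChars = limitChars - headChars              // remaining 30% from the tail
	return
}

func main() {
	limit, head, tail := guardBudget(32000)
	fmt.Println(limit, head, tail) // 38400 26880 11520
}
```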
Results from read_image, read_document, read_audio, and read_video are handled specially:
- They receive a higher soft trim budget: headChars=4000, tailChars=4000 (vs. the standard 3000/3000).
- They are exempt from hard clear — media descriptions are generated by dedicated vision/audio providers (Gemini, Anthropic) and cannot be regenerated without another LLM call.
During history compaction, up to the 30 most recent `MediaRefs` are preserved. This ensures the agent can still reference previously shared images and documents after compaction without losing track of media context.
When context is compacted, the summary now preserves key identifiers — agent IDs, task IDs, and session keys — in a structured format. This ensures that agents can continue referencing their active tasks and sessions after compaction without losing critical tracking context.
Tool output is now capped at the source before being added to context. Rather than waiting for the pruning pipeline to trim oversized results after the fact, GoClaw limits tool output size at ingestion time. This reduces unnecessary memory pressure and makes the pruning pipeline more predictable.
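A sketch of ingestion-time capping; the helper name and cap message are hypothetical:

```go
package main

import "fmt"

// capToolOutput bounds tool output before it is appended to history,
// so the pruning pipeline only ever sees bounded inputs. Hypothetical
// helper; GoClaw's actual cap and message format may differ.
func capToolOutput(out string, maxChars int) string {
	if len(out) <= maxChars {
		return out
	}
	return out[:maxChars] + fmt.Sprintf("\n[output capped at %d of %d chars]", maxChars, len(out))
}

func main() {
	big := make([]byte, 100000)
	for i := range big {
		big[i] = 'y'
	}
	capped := capToolOutput(string(big), 20000)
	fmt.Println(len(capped) < len(big)) // true: bounded regardless of source size
}
```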
- Sessions & History — session compaction, history limits
- Memory System — 3-tier memory architecture and consolidation pipeline
- Configuration Reference — full agent config reference