Automatically trim old tool results to keep agent context within token limits.
As agents run long tasks, tool results accumulate in the conversation history. Large tool outputs — file reads, API responses, search results — can consume most of the context window, leaving little room for new reasoning.
Context pruning trims these old tool results in-memory before each LLM request, without touching the persisted session history. It uses a two-pass strategy:
- Soft trim — truncate oversized tool results to head + tail, dropping the middle.
- Hard clear — if the context is still too full, replace entire tool results with a short placeholder.
Context pruning is distinct from session compaction. Compaction permanently summarizes and truncates conversation history. Pruning is non-destructive: the original tool results remain in the session store and are never modified — only the message slice sent to the LLM is trimmed.
Pruning is opt-in — it only runs when `mode: "cache-ttl"` is set on the agent. The flow:

```
history → limitHistoryTurns → sanitizeHistory → LLM
```

Note: `pruneContextMessages(PruneStage)` is not part of the main pipeline above. It runs opt-in and separately, only when `mode: "cache-ttl"` is set. The diagram above reflects the standard history preparation path.
Before each LLM call, GoClaw:
- Counts tokens in all messages using the tiktoken BPE tokenizer (falls back to a `chars / 4` heuristic when tiktoken is unavailable).
- Calculates the ratio `totalTokens / contextWindowTokens`.
- If the ratio is below `softTrimRatio`, the context is small enough and no pruning is needed.
- Pass 0 (per-result guard): any single tool result exceeding 30% of the context window is force-trimmed before the main passes begin.
- If the ratio meets or exceeds `softTrimRatio`, soft trim eligible tool results (Pass 1).
- If the ratio still meets or exceeds `hardClearRatio` after soft trim, and prunable chars exceed `minPrunableToolChars`, hard clear the remaining tool results (Pass 2).
Protected messages: The last `keepLastAssistants` assistant messages and all tool results after them are never pruned. Messages before the first user message are also protected.
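The pass-selection logic above can be sketched roughly as follows. All names (`PruneConfig`, `decidePasses`) are illustrative, not GoClaw's actual API, and the sketch collapses the post-soft-trim ratio recalculation into a single decision:

```go
package main

import "fmt"

// PruneConfig mirrors the threshold fields described in this page.
type PruneConfig struct {
	SoftTrimRatio        float64
	HardClearRatio       float64
	MinPrunableToolChars int
}

// decidePasses returns which pruning passes would run for a given
// token ratio and prunable character count. Hypothetical helper,
// shown only to make the threshold logic concrete.
func decidePasses(ratio float64, prunableChars int, cfg PruneConfig) []string {
	var passes []string
	if ratio < cfg.SoftTrimRatio {
		return passes // context is small enough; no pruning
	}
	passes = append(passes, "soft-trim") // Pass 1
	if ratio >= cfg.HardClearRatio && prunableChars > cfg.MinPrunableToolChars {
		passes = append(passes, "hard-clear") // Pass 2
	}
	return passes
}

func main() {
	cfg := PruneConfig{SoftTrimRatio: 0.25, HardClearRatio: 0.5, MinPrunableToolChars: 50000}
	fmt.Println(decidePasses(0.1, 0, cfg))     // []
	fmt.Println(decidePasses(0.3, 80000, cfg)) // [soft-trim]
	fmt.Println(decidePasses(0.6, 80000, cfg)) // [soft-trim hard-clear]
}
```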
Soft trim keeps the beginning and end of a long tool result, dropping the middle.
A tool result is eligible for soft trim when its character count exceeds softTrim.maxChars.
The trimmed result looks like:
```
<first 3000 chars of tool output>
...
<last 3000 chars of tool output>
[Tool result trimmed: kept first 3000 chars and last 3000 chars of 38400 chars.]
```
Media tool protection: Results from read_image, read_document, read_audio, and read_video receive a higher soft trim budget (headChars=4000, tailChars=4000) because their content is an irreplaceable description generated by a dedicated vision/audio provider. Re-generating it would require another LLM call. Media tool results are also exempt from hard clear — they are never replaced with the placeholder.
The agent retains enough context to understand what the tool returned without consuming the full output.
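A minimal sketch of the head-plus-tail truncation, assuming byte-based slicing (adequate for ASCII output; a real implementation would respect rune boundaries). The function name is hypothetical:

```go
package main

import "fmt"

// softTrim keeps the head and tail of a long tool result and drops
// the middle, appending a note like the one shown above. Sketch only;
// GoClaw's real implementation may differ in details.
func softTrim(s string, maxChars, head, tail int) string {
	if len(s) <= maxChars {
		return s // not eligible: under the soft trim threshold
	}
	return fmt.Sprintf("%s\n...\n%s\n[Tool result trimmed: kept first %d chars and last %d chars of %d chars.]",
		s[:head], s[len(s)-tail:], head, tail, len(s))
}

func main() {
	long := make([]byte, 38400)
	for i := range long {
		long[i] = 'x'
	}
	out := softTrim(string(long), 6000, 3000, 3000)
	fmt.Println(len(out) < len(long)) // true: result is far shorter than the input
}
```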
Hard clear replaces the entire content of old tool results with a short placeholder string. It runs as a second pass only if the context ratio is still too high after soft trim.
Hard clear processes prunable tool results one by one, recalculating the ratio after each replacement, and stops as soon as the ratio drops below hardClearRatio.
A hard-cleared tool result becomes:
```
[Old tool result content cleared]
```
This placeholder is configurable. Hard clear can also be disabled entirely.
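The stop-early loop could look roughly like this, using a crude `chars / 4` token estimate in place of the real tokenizer; all names are illustrative:

```go
package main

import "fmt"

// hardClear replaces prunable tool results with a placeholder one by
// one (oldest first), recomputing the context ratio after each
// replacement and stopping once it drops below hardClearRatio.
func hardClear(results []string, otherTokens, window int, hardClearRatio float64, placeholder string) []string {
	estimate := func() int {
		t := otherTokens
		for _, r := range results {
			t += len(r) / 4 // crude stand-in for real token counting
		}
		return t
	}
	for i := range results {
		if float64(estimate())/float64(window) < hardClearRatio {
			break // ratio is low enough; stop clearing
		}
		results[i] = placeholder
	}
	return results
}

func main() {
	big := string(make([]byte, 40000))
	out := hardClear([]string{big, big, "small"}, 10000, 32000, 0.5, "[Old tool result content cleared]")
	// The two oldest results are cleared; the loop stops before the third.
	fmt.Println(out[2]) // small
}
```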
Context pruning is opt-in. To enable it, set mode: "cache-ttl" in the agent config.
```json
{
  "contextPruning": {
    "mode": "cache-ttl"
  }
}
```

All other fields have sensible defaults and are optional.
```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "keepLastAssistants": 3,
    "softTrimRatio": 0.25,
    "hardClearRatio": 0.5,
    "minPrunableToolChars": 50000,
    "softTrim": {
      "maxChars": 6000,
      "headChars": 3000,
      "tailChars": 3000
    },
    "hardClear": {
      "enabled": true,
      "placeholder": "[Old tool result content cleared]"
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `mode` | (unset — pruning disabled) | Set to `"cache-ttl"` to enable pruning. Omit or leave empty to keep pruning off. |
| `keepLastAssistants` | `3` | Number of recent assistant turns to protect from pruning. |
| `softTrimRatio` | `0.25` | Trigger soft trim when context fills this fraction of the context window. |
| `hardClearRatio` | `0.5` | Trigger hard clear when context fills this fraction after soft trim. |
| `minPrunableToolChars` | `50000` | Minimum total chars in prunable tool results before hard clear runs. Prevents aggressive clearing on small contexts. |
| `softTrim.maxChars` | `6000` | Tool results longer than this are eligible for soft trim. |
| `softTrim.headChars` | `3000` | Characters to keep from the start of a trimmed tool result. |
| `softTrim.tailChars` | `3000` | Characters to keep from the end of a trimmed tool result. |
| `hardClear.enabled` | `true` | Set to `false` to disable hard clear entirely (soft trim only). |
| `hardClear.placeholder` | `"[Old tool result content cleared]"` | Replacement text for hard-cleared tool results. |
Enable pruning with defaults:

```json
{
  "contextPruning": {
    "mode": "cache-ttl"
  }
}
```

Trigger earlier and keep less context per tool result:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "softTrimRatio": 0.2,
    "hardClearRatio": 0.4,
    "softTrim": {
      "maxChars": 2000,
      "headChars": 800,
      "tailChars": 800
    }
  }
}
```

Disable hard clear entirely, so only soft trim runs:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "hardClear": {
      "enabled": false
    }
  }
}
```

Use a custom placeholder for hard-cleared results:

```json
{
  "contextPruning": {
    "mode": "cache-ttl",
    "hardClear": {
      "placeholder": "[Tool output removed to save context]"
    }
  }
}
```

Context pruning and memory consolidation serve complementary roles: pruning manages live context during a session; consolidation manages long-term recall across sessions.
- Within a session: pruning trims tool results → keeps the LLM context lean.
- On `session.completed`: `episodic_worker` summarizes → L1 episodic memory.
- After ≥5 episodes: `dreaming_worker` promotes → L0 long-term memory.
Key distinction: pruning never touches the persisted session store. Once a session completes, the consolidation pipeline (not pruning) takes over and determines what is worth keeping long-term. This means:
- Pruned tool results are still visible to `episodic_worker` via the session store when it reads messages for summarization.
- Content that was hard-cleared from live context is still summarized into episodic memory on session completion — nothing is permanently lost by pruning.
- For content that has been promoted to episodic or long-term memory by `dreaming_worker`, the auto-injector re-surfaces it as concise L0 abstracts at the start of the next turn. This replaces the need to keep bulky tool results alive in context.
Once the consolidation pipeline has promoted a body of knowledge to L0 (via dreaming) or L1 (via episodic), you can allow pruning to be more aggressive for that agent. The agent will not lose information — it will be re-injected from memory rather than carried forward in raw session history.
- No session data is modified. Pruning only affects the message slice passed to the LLM. The original tool results remain in the session store.
- Recent context is always preserved. The last `keepLastAssistants` assistant turns and their associated tool results are never touched.
- Soft-trimmed results still provide signal. The agent sees the beginning and end of long outputs, which usually contain the most relevant information (headers, summaries, final lines).
- Hard-cleared results may cause repeated tool calls. If an agent can no longer see a tool result, it may re-run the tool to recover the information. This is expected behavior.
- Context window size matters. Pruning thresholds are ratios of the actual model context window. Agents configured with larger context windows will prune less aggressively.
Pruning never triggers
Confirm that `mode` is set to `"cache-ttl"` — pruning is opt-in and disabled by default. Also confirm that `contextWindow` is set on the agent — pruning needs a token count to calculate ratios.
Agent re-runs tools unexpectedly
Hard clear removes tool result content entirely. If the agent needs that content, it will call the tool again. Lower `hardClearRatio` or increase `minPrunableToolChars` to delay hard clear, or disable it with `hardClear.enabled: false`.
Trimmed results cut off important content
Increase `softTrim.headChars` and `softTrim.tailChars`, or raise `softTrim.maxChars` so fewer results are eligible for trimming.
Context still overflows despite pruning being enabled
Pruning only acts on tool results. If long user messages or system prompt components dominate the context, pruning will not help. Consider session compaction or reducing the system prompt size.
GoClaw now uses the tiktoken BPE tokenizer for accurate token counting instead of the legacy chars / 4 heuristic. This matters especially for CJK content (Vietnamese and Chinese characters), where the heuristic significantly underestimates token usage. With tiktoken enabled, all pruning ratios are calculated against actual token counts rather than character estimates.
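The legacy fallback is easy to sketch; whether GoClaw counts bytes or runes is an assumption here. The key weakness is that a single CJK character is often one or more BPE tokens on its own, so dividing the character count by four can undercount several-fold:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens is the legacy chars/4 fallback described above,
// counting runes (an assumption). Shown only to illustrate why it
// misbehaves on CJK text; the real path uses a BPE tokenizer.
func estimateTokens(s string) int {
	return utf8.RuneCountInString(s) / 4
}

func main() {
	// 17 runes / 4 ≈ 4 "tokens", but a BPE tokenizer typically
	// produces many more for Vietnamese text like this.
	fmt.Println(estimateTokens("xin chào thế giới"))
}
```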
Before normal pruning passes begin, any single tool result that exceeds 30% of the context window is force-trimmed. This catches outlier outputs (e.g., a massive file read or API response) even when the overall context ratio is still below softTrimRatio. The trimmed result keeps a 70/30 head/tail split.
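Assuming roughly four characters per token for the character budget (an assumption, not a documented constant), the guard's 30% limit and 70/30 split work out as:

```go
package main

import "fmt"

// guardBudget computes the character budget for the per-result guard:
// any tool result over 30% of the context window is force-trimmed
// with a 70/30 head/tail split. Names and the ~4 chars/token factor
// are illustrative.
func guardBudget(contextWindowTokens int) (limitChars, headChars, tailChars int) {
	limitChars = contextWindowTokens * 4 * 30 / 100 // 30% of window, in chars
	headChars = limitChars * 70 / 100               // keep 70% from the head
	tailChars = limitChars - headChars              // remaining 30% from the tail
	return
}

func main() {
	limit, head, tail := guardBudget(32000)
	fmt.Println(limit, head, tail) // 38400 26880 11520
}
```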
Results from read_image, read_document, read_audio, and read_video are handled specially:
- They receive a higher soft trim budget: headChars=4000, tailChars=4000 (vs. the standard 3000/3000).
- They are exempt from hard clear — media descriptions are generated by dedicated vision/audio providers (Gemini, Anthropic) and cannot be regenerated without another LLM call.
During history compaction, up to the 30 most recent `MediaRefs` are preserved. This ensures the agent can still reference previously shared images and documents after compaction without losing track of media context.
When context is compacted, the summary now preserves key identifiers — agent IDs, task IDs, and session keys — in a structured format. This ensures that agents can continue referencing their active tasks and sessions after compaction without losing critical tracking context.
Tool output is now capped at the source before being added to context. Rather than waiting for the pruning pipeline to trim oversized results after the fact, GoClaw limits tool output size at ingestion time. This reduces unnecessary memory pressure and makes the pruning pipeline more predictable.
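A sketch of ingestion-time capping; the helper name and cap message are hypothetical:

```go
package main

import "fmt"

// capToolOutput bounds tool output before it is appended to history,
// so the pruning pipeline only ever sees bounded inputs. Hypothetical
// helper; GoClaw's actual cap and message format may differ.
func capToolOutput(out string, maxChars int) string {
	if len(out) <= maxChars {
		return out
	}
	return out[:maxChars] + fmt.Sprintf("\n[output capped at %d of %d chars]", maxChars, len(out))
}

func main() {
	big := make([]byte, 100000)
	for i := range big {
		big[i] = 'y'
	}
	capped := capToolOutput(string(big), 20000)
	fmt.Println(len(capped) < len(big)) // true: bounded regardless of source size
}
```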
- Sessions & History — session compaction, history limits
- Memory System — 3-tier memory architecture and consolidation pipeline
- Configuration Reference — full agent config reference