Conversation


@csmith49 csmith49 commented Dec 16, 2025

Certain LLM APIs place restrictions on how messages must be structured, especially with respect to tool calls. This isn't normally a problem, but the condensers can sometimes violate these properties in ways that are difficult to recover from.

This PR makes the primary condenser, the LLMSummarizingCondenser, aware of these restrictions. To do so, we introduce the concept of manipulation indices: positions where the conversation history can be changed without violating the API's restrictions. The condenser ensures that summaries are only inserted at these indices, and that events are forgotten only in spans running from one manipulation index to another.
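
To make this concrete, here is a minimal sketch using a simplified, hypothetical event model (not the SDK's actual types): a manipulation index is any position that does not split a tool call from its observations.

from dataclasses import dataclass

@dataclass
class Event:
    # Simplified stand-in for an SDK event: tool_call_id ties an action
    # to its observation(s); None marks a plain message event.
    tool_call_id: str | None = None

def manipulation_indices(events: list[Event]) -> list[int]:
    """Return the positions where the history can be cut without
    separating a tool call from its matching observation(s)."""
    indices = [0]
    i = 0
    while i < len(events):
        j = i + 1
        if events[i].tool_call_id is not None:
            # Consume the whole atomic batch sharing this tool_call_id.
            while j < len(events) and events[j].tool_call_id == events[i].tool_call_id:
                j += 1
        indices.append(j)
        i = j
    return indices

For events = [Event(), Event("a"), Event("a"), Event()] this yields [0, 1, 3, 4]: position 2, which falls between the call and its observation, is not a legal cut point.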

Design Choices

We could make the View object itself guarantee that these properties are never violated. Unfortunately, that would let a condenser produce a Condensation that violates a property, leaving the View responsible for deciding how to fix it. You could then no longer tell what a condensation does purely by reading the event; you would also need to know how the views are processed.

So instead the View informs the condensers of where changes can be made. This keeps the condensation events literal, and it means we can enforce more or fewer constraints on the conversation history simply by modifying the code that generates the manipulation indices.

The API restrictions are currently codified in a few functions in the View class: _enforce_batch_atomicity and filter_unmatched_tool_calls. These behave exactly as before, but have been extended to emit warnings if their property is violated by a condenser. We can remove them and simplify the View at a later date.
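
The warn-on-repair pattern is simple; here is a minimal sketch of it (reusing the Event stand-in above, not the actual View code):

import logging
from collections import Counter

logger = logging.getLogger("view")

def filter_unmatched_tool_calls(events: list[Event]) -> list[Event]:
    """Drop events whose tool_call_id has no partner (a call missing its
    observation, or vice versa), and warn: a well-behaved condenser
    should never hand the View such a history."""
    counts = Counter(e.tool_call_id for e in events if e.tool_call_id)
    kept = [e for e in events if e.tool_call_id is None or counts[e.tool_call_id] > 1]
    if len(kept) != len(events):
        logger.warning(
            "condenser violated tool-call matching; dropped %d event(s)",
            len(events) - len(kept),
        )
    return kept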

Tradeoffs

This does mean there is some slack in the condenser's solutions. The condenser determines forgetting ranges based on the resource usage (events, tokens) of individual events, and these computed ranges are then "projected" into the manipulation index space. As a result, some condensations may not match their intuitive semantics exactly, but we gain the guarantee that the resulting conversation history is well-formed.
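
Concretely, the projection can be as simple as snapping each endpoint of a raw forgetting range outward to the nearest legal cut point (again an illustrative sketch, not the SDK's code):

import bisect

def project_range(start: int, end: int, indices: list[int]) -> tuple[int, int]:
    """Widen the raw range [start, end) to the enclosing manipulation
    indices, so the forgotten span never splits an atomic batch."""
    lo = indices[bisect.bisect_right(indices, start) - 1]  # largest index <= start
    hi = indices[bisect.bisect_left(indices, end)]         # smallest index >= end
    return lo, hi

Snapping outward forgets slightly more than the raw range asked for; snapping inward would forget less. Either policy is defensible, and that choice is exactly the slack described above.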

Summary of Changes

  • A View.get_manipulation_indices function to tell condensers where changes can be made (see the usage sketch after this list).
  • Modifications to View methods to emit a warning when they "fix" the conversation history.
  • Modifications to the LLMSummarizingCondenser to use the manipulation indices. No other current condenser needs to be updated.
  • Unit tests for all of the above.
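
Assuming behavior like the sketches above, a condenser's use of the new method is roughly the following (hypothetical glue code building on project_range above, not the PR's exact implementation):

def condense_range(view, raw_start: int, raw_end: int) -> tuple[int, int]:
    # Snap the desired forgetting range onto the View's legal cut points
    # before emitting a Condensation event.
    indices = view.get_manipulation_indices()
    return project_range(raw_start, raw_end, indices)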

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:24c1423-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-24c1423-python \
  ghcr.io/openhands/agent-server:24c1423-python

All tags pushed for this build

ghcr.io/openhands/agent-server:24c1423-golang-amd64
ghcr.io/openhands/agent-server:24c1423-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:24c1423-golang-arm64
ghcr.io/openhands/agent-server:24c1423-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:24c1423-java-amd64
ghcr.io/openhands/agent-server:24c1423-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:24c1423-java-arm64
ghcr.io/openhands/agent-server:24c1423-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:24c1423-python-amd64
ghcr.io/openhands/agent-server:24c1423-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:24c1423-python-arm64
ghcr.io/openhands/agent-server:24c1423-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:24c1423-golang
ghcr.io/openhands/agent-server:24c1423-java
ghcr.io/openhands/agent-server:24c1423-python

About Multi-Architecture Support

  • Each variant tag (e.g., 24c1423-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 24c1423-python-amd64) are also available if needed


github-actions bot commented Dec 16, 2025

Coverage

Coverage Report

File                                                                          Stmts   Miss   Cover
openhands-sdk/openhands/sdk/context/view.py                                     194     77     60%
  Missing: 87, 92, 97–98, 103–104, 109–113, 137–138, 141–147, 150–152, 156, 160–163, 166–168, 172–174, 176, 180–182, 185, 188, 190–191, 193, 209–213, 215, 247–248, 279, 290–291, 299, 302, 358–361, 363–365, 376–377, 379, 381, 403–406, 409, 411–412, 419, 421–422
openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py      90     55     38%
  Missing: 47, 54, 68, 72–73, 76–79, 82–83, 85, 88–89, 97, 99–103, 105, 125, 127, 134, 138, 142–146, 148, 171–172, 174, 176–178, 180–182, 184, 188–189, 191–192, 194, 204, 207, 210, 215, 218, 221, 229–230, 234
TOTAL                                                                         13578   6213     54%

Calvin Smith added 2 commits December 18, 2025 10:30
  • blocks are preserved
Base automatically changed from csmith49/token-aware-condensation to main December 23, 2025 21:34

openhands-ai bot commented Dec 23, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1412 at branch `csmith49/tool-call-aware-condensation`

Feel free to include any additional details that might help me get this PR into a better state.

When thinking is enabled, the Claude API requires that the final assistant
message starts with a thinking block. After condensation, some ActionEvents
may have thinking blocks while others don't, causing API rejection.

This fix:
1. Extends manipulation_indices to track thinking blocks in batches and
   merge all batches from the last thinking batch to the end as a single
   atomic unit, preventing partial removal that would leave inconsistent
   thinking block state.

2. Adds _enforce_thinking_block_consistency method to ensure that when a
   batch with thinking blocks is removed, all subsequent batches without
   thinking blocks are also removed.

3. Updates existing tests to include thinking_blocks attribute on mock
   ActionEvent objects.

4. Adds comprehensive tests for thinking block consistency scenarios.

Fixes #1438

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 added the integration-test Runs the integration tests and comments the results label Dec 23, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Instead of merging all batches from the last thinking batch to the end,
we now simply remove cut points after batches without thinking blocks.
This ensures any valid cut leaves a final batch with thinking blocks,
while giving the condenser more manipulation points.

The key insight: cut points are only allowed after batches WITH thinking.
This way, the final batch after any cut will always have thinking blocks.

Co-authored-by: openhands <openhands@all-hands.dev>
The previous implementation only removed cut points immediately after
non-thinking batches. But if non-batch events (like Condensation or
ConversationErrorEvent) follow a non-thinking batch, those cut points
were incorrectly kept.

The fix uses a whitelist approach: only allow cut points that are either
before the first batch (no batches kept) or immediately after a batch
WITH thinking blocks.

For the trajectory in issue #1438 (188 events, 90 batches, 3 with thinking):
- Before: 12 manipulation indices (including invalid ones like 187, 188)
- After: 6 manipulation indices (all valid: 0, 1, 2, 4, 61, 126)

Co-authored-by: openhands <openhands@all-hands.dev>
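
The whitelist rule from these two commits fits in a few lines (an illustrative sketch; each batch is summarized by a boolean for "carries thinking blocks"):

def thinking_safe_cuts(batch_bounds: list[int], has_thinking: list[bool]) -> list[int]:
    """batch_bounds[k] is the event index where batch k starts, with a
    trailing sentinel for the end of history; has_thinking[k] says
    whether batch k carries thinking blocks."""
    cuts = [batch_bounds[0]]  # before the first batch: no batches kept
    for k, thinking in enumerate(has_thinking):
        if thinking:
            # Only immediately after a batch WITH thinking blocks: the
            # last batch kept by such a cut starts with a thinking block,
            # as the Claude API requires.
            cuts.append(batch_bounds[k + 1])
    return cuts
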
@github-actions

🧪 Integration Tests Results

Overall Success Rate: 94.1%
Total Cost: $2.26
Models Tested: 6
Timestamp: 2025-12-23 23:18:06 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model                                         Overall  Integration (Required)  Behavior (Optional)  Tests Passed  Skipped  Total  Cost   Tokens
litellm_proxy_vertex_ai_gemini_3_pro_preview  100.0%   100.0%                  N/A                  9/9           0        9      $0.53  448,075
litellm_proxy_gpt_5.1_codex_max               77.8%    77.8%                   N/A                  7/9           0        9      $0.16  216,463
litellm_proxy_claude_sonnet_4_5_20250929      100.0%   100.0%                  N/A                  9/9           0        9      $0.53  260,798
litellm_proxy_mistral_devstral_2512           87.5%    87.5%                   N/A                  7/8           1        9      $0.16  393,687
litellm_proxy_deepseek_deepseek_chat          100.0%   100.0%                  N/A                  8/8           1        9      $0.04  406,096
litellm_proxy_moonshot_kimi_k2_thinking       100.0%   100.0%                  N/A                  8/8           1        9      $0.84  1,361,148

📋 Detailed Results

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.53
  • Token Usage: prompt: 432,840, completion: 15,235, cache_read: 290,173, reasoning: 10,602
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_e624161_gemini_3_pro_run_N9_20251223_230626

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 77.8% (7/9)
  • Integration Tests (Required): 77.8% (7/9)
  • Total Cost: $0.16
  • Token Usage: prompt: 212,710, completion: 3,753, cache_read: 130,176, reasoning: 1,920
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_e624161_gpt51_codex_run_N9_20251223_230625

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Condensation not triggered. Token counting may not work. (Cost: $0.0014)
  • t08_image_file_viewing ⚠️ REQUIRED: Agent did not identify yellow color in the logo. Response: i’m sorry—i don’t actually see the image contents. could you re-upload the logo.png here so i can check its colors? (Cost: $0.0071)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.53
  • Token Usage: prompt: 247,414, completion: 13,384, cache_read: 172,954, cache_write: 73,438, reasoning: 2,282
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_e624161_sonnet_run_N9_20251223_230626

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.16
  • Token Usage: prompt: 389,856, completion: 3,831
  • Run Suffix: litellm_proxy_mistral_devstral_2512_e624161_devstral_2512_run_N9_20251223_230617
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0084)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.04
  • Token Usage: prompt: 395,193, completion: 10,903, cache_read: 370,560
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_e624161_deepseek_run_N9_20251223_230629
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.84
  • Token Usage: prompt: 1,350,037, completion: 11,111, cache_read: 1,246,613
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_e624161_kimi_k2_run_N9_20251223_230629
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@csmith49 csmith49 added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Dec 23, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 96.1%
Total Cost: $1.65
Models Tested: 6
Timestamp: 2025-12-23 23:32:07 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model                                         Overall  Integration (Required)  Behavior (Optional)  Tests Passed  Skipped  Total  Cost   Tokens
litellm_proxy_gpt_5.1_codex_max               88.9%    88.9%                   N/A                  8/9           0        9      $0.42  472,752
litellm_proxy_claude_sonnet_4_5_20250929      100.0%   100.0%                  N/A                  9/9           0        9      $0.46  275,856
litellm_proxy_deepseek_deepseek_chat          100.0%   100.0%                  N/A                  8/8           1        9      $0.04  395,014
litellm_proxy_mistral_devstral_2512           87.5%    87.5%                   N/A                  7/8           1        9      $0.15  363,650
litellm_proxy_moonshot_kimi_k2_thinking       100.0%   100.0%                  N/A                  8/8           1        9      $0.17  256,547
litellm_proxy_vertex_ai_gemini_3_pro_preview  100.0%   100.0%                  N/A                  9/9           0        9      $0.40  258,147

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/9)
  • Total Cost: $0.42
  • Token Usage: prompt: 452,881, completion: 19,871, cache_read: 307,840, reasoning: 17,344
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_e43a697_gpt51_codex_run_N9_20251223_232521

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Condensation not triggered. Token counting may not work. (Cost: $0.22)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.46
  • Token Usage: prompt: 267,212, completion: 8,644, cache_read: 194,003, cache_write: 72,725, reasoning: 2,502
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_e43a697_sonnet_run_N9_20251223_232520

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.04
  • Token Usage: prompt: 386,606, completion: 8,408, cache_read: 352,320
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_e43a697_deepseek_run_N9_20251223_232533
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.15
  • Token Usage: prompt: 359,891, completion: 3,759
  • Run Suffix: litellm_proxy_mistral_devstral_2512_e43a697_devstral_2512_run_N9_20251223_232519
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0085)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.17
  • Token Usage: prompt: 247,309, completion: 9,238, cache_read: 200,341
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_e43a697_kimi_k2_run_N9_20251223_232524
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.40
  • Token Usage: prompt: 245,020, completion: 13,127, cache_read: 134,900, reasoning: 9,186
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_e43a697_gemini_3_pro_run_N9_20251223_232521

@csmith49 csmith49 merged commit 014c6d4 into main Dec 23, 2025
35 checks passed
@csmith49 csmith49 deleted the csmith49/tool-call-aware-condensation branch December 23, 2025 23:44