fix(condenser): Tool-call aware condensation #1412
Conversation
The test file was importing from 'resolve_model_configs' (plural), but the actual file is 'resolve_model_config.py' (singular). Also updated the test functions to match the actual function signature, which takes only model_ids and uses the global MODELS dictionary. Co-authored-by: openhands <openhands@all-hands.dev>
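For illustration only, a minimal sketch of what the corrected import and call shape might look like. The function name, the test body, and the contents of MODELS are assumptions; only the singular module name and the "model_ids plus global MODELS" signature come from the commit message above.

```python
# Hedged sketch of the corrected test wiring (names are illustrative).
# Before: imported from resolve_model_configs (module does not exist)
# After:  import from the singular module that actually ships.
from resolve_model_config import MODELS, resolve_model_config


def test_resolve_model_config_reads_global_models():
    # The resolver takes only model_ids; everything else comes from the
    # module-level MODELS dictionary.
    model_ids = list(MODELS.keys())[:1]
    result = resolve_model_config(model_ids)
    assert result is not None
```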
This reverts commit 827507b.
Looks like there are a few issues preventing this PR from being merged! If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
When thinking is enabled, the Claude API requires that the final assistant message starts with a thinking block. After condensation, some ActionEvents may have thinking blocks while others don't, causing API rejection. This fix:

1. Extends manipulation_indices to track thinking blocks in batches and merge all batches from the last thinking batch to the end as a single atomic unit, preventing partial removal that would leave inconsistent thinking block state.
2. Adds an _enforce_thinking_block_consistency method to ensure that when a batch with thinking blocks is removed, all subsequent batches without thinking blocks are also removed.
3. Updates existing tests to include the thinking_blocks attribute on mock ActionEvent objects.
4. Adds comprehensive tests for thinking block consistency scenarios.

Fixes #1438

Co-authored-by: openhands <openhands@all-hands.dev>
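A rough sketch of the consistency rule described in point 2. The Batch shape and function below are illustrative stand-ins, not the SDK's actual _enforce_thinking_block_consistency implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Illustrative stand-in for an atomic tool-call batch in the view."""
    has_thinking: bool            # does the batch's ActionEvent carry thinking blocks?
    events: list = field(default_factory=list)


def enforce_thinking_block_consistency(batches: list[Batch], removed: set[int]) -> set[int]:
    """Sketch: once any batch WITH thinking blocks is removed, also remove
    every later batch WITHOUT thinking blocks. Otherwise the final assistant
    message after condensation might lack a leading thinking block, which
    the Claude API rejects when thinking is enabled."""
    removed = set(removed)
    removed_thinking = [i for i in removed if batches[i].has_thinking]
    if not removed_thinking:
        return removed
    for i in range(min(removed_thinking) + 1, len(batches)):
        if not batches[i].has_thinking:
            removed.add(i)
    return removed
```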
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
Instead of merging all batches from the last thinking batch to the end, we now simply remove cut points after batches without thinking blocks. This ensures any valid cut leaves a final batch with thinking blocks, while giving the condenser more manipulation points. The key insight: cut points are only allowed after batches WITH thinking. This way, the final batch after any cut will always have thinking blocks. Co-authored-by: openhands <openhands@all-hands.dev>
The previous implementation only removed cut points immediately after non-thinking batches. But if non-batch events (like Condensation or ConversationErrorEvent) follow a non-thinking batch, those cut points were incorrectly kept. The fix uses a whitelist approach: only allow cut points that are either before the first batch (no batches kept) or immediately after a batch WITH thinking blocks.

For the trajectory in issue #1438 (188 events, 90 batches, 3 with thinking):

- Before: 12 manipulation indices (including invalid ones like 187, 188)
- After: 6 manipulation indices (all valid: 0, 1, 2, 4, 61, 126)

Co-authored-by: openhands <openhands@all-hands.dev>
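The whitelist rule reduces to something like the following sketch. It works over batch-level indices only; the real computation operates on event indices and also accounts for non-batch events such as Condensation, so the names and shapes here are assumptions.

```python
def allowed_cut_points(batch_has_thinking: list[bool]) -> list[int]:
    """Sketch of the whitelist: a cut point is allowed only before the first
    batch (index 0, i.e. keep no batches) or immediately after a batch WITH
    thinking blocks, so the last batch surviving any cut always carries
    thinking blocks."""
    cuts = [0]
    for i, has_thinking in enumerate(batch_has_thinking):
        if has_thinking:
            cuts.append(i + 1)  # cutting here keeps batches[0..i] intact
    return cuts


# Toy example: 5 batches, only the 2nd and 5th have thinking blocks.
print(allowed_cut_points([False, True, False, False, True]))  # -> [0, 2, 5]
```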
🧪 Integration Tests Results

Overall Success Rate: 94.1%

📁 Detailed Logs & Artifacts: Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📋 Detailed Results (per model): litellm_proxy_vertex_ai_gemini_3_pro_preview, litellm_proxy_gpt_5.1_codex_max, litellm_proxy_claude_sonnet_4_5_20250929, litellm_proxy_mistral_devstral_2512, litellm_proxy_deepseek_deepseek_chat, litellm_proxy_moonshot_kimi_k2_thinking
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 96.1%

📁 Detailed Logs & Artifacts: Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📋 Detailed Results (per model): litellm_proxy_gpt_5.1_codex_max, litellm_proxy_claude_sonnet_4_5_20250929, litellm_proxy_deepseek_deepseek_chat, litellm_proxy_mistral_devstral_2512, litellm_proxy_moonshot_kimi_k2_thinking, litellm_proxy_vertex_ai_gemini_3_pro_preview
Certain LLM APIs place restrictions on how messages should be structured, especially with respect to tool calls. This isn't normally a problem, but sometimes the condensers can violate these properties in ways that are difficult to recover from.
This PR makes the primary condenser -- the `LLMSummarizingCondenser` -- aware of these restrictions. To do so, we introduce the concept of manipulation indices, which are spots where the conversation history can be changed without violating these properties. The condenser ensures that summaries are only inserted at these indices, and that when events are forgotten it happens from one manipulation index to another.
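As a concrete, simplified illustration of which positions count as manipulation indices -- the event names below are stand-ins, not the SDK's actual event classes:

```python
# A toy event stream: each tool call (ActionEvent) and its result
# (ObservationEvent) form an atomic batch that must never be split.
events = [
    "MessageEvent(user)",          # 0
    "ActionEvent(call_1)",         # 1  -+ batch 1: call + result stay together
    "ObservationEvent(call_1)",    # 2  -+
    "ActionEvent(call_2)",         # 3  -+ batch 2
    "ObservationEvent(call_2)",    # 4  -+
    "MessageEvent(user)",          # 5
]

# Safe places to cut the history or to insert a summary: never 2 or 4,
# because a cut there would orphan a tool call from its result and the
# LLM API would reject the resulting message list.
manipulation_indices = [0, 1, 3, 5, 6]
```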
Design Choices

We could make the `View` object ensure that these properties cannot be violated. Unfortunately, that means the condenser might produce a `Condensation` that violates a property, and the `View` is now responsible for deciding how to fix it. That's unfortunate because it means you can't read what a condensation is doing purely from the event any more; you need to know how the views will be processed.

So instead we make it so the `View` informs the condensers of where changes can be made. This keeps the condensation events literal, and it also means we can enforce more or fewer constraints on the conversation history by modifying the code that generates the manipulation indices.

The API restrictions are currently codified in a few functions in the `View` class: `_enforce_batch_atomicity` and `filter_unmatched_tool_calls`. These are exactly the same as before, but have been extended with warnings if their property is violated by a condenser. We can remove them and simplify the `View` at a later date.
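A hedged sketch of the "warn instead of silently fix" behavior. The real `filter_unmatched_tool_calls` works over the SDK's event types; the event shape and logger usage here are assumptions.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ToyEvent:
    kind: str                       # "message", "action", or "observation"
    tool_call_id: str | None = None


def filter_unmatched_tool_calls(events: list[ToyEvent]) -> list[ToyEvent]:
    """Drop tool calls without results (and results without calls), warning
    if anything had to be dropped: a condenser that respects the
    manipulation indices should never trigger this repair path."""
    calls = {e.tool_call_id for e in events if e.kind == "action"}
    results = {e.tool_call_id for e in events if e.kind == "observation"}
    matched = calls & results

    kept = [e for e in events if e.tool_call_id is None or e.tool_call_id in matched]
    if len(kept) != len(events):
        logger.warning(
            "Condenser produced %d unmatched tool-call event(s); repairing the view.",
            len(events) - len(kept),
        )
    return kept
```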
Tradeoffs

This of course means there is some slack in the condenser's solutions. The condenser determines forgetting ranges based on the resource usage (events, tokens) of individual events, and these computed ranges are then "projected" into the manipulation index space. The end result is that some condensations may not be what the intuitive semantics imply, but we get the guarantee that the resulting conversation history is well-formed.
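For instance, the projection might look roughly like this sketch, assuming the condenser has already computed an event-based range it wants to forget and that the manipulation indices always include 0 (the function name and shapes are illustrative, not the PR's actual code):

```python
import bisect


def project_range(desired_start: int, desired_end: int, indices: list[int]) -> tuple[int, int]:
    """Snap a desired forgetting range [desired_start, desired_end) outward
    onto the nearest manipulation indices, so the cut never lands inside an
    atomic tool-call batch. The projected range can forget a little more
    than requested -- that is the slack described above."""
    start = indices[bisect.bisect_right(indices, desired_start) - 1]                # snap start down
    end = indices[min(bisect.bisect_left(indices, desired_end), len(indices) - 1)]  # snap end up (clamped)
    return start, end


# With manipulation indices [0, 1, 3, 5, 6], a request to forget events
# 2 through 4 is widened to the well-formed range 1 through 5.
print(project_range(2, 4, [0, 1, 3, 5, 6]))  # -> (1, 5)
```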
Summary of Changes
- Adds a `View.get_manipulation_indices` function to tell condensers where changes can be made.
- Updates the `View` methods to throw a warning when they "fix" the conversation history.
- Updates the `LLMSummarizingCondenser` to use the manipulation indices. No other current condenser needs to be updated.

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
| Variant | Base Image |
|---|---|
| java | eclipse-temurin:17-jdk |
| python | nikolaik/python-nodejs:python3.12-nodejs22 |
| golang | golang:1.21-bookworm |

Pull (multi-arch manifest)
```bash
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:24c1423-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The default tag (e.g., 24c1423-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 24c1423-python-amd64) are also available if needed