feat(condenser): Token-aware condensation in LLMSummarizingCondenser #1380
Conversation
Coverage Report
@OpenHands please fix the failing actions on PR #1380 at branch

I'm on it! csmith49 can track my progress at all-hands.dev
The test file was importing from 'resolve_model_configs' (plural) but the actual file is 'resolve_model_config.py' (singular). Also updated the test functions to match the actual function signature which takes only model_ids and uses the global MODELS dictionary. Co-authored-by: openhands <openhands@all-hands.dev>
I have successfully fixed the failing GitHub Actions on PR #1380.

Issues Fixed:
Verification Results:
The failing "Agent Server" GitHub Action should now pass. The changes are minimal and focused only on fixing the test issues without affecting the core token-aware condensation functionality.
[Automatic Post]: I have assigned @simonrosenberg as a reviewer based on git blame information. Thanks in advance for the help!
🧪 Integration Tests Results

Overall Success Rate: 98.0%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_gpt_5.1_codex_max
litellm_proxy_claude_sonnet_4_5_20250929
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 88.2%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_vertex_ai_gemini_3_pro_preview
Failed Tests:
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
Failed Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
Failed Tests:
litellm_proxy_claude_sonnet_4_5_20250929
Failed Tests:
litellm_proxy_gpt_5.1_codex_max
Failed Tests:
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 90.2%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_gpt_5.1_codex_max
Failed Tests:
litellm_proxy_claude_sonnet_4_5_20250929
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
Apologies @xingyaoww @enyst for the false starts on this PR. I've added an integration test that tries to generate a really long token sequence without user intervention. All the models pass the test except devstral and gpt 5.1 (which refuse to do something so inefficient), and the models I've checked generate reasonable summaries and forget the expected events.
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 96.1%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_claude_sonnet_4_5_20250929
litellm_proxy_gpt_5.1_codex_max
Failed Tests:
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
Did some minor prompt tweaks; GPT 5.1 still refuses to comply, citing inefficiency concerns. Considering the existing integration tests have some sporadic failures, I'm okay with this for the moment. I'd like to get these changes in soon b/c they're blocking some upstream fixes on tool-call structure.
xingyaoww left a comment
LGTM except one nit
    return 0

    # Check if all events combined don't exceed the token count
    total_tokens = get_total_token_count(events, llm)
nit: ideally we can pre-calculate the number of tokens for each event first, instead of calling get_total_token_count frequently (it can be a relatively computation/network-intensive operation)
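A minimal sketch of this suggestion, purely for illustration (not the PR's code; `count_event_tokens` is a hypothetical per-event counter): count each event once, then answer any span's total from prefix sums.

```python
from itertools import accumulate

def build_token_prefix_sums(events, llm, count_event_tokens):
    """Pre-compute per-event token counts once and turn them into prefix sums,
    so any contiguous span's total can be answered in O(1) afterwards."""
    per_event = [count_event_tokens(event, llm) for event in events]  # one counting call per event
    return [0, *accumulate(per_event)]  # prefix[i] == total tokens in events[:i]

def span_token_count(prefix, start, end):
    """Token total for events[start:end] using the precomputed prefix sums."""
    return prefix[end] - prefix[start]
```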
It's not obvious to me that that'll be more efficient -- it really depends on what happens under the hood in the LiteLLM call, which acts over lists of messages.
If it's network-intensive but the list of messages is processed all at once, the best approach is to minimize the number of calls. The binary search approach implemented here will find some optimal split in O(log n) calls instead of the O(n) calls needed to calculate the tokens for each event.
If it's computationally intensive then the cost of each operation scales with the number of events being processed, and we might want to handle each event independently like you suggest.
I don't know how to balance those two without some actual benchmarking.
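For concreteness, a minimal sketch of the binary-search idea (hypothetical names; `count_tokens` stands in for the batched token-counting call and is not necessarily the exact helper in the PR):

```python
def find_forgotten_prefix_length(events, llm, max_tokens, count_tokens):
    """Binary-search the smallest k such that events[k:] fits within max_tokens.

    count_tokens(events, llm) is assumed to return the token total for a list
    of events in one batched call; it is invoked O(log n) times.
    """
    lo, hi = 0, len(events)  # invariant: events[hi:] always fits (the empty suffix fits trivially)
    while lo < hi:
        mid = (lo + hi) // 2
        if count_tokens(events[mid:], llm) <= max_tokens:
            hi = mid  # this suffix fits; try forgetting fewer events
        else:
            lo = mid + 1  # still over budget; forget more
    return lo
```

This assumes the token count of a suffix is monotone in its length, which holds as long as per-event counts are non-negative.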
Considering this PR is blocking some time-critical fixes, and token-aware condensation is not enabled by default yet, I'll mark this as "needs further examination" and open a new issue.
Ahaha how cool are LLMs! 😂 Codex in the latest log:
If I were to guess, we do tell it in the system prompt things like:
@OpenHands Look at the integration tests workflow to see the list of LLMs it uses. We want to update 5.1 codex to 5.2-codex. Use the openhands LLM api key in your env to verify that the hello world example in examples/ works with openhands/gpt-5.2-codex, or find the correct model name from /v1/models. Open a new PR from main (not this PR!) with the update, once you've verified it works. You do not need to run other tests, only a hello world slightly modified temporarily if you need, so that you can run it and see its result in logs and console.
I'm on it! enyst can track my progress at all-hands.dev
Here’s a concise summary of what I did and the outcome.

What I changed
Verification
Checklist
Notes
This PR adds functionality to support the `LLMSummarizingCondenser` in using tokens to trigger condensation, and to direct the condensation strategy.

The main challenges addressed are 1) getting accurate token counts and 2) maintaining backwards compatibility. The former means the condensers need access to the LLM used by the agent -- the `LLMSummarizingCondenser` has an LLM, but it's not guaranteed to be the same model -- and the latter means we need to handle several different condensation strategies simultaneously.

That last point required a bit of a rework to the internal logic. Now, the condenser examines the events to determine if a condensation request is pending, if there are too many tokens, or if there are too many events. Any one of those is a reason to condense, and based on which holds we need to slightly modify the events we forget. If several reasons hold at once we just pick the one that causes the most aggressive condensation.

One large benefit of this change is that it enables us to set condensation limits dynamically based on the model used by the agent -- just set `max_tokens` equal to a fraction of the context window of the chosen model. I don't yet know what that fraction should be, so none of that logic is implemented in this PR.

This PR is partially based on #912 and addresses many of the same problems.
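As a rough illustration of that decision flow (a sketch only -- the names, thresholds, and helper callables here are hypothetical stand-ins for the utilities added in the condenser's `utils.py`, not the PR's exact API):

```python
def pick_condensation(events, llm, *, max_events=None, max_tokens=None,
                      request_pending=False, count_tokens=None,
                      prefix_to_forget=None):
    """Collect every applicable reason to condense and act on the most
    aggressive one (the reason that forgets the most events)."""
    proposals = []
    if request_pending:
        # explicit condensation request: forget, say, the older half of the history
        proposals.append(("requested", len(events) // 2))
    if max_events is not None and len(events) > max_events:
        # too many events: forget enough to get back under the event cap
        proposals.append(("too_many_events", len(events) - max_events))
    if max_tokens is not None and count_tokens(events, llm) > max_tokens:
        # too many tokens: forget the smallest prefix that restores the token budget
        proposals.append(("too_many_tokens", prefix_to_forget(events, llm, max_tokens)))
    if not proposals:
        return None  # nothing to do
    return max(proposals, key=lambda proposal: proposal[1])  # most aggressive wins
```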
Changes

- Modified the `Condenser.condense(...)` interface to ensure the condenser has access to the same LLM used by the agent (needed for accurate token counts).
- Added a `utils.py` file in the condenser module with utility functions for calculating token counts, optimal prefixes to forget, etc.
- Added an `LLMSummarizingCondenser.max_tokens` parameter for setting token limits.
- Reworked the `LLMSummarizingCondenser` to handle multiple condensation reasons simultaneously.

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
    # Each variant is a multi-arch manifest supporting both amd64 and arm64
    docker pull ghcr.io/openhands/agent-server:6e4cbcb-python

Run
All tags pushed for this build
About Multi-Architecture Support
- The 6e4cbcb-python tag is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 6e4cbcb-python-amd64) are also available if needed