Release v1.7.0 #1486
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 27.6s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.0s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 10.0s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 33.2s | $0.03 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 16.5s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 22.8s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 30.1s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 19.1s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 19.1s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 41s | $0.36 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.2s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 24.0s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.9s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 13.1s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 8.2s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 31.8s | $0.03 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 15s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 4m 5s | $0.29 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 2m 25s | $0.19 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 20.5s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 33.9s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 39.1s | $0.03 |
| 01_standalone_sdk/30_gemini_file_tools.py | ❌ FAIL (Missing EXAMPLE_COST marker in stdout) | 21.3s | -- |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.5s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 52s | $0.26 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 19.6s | $0.02 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 27s | $0.06 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 29s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 33s | $0.05 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 37s | $0.04 |
| 02_remote_agent_server/06_convo_with_cloud_workspace.py | ❌ FAIL (Exit code 1) | 2.8s | -- |
❌ Some tests failed
Total: 31 | Passed: 29 | Failed: 2 | Total Cost: $1.63
Failed examples:
- examples/01_standalone_sdk/30_gemini_file_tools.py: Missing EXAMPLE_COST marker in stdout
- examples/02_remote_agent_server/06_convo_with_cloud_workspace.py: Exit code 1
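Presumably the examples runner validates each script by scanning its stdout for the EXAMPLE_COST marker; a minimal sketch of that kind of check, assuming a grep-based step (the log path and exact logic are assumptions, not the actual CI script):

```bash
# Hypothetical sketch of the EXAMPLE_COST check. The marker name comes from
# the CI output above; the log file path and failure message are assumed.
if ! grep -q "EXAMPLE_COST" /tmp/example_stdout.log; then
  echo "Missing EXAMPLE_COST marker in stdout"
  exit 1
fi
```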
🧪 Integration Tests Results
Overall Success Rate: 97.8%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_gpt_5.1_codex_max
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_claude_sonnet_4_5_20250929
@OpenHands please remove 01_standalone_sdk/30_gemini_file_tools.py and fix the examples CI workflow to load the OpenHands SaaS API key from the ALLHANDS_BOT_OPENHANDS_SAAS_API_KEY secret for 02_remote_agent_server/06_convo_with_cloud_workspace.py
I'm on it! xingyaoww can track my progress at all-hands.dev
…mples CI
- Remove examples/01_standalone_sdk/30_gemini_file_tools.py
- Add OPENHANDS_CLOUD_API_KEY env var from ALLHANDS_BOT_OPENHANDS_SAAS_API_KEY secret for 02_remote_agent_server/06_convo_with_cloud_workspace.py

Co-authored-by: openhands <openhands@all-hands.dev>
I've completed the requested changes and pushed them to the PR branch.
Changes Made
✅ Removed examples/01_standalone_sdk/30_gemini_file_tools.py
✅ Updated the examples CI workflow to load OPENHANDS_CLOUD_API_KEY from the ALLHANDS_BOT_OPENHANDS_SAAS_API_KEY secret
Commit Details
The changes have been pushed to PR #1486.
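A minimal sketch of what that workflow change likely looks like; the env var and secret names come from the request above, while the step name and run command are assumptions rather than the actual workflow file:

```yaml
# Hypothetical excerpt of the examples CI workflow. Only the env mapping is
# taken from the PR discussion; the step name and run command are assumed.
- name: Run cloud workspace example
  env:
    OPENHANDS_CLOUD_API_KEY: ${{ secrets.ALLHANDS_BOT_OPENHANDS_SAAS_API_KEY }}
  run: python examples/02_remote_agent_server/06_convo_with_cloud_workspace.py
```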
🧪 Integration Tests Results
Overall Success Rate: 85.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_gpt_5.1_codex_max
litellm_proxy_claude_sonnet_4_5_20250929
Failed Tests:
Errors logged to '/tmp/tmplodlt133/lerobot/.git/lfs/logs/20251222T185251.21379438.log'.
litellm_proxy_deepseek_deepseek_chat
Failed Tests:
The evaluation criteria specifically stated: "Stop after reporting the change and results, inviting further direction." The agent instead continued investigating and modifying related constants, then provided a long explanation about other limits that might need changing. The technical work is sound and tests pass, but the approach doesn't follow the specified constraints about verification scope and stopping at the right point. (confidence=0.78) (Cost: $0.13)
litellm_proxy_moonshot_kimi_k2_thinking
Failed Tests:
Errors logged to '/tmp/tmpr5cmz1d8/lerobot/.git/lfs/logs/20251222T185237.433813698.log'.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 26.5s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 16.4s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.6s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 29.8s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13.4s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 25.2s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 31.5s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 13.8s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 17.9s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 24s | $0.30 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 15.0s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 22.1s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.8s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 14.5s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 9.7s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 17.0s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 59.8s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 1s | $0.22 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL (Exit code 1) | 24.6s | -- |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 17.7s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 42.1s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 41.4s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 7.5s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 10s | $0.28 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 20.1s | $0.02 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 46.1s | $0.02 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 10s | $0.04 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 29s | $0.07 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 29s | $0.06 |
| 02_remote_agent_server/06_convo_with_cloud_workspace.py | ❌ FAIL (Exit code 1) | 7.3s | -- |
❌ Some tests failed
Total: 30 | Passed: 28 | Failed: 2 | Total Cost: $1.38
Failed examples:
- examples/01_standalone_sdk/25_agent_delegation.py: Exit code 1
- examples/02_remote_agent_server/06_convo_with_cloud_workspace.py: Exit code 1
Why the example
Update: Closing PR #1489 as the fix will come naturally when the SaaS is upgraded to the latest SDK version. Once the SaaS is upgraded, it will serialize
@OpenHands please fix the failing actions on PR #1486 at branch

I'm on it! xingyaoww can track my progress at all-hands.dev
The tools-tests job was failing due to the self-hosted runner losing communication with the server. This adds a 15-minute timeout to prevent the job from hanging indefinitely and to help identify if tests are taking too long.
Co-authored-by: openhands <openhands@all-hands.dev>
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings.
Summary
I've successfully fixed the failing CI actions on PR #1486.
Issues Fixed
Current CI Status (All Passing ✅)
Change Summary

```diff
# .github/workflows/tests.yml
  tools-tests:
    runs-on: blacksmith-2vcpu-ubuntu-2404
+   timeout-minutes: 15
    steps:
```

The change is minimal and targeted - only adding a timeout to prevent the runner communication loss issue that was causing the tools-tests job to hang.
neubig left a comment
It seems like there are some changes that maybe shouldn't be in the release, such as removing the gemini tools example?
@neubig that was intended - I believe that example was not documented in the docs repo & it currently breaks our "test_example" pipeline, and the number 30 collides with other examples we had. And IIRC, @enyst and I decided not to include examples like this for model-specific editing tools (we were discussing Codex?) but Engel, please correct me if I'm wrong!
My bad, we probably moved too fast on this, sorry @xingyaoww! Please let me take a step back here. From the perspective of client code developers (apps that may prefer or test with one model or another):
They are not on by default. IMHO maybe we could make them the default if and when the preset gets better eval performance than we have with these LLMs on the default tools? Graham tried a 50-instance eval on the Gemini tools PR, and it was I think 66% for the subset vs 70% overall. That doesn't sound great, but I don't know what was expected for the subset; it might also have been a weird one, I suppose. TL;DR: IMHO maybe we can consider giving people an easy way (a preset agent for gpt/gemini) to

That said, that's not really about the release. I think maybe we don't need to delay it? (Edited to add) The agent is fixing the issues you identified with the Gemini example there. We can maybe, idk, think about it there, if we're not sure?
@enyst sounds good! Maybe we could release this PR as is (w/o the example), and think of a more general way to include those LLM-specific examples. The only thing I worry about is that there could be just too many examples related to tools made specifically for each LLM -- maybe we can create a folder like
@OpenHands Read this comment: #1486 (comment) Apply it to PR 1462

I'm on it! enyst can track my progress at all-hands.dev
- Move GPT-5 apply patch preset example to 04_llm_specific_tools/01_gpt5_apply_patch_preset.py
- Move Gemini file tools example to 04_llm_specific_tools/02_gemini_file_tools.py
- Update usage path in docstring

This organizes LLM-specific tool examples into a dedicated folder as suggested in PR #1486 review.

Co-authored-by: openhands <openhands@all-hands.dev>
I've applied the suggestion from the comment on PR #1486 to PR #1462. The changes have been pushed to the branch.
Summary of Changes
Following the suggestion to organize LLM-specific examples into a dedicated folder:
This organizes LLM-specific tool examples (GPT-5 ApplyPatchTool, Gemini file tools) into a dedicated folder as suggested, making it easier to add future LLM-specific examples without cluttering the main standalone SDK examples folder.
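A sketch of the equivalent relocation commands; the Gemini path matches the example removed earlier in this thread, while the GPT-5 example's original path is not shown here and is left as a placeholder:

```bash
# Sketch of the reorganization. The GPT-5 example's original path is a
# placeholder; substitute the real file name before running.
mkdir -p examples/04_llm_specific_tools
git mv examples/01_standalone_sdk/<gpt5_apply_patch_example>.py \
       examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py
git mv examples/01_standalone_sdk/30_gemini_file_tools.py \
       examples/04_llm_specific_tools/02_gemini_file_tools.py
```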

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.12-nodejs22
• golang:1.21-bookworm

Pull (multi-arch manifest)
```bash
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:60960dd-python
```

Run
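A hedged example of running the pulled image (the port mapping and flags are assumptions, not taken from this PR):

```bash
# Hypothetical run command: the image tag comes from the Pull section above;
# the interactive flags and port mapping are assumed.
docker run -it --rm -p 8000:8000 ghcr.io/openhands/agent-server:60960dd-python
```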
All tags pushed for this build
About Multi-Architecture Support
• The variant tag (60960dd-python) is a multi-arch manifest supporting both amd64 and arm64
• Architecture-specific tags (60960dd-python-amd64) are also available if needed