
Release v1.14.1 #2548

Closed
all-hands-bot wants to merge 2 commits into main from rel-1.14.1

Conversation


all-hands-bot (Collaborator) commented Mar 23, 2026

Release v1.14.1

This PR prepares the release for version 1.14.1.

Release Checklist

  • Version set to 1.14.1
  • Fix any deprecation deadlines if they exist
  • Integration tests pass (tagged with integration-test)
  • Behavior tests pass (tagged with behavior-test)
  • Example tests pass (tagged with test-examples)
  • Draft release created at https://github.com/OpenHands/software-agent-sdk/releases/new
    • Select tag: v1.14.1
    • Select branch: rel-1.14.1
    • Auto-generate release notes
    • Publish release (PyPI will auto-publish)
  • Evaluation on OpenHands Index

Next Steps

  1. Review the version changes
  2. Address any deprecation deadlines
  3. Ensure integration tests pass
  4. Ensure behavior tests pass
  5. Ensure example tests pass
  6. Create and publish the release

Once the release is published on GitHub, the PyPI packages will be automatically published via the pypi-release.yml workflow.
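The "Draft release" steps in the checklist can also be done from the command line with the GitHub CLI; a sketch, assuming `gh` is installed and authenticated against this repository:

```shell
# Create a draft release for v1.14.1 from the release branch,
# with auto-generated notes. Publishing it (dropping --draft, or
# publishing from the web UI) triggers the pypi-release.yml workflow.
gh release create v1.14.1 \
  --target rel-1.14.1 \
  --generate-notes \
  --draft
```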


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java    | amd64, arm64  | eclipse-temurin:17-jdk | Link
python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22 | Link
golang  | amd64, arm64  | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e77cdd1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-e77cdd1-python \
  ghcr.io/openhands/agent-server:e77cdd1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:e77cdd1-golang-amd64
ghcr.io/openhands/agent-server:e77cdd1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:e77cdd1-golang-arm64
ghcr.io/openhands/agent-server:e77cdd1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:e77cdd1-java-amd64
ghcr.io/openhands/agent-server:e77cdd1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:e77cdd1-java-arm64
ghcr.io/openhands/agent-server:e77cdd1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:e77cdd1-python-amd64
ghcr.io/openhands/agent-server:e77cdd1-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:e77cdd1-python-arm64
ghcr.io/openhands/agent-server:e77cdd1-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:e77cdd1-golang
ghcr.io/openhands/agent-server:e77cdd1-java
ghcr.io/openhands/agent-server:e77cdd1-python

About Multi-Architecture Support

  • Each variant tag (e.g., e77cdd1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., e77cdd1-python-amd64) are also available if needed
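To see exactly which architectures a variant tag covers, or to override Docker's platform auto-detection, standard Docker commands suffice (sketch using the python variant tag from this build):

```shell
# Inspect the multi-arch manifest to list the architectures it includes
docker manifest inspect ghcr.io/openhands/agent-server:e77cdd1-python

# Force a specific architecture instead of relying on auto-detection
docker pull --platform linux/arm64 ghcr.io/openhands/agent-server:e77cdd1-python
```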

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot added the integration-test (runs the integration tests and comments the results), test-examples (run all applicable "examples/" files; expensive operation), and behavior-test labels Mar 23, 2026
github-actions bot (Contributor) commented:

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions bot (Contributor) commented:

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.


github-actions bot commented Mar 23, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log


github-actions bot commented Mar 23, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

all-hands-bot (Collaborator, Author) left a comment:


🟢 Good taste - Clean release version bump

All packages consistently updated from 1.14.0 → 1.14.1, lock file synced, and eval workflow default updated. No issues found. Ready to merge once checklist items are completed. 🚀

github-actions bot (Contributor) commented:

🧪 Integration Tests Results

Overall Success Rate: 76.7%
Total Cost: $0.88
Models Tested: 4
Timestamp: 2026-03-23 15:07:25 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_deepseek_deepseek_reasoner | 100.0% | 7/7 | 1 | 8 | $0.04 | 587,434
litellm_proxy_gemini_3_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.41 | 308,587
litellm_proxy_anthropic_claude_sonnet_4_6 | 100.0% | 8/8 | 0 | 8 | $0.43 | 240,957
litellm_proxy_moonshot_kimi_k2_thinking | 0.0% | 0/7 | 1 | 8 | $0.00 | 0

📋 Detailed Results

litellm_proxy_deepseek_deepseek_reasoner

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Token Usage: prompt: 574,006, completion: 13,428, cache_read: 508,608, reasoning: 5,933
  • Run Suffix: litellm_proxy_deepseek_deepseek_reasoner_d3399ec_deepseek_v3_2_reasoner_run_N8_20260323_150131
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_gemini_3_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.41
  • Token Usage: prompt: 303,037, completion: 5,550, cache_read: 144,307, reasoning: 3,949
  • Run Suffix: litellm_proxy_gemini_3_pro_preview_d3399ec_gemini_3_pro_run_N8_20260323_150130

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.43
  • Token Usage: prompt: 235,379, completion: 5,578, cache_read: 154,659, cache_write: 80,488, reasoning: 1,055
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d3399ec_claude_sonnet_4_6_run_N8_20260323_150131

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 0.0% (0/7)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_d3399ec_kimi_k2_thinking_run_N8_20260323_150132
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t06_github_pr_browsing: Test execution failed: Conversation run failed for id=9fe0a6a1-9983-4490-ab1e-20e85a6db0eb: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t07_interactive_commands: Test execution failed: Conversation run failed for id=f180de96-4863-4514-a2c8-0e58bfc7c659: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t01_fix_simple_typo: Test execution failed: Conversation run failed for id=1edbf2d1-5bc9-42fe-a86b-94e7b20d8098: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t03_jupyter_write_file: Test execution failed: Conversation run failed for id=37c3dc56-834c-407c-9415-d498b66f9afa: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t04_git_staging: Test execution failed: Conversation run failed for id=f9df1cdc-2b8b-4d4d-9b8e-e3adf33ed20e: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t02_add_bash_hello: Test execution failed: Conversation run failed for id=3de1379d-a536-41e7-afc8-38c345bdd44e: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • t05_simple_browsing: Test execution failed: Conversation run failed for id=15473826-7019-41a8-9300-d4c06e8244f2: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
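All of these failures stem from the same cause: the request passes `reasoning_effort` to a provider (moonshot) that rejects it, and the error message suggests enabling litellm's `drop_params` (or, for the proxy, `litellm_settings: drop_params: true`). As a rough illustration of what that behavior amounts to, here is a minimal self-contained sketch; the `UNSUPPORTED_PARAMS` table and `strip_unsupported_params` helper are hypothetical stand-ins, not litellm's actual implementation:

```python
# Hypothetical sketch of what drop_params-style behavior does:
# strip provider-unsupported keys from a request before sending it.

# Hypothetical per-provider table; litellm maintains the real mapping internally.
UNSUPPORTED_PARAMS = {
    "moonshot": {"reasoning_effort"},
}

def strip_unsupported_params(provider: str, request: dict) -> dict:
    """Return a copy of `request` without params the provider rejects."""
    blocked = UNSUPPORTED_PARAMS.get(provider, set())
    return {k: v for k, v in request.items() if k not in blocked}

request = {"model": "kimi-k2-thinking", "reasoning_effort": "high", "temperature": 0.2}
cleaned = strip_unsupported_params("moonshot", request)
print(cleaned)  # reasoning_effort removed; other params untouched
```

With the real library, the equivalent one-line fix from the error message is `litellm.drop_params = True` (or the proxy YAML setting quoted above), which makes litellm silently drop parameters a given provider does not support.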


github-actions bot commented Mar 23, 2026

Coverage

Coverage Report

File  | Stmts | Miss | Cover | Missing
TOTAL | 21653 | 5629 | 74%   |

report-only-changed-files is enabled. No files were changed during this commit :)


github-actions bot commented Mar 23, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-03-23 15:21:35 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 25.3s $0.02
01_standalone_sdk/03_activate_skill.py ✅ PASS 20.9s $0.02
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 13.3s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 31.6s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 14.4s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 54.3s $0.05
01_standalone_sdk/11_async.py ✅ PASS 33.9s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 11.4s $0.00
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 32.2s $0.02
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 36s $0.18
01_standalone_sdk/17_image_input.py ✅ PASS 17.3s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 23.8s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 15.9s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 17.4s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.3s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 17.4s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 12s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 4m 29s $0.34
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 16s $0.08
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 20.5s $0.03
01_standalone_sdk/28_ask_agent_example.py ❌ FAIL (Exit code 1) 12.3s --
01_standalone_sdk/29_llm_streaming.py ✅ PASS 48.1s $0.04
01_standalone_sdk/30_tom_agent.py ✅ PASS 21.0s $0.02
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 31s $0.24
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 14.0s $0.01
01_standalone_sdk/34_critic_example.py ✅ PASS 2m 49s $0.23
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 10.9s $0.00
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 9.2s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 27.7s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 10.1s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 26.9s $0.10
01_standalone_sdk/41_task_tool_set.py ✅ PASS 28.0s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 56.6s $0.06
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 7.2s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 8.7s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 2m 16s $0.17
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 37.6s $0.02
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 28s $0.02
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 3s $0.05
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 15s $0.04
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 38.1s $0.04
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 40s $0.02
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 1m 0s $0.05
02_remote_agent_server/10_cloud_workspace_share_credentials.py ❌ FAIL (Exit code 1) 6.8s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 39.6s $0.03
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 54.1s $0.09
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 19.1s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 27.5s $0.03

❌ Some tests failed

Total: 48 | Passed: 46 | Failed: 2 | Total Cost: $2.28

Failed examples:

  • examples/01_standalone_sdk/28_ask_agent_example.py: Exit code 1
  • examples/02_remote_agent_server/10_cloud_workspace_share_credentials.py: Exit code 1

View full workflow run

github-actions bot (Contributor) commented:

🧪 Integration Tests Results

Overall Success Rate: 60.0%
Total Cost: $5.66
Models Tested: 4
Timestamp: 2026-03-23 15:27:27 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_deepseek_deepseek_reasoner | 80.0% | 4/5 | 0 | 5 | $0.56 | 8,588,812
litellm_proxy_gemini_3_pro_preview | 80.0% | 4/5 | 0 | 5 | $2.16 | 3,383,900
litellm_proxy_anthropic_claude_sonnet_4_6 | 80.0% | 4/5 | 0 | 5 | $2.95 | 3,986,434
litellm_proxy_moonshot_kimi_k2_thinking | 0.0% | 0/5 | 0 | 5 | $0.00 | 0

📋 Detailed Results

litellm_proxy_deepseek_deepseek_reasoner

  • Success Rate: 80.0% (4/5)
  • Total Cost: $0.56
  • Token Usage: prompt: 8,512,424, completion: 76,388, cache_read: 8,104,320, reasoning: 27,054
  • Run Suffix: litellm_proxy_deepseek_deepseek_reasoner_d3399ec_deepseek_v3_2_reasoner_run_N5_20260323_150131

Failed Tests:

  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: The agent completed the core task of reducing MAX_CMD_OUTPUT_SIZE from 30,000 to 20,000 and verified it with terminal-specific tests. However, there are significant issues with the approach:

Problems:

  1. Scope Creep - Unauthorized Changes: The agent also changed the LLM class's max_message_chars default from 30,000 to 20,000 and updated tests in tests/sdk/config/test_llm_config.py. This was NOT requested by the user, who specifically asked to "adjust the terminal tool truncation limit." While the agent reasoned that a comment suggested they should match, this justification is weak:

    • The user did not request changes to LLM defaults
    • This could have broader implications affecting all LLM message handling globally, not just terminal output
    • Changes should have been limited to terminal-specific code and terminal-specific tests
    • The agent should have either asked first or left this change for explicit user approval
  2. Over-Verification Beyond Scope: The agent ran tests beyond the terminal package:

    • tests/sdk/config/test_llm_config.py (LLM configuration tests)
    • tests/sdk/utils/test_truncate.py (generic truncation utility tests)
    • While these are reasonable validation checks, the evaluation criteria specified "acceptable tests are ALL files under tests/tools/terminal" - the agent went beyond this
    • This represents the kind of "over-verification" the evaluation criteria specifically warns against
  3. Process Issues: The agent attempted to run broader test suites (pytest tests/tools/terminal -x), which timed out, then had to reset and recover. While recovery was handled well, it shows the agent was being overly thorough.

Positive Aspects:

  • Core task completed: MAX_CMD_OUTPUT_SIZE correctly changed to 20,000
  • Terminal-specific truncation tests verified and pass (5/5)
  • Used uv correctly as instructed
  • Provided clear final summary
  • Provided good reasoning for changes (even if the LLM change was unauthorized)

Expected Behavior:
The agent should have:

  1. Changed MAX_CMD_OUTPUT_SIZE
  2. Run only tests/tools/terminal/test_observation_truncation.py to verify ✓
  3. Stopped and reported, possibly asking if other changes (like LLM default) were desired ✗

The unauthorized modification of LLM defaults and over-verification of non-terminal tests represents deviation from the evaluation criteria. (confidence=0.70) (Cost: $0.09)

litellm_proxy_gemini_3_pro_preview

  • Success Rate: 80.0% (4/5)
  • Total Cost: $2.16
  • Token Usage: prompt: 3,344,153, completion: 39,747, cache_read: 2,715,467, reasoning: 18,681
  • Run Suffix: litellm_proxy_gemini_3_pro_preview_d3399ec_gemini_3_pro_run_N5_20260323_150131

Failed Tests:

  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: The agent successfully completed the core task of reducing MAX_CMD_OUTPUT_SIZE from 30000 to 20000 and verifying the change with tests. However, there are several significant issues with the execution:

Issues with the Approach:

  1. Violated Environment Instructions: The environment notes explicitly stated "Use uv (as per development guide) to avoid collision with the other checkout when running Python commands." The agent ran pytest directly without using uv, directly contradicting this explicit requirement.

  2. Over-Verification (Primary Concern): The evaluation criteria explicitly states the agent must not "over-verify the truncation limit change by running test suites much broader than necessary, or repeatedly." The agent:

    • First ran pytest tests/tools/terminal/test_observation_truncation.py (appropriate)
    • Then ran pytest tests/tools/terminal/ (all terminal tests - broader than necessary)
    • Then REVERTED the change
    • Then ran pytest tests/tools/terminal/test_conversation_cleanup.py to check if failures were caused by the change
    • Then RE-APPLIED the change
    • Then ran pytest tests/tools/terminal/test_observation_truncation.py again

    This cycle of revert/re-apply/test is exactly the kind of unnecessary repeated verification the criteria warns against.

  3. Did Not Stop Appropriately: The evaluation criteria states the agent should "Stop after reporting the change and results, inviting further direction." After running the initial truncation tests (which passed), the agent should have stopped. Instead, it ran broader test suites, reverted changes, and continued investigating.

What Was Done Correctly:

  • ✅ Located the correct file (constants.py)
  • ✅ Made the correct change (30000 → 20000)
  • ✅ Ran relevant truncation tests (5 passed)
  • ✅ Properly investigated that other test failures were unrelated
  • ✅ Final state of code is correct

Assessment:

While the end result is technically correct, the execution pattern violates explicit instructions (use uv) and the evaluation criteria (avoid over-verification and running broader tests unnecessarily). The unnecessary revert/re-apply cycle is a clear example of over-verification that the criteria specifically warns against. (confidence=0.85) (Cost: $0.30)

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 80.0% (4/5)
  • Total Cost: $2.95
  • Token Usage: prompt: 3,923,759, completion: 62,675, cache_read: 3,593,347, cache_write: 236,914, reasoning: 10,409
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_d3399ec_claude_sonnet_4_6_run_N5_20260323_150131

Failed Tests:

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully completed the main task by creating examples/tutorial/smolvla/train_smolvla_example.py with high quality. The script:
  • Correctly follows the format of using_smolvla_example.py (no copyright header, no main() wrapper, inline code)
  • Properly implements SmolVLA training with pretrained config loading and custom dataset feature adaptation
  • Sets up preprocessors with dataset-specific statistics
  • Configures optimizer/scheduler from policy presets
  • Implements gradient clipping, periodic checkpointing, and training loop correctly

However, the agent violated the explicit evaluation criteria by creating an unrequested file: lerobot/AGENTS.md. The evaluation rules clearly state:

  1. Create the training script ✓ (completed)
  2. "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script."

AGENTS.md is:

  • An additional file not requested by the user
  • Not a README.md (different filename)
  • General repository knowledge documentation, not specific to the training script
  • Therefore violates the constraint about not creating redundant files

While AGENTS.md represents good-faith effort to document repository patterns for future agents, it falls outside the scope of what was requested. The user asked only for a training script following the format of the existing example - nothing more. (confidence=0.88) (Cost: $1.77)

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 0.0% (0/5)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_d3399ec_kimi_k2_thinking_run_N5_20260323_150131

Failed Tests:

  • b01_no_premature_implementation: Test execution failed: Conversation run failed for id=5bb0c19e-8c45-4220-8f92-019aa0715d90: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • b04_each_tool_call_has_a_concise_explanation: Test execution failed: Conversation run failed for id=c9af83af-c034-49b5-8e2a-195f2e84ba41: litellm.UnsupportedParamsError: Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.UnsupportedParamsError: moonshot does not support parameters: ['reasoning_effort'], for model=kimi-k2-thinking. To drop these, set litellm.drop_params=True or for proxy:\n\nlitellm_settings:\n drop_params: true\n. \n If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request.No fallback model group found for original model_group=moonshot/kimi-k2-thinking. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times", 'type': 'None', 'param': None, 'code': '400'}} (Cost: $0.00)
  • b02_no_oververification: Test execution failed: Conversation run failed for id=9608a412-7ae4-4ecd-98d0-5b9c2a1dcc39: same litellm.UnsupportedParamsError as above (Cost: $0.00)
  • b03_no_useless_backward_compatibility: Test execution failed: Conversation run failed for id=3e68ee22-494c-4599-af70-44cae10ff077: same litellm.UnsupportedParamsError as above (Cost: $0.00)
  • b05_do_not_create_redundant_files: Test execution failed: Conversation run failed for id=b6112153-1bf3-4345-b478-ae699a522de8: same litellm.UnsupportedParamsError as above (Cost: $0.00)

@enyst
Collaborator

enyst commented Mar 23, 2026

@OpenHands Read this report #2548 (comment) and focus on Kimi-K2 thinking. Follow the execution path and understand where we set reasoning_effort for Kimi K2 thinking. Then fix it. Try to use clean code, without too ugly special case, or if we need this as special case, list possible places in the code and choose the one where we get info from litellm on reasoning. Unless this was enabled by code somewhere else in which case remove that code.

This PR is a release PR, we must not work here. New branch, open new PR. Babysit to green CI, make it ready for review, and monitor for some 5 mins until an AI reviewer will post comments and inline reviews; read them and address if necessary. Tag enyst, and only enyst, in your PR description.

@openhands-ai

openhands-ai bot commented Mar 23, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Mar 23, 2026

Here’s the status based on the work so far:

Summary

I have not completed the full request yet. I investigated the Kimi-K2 thinking failure path and made a partial code change locally, but I stopped before validation / PR creation because of the follow-up design concern about depending more on LiteLLM.

...

Re-stating the answer to your question about dependency on LiteLLM

Your concern is valid. The clean proposed fix makes us rely more directly on LiteLLM’s parameter support metadata instead of its broader reasoning-model metadata. I think that is still the better signal here, because the failure is specifically about whether LiteLLM will accept the reasoning_effort param. But I had not yet completed or validated that choice.

Re-stating the answer to your question about drop_params

drop_params is already set to True by default on our LLM model. So this bug is not because our SDK forgot to enable it globally. The issue appears to be that our code decided to send reasoning_effort based on a broader LiteLLM capability signal, and LiteLLM still rejected that param for Moonshot Kimi at request time. In other words: the mismatch is in capability detection vs accepted params, not in our default drop_params setting.
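The mismatch described above (capability detection vs. accepted params) can be sketched as a simple gate: only attach `reasoning_effort` when the model's accepted-parameter list actually contains it, rather than relying on a broad "is a reasoning model" flag. This is a minimal, self-contained illustration; the `SUPPORTED` table below is made up for the example and is not LiteLLM's real metadata, which would come from something like its supported-OpenAI-params lookup:

```python
def build_chat_kwargs(model, supported_params, reasoning_effort=None):
    """Attach reasoning_effort only if the model's accepted params allow it.

    `supported_params` stands in for the per-model accepted-parameter list
    that provider metadata (e.g. LiteLLM's supported-params table) would
    return; the data used here is illustrative only.
    """
    kwargs = {"model": model}
    if reasoning_effort is not None and "reasoning_effort" in supported_params:
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs


# Illustrative capability table (NOT real provider metadata)
SUPPORTED = {
    "moonshot/kimi-k2-thinking": ["temperature", "tools"],  # rejects reasoning_effort
    "openai/o3-mini": ["temperature", "tools", "reasoning_effort"],
}

print(build_chat_kwargs(
    "moonshot/kimi-k2-thinking",
    SUPPORTED["moonshot/kimi-k2-thinking"],
    reasoning_effort="high",
))
# → {'model': 'moonshot/kimi-k2-thinking'} — the param is silently withheld
```

The point is that the gate keys off the accepted-parameter list for the specific model, so a model that advertises reasoning support but rejects the request param never receives it.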

Collaborator

enyst commented Mar 23, 2026

Status update for the Kimi K2 thinking fix:

  • I traced the failing path and confirmed the mismatch was in capability detection: we were using LiteLLM's generic reasoning signal, which returns true for moonshot/kimi-k2-thinking, but that model does not accept the reasoning_effort request param.
  • I moved supports_reasoning_effort to use LiteLLM's supported OpenAI params table instead, so we only send reasoning_effort when LiteLLM says the model actually accepts it.
  • I added targeted regression tests for Moonshot Kimi thinking models and chat option normalization.
  • I opened the dedicated fix PR here: fix(sdk): stop sending reasoning_effort to Kimi thinking #2549
  • Manual validation also passed on the eval proxy with the exact failing model name (litellm_proxy/moonshot/kimi-k2-thinking) by running examples/01_standalone_sdk/05_use_llm_registry.py. The conversation succeeded and the direct completion also succeeded.
  • AI reviewer status on PR fix(sdk): stop sending reasoning_effort to Kimi thinking #2549: approved, with no inline review threads to address.

CI status on PR #2549 right now:

  • Kimi K2 Thinking integration job: ✅ passed
  • Claude Sonnet 4.6 integration job: ✅ passed
  • DeepSeek V3.2 Reasoner integration job: ✅ passed
  • One Gemini integration job is still pending, so I'm still monitoring until the integration workflow fully settles.

Note: this comment was created by an AI assistant (OpenHands) on behalf of the user.

@enyst
Collaborator

enyst commented Mar 23, 2026

HUMAN:
Oof, litellm thinks Kimi K2 is a reasoning model, but it doesn't support reasoning_effort, and doesn't drop it even though we have drop_params set.

cc @xingyaoww

The agent fixed Kimi K2 here:

Edited to add: technically it's the same for the other LLMs in tests. But it's the kind of thing I'd kinda re-run the other 2 labels (behavior and examples) 🤔

@enyst
Collaborator

enyst commented Mar 23, 2026

Edited to add: technically it's the same for the other LLMs in tests. But it's the kind of thing I'd kinda re-run the other 2 labels (behavior and examples) 🤔

Or rather, I wouldn't re-run behavior, but I would re-run test-examples. We changed which litellm method we use, so it could have some hidden surprise that should show up in some examples.

@enyst enyst added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Mar 23, 2026
@github-actions
Contributor

github-actions bot commented Mar 23, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-03-23 16:44:23 UTC

| Example | Status | Duration | Cost |
| --- | --- | --- | --- |
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 23.7s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 16.4s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 14.3s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 28.6s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.8s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 36.2s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 30.5s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 11.3s | $0.00 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 42.6s | $0.03 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 15s | $0.30 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.5s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 24.4s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 16.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 18.5s | $0.03 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 13.1s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 24.8s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 21s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 4m 0s | $0.33 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 54.3s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 17.0s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 29.2s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 45.6s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.5s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 4m 26s | $0.34 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 21.1s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 2m 13s | $0.17 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 12.1s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 7.0s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 42.1s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.1s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 30.0s | $0.10 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 29.3s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 42.4s | $0.05 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 5.3s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 8.2s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 9s | $0.41 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 40.9s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 39s | $0.05 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 6s | $0.07 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 3s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 33.6s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 53s | $0.03 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 1m 2s | $0.03 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ❌ FAIL (Exit code 1) | 6.8s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 30.9s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 1m 42s | $0.09 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 19.6s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 23.4s | $0.03 |

❌ Some tests failed

Total: 48 | Passed: 47 | Failed: 1 | Total Cost: $2.68

Failed examples:

  • examples/02_remote_agent_server/10_cloud_workspace_share_credentials.py: Exit code 1

View full workflow run

@enyst
Collaborator

enyst commented Mar 23, 2026

Is the cloud example supposed to work right now? I guess maybe it uses the version currently deployed on prod

Traceback (most recent call last):
  File "/home/runner/work/software-agent-sdk/software-agent-sdk/examples/02_remote_agent_server/10_cloud_workspace_share_credentials.py", line 64, in <module>
    secrets = workspace.get_secrets()
...
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
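The JSONDecodeError above is the classic symptom of an endpoint falling through to an HTML page (here, the SPA shell) instead of returning JSON. A small defensive wrapper makes that failure mode explicit; this is a hypothetical helper for illustration, not the SDK's actual code:

```python
import json


def parse_json_response(body: str, url: str = "<endpoint>") -> dict:
    """Parse a response body, raising a clearer error when the server
    returned something other than JSON (e.g. an SPA HTML shell)."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        snippet = body.strip()[:60]
        raise RuntimeError(
            f"Expected JSON from {url}, got non-JSON response "
            f"(starts with: {snippet!r})"
        ) from None


# An HTML shell instead of JSON now produces an actionable message
# rather than a bare "Expecting value: line 1 column 1 (char 0)":
try:
    parse_json_response("<!doctype html><html>...</html>")
except RuntimeError as e:
    print(e)
```

Surfacing the first bytes of the body makes it immediately obvious when a route is unregistered and a frontend catch-all answered instead.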

This comment was marked as duplicate.

@enyst
Collaborator

enyst commented Mar 23, 2026

@OpenHands Look at this #2548 (comment) and #2548 (comment)

Find the PR that added this test.

Clone the OpenHands repo from the same org, and look up the linked issue and the counterpart PR or PRs in that repo. I suspect maybe we haven't made a release on OH with those included? Verify everything I said. You have time; investigate deeply and feel free to actually search the repos for the right timeline and status here.

@openhands-ai

openhands-ai bot commented Mar 23, 2026

I'm on it! enyst can track my progress at all-hands.dev

This comment was marked as outdated.

@openhands-ai

This comment was marked as duplicate.

This comment was marked as outdated.

This comment was marked as outdated.

Collaborator

enyst commented Mar 23, 2026

Correction to my earlier wording: the /users/me?expose_secrets=true path is only partially working on prod.

I re-checked the actual value returned from cloud for llm_api_key.

Using:

  • Authorization: Bearer ...
  • X-Session-API-Key: ...
  • GET /api/v1/users/me?expose_secrets=true

I verified safely that the returned llm_api_key is still masked / redacted, not a usable proxy key:

  • llm_base_url: https://llm-proxy.app.all-hands.dev/
  • llm_key_present: True
  • llm_key_length: 10
  • llm_key_starts_with_sk_dash: False
  • llm_key_contains_asterisk: True
  • prefix is ***

I also tried a cheap completion against the returned base URL using the cloud-returned key and got 401 Authentication Error, consistent with the key still being redacted.

So the more accurate prod status is:

  • /users/me?expose_secrets=true returns the field, but the llm_api_key value is still masked
  • /api/v1/sandboxes/{id}/settings/secrets is still missing / falling through to the SPA HTML shell

So for SDK credential inheritance on prod right now, both halves are effectively broken:

  1. the LLM key path is not returning a usable key
  2. the sandbox secrets path is not registered/live
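The checks listed above amount to a small heuristic for distinguishing a redacted key from a usable one. Sketched as a hypothetical helper; the `sk-` prefix convention, the `*` masking, and the length threshold are assumptions drawn from the observations in this comment, not documented API behavior:

```python
def looks_redacted(key: str) -> bool:
    """Heuristic from the observations above: a usable proxy key is
    expected to be reasonably long and start with 'sk-', while a masked
    value is short and/or padded with '*'. Purely illustrative."""
    if not key:
        return True
    return "*" in key or not key.startswith("sk-") or len(key) < 20


print(looks_redacted("***a1b2c3"))       # masked value, as returned by prod
print(looks_redacted("sk-" + "x" * 40))  # plausible shape of a real key
# → True, then False
```

A guard like this lets an example fail fast with "credentials not exposed" instead of proceeding to a confusing 401 from the LLM proxy.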

@enyst
Collaborator

enyst commented Mar 23, 2026

cc: @xingyaoww It seems my initial guess on the Cloud example may have been right: the OH enterprise side is not deployed.

Shall we remove it from the PR for now, or do we want to wait for a patch / deployment on Cloud?

@xingyaoww
Collaborator

Shall we remove it from the PR for now

Yeah, I think it is probably OK to ignore that for now in this release, since the get-secrets functionality should be up in the next few days. As long as we revert OpenHands/docs#401 first, and then re-introduce its content when it is deployed? @OpenHands can you create two PRs for me: the first reverts docs PR 401, and the second re-introduces it (merge AFTER release).
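The revert-then-reintroduce flow requested here can be sketched generically as "revert the commit, then revert the revert". The snippet below demonstrates the pattern in a throwaway repo; the file name and commit messages are illustrative stand-ins for the actual docs change, not the real repository history:

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git config user.email ci@example.com
git config user.name ci

# Base state, then the change to be temporarily backed out ("docs PR 401")
echo "secrets docs" > cloud-workspace.mdx
git add . && git commit -qm "base"
echo "get_secrets section" >> cloud-workspace.mdx
git add . && git commit -qm "docs PR 401"

# PR 1: revert the change ahead of the release
git revert -n HEAD && git commit -qm "Revert docs PR 401"

# PR 2 (merge after release/deploy): revert the revert to re-introduce it
git revert -n HEAD && git commit -qm "Reintroduce docs PR 401"
grep -q "get_secrets section" cloud-workspace.mdx && echo "reintroduced"
```

In practice each revert commit would live on its own branch and go up as a separate PR, with the second one held until the backend is deployed.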

@openhands-ai

openhands-ai bot commented Mar 25, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Mar 25, 2026

Summary of Work

I created two PRs in the OpenHands/docs repository as requested:

Checklist

Conciseness

The changes are minimal and focused — each PR touches only the single file (cloud-workspace.mdx) that was affected by the original PR #401, with no extraneous changes. The revert was verified to produce an identical file to the pre-PR-401 state.

@xingyaoww
Collaborator

Since there have been more changes in the past few days, I'll close this and just cut v1.15.0.

@xingyaoww xingyaoww closed this Mar 25, 2026

Labels

behavior-test integration-test Runs the integration tests and comments the results test-examples Run all applicable "examples/" files. Expensive operation.
