feat(monitor): transplant compat monitor and swebench runner #182
Open
shaluoyan523 wants to merge 4 commits into OpenDCAI:main from
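Monitor compat version: transplant notes
Background
On the current `main` branch, the monitor has regressed to an earlier sandbox console that retains only:
The monitor in `/home/dataset-local/data1/Mycel-compat-monitor-pr93`, by contrast, has been extended with:
The goal of this branch is to transplant these compat monitor capabilities back onto the latest `main` and to add the runtime-environment adaptations the current mainline needs.
Changes in this PR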
1. Transplant the monitor frontend and backend
Transplanted and restored the following monitor capabilities:
- `EvaluationPage`
- `EvaluationDetailPage`
- `SessionDetailPage`
- `Thread Trace`
- `Conversation / Events / Steps` multi-view
- `/api/monitor/evaluations`
- `/api/monitor/evaluation/{evaluation_id}`
- `/api/monitor/evaluation/runs`
- `/api/monitor/session/{session_id}`
- `/api/monitor/thread/{thread_id}/trace`
Corresponding files:
- `backend/web/monitor.py`
- `backend/web/routers/monitor.py`
- `frontend/monitor/src/App.tsx`
- `frontend/monitor/src/styles.css`
- `frontend/monitor/vite.config.ts`
2. Adapt to the backend structure of the latest main
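To stay compatible with the current mainline's storage split and routing structure, the following adaptations were added:
- `backend.web.monitor`
- `/api/monitor/health`
- `/api/monitor/resources`
- `/api/monitor/resources/refresh`
- `/api/monitor/sandbox/{lease_id}/browse`
- `/api/monitor/sandbox/{lease_id}/read`
- `SQLiteDBRole.RUN_EVENT`
- `SQLiteDBRole.SANDBOX`
- `DB_PATH`
3. Fix monitor display anomalies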
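Fixed several issues that made the monitor "look wrong":
- The `Threads` page previously looked only at `chat_sessions`, so a running SWE-Bench thread that had written only checkpoints was not shown.
- `Evaluation detail` did not render thread rows during the phase where there is no session yet and only checkpoints exist.
- `/api/threads/{thread_id}` fails for lack of a Bearer token, reporting: `Conversation load failed: Missing or invalid Authorization header`
- `/api/monitor/thread/{thread_id}/conversation`
4. Restore the SWE-Bench run entry point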
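For the `Threads` page fix in section 3, the underlying idea (collect thread IDs from checkpoints as well as from `chat_sessions`) can be sketched as below. This is only an illustration: it assumes both tables live in one SQLite database and each has a `thread_id` column, which this PR description does not confirm, and `list_thread_ids` is a hypothetical helper name rather than a function from the branch.

```python
from __future__ import annotations

import sqlite3


def list_thread_ids(conn: sqlite3.Connection) -> list[str]:
    """Return every thread ID seen in either table (hypothetical helper)."""
    # UNION (not UNION ALL) removes duplicates, so a thread that has both
    # a session row and checkpoint rows is listed once.
    rows = conn.execute(
        "SELECT thread_id FROM chat_sessions "
        "UNION "
        "SELECT thread_id FROM checkpoints "
        "ORDER BY thread_id"
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list[str]:
    """Build a throwaway in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chat_sessions (thread_id TEXT)")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT)")
    # One thread has a session and a checkpoint; the other has checkpoints only,
    # like a running SWE-Bench thread.
    conn.execute("INSERT INTO chat_sessions VALUES ('t-session')")
    conn.executemany(
        "INSERT INTO checkpoints VALUES (?)",
        [("t-session",), ("t-swebench",)],
    )
    try:
        return list_thread_ids(conn)
    finally:
        conn.close()
```

Under this assumed schema, a checkpoint-only thread appears in the result even though it has no `chat_sessions` row.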
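The current mainline monitor UI still exposes a SWE-Bench entry point, but the script that executes it is no longer in the repository. To make the monitor's evaluation feature actually runnable, this branch restores:
- `eval/swebench/run_slice.py`
and adapts it to the current environment:
- `--eval-timeout-sec`
- `--git-timeout-sec`
- read `OPENAI_API_KEY` from `~/.leon/models.json`
- `LEON_SANDBOX_DB_PATH`
5. Complete the evaluation dependency declarations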
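Added the monitor's SWE-Bench runtime dependencies to the project's dependency declarations:
- `datasets`
- `swebench`
- `socksio`
Corresponding files:
- `pyproject.toml`
- `uv.lock`
Verified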
Compile/build verification
Completed:
- `python3 -m py_compile backend/web/monitor.py`
- `python3 -m py_compile backend/web/routers/monitor.py`
- `python3 -m py_compile eval/swebench/run_slice.py`
- `cd frontend/monitor && npm run build`
API verification
The following endpoints have been confirmed to work:
- `/api/monitor/evaluations`
- `/api/monitor/evaluation/{evaluation_id}`
- `/api/monitor/evaluation/runs`
- `/api/monitor/session/{session_id}`
- `/api/monitor/thread/{thread_id}/trace`
- `/api/monitor/thread/{thread_id}/conversation`
- `/api/monitor/resources`
Runtime verification
Launched 1 minimal SWE-Bench test task through the monitor and verified:
Branch notes
This branch is:
- `monitor-compat-transplant`
Purpose:
- `main`
Follow-up suggestions
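Two further follow-up steps are suggested:
- Extract the logic in `backend/web/monitor.py` that is tightly coupled to the SWE-Bench runner into a standalone service, reducing the size of the monitor file.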
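One possible shape for that extraction is a small service that owns the command construction for `eval/swebench/run_slice.py`, so the monitor module only calls into it. This is a sketch, not code from the branch: `SweBenchRunConfig`, `SweBenchRunnerService` and the default timeout values are hypothetical, and it assumes both timeout flags take integer seconds.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SweBenchRunConfig:
    """Hypothetical per-run settings; the defaults are illustrative only."""

    eval_timeout_sec: int = 1800
    git_timeout_sec: int = 300


class SweBenchRunnerService:
    """Hypothetical service that keeps runner-specific logic out of the monitor module."""

    def __init__(self, repo_root: Path) -> None:
        # Script path as named in this PR, resolved against the repository root.
        self.script = repo_root / "eval" / "swebench" / "run_slice.py"

    def build_command(self, config: SweBenchRunConfig) -> list[str]:
        # Only the two timeout flags mentioned in this PR are passed here;
        # a real invocation will likely need further arguments.
        return [
            sys.executable,
            str(self.script),
            "--eval-timeout-sec",
            str(config.eval_timeout_sec),
            "--git-timeout-sec",
            str(config.git_timeout_sec),
        ]
```

The monitor could then hand the returned list to `subprocess.Popen` or a similar launcher, which keeps process handling and argument details in one place and testable without starting a run.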