sync: 2026-04-28 — absorb 54 upstream commits#1
Merged
…2279) * feat(frontend): add Playwright E2E tests with CI workflow Add end-to-end testing infrastructure using Playwright (Chromium only). 14 tests across 5 spec files cover landing page, chat workspace, thread history, sidebar navigation, and agent chat — all with mocked LangGraph/Backend APIs via network interception (zero backend dependency). New files: - playwright.config.ts — Chromium, 30s timeout, auto-start Next.js - tests/e2e/utils/mock-api.ts — shared API mocks & SSE stream helpers - tests/e2e/{landing,chat,thread-history,sidebar,agent-chat}.spec.ts - .github/workflows/e2e-tests.yml — push main + PR trigger, paths filter Updated: package.json, Makefile, .gitignore, CONTRIBUTING.md, frontend/CLAUDE.md, frontend/AGENTS.md, frontend/README.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply Copilot suggestions --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
) ls_tool was the only file-system tool that did not call mask_local_paths_in_output() before returning its result, causing host absolute paths (e.g. /Users/.../backend/.deer-flow/knowledge-base/...) to leak to the LLM instead of the expected virtual paths (/mnt/knowledge-base/...). This patch: - Adds the mask_local_paths_in_output() call to ls_tool, consistent with bash_tool, glob_tool and grep_tool. - Initialises thread_data = None before the is_local_sandbox branch (same pattern as glob_tool) so the variable is always in scope. - Adds three new tests covering user-data path masking, skills path masking and the empty-directory edge case.
…e#2335) When NEXT_PUBLIC_BACKEND_BASE_URL is unset, the frontend proxies API requests to the gateway. Only /api/agents and /api/skills had rewrite rules, causing 404s for /api/models, /api/threads, /api/memory, /api/mcp, /api/suggestions, /api/runs, etc. Add a catch-all /api/:path* rewrite that proxies all remaining gateway API routes. The existing /api/langgraph rewrite takes priority because it is pushed to the array first (Next.js checks rewrites in order). Fixes bytedance#2327 Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
…tedance#2323) ATT&CK matrix ID: T1059.004 Data source: detection triggered by process start Alert reason: the command line of this process shows the characteristics of a reverse shell Command line: timeout 1 bash -c exec 3<>/dev/tcp/127.0.0.1/2024 Process path: /usr/bin/timeout Process chain: -[337650] /usr/sbin/sshd -D -[397971] /usr/sbin/sshd -D -R -[397977]-bash -[398903] make dev -[398920] bash ./scripts/serve.sh --dev -[399037]bash ./scripts/wait-for-port.sh 2024 60 LangGraph
…ce#2291) Bumps [langsmith](https://github.com/langchain-ai/langsmith-sdk) from 0.6.4 to 0.7.31. - [Release notes](https://github.com/langchain-ai/langsmith-sdk/releases) - [Commits](langchain-ai/langsmith-sdk@v0.6.4...v0.7.31) --- updated-dependencies: - dependency-name: langsmith dependency-version: 0.7.31 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
bytedance#2252) * fix(mcp): prevent RuntimeError from escaping except block in get_cached_mcp_tools When `asyncio.get_event_loop()` raises RuntimeError and the fallback `asyncio.run()` also fails, the exception escapes unhandled because Python does not route exceptions raised inside an `except` block to sibling `except` clauses. Wrap the fallback call in its own try/except so failures are logged and the function returns [] as intended. * fix: use logger.exception to preserve stack traces on MCP init failure
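The escape described in this commit can be reproduced in a few lines. This is a minimal sketch, not the actual `get_cached_mcp_tools` code: `failing_init`, `load_tools_unsafe`, and `load_tools_safe` are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger("mcp_sketch")


def failing_init():
    """Hypothetical stand-in for an initialisation call that always fails."""
    raise RuntimeError("init failed")


def load_tools_unsafe():
    try:
        failing_init()
    except RuntimeError:
        # An exception raised inside this handler is NOT routed to the
        # sibling `except Exception` clause below; it propagates to the caller.
        failing_init()
    except Exception:
        return []
    return ["tool"]


def load_tools_safe():
    try:
        failing_init()
    except RuntimeError:
        try:
            failing_init()  # fallback attempt
        except Exception:
            # logger.exception keeps the stack trace in the log record.
            logger.exception("fallback initialisation failed")
            return []
    return ["tool"]
```

Calling `load_tools_unsafe()` raises `RuntimeError`, while `load_tools_safe()` logs the failure and returns an empty list.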
…ance#2305) * fix(subagent): inherit parent agent's tool_groups in task_tool When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]), the restriction is correctly applied to the lead agent. However, when the lead agent delegates work to a subagent via the task tool, get_available_tools() is called without the groups parameter, causing the subagent to receive ALL tools (including web_search, web_fetch, image_search, etc.) regardless of the parent agent's configuration. This fix propagates tool_groups through run metadata so that task_tool passes the same group filter when building the subagent's tool set. Changes: - agent.py: include tool_groups in run metadata - task_tool.py: read tool_groups from metadata and pass to get_available_tools() * fix: initialize metadata before conditional block and update tests for tool_groups propagation - Initialize metadata = {} before the 'if runtime is not None' block to avoid Ruff F821 (possibly-undefined variable) and simplify the parent_tool_groups expression. - Update existing test assertion to expect groups=None in get_available_tools call signature. - Add 3 new test cases: - test_task_tool_propagates_tool_groups_to_subagent - test_task_tool_no_tool_groups_passes_none - test_task_tool_runtime_none_passes_groups_none
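The propagation described in this commit might look roughly like the following. It is a sketch under stated assumptions: the tool table, the simplified `get_available_tools` filter, and `subagent_tools` are hypothetical stand-ins, not DeerFlow's real implementations.

```python
from typing import Optional

# Hypothetical tool -> group table, for illustration only.
ALL_TOOLS = {
    "read_file": "file:read",
    "write_file": "file:write",
    "bash": "bash",
    "web_search": "web",
}


def get_available_tools(groups: Optional[list] = None) -> list:
    """Simplified filter: groups=None means no restriction."""
    if groups is None:
        return sorted(ALL_TOOLS)
    return sorted(name for name, group in ALL_TOOLS.items() if group in groups)


def subagent_tools(runtime_metadata: Optional[dict]) -> list:
    # Initialise before the conditional so the name is always bound,
    # mirroring the `metadata = {}` change described in the commit.
    metadata: dict = {}
    if runtime_metadata is not None:
        metadata = runtime_metadata
    # Pass the parent's group filter through; a missing key yields None.
    return get_available_tools(groups=metadata.get("tool_groups"))
```

With `{"tool_groups": ["file:read", "bash"]}` the subagent no longer sees `web_search`; with no runtime metadata the filter stays `None` and every tool is returned.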
…ce#2349) Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@9.0.2...9.0.3) --- updated-dependencies: - dependency-name: pytest dependency-version: 9.0.3 dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nt conversion (bytedance#2332) * fix: disable host-side upload conversion by default * fix: address PR review comments on upload conversion gate
…DOM elements (bytedance#2321) * fix * add test * fix
* fix: Catch httpx.ReadError in the error handling * fix
…ytedance#2217) * fix(token-usage): enable stream usage for openai-compatible models * fix(token-usage): narrow stream_usage default to ChatOpenAI
* fix command palette hydration mismatch * style: format command dialog description
bytedance#2254) * fix(setup-agent): prevent data loss when setup fails on existing agent directory Record whether the agent directory pre-existed before mkdir, and only run shutil.rmtree cleanup when the directory was newly created during this call. Previously, any failure would delete the entire directory including pre-existing SOUL.md and config.yaml. * fix: address PR review — init variables before try, remove unused result * style: fix ruff I001 import block formatting in test file * style: add missing blank lines between top-level definitions in test file
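A minimal sketch of the cleanup rule from this commit, assuming a hypothetical `setup_agent_dir` helper and a caller-supplied `write_files` callback (neither name comes from the DeerFlow source):

```python
import shutil
from pathlib import Path


def setup_agent_dir(agent_dir: Path, write_files) -> None:
    # Record whether the directory existed BEFORE mkdir creates it.
    pre_existed = agent_dir.exists()
    try:
        agent_dir.mkdir(parents=True, exist_ok=True)
        write_files(agent_dir)
    except Exception:
        # Only remove what this call created; a pre-existing directory
        # (with its SOUL.md and config.yaml) is left untouched.
        if not pre_existed:
            shutil.rmtree(agent_dir, ignore_errors=True)
        raise
```

The key design choice is that the existence check happens before `mkdir`, since afterwards the two cases are indistinguishable.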
…ytedance#1803) (bytedance#2107) * Refactor tests for SKILL.md parser Updated tests for SKILL.md parser to handle quoted names and descriptions correctly. Added new tests for parsing plain and single-quoted names, and ensured multi-line descriptions are processed properly. * Implement tool name validation and deduplication Add tool name mismatch warning and deduplication logic * Refactor skill file parsing and error handling * Add tests for tool name deduplication Added tests for tool name deduplication in get_available_tools(). Ensured that duplicates are not returned, the first occurrence is kept, and warnings are logged for skipped duplicates. * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Update minimal config to include tools list * Update test for nonexistent skill file Ensure the test for nonexistent files checks for None. * Refactor tool loading and add skill management support Refactor tool loading logic to include skill management tools based on configuration and clean up comments. * Enhance code comments for tool loading logic Added comments to clarify the purpose of various code sections related to tool loading and configuration. * Fix assertion for duplicate tool name warning * Fix indentation issues in tools.py * Fix the lint error of test_tool_deduplication * Fix the lint error of tools.py * Fix the lint error * Fix the lint error * make format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
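The deduplication behaviour listed in this commit (no duplicates returned, first occurrence kept, warning logged for each skipped duplicate) can be sketched as follows. Tools are modelled as plain dicts and `dedupe_by_name` is a hypothetical helper, not the code in `get_available_tools()`.

```python
import logging

logger = logging.getLogger("tools_sketch")


def dedupe_by_name(tools):
    """Return tools with unique names, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for tool in tools:
        name = tool["name"]
        if name in seen:
            logger.warning("Skipping duplicate tool name: %s", name)
            continue
        seen.add(name)
        unique.append(tool)
    return unique
```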
…dering (bytedance#2382) * Initial plan * fix(frontend): avoid invalid paragraph nesting in reasoning trigger Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/4c9eb0c2-ff29-4629-a61c-4e33d736d918 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * test(frontend): strengthen reasoning trigger DOM nesting assertion Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/4c9eb0c2-ff29-4629-a61c-4e33d736d918 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
…e#2352) - Remove f-string prefix on 7 strings with no placeholders (F541) in analyze.py, aggregate_benchmark.py, run_loop.py, generate_review.py - Remove unused `os` import in quick_validate.py (F401) Found by ruff via HUMMBL Arbiter (https://hummbl.io/audit).
…nce#2393) The tool is registered as `present_files` (plural) in present_file_tool.py, but four references in documentation and prompt strings incorrectly used the singular form `present_file`. This could cause confusion and potentially lead to incorrect tool invocations. Changed files: - backend/docs/GUARDRAILS.md - backend/docs/ARCHITECTURE.md - backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.2 to 6.1.0. - [Release notes](https://github.com/lxml/lxml/releases) - [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt) - [Commits](lxml/lxml@lxml-6.0.2...lxml-6.1.0) --- updated-dependencies: - dependency-name: lxml dependency-version: 6.1.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dance#2440) Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2. - [Release notes](https://github.com/theskumar/python-dotenv/releases) - [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md) - [Commits](theskumar/python-dotenv@v1.2.1...v1.2.2) --- updated-dependencies: - dependency-name: python-dotenv dependency-version: 1.2.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… warning (bytedance#2446) * fix: remove mismatched context param in debug.py to suppress Pydantic warning The ainvoke call passed context={"thread_id": ...} but the agent graph has no context_schema (ContextT defaults to None), causing a PydanticSerializationUnexpectedValue warning on every invocation. Align with the production run_agent path by injecting context via Runtime into configurable["__pregel_runtime"] instead. Closes bytedance#2445 Made-with: Cursor * refactor: derive runtime thread_id from config to avoid duplication Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Made-with: Cursor --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ce#2462) Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.3.1 to 3.4.1. - [Release notes](https://github.com/cure53/DOMPurify/releases) - [Commits](cure53/DOMPurify@3.3.1...3.4.1) --- updated-dependencies: - dependency-name: dompurify dependency-version: 3.4.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dance#2443) (bytedance#2457) * fix(skills): validate bundled SKILL.md front-matter in CI (fixes bytedance#2443) Adds a parametrized backend test that runs `_validate_skill_frontmatter` against every bundled SKILL.md under `skills/public/`, so a broken front-matter fails CI with a per-skill error message instead of surfacing as a runtime gateway-load warning. The new test caught two pre-existing breakages on `main` and fixes them: * `bootstrap/SKILL.md`: the unquoted description had a second `:` mid-line ("Also trigger for updates: ..."), which YAML parses as a nested mapping ("mapping values are not allowed here"). Rewrites the description as a folded scalar (`>-`), which preserves the original wording (including the embedded colon, double quotes, and apostrophes) without further escaping. This complements PR bytedance#2436 (single-file colon→hyphen patch) with a more general convention that survives future edits. * `chart-visualization/SKILL.md`: used `dependency:` which is not in `ALLOWED_FRONTMATTER_PROPERTIES`. Renamed to `compatibility:`, the documented field for "Required tools, dependencies" per skill-creator. No code reads `dependency` (verified by grep across backend/). * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…2467) Bumps [uuid](https://github.com/uuidjs/uuid) from 13.0.0 to 14.0.0. - [Release notes](https://github.com/uuidjs/uuid/releases) - [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md) - [Commits](uuidjs/uuid@v13.0.0...v14.0.0) --- updated-dependencies: - dependency-name: uuid dependency-version: 14.0.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: add optional prompt-toolkit support to debug.py Use PromptSession.prompt_async() for arrow-key navigation and input history when prompt-toolkit is available, falling back to plain input() with a helpful install tip otherwise. Made-with: Cursor * fix: handle EOFError gracefully in debug.py Catch EOFError alongside KeyboardInterrupt so that Ctrl-D exits cleanly instead of printing a traceback. Made-with: Cursor
…der uvicorn reload (bytedance#2331) * fix(gateway): bound lifespan shutdown hooks to prevent worker hang Gateway worker can hang indefinitely in `uvicorn --reload` mode with the listening socket still bound — all /api/* requests return 504, and SIGKILL is the only recovery. Root cause (py-spy dump from a reproduction showed 16+ stacked frames of signal_handler -> Event.set -> threading.Lock.__enter__ on the main thread): CPython's `threading.Event` uses `Condition(Lock())` where the inner Lock is non-reentrant. uvicorn's BaseReload signal handler calls `should_exit.set()` directly from signal context; if a second signal (SIGTERM/SIGHUP from the reload supervisor, or watchfiles-triggered reload) arrives while the first handler holds the Lock, the reentrant call deadlocks on itself. The reload supervisor keeps sending those signals only when the worker fails to exit promptly. DeerFlow's lifespan currently awaits `stop_channel_service()` with no timeout; if a channel's `stop()` stalls (e.g. Feishu/Slack WebSocket waiting for an ack), the worker can't exit, the supervisor keeps signaling, and the deadlock becomes reachable. This is a defense-in-depth fix — it does not repair the upstream uvicorn/CPython issue, but it ensures DeerFlow's lifespan exits within a bounded window so the supervisor has no reason to keep firing signals. No behavior change on the happy path. Wraps the shutdown hook in `asyncio.wait_for(timeout=5.0)` and logs a warning on timeout before proceeding to worker exit. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update backend/app/gateway/app.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * style: apply make format (ruff) to test assertions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
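The bounded shutdown from this commit can be sketched with `asyncio.wait_for`. The helper name `bounded_shutdown` and the `stop_hook` parameter are assumptions for illustration; the real change wraps the lifespan's shutdown hook directly in the gateway app.

```python
import asyncio
import logging

logger = logging.getLogger("gateway_sketch")


async def bounded_shutdown(stop_hook, timeout: float = 5.0) -> bool:
    """Await stop_hook(), giving up after `timeout` seconds.

    Returns True if the hook finished in time, False if it timed out.
    """
    try:
        await asyncio.wait_for(stop_hook(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Log and continue so the worker can still exit promptly.
        logger.warning("shutdown hook exceeded %.1fs; continuing with exit", timeout)
        return False
```

On the happy path nothing changes: a hook that returns quickly is awaited as before, and only a stalled hook is cancelled once the timeout elapses.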
…nt types (bytedance#2253) * feat(subagents): support per-subagent skill loading and custom subagent types (bytedance#2230) Add per-subagent skill configuration and custom subagent type registration, aligned with Codex's role-based config layering and per-session skill injection. Backend: - SubagentConfig gains `skills` field (None=all, []=none, list=whitelist) - New CustomSubagentConfig for user-defined subagent types in config.yaml - SubagentsAppConfig gains `custom_agents` section and `get_skills_for()` - Registry resolves custom agents with three-layer config precedence - SubagentExecutor loads skills per-session as conversation items (Codex pattern) - task_tool no longer appends skills to system_prompt - Lead agent system prompt dynamically lists all registered subagent types - setup_agent tool accepts optional skills parameter - Gateway agents API transparently passes skills in CRUD operations Frontend: - Agent/CreateAgentRequest/UpdateAgentRequest types include skills field - Agent card displays skills as badges alongside tool_groups Config: - config.example.yaml documents custom_agents and per-agent skills override Tests: - 40 new tests covering all skill config, custom agents, and registry logic - Existing tests updated for new get_skills_prompt_section signature Closes bytedance#2230 * fix: address review feedback on skills PR - Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py (task_tool no longer imports this function after skill injection moved to executor) - Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions between tool_groups and skills * fix(ci): resolve lint and test failures - Format agent-card.tsx with prettier (lint-frontend) - Remove stale "Skills Appendix" system_prompt assertion — skills are now loaded per-session by SubagentExecutor, not appended to system_prompt * fix(ci): sort imports in test_subagent_skills_config.py (ruff I001) * fix(ci): use nullish coalescing in 
agent-card badge condition (eslint) * fix: address review feedback on skills PR - Use model_fields_set in AgentUpdateRequest to distinguish "field omitted" from "explicitly set to null" — fixes skills=None ambiguity where None means "inherit all" but was treated as "don't change" - Move lazy import of get_subagent_config outside loop in _build_available_subagents_description to avoid repeated import overhead --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
…ance#2484) (bytedance#2485) The exception handler in JinaClient.crawl used logger.exception, which emits an ERROR-level record with the full httpx/httpcore/anyio traceback for every transient network failure (timeout, connection refused). Other search/crawl providers in the project log the same class of recoverable failures as a single line. One offline/slow-network session could produce dozens of multi-frame ERROR stack traces, drowning out real problems. Switch to logger.warning with a concise message that includes the exception type and its str, matching the style used elsewhere for recoverable transient failures (aio_sandbox, ddg, etc.). The exception type now also surfaces into the returned "Error: ..." string so callers retain diagnostic signal. Adds a regression test that asserts the log record is WARNING, carries no exc_info, and includes the exception class name. Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
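A sketch of the logging change described in this commit, assuming a hypothetical `crawl` wrapper around a caller-supplied `fetch` function (this is not the `JinaClient.crawl` code itself):

```python
import logging

logger = logging.getLogger("crawl_sketch")


def crawl(fetch, url: str) -> str:
    try:
        return fetch(url)
    except Exception as exc:
        # One WARNING line with the exception type and message;
        # no exc_info, so no multi-frame traceback is emitted.
        logger.warning("crawl failed for %s: %s: %s", url, type(exc).__name__, exc)
        # Surface the exception type to the caller as well.
        return f"Error: {type(exc).__name__}: {exc}"
```

Using `logger.exception` in the handler instead would emit an ERROR record with the full traceback attached, which is the noise the commit removes.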
…nce#2492) * feat(trace): Add `run_name` to the trace info for suggestions and memory. before (in LangSmith): CodexChatModel CodexChatModel lead_agent after: suggest_agent memory_agent lead_agent * feat(trace): Add `run_name` to the trace info for system agents. before (in LangSmith): CodexChatModel CodexChatModel CodexChatModel CodexChatModel lead_agent after: suggest_agent title_agent security_agent memory_agent lead_agent * chore(code format): code format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
bytedance#2466) * fix(debug): keep terminal clean by redirecting all logs to file - Redirect all logs to debug.log file to prevent background task logs from interfering with interactive terminal prompts - Honor AppConfig.log_level setting instead of hard-coding to INFO - Make logging setup idempotent by clearing pre-existing handlers - Defer deerflow imports until after logging is configured to ensure import-time side effects are captured in debug.log - Display active log level in startup banner - Add prompt_toolkit installation tip for enhanced readline support Made-with: Cursor * attaching the file handler before importing/calling get_app_config() Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
…zation (bytedance#2458) * fix(middleware): narrow skill rescue to skill-related tool outputs * fix(summarization): address skill rescue review feedback * fix: wire summarization skill rescue config * fix: remove dead skill tool helper * fix(lint): fix format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: gate deferred MCP tool execution * style: format deferred tool middleware * fix: address deferred tool review feedback
* fix: read lead agent options from context * fix: validate runtime context config
* feat(models): adapt models for the MindIE engine * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
…ce#2494) * fix: use subprocess instead of os.system in local_backend.py The sandbox backend and skill evaluation scripts use subprocess * fixing the failing test --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
bytedance#2451) * feat(mcp): support custom tool interceptors via extensions_config.json Add a generic extension point for registering custom MCP tool interceptors through `extensions_config.json`. This allows downstream projects to inject per-request header manipulation, auth context propagation, or other cross-cutting concerns without modifying DeerFlow source code. Interceptors are declared as Python callable paths in a new `mcpInterceptors` array field and loaded via the existing `resolve_variable` reflection mechanism: ```json { "mcpInterceptors": [ "my_package.mcp.auth:build_auth_interceptor" ] } ``` Each entry must resolve to a no-arg builder function that returns an async interceptor compatible with `MultiServerMCPClient`'s `tool_interceptors` interface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(mcp): add unit tests for custom tool interceptors Cover all branches of the mcpInterceptors loading logic: - valid interceptor loaded and appended to tool_interceptors - multiple interceptors loaded in declaration order - builder returning None is skipped - resolve_variable ImportError logged and skipped - builder raising exception logged and skipped - absent mcpInterceptors field is safe (no-op) - custom interceptors coexist with OAuth interceptor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(mcp): validate mcpInterceptors type and fix lint warnings Address review feedback: 1. Validate mcpInterceptors config value before iterating: - Accept a single string and normalize to [string] - Ignore None silently - Log warning and skip for non-list/non-string types 2. Fix ruff F841 lint errors in tests: - Rename _make_mock_env to _make_patches, embed mock_client - Remove unused `as mock_cls` bindings where not needed - Extract _get_interceptors() helper to reduce repetition 3. 
Add two new test cases for type validation: - test_mcp_interceptors_single_string_is_normalized - test_mcp_interceptors_invalid_type_logs_warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(mcp): validate interceptor return type and fix import mock path Address review feedback: 1. Validate builder return type with callable() check: - callable interceptor → append to tool_interceptors - None → silently skip (builder opted out) - non-callable → log warning with type name and skip 2. Fix test mock path: resolve_variable is a top-level import in tools.py, so mock deerflow.mcp.tools.resolve_variable instead of deerflow.reflection.resolve_variable to correctly intercept calls. 3. Add test_custom_interceptor_non_callable_return_logs_warning to cover the new non-callable validation branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(mcp): add mcpInterceptors example and documentation - Add mcpInterceptors field to extensions_config.example.json - Add "Custom Tool Interceptors" section to MCP_SERVER.md with configuration format, example interceptor code, and edge case behavior notes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: IECspace <IECspace@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
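The validation rules accumulated across these commits (string normalised to a list, `None` ignored, invalid types warned about, builder failures and non-callable results skipped) can be sketched as one loader. `load_interceptors` and its `resolve` parameter are hypothetical; `resolve` stands in for DeerFlow's `resolve_variable`, whose exact signature is not shown in this PR.

```python
import logging

logger = logging.getLogger("mcp_interceptors_sketch")


def load_interceptors(config_value, resolve):
    """Build interceptors from an `mcpInterceptors`-style config value.

    `resolve` maps a "module:attr" path to a no-arg builder function.
    """
    if config_value is None:
        return []  # absent field: silently a no-op
    if isinstance(config_value, str):
        config_value = [config_value]  # single string is normalised
    if not isinstance(config_value, list):
        logger.warning("mcpInterceptors must be a list or string, got %s",
                       type(config_value).__name__)
        return []

    interceptors = []
    for path in config_value:
        try:
            interceptor = resolve(path)()
        except Exception:
            # Import errors and builder failures are logged and skipped.
            logger.exception("failed to build interceptor %s", path)
            continue
        if interceptor is None:
            continue  # builder opted out
        if not callable(interceptor):
            logger.warning("interceptor %s returned non-callable %s",
                           path, type(interceptor).__name__)
            continue
        interceptors.append(interceptor)  # declaration order is preserved
    return interceptors
```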
…ytedance#2449) * fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (fixes bytedance#2448) The previous _apply_prompt_caching() attached cache_control to every text block in the system prompt, every content block in the last N messages, and the last tool definition. In multi-turn conversations with structured content blocks this easily exceeded the 4-breakpoint hard limit enforced by both the Anthropic API and AWS Bedrock, producing a 400 Bad Request (or a silent "No generations found in stream" when streaming). Fix: collect all candidate blocks in document order, then apply cache_control only to the last MAX_CACHE_BREAKPOINTS (4) of them. Later breakpoints cover a larger prefix and therefore yield better cache hit rates, making this the optimal placement strategy as well as the safe one. Adds 13 unit tests covering the budget cap, edge cases, and correct last-candidate placement. * docs: clarify _apply_prompt_caching docstring includes tool definitions Per Copilot review: the implementation also caches the last tool definition (see the candidates list at lines 202-205), so the docstring summary should explicitly mention tools alongside system and recent messages. * Fix the lint error * style: fix ruff format check for test_claude_provider_prompt_caching.py Add the missing blank line before the 'Edge cases' section comment so that ruff format --check passes in CI. --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
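The placement strategy from this commit reduces to a slice over the candidate list. This is a sketch, not the real `_apply_prompt_caching`: blocks are modelled as plain dicts, `apply_cache_breakpoints` is a hypothetical name, and the `{"type": "ephemeral"}` marker value is an assumption not stated in the commit message.

```python
MAX_CACHE_BREAKPOINTS = 4


def apply_cache_breakpoints(candidates):
    """Mark only the last MAX_CACHE_BREAKPOINTS candidates, in document order.

    `candidates` is a list of dict blocks (system text, message content,
    last tool definition) already collected in document order.
    """
    for block in candidates[-MAX_CACHE_BREAKPOINTS:]:
        # Assumed marker value; the commit only names the cache_control key.
        block["cache_control"] = {"type": "ephemeral"}
    return candidates
```

Because later breakpoints cover a longer prefix, trimming from the front keeps the most useful ones while staying inside the limit.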
* fix(channels): accept single slack allowed user * docs: address Slack allowed_users review notes * ci: rerun backend unit tests * docs: clarify Slack allowed_users config --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
…ance#2525) * feat(dev): add pre-commit hooks for ruff, eslint, and prettier * fix: use local uv-based ruff hooks and uv run for pre-commit install Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/a1e34cc5-0d4b-4400-9e6a-e687d964ff1e Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* fix(channels): update the logger for the channel config * fix(channels): normalize credential values and add tests for disabled-but-configured warning Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/dfc0a566-aa59-49f9-a74d-610292fb0a63 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> * fix the backend lint error --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* fix(harness): constrain view_image to thread data paths Fixes bytedance#2530 * fix(harness): address view_image review findings * style(harness): format view_image changes * fix(harness): address view_image review comments
* fix(aio-sandbox): redact env values in container logs Fixes bytedance#2534 * fix(aio-sandbox): address env log review comments
* fix(skills): scan skill archives before install Fixes bytedance#2536 * fix(skills): scan archive support files before install * style(skills): format archive installer * fix(skills): address archive install review comments
) * fix(sandbox): prevent local custom mount symlink escapes Fixes bytedance#2506 * fix(sandbox): harden custom mount symlink handling * fix(sandbox): format internal symlink directory listings
* fix(sandbox): block host bash traversal escapes Fixes bytedance#2535 * fix(sandbox): harden local bash path guards * fix(sandbox): avoid bash cd argument false positives * Fix the lint error Add function to resolve and validate user data path. * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
bytedance#2523) * feat(models): adapt models for the MindIE engine * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * test_mindie_provider:format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(mindie): prevent nested tool_call params from leaking into outer args * fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Absorbs 54 upstream commits from bytedance/deer-flow#main onto argus.
Conflict resolution:
- backend/packages/harness/deerflow/models/factory.py — both sides
edited the lines around model_class instantiation. Upstream added
MindIE retry config (commit 2bb1a2d) and stream-usage-by-default
helper (c99865f); ours added per-loop httpx client injection.
Resolved by keeping all three: MindIE retry runs first (mutating
model_settings_from_config), then our final_kwargs construction
plus per-loop client, then model_class(**final_kwargs).
Auto-merges that worked clean:
- prompt.py — upstream renamed present_file → present_files (5ba1dac)
and added the dynamic _build_available_subagents_description helper
(30d619d). Both edit areas are far from our <file_editing> block.
- factory tests, prompt tests — additive; existing argus tests
untouched.
Lockfile bumps:
- lxml 6.0.2 → 6.1.0 (1ca2621). Drops the test_artifacts_router.py
--ignore in argus-ci.yml that was added with the original CI commit.
- pytest 9.0.2 → 9.0.3, langsmith 0.6.4 → 0.7.31, python-dotenv
1.2.1 → 1.2.2. All transparent.
- pytest-asyncio added to test deps (came in with the MindIE provider).
Local test environment shows 8 failures on the merged tree, all
confirmed as environment artifacts of our pre-baked docker image:
- test_mindie_provider.py × 6: image lacks pytest-asyncio. GitHub
Actions installs it fresh from uv.lock and the tests pass.
- test_artifacts_router.py × 2: image's python:3.12-slim base has no
/etc/mime.types, so mimetypes.guess_type('.xhtml') returns None.
Ubuntu runners on GitHub Actions return application/xhtml+xml and
the test passes.
Verified by manually installing pytest-asyncio + lxml 6.1.0 into the
container — MindIE failures all flip to passing; xhtml failures persist
because they depend on the OS-level mime database.
Upstream's lint workflow caught three issues in our patches that the
post-merge tree exposed:
- prompt.py: two lines in the <file_editing> block exceeded
line-length=240 (E501). Tightened the prose; same meaning, both
lines now 224.
- test_postgres_aprune.py: ruff I001 wanted the deerflow import
grouped with other imports, but it has to live below
pytest.importorskip(). Already had # noqa: E402 for that reason;
extended to E402, I001.
- factory.py + test_postgres_aprune.py: ruff format wanted the
weakref dict declaration and two set comprehensions on single
lines. Applied uvx ruff format.
No behavior change.
Summary
Weekly upstream sync. Brings argus up to upstream main tip 395c143.
Upstream commits touching our patched files
Plus smaller dep bumps (pytest, langsmith, python-dotenv, pytest-asyncio added).
Test plan
Notes
Local docker test environment showed 8 failures, all confirmed environment artifacts (missing pytest-asyncio in pre-baked image, missing OS /etc/mime.types). GitHub Actions CI installs deps fresh from uv.lock and runs on Ubuntu — should be clean.