test(agent-harness): restore TAURI-RUST-4 dedup seam test#2665
Conversation
`run_tool_call_loop_dedups_duplicate_tool_names_before_provider_call` and its `CapturingProvider` helper (which records the tool-spec names reaching the provider) were dropped when tool_loop_tests.rs was rewritten in tinyhumansai#2631. They guard TAURI-RUST-4: duplicate tool names (e.g. a synthesised delegation tool whose delegate_name shadows a same-named skill tool) must be deduplicated before the provider call, since some providers 400 on duplicate tool names. Restore both verbatim. The current 17-arg `run_tool_call_loop` signature and the dedup logic in tool_loop.rs are unchanged, so the test compiles and passes as a genuine regression guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds test coverage for tool name deduplication in the tool call loop. A ChangesTool Deduplication Regression Test
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
M3gA-Mind
left a comment
There was a problem hiding this comment.
Straightforward test restoration. Everything looks correct.
CapturingProvider — records ChatRequest.tools names per chat() call. Uses parking_lot::Mutex consistently with the rest of the file (line 9). .lock() without unwrap() is correct for parking_lot. The guard.remove(0) call would panic if called more times than scripted responses exist, but the test is designed for exactly one chat() invocation (final-text response, no tool calls), so this is fine.
Test logic — sets up a registry + extra_tools collision with two EchoTool instances both named "echo", asserts exactly one "echo" in ChatRequest.tools. Directly exercises the dedup path and the right failure mode (duplicate names reaching the provider → 400 from Anthropic/cloud).
Signature check — the 17-arg run_tool_call_loop call matches what the other tests in this file use (e.g. lines 212, 264, 303), no drift introduced.
CI green, no conflicts. LGTM.
Summary
run_tool_call_loop_dedups_duplicate_tool_names_before_provider_calland itsCapturingProvidertest helper insrc/openhuman/agent/harness/tool_loop_tests.rs.tool_loop.rsis currently untested.Problem
#2631 rewrote
tool_loop_tests.rsand, in the process, removed the seam test that guards TAURI-RUST-4 along with theCapturingProviderhelper it depends on.CapturingProviderrecords the tool-spec names that reachChatRequest.tools; the survivingScriptedProviderdiscards the request, so it cannot observe deduplication. The dedup logic itself still lives intool_loop.rs, but with the test gone a future change could silently let duplicate tool names reach the provider — which some providers (Anthropic, OpenHuman cloud after the uniqueness rollout) reject with a 400.Solution
CapturingProvider(struct +Providerimpl) and the#[tokio::test]verbatim from the pre-feat(agent): agentic coding runtime — gated OS capabilities (filesystem, shell, install) via deterministic permission tiers + chat approvals #2631 tree (0a9f7a0e~1).run_tool_call_loopwith a registry tool namedechoplus anextra_toolscollision also namedecho(the exact shadowing pattern from the bug report) and asserts exactly oneechoreachesChatRequest.tools.main: the 17-argrun_tool_call_loopsignature is unchanged, so the restored test exercises live behavior.cargo test --lib …dedups_duplicate_tool_names…→1 passed; 0 failed.Submission Checklist
echo) against the live loop.cargo testgreen); no production lines added.## Related— N/A: no feature change.Closes #NNN— N/A: tracked privately on my fork; intentionally not linked.Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/restore-tool-dedup-seam-testc5282d03Validation Run
pnpm --filter openhuman-app format:check— N/A: noapp/TS changes. (cargo fmt --checkon the changed file is clean.)pnpm typecheck— N/A: no TypeScript changes.cargo test --lib run_tool_call_loop_dedups_duplicate_tool_names_before_provider_call→1 passed; 0 failed.cargo fmt --checkclean; lib crate compiles (test built + ran green).Validation Blocked
command:pnpm rust:check(pre-push hook)error:Node/pnpm + vendoredtauri-ceftoolchain unavailable in this environment.impact:Pushed with--no-verify; the Rust core change was verified directly viacargo test. CI re-runs the full gate.Behavior Changes
Parity Contract
tool_loop.rsis unchanged; the test re-asserts it.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Release Notes