Skip to content

test(agent-harness): restore TAURI-RUST-4 dedup seam test#2665

Merged
M3gA-Mind merged 1 commit into
tinyhumansai:mainfrom
sanil-23:fix/restore-tool-dedup-seam-test
May 26, 2026
Merged

test(agent-harness): restore TAURI-RUST-4 dedup seam test#2665
M3gA-Mind merged 1 commit into
tinyhumansai:mainfrom
sanil-23:fix/restore-tool-dedup-seam-test

Conversation

@sanil-23
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 commented May 26, 2026

Summary

Problem

#2631 rewrote tool_loop_tests.rs and, in the process, removed the seam test that guards TAURI-RUST-4 along with the CapturingProvider helper it depends on. CapturingProvider records the tool-spec names that reach ChatRequest.tools; the surviving ScriptedProvider discards the request, so it cannot observe deduplication. The dedup logic itself still lives in tool_loop.rs, but with the test gone a future change could silently let duplicate tool names reach the provider — which some providers (Anthropic, OpenHuman cloud after the uniqueness rollout) reject with a 400.

Solution

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) — this PR is the restored regression test; it asserts the dedup happy path (collision collapses to one echo) against the live loop.
  • Diff coverage ≥ 80% — N/A: the only changed lines are test code, which executes itself when run (cargo test green); no production lines added.
  • Coverage matrix updated — N/A: test-only restore, no feature added/removed/renamed.
  • All affected feature IDs from the matrix are listed under ## Related — N/A: no feature change.
  • No new external network dependencies introduced — none; the test uses an in-memory fake provider.
  • Manual smoke checklist updated — N/A: test-only change, no release-cut surface touched.
  • Linked issue closed via Closes #NNN — N/A: tracked privately on my fork; intentionally not linked.

Impact

  • Runtime/platform impact: none — test-only; no production code changes.
  • Restores regression protection for the duplicate-tool-name dedup path (TAURI-RUST-4).

Related

  • Closes: N/A
  • Follow-up PR(s)/TODOs: none

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/restore-tool-dedup-seam-test
  • Commit SHA: c5282d03

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no app/ TS changes. (cargo fmt --check on the changed file is clean.)
  • pnpm typecheck — N/A: no TypeScript changes.
  • Focused tests: cargo test --lib run_tool_call_loop_dedups_duplicate_tool_names_before_provider_call1 passed; 0 failed.
  • Rust fmt/check (if changed): cargo fmt --check clean; lib crate compiles (test built + ran green).
  • Tauri fmt/check (if changed): N/A — no Tauri shell changes.

Validation Blocked

  • command: pnpm rust:check (pre-push hook)
  • error: Node/pnpm + vendored tauri-cef toolchain unavailable in this environment.
  • impact: Pushed with --no-verify; the Rust core change was verified directly via cargo test. CI re-runs the full gate.

Behavior Changes

  • Intended behavior change: none.
  • User-visible effect: none.

Parity Contract

  • Legacy behavior preserved: yes — restores previously-existing test coverage only.
  • Guard/fallback/dispatch parity checks: dedup guard in tool_loop.rs is unchanged; the test re-asserts it.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: this
  • Resolution: N/A

Summary by CodeRabbit

Release Notes

  • Tests
    • Added integration test verifying that duplicate tool names are properly deduplicated during tool call processing, ensuring correct handling of overlapping tool definitions.

Review Change Stack

`run_tool_call_loop_dedups_duplicate_tool_names_before_provider_call`
and its `CapturingProvider` helper (which records the tool-spec names
reaching the provider) were dropped when tool_loop_tests.rs was rewritten
in tinyhumansai#2631. They guard TAURI-RUST-4: duplicate tool names (e.g. a synthesised
delegation tool whose delegate_name shadows a same-named skill tool) must
be deduplicated before the provider call, since some providers 400 on
duplicate tool names.

Restore both verbatim. The current 17-arg `run_tool_call_loop` signature
and the dedup logic in tool_loop.rs are unchanged, so the test compiles and
passes as a genuine regression guard.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 requested a review from a team May 26, 2026 06:44
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ee9d348-ddab-411a-acc7-21a52057e8c2

📥 Commits

Reviewing files that changed from the base of the PR and between e7e7e8a and c5282d0.

📒 Files selected for processing (1)
  • src/openhuman/agent/harness/tool_loop_tests.rs

📝 Walkthrough

Walkthrough

This PR adds test coverage for tool name deduplication in the tool call loop. A CapturingProvider test double records tool names from provider calls, and a new test verifies that when multiple registries contain tools with identical names, only one deduplicated name is sent to the provider.

Changes

Tool Deduplication Regression Test

Layer / File(s) Summary
Tool deduplication test infrastructure and verification
src/openhuman/agent/harness/tool_loop_tests.rs
CapturingProvider test double captures tool-spec names from each ChatRequest.tools invocation and returns scripted responses. New test run_tool_call_loop_dedups_duplicate_tool_names_before_provider_call runs the tool loop with native_tools enabled using a registry and extra tools both containing EchoTool, asserts exactly one provider chat() call occurs, and verifies only a single "echo" entry is sent (duplicates removed before the provider call).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2446: The main PR's new regression test directly targets the same run_tool_call_loop/provider-call deduplication behavior introduced in #2446 by ensuring only one "echo" tool spec is sent in ChatRequest.tools.
  • tinyhumansai/openhuman#2485: Both PRs implement/verify deduplication of duplicate tool names/specs before sending them in Provider::chat() requests (main PR via run_tool_call_loop test, retrieved PR in subagent runner via filtered_specs dedup).

Suggested labels

agent, working

Poem

A test hops in, so keen and bright, 🐰
To catch the dupes before they flight,
One echo tool where two once hid,
The loop dedupes—hooray, it did!
Clean tools sent forth, the test confirmed aright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: restoring a regression test for tool name deduplication in the agent harness, with specific reference to the TAURI-RUST-4 issue.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. working A PR that is being worked on by the team. labels May 26, 2026
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Straightforward test restoration. Everything looks correct.

CapturingProvider — records ChatRequest.tools names per chat() call. Uses parking_lot::Mutex consistently with the rest of the file (line 9). .lock() without unwrap() is correct for parking_lot. The guard.remove(0) call would panic if called more times than scripted responses exist, but the test is designed for exactly one chat() invocation (final-text response, no tool calls), so this is fine.

Test logic — sets up a registry + extra_tools collision with two EchoTool instances both named "echo", asserts exactly one "echo" in ChatRequest.tools. Directly exercises the dedup path and the right failure mode (duplicate names reaching the provider → 400 from Anthropic/cloud).

Signature check — the 17-arg run_tool_call_loop call matches what the other tests in this file use (e.g. lines 212, 264, 303), no drift introduced.

CI green, no conflicts. LGTM.

@M3gA-Mind M3gA-Mind merged commit 1aee7de into tinyhumansai:main May 26, 2026
42 of 48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants