feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532) #1418
iamcxa wants to merge 5 commits
…C-3532)
The cloud-backend MCP tools (row_count_diff, profile_diff, value_diff, query,
query_diff, top_k_diff, histogram_diff) routed through _tool_run_backed returned
only run["result"], dropping run_id. The recce-cloud summary agent therefore
never saw the run_id and could not emit deterministic {{run:<run_id>}} citation
markers, forcing fragile server-side fuzzy prose matching.
Merge run_id into the result dict (additive; existing result fields preserved).
Only added when the response carries a run_id and the result is a dict, so
run-less or non-dict responses are untouched (never synthesize a run_id).
Cross-repo (DRC-3532): the recce-cloud summary agent prompt is updated to emit
the markers; server-side marker replacement already exists in recce_task_func.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Codecov Report
❌ Patch coverage is ... and 4 files with indirect coverage changes.
…s state
Local-mode ad-hoc diff tools (row_count_diff, profile_diff, query, query_diff, value_diff) now route through _tool_run_backed_local, which uses the same submit_run machinery as the recce server's run_router to persist Runs to default_context().runs. The tool result carries run_id alongside the diff output, matching the CloudBackend._tool_run_backed shape (commit 98670f9).
When default_context() is None (cloud SSE path already handled by CloudBackend), the method falls back to direct task execution without run_id (identical to pre-DRC-3634 behaviour, no double execution).
The MCP server remains the sole state exporter; runs created mid-session via _tool_run_backed_local are included in the exported payload because they land in the same in-process RecceContext.runs list that export_state() serialises.
Tests: 12 new tests in TestLocalModeRunBacked covering run_id presence, run persistence in context.runs, run type mapping (query/query_base), and the isolation boundary (value_diff_detail stays outside run-backed scope). 175 MCP tests green (135 server + 40 cloud backend); ruff clean.
Signed-off-by: Kent <iamcxa@gmail.com>
…ding warehouses
profile_diff/profile filtered the requested `columns` with a case-sensitive
membership test (`column.name in selected_columns`). get_columns() returns
physical catalog names, which case-folding warehouses (e.g. Snowflake) store
UPPERCASE, while the Recce Cloud summary agent supplies lowercase
manifest-convention names. The exact-case filter dropped every column, so the
per-column profiling loop never ran and the tool returned a completely empty
result ({"base":{"columns":[],"data":[]},"current":{"columns":[],"data":[]}}).
This made the cloud summary non-deterministic: lowercase column names yielded an
empty profile that the downstream trust gate suppressed.
Fix: lowercase both sides of the membership test before filtering, mirroring the
case-insensitive normalisation in valuediff's _build_column_case_lookup. The SQL
path is unchanged — it still profiles via the physical column.name — so lowercase
input now resolves to the same physical (uppercase) columns and returns the same
non-empty profile as uppercase input. No-op on already-lowercase duckdb models and
on quoted/case-sensitive identifiers; applied to both ProfileDiffTask and
ProfileTask.
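The case-insensitive membership test described above can be sketched roughly as follows. This is a minimal illustration, not the code from ProfileDiffTask or ProfileTask: the `Column` dataclass and the `filter_columns` helper are hypothetical names, and returning every column when no selection is given is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Column:
    """Hypothetical stand-in for a catalog column; `name` is the physical name."""
    name: str


def filter_columns(columns: Iterable[Column], selected: Optional[List[str]]) -> List[Column]:
    """Keep the columns whose physical name matches a requested name, ignoring case."""
    if not selected:
        # Assumption: no explicit selection means "profile every column".
        return list(columns)
    wanted = {name.lower() for name in selected}
    # Lowercase both sides of the membership test; the returned objects keep
    # their physical names, so any SQL built from them is unchanged.
    return [column for column in columns if column.name.lower() in wanted]


# Physical names as a case-folding warehouse would report them.
physical = [Column("ID"), Column("AMOUNT"), Column("STATUS")]
lower = filter_columns(physical, ["id", "amount"])
upper = filter_columns(physical, ["ID", "AMOUNT"])
```

Both requests resolve to the same physical columns, which is the property the regression tests assert.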
Adds regression tests using an UPPERCASE duckdb CSV header (duckdb preserves
header casing, reproducing Snowflake physical-name casing end-to-end with no
mocking): asserts lowercase and uppercase column requests return identical
non-empty profiles, plus an adapter-safety no-op test on lowercase physical names.
DRC-3674
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…sertions
test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local) now merges a run_id key into the tool result. Compare on the result fields ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing.
Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green.
Signed-off-by: Kent <iamcxa@gmail.com>
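One way to compare on the result fields while ignoring run_id is sketched below. The `without_run_id` helper and the sample payload are hypothetical; the actual tests may express the comparison differently.

```python
def without_run_id(result: dict) -> dict:
    """Return a copy of a tool result with the additive run_id key removed."""
    return {key: value for key, value in result.items() if key != "run_id"}


# Hypothetical mock payload and the run-backed result that wraps it.
mock_result = {"orders": {"base": 100, "curr": 105}}
tool_result = {**mock_result, "run_id": "run-123"}

# Exact equality would now fail because of the extra key; compare the rest.
assert without_run_id(tool_result) == mock_result
```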
kentwelcome left a comment
Claude Code Review: 1 blocker (silent error swallowing in local run-backed tools), 2 issues. See review comment.
Closes DRC-3532.
Why (cross-repo, DRC-3532)
The recce-cloud summary agent wants to deep-link each factual statement in its PR
summary to the exact run it executed, via deterministic
`{{run:<run_id>}}` inline markers (server-side marker replacement already exists in recce-cloud-infra's
recce_task_func). But the agent never received the run_id: the cloud-backend MCP tools that back the agent's analysis (row_count_diff, profile_diff,
value_diff, query, query_diff, top_k_diff, histogram_diff) route through
_tool_run_backed, which returned only `run["result"]` and dropped `run_id`. Without run_id the agent cannot cite runs deterministically, forcing fragile
server-side fuzzy prose matching (low/unstable coverage, occasional wrong links).
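For readers unfamiliar with the marker scheme, deterministic replacement could look roughly like the sketch below. This is not the recce_task_func implementation, which lives in another repository: the `replace_run_markers` helper, the link text, the base URL, and the choice to leave unknown markers untouched are all assumptions for illustration.

```python
import re

# Matches markers of the form {{run:<run_id>}} and captures the id.
RUN_MARKER = re.compile(r"\{\{run:([^{}\s]+)\}\}")


def replace_run_markers(text: str, known_run_ids: set, base_url: str) -> str:
    """Turn each marker for a known run into a markdown deep link."""

    def _link(match: re.Match) -> str:
        run_id = match.group(1)
        if run_id not in known_run_ids:
            # Assumption: a marker for a run we never executed is left as-is.
            return match.group(0)
        return f"[run]({base_url}/runs/{run_id})"

    return RUN_MARKER.sub(_link, text)


summary = "Row count rose by 5 {{run:abc-1}}."
linked = replace_run_markers(summary, {"abc-1"}, "https://example.test")
```

Because the lookup is an exact id match, a marker either resolves to the run the agent actually executed or stays unresolved; there is no prose matching involved.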
What
`_tool_run_backed` now merges `run_id` into the result dict (additive; existing result fields preserved). Only added when the response actually carries a run_id
and the result is a dict, so run-less or non-dict responses are untouched (the
agent must never be handed, or synthesize, a run_id it was not given).
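The merge rule can be sketched as below. This is an illustration of the behaviour described, not the body of `_tool_run_backed`; the `surface_run_id` name and the shape of the `run` response dict are assumptions.

```python
from typing import Any


def surface_run_id(run: dict) -> Any:
    """Return the run's result, adding run_id only when it is safe to do so."""
    result = run.get("result")
    run_id = run.get("run_id")
    # Merge only when the response carries a run_id and the result is a dict;
    # never invent an id, and never reshape a non-dict result.
    if run_id and isinstance(result, dict):
        return {**result, "run_id": run_id}
    return result


with_id = surface_run_id({"run_id": "abc-1", "result": {"rows": 3}})
without_id = surface_run_id({"result": {"rows": 3}})
non_dict = surface_run_id({"run_id": "abc-1", "result": [1, 2, 3]})
```

Existing result fields pass through unchanged in all three cases; only the first gains a `run_id` key.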
Test
tests/test_mcp_cloud_backend.py: 40 cloud-backend tests + 123 local mcp_server tests green.
Paired change
recce-cloud-infra: the summary agent prompt emits
`{{run:<run_id>}}` markers using this run_id; fuzzy linkify is demoted to a legacy fallback. Tracked under
DRC-3532. Durable structured-citation design is DRC-3634.
🤖 Generated with Claude Code
Consumed by (cross-repo)
recce-cloud-infra PR DataRecce/recce-cloud-infra#1427 (Summary v2: trust architecture — value-grounding + run citations) depends on this PR. The instance-launcher image bakes the recce build from this branch; #1427's run-citation deep links (`[value](…/runs/<id>)`) and `profile_diff`-grounded row counts require the changes here. Suggested merge order: this PR → recce release/image → #1427.
Also on this branch: `profile_diff` lowercase-column fix (DRC-3674 B)
Commit `889a01b7` (fix(profile): resolve lowercase columns to physical names on case-folding warehouses): `ProfileDiffTask`/`ProfileTask` filtered requested columns with a case-sensitive `column.name in selected_columns` test. On Snowflake (physical names UPPERCASE) the agent's lowercase column names matched nothing → the filter dropped every column → empty profile → the cloud summary suppressed the whole analysis. The fix lowercases both sides of the membership test (no-op on duckdb; SQL path unchanged). The regression test reproduces the empty-profile case with an uppercase-header duckdb table. This unblocks #1427's CLV/profile grounding, which was intermittently empty in UAT depending on the LLM's column casing.