feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532) by iamcxa · Pull Request #1418 · DataRecce/recce

iamcxa · 2026-06-03T09:54:52Z

Closes DRC-3532.

Why (cross-repo, DRC-3532)

The recce-cloud summary agent wants to deep-link each factual statement in its PR
summary to the exact run it executed, via deterministic {{run:<run_id>}} inline
markers (server-side marker replacement already exists in recce-cloud-infra's
recce_task_func). But the agent never received the run_id: the cloud-backend
MCP tools that back the agent's analysis (row_count_diff, profile_diff,
value_diff, query, query_diff, top_k_diff, histogram_diff) route through
_tool_run_backed, which returned only run["result"] and dropped run_id.

Without run_id the agent cannot cite runs deterministically, forcing fragile
server-side fuzzy prose matching (low/unstable coverage, occasional wrong links).

What

_tool_run_backed now merges run_id into the result dict (additive — existing
result fields preserved). Only added when the response actually carries a run_id
and the result is a dict, so run-less or non-dict responses are untouched (the
agent must never be handed, or synthesize, a run_id it was not given).

Test

tests/test_mcp_cloud_backend.py:

new: run-backed result surfaces run_id (merged, fields preserved)
new: run-less response leaves the result untouched (no invented run_id)
updated: the existing run-tool routing test now expects the surfaced run_id

40 cloud-backend tests + 123 local mcp_server tests green.

Paired change

recce-cloud-infra: the summary agent prompt emits {{run:<run_id>}} markers
using this run_id; fuzzy linkify is demoted to a legacy fallback. Tracked under
DRC-3532. Durable structured-citation design is DRC-3634.

🤖 Generated with Claude Code

Consumed by (cross-repo)

recce-cloud-infra PR DataRecce/recce-cloud-infra#1427 (Summary v2: trust architecture — value-grounding + run citations) depends on this PR. The instance-launcher image bakes the recce build from this branch; #1427's run-citation deep links ([value](…/runs/<id>)) and profile_diff-grounded row counts require the changes here. Suggested merge order: this PR → recce release/image → #1427.

Also on this branch — `profile_diff` lowercase-column fix (DRC-3674 B)

Commit 889a01b7 (fix(profile): resolve lowercase columns to physical names on case-folding warehouses): ProfileDiffTask / ProfileTask filtered requested columns with a case-sensitive column.name in selected_columns test. On Snowflake (physical names UPPERCASE) the agent's lowercase column names matched nothing → the filter dropped every column → empty profile → the cloud summary suppressed the whole analysis. Fix lowercases both sides of the membership test (no-op on duckdb; SQL path unchanged). Regression test reproduces the empty-profile case with an uppercase-header duckdb table. This unblocks #1427's CLV/profile grounding (which was intermittently empty in UAT depending on the LLM's column casing).

…C-3532) The cloud-backend MCP tools (row_count_diff, profile_diff, value_diff, query, query_diff, top_k_diff, histogram_diff) routed through _tool_run_backed returned only run["result"], dropping run_id. The recce-cloud summary agent therefore never saw the run_id and could not emit deterministic {{run:<run_id>}} citation markers, forcing fragile server-side fuzzy prose matching. Merge run_id into the result dict (additive; existing result fields preserved). Only added when the response carries a run_id and the result is a dict, so run-less or non-dict responses are untouched (never synthesize a run_id). Cross-repo (DRC-3532): the recce-cloud summary agent prompt is updated to emit the markers; server-side marker replacement already exists in recce_task_func. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

codecov · 2026-06-03T10:03:30Z

Codecov Report

❌ Patch coverage is 93.26425% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/mcp_server.py	66.66%	13 Missing ⚠️

Files with missing lines	Coverage Δ
recce/tasks/profile.py	`83.33% <100.00%> (+0.28%)`	⬆️
tests/tasks/test_profile.py	`99.13% <100.00%> (+0.21%)`	⬆️
tests/test_mcp_cloud_backend.py	`100.00% <100.00%> (ø)`
tests/test_mcp_server.py	`99.87% <100.00%> (+<0.01%)`	⬆️
recce/mcp_server.py	`90.12% <66.66%> (-1.15%)`	⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…s state Local-mode ad-hoc diff tools (row_count_diff, profile_diff, query, query_diff, value_diff) now route through _tool_run_backed_local, which uses the same submit_run machinery as the recce server's run_router to persist Runs to default_context().runs. The tool result carries run_id alongside the diff output, matching the CloudBackend._tool_run_backed shape (commit 98670f9). When default_context() is None (cloud SSE path already handled by CloudBackend), the method falls back to direct task execution without run_id — identical to pre-DRC-3634 behaviour, no double execution. The MCP server remains the sole state exporter; runs created mid-session via _tool_run_backed_local are included in the exported payload because they land in the same in-process RecceContext.runs list that export_state() serialises. Tests: 12 new tests in TestLocalModeRunBacked covering run_id presence, run persistence in context.runs, run type mapping (query/query_base), and the isolation boundary (value_diff_detail stays outside run-backed scope). 175 MCP tests green (135 server + 40 cloud backend); ruff clean. Signed-off-by: Kent <iamcxa@gmail.com>

…ding warehouses profile_diff/profile filtered the requested `columns` with a case-sensitive membership test (`column.name in selected_columns`). get_columns() returns physical catalog names, which case-folding warehouses (e.g. Snowflake) store UPPERCASE, while the Recce Cloud summary agent supplies lowercase manifest-convention names. The exact-case filter dropped every column, so the per-column profiling loop never ran and the tool returned a completely empty result ({"base":{"columns":[],"data":[]},"current":{"columns":[],"data":[]}}). This made the cloud summary non-deterministic: lowercase column names yielded an empty profile that the downstream trust gate suppressed. Fix: lowercase both sides of the membership test before filtering, mirroring the case-insensitive normalisation in valuediff's _build_column_case_lookup. The SQL path is unchanged — it still profiles via the physical column.name — so lowercase input now resolves to the same physical (uppercase) columns and returns the same non-empty profile as uppercase input. No-op on already-lowercase duckdb models and on quoted/case-sensitive identifiers; applied to both ProfileDiffTask and ProfileTask. Adds regression tests using an UPPERCASE duckdb CSV header (duckdb preserves header casing, reproducing Snowflake physical-name casing end-to-end with no mocking): asserts lowercase and uppercase column requests return identical non-empty profiles, plus an adapter-safety no-op test on lowercase physical names. DRC-3674 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

…sertions test_tool_row_count_diff and test_tool_query_with_base_flag asserted exact equality with the mock result, but run-backed local mode (DRC-3532, _tool_run_backed_local) now merges a run_id key into the tool result. Compare on the result fields ignoring run_id; the dedicated TestLocalModeRunBacked tests cover run_id surfacing. Fixes the Test DBT Versions CI failure on PR #1418. 135 test_mcp_server tests green. Signed-off-by: Kent <iamcxa@gmail.com>

kentwelcome

Claude Code Review: 1 blocker (silent error swallowing in local run-backed tools), 2 issues. See review comment.

kentwelcome · 2026-06-15T08:27:12Z

Code Review: PR #1418

SHA 2e8dd51d · Verdict NO-GO

Blockers

recce/mcp_server.py:613-624 — Failed runs return a success-shaped result, silently swallowing the error.
Evidence: submit_run.fn (recce/apis/run_func.py:245-258) catches every exception except DuckDBExternalAccessBlocked, stores it in run.error, and return None — so await asyncio.wrap_future(future) (line 611) never raises. _tool_run_backed_local then reads only run.result (None → {}) and never inspects run.status/run.error, returning {"run_id": ...}. Before this PR, task.execute() raised and call_tool re-raised (mcp_server.py:1473) so the MCP SDK set isError=True. Now a failing query/query_diff/profile_diff/value_diff/row_count_diff in local mode (bad SQL, missing model, permission denied) returns an empty success — the agent loses both the error and the _classify_db_error classification (TABLE_NOT_FOUND / PERMISSION_DENIED / SYNTAX_ERROR), and a citation agent can report "empty/no diff" for a query that actually errored.
Pass D.

Concrete example — agent calls query_diff with a non-existent column:
```
query_diff({"sql_template": "SELECT bad_col FROM {{ ref('orders') }}", "primary_keys": ["id"]})
```
- Before: QueryDiffTask.execute() raises duckdb BinderException('Referenced column "bad_col" not found'); call_tool re-raises → MCP response isError=True carrying the message. Agent knows it failed.
- After: submit_run.fn catches the BinderException (it is not DuckDBExternalAccessBlocked), sets run.status=FAILED / run.error="Referenced column ... not found" / run.result=None, and returns None. Future resolves cleanly; _tool_run_backed_local maps run.result is None → {} and returns a success result:
```
{ "run_id": "a1b2c3d4-..." }
```
  No data, no error. The summary agent reads this as "ran fine, empty diff" and can emit {{run:a1b2c3d4}} citing a run that actually errored.
Exact culprit lines (the failure is the absence of a status check across three lines):
```
609    run, future = _submit_run_fn(run_type, params, triggered_by="recce_ai")
610    # Await the executor future so run.result is populated before we return.
611    await asyncio.wrap_future(future)        # <-- discards return value, never raises:
                                                #     submit_run.fn returns None on error,
                                                #     so the future resolves "successfully"
612    # Normalise run.result to a plain dict (tasks may return Pydantic models).
613    raw_result = run.result
614    if raw_result is None:                    # <-- FAILED run (result=None, msg in run.error)
615        result: Dict[str, Any] = {}           #     collapses into empty {}, status/error ignored
...
624    return {**result, "run_id": str(run.run_id)}   # <-- success-shaped dict returned unconditionally
```
- Line 611 awaits but never raises — the error signal is lost here (the line 610 comment assumes success).
- Lines 614-615 treat a FAILED run identically to a genuinely-empty result; nothing reads run.status/run.error.
- Line 624 returns {"run_id": ...} regardless of failure.
Lines 614-615 are not wrong in isolation — a successful task can legitimately return empty. The bug is that nothing distinguishes "succeeded empty" from "failed," because line 611 already swallowed that signal.

Suggested fix — add a guard between 611 and 613:
```
await asyncio.wrap_future(future)
if run.status == RunStatus.FAILED:
    raise RecceException(run.error or f"{run_type} run failed")
raw_result = run.result
```
This re-raises so call_tool (mcp_server.py:1450-1474) restores isError=True, while still leaving the persisted Run in the context for citation.

Issues

recce/mcp_server.py (tool descriptions) — The 5 run-backed tools now return a top-level run_id, but none of their Tool(...description=...) blocks document it.
Evidence: grep run_id over the descriptions finds only run_check (line 1207, unrelated). CLAUDE.md → "MCP tool description = LLM agent contract. Description MUST match actual response format" and "Prefer additive (_meta) over modifying existing field types." A new top-level field is a response-shape change the contract doesn't declare.
Pass C.
PR description omits the DRC-3634 local-mode rewrite — the largest change in the diff.
Evidence: the "What" section describes only CloudBackend._tool_run_backed merging run_id (additive). The actual diff reroutes row_count_diff/query/query_diff/profile_diff/value_diff through a new _tool_run_backed_local that executes via submit_run, persists Runs to the in-process context, and changes return shape + error semantics. That behavior change is documented only in inline # DRC-3634 comments — exactly where the Blocker hides.
Pass F.

Notes

tests/test_mcp_server.py::TestLocalModeRunBacked covers only happy paths (every task.execute is mocked to return a model). No test exercises a raising task, so the error-swallowing regression above is uncaught by the suite. Adding a "task raises → tool surfaces error (isError), not {run_id}" case would have flagged it. Pass E.
_tool_run_backed_local fallback branch (default_context() is None, lines 591-606) preserves the old raise-on-error behavior, but in live local mode default_context() is set, so the swallowing path (lines 609-624) is the one that runs.

Limits

Local environment is missing the mcp and dbt modules (pytest.importorskip("mcp") skipped the MCP suites; tests/tasks conftest fails on No module named 'dbt'). Could not execute the test suite here — author reports 40 cloud-backend + 123 mcp_server green; CI gates. Findings above are from static tracing, not a failed test run.

iamcxa and others added 2 commits June 13, 2026 00:05

iamcxa requested a review from kentwelcome June 15, 2026 04:05

iamcxa added 2 commits June 15, 2026 12:08

Merge branch 'main' into feature/drc-3532-mcp-expose-run-id

2e8dd51

kentwelcome requested changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418

feat(mcp): surface run_id in run-backed tool results for citation (DRC-3532)#1418
iamcxa wants to merge 5 commits into
mainfrom
feature/drc-3532-mcp-expose-run-id

iamcxa commented Jun 3, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

kentwelcome left a comment

Uh oh!

kentwelcome commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

iamcxa commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why (cross-repo, DRC-3532)

What

Test

Paired change

Consumed by (cross-repo)

Also on this branch — profile_diff lowercase-column fix (DRC-3674 B)

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kentwelcome left a comment

Choose a reason for hiding this comment

Uh oh!

kentwelcome commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #1418

Blockers

Issues

Notes

Limits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

iamcxa commented Jun 3, 2026 •

edited

Loading

Also on this branch — `profile_diff` lowercase-column fix (DRC-3674 B)

codecov Bot commented Jun 3, 2026 •

edited

Loading

kentwelcome commented Jun 15, 2026 •

edited

Loading