
Add corrective-retrieval signal to search responses #36

Open
RuneLind wants to merge 2 commits into main from phase0-corrective-signal

@RuneLind (Owner)

Search responses now carry the signal a client needs to decide whether to re-query: a response-level bestScore, a per-result confidenceBand, a retryHints object, noConfidentResults, and a new min_relevance filter param. Defaults are unchanged — existing callers are unaffected.

This is Phase 0 of the corrective-retrieval / query-decomposition work tracked in mimir/plans/huginn-muninn-corrective-rag.md. Huginn stays a pure retriever; the agentic corrective loop (CRAG-style grade-and-requery) and query decomposition live in Muninn, but Muninn first needs Huginn to expose a cheap, no-LLM quality signal. confidenceBand thresholds are tied to the searcher's existing LOW_CONFIDENCE_THRESHOLD, so a low band means the score falls in the reranker's own noise zone. Rank-based brief-mode results are capped at medium and never claim high.
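The banding rule could look roughly like the sketch below. It is not the code from this PR: the LOW_CONFIDENCE_THRESHOLD value is a hypothetical placeholder (the real constant lives in the searcher and its value isn't stated here), the 0.65 high cutoff is taken from the smoke-test notes further down, and the reranked parameter is an assumed way of telling the function that a score is rank-based.

```python
# Hypothetical value: the real LOW_CONFIDENCE_THRESHOLD lives in the searcher
# and is not stated in this PR.
LOW_CONFIDENCE_THRESHOLD = 0.3
# Assumption based on the smoke test ("0.73 -> high (>= 0.65 threshold)").
HIGH_CONFIDENCE_THRESHOLD = 0.65


def confidence_band(score: float, reranked: bool = True) -> str:
    """Map a relevance score to "high" / "medium" / "low".

    Scores below the searcher's noise floor are "low". Rank-based scores
    (brief mode, reranker skipped) are never promoted above "medium".
    """
    if score < LOW_CONFIDENCE_THRESHOLD:
        return "low"
    if reranked and score >= HIGH_CONFIDENCE_THRESHOLD:
        return "high"
    return "medium"
```

Capping on a flag rather than on the score keeps a rank-derived 0.999 from being reported as high.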

  • confidence_band() + confidenceBand on every shaped result; bestScore on the response (taken before any filter, so callers can tell "found something below your bar" from "found nothing")
  • GraphSearchAugmenter.get_retry_hints() → detectedEntities / graph-neighbour relatedTerms / narrowerQuery / heuristic broaderQuery (the broadening works even without a graph); emitted when results are empty or bestScore is weak
  • min_relevance param on /api/search, build_search_tool_fn, and the HTTP MCP adapter (all signature variants) — drops weak results; when it empties the set, returns retryHints + noConfidentResults instead of low-quality filler
  • SearchTrace.set_response_meta() — additive optional response block (no schema bump); HTTP MCP adapter renders (NN% relevant · band) + a compact retry-hint footer
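A minimal sketch of the min_relevance path, assuming results are already shaped dicts with a relevance key and that retry hints come from a callable. apply_min_relevance and WEAK_BEST_SCORE are hypothetical names, and the weak cutoff value is an assumption.

```python
from typing import Any, Callable, Optional

# Hypothetical "weak bestScore" cutoff; the PR does not state the real value.
WEAK_BEST_SCORE = 0.3


def apply_min_relevance(
    results: list[dict[str, Any]],
    min_relevance: Optional[float],
    get_retry_hints: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Filter shaped results by min_relevance and attach the corrective signal."""
    # bestScore is taken BEFORE filtering, so "found something below your bar"
    # stays distinguishable from "found nothing" (bestScore is None).
    best_score = max((r["relevance"] for r in results), default=None)

    kept = results
    if min_relevance is not None:  # default: no filtering, old behaviour
        kept = [r for r in results if r["relevance"] >= min_relevance]

    response: dict[str, Any] = {"results": kept, "bestScore": best_score}
    weak = best_score is None or best_score < WEAK_BEST_SCORE
    if not kept or weak:
        # Hints are only computed when a re-query is plausibly useful.
        response["retryHints"] = get_retry_hints()
    if not kept:
        # An explicit flag instead of low-quality filler.
        response["noConfidentResults"] = True
    return response
```

Called with a single 0.75 result and min_relevance=0.99, this returns an empty results list with bestScore 0.75, noConfidentResults set and the hints attached, the same shape as the live check in the Testing section.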

Testing

  • Unit tests — 22 new (confidence_band boundaries, shape_search_results band tagging, _broaden_query heuristics, get_retry_hints with/without graph, mcp_search_tool corrective signal incl. trace block); full suite 636 passing
  • Manual testing locally — live-checked /api/search against a wiki-collection server: min_relevance=0.99 → {"results":[],"bestScore":0.75,"noConfidentResults":true,"retryHints":{...}}; normal queries unchanged (confidenceBand + graph_context present)
  • E2E / smoke against the full production server (run huginn-pr-smoke-test after restarting it on this code)

Notes

  • Branch was cut from local main, which was one commit ahead of origin/main — so this PR also carries the pre-existing 7f71a22 Fix extract_jira_graph import when run as a script.
  • Descoped to follow-ups (noted in the plan): retryHints.suggestedCollections (needs entity→collection attribution at graph-load), a reranked: false response flag (so callers know bestScore is rank-based in brief mode), and wiring min_relevance into the stdio MCP tool args.

RuneLind added 2 commits May 12, 2026 10:03
Fix extract_jira_graph import when run as a script

Add a sys.path bootstrap to the repo root before importing main.utils.frontmatter, mirroring the pattern already used in cross_collection_gap_analysis.py.

Commit a90d124 switched this script to import from the main package, but the daily Jira update runs it as `uv run scripts/knowledge_graph/extract_jira_graph.py`, which puts the script's directory on sys.path rather than the repo root. That made `import main` fail with ModuleNotFoundError and silently skipped the Jira graph re-extraction step in the daily job.
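The bootstrap could be written as a small helper like the one below. This is a sketch, not the exact pattern in cross_collection_gap_analysis.py; the helper name is hypothetical and depth=2 assumes the script stays at scripts/knowledge_graph/.

```python
import sys
from pathlib import Path


def add_repo_root_to_sys_path(script_file: str, depth: int = 2) -> Path:
    """Put the repo root on sys.path so `import main` works under `uv run <script>`.

    `depth` is how many directories the script sits below the repo root;
    scripts/knowledge_graph/extract_jira_graph.py is two levels down.
    """
    # parents[0] is the script's own directory, parents[depth] the repo root.
    repo_root = Path(script_file).resolve().parents[depth]
    root_str = str(repo_root)
    if root_str not in sys.path:
        # Prepend so the repo's `main` package wins over anything else on the path.
        sys.path.insert(0, root_str)
    return repo_root
```

Call it as add_repo_root_to_sys_path(__file__) at the top of the script, before the main.utils.frontmatter import; the membership check keeps repeated calls from stacking duplicate entries.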
Search responses (HTTP /api/search, the MCP search tools, and the HTTP
MCP adapter) now carry the signal a client needs to decide whether to
re-query: a response-level bestScore, a per-result confidenceBand
(high/medium/low; rank-based brief-mode results are capped at medium), a
retryHints object (detectedEntities, graph-neighbour relatedTerms,
narrowerQuery, a heuristic broaderQuery), and noConfidentResults. A new
min_relevance param drops weak results and, when it empties the set,
returns retryHints + noConfidentResults instead of low-quality filler.
The search trace gains an additive "response" block recording the same
decisions. Defaults are unchanged, so existing callers are unaffected.
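One way an additive response block can avoid a schema bump is to emit the key only when it has been set. The SearchTrace below is a hypothetical stand-in that shows just that part; the real class's fields and the key names passed to set_response_meta() are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchTrace:
    """Hypothetical stand-in for the real SearchTrace; only the response block is modelled."""

    query: str
    steps: list[dict[str, Any]] = field(default_factory=list)
    response: Optional[dict[str, Any]] = None

    def set_response_meta(self, **meta: Any) -> None:
        # Additive: merge into an optional block; existing trace fields are untouched.
        self.response = {**(self.response or {}), **meta}

    def to_dict(self) -> dict[str, Any]:
        data: dict[str, Any] = {"query": self.query, "steps": self.steps}
        # Emit "response" only when set, so traces from older code keep the
        # same shape and readers that ignore unknown keys need no schema bump.
        if self.response is not None:
            data["response"] = self.response
        return data
```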

This is Phase 0 of the corrective-retrieval / query-decomposition work
(see mimir/plans/huginn-muninn-corrective-rag.md). Huginn stays a pure
retriever; the agentic corrective loop lives in Muninn, which will
consume this contract — but it needs Huginn to expose a cheap, no-LLM
quality signal first. confidenceBand thresholds are tied to the
searcher's existing LOW_CONFIDENCE_THRESHOLD so "low" means the
reranker's own noise zone.

- confidence_band() in search_response_formatter; shape_search_results tags every result
- GraphSearchAugmenter.get_retry_hints() + _broaden_query() (works with no graph)
- SearchTrace.set_response_meta(); optional "response" key, no schema bump
- min_relevance threaded through build_search_tool_fn and the HTTP adapter's 4 signature variants
- 22 new tests; full suite (636) green; live-checked against a wiki-collection server
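The retry-hint contract might be sketched as follows. The graph is modelled as a plain adjacency dict, and the stopword-dropping broaden_query, the substring entity match and the "append a neighbour term" narrowing are all hypothetical heuristics standing in for the real _broaden_query() and get_retry_hints(); only the output keys come from this PR.

```python
from typing import Any, Optional

# Hypothetical stopword list; the real _broaden_query() heuristic is not described.
_STOPWORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "or", "the", "to", "what"}


def broaden_query(query: str, keep: int = 2) -> Optional[str]:
    """Drop filler words and keep only the first `keep` content terms."""
    terms = [t for t in query.split() if t.lower() not in _STOPWORDS]
    if len(terms) <= keep:
        return None  # already as broad as this heuristic can make it
    return " ".join(terms[:keep])


def get_retry_hints(
    query: str,
    graph: Optional[dict[str, list[str]]] = None,
    max_related: int = 5,
) -> dict[str, Any]:
    """Build a retryHints-shaped dict; graph maps entity name -> neighbour names."""
    hints: dict[str, Any] = {}
    broader = broaden_query(query)
    if broader:
        hints["broaderQuery"] = broader
    if not graph:
        return hints  # no graph: heuristic broadening is the only hint

    lowered = query.lower()
    # Naive substring match of known entity names against the query.
    entities = [name for name in graph if name.lower() in lowered]
    if not entities:
        return hints
    hints["detectedEntities"] = entities

    related: list[str] = []
    for name in entities:
        for neighbour in graph[name]:
            if neighbour not in entities and neighbour not in related:
                related.append(neighbour)
    if related:
        hints["relatedTerms"] = related[:max_related]
        # Narrow by pinning the query to the first neighbour term (assumption).
        hints["narrowerQuery"] = f"{query} {related[0]}"
    return hints
```

Without a graph only broaderQuery is returned, in line with the "works with no graph" note; the substring match is deliberately naive and would also fire on, say, "rag" inside "storage".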
@RuneLind (Owner, Author)

Smoke test — green ✅

Server PID 1862 started 20:16, after commit 030d49c (20:13) — running the new code. /health lists all 12 collections.

Direct /api/search (brief + full):

  • New fields present and correct: bestScore (top-level), per-result confidenceBand, traceId (trace env on).
  • confidenceBand reflects the rerank state honestly: brief-mode (rank-based) results → medium (never high); a Norwegian query that does rerank (q=hva er lovvalg, i.e. "what is choice of law", on nav-wiki) → top hit relevance: 0.999, confidenceBand: high; second hit 0.73 → high (≥0.65 threshold).
  • relevance rounded to 3 decimals; breadcrumb, metadata, graph_context, modifiedTime, matchedChunks[].relevance all intact. snippet brief-only, matchedChunks full-only.
  • grep -c "_score\|_reranked" → 0 in both modes (no internal-field leak).
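The grep check can also be done structurally on the parsed JSON. This sketch (leaked_keys is a hypothetical helper) walks every nested key; unlike grep it ignores values, so a snippet that happens to mention _score is not counted as a leak.

```python
from typing import Any

# Substrings that mark internal ranking fields which must not reach clients.
INTERNAL_MARKERS = ("_score", "_reranked")


def leaked_keys(payload: Any, path: str = "$") -> list[str]:
    """Return JSON-path-like locations of keys that look like internal fields."""
    found: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if any(marker in key for marker in INTERNAL_MARKERS):
                found.append(child)
            found.extend(leaked_keys(value, child))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            found.extend(leaked_keys(item, f"{path}[{i}]"))
    return found
```

Applied to json.loads() of an /api/search body, an empty list corresponds to the grep count of 0; note that bestScore does not match, since the markers are lowercase and underscore-prefixed.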

min_relevance path: ?q=knowledge+graph+rag&min_relevance=0.95 → {"results":[],"bestScore":0.75,"noConfidentResults":true,"retryHints":{"detectedEntities":["RAG","Knowledge Graph","Graph RAG"],"relatedTerms":["Supabase","Pinecone",…]}}.

HTTP MCP adapter (knowledge_api_mcp_adapter._search_knowledge_impl, live): result lines render (NN% relevant · band); when min_relevance empties the set → No results found ... *No confident match — try: related terms: …* + trace-url pointer. The bot-facing text shape is preserved and augmented.
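The adapter's text rendering might look roughly like this sketch. The function names, the rounding to a whole percent and the footer wording are assumptions; the footer here is a simplified version of the output quoted above, not the adapter's exact string.

```python
from typing import Any


def render_result_line(title: str, relevance: float, band: str) -> str:
    """Append the "(NN% relevant · band)" tag to a result title."""
    return f"{title} ({round(relevance * 100)}% relevant · {band})"


def render_retry_footer(hints: dict[str, Any], max_terms: int = 3) -> str:
    """Compact one-line footer built from retryHints; empty when there is nothing to suggest."""
    parts = []
    if hints.get("relatedTerms"):
        parts.append("related terms: " + ", ".join(hints["relatedTerms"][:max_terms]))
    if hints.get("broaderQuery"):
        parts.append(f'broader: "{hints["broaderQuery"]}"')
    if not parts:
        return ""
    return "*No confident match, try: " + "; ".join(parts) + "*"
```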

Caveat (known, documented in plan): brief mode skips the reranker, so bestScore/confidenceBand are rank-based there and don't trip the weak-result gate — brief-mode corrective decisions should lean on noConfidentResults + a snippet-reading grader, not bestScore. (Phase 1 / Muninn concern.)
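Following that caveat, a caller-side decision could branch on whether the scores were reranked. This is a hypothetical sketch of a Muninn-style check, not code from either repo: since the reranked response flag is still a follow-up, the caller passes it in, and WEAK_BEST_SCORE is an assumed cutoff.

```python
from typing import Any

# Hypothetical cutoff for a weak reranked bestScore; not specified in the PR.
WEAK_BEST_SCORE = 0.3


def corrective_action(response: dict[str, Any], reranked: bool) -> str:
    """Return "requery", "grade" or "accept" for one search response."""
    if response.get("noConfidentResults") or not response.get("results"):
        return "requery"  # use response.get("retryHints") to build the next query
    if not reranked:
        # Brief mode: bestScore is rank-based, so don't gate on it; hand the
        # snippets to a grader instead.
        return "grade"
    best = response.get("bestScore")
    if best is None or best < WEAK_BEST_SCORE:
        return "requery"
    return "accept"
```

"grade" routes brief-mode results to the snippet-reading grader rather than trusting a rank-based bestScore.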

Bot leg skipped: hivemind isn't active in this session, so no muninn peer to drive a bot search + read the muninn-db trace. Change is purely additive (new response keys, default-off param) and the two layers below the bot are verified above; the 22 new unit tests + full suite (636) are green.
