Add corrective-retrieval signal to search responses by RuneLind · Pull Request #36 · RuneLind/huginn

RuneLind · 2026-05-12T18:18:42Z

Search responses now carry the signal a client needs to decide whether to re-query: a response-level bestScore, a per-result confidenceBand, a retryHints object, a noConfidentResults flag, and a new min_relevance filter parameter. Defaults are unchanged, so existing callers are unaffected.

This is Phase 0 of the corrective-retrieval / query-decomposition work tracked in mimir/plans/huginn-muninn-corrective-rag.md. Huginn stays a pure retriever. The agentic corrective loop (CRAG-style grade-and-requery) and query decomposition live in Muninn, but Muninn first needs Huginn to expose a cheap, no-LLM quality signal. confidenceBand thresholds are tied to the searcher's existing LOW_CONFIDENCE_THRESHOLD, so a "low" band means the score sits in the reranker's own noise zone. Rank-based brief-mode results are capped at medium and never claim high.
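A minimal sketch of how the banding could work. The 0.65 cut-off for "high" matches the threshold quoted in the smoke-test comment on this PR; the numeric value of LOW_CONFIDENCE_THRESHOLD is not given here, so the 0.3 below is a placeholder, and the `reranked` argument is an assumed way of telling rank-based scores apart.

```python
from typing import Literal

Band = Literal["high", "medium", "low"]

# Placeholder values for this sketch. 0.65 matches the "high" cut-off seen in
# the smoke test; the searcher's real LOW_CONFIDENCE_THRESHOLD may differ.
LOW_CONFIDENCE_THRESHOLD = 0.3
HIGH_BAND_THRESHOLD = 0.65


def confidence_band(score: float, reranked: bool = True) -> Band:
    """Map a relevance score to a coarse confidence band.

    Scores below the reranker's noise floor are "low". Rank-based scores
    (brief mode, reranker skipped) never reach "high", because a rank says
    nothing about absolute relevance.
    """
    if score < LOW_CONFIDENCE_THRESHOLD:
        return "low"
    if reranked and score >= HIGH_BAND_THRESHOLD:
        return "high"
    return "medium"
```

With these placeholder thresholds, the smoke-test numbers map as reported: reranked scores of 0.999 and 0.73 both land in high, while the same 0.999 as a rank-based score is capped at medium.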

- confidence_band() + confidenceBand on every shaped result; bestScore on the response (taken before any filter, so callers can tell "found something below your bar" from "found nothing").
- GraphSearchAugmenter.get_retry_hints() → detectedEntities / graph-neighbour relatedTerms / narrowerQuery / heuristic broaderQuery (the broadening works with no graph); emitted when results are empty or bestScore is weak.
- min_relevance param on /api/search, build_search_tool_fn, and the HTTP MCP adapter (all signature variants): drops weak results; when it empties the set, returns retryHints + noConfidentResults instead of low-quality filler.
- SearchTrace.set_response_meta(): additive optional response block (no schema bump); the HTTP MCP adapter renders (NN% relevant · band) + a compact retry-hint footer.
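The min_relevance path can be sketched as a single post-processing step. This is an illustration rather than Huginn's code: `apply_min_relevance` is a hypothetical name, results are assumed to be dicts with a `relevance` key, the retry hints are passed in already computed, and only the "filter emptied the set" trigger is modelled (not the weak-bestScore one).

```python
from typing import Any, Optional


def apply_min_relevance(
    results: list[dict[str, Any]],
    min_relevance: Optional[float],
    retry_hints: dict[str, Any],
) -> dict[str, Any]:
    """Filter weak results and attach a corrective signal if none survive."""
    # bestScore is taken before filtering, so "found something below your
    # bar" (bestScore set, results empty) differs from "found nothing"
    # (bestScore None).
    best_score = max((r["relevance"] for r in results), default=None)
    response: dict[str, Any] = {"results": results, "bestScore": best_score}
    if min_relevance is None:
        return response  # default path: no filtering, same results as before
    kept = [r for r in results if r["relevance"] >= min_relevance]
    response["results"] = kept
    if not kept:
        # Return hints for a re-query instead of low-quality filler.
        response["noConfidentResults"] = True
        response["retryHints"] = retry_hints
    return response
```

Called with min_relevance=0.99 on results whose best relevance is 0.75, this produces the same shape as the manual check in the Testing section: an empty results list, bestScore 0.75, noConfidentResults true, plus the retry hints.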

Testing

- Unit tests: 22 new (confidence_band boundaries, shape_search_results band tagging, _broaden_query heuristics, get_retry_hints with/without graph, mcp_search_tool corrective signal incl. trace block); full suite 636 passing.
- Manual testing locally: live-checked /api/search against a wiki-collection server: min_relevance=0.99 → {"results":[],"bestScore":0.75,"noConfidentResults":true,"retryHints":{...}}; normal queries unchanged (confidenceBand + graph_context present).
- E2E / smoke against the full production server (run huginn-pr-smoke-test after restarting it on this code).

Notes

- Branch was cut from local main, which was one commit ahead of origin/main, so this PR also carries the pre-existing commit 7f71a22 "Fix extract_jira_graph import when run as a script".
- Descoped to follow-ups (noted in the plan): retryHints.suggestedCollections (needs entity→collection attribution at graph-load), a reranked: false response flag (so callers know bestScore is rank-based in brief mode), and wiring min_relevance into the stdio MCP tool args.

Add a sys.path bootstrap to the repo root before importing main.utils.frontmatter, mirroring the pattern already used in cross_collection_gap_analysis.py. Commit a90d124 switched this script to import from the main package, but the daily Jira update runs it as `uv run scripts/knowledge_graph/extract_jira_graph.py`, which puts the script's directory on sys.path rather than the repo root. That made `import main` fail with ModuleNotFoundError and silently skipped the Jira graph re-extraction step in the daily job.
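A sketch of that bootstrap, assuming the script sits two directories below the repo root (scripts/knowledge_graph/). The helper name `bootstrap_repo_root` is hypothetical; the exact form of the pattern in cross_collection_gap_analysis.py is not shown in this PR and may well be written inline.

```python
import sys
from pathlib import Path


def bootstrap_repo_root(script_file: str, levels_up: int = 2) -> Path:
    """Put the repository root on sys.path.

    `uv run scripts/knowledge_graph/extract_jira_graph.py` adds only the
    script's own directory to sys.path, so `import main` fails unless the
    repo root is added explicitly.
    """
    # parents[0] is the script's directory; for
    # <root>/scripts/knowledge_graph/<script>.py, parents[2] is <root>.
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# At the top of the script, before the package import:
#   bootstrap_repo_root(__file__)
#   import main.utils.frontmatter
```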

Search responses (HTTP /api/search, the MCP search tools, and the HTTP MCP adapter) now carry the signal a client needs to decide whether to re-query: a response-level bestScore, a per-result confidenceBand (high/medium/low; rank-based brief-mode results are capped at medium), a retryHints object (detectedEntities, graph-neighbour relatedTerms, narrowerQuery, a heuristic broaderQuery), and noConfidentResults. A new min_relevance param drops weak results and, when it empties the set, returns retryHints + noConfidentResults instead of low-quality filler. The search trace gains an additive "response" block recording the same decisions. Defaults are unchanged, so existing callers are unaffected.

This is Phase 0 of the corrective-retrieval / query-decomposition work (see mimir/plans/huginn-muninn-corrective-rag.md). Huginn stays a pure retriever; the agentic corrective loop lives in Muninn, which will consume this contract, but it needs Huginn to expose a cheap, no-LLM quality signal first. confidenceBand thresholds are tied to the searcher's existing LOW_CONFIDENCE_THRESHOLD so "low" means the reranker's own noise zone.

- confidence_band() in search_response_formatter; shape_search_results tags every result
- GraphSearchAugmenter.get_retry_hints() + _broaden_query() (works with no graph)
- SearchTrace.set_response_meta(); optional "response" key, no schema bump
- min_relevance threaded through build_search_tool_fn and the HTTP adapter's 4 signature variants
- 22 new tests; full suite (636) green; live-checked against a wiki-collection server
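A minimal sketch of why an optional key avoids a schema bump. This SearchTrace is a stripped-down stand-in, not the real class: its fields, the keyword-argument signature of set_response_meta(), and the metadata key names are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchTrace:
    """Hypothetical, stripped-down trace with only what this sketch needs."""

    query: str
    steps: list[dict[str, Any]] = field(default_factory=list)
    response_meta: Optional[dict[str, Any]] = None

    def set_response_meta(self, **meta: Any) -> None:
        # Record the corrective decisions (bestScore, noConfidentResults, ...).
        self.response_meta = dict(meta)

    def to_dict(self) -> dict[str, Any]:
        data: dict[str, Any] = {"query": self.query, "steps": self.steps}
        # Additive and optional: traces without response metadata keep their
        # old shape, and older readers simply ignore the extra key.
        if self.response_meta is not None:
            data["response"] = self.response_meta
        return data
```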

RuneLind · 2026-05-12T18:20:47Z

Smoke test — green ✅

Server PID 1862 started 20:16, after commit 030d49c (20:13) — running the new code. /health lists all 12 collections.

Direct /api/search (brief + full):

- New fields present and correct: bestScore (top-level), per-result confidenceBand, traceId (trace env on).
- confidenceBand is honest about rerank state: brief-mode (rank-based) results → medium (never high); a Norwegian query that does rerank (q=hva er lovvalg, nav-wiki) → top hit relevance: 0.999, confidenceBand: high; second hit 0.73 → high (≥0.65 threshold).
- relevance is rounded to 3 decimals; breadcrumb, metadata, graph_context, modifiedTime, and matchedChunks[].relevance are all intact. snippet is brief-only, matchedChunks full-only.
- grep -c "_score\|_reranked" → 0 in both modes (no internal-field leak).
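The shaping properties checked above (rounded relevance, no `_score`/`_reranked` leak, snippet only in brief mode, matchedChunks only in full mode) could be expressed as one function. `shape_result` is a hypothetical name and the raw-result layout is assumed; in the PR this work happens inside shape_search_results, whose code is not shown here.

```python
from typing import Any


def shape_result(raw: dict[str, Any], mode: str) -> dict[str, Any]:
    """Hypothetical shaping step for one search hit.

    Keeps only public fields, rounds relevance to 3 decimals, and returns
    snippet in brief mode or matchedChunks in full mode.
    """
    # Internal fields are underscore-prefixed (_score, _reranked, ...).
    shaped = {k: v for k, v in raw.items() if not k.startswith("_")}
    shaped["relevance"] = round(raw["relevance"], 3)
    if mode == "brief":
        shaped.pop("matchedChunks", None)
    else:
        shaped.pop("snippet", None)
        shaped["matchedChunks"] = [
            {**chunk, "relevance": round(chunk["relevance"], 3)}
            for chunk in raw.get("matchedChunks", [])
        ]
    return shaped
```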

min_relevance path: ?q=knowledge+graph+rag&min_relevance=0.95 → {"results":[],"bestScore":0.75,"noConfidentResults":true,"retryHints":{"detectedEntities":["RAG","Knowledge Graph","Graph RAG"],"relatedTerms":["Supabase","Pinecone",…]}}.
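A toy version of retry-hint generation, assuming the knowledge graph is a plain dict from entity name to neighbour names. Entity detection is a naive case-insensitive substring match, and the broadening rule (drop stopwords, then the last term) and the narrowing rule (append the first related term) are invented for illustration; the PR does not describe _broaden_query()'s actual rules, and both function names below are hypothetical.

```python
from typing import Any, Optional

STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "and", "how", "what", "to"}


def broaden_query(query: str) -> Optional[str]:
    """Invented heuristic: drop stopwords and the last remaining term."""
    terms = [t for t in query.split() if t.lower() not in STOPWORDS]
    if len(terms) <= 1:
        return None  # nothing left to drop
    return " ".join(terms[:-1])


def retry_hints(query: str, graph: Optional[dict[str, list[str]]] = None) -> dict[str, Any]:
    hints: dict[str, Any] = {}
    broader = broaden_query(query)  # works with no graph at all
    if broader:
        hints["broaderQuery"] = broader
    if not graph:
        return hints
    q = query.lower()
    # Naive detection: any entity name that appears in the query text.
    entities = [name for name in graph if name.lower() in q]
    related: list[str] = []
    for name in entities:
        for neighbour in graph[name]:
            if neighbour not in entities and neighbour not in related:
                related.append(neighbour)
    if entities:
        hints["detectedEntities"] = entities
    if related:
        hints["relatedTerms"] = related
        hints["narrowerQuery"] = f"{query} {related[0]}"
    return hints
```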

HTTP MCP adapter (knowledge_api_mcp_adapter._search_knowledge_impl, live): result lines render (NN% relevant · band); min_relevance empties → No results found ... *No confident match — try: related terms: …* + trace-url pointer. Bot-facing text shape preserved + augmented.
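The adapter's result-line suffix and the hint clause that follows "try:" might be rendered along these lines. Both function names are hypothetical, and rounding to the nearest percent, capping related terms at three, and the "broader query" wording are choices made for this sketch.

```python
from typing import Any


def render_result_line(title: str, relevance: float, band: str) -> str:
    """Append the '(NN% relevant · band)' suffix to a result title."""
    return f"{title} ({round(relevance * 100)}% relevant · {band})"


def format_retry_hints(hints: dict[str, Any], max_terms: int = 3) -> str:
    """Compact single-line summary of retryHints for the footer."""
    parts = []
    related = hints.get("relatedTerms") or []
    if related:
        parts.append("related terms: " + ", ".join(related[:max_terms]))
    broader = hints.get("broaderQuery")
    if broader:
        parts.append(f'broader query: "{broader}"')
    return "; ".join(parts)
```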

Caveat (known, documented in plan): brief mode skips the reranker, so bestScore/confidenceBand are rank-based there and don't trip the weak-result gate — brief-mode corrective decisions should lean on noConfidentResults + a snippet-reading grader, not bestScore. (Phase 1 / Muninn concern.)
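On the consumer side, this caveat translates into a gate like the one sketched here. It is a hypothetical Muninn-side helper, not part of this PR: WEAK_SCORE is a placeholder, the caller must pass `reranked` itself because the reranked: false response flag is still a follow-up, and `grader` stands in for the snippet-reading grader.

```python
from typing import Any, Callable, Optional

WEAK_SCORE = 0.3  # placeholder threshold for this sketch

# A grader returns True when the results look relevant to the query.
Grader = Callable[[list[dict[str, Any]]], bool]


def should_requery(
    response: dict[str, Any],
    reranked: bool,
    grader: Optional[Grader] = None,
) -> bool:
    """Decide whether to re-query based on a Huginn search response."""
    # Explicit "nothing confident" or an empty result set: always retry.
    if response.get("noConfidentResults") or not response.get("results"):
        return True
    if reranked:
        # Reranker scores are meaningful, so bestScore can gate directly.
        best = response.get("bestScore")
        return best is not None and best < WEAK_SCORE
    # Brief mode: bestScore is rank-based, so ignore it and defer to the
    # grader (if any) reading the snippets.
    return grader is not None and not grader(response["results"])
```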

Bot leg skipped: hivemind isn't active in this session, so no muninn peer to drive a bot search + read the muninn-db trace. Change is purely additive (new response keys, default-off param) and the two layers below the bot are verified above; the 22 new unit tests + full suite (636) are green.

RuneLind added 2 commits May 12, 2026 10:03

RuneLind wants to merge 2 commits into main from phase0-corrective-signal
