diff --git a/README.md b/README.md
index 1fd26bb..21c2222 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,42 @@
 # duh
 
-**Multi-model consensus engine** -- because one LLM opinion isn't enough.
+**The trust layer for AI applications.**
 
-duh asks multiple LLMs the same question, forces them to challenge each other's answers, and produces a single revised response that's stronger than any individual model could generate alone.
+duh is a multi-model consensus engine that sits between your application and LLM providers -- arbitrating, verifying, and scoring AI outputs before they reach your users. Think of it as the Cloudflare of AI: a verification and routing layer that makes the models behind it trustworthy.
+
+## Why this exists
+
+Single-model answers are fragile. They hallucinate. They carry training bias. They give you no way to audit why a conclusion was reached. And if your only provider goes down or changes behavior, you're exposed.
+
+Models are commoditizing. The value is moving above the model layer -- into orchestration, verification, and trust. duh captures that layer.
+
+The output isn't "an AI answer." It's confidence-scored analysis with adversarial fact-checking and preserved dissent. Every decision records who proposed what, who challenged it, what survived review, and what the dissenting positions were. This is transformative synthesis, not answer aggregation.
 
 ## What it does
 
-- **Proposes** -- The strongest available model answers your question
-- **Challenges** -- Other models find genuine flaws (forced disagreement, no sycophancy allowed)
-- **Revises** -- The proposer addresses every valid challenge and produces an improved answer
+```
+PROPOSE --> CHALLENGE --> REVISE --> COMMIT
+```
+
+1. **Propose** -- The strongest available model answers your question
+2. **Challenge** -- Other models find genuine flaws (forced disagreement, no sycophancy allowed)
+3. **Revise** -- The proposer addresses every valid challenge
+4. 
**Commit** -- Decision extracted with confidence score, intent classification, and preserved dissent + +Every step is stored. Every challenge is attributed. Every confidence score is domain-capped and calibrated against historical outcomes. + +## How to use it + +duh runs anywhere in your stack: + +| Interface | Use case | +|-----------|----------| +| **CLI** | `duh ask "question"` -- interactive consensus from the terminal | +| **REST API** | `POST /api/ask` -- integrate into any application, any language | +| **WebSocket** | Real-time streaming -- watch models debate live | +| **Python client** | `pip install duh-client` -- async and sync wrappers | +| **MCP server** | `duh mcp` -- AI agent integration via Model Context Protocol | +| **Web UI** | `duh serve` -- consensus streaming, thread browser, 3D decision space | ## Quick start @@ -25,75 +53,63 @@ Or use a `.env` file (see `.env.example`). ## Features -- **Multi-model consensus** -- Claude, GPT, Gemini, Mistral, and Perplexity debate. Sycophantic challenges are detected and flagged. -- **Web UI** -- Real-time consensus streaming, thread browser, 3D decision space, calibration dashboard. `duh serve` serves both API and frontend. -- **Epistemic confidence** -- Rigor scoring + domain-capped confidence. Calibration analysis with ECE tracking. -- **Authentication** -- JWT auth with user accounts, RBAC (admin/contributor/viewer), password reset via email. -- **Voting protocol** -- Fan out to all models in parallel, aggregate answers via majority or weighted synthesis. +### Consensus & reasoning +- **Multi-model consensus** -- Claude, GPT, Gemini, Mistral, and Perplexity debate. Sycophantic challenges detected and flagged. +- **Voting protocol** -- Fan out to all models in parallel, aggregate via majority or weighted synthesis. - **Query decomposition** -- Break complex questions into subtask DAGs, solve in parallel, synthesize results. 
-- **REST API** -- Full HTTP API with API key auth, rate limiting, WebSocket streaming, and Prometheus metrics. -- **MCP server** -- AI agent integration via `duh mcp` (Model Context Protocol). -- **Python client** -- Async and sync client library for the REST API (`pip install duh-client`). -- **Batch processing** -- Process multiple questions from a file (`duh batch`). -- **Export** -- Export threads as JSON, Markdown, or PDF (`duh export`). +- **Protocol auto-selection** -- Classifies your question and routes to consensus (reasoning) or voting (judgment) automatically. +- **Question refinement** -- Pre-consensus clarification step catches ambiguous questions before they waste model calls. +- **Convergence detection** -- Early exit when challenges repeat (Jaccard similarity >= 0.7). No wasted rounds. + +### Trust & verification +- **Epistemic confidence** -- Rigor scoring (0.5-1.0) + domain-capped confidence (factual 95%, technical 90%, creative 85%, judgment 80%, strategic 70%). Calibrated against historical outcomes via ECE tracking. +- **Sycophancy detection** -- Identifies deference markers in challenges. Rubber-stamp agreements are flagged, not counted. +- **Preserved dissent** -- Minority positions are extracted and attributed by model. Disagreement is a feature, not a bug. - **Decision taxonomy** -- Auto-classify decisions by intent, category, and genus for structured recall. -- **Outcome tracking** -- Record success/failure/partial feedback on past decisions. -- **Tool-augmented reasoning** -- Models can call web search, read files, and execute code during consensus. -- **Persistent memory** -- SQLite or PostgreSQL. Every thread, contribution, decision, vote, and subtask stored. Search with `duh recall`. -- **Backup & restore** -- `duh backup` / `duh restore` with merge mode for SQLite and JSON export. -- **Cost tracking** -- Per-model token costs in real-time. Configurable warn threshold and hard limit. 
-- **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud + local. -- **Rich CLI** -- Styled panels, spinners, and formatted output. +- **Outcome tracking** -- Record success/failure/partial feedback. Calibration improves over time. -## Commands +### Grounding & tools +- **Native web search** -- Anthropic, Google, Mistral, and Perplexity search server-side during consensus. Citations extracted, persisted, and displayed with domain grouping. +- **Tool-augmented reasoning** -- Web search, file read, and code execution available to models during any phase. +- **Citations** -- Deduplicated, grouped by hostname, attributed by phase (propose/challenge/revise). Displayed in CLI, Web UI, and API responses. -```bash -duh ask "question" # Run consensus query -duh ask "question" --decompose # Decompose into subtasks first -duh ask "question" --protocol voting # Use voting protocol instead -duh ask "question" --protocol auto # Auto-select protocol by question type -duh ask "question" --tools # Enable tool use (web search, file read, code exec) -duh feedback --result success # Record outcome for a decision -duh recall "keyword" # Search past decisions -duh threads # List past threads -duh show # Inspect full debate history -duh models # List available models -duh cost # Show cumulative costs -duh serve # Start REST API server -duh serve --host 0.0.0.0 --port 9000 # Custom host/port -duh mcp # Start MCP server for AI agents -duh batch questions.txt # Process multiple questions -duh batch questions.jsonl --format json # Batch with JSON output -duh export # Export thread as JSON -duh export --format markdown # Export as Markdown -duh export --format pdf # Export as PDF -duh backup ./backup.db # Backup database -duh restore ./backup.db # Restore database -duh calibration # Show confidence calibration -duh user-create --email u@x.com --password ... 
# Create user -duh user-list # List users -``` - -## How consensus works +### Web UI +- **Live consensus streaming** -- Watch models debate in real-time via WebSocket. Challengers stream in as they finish (parallel, not batched). +- **Thread browser** -- Search, filter, and revisit past consensus threads with full debate history. +- **3D decision space** -- Interactive scatter plot of decisions by confidence, rigor, and category. InstancedMesh handles 1000+ points. +- **Calibration dashboard** -- ECE analysis, accuracy by confidence bucket, overall calibration rating. +- **Shareable threads** -- Public share links for consensus results (no auth required). +- **Executive overview** -- Auto-generated summary of key decision points after consensus completes. + +### Infrastructure +- **17 models across 5 providers** -- Claude (Opus/Sonnet/Haiku), GPT (5.4/5.2/5 mini/o3), Gemini (3.1 Pro/3 Pro/3 Flash/2.5 Pro/2.5 Flash), Mistral (Large/Medium/Small/Codestral), Perplexity (Sonar/Sonar Pro/Reasoning Pro/Deep Research). +- **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud and local in the same consensus. +- **Authentication** -- JWT auth with user accounts, RBAC (admin/contributor/viewer), password reset via SMTP email. +- **Persistent memory** -- SQLite or PostgreSQL. Every thread, turn, contribution, decision, vote, subtask, and citation stored. +- **Cost tracking** -- Per-model token costs in real-time with warn thresholds and hard limits. +- **Export** -- Threads as JSON, Markdown, or PDF. PDF includes TOC, bookmarks, provider-colored callout boxes, and confidence/rigor meters. +- **Batch processing** -- Process multiple questions from a file with any protocol. +- **Backup & restore** -- SQLite copy or JSON export, with merge mode for restores. + +## Protocols + +### Consensus (default) ``` PROPOSE --> CHALLENGE --> REVISE --> COMMIT ``` -1. Strongest model proposes an answer -2. 
Other models challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate) -3. Proposer revises, addressing each valid challenge -4. Decision extracted with confidence score and preserved dissent +Strongest model proposes. Others challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate). Proposer revises, addressing each valid challenge. Decision extracted with confidence score and preserved dissent. Convergence detection (Jaccard similarity >= 0.7) stops early when challenges repeat. -### Voting protocol +### Voting ``` FAN-OUT (all models) --> AGGREGATE (majority / weighted) ``` -All models answer independently in parallel. A meta-judge (strongest model) picks the best answer (majority) or synthesizes all answers weighted by capability (weighted). +All models answer independently in parallel. A meta-judge picks the best answer (majority) or synthesizes all answers weighted by capability (weighted). ### Decomposition @@ -101,7 +117,119 @@ All models answer independently in parallel. A meta-judge (strongest model) pick DECOMPOSE --> SCHEDULE (topological sort) --> SYNTHESIZE ``` -Complex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results are synthesized into a final answer by the strongest model. +Complex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results synthesized by the strongest model. 
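
The convergence check used by the consensus protocol above (early exit when challenges repeat, Jaccard similarity >= 0.7) can be sketched as a token-set comparison between consecutive challenge rounds. This is an illustrative approximation under assumed names (`jaccard`, `challenges_converged`), not the actual `check_convergence` implementation:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def challenges_converged(
    previous_round: list[str],
    current_round: list[str],
    threshold: float = 0.7,
) -> bool:
    """True when this round's challenges mostly repeat the previous round's.

    Compares the combined challenge text of each round as lowercase token sets.
    """
    prev = set(" ".join(previous_round).lower().split())
    curr = set(" ".join(current_round).lower().split())
    return jaccard(prev, curr) >= threshold
```

When the similarity clears the threshold, further challenge rounds would mostly restate existing objections, so the loop can commit early instead of spending more model calls.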
+ +## Commands + +### Consensus + +```bash +duh ask "question" # Run consensus (default protocol) +duh ask "question" --refine # Clarify ambiguous questions first +duh ask "question" --decompose # Decompose into subtasks first +duh ask "question" --protocol voting # Use voting protocol +duh ask "question" --protocol auto # Auto-select by question type +duh ask "question" --tools # Enable tool use (on by default) +duh ask "question" --no-tools # Disable tool use +duh ask "question" --rounds 5 # Override max consensus rounds +duh ask "question" --proposer anthropic:claude-opus-4-6 # Override proposer +duh ask "question" --challengers openai:gpt-5.4,google:gemini-3.1-pro # Override challengers +duh ask "question" --panel anthropic:claude-opus-4-6,openai:gpt-5.4 # Restrict model panel +``` + +### Memory & recall + +```bash +duh recall "keyword" # Search past decisions +duh recall "keyword" --limit 20 # Limit results +duh threads # List past threads +duh threads --status complete --limit 50 # Filter by status +duh show # Full debate history (prefix match OK) +duh feedback --result success # Record outcome +duh feedback --result failure --notes "..." 
# With notes +``` + +### Export & data + +```bash +duh export # Export as JSON (default) +duh export --format markdown # Export as Markdown +duh export --format pdf -o report.pdf # Export as PDF +duh export --content decision # Decision only (vs full) +duh export --no-dissent # Suppress dissent section +duh backup ./backup.db # Backup database +duh backup ./backup.json --format json # Backup as JSON +duh restore ./backup.db # Restore (replace) +duh restore ./backup.db --merge # Restore (merge with existing) +``` + +### Models & cost + +```bash +duh models # List all available models +duh cost # Cumulative cost breakdown by model +``` + +### Calibration + +```bash +duh calibration # Confidence calibration analysis +duh calibration --category technical # Filter by category +duh calibration --since 2026-01-01 # Filter by date range +``` + +### Server & integrations + +```bash +duh serve # Start REST API + Web UI +duh serve --host 0.0.0.0 --port 9000 # Custom host/port +duh serve --reload # Auto-reload for development +duh mcp # Start MCP server for AI agents +duh batch questions.txt # Batch consensus (text file) +duh batch questions.jsonl --format json # Batch with JSON output +duh batch questions.txt --protocol voting # Batch with voting protocol +``` + +### User management + +```bash +duh user-create --email u@x.com --password ... 
# Create user
+duh user-list                              # List users
+```
+
+## REST API
+
+```
+POST /api/ask                 Consensus query (any protocol)
+POST /api/refine              Analyze question for ambiguity
+POST /api/enrich              Rewrite question with clarifications
+GET  /api/threads             List threads (filter by status)
+GET  /api/threads/:id         Thread with full debate history + citations
+GET  /api/share/:token        Public thread view (no auth)
+GET  /api/threads/:id/export  Export as PDF or Markdown
+GET  /api/recall              Search past decisions
+POST /api/feedback            Record outcome
+GET  /api/models              List available models
+GET  /api/cost                Cost breakdown by model
+GET  /api/calibration         Confidence calibration analysis
+GET  /api/decisions/space     Decision space data (3D viz)
+WS   /ws/ask                  Stream consensus in real-time
+```
+
+API key auth, rate limiting, and JWT authentication included. Full reference: [docs/api-reference.md](docs/api-reference.md).
+
+## Supported models
+
+| Provider | Models | Context | Notes |
+|----------|--------|---------|-------|
+| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 | 200K | Native web search |
+| **OpenAI** | GPT-5.4, GPT-5.2, GPT-5 mini, o3 | 200K-1M | Search on select models |
+| **Google** | Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash | 1M | Native grounding search |
+| **Mistral** | Large, Medium, Small, Codestral | 128K-256K | Native web search |
+| **Perplexity** | Sonar, Sonar Pro, Reasoning Pro, Deep Research | 128K-200K | Always searches (challenger-only) |
+| **Local** | Any Ollama or LM Studio model | Varies | Via OpenAI-compatible API |
+
+Set API keys as environment variables or in `.env`. Models are auto-discovered from available keys.
 
 ## Phase 0 benchmark
@@ -125,10 +253,14 @@ Full documentation: [docs/](docs/index.md)
 - [Authentication](docs/guides/authentication.md)
 - [Config Reference](docs/reference/config-reference.md)
+
+## Hosted service
+
+**[duh.bot](https://duh.bot)** -- commercial hosted consensus. Pay-per-question, no infrastructure to manage. 
Same engine, managed for you. + ## Sponsor If duh is useful to you, consider [sponsoring the project](https://github.com/sponsors/msitarzewski). ## License -[AGPL-3.0](LICENSE) +[AGPL-3.0](LICENSE) -- Run it yourself (open source) or use the hosted service at [duh.bot](https://duh.bot). diff --git a/src/duh/api/routes/ask.py b/src/duh/api/routes/ask.py index 4768669..3dd634b 100644 --- a/src/duh/api/routes/ask.py +++ b/src/duh/api/routes/ask.py @@ -104,7 +104,15 @@ async def _handle_consensus( # type: ignore[no-untyped-def] from duh.cli.app import _run_consensus use_native_search = config.tools.enabled and config.tools.web_search.native - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus( body.question, config, pm, @@ -176,9 +184,15 @@ async def _handle_decompose(body: AskRequest, config, pm) -> AskResponse: # typ if len(subtask_specs) == 1: from duh.cli.app import _run_consensus - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( - body.question, config, pm - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus(body.question, config, pm) return AskResponse( decision=decision, confidence=confidence, diff --git a/src/duh/cli/app.py b/src/duh/cli/app.py index 1decb4d..c0bf7e7 100644 --- a/src/duh/cli/app.py +++ b/src/duh/cli/app.py @@ -210,10 +210,12 @@ async def _run_consensus( proposer_override: str | None = None, challengers_override: list[str] | None = None, web_search: bool = False, -) -> tuple[str, float, float, str | None, float, str | None]: +) -> tuple[ + str, float, float, str | None, float, str | None, list[dict[str, str | None]] +]: """Run the full consensus loop. - Returns (decision, confidence, rigor, dissent, total_cost, overview). + Returns (decision, confidence, rigor, dissent, total_cost, overview, citations). 
""" from duh.consensus.convergence import check_convergence from duh.consensus.handlers import ( @@ -332,6 +334,17 @@ async def _run_consensus( if display and ctx.tool_calls_log: display.show_tool_use(ctx.tool_calls_log) + # Collect all citations across rounds + all_citations: list[dict[str, str | None]] = [] + for rr in ctx.round_history: + all_citations.extend(rr.proposal_citations) + for ch in rr.challenges: + all_citations.extend(ch.citations) + # Include current round (may not be archived yet) + all_citations.extend(ctx.proposal_citations) + for ch in ctx.challenges: + all_citations.extend(ch.citations) + return ( ctx.decision or "", ctx.confidence, @@ -339,6 +352,7 @@ async def _run_consensus( ctx.dissent, pm.total_cost, ctx.overview, + all_citations, ) @@ -490,7 +504,7 @@ def ask( _error(str(e)) return # unreachable - decision, confidence, rigor, dissent, cost, overview = result + decision, confidence, rigor, dissent, cost, overview, citations = result from duh.cli.display import ConsensusDisplay @@ -498,6 +512,7 @@ def ask( display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) async def _refine_question(question: str, config: DuhConfig) -> str: @@ -532,7 +547,9 @@ async def _ask_async( panel: list[str] | None = None, proposer_override: str | None = None, challengers_override: list[str] | None = None, -) -> tuple[str, float, float, str | None, float, str | None]: +) -> tuple[ + str, float, float, str | None, float, str | None, list[dict[str, str | None]] +]: """Async implementation for the ask command.""" from duh.cli.display import ConsensusDisplay @@ -641,12 +658,19 @@ async def _ask_auto_async( display = ConsensusDisplay() display.start() - decision, confidence, rigor, dissent, cost, overview = await _run_consensus( - question, config, pm, display=display - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + overview, + citations, + ) = await _run_consensus(question, 
config, pm, display=display) display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) async def _ask_decompose_async( @@ -719,10 +743,11 @@ async def _ask_decompose_async( # Single-subtask optimization: skip synthesis if len(subtask_specs) == 1: result = await _run_consensus(question, config, pm, display=display) - decision, confidence, rigor, dissent, cost, overview = result + decision, confidence, rigor, dissent, cost, overview, citations = result display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) await engine.dispose() return @@ -2371,6 +2396,7 @@ async def _batch_async( _dissent, _cost, _overview, + _citations, ) = await _run_consensus(question, config, pm) q_cost = pm.total_cost - cost_before diff --git a/src/duh/cli/display.py b/src/duh/cli/display.py index ee2132c..500b6cb 100644 --- a/src/duh/cli/display.py +++ b/src/duh/cli/display.py @@ -357,6 +357,61 @@ def show_tool_use(self, tool_calls_log: list[dict[str, str]]) -> None: ) ) + # ── Citations ────────────────────────────────────────────── + + def show_citations( + self, + citations: Sequence[dict[str, str | None]], + ) -> None: + """Display deduplicated citations grouped by hostname.""" + if not citations: + return + + from urllib.parse import urlparse + + # Deduplicate by URL + seen: set[str] = set() + unique: list[dict[str, str | None]] = [] + for c in citations: + url = c.get("url") or "" + if url and url not in seen: + seen.add(url) + unique.append(c) + + if not unique: + return + + # Group by hostname + groups: dict[str, list[dict[str, str | None]]] = {} + for c in unique: + url = c.get("url") or "" + try: + host = urlparse(url).netloc or url + except Exception: + host = url + groups.setdefault(host, []).append(c) + + # Sort groups by count descending + sorted_groups = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True) + + parts: 
list[str] = [] + idx = 1 + for host, group in sorted_groups: + for c in group: + title = c.get("title") or host + url = c.get("url") or "" + parts.append(f" [{idx}] {title}\n {url}") + idx += 1 + + body = "\n".join(parts) + self._console.print( + Panel( + body, + title=f"[bold cyan]Sources[/bold cyan] ({len(unique)})", + border_style="cyan", + ) + ) + # ── Final output ────────────────────────────────────────── def show_final_decision( diff --git a/src/duh/mcp/server.py b/src/duh/mcp/server.py index 7e6de1a..53b3e9b 100644 --- a/src/duh/mcp/server.py +++ b/src/duh/mcp/server.py @@ -135,9 +135,15 @@ async def _handle_ask(args: dict) -> list[TextContent]: # type: ignore[type-arg ) ] else: - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( - question, config, pm - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus(question, config, pm) return [ TextContent( type="text", diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py index 0679876..4158647 100644 --- a/tests/unit/test_cli.py +++ b/tests/unit/test_cli.py @@ -76,6 +76,7 @@ def test_displays_decision( None, 0.0042, None, + [], ) result = runner.invoke(cli, ["ask", "What database?"]) @@ -103,6 +104,7 @@ def test_displays_dissent( "[model-a]: PostgreSQL would be better for scale.", 0.01, None, + [], ) result = runner.invoke(cli, ["ask", "What database?"]) @@ -123,7 +125,7 @@ def test_no_dissent_when_none( from duh.config.schema import DuhConfig mock_config.return_value = DuhConfig() - mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, ["ask", "Question?"]) @@ -142,7 +144,7 @@ def test_rounds_option( config = DuhConfig() mock_config.return_value = config - mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, 
["ask", "--rounds", "5", "Question?"]) diff --git a/tests/unit/test_cli_batch.py b/tests/unit/test_cli_batch.py index 85c0d19..df261e3 100644 --- a/tests/unit/test_cli_batch.py +++ b/tests/unit/test_cli_batch.py @@ -452,10 +452,18 @@ async def fake_consensus( pm: Any, display: Any = None, tool_registry: Any = None, - ) -> tuple[str, float, float, str | None, float, str | None]: + ) -> tuple[ + str, + float, + float, + str | None, + float, + str | None, + list[dict[str, str | None]], + ]: nonlocal consensus_called consensus_called = True - return ("Use SQLite.", 0.85, 1.0, None, 0.01, None) + return ("Use SQLite.", 0.85, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -547,7 +555,7 @@ async def fake_consensus( display: Any = None, tool_registry: Any = None, ) -> tuple[str, float, float, str | None, float, str | None]: - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -606,7 +614,7 @@ async def fake_consensus( call_count += 1 if question == "Q2": raise RuntimeError("Provider timeout") - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -653,7 +661,7 @@ async def fake_consensus( ) -> tuple[str, float, float, str | None, float, str | None]: if question == "Q2": raise RuntimeError("Model unavailable") - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), diff --git a/tests/unit/test_cli_voting.py b/tests/unit/test_cli_voting.py index d7dcc36..6ce5f85 100644 --- a/tests/unit/test_cli_voting.py +++ b/tests/unit/test_cli_voting.py @@ -147,7 +147,7 @@ def test_default_protocol_is_consensus( from duh.config.schema import DuhConfig mock_config.return_value = DuhConfig() - mock_run.return_value = 
("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, ["ask", "Question?"]) assert result.exit_code == 0 diff --git a/tests/unit/test_mcp_server.py b/tests/unit/test_mcp_server.py index 9afee17..7cc2745 100644 --- a/tests/unit/test_mcp_server.py +++ b/tests/unit/test_mcp_server.py @@ -177,7 +177,7 @@ async def test_consensus_protocol(self) -> None: patch( "duh.cli.app._run_consensus", new_callable=AsyncMock, - return_value=("Use SQLite.", 0.9, 1.0, "minor dissent", 0.05, None), + return_value=("Use SQLite.", 0.9, 1.0, "minor dissent", 0.05, None, []), ), ): result = await _handle_ask({"question": "What DB?", "rounds": 2})