diff --git a/README.md b/README.md
index 1fd26bb..21c2222 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,42 @@
 # duh
 
-**Multi-model consensus engine** -- because one LLM opinion isn't enough.
+**The trust layer for AI applications.**
 
-duh asks multiple LLMs the same question, forces them to challenge each other's answers, and produces a single revised response that's stronger than any individual model could generate alone.
+duh is a multi-model consensus engine that sits between your application and LLM providers -- arbitrating, verifying, and scoring AI outputs before they reach your users. Think of it as the Cloudflare of AI: a verification and routing layer that makes the models behind it trustworthy.
+
+## Why this exists
+
+Single-model answers are fragile. They hallucinate. They carry training bias. They give you no way to audit why a conclusion was reached. And if your only provider goes down or changes behavior, you're exposed.
+
+Models are commoditizing. The value is moving above the model layer -- into orchestration, verification, and trust. duh captures that layer.
+
+The output isn't "an AI answer." It's confidence-scored analysis with adversarial fact-checking and preserved dissent. Every decision records who proposed what, who challenged it, what survived review, and what the dissenting positions were. This is transformative synthesis, not answer aggregation.
 
 ## What it does
 
-- **Proposes** -- The strongest available model answers your question
-- **Challenges** -- Other models find genuine flaws (forced disagreement, no sycophancy allowed)
-- **Revises** -- The proposer addresses every valid challenge and produces an improved answer
+```
+PROPOSE --> CHALLENGE --> REVISE --> COMMIT
+```
+
+1. **Propose** -- The strongest available model answers your question
+2. **Challenge** -- Other models find genuine flaws (forced disagreement, no sycophancy allowed)
+3. **Revise** -- The proposer addresses every valid challenge
+4. 
**Commit** -- Decision extracted with confidence score, intent classification, and preserved dissent + +Every step is stored. Every challenge is attributed. Every confidence score is domain-capped and calibrated against historical outcomes. + +## How to use it + +duh runs anywhere in your stack: + +| Interface | Use case | +|-----------|----------| +| **CLI** | `duh ask "question"` -- interactive consensus from the terminal | +| **REST API** | `POST /api/ask` -- integrate into any application, any language | +| **WebSocket** | Real-time streaming -- watch models debate live | +| **Python client** | `pip install duh-client` -- async and sync wrappers | +| **MCP server** | `duh mcp` -- AI agent integration via Model Context Protocol | +| **Web UI** | `duh serve` -- consensus streaming, thread browser, 3D decision space | ## Quick start @@ -25,75 +53,63 @@ Or use a `.env` file (see `.env.example`). ## Features -- **Multi-model consensus** -- Claude, GPT, Gemini, Mistral, and Perplexity debate. Sycophantic challenges are detected and flagged. -- **Web UI** -- Real-time consensus streaming, thread browser, 3D decision space, calibration dashboard. `duh serve` serves both API and frontend. -- **Epistemic confidence** -- Rigor scoring + domain-capped confidence. Calibration analysis with ECE tracking. -- **Authentication** -- JWT auth with user accounts, RBAC (admin/contributor/viewer), password reset via email. -- **Voting protocol** -- Fan out to all models in parallel, aggregate answers via majority or weighted synthesis. +### Consensus & reasoning +- **Multi-model consensus** -- Claude, GPT, Gemini, Mistral, and Perplexity debate. Sycophantic challenges detected and flagged. +- **Voting protocol** -- Fan out to all models in parallel, aggregate via majority or weighted synthesis. - **Query decomposition** -- Break complex questions into subtask DAGs, solve in parallel, synthesize results. 
-- **REST API** -- Full HTTP API with API key auth, rate limiting, WebSocket streaming, and Prometheus metrics. -- **MCP server** -- AI agent integration via `duh mcp` (Model Context Protocol). -- **Python client** -- Async and sync client library for the REST API (`pip install duh-client`). -- **Batch processing** -- Process multiple questions from a file (`duh batch`). -- **Export** -- Export threads as JSON, Markdown, or PDF (`duh export`). +- **Protocol auto-selection** -- Classifies your question and routes to consensus (reasoning) or voting (judgment) automatically. +- **Question refinement** -- Pre-consensus clarification step catches ambiguous questions before they waste model calls. +- **Convergence detection** -- Early exit when challenges repeat (Jaccard similarity >= 0.7). No wasted rounds. + +### Trust & verification +- **Epistemic confidence** -- Rigor scoring (0.5-1.0) + domain-capped confidence (factual 95%, technical 90%, creative 85%, judgment 80%, strategic 70%). Calibrated against historical outcomes via ECE tracking. +- **Sycophancy detection** -- Identifies deference markers in challenges. Rubber-stamp agreements are flagged, not counted. +- **Preserved dissent** -- Minority positions are extracted and attributed by model. Disagreement is a feature, not a bug. - **Decision taxonomy** -- Auto-classify decisions by intent, category, and genus for structured recall. -- **Outcome tracking** -- Record success/failure/partial feedback on past decisions. -- **Tool-augmented reasoning** -- Models can call web search, read files, and execute code during consensus. -- **Persistent memory** -- SQLite or PostgreSQL. Every thread, contribution, decision, vote, and subtask stored. Search with `duh recall`. -- **Backup & restore** -- `duh backup` / `duh restore` with merge mode for SQLite and JSON export. -- **Cost tracking** -- Per-model token costs in real-time. Configurable warn threshold and hard limit. 
-- **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud + local. -- **Rich CLI** -- Styled panels, spinners, and formatted output. +- **Outcome tracking** -- Record success/failure/partial feedback. Calibration improves over time. -## Commands +### Grounding & tools +- **Native web search** -- Anthropic, Google, Mistral, and Perplexity search server-side during consensus. Citations extracted, persisted, and displayed with domain grouping. +- **Tool-augmented reasoning** -- Web search, file read, and code execution available to models during any phase. +- **Citations** -- Deduplicated, grouped by hostname, attributed by phase (propose/challenge/revise). Displayed in CLI, Web UI, and API responses. -```bash -duh ask "question" # Run consensus query -duh ask "question" --decompose # Decompose into subtasks first -duh ask "question" --protocol voting # Use voting protocol instead -duh ask "question" --protocol auto # Auto-select protocol by question type -duh ask "question" --tools # Enable tool use (web search, file read, code exec) -duh feedback --result success # Record outcome for a decision -duh recall "keyword" # Search past decisions -duh threads # List past threads -duh show # Inspect full debate history -duh models # List available models -duh cost # Show cumulative costs -duh serve # Start REST API server -duh serve --host 0.0.0.0 --port 9000 # Custom host/port -duh mcp # Start MCP server for AI agents -duh batch questions.txt # Process multiple questions -duh batch questions.jsonl --format json # Batch with JSON output -duh export # Export thread as JSON -duh export --format markdown # Export as Markdown -duh export --format pdf # Export as PDF -duh backup ./backup.db # Backup database -duh restore ./backup.db # Restore database -duh calibration # Show confidence calibration -duh user-create --email u@x.com --password ... 
# Create user -duh user-list # List users -``` - -## How consensus works +### Web UI +- **Live consensus streaming** -- Watch models debate in real-time via WebSocket. Challengers stream in as they finish (parallel, not batched). +- **Thread browser** -- Search, filter, and revisit past consensus threads with full debate history. +- **3D decision space** -- Interactive scatter plot of decisions by confidence, rigor, and category. InstancedMesh handles 1000+ points. +- **Calibration dashboard** -- ECE analysis, accuracy by confidence bucket, overall calibration rating. +- **Shareable threads** -- Public share links for consensus results (no auth required). +- **Executive overview** -- Auto-generated summary of key decision points after consensus completes. + +### Infrastructure +- **17 models across 5 providers** -- Claude (Opus/Sonnet/Haiku), GPT (5.4/5.2/5 mini/o3), Gemini (3.1 Pro/3 Pro/3 Flash/2.5 Pro/2.5 Flash), Mistral (Large/Medium/Small/Codestral), Perplexity (Sonar/Sonar Pro/Reasoning Pro/Deep Research). +- **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud and local in the same consensus. +- **Authentication** -- JWT auth with user accounts, RBAC (admin/contributor/viewer), password reset via SMTP email. +- **Persistent memory** -- SQLite or PostgreSQL. Every thread, turn, contribution, decision, vote, subtask, and citation stored. +- **Cost tracking** -- Per-model token costs in real-time with warn thresholds and hard limits. +- **Export** -- Threads as JSON, Markdown, or PDF. PDF includes TOC, bookmarks, provider-colored callout boxes, and confidence/rigor meters. +- **Batch processing** -- Process multiple questions from a file with any protocol. +- **Backup & restore** -- SQLite copy or JSON export, with merge mode for restores. + +## Protocols + +### Consensus (default) ``` PROPOSE --> CHALLENGE --> REVISE --> COMMIT ``` -1. Strongest model proposes an answer -2. 
Other models challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate) -3. Proposer revises, addressing each valid challenge -4. Decision extracted with confidence score and preserved dissent +Strongest model proposes. Others challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate). Proposer revises, addressing each valid challenge. Decision extracted with confidence score and preserved dissent. Convergence detection (Jaccard similarity >= 0.7) stops early when challenges repeat. -### Voting protocol +### Voting ``` FAN-OUT (all models) --> AGGREGATE (majority / weighted) ``` -All models answer independently in parallel. A meta-judge (strongest model) picks the best answer (majority) or synthesizes all answers weighted by capability (weighted). +All models answer independently in parallel. A meta-judge picks the best answer (majority) or synthesizes all answers weighted by capability (weighted). ### Decomposition @@ -101,7 +117,119 @@ All models answer independently in parallel. A meta-judge (strongest model) pick DECOMPOSE --> SCHEDULE (topological sort) --> SYNTHESIZE ``` -Complex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results are synthesized into a final answer by the strongest model. +Complex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results synthesized by the strongest model. 
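
The convergence check used by the consensus protocol above (early exit when challenges repeat, Jaccard similarity >= 0.7) can be sketched as a token-set comparison between consecutive challenge rounds. This is an illustrative approximation under assumed names (`jaccard`, `challenges_converged`), not the actual `check_convergence` implementation:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def challenges_converged(
    previous_round: list[str],
    current_round: list[str],
    threshold: float = 0.7,
) -> bool:
    """True when this round's challenges mostly repeat the previous round's.

    Compares the combined challenge text of each round as lowercase token sets.
    """
    prev = set(" ".join(previous_round).lower().split())
    curr = set(" ".join(current_round).lower().split())
    return jaccard(prev, curr) >= threshold
```

When the similarity clears the threshold, further challenge rounds would mostly restate existing objections, so the loop can commit early instead of spending more model calls.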
+ +## Commands + +### Consensus + +```bash +duh ask "question" # Run consensus (default protocol) +duh ask "question" --refine # Clarify ambiguous questions first +duh ask "question" --decompose # Decompose into subtasks first +duh ask "question" --protocol voting # Use voting protocol +duh ask "question" --protocol auto # Auto-select by question type +duh ask "question" --tools # Enable tool use (on by default) +duh ask "question" --no-tools # Disable tool use +duh ask "question" --rounds 5 # Override max consensus rounds +duh ask "question" --proposer anthropic:claude-opus-4-6 # Override proposer +duh ask "question" --challengers openai:gpt-5.4,google:gemini-3.1-pro # Override challengers +duh ask "question" --panel anthropic:claude-opus-4-6,openai:gpt-5.4 # Restrict model panel +``` + +### Memory & recall + +```bash +duh recall "keyword" # Search past decisions +duh recall "keyword" --limit 20 # Limit results +duh threads # List past threads +duh threads --status complete --limit 50 # Filter by status +duh show # Full debate history (prefix match OK) +duh feedback --result success # Record outcome +duh feedback --result failure --notes "..." 
# With notes +``` + +### Export & data + +```bash +duh export # Export as JSON (default) +duh export --format markdown # Export as Markdown +duh export --format pdf -o report.pdf # Export as PDF +duh export --content decision # Decision only (vs full) +duh export --no-dissent # Suppress dissent section +duh backup ./backup.db # Backup database +duh backup ./backup.json --format json # Backup as JSON +duh restore ./backup.db # Restore (replace) +duh restore ./backup.db --merge # Restore (merge with existing) +``` + +### Models & cost + +```bash +duh models # List all available models +duh cost # Cumulative cost breakdown by model +``` + +### Calibration + +```bash +duh calibration # Confidence calibration analysis +duh calibration --category technical # Filter by category +duh calibration --since 2026-01-01 # Filter by date range +``` + +### Server & integrations + +```bash +duh serve # Start REST API + Web UI +duh serve --host 0.0.0.0 --port 9000 # Custom host/port +duh serve --reload # Auto-reload for development +duh mcp # Start MCP server for AI agents +duh batch questions.txt # Batch consensus (text file) +duh batch questions.jsonl --format json # Batch with JSON output +duh batch questions.txt --protocol voting # Batch with voting protocol +``` + +### User management + +```bash +duh user-create --email u@x.com --password ... 
# Create user
+duh user-list                              # List users
+```
+
+## REST API
+
+```
+POST /api/ask                 Consensus query (any protocol)
+POST /api/refine              Analyze question for ambiguity
+POST /api/enrich              Rewrite question with clarifications
+GET  /api/threads             List threads (filter by status)
+GET  /api/threads/:id         Thread with full debate history + citations
+GET  /api/share/:token        Public thread view (no auth)
+GET  /api/threads/:id/export  Export as PDF or Markdown
+GET  /api/recall              Search past decisions
+POST /api/feedback            Record outcome
+GET  /api/models              List available models
+GET  /api/cost                Cost breakdown by model
+GET  /api/calibration         Confidence calibration analysis
+GET  /api/decisions/space     Decision space data (3D viz)
+WS   /ws/ask                  Stream consensus in real-time
+```
+
+API key auth, rate limiting, and JWT authentication included. Full reference: [docs/api-reference.md](docs/api-reference.md).
+
+## Supported models
+
+| Provider | Models | Context | Notes |
+|----------|--------|---------|-------|
+| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 | 200K | Native web search |
+| **OpenAI** | GPT-5.4, GPT-5.2, GPT-5 mini, o3 | 200K-1M | Search on select models |
+| **Google** | Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash | 1M | Native grounding search |
+| **Mistral** | Large, Medium, Small, Codestral | 128K-256K | Native web search |
+| **Perplexity** | Sonar, Sonar Pro, Reasoning Pro, Deep Research | 128K-200K | Always searches (challenger-only) |
+| **Local** | Any Ollama or LM Studio model | Varies | Via OpenAI-compatible API |
+
+Set API keys as environment variables or in `.env`. Models are auto-discovered from available keys.
 
 ## Phase 0 benchmark
@@ -125,10 +253,14 @@ Full documentation: [docs/](docs/index.md)
 - [Authentication](docs/guides/authentication.md)
 - [Config Reference](docs/reference/config-reference.md)
+
+## Hosted service
+
+**[duh.bot](https://duh.bot)** -- commercial hosted consensus. Pay-per-question, no infrastructure to manage. 
Same engine, managed for you. + ## Sponsor If duh is useful to you, consider [sponsoring the project](https://github.com/sponsors/msitarzewski). ## License -[AGPL-3.0](LICENSE) +[AGPL-3.0](LICENSE) -- Run it yourself (open source) or use the hosted service at [duh.bot](https://duh.bot). diff --git a/src/duh/api/routes/ask.py b/src/duh/api/routes/ask.py index 4768669..3dd634b 100644 --- a/src/duh/api/routes/ask.py +++ b/src/duh/api/routes/ask.py @@ -104,7 +104,15 @@ async def _handle_consensus( # type: ignore[no-untyped-def] from duh.cli.app import _run_consensus use_native_search = config.tools.enabled and config.tools.web_search.native - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus( body.question, config, pm, @@ -176,9 +184,15 @@ async def _handle_decompose(body: AskRequest, config, pm) -> AskResponse: # typ if len(subtask_specs) == 1: from duh.cli.app import _run_consensus - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( - body.question, config, pm - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus(body.question, config, pm) return AskResponse( decision=decision, confidence=confidence, diff --git a/src/duh/cli/app.py b/src/duh/cli/app.py index 1decb4d..c0bf7e7 100644 --- a/src/duh/cli/app.py +++ b/src/duh/cli/app.py @@ -210,10 +210,12 @@ async def _run_consensus( proposer_override: str | None = None, challengers_override: list[str] | None = None, web_search: bool = False, -) -> tuple[str, float, float, str | None, float, str | None]: +) -> tuple[ + str, float, float, str | None, float, str | None, list[dict[str, str | None]] +]: """Run the full consensus loop. - Returns (decision, confidence, rigor, dissent, total_cost, overview). + Returns (decision, confidence, rigor, dissent, total_cost, overview, citations). 
""" from duh.consensus.convergence import check_convergence from duh.consensus.handlers import ( @@ -332,6 +334,17 @@ async def _run_consensus( if display and ctx.tool_calls_log: display.show_tool_use(ctx.tool_calls_log) + # Collect all citations across rounds + all_citations: list[dict[str, str | None]] = [] + for rr in ctx.round_history: + all_citations.extend(rr.proposal_citations) + for ch in rr.challenges: + all_citations.extend(ch.citations) + # Include current round (may not be archived yet) + all_citations.extend(ctx.proposal_citations) + for ch in ctx.challenges: + all_citations.extend(ch.citations) + return ( ctx.decision or "", ctx.confidence, @@ -339,6 +352,7 @@ async def _run_consensus( ctx.dissent, pm.total_cost, ctx.overview, + all_citations, ) @@ -490,7 +504,7 @@ def ask( _error(str(e)) return # unreachable - decision, confidence, rigor, dissent, cost, overview = result + decision, confidence, rigor, dissent, cost, overview, citations = result from duh.cli.display import ConsensusDisplay @@ -498,6 +512,7 @@ def ask( display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) async def _refine_question(question: str, config: DuhConfig) -> str: @@ -532,7 +547,9 @@ async def _ask_async( panel: list[str] | None = None, proposer_override: str | None = None, challengers_override: list[str] | None = None, -) -> tuple[str, float, float, str | None, float, str | None]: +) -> tuple[ + str, float, float, str | None, float, str | None, list[dict[str, str | None]] +]: """Async implementation for the ask command.""" from duh.cli.display import ConsensusDisplay @@ -641,12 +658,19 @@ async def _ask_auto_async( display = ConsensusDisplay() display.start() - decision, confidence, rigor, dissent, cost, overview = await _run_consensus( - question, config, pm, display=display - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + overview, + citations, + ) = await _run_consensus(question, 
config, pm, display=display) display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) async def _ask_decompose_async( @@ -719,10 +743,11 @@ async def _ask_decompose_async( # Single-subtask optimization: skip synthesis if len(subtask_specs) == 1: result = await _run_consensus(question, config, pm, display=display) - decision, confidence, rigor, dissent, cost, overview = result + decision, confidence, rigor, dissent, cost, overview, citations = result display.show_final_decision( decision, confidence, rigor, cost, dissent, overview=overview ) + display.show_citations(citations) await engine.dispose() return @@ -2371,6 +2396,7 @@ async def _batch_async( _dissent, _cost, _overview, + _citations, ) = await _run_consensus(question, config, pm) q_cost = pm.total_cost - cost_before diff --git a/src/duh/cli/display.py b/src/duh/cli/display.py index ee2132c..500b6cb 100644 --- a/src/duh/cli/display.py +++ b/src/duh/cli/display.py @@ -357,6 +357,61 @@ def show_tool_use(self, tool_calls_log: list[dict[str, str]]) -> None: ) ) + # ── Citations ────────────────────────────────────────────── + + def show_citations( + self, + citations: Sequence[dict[str, str | None]], + ) -> None: + """Display deduplicated citations grouped by hostname.""" + if not citations: + return + + from urllib.parse import urlparse + + # Deduplicate by URL + seen: set[str] = set() + unique: list[dict[str, str | None]] = [] + for c in citations: + url = c.get("url") or "" + if url and url not in seen: + seen.add(url) + unique.append(c) + + if not unique: + return + + # Group by hostname + groups: dict[str, list[dict[str, str | None]]] = {} + for c in unique: + url = c.get("url") or "" + try: + host = urlparse(url).netloc or url + except Exception: + host = url + groups.setdefault(host, []).append(c) + + # Sort groups by count descending + sorted_groups = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True) + + parts: 
list[str] = [] + idx = 1 + for host, group in sorted_groups: + for c in group: + title = c.get("title") or host + url = c.get("url") or "" + parts.append(f" [{idx}] {title}\n {url}") + idx += 1 + + body = "\n".join(parts) + self._console.print( + Panel( + body, + title=f"[bold cyan]Sources[/bold cyan] ({len(unique)})", + border_style="cyan", + ) + ) + # ── Final output ────────────────────────────────────────── def show_final_decision( diff --git a/src/duh/mcp/server.py b/src/duh/mcp/server.py index 7e6de1a..53b3e9b 100644 --- a/src/duh/mcp/server.py +++ b/src/duh/mcp/server.py @@ -135,9 +135,15 @@ async def _handle_ask(args: dict) -> list[TextContent]: # type: ignore[type-arg ) ] else: - decision, confidence, rigor, dissent, cost, _overview = await _run_consensus( - question, config, pm - ) + ( + decision, + confidence, + rigor, + dissent, + cost, + _overview, + _citations, + ) = await _run_consensus(question, config, pm) return [ TextContent( type="text", diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py index 0679876..4158647 100644 --- a/tests/unit/test_cli.py +++ b/tests/unit/test_cli.py @@ -76,6 +76,7 @@ def test_displays_decision( None, 0.0042, None, + [], ) result = runner.invoke(cli, ["ask", "What database?"]) @@ -103,6 +104,7 @@ def test_displays_dissent( "[model-a]: PostgreSQL would be better for scale.", 0.01, None, + [], ) result = runner.invoke(cli, ["ask", "What database?"]) @@ -123,7 +125,7 @@ def test_no_dissent_when_none( from duh.config.schema import DuhConfig mock_config.return_value = DuhConfig() - mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, ["ask", "Question?"]) @@ -142,7 +144,7 @@ def test_rounds_option( config = DuhConfig() mock_config.return_value = config - mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, 
["ask", "--rounds", "5", "Question?"]) diff --git a/tests/unit/test_cli_batch.py b/tests/unit/test_cli_batch.py index 85c0d19..df261e3 100644 --- a/tests/unit/test_cli_batch.py +++ b/tests/unit/test_cli_batch.py @@ -452,10 +452,18 @@ async def fake_consensus( pm: Any, display: Any = None, tool_registry: Any = None, - ) -> tuple[str, float, float, str | None, float, str | None]: + ) -> tuple[ + str, + float, + float, + str | None, + float, + str | None, + list[dict[str, str | None]], + ]: nonlocal consensus_called consensus_called = True - return ("Use SQLite.", 0.85, 1.0, None, 0.01, None) + return ("Use SQLite.", 0.85, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -547,7 +555,7 @@ async def fake_consensus( display: Any = None, tool_registry: Any = None, ) -> tuple[str, float, float, str | None, float, str | None]: - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -606,7 +614,7 @@ async def fake_consensus( call_count += 1 if question == "Q2": raise RuntimeError("Provider timeout") - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), @@ -653,7 +661,7 @@ async def fake_consensus( ) -> tuple[str, float, float, str | None, float, str | None]: if question == "Q2": raise RuntimeError("Model unavailable") - return ("Answer.", 0.9, 1.0, None, 0.01, None) + return ("Answer.", 0.9, 1.0, None, 0.01, None, []) with ( patch("duh.cli.app.load_config", return_value=config), diff --git a/tests/unit/test_cli_voting.py b/tests/unit/test_cli_voting.py index d7dcc36..6ce5f85 100644 --- a/tests/unit/test_cli_voting.py +++ b/tests/unit/test_cli_voting.py @@ -147,7 +147,7 @@ def test_default_protocol_is_consensus( from duh.config.schema import DuhConfig mock_config.return_value = DuhConfig() - mock_run.return_value = 
("Answer.", 1.0, 1.0, None, 0.0, None) + mock_run.return_value = ("Answer.", 1.0, 1.0, None, 0.0, None, []) result = runner.invoke(cli, ["ask", "Question?"]) assert result.exit_code == 0 diff --git a/tests/unit/test_mcp_server.py b/tests/unit/test_mcp_server.py index 9afee17..7cc2745 100644 --- a/tests/unit/test_mcp_server.py +++ b/tests/unit/test_mcp_server.py @@ -177,7 +177,7 @@ async def test_consensus_protocol(self) -> None: patch( "duh.cli.app._run_consensus", new_callable=AsyncMock, - return_value=("Use SQLite.", 0.9, 1.0, "minor dissent", 0.05, None), + return_value=("Use SQLite.", 0.9, 1.0, "minor dissent", 0.05, None, []), ), ): result = await _handle_ask({"question": "What DB?", "rounds": 2})