msitarzewski · Mar 9, 2026
diff --git a/.gitignore b/.gitignore
@@ -58,3 +58,6 @@ memory-bank/setup.md
 web/node_modules/
 web/dist/
 
+# npm wrapper
+npm/like-duh/node_modules/
+
diff --git a/alembic/env.py b/alembic/env.py
@@ -35,9 +35,17 @@ def _expand_url(section: dict[str, str]) -> dict[str, str]:
     return section
 
 
+def _resolve_url() -> str:
+    """Return database URL from env var, falling back to alembic.ini."""
+    env_url = os.environ.get("DUH_DATABASE_URL")
+    if env_url:
+        return env_url
+    return config.get_main_option("sqlalchemy.url") or ""
+
+
 def run_migrations_offline() -> None:
     """Run migrations in 'offline' mode."""
-    url = config.get_main_option("sqlalchemy.url")
+    url = _resolve_url()
     context.configure(
         url=url,
         target_metadata=target_metadata,
@@ -59,6 +67,8 @@ def do_run_migrations(connection) -> None:  # type: ignore[no-untyped-def]
 async def run_async_migrations() -> None:
     """Run migrations in 'online' mode with async engine."""
     section = _expand_url(config.get_section(config.config_ini_section, {}))
+    section["sqlalchemy.url"] = _resolve_url()
+    section = _expand_url(section)
     connectable = async_engine_from_config(
         section,
         prefix="sqlalchemy.",
@@ -74,6 +84,8 @@ async def run_async_migrations() -> None:
 def run_migrations_online() -> None:
     """Run migrations in 'online' mode (sync or async)."""
     section = _expand_url(config.get_section(config.config_ini_section, {}))
+    section["sqlalchemy.url"] = _resolve_url()
+    section = _expand_url(section)
     url = section.get("sqlalchemy.url", "")
 
     if _is_async_url(url):
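
Review note: the precedence that `_resolve_url()` introduces can be checked in isolation. The sketch below is a standalone restatement, not the project's code: `resolve_url`, its `ini_url` parameter, and the injectable `environ` mapping are hypothetical names chosen so the rule can be exercised without an Alembic `config` object. It assumes, as the hunk above does, that an empty `DUH_DATABASE_URL` counts as unset.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "DUH_DATABASE_URL"  # name taken from the diff above


def resolve_url(
    ini_url: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Prefer the environment variable, then the ini value, then ''.

    Hypothetical stand-in for _resolve_url(): the ini value is passed in
    explicitly instead of being read from Alembic's config object.
    """
    env_url = environ.get(ENV_VAR)
    if env_url:  # an empty string is treated the same as "not set"
        return env_url
    return ini_url or ""


if __name__ == "__main__":
    # Example: environment wins over the ini value when both are present.
    print(resolve_url("sqlite:///ini.db", {ENV_VAR: "sqlite:///env.db"}))
```

One consequence worth noting: because the check is a truthiness test, exporting `DUH_DATABASE_URL=""` silently falls back to `alembic.ini` rather than raising.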

diff --git a/memory-bank/activeContext.md b/memory-bank/activeContext.md
@@ -1,79 +1,84 @@
 # Active Context
 
-**Last Updated**: 2026-03-08
-**Current Phase**: `question-refinement` branch — pre-consensus question refinement, native web search, citations, tools-by-default
-**Next Action**: Branch in progress, uncommitted changes staged
+**Last Updated**: 2026-03-09
+**Current Phase**: Post PR #14 merge — follow-up questions, revision citations, CLI persistence, calibration filters, provider updates
+**Next Action**: Commit and push uncommitted work to new branch
 
-## Latest Work (2026-03-08)
+## Latest Work (2026-03-09)
 
-### Question Refinement
-- Pre-consensus clarification step: analyze question → ask clarifying questions → enrich with answers → proceed to consensus
-- `src/duh/consensus/refine.py` — `analyze_question()` + `enrich_question()`, uses MOST EXPENSIVE model (not cheapest)
-- API: `POST /api/refine` → `RefineResponse{needs_refinement, questions[]}`, `POST /api/enrich` → `EnrichResponse{enriched_question}`
-- CLI: `duh ask --refine "question"` — interactive `click.prompt()` loop, default `--no-refine`
-- Frontend: consensus store `'refining'` status, `submitQuestion` → refine → clarify → enrich → `startConsensus`
-- `RefinementPanel.tsx` — tabbed UI inside GlassPanel, checkmarks on answered tabs, Skip + Start Consensus buttons
-- Graceful fallback: any failure → proceed to consensus with original question
+### Follow-up Questions (new end-to-end feature)
+- `generate_followups()` in `src/duh/consensus/handlers.py:930` — uses cheapest model with JSON mode to suggest 3 follow-up questions after consensus completes
+- Prompt asks for different angles: deeper technical detail, practical implications, risks/edge cases, related decisions
+- `followups` field added to `ConsensusContext` in `machine.py`
+- `_run_consensus` returns 8-tuple now (was 7): `(decision, confidence, rigor, dissent, cost, overview, citations, followups)`
+- All callers updated: CLI ask, CLI auto, CLI decompose, CLI batch, REST API, WebSocket, MCP server
+- **Persistence**: `followups_json` TEXT column on Thread model + SQLite auto-migration in `ensure_schema()`
+- **Thread detail API**: returns `followups` parsed from `followups_json`
+- **WebSocket**: sends `followups` in `complete` event, persists via `_persist_consensus`
+- **Frontend**: `ConsensusNav` + `ThreadNav` show clickable follow-ups in Disclosure section
+  - Clicking a follow-up calls `submitQuestion()` to start a new consensus
+  - `consensus.ts` store: `followups` state, included in reset
+  - `types.ts`: `followups` on `ThreadDetail` and `WSComplete`
 
-### Native Provider Web Search
-- Providers use server-side search instead of DDG proxy when `config.tools.web_search.native` is true
-- `web_search: bool` param added to `ModelProvider.send()` protocol
-- Anthropic: `web_search_20250305` server tool in tools[]
-- Google: `GoogleSearch()` grounding (replaces function tools — can't coexist)
-- Mistral: `{"type": "web_search"}` appended to tools
-- OpenAI: `web_search_options={}` only for `_SEARCH_MODELS` set; others fall back to DDG
-- Perplexity: no-op (always searches natively)
-- `tool_augmented_send`: filters DDG `web_search` tool when native=True, passes flag to provider
+### Revision Citations (enhancement to existing citation system)
+- `revision_citations` field added to both `ConsensusContext` and `RoundResult` in `machine.py`
+- `handle_revise()` now accepts `tool_registry` + `web_search` params — enables tool-augmented revision with web search
+- `handle_revise()` extracts citations from response into `ctx.revision_citations`
+- `handle_propose()` now extracts `proposal_citations` directly in handler (moved from ws.py)
+- WebSocket sends revision citations in REVISE `phase_complete` event
+- `_persist_consensus` saves revision citations to DB as `citations_json` on reviser contribution
+- `ConsensusPanel.tsx` passes `revisionCitations` to REVISE phase card
+- `ConsensusNav.tsx` includes revision citations in Sources section (role: 'revise')
+- `_run_consensus` citation collection now includes revision citations from both round history and current round
 
-### Citations — Persisted + Domain-Grouped
-- `Citation` dataclass (url, title, snippet) on `ModelResponse.citations`
-- Extraction per provider: Anthropic (`web_search_tool_result`), Google (grounding metadata), Perplexity (`response.citations`)
-- **Persistence**: `citations_json` TEXT column on `Contribution` model, SQLite auto-migration via `ensure_schema()`
-- `proposal_citations` tracked on `ConsensusContext` → archived to `RoundResult` → persisted via `_persist_consensus`
-- Thread detail API returns `citations` on `ContributionResponse`
-- **Domain-grouped Sources nav**: ConsensusNav (live) + ThreadNav (stored) group citations by hostname
-  - Nested Disclosure: outer "Sources (17)" → inner "wikipedia.org (3)" → P/C/R role badges per citation
-  - P (green) = propose, C (amber) = challenge, R (blue) = revise
-- `CitationList` shared component for inline display below content
+### CLI Enhancements
+- Top-level `--rounds` and `--challengers` options on `cli()` group cascade to subcommands (subcommand wins if both set)
+- `_parse_challengers()` accepts either int count or comma-separated model refs (e.g. `3` or `openai:gpt-5,google:gemini-2.5-pro`)
+- `challenger_count` param flows through `_run_consensus` → `select_challengers(count=N)`
+- **CLI DB persistence**: new `persist_consensus()` function in `app.py` — CLI `ask` command now persists full consensus round history to DB (proposals, challenges, revisions, citations, decisions, overview, followups)
+- `_ask_async` creates DB factory via `_create_db()`, disposes engine in `finally` block
+- Top-level `--rounds` also cascades into `batch` subcommand
 
-### Anthropic Streaming + max_tokens
-- `AnthropicProvider.send()` now uses streaming internally via `_collect_stream()` — avoids 10-minute timeout
-- `max_tokens` bumped from 16384 → 32768 across all 6 handler defaults (propose, challenge, revise, commit, voting, decomposition)
-- Citations are part of the value — truncating them undermines trust
+### Calibration Date Filters (frontend)
+- `CalibrationDashboard.tsx`: category dropdown + since/until date inputs + Apply button
+- `INTENT_CATEGORIES` constant: `['factual', 'technical', 'creative', 'judgment', 'strategic']`
+- `calibration.ts` store: `since`/`until` state + `setSince`/`setUntil` setters, passed to API call
+- Store tests: 4 new tests for date filter state and API param passing
 
-### Parallel Challenge Streaming
-- `_stream_challenges()` in `ws.py` uses `asyncio.as_completed()` to send each challenge result to the frontend as it finishes
-- Previously: all challengers ran in parallel but results were batched after all completed
-- Now: first challenger to respond appears immediately in the UI
+### Provider Updates
+- **OpenAI**: `_REASONING_EFFORT_MODELS` set (gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.2, gpt-5.4) — sends `reasoning_effort: "high"` when no function tools present (incompatible with tools on /v1/chat/completions)
+- **OpenAI**: also sends `reasoning_effort: "high"` in structured output path (`_send_structured`)
+- **OpenAI**: `gpt-5.2` added to `NO_TEMPERATURE_MODELS` in `catalog.py`
+- **Perplexity**: retry logic for `APIConnectionError` — 2 attempts, 1s delay between retries
+- **Perplexity**: `APIConnectionError` mapped to `ProviderTimeoutError`
 
-### Tools Enabled by Default
-- `web_search` tool wired through CLI, REST, and WebSocket paths by default
-- Provider tool format fix: `tool_augmented_send` builds generic `{name, description, parameters}` — each provider transforms to native format in `send()`
+### Infrastructure
+- `alembic/env.py`: `DUH_DATABASE_URL` env var overrides `alembic.ini` — `_resolve_url()` used in offline, online sync, and online async migration paths
+- `.gitignore`: `npm/like-duh/node_modules/` added
 
-### Sidebar UX
-- New-question button (Heroicons pencil-square) + collapsible sidebar toggle
-- Shell manages `desktopSidebarOpen` (default true) + `mobileSidebarOpen` separately
-- TopBar shows sidebar toggle when desktop sidebar collapsed or always on mobile
-
-### Test Results
-- 1641 Python tests + 194 Vitest tests (1835 total)
-- Build clean, all tests pass
+### Test Updates
+- All test files updated for 8-tuple `_run_consensus` return value
+- `test_cli_display.py`: new `TestShowCitations` class (8 tests — empty, single, dedup, grouping, sort, title fallback, no-url skip, numbered)
+- `test_cli_display.py`: new `TestShowFinalDecisionOverview` class (2 tests — shows/hides overview panel)
+- `test_cli_tools.py`: mock return values corrected from 4-tuple to 8-tuple
+- `test_providers_openai.py`: test switched from `gpt-5.2` to `gpt-4o` (since 5.2 now has special reasoning_effort behavior)
+- `stores.test.ts`: 4 new calibration date filter tests
+- `test_cli_batch.py`, `test_cli_voting.py`, `test_mcp_server.py`: 8-tuple updates
 
 ---
 
 ## Current State
 
-- **Branch `question-refinement`** — in progress, not yet merged
-- **1641 Python tests + 194 Vitest tests** (1835 total)
-- All previous features intact (v0.1–v0.6)
-- Prior work merged: z-index fix, GPT-5.4, .env docs, password reset
+- **Branch `main`** — uncommitted changes across 29 files (+828/-63)
+- All previous features intact (v0.1-v0.6, question-refinement PR #13, messaging-refinement PR #14)
+- Previously merged: question refinement, native web search, citations, tools-by-default, sidebar UX, README rewrite, CLI citation display
 
 ## Open Questions (Still Unresolved)
 
 - Licensing (MIT vs Apache 2.0)
 - Output licensing for multi-provider synthesized content
-- Vector search solution for SQLite (sqlite-vss vs ChromaDB vs FAISS) — v1.0 decision
+- Vector search solution for SQLite (sqlite-vss vs ChromaDB vs FAISS) -- v1.0 decision
 - Client library packaging: monorepo `client/` dir vs separate repo?
 - MCP server transport: stdio vs SSE vs streamable HTTP?
-- Hosted demo economics (try.duh.dev) — deferred to post-1.0
-- A2A protocol — deferred to post-1.0
+- Hosted demo economics (try.duh.dev) -- deferred to post-1.0
+- A2A protocol -- deferred to post-1.0
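
Review note: the notes above say `_parse_challengers()` accepts either an int count or comma-separated model refs (e.g. `3` or `openai:gpt-5,google:gemini-2.5-pro`). The real function body is not part of this diff, so the following is only a minimal sketch of that contract. The name `parse_challengers`, the `(count, refs)` return shape, and the `provider:model` validation are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def parse_challengers(value: str) -> Tuple[Optional[int], List[str]]:
    """Parse a --challengers value into (count, model_refs).

    Hypothetical sketch: exactly one side of the result is populated.
    A bare integer yields (count, []); anything else is treated as a
    comma-separated list of "provider:model" refs and yields (None, refs).
    """
    text = value.strip()
    if not text:
        raise ValueError("challengers value must not be empty")

    if text.isdecimal():
        count = int(text)
        if count < 1:
            raise ValueError("challenger count must be at least 1")
        return count, []

    refs = [part.strip() for part in text.split(",") if part.strip()]
    if not refs:
        raise ValueError("no model refs found")
    for ref in refs:
        provider, sep, model = ref.partition(":")
        if not (provider and sep and model):
            raise ValueError(f"expected provider:model, got {ref!r}")
    return None, refs
```

Returning a pair keeps the caller's branch explicit: a count would feed something like `select_challengers(count=N)`, while a ref list would bypass selection. Whether the project represents the result this way is not confirmed by the diff.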
diff --git a/memory-bank/progress.md b/memory-bank/progress.md
@@ -4,9 +4,31 @@
 
 ---
 
-## Current State: Post v0.6.0 — `question-refinement` Branch In Progress
+## Current State: Post PR #14 — Follow-ups, Revision Citations, CLI Persistence
 
-### Question Refinement + Native Web Search + Citations (2026-03-08)
+### Follow-up Questions + Revision Citations + CLI Persistence + Provider Updates (2026-03-09)
+
+- **Follow-up questions**: `generate_followups()` uses cheapest model w/ JSON mode to suggest 3 follow-up questions after consensus
+  - `followups` on ConsensusContext, `followups_json` TEXT on Thread model + migration
+  - `_run_consensus` now returns 8-tuple (was 7, added `followups`)
+  - All callers updated: CLI, REST, WS, MCP, batch, decompose
+  - Frontend: clickable follow-ups in ConsensusNav + ThreadNav (Disclosure), triggers new consensus
+  - WS `complete` event includes `followups`, thread detail API returns them
+- **Revision citations**: `handle_revise()` now accepts `tool_registry` + `web_search`, extracts citations
+  - `revision_citations` on ConsensusContext + RoundResult, persisted to DB
+  - `handle_propose()` now extracts proposal_citations directly in handler
+  - WS sends revision citations in REVISE phase, ConsensusNav includes them in Sources
+- **CLI persistence**: new `persist_consensus()` in `app.py` — CLI `ask` saves full round history to DB
+  - `_ask_async` creates DB factory, disposes engine in finally block
+- **CLI enhancements**: top-level `--rounds` and `--challengers` cascade to subcommands
+  - `_parse_challengers()` accepts int count or comma-separated model refs
+- **Calibration date filters**: frontend category + since/until date inputs on CalibrationDashboard
+- **OpenAI**: `reasoning_effort: "high"` for GPT-5.x models (when no tools), gpt-5.2 in NO_TEMPERATURE_MODELS
+- **Perplexity**: retry logic for APIConnectionError (2 attempts, 1s delay)
+- **Alembic**: `DUH_DATABASE_URL` env var overrides alembic.ini
+- Tests: new TestShowCitations (8), TestShowFinalDecisionOverview (2), calibration date filter tests (4), all 8-tuple updates
+
+### Question Refinement + Native Web Search + Citations (2026-03-08, merged PR #13 + #14)
 
 - **Question refinement**: pre-consensus clarification step (analyze → clarify → enrich → consensus)
   - `src/duh/consensus/refine.py`, API routes (`/api/refine`, `/api/enrich`), CLI `--refine` flag
@@ -224,9 +246,18 @@ Phase 0 benchmark framework — fully functional, pilot-tested on 5 questions.
 | 2026-03-07 | GPT-5.4 added to model catalog (1M ctx, $2.50/$15.00, no-temperature) | Done |
 | 2026-03-07 | .env.example updated with provider API key placeholders | Done |
 | 2026-03-07 | README updated with all provider env vars | Done |
-| 2026-03-08 | Question refinement (analyze → clarify → enrich → consensus) | In Progress |
-| 2026-03-08 | Native provider web search (Anthropic/Google/Mistral/OpenAI/Perplexity) | In Progress |
-| 2026-03-08 | Citations extraction + frontend CitationList + ConsensusNav Sources | In Progress |
-| 2026-03-08 | Tools enabled by default (web_search wired through CLI/REST/WS) | In Progress |
-| 2026-03-08 | Provider tool format fix (generic → native transform per provider) | In Progress |
-| 2026-03-08 | Sidebar UX (new-question button, collapsible toggle) | In Progress |
+| 2026-03-08 | Question refinement (analyze → clarify → enrich → consensus) | Done (PR #13) |
+| 2026-03-08 | Native provider web search (Anthropic/Google/Mistral/OpenAI/Perplexity) | Done (PR #13) |
+| 2026-03-08 | Citations extraction + frontend CitationList + ConsensusNav Sources | Done (PR #13) |
+| 2026-03-08 | Tools enabled by default (web_search wired through CLI/REST/WS) | Done (PR #13) |
+| 2026-03-08 | Provider tool format fix (generic → native transform per provider) | Done (PR #13) |
+| 2026-03-08 | Sidebar UX (new-question button, collapsible toggle) | Done (PR #13) |
+| 2026-03-08 | README rewrite + CLI citation display (7-tuple _run_consensus) | Done (PR #14) |
+| 2026-03-09 | Follow-up questions (generate, persist, display, clickable) | In Progress |
+| 2026-03-09 | Revision citations (handle_revise with tools/search, persist, display) | In Progress |
+| 2026-03-09 | CLI DB persistence (persist_consensus, _ask_async DB factory) | In Progress |
+| 2026-03-09 | CLI top-level --rounds/--challengers cascade + _parse_challengers | In Progress |
+| 2026-03-09 | Calibration date filters (frontend category/since/until) | In Progress |
+| 2026-03-09 | OpenAI reasoning_effort for GPT-5.x, gpt-5.2 catalog | In Progress |
+| 2026-03-09 | Perplexity retry logic for APIConnectionError | In Progress |
+| 2026-03-09 | Alembic DUH_DATABASE_URL env var support | In Progress |
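
Review note: the Perplexity entries describe a retry on `APIConnectionError` (2 attempts, 1s delay) with the final failure mapped to `ProviderTimeoutError`. The provider code itself is not shown in this diff, so here is a hedged sketch of that pattern only. Both exception classes are local stand-ins defined in the example, and `send_with_retry` is a hypothetical helper name.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class APIConnectionError(Exception):
    """Local stand-in for the SDK's connection error (assumption)."""


class ProviderTimeoutError(Exception):
    """Local stand-in for the project's timeout error (assumption)."""


async def send_with_retry(
    call: Callable[[], Awaitable[T]],
    attempts: int = 2,
    delay: float = 1.0,
) -> T:
    """Await call(), retrying on connection errors.

    Sleeps `delay` seconds between attempts. When the last attempt also
    fails, the connection error is re-raised as ProviderTimeoutError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except APIConnectionError as exc:
            if attempt == attempts:
                raise ProviderTimeoutError(str(exc)) from exc
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

Only connection errors are retried here; any other exception propagates on the first attempt, which keeps genuine API errors (bad request, auth) from being masked as timeouts.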
diff --git a/memory-bank/tasks/2026-03/README.md b/memory-bank/tasks/2026-03/README.md
@@ -9,6 +9,38 @@
 - Files: `mail.py`, `auth.py`, `schema.py`, `loader.py`, `LoginPage.tsx`, `ResetPasswordPage.tsx`, `TopBar.tsx`
 - See: [070307_password-reset.md](./070307_password-reset.md)
 
+## 2026-03-08: Question Refinement + Native Web Search + Citations (PR #13 + #14)
+- Pre-consensus question refinement: analyze → clarify → enrich → consensus
+- Native provider web search (Anthropic/Google/Mistral/OpenAI/Perplexity)
+- Citations: extraction per provider, persistence, domain-grouped Sources nav with P/C/R badges
+- Tools enabled by default (web_search wired through CLI, REST, WS)
+- Sidebar UX: new-question button + collapsible toggle
+- Anthropic streaming + parallel challenge streaming + max_tokens 32768
+- README rewrite: repositioned as AI infrastructure, CLI citation display
+- `_run_consensus` 7-tuple return (added citations)
+- 1641 Python + 194 Vitest tests (1835 total)
+- Files: refine.py, handlers.py, machine.py, ws.py, ask.py, threads.py, app.py, all providers, ConsensusNav.tsx, ThreadNav.tsx, CitationList.tsx, RefinementPanel.tsx, consensus.ts, types.ts
+
+## 2026-03-09: Follow-ups + Revision Citations + CLI Persistence + Provider Updates
+- **Follow-up questions**: `generate_followups()` — cheapest model, JSON mode, 3 questions post-consensus
+  - `followups` on ConsensusContext, `followups_json` on Thread model + migration
+  - `_run_consensus` 8-tuple return (added followups), all callers updated
+  - Frontend: clickable follow-ups in ConsensusNav + ThreadNav Disclosure, triggers new consensus
+- **Revision citations**: `handle_revise()` accepts tool_registry + web_search, extracts citations
+  - `revision_citations` on ConsensusContext + RoundResult, persisted to DB
+  - `handle_propose()` extracts proposal_citations directly in handler
+  - WS sends revision citations in REVISE phase, ConsensusPanel passes to phase card
+- **CLI persistence**: `persist_consensus()` saves full round history to DB from CLI
+  - `_ask_async` creates DB factory, disposes engine in finally
+- **CLI options**: top-level `--rounds`/`--challengers` cascade to subcommands
+  - `_parse_challengers()`: int count or comma-separated model refs
+- **Calibration filters**: category + since/until date inputs on CalibrationDashboard
+- **OpenAI**: `reasoning_effort: "high"` for GPT-5.x (no tools), gpt-5.2 in NO_TEMPERATURE_MODELS
+- **Perplexity**: retry for APIConnectionError (2 attempts, 1s delay)
+- **Alembic**: `DUH_DATABASE_URL` env var overrides alembic.ini
+- Tests: TestShowCitations (8), TestShowFinalDecisionOverview (2), calibration date tests (4), all 8-tuple updates
+- Files: handlers.py, machine.py, app.py, ws.py, ask.py, threads.py, models.py, migrations.py, mcp/server.py, openai.py, perplexity.py, catalog.py, alembic/env.py, CalibrationDashboard.tsx, ConsensusNav.tsx, ConsensusPanel.tsx, ThreadNav.tsx, calibration.ts, consensus.ts, types.ts, + 7 test files
+
 ## 2026-03-07: Z-index Fix + GPT-5.4 + .env Docs
 - Fixed z-index stacking contexts trapping dropdowns (Shell z-10, TopBar z-20 removed)
 - Added CSS z-index tokens (`--z-background`, `--z-dropdown`, `--z-overlay`, `--z-modal`)
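
Review note: the 2026-03-09 entry stores follow-ups in a `followups_json` TEXT column and has the thread detail API return them parsed. The parsing code is not in this diff, so the sketch below only illustrates one defensive way to read such a column. `parse_followups`, the `limit` of 3 (mirroring the "3 follow-up questions" in the notes), and the tolerated `{"followups": [...]}` wrapper are all assumptions.

```python
import json
from typing import List, Optional


def parse_followups(followups_json: Optional[str], limit: int = 3) -> List[str]:
    """Decode a stored follow-ups column into a clean list of questions.

    Hypothetical sketch: returns [] for NULL, empty, or malformed values
    so a bad row cannot break the thread detail response.
    """
    if not followups_json:
        return []
    try:
        data = json.loads(followups_json)
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        # Assumed wrapper shape; a bare JSON list is also accepted.
        data = data.get("followups", [])
    if not isinstance(data, list):
        return []
    cleaned = [
        item.strip()
        for item in data
        if isinstance(item, str) and item.strip()
    ]
    return cleaned[:limit]
```

Falling back to an empty list matches the "graceful fallback" approach the notes describe for question refinement, though the diff does not confirm that follow-ups are handled the same way.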