Merged
73 changes: 54 additions & 19 deletions memory-bank/activeContext.md
@@ -1,37 +1,72 @@
# Active Context

**Last Updated**: 2026-03-08
**Current Phase**: `question-refinement` branch: pre-consensus question refinement, native web search, citations, tools-by-default
**Next Action**: Branch in progress, uncommitted changes staged

## Latest Work (2026-03-08)

### Question Refinement
- Pre-consensus clarification step: analyze question → ask clarifying questions → enrich with answers → proceed to consensus
- `src/duh/consensus/refine.py` — `analyze_question()` + `enrich_question()`, uses MOST EXPENSIVE model (not cheapest)
- API: `POST /api/refine` → `RefineResponse{needs_refinement, questions[]}`, `POST /api/enrich` → `EnrichResponse{enriched_question}`
- CLI: `duh ask --refine "question"` — interactive `click.prompt()` loop, default `--no-refine`
- Frontend: consensus store `'refining'` status, `submitQuestion` → refine → clarify → enrich → `startConsensus`
- `RefinementPanel.tsx` — tabbed UI inside GlassPanel, checkmarks on answered tabs, Skip + Start Consensus buttons
- Graceful fallback: any failure → proceed to consensus with original question
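
The flow above (analyze, clarify, enrich, fall back on any failure) can be sketched as plain orchestration. This is a minimal illustration, not the real `refine.py`: the callables are injected stand-ins for the model calls and the `click.prompt()` loop, and `RefinementResult` mirrors the `RefineResponse{needs_refinement, questions[]}` shape described above.

```python
from dataclasses import dataclass, field


@dataclass
class RefinementResult:
    """Mirrors the RefineResponse payload: does the question need
    clarification, and if so, which questions to ask the user."""
    needs_refinement: bool
    questions: list[str] = field(default_factory=list)


def refine_flow(question: str, analyze, enrich, ask_user) -> str:
    """Pre-consensus refinement: analyze -> clarify -> enrich.

    `analyze`, `enrich`, and `ask_user` are injected callables
    (model calls and the interactive prompt in the real code).
    Any failure falls back to the original question unchanged.
    """
    try:
        result: RefinementResult = analyze(question)
        if not result.needs_refinement:
            return question
        # Collect one answer per clarifying question (CLI prompt loop
        # or RefinementPanel tabs in the real implementation).
        answers = {q: ask_user(q) for q in result.questions}
        return enrich(question, answers)
    except Exception:
        # Graceful fallback: proceed to consensus with the original question.
        return question
```

The key design point is that refinement is strictly best-effort: no exception from the analyze or enrich calls can block the consensus round.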

### Native Provider Web Search
- Providers use server-side search instead of DDG proxy when `config.tools.web_search.native` is true
- `web_search: bool` param added to `ModelProvider.send()` protocol
- Anthropic: `web_search_20250305` server tool in tools[]
- Google: `GoogleSearch()` grounding (replaces function tools — can't coexist)
- Mistral: `{"type": "web_search"}` appended to tools
- OpenAI: `web_search_options={}` only for `_SEARCH_MODELS` set; others fall back to DDG
- Perplexity: no-op (always searches natively)
- `tool_augmented_send`: filters DDG `web_search` tool when native=True, passes flag to provider
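
The `tool_augmented_send` behavior in the last bullet reduces to a small filtering step. A hedged sketch (names are illustrative, not the real function signature): when native search is on, the DDG-backed `web_search` function tool is dropped from the tool list and the provider is instead told, via a flag, to enable its server-side search.

```python
def prepare_tools(tools: list[dict], native_search: bool) -> tuple[list[dict], bool]:
    """Split tool handling for native provider search.

    Returns (tools_to_send, web_search_flag). With native search off,
    the DDG `web_search` function tool stays in the list and the flag
    is False; with it on, the tool is filtered out and the flag tells
    the provider's send() to use its own server-side search.
    """
    if not native_search:
        return tools, False
    filtered = [t for t in tools if t.get("name") != "web_search"]
    return filtered, True
```

This keeps the two mechanisms mutually exclusive, which matters for Google in particular, where grounding and function tools cannot coexist.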

### Citations — Persisted + Domain-Grouped
- `Citation` dataclass (url, title, snippet) on `ModelResponse.citations`
- Extraction per provider: Anthropic (`web_search_tool_result`), Google (grounding metadata), Perplexity (`response.citations`)
- **Persistence**: `citations_json` TEXT column on `Contribution` model, SQLite auto-migration via `ensure_schema()`
- `proposal_citations` tracked on `ConsensusContext` → archived to `RoundResult` → persisted via `_persist_consensus`
- Thread detail API returns `citations` on `ContributionResponse`
- **Domain-grouped Sources nav**: ConsensusNav (live) + ThreadNav (stored) group citations by hostname
- Nested Disclosure: outer "Sources (17)" → inner "wikipedia.org (3)" → P/C/R role badges per citation
- P (green) = propose, C (amber) = challenge, R (blue) = revise
- `CitationList` shared component for inline display below content
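
The domain grouping behind the Sources nav ("Sources (17)" → "wikipedia.org (3)") is straightforward to sketch. The `Citation` fields match the dataclass described above; the grouping helper and the `www.` stripping are assumptions about the UI behavior, shown here in Python although the actual nav components are TypeScript.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Citation:
    url: str
    title: str = ""
    snippet: str = ""


def group_by_domain(citations: list[Citation]) -> dict[str, list[Citation]]:
    """Group citations by hostname for the nested Sources disclosure.

    A leading "www." is stripped so www.example.com and example.com
    collapse into one group (an assumption about the real grouping).
    """
    groups: dict[str, list[Citation]] = defaultdict(list)
    for c in citations:
        host = urlparse(c.url).hostname or "unknown"
        groups[host.removeprefix("www.")].append(c)
    return dict(groups)
```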

### Anthropic Streaming + max_tokens
- `AnthropicProvider.send()` now uses streaming internally via `_collect_stream()` — avoids 10-minute timeout
- `max_tokens` bumped from 16384 → 32768 across all 6 handler defaults (propose, challenge, revise, commit, voting, decomposition)
- Citations are part of the value — truncating them undermines trust

### Parallel Challenge Streaming
- `_stream_challenges()` in `ws.py` uses `asyncio.as_completed()` to send each challenge result to the frontend as it finishes
- Previously: all challengers ran in parallel but results were batched after all completed
- Now: first challenger to respond appears immediately in the UI

### Tools Enabled by Default
- `web_search` tool wired through CLI, REST, and WebSocket paths by default
- Provider tool format fix: `tool_augmented_send` builds generic `{name, description, parameters}` — each provider transforms to native format in `send()`
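
The generic-to-native transform in the last bullet can be shown for two providers. These two target shapes reflect the public Anthropic Messages API (`input_schema`) and OpenAI chat completions (`{"type": "function", ...}` wrapper) tool formats at the time of writing; the function names are illustrative, not the project's actual helpers.

```python
def to_anthropic(tool: dict) -> dict:
    """Anthropic's Messages API expects the JSON schema under "input_schema"."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_openai(tool: dict) -> dict:
    """OpenAI chat completions wrap the definition in a "function" object."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }
```

Keeping one generic `{name, description, parameters}` shape upstream means `tool_augmented_send` never needs provider-specific knowledge; each provider's `send()` applies its own transform.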

### Sidebar UX
- New-question button (Heroicons pencil-square) + collapsible sidebar toggle
- Shell manages `desktopSidebarOpen` (default true) + `mobileSidebarOpen` separately
- TopBar shows sidebar toggle when desktop sidebar collapsed or always on mobile

### Test Results
- 1641 Python tests + 194 Vitest tests (1835 total)
- Build clean, all tests pass

---

## Current State

- **Branch `question-refinement`** — in progress, not yet merged
- **1641 Python tests + 194 Vitest tests** (1835 total)
- All previous features intact (v0.1–v0.6)
- Prior work merged: z-index fix, GPT-5.4, .env docs, password reset

## Open Questions (Still Unresolved)

86 changes: 85 additions & 1 deletion memory-bank/decisions.md
@@ -1,6 +1,6 @@
# Architectural Decisions

**Last Updated**: 2026-03-08

---

@@ -354,3 +354,87 @@
- Manual migration instructions in docs (user friction)
**Consequences**: File-based SQLite databases auto-migrate on startup. Zero friction for local users. PostgreSQL still requires `alembic upgrade head`. Lightweight and self-contained.
**References**: `src/duh/memory/migrations.py`, `src/duh/cli/app.py:107-110`

---

## 2026-03-08: Native Provider Web Search Over DDG Proxy

**Status**: Approved
**Context**: The original web search tool used DuckDuckGo as a proxy — every provider's tool calls went through DDG, which returned index pages rather than real content. Most major providers now offer server-side web search that returns higher-quality results with citations.
**Decision**: Add `web_search: bool` parameter to the `ModelProvider.send()` protocol. When `config.tools.web_search.native` is true, each provider uses its native search capability: Anthropic (`web_search_20250305` server tool), Google (`GoogleSearch()` grounding), Mistral (`{"type": "web_search"}`), OpenAI (`web_search_options`), Perplexity (always native). DDG proxy remains as fallback for providers/models that don't support native search.
**Alternatives**:
- DDG-only (simpler, but returns low-quality index pages instead of real content)
- Single search provider for all (e.g., Bing API — adds external dependency and API key)
- Remove web search entirely (loses grounding capability)
**Consequences**: Higher quality search results with real content. Citations extractable from provider responses. Each provider has different native search API shape — increases per-provider complexity. Google grounding and function declarations can't coexist (grounding replaces function tools).
**References**: `src/duh/providers/anthropic.py`, `src/duh/providers/google.py`, `src/duh/providers/mistral.py`, `src/duh/providers/openai.py`, `src/duh/tools/augmented_send.py`

---

## 2026-03-08: Question Refinement Uses Most Expensive Model

**Status**: Approved
**Context**: Question refinement analyzes user questions before consensus to determine if clarification is needed. The analysis quality directly impacts downstream consensus quality — a poorly refined question wastes all subsequent model calls.
**Decision**: `analyze_question()` and `enrich_question()` in `src/duh/consensus/refine.py` use the most expensive configured model (sorted by cost), not the cheapest. The refinement step is a single model call, so the cost difference is minimal compared to the full consensus round it precedes.
**Alternatives**:
- Cheapest model (saves tokens, but poor analysis leads to poor consensus)
- User-configurable refinement model (adds UX complexity)
- Multi-model refinement (overkill — single strong model is sufficient for question analysis)
**Consequences**: Better question analysis quality. Marginal cost increase (one extra expensive model call). Graceful fallback on failure — original question proceeds to consensus unchanged.
**References**: `src/duh/consensus/refine.py`, `src/duh/api/routes/ask.py`, `src/duh/cli/app.py`
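
The model-selection rule is a one-liner over the catalog. A minimal sketch, assuming a simple cost metric of summed input and output $/MTok; the real catalog may weight or structure costs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    input_cost: float   # $ per MTok
    output_cost: float  # $ per MTok


def pick_refinement_model(models: list[ModelSpec]) -> ModelSpec:
    """Pick the most expensive configured model for refinement.

    Cost is a proxy for capability here: one strong analysis call
    is cheap relative to the full consensus round it precedes.
    """
    return max(models, key=lambda m: m.input_cost + m.output_cost)
```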

---

## 2026-03-08: Tools Enabled by Default

**Status**: Approved
**Context**: Web search was originally opt-in. Users who didn't know about the `--tools` flag got ungrounded responses. Most queries benefit from web search grounding.
**Decision**: `web_search` tool is enabled by default across CLI, REST API, and WebSocket paths. The `config.tools.web_search` section controls behavior. Native provider search is preferred when available.
**Alternatives**:
- Opt-in only (simpler, but most users miss it)
- Always-on with no config (inflexible for cost-sensitive users)
- Per-question tool selection (too much UX friction)
**Consequences**: Better default experience — responses are grounded in current information. Slightly higher cost per query (search tool calls). Users can disable via config if needed.
**References**: `src/duh/config/schema.py`, `src/duh/cli/app.py`, `src/duh/api/routes/ws.py`
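
The config surface this decision describes might look roughly like the following. This is a guess at the schema shape: only the `web_search` section name and the `native` flag come from the source; the `enabled` field and defaults are assumptions, and the real schema lives in `src/duh/config/schema.py` (likely pydantic rather than dataclasses).

```python
from dataclasses import dataclass, field


@dataclass
class WebSearchConfig:
    enabled: bool = True   # on by default per this decision (assumed field name)
    native: bool = True    # prefer provider-native search; DDG proxy as fallback


@dataclass
class ToolsConfig:
    web_search: WebSearchConfig = field(default_factory=WebSearchConfig)
```

Cost-sensitive users flip `enabled` off in config rather than hunting for a per-invocation flag.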

---

## 2026-03-08: Citation Persistence on Contributions

**Status**: Approved
**Context**: Citations were emitted over WebSocket during live consensus but never persisted. Viewing a thread later from the Threads section showed no sources — undermining the trust value of native web search.
**Decision**: Add `citations_json` TEXT column to the `Contribution` model (nullable, JSON-encoded list of `{url, title}`). Track `proposal_citations` on `ConsensusContext` and archive to `RoundResult`. Serialize and persist during `_persist_consensus`. Thread detail API returns parsed citations on `ContributionResponse`. ThreadNav shows domain-grouped sources matching ConsensusNav.
**Alternatives**:
- Separate Citation table with FK to Contribution (more normalized, but adds query complexity for marginal benefit)
- Store citations only on Decision (loses per-role attribution)
- Don't persist (simpler, but citations are essential to trust)
**Consequences**: Citations survive beyond the WebSocket session. Thread detail view shows sources grouped by domain with role attribution (P/C/R). SQLite auto-migration handles existing databases. Slightly larger DB rows due to JSON text.
**References**: `src/duh/memory/models.py:146`, `src/duh/api/routes/threads.py`, `src/duh/api/routes/ws.py`, `web/src/components/threads/ThreadNav.tsx`
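
The additive SQLite auto-migration plus JSON round-trip can be sketched end to end. The table name `contributions` and the helper are assumptions for illustration; the real migration runs through `ensure_schema()` in `src/duh/memory/migrations.py`.

```python
import json
import sqlite3


def ensure_citations_column(conn: sqlite3.Connection) -> None:
    """Additive auto-migration: add citations_json if it is missing.

    PRAGMA table_info rows are (cid, name, type, ...); column name is
    index 1. ALTER TABLE ADD COLUMN is safe and idempotent this way.
    """
    cols = {row[1] for row in conn.execute("PRAGMA table_info(contributions)")}
    if "citations_json" not in cols:
        conn.execute("ALTER TABLE contributions ADD COLUMN citations_json TEXT")


# Usage: migrate an existing database, then persist and read back citations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (id INTEGER PRIMARY KEY, content TEXT)")
ensure_citations_column(conn)
ensure_citations_column(conn)  # second call is a no-op

conn.execute(
    "INSERT INTO contributions (content, citations_json) VALUES (?, ?)",
    ("answer", json.dumps([{"url": "https://example.com", "title": "Example"}])),
)
row = conn.execute("SELECT citations_json FROM contributions").fetchone()
citations = json.loads(row[0])
```

Nullable TEXT keeps old rows valid with no backfill, which is why the migration stays a one-statement ALTER.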

---

## 2026-03-08: Anthropic Streaming Internally in send()

**Status**: Approved
**Context**: Increasing `max_tokens` to 32768 triggered Anthropic SDK's 10-minute timeout error: "Streaming is required for operations that may take longer than 10 minutes." The `send()` method used non-streaming `messages.create()`.
**Decision**: `send()` now calls `_collect_stream()` which uses `messages.stream()` as a context manager and collects the final `Message` via `get_final_message()`. The returned object is identical to `messages.create()` output, so all downstream parsing (citations, tool calls, text concatenation) works unchanged.
**Alternatives**:
- Keep non-streaming and lower max_tokens (loses citation content to truncation)
- Full streaming to frontend (larger change, separate concern)
- Increase Anthropic client timeout (fragile, doesn't scale)
**Consequences**: No more timeout errors at any max_tokens value. Test mocks must mock `messages.stream` context manager instead of `messages.create`. Marginal latency increase from stream overhead (negligible vs network time).
**References**: `src/duh/providers/anthropic.py:222-229`
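
The collect-from-stream pattern is easy to show without the SDK. `FakeStream` below is a stand-in for what `client.messages.stream(...)` returns in the Anthropic SDK (a context manager that yields events and exposes `get_final_message()`); the real `_collect_stream` drains events the same way, and the final message has the same shape as a `messages.create()` result.

```python
class FakeStream:
    """Stand-in for the SDK's streaming context manager."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def __iter__(self):
        # Yields incremental events; consuming them keeps the
        # connection active past the non-streaming time cap.
        return iter(self._chunks)

    def get_final_message(self):
        # The SDK assembles the complete Message from the events.
        return "".join(self._chunks)


def collect_stream(chunks) -> str:
    """Mirror of _collect_stream: drain the stream, return the result."""
    with FakeStream(chunks) as stream:
        for _event in stream:
            pass
        return stream.get_final_message()
```

Because the collected result matches the non-streaming response shape, downstream citation and tool-call parsing needs no changes; only test mocks move from `messages.create` to `messages.stream`.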

---

## 2026-03-08: Parallel Challenge Streaming via as_completed

**Status**: Approved
**Context**: Challengers were already running in parallel via `asyncio.gather` in `handle_challenge`, but the WebSocket handler sent all results after ALL challengers finished. Users saw nothing until the slowest challenger responded.
**Decision**: New `_stream_challenges()` function in `ws.py` uses `asyncio.as_completed()` to send each challenge result to the frontend immediately as each completes. Builds `ChallengeResult` objects and updates `ctx.challenges` directly, bypassing `handle_challenge`.
**Alternatives**:
- Keep batched approach (simpler, but poor UX — users wait for slowest model)
- Token-level streaming per challenger (much more complex, requires protocol changes)
- Sequential challengers (defeats the purpose of multi-model)
**Consequences**: First challenger to respond appears immediately. More engaging real-time experience. WS test mocks now patch `_stream_challenges` instead of `handle_challenge`. Challenge order in UI reflects completion speed, not configuration order.
**References**: `src/duh/api/routes/ws.py:253-347`, `tests/unit/test_api_ws.py`
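
The pattern behind `_stream_challenges` is plain `asyncio.as_completed`. A self-contained sketch with simulated latency and an injected `send` callable standing in for the WebSocket: each result is forwarded the moment it finishes instead of batching after the slowest model.

```python
import asyncio
import random


async def challenge(model: str) -> str:
    # Simulated model latency; fast models finish (and appear) first.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return f"{model}: challenge"


async def stream_challenges(models: list[str], send) -> list[str]:
    """Run all challengers in parallel; push each result as it completes."""
    tasks = [asyncio.create_task(challenge(m)) for m in models]
    results = []
    for fut in asyncio.as_completed(tasks):
        result = await fut
        await send(result)          # first finisher reaches the UI first
        results.append(result)
    return results


async def main():
    sent = []

    async def send(msg):
        sent.append(msg)

    results = await stream_challenges(["claude", "gpt", "gemini"], send)
    return sent, results


sent, results = asyncio.run(main())
```

Note the consequence called out above: `results` is ordered by completion speed, not configuration order, which is exactly what the UI now reflects.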
37 changes: 36 additions & 1 deletion memory-bank/progress.md
@@ -4,7 +4,36 @@

---

## Current State: Post v0.6.0 — `question-refinement` Branch In Progress

### Question Refinement + Native Web Search + Citations (2026-03-08)

- **Question refinement**: pre-consensus clarification step (analyze → clarify → enrich → consensus)
- `src/duh/consensus/refine.py`, API routes (`/api/refine`, `/api/enrich`), CLI `--refine` flag
- Frontend: `RefinementPanel.tsx` tabbed UI, consensus store `'refining'` status
- Graceful fallback on failure → original question proceeds to consensus
- **Native provider web search**: Anthropic/Google/Mistral/OpenAI/Perplexity use server-side search
- `web_search: bool` param on `ModelProvider.send()` protocol
- `config.tools.web_search.native` flag controls behavior
- DDG proxy still available as fallback for non-native providers
- **Citations**: `Citation` dataclass on `ModelResponse`, extracted per provider, displayed in frontend
- `CitationList` shared component, `ConsensusNav` collapsible Sources sidebar section
- WS events include `citations` array for PROPOSE and CHALLENGE phases
- **Tools enabled by default**: `web_search` (DuckDuckGo) wired through all paths (CLI, REST, WS)
- **Provider tool format fix**: each provider transforms generic tool defs to native API format
- **Sidebar UX**: new-question button + collapsible sidebar toggle
- **Citation persistence**: `citations_json` on Contribution model, SQLite migration, thread detail API returns citations
- **Domain-grouped Sources**: ConsensusNav + ThreadNav group citations by hostname with Disclosure, P/C/R role badges
- **Anthropic streaming**: `send()` uses `_collect_stream()` internally to avoid 10-min timeout on large max_tokens
- **Parallel challenge streaming**: `_stream_challenges()` sends each result to frontend as it completes via `asyncio.as_completed`
- **max_tokens 32768**: bumped from 16384 across all handlers — citations are essential to trust
- 1641 Python tests + 194 Vitest tests (1835 total), build clean

### Z-index Fix + GPT-5.4 + .env Docs (2026-03-07)

- Z-index stacking context fix, GPT-5.4 model catalog entry, .env.example provider keys
- Password reset flow, SMTP mail module, JWT-scoped tokens
- 1603 Python tests + 185 Vitest tests (1788 total)

### Consensus Navigation & Collapsible Sections

@@ -195,3 +224,9 @@
Phase 0 benchmark framework — fully functional, pilot-tested on 5 questions.
| 2026-03-07 | GPT-5.4 added to model catalog (1M ctx, $2.50/$15.00, no-temperature) | Done |
| 2026-03-07 | .env.example updated with provider API key placeholders | Done |
| 2026-03-07 | README updated with all provider env vars | Done |
| 2026-03-08 | Question refinement (analyze → clarify → enrich → consensus) | In Progress |
| 2026-03-08 | Native provider web search (Anthropic/Google/Mistral/OpenAI/Perplexity) | In Progress |
| 2026-03-08 | Citations extraction + frontend CitationList + ConsensusNav Sources | In Progress |
| 2026-03-08 | Tools enabled by default (web_search wired through CLI/REST/WS) | In Progress |
| 2026-03-08 | Provider tool format fix (generic → native transform per provider) | In Progress |
| 2026-03-08 | Sidebar UX (new-question button, collapsible toggle) | In Progress |