90 changes: 40 additions & 50 deletions memory-bank/activeContext.md
@@ -1,68 +1,58 @@
# Active Context

**Last Updated**: 2026-02-18
**Current Phase**: Epistemic Confidence (Phase A) — on branch `epistemic-confidence-phase-a`
**Next Action**: Commit, push, create PR to merge to main.
**Current Phase**: Consensus UX — right-side nav, collapsible sections, decision-first layout
**Next Action**: PR open for review.

## What Just Shipped: Epistemic Confidence Phase A
## What Just Shipped: Consensus Navigation & Collapsible Sections

### Core Change
Confidence scoring is now **epistemic** — it reflects inherent uncertainty of the question domain, not just challenge quality.
### Core Changes
The consensus page and thread detail view now have proper navigation and information hierarchy for multi-round deliberations.

**Before**: `confidence = _compute_confidence(challenges)` — measured rigor only (0.5–1.0 based on sycophancy ratio).
**After**: Two separate scores:
- **Rigor** (renamed from old confidence) — how genuine the challenges were (0.5–1.0)
- **Confidence** — `min(domain_cap(intent), rigor)` — rigor clamped by question type ceiling
**Before**: Long vertical scroll of rounds with no way to navigate or collapse. Decision buried at the bottom after all rounds.
**After**:
- Sticky right-side nav panel shows progress through rounds/phases
- All sections are independently collapsible via a shared `Disclosure` primitive
- Decision surfaces to the **top** when consensus is complete (both live + stored threads)
- Individual challengers shown by model name in nav and each collapsible
- Dissent gets equal treatment: collapsible `DissentBanner` with model attribution parsed from `[model:name]:` prefix

### Domain Caps
| Intent | Cap | Rationale |
|--------|-----|-----------|
| factual | 0.95 | Verifiable answers, near-certain |
| technical | 0.90 | Strong consensus possible |
| creative | 0.85 | Subjective, multiple valid answers |
| judgment | 0.80 | Requires weighing trade-offs |
| strategic | 0.70 | Inherent future uncertainty |
| unknown/None | 0.85 | Default conservative cap |
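
As a worked illustration of the table, the clamp `min(domain_cap(intent), rigor)` can be sketched as follows. The real implementation is Python in `consensus/handlers.py`; this TypeScript version only mirrors the arithmetic, and the names `domainCap` and `epistemicConfidence` are hypothetical.

```typescript
// Caps copied from the table above; the Map itself is an illustrative stand-in
// for the Python DOMAIN_CAPS constant.
const DOMAIN_CAPS = new Map<string, number>([
  ["factual", 0.95],
  ["technical", 0.9],
  ["creative", 0.85],
  ["judgment", 0.8],
  ["strategic", 0.7],
]);

// Default cap for unknown or missing intents, per the last table row.
const DEFAULT_CAP = 0.85;

function domainCap(intent: string | null | undefined): number {
  if (intent == null) return DEFAULT_CAP;
  return DOMAIN_CAPS.get(intent) ?? DEFAULT_CAP;
}

// Confidence is rigor clamped by the ceiling for the question type.
function epistemicConfidence(intent: string | null | undefined, rigor: number): number {
  return Math.min(domainCap(intent), rigor);
}
```

For example, a strategic question with rigor 0.9 yields confidence 0.7, while a factual question with rigor 0.6 stays at 0.6 because rigor is already below the cap.
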
### New Shared Component: `Disclosure`
Reusable chevron + toggle primitive (`web/src/components/shared/Disclosure.tsx`):
- Props: `header`, `defaultOpen`, `forceOpen`, `className`
- Used by: PhaseCard, TurnCard, ConsensusComplete, DissentBanner, ThreadDetail
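
One plausible way the listed props could combine into an open/closed state is sketched below, without React so it stays self-contained. The precedence shown (a `forceOpen` of `true` wins, then the user's toggle, then `defaultOpen`, then closed) is an assumption, as are the `header: string` type and the `isDisclosureOpen` helper; the real `Disclosure.tsx` may behave differently.

```typescript
// Prop names come from the notes above; the types are assumptions.
interface DisclosureProps {
  header: string;        // the real component likely accepts a React node
  defaultOpen?: boolean; // initial state before any user interaction
  forceOpen?: boolean;   // assumed to override both user and default state
  className?: string;
}

// userToggle is undefined until the user clicks the chevron at least once.
function isDisclosureOpen(
  props: DisclosureProps,
  userToggle: boolean | undefined,
): boolean {
  if (props.forceOpen === true) return true;
  return userToggle ?? props.defaultOpen ?? false;
}
```

Keeping this decision in a pure function would make the precedence easy to unit test independently of rendering.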

### Files Changed (47 files, +997, -230)
### Files Changed (17 files)
**New files:**
- `src/duh/calibration.py` — ECE (Expected Calibration Error) computation
- `src/duh/memory/migrations.py` — SQLite schema migration (adds rigor column)
- `tests/unit/test_calibration.py` — 15 calibration tests
- `tests/unit/test_confidence_scoring.py` — 20 epistemic confidence tests
- `tests/unit/test_cli_calibration.py` — 4 CLI calibration tests
- `web/src/components/calibration/CalibrationDashboard.tsx` — Calibration viz
- `web/src/pages/CalibrationPage.tsx` — Calibration page
- `web/src/stores/calibration.ts` — Calibration Zustand store
- `web/src/components/shared/Disclosure.tsx` — Shared collapsible primitive
- `web/src/components/consensus/ConsensusNav.tsx` — Sticky nav for live consensus
- `web/src/components/threads/ThreadNav.tsx` — Sticky nav for thread detail
- `web/src/__tests__/consensus-nav.test.tsx` — 32 tests (Disclosure, PhaseCard, DissentBanner, TurnCard, ConsensusNav)
- `web/src/__tests__/thread-nav.test.tsx` — 8 tests (ThreadNav)

**Modified across full stack:**
- `consensus/handlers.py` — Renamed `_compute_confidence` → `_compute_rigor`, added `_domain_cap()`, `DOMAIN_CAPS`, epistemic formula
- `consensus/machine.py` — Added `rigor` to ConsensusContext, RoundResult
- `consensus/scheduler.py` — Propagates rigor through subtask results
- `consensus/synthesis.py` — Averages rigor across subtask results
- `consensus/voting.py` — Added rigor to VoteResult, VotingAggregation
- `memory/models.py` — Added `rigor` column to Decision ORM
- `memory/repository.py` — Accepts `rigor` param in `save_decision()`
- `memory/context.py` — Shows rigor in context builder output
- `cli/app.py` — All output paths show rigor; new `duh calibration` command; PDF export enhanced
- `cli/display.py` — `show_commit()` and `show_final_decision()` show rigor
- `api/routes/crud.py` — `GET /api/calibration` endpoint; rigor in decision space
- `api/routes/ask.py`, `ws.py`, `threads.py` — Propagate rigor
- `mcp/server.py` — Propagates rigor
- Frontend: ConfidenceMeter, ConsensusComplete, ConsensusPanel, ThreadDetail, TurnCard, ExportMenu, Sidebar, DecisionCloud, stores updated
**Modified:**
- `PhaseCard.tsx` — Uses Disclosure for outer collapse + per-challenger Disclosure
- `TurnCard.tsx` — Uses Disclosure for outer collapse + per-contribution Disclosure
- `ConsensusComplete.tsx` — Collapsible via Disclosure, dissent moved inside panel
- `DissentBanner.tsx` — Uses Disclosure, parses `[model:name]:` prefix for ModelBadge
- `ConsensusPanel.tsx` — Decision at top when complete, scroll target IDs
- `ConsensusPage.tsx` — Flex-row layout with sticky ConsensusNav sidebar
- `ThreadDetail.tsx` — Decision surfaced to top, DissentBanner for dissent, scroll IDs
- `ThreadDetailPage.tsx` — Flex-row layout with sticky ThreadNav sidebar
- Barrel exports: `consensus/index.ts`, `threads/index.ts`, `shared/index.ts`

### Test Results
- 1586 Python tests + 166 Vitest tests (1752 total)
- Build clean, all tests pass

---

## Current State

- **Branch `epistemic-confidence-phase-a`** — all changes uncommitted, ready to commit.
- **1586 Python tests + 126 Vitest tests** (1712 total), ruff clean, mypy strict clean.
- **~62 Python source files + 70 frontend source files** (~132 total).
- All previous features intact (v0.1–v0.5 + export).

## Next Task: Model Selection Controls + Provider Updates

Deferred from before Phase A. See `progress.md` for details.
- **Branch `consensus-nav-collapsible`** — ready for PR.
- **1586 Python tests + 166 Vitest tests** (1752 total).
- **~62 Python source files + 75 frontend source files** (~137 total).
- All previous features intact (v0.1–v0.5 + export + epistemic confidence).

## Open Questions (Still Unresolved)

15 changes: 14 additions & 1 deletion memory-bank/progress.md
@@ -4,7 +4,19 @@

---

## Current State: Epistemic Confidence Phase A COMPLETE
## Current State: Consensus Nav + Collapsible Sections COMPLETE

### Consensus Navigation & Collapsible Sections

- **Shared `Disclosure` primitive** — reusable chevron + toggle component used across PhaseCard, TurnCard, ConsensusComplete, DissentBanner, ThreadDetail
- **Sticky right-side nav** — `ConsensusNav` (live consensus) and `ThreadNav` (thread detail) show round/phase progress, individual challenger model names, scroll-to-section on click
- **Decision-first layout** — `ConsensusComplete` and thread final decision surface to the top when consensus is complete, collapsible via Disclosure
- **Per-challenger collapsibility** — each individual challenger is its own Disclosure within the CHALLENGE phase, nav shows short model names (e.g. `gpt-4`, `gemini`)
- **DissentBanner refactored** — uses Disclosure, parses `[model:name]:` prefix to extract model attribution and display ModelBadge
- **Responsive** — nav hidden on mobile (`hidden lg:block`), collapsible sections still work
- **Both views** — ConsensusPage (live streaming) and ThreadDetailPage (stored threads) share the same patterns
- 1586 Python tests + 166 Vitest tests (1752 total), build clean
- New files: Disclosure.tsx, ConsensusNav.tsx, ThreadNav.tsx, consensus-nav.test.tsx, thread-nav.test.tsx

### Epistemic Confidence Phase A

@@ -168,3 +180,4 @@ Phase 0 benchmark framework — fully functional, pilot-tested on 5 questions.
| 2026-02-17 | v0.5.0 — "It Scales" | **Complete** |
| 2026-02-17 | Export to Markdown & PDF (CLI + API + Web UI) | Done |
| 2026-02-18 | Epistemic Confidence Phase A (rigor + domain caps + calibration) | Done |
| 2026-02-18 | Consensus nav + collapsible sections + decision-first layout | Done |