Date: 2026-05-19
Maintainer: 0bserver07
Branch: main; last tag v0.9.1 (on PyPI). HEAD: 8606974.
Schema: CURRENT_VERSION = 17. Real store ~/.stackunderflow/store.db at user_version = 17. v018 (static analysis, spec 21) exists on the feat/static-analysis-pass branch but is not merged.
Tests: pytest tests/ -q collects 2781 — 2779 pass, 2 skipped, 14 slow tests deselected by default. Frontend: 168 (node --test stackunderflow-ui/tests/services/*.test.ts). Ruff: 41 baseline. Typecheck + build clean.
This doc gets a fresh agent oriented in 10 minutes. Read it before reading code.
STOP-THE-PRESS NOTE (2026-05-19). The maintainer has asked for no more version bumps for now. Eight releases shipped between 2026-05-06 and 2026-05-16 (v0.7.0 → v0.9.1), seven of them in the last four days; each one fixed something the previous one broke. v0.9.1 alone was five real production bugs that only surfaced after a maintainer-forced audit, bugs the v0.9.0 release shipped silently. Accumulate under `## [Unreleased]` for as long as it takes until the dashboard is verified end-to-end on a real project (not just per-route curl smoke tests, not just unit tests passing). The maintainer explicitly told me: "stop doing aggressive version changes, meanwhile u have left the project riddled with a giant gapping hole with bugs."

Versioning rule. Patch bumps only when shipping is warranted (`0.9.X`). NO MINOR / MAJOR until the audit gap below is fully closed. The maintainer owns version decisions; do not bump `__version__.py` / `pyproject.toml` / `package.json` or move a "release" label without explicit approval. New CHANGELOG entries go under `## [Unreleased]` only.
A real-data audit only happened after v0.9.0 shipped. It found 5 broken tabs out of 15 routes checked — agents (showed sessions from unrelated projects), messages (49 MB unbounded payload), stats (4 MB payload), optimize (1.5s warm, 13s cold), yield (15s timeout). All 5 were fixed in v0.9.1.
But the audit only covered 15 routes. Untouched surfaces that an incoming agent should NOT assume work just because tests pass:
- Compare tab: `/api/compare` returned 859 bytes in the audit (probably empty); no per-tab UX check.
- Plan tab, including the v0.9.0 burn-projector v2 forecast strip + alert thresholds: never opened in a browser.
- Q&A tab: `/api/qa` returned 1696 rows in 0.15 s, but rendering not verified.
- Tags tab: likewise; the endpoint returns data, rendering unverified.
- Bookmarks tab: empty in the audit project.
- Search: returned 1 hit; the search-UI experience was never exercised.
- Playback v1 event stream: `/api/playback/project/{slug}?since=7d` returned `total=0` (window probably wrong); the per-session event scrubber + FS panel were never verified end-to-end against real session data.
- Live tab (v0.9.0 SSE): the SSE handler is suspect (the server CPU-locked once while a tab was open); never actually validated as load-bearing.
- Meta-agent sidebar (v0.8.0): requires Ollama running locally; never tested with a real model on real data; the tool-call loop's behavior on edge cases is unverified.
- Discovery embeddings (`--use-embeddings`, v0.7.3): needs `pip install stackunderflow[embeddings]` + a first-time model download; first-run UX never tested on a fresh machine.
- `stackunderflow init --install-skills` (v0.7.1): copies SKILL.md files; never confirmed that Claude Code picks them up.
The next session's first job should be: finish the audit. Open every tab on a real project, click around, watch for silent-render-but-wrong failures (the agents-tab nonsense was exactly this — it returned 200 OK with 10 rows of garbage). Treat unit tests as "code compiles," not "feature works."
One timeline. Dates match CHANGELOG.md. Pre-0.7 releases (v0.2 → v0.6.1) are in the CHANGELOG; the v0.7.0 ETL push is the start of the current architecture.
| Tag | Date | What |
|---|---|---|
| v0.7.0 | 2026-05-06 | ETL pipeline (Waves 1–4): usage_events + 5 marts + watermarked refresh + filesystem watcher + every dashboard route migrated to mart reads |
| v0.7.1 | 2026-05-13 | Wave 5 ETL follow-ups (tool_mart + command_mart, POST /api/etl/backfill, watcher single-instance lock, messages partitioning) + the post-v0.7 spec round (5 CLI + 5 MCP discovery tools, discovery_telemetry, opt-in lifecycle hooks + captured_events, message_tool_mart, tool_mart.calls_total, agent-teams + Agents tab, per-session playback + Playback tab); init --install-skills; beta-normalizer pricing coverage |
| v0.7.2 | 2026-05-13 | Pricing fixes round 2 (claude-opus-4-7 + GLM-5 + composer-1 + droid-auto); outcome-aware discovery outcome_confidence ladder; optimize parity tests; command_costs structural-mismatch lock-in |
| v0.7.3 | 2026-05-14 | Playback v2 (virtual-FS reconstruction) + opt-in semantic search (--use-embeddings, v014 discovery_embeddings) + Windows CI matrix + beta-flag drop on Playback/Agents |
| v0.7.4 | 2026-05-14 | CI cleanup (Tests Ubuntu-only after the Windows pytest port surfaced ~40 POSIX-shaped fixtures) + drop beta on yield/qa/tags + API docs for playback /fs |
| v0.8.0 | 2026-05-15 | CLI cost-report fix (today/month/status/report were under-counting by ~6× — $502 vs $3072 on the real store) + meta-agent sidebar (Ask StackUnderflow — Ollama tool-call loop) + --ingest / --auto-ingest flags on read-only CLI commands |
| v0.9.0 | 2026-05-15 | Wave 1 of the world-class roadmap (issue #103): file-risk recommender (#86), burn projector v2 (#87), mode recommender heuristic v1 (#88, v016 mode_recommendations), skill recommender (#89), real-time observability tab (#90, SSE), session-schema spec doc (#91); also fixed daily_mart_by_day to return the frontend-expected dict-keyed-by-date shape (was returning a list, crashing every project page with tokens.input undefined) |
| v0.9.1 | 2026-05-16 | Five audit fixes (issue #104): agents tab redesigned to detect Task-tool sub-agents per project (was synthesizing fake teams from unrelated projects); /api/messages paginated (49 MB → 262 KB); /api/stats trimmed (4 MB → 67.5 KB); /api/optimize cached + MCP-detector fast-path (1.5s → 345ms); /api/yield batched git work per cwd (timeout → 0.35s per project) |
| Spec | Issue | Branch | State |
|---|---|---|---|
| Spec 20: PR / CI webhook ingest (v017 `pr_outcomes` + `ci_runs`) | #92 | `feat/pr-ci-webhook-ingest` | MERGED in v0.9.1 era |
| Spec 21: Per-session static analysis pass (v018 `static_analysis_findings`) | #93 | `feat/static-analysis-pass` | NOT MERGED. Branch pushed at `dd1414d`. `CURRENT_VERSION` on branch = 18, on main = 17. Per-language analyzers for Python (radon/mypy/ruff), TS (tsc/eslint), Go (go vet/gocyclo). Coverage measurement deferred to spec 22. Adds a new `[analysis]` extra in `pyproject.toml`. 59 new tests on the branch. Needs review + merge, but probably AFTER finishing the audit so we don't ship more unverified surface. |
Issue #92 is still OPEN on GitHub (the merge commit didn't include a closing keyword); close it manually after verifying the merged work is functional. #93 stays open until spec 21 merges.
Wave 1 ✅ merged (v0.9.0). Wave 2 (#92, #93) is merged except spec 21 (see the spec table above). Remaining waves:
| Wave | Issues | State |
|---|---|---|
| 3: outcome attribution + grading | #94 (outcome attribution v2, depends on #92), #95 (LLM-graded session quality) | pending |
| 4: replay + active surfacing | #96 (context-window replay), #97 (active-surfacing meta-agent via hooks) | pending |
| 5: fork mode + comparative benchmark (the killer features) | #98 (fork mode), #99 (comparative benchmark engine; needs-design, maintainer rubric required) | pending |
| 6: sensitive / long-tail | #100 (multi-device sync; needs-design), #101 (Windows test-fixture port), #102 (real-world beta-normalizer fixtures) | pending |
Schema versions pre-reserved: v015 unused (file-risk didn't need it — a deliberate gap, never created), v016 = mode_recommendations, v017 = pr_outcomes + ci_runs, v018 = static_analysis_findings (on branch). Next free slot: v019 (reserved for spec 22 commit_session_link).
`~/.stackunderflow/store.db` (~3.9 GB):
- `user_version`: 17
- ~150,843 `usage_events` (`cost_source=unknown`: 0)
- All 8 marts populated (daily / session / project / provider_day / model_day / tool / command / message_tool).
- v014 `discovery_embeddings`: table present, empty (populates on the first `--use-embeddings` query)
- v016 `mode_recommendations`: table present, populates on the first `recommend mode` call
- v017 `pr_outcomes` + `ci_runs`: tables present, empty (no webhooks configured)
- `agent_teams`: still empty (`~/.claude/teams/` does not exist on this machine)
- `captured_events`: empty (hooks not installed)
- `discovery_telemetry`: empty
Hard rules (NON-NEGOTIABLE: these have been violated several times this round and the maintainer has called it out)
- NO version bumps without explicit approval. That covers `__version__.py`, `pyproject.toml`, `stackunderflow-ui/package.json`, `stackunderflow-ui/package-lock.json`, and CHANGELOG `## [N.N.N]` headings.
- NO PRs opened by agents. The maintainer handles the merge; agents push the branch, and that's it.
- NO touching `~/.stackunderflow/store.db` from tests or scripts. Use `tmp_path` / `:memory:`. For a real-data spot-check, copy it to `/tmp/wt-X/test-store.db` via `sqlite3 ~/.stackunderflow/store.db ".backup ..."`.
- NO `.notes/` commits (gitignored).
- NO `--no-verify` (skipping git hooks). Fix the underlying issue.
- NO external-library name references in shipped code/docs.
- Pre-assigned schema slots are sacred. v015 unused (reserved), v016 used (mode), v017 used (pr/ci), v018 reserved for spec 21 if/when it merges.
- Tests pass ≠ feature works. Before claiming "fixed," open the real dashboard tab in a browser and click through it. Per-route 200-OK curls are necessary but not sufficient: the agents-tab bug returned 200 OK with 10 rows of garbage from unrelated projects.
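The spot-check copy can also be taken from Python. This is a minimal sketch that assumes only the standard-library `sqlite3` online-backup API (`Connection.backup`). `snapshot_store` is a hypothetical helper, not part of the codebase. It opens the source read-only through a URI so the real store can't be written by accident, and it returns the copy's `user_version` so you can confirm the schema level before poking at it.

```python
import sqlite3
from pathlib import Path


def snapshot_store(src: Path, dst: Path) -> int:
    """Hypothetical helper: copy a live SQLite store to dst, return its user_version."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    # mode=ro: this connection can never write to the real store.
    # as_uri() percent-encodes the absolute path so odd characters stay safe.
    source = sqlite3.connect(f"{src.resolve().as_uri()}?mode=ro", uri=True)
    target = sqlite3.connect(dst)
    try:
        # Online backup: a consistent copy even while the server holds the store open.
        source.backup(target)
        # user_version lives in the database header, so the copy carries it over.
        return target.execute("PRAGMA user_version").fetchone()[0]
    finally:
        target.close()
        source.close()
```

Usage would be `snapshot_store(Path.home() / ".stackunderflow" / "store.db", Path("/tmp/wt-X/test-store.db"))`, after which all queries go against the copy only.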
An offline, local-first observability toolkit for AI coding agents. It ingests and indexes session logs from 17 coding agent providers to surface cost analytics, interactive session playback (with step-by-step filesystem reconstruction), and a searchable knowledge base that both developers and agents can query to learn from past decisions and failures. Forked from a since-rewritten codebase; MIT, no external service dependencies, no telemetry.
The user runs `stackunderflow start`. A FastAPI server binds `127.0.0.1:8081`, serves a React dashboard, and exposes:
- A REST API under `/api/*` for the dashboard
- An MCP server (over stdio) so Claude Desktop / Cursor / Claude Code can query session history without spinning up the dashboard
- A CLI (`stackunderflow ...`) for ops, exports, plan budgets, ETL ops, etc.
- A Python public API (`import stackunderflow; list_projects(); process(slug)`) for scripting
Source-of-truth state lives at ~/.stackunderflow/store.db (SQLite). The dashboard is read-only against the store in the hot path; ingest happens in the background.
┌──────────────────────── Source files (17 providers) ────────────────────────┐
│ ~/.claude/projects/ # JSONL │
│ ~/.codex/sessions/ # JSONL │
│ ~/Library/.../Cursor/.../state.vscdb # SQLite │
│ ~/Library/.../saoudrizwan.claude-dev # JSON (Cline) │
│ ~/.gemini/, ~/.qwen/, ~/.factory/, ... # 13 beta providers │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼ Adapter (per-provider parser)
┌────────────────────────── RAW LAYER ──────────────────────────────────────┐
│ messages, sessions, projects (SQLite) │
│ one row per source-message; immutable; UNIQUE(provider, slug) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼ Normalizer (per-provider transform)
┌────────────────────── NORMALIZED LAYER ───────────────────────────────────┐
│ usage_events │
│ one row per billable event, canonical shape, cost_usd computed once │
│ cost_source: live | rate_card | estimated | unknown │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼ MartBuilder.refresh(conn, since_event_id)
┌──────────────────────── MARTS LAYER (8 builders) ─────────────────────────┐
│ daily_mart (day, project_id, provider, model, speed) │
│ session_mart (session_id, all per-session aggregates) │
│ project_mart (project_id, lifetime totals) │
│ provider_day_mart (day, provider) │
│ model_day_mart (day, model, speed) │
│ tool_mart (day, project_id, provider, tool_name) ← v007/v012 │
│ command_mart (day, project_id, command_name) ← v007 │
│ message_tool_mart (message_id, tool_name, call_index) ← v011 │
│ mart_watermark (mart_name → last_event_id, last_refresh_ts) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
REST routes — plain SELECTs from marts (+ aggregator fallback)
tool_mart / command_mart / message_tool_mart JOIN usage_events back to messages to read tools_json (the aggregate-grain marts above them never touch raw messages). message_tool_mart watermarks on usage_events.id (not messages — that's a UNION view post-v008 and can't be watermarked directly).
The watcher (stackunderflow/etl/watcher.py) ties the layers together: filesystem change → adapter.read() → writer inserts messages → normalizer inserts events → refresh_all_marts() advances watermarks. End-to-end ~400 ms. A fcntl/msvcrt single-instance lock at ~/.stackunderflow/server.lock (see etl/lock.py) means a second stackunderflow start against the same store still serves HTTP but doesn't run the watcher; --no-lock / STACKUNDERFLOW_DISABLE_LOCK=1 skips it.
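The watermark step of that cycle can be sketched as follows. This is a simplified illustration, not the contents of `watermark.py`. It assumes `mart_watermark(mart_name PRIMARY KEY, last_event_id, last_refresh_ts)` (the columns shown in the diagram) and a hypothetical builder callable shaped like `MartBuilder.refresh(conn, since_event_id)`. One deliberate assumption: the builder here returns the highest `usage_events.id` it folded in. That lets the watermark advance to exactly what was processed, so events written mid-refresh are not skipped. Whether the real builder does this is not stated here.

```python
import sqlite3
import time
from typing import Callable, Dict

# Hypothetical builder shape: fold events with id > since into the mart and
# return the highest event id consumed (or `since` if there was nothing new).
Builder = Callable[[sqlite3.Connection, int], int]


def get_watermark(conn: sqlite3.Connection, mart: str) -> int:
    row = conn.execute(
        "SELECT last_event_id FROM mart_watermark WHERE mart_name = ?", (mart,)
    ).fetchone()
    return row[0] if row else 0


def refresh_all(conn: sqlite3.Connection, builders: Dict[str, Builder]) -> None:
    """Run each builder over events past its watermark, then advance the watermark."""
    for name, refresh in builders.items():
        since = get_watermark(conn, name)
        with conn:  # mart writes and the watermark move commit (or roll back) together
            last = refresh(conn, since)
            if last > since:  # nothing new means no write, so re-running is a no-op
                conn.execute(
                    "INSERT INTO mart_watermark (mart_name, last_event_id, last_refresh_ts) "
                    "VALUES (?, ?, ?) ON CONFLICT(mart_name) DO UPDATE SET "
                    "last_event_id = excluded.last_event_id, "
                    "last_refresh_ts = excluded.last_refresh_ts",
                    (name, last, time.time()),
                )
```

Committing the mart rows and the watermark in one transaction keeps a crash from double-counting. Either both land or neither does.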
State directory layout (`~/.stackunderflow/`):
- `store.db`: SQLite source of truth (canonical data; do not touch without a backup)
- `cache/`: disk side of the TieredCache (rebuildable; `--fresh` wipes it)
- `server.lock`: watcher single-instance lock
- `config.json`: user settings (descriptor-resolved via `settings.py`)
- `backups/<ts>[-label]/`: `stackunderflow backup create` snapshots of `~/.claude/`, rsync-hard-linked against the previous snapshot (see `docs/backup.md`)
- `backup.log`: launchd auto-backup stdout/stderr
stackunderflow/
adapters/ # Per-provider source parsers — 17 adapters (4 default-on, 13 beta)
base.py # SourceAdapter Protocol; SessionRef + Record dataclasses
claude.py codex.py cursor.py cline.py # default-on; cline.py also hosts the KiloCode + RooCode beta adapters
cursor_agent.py opencode.py qwen.py gemini.py copilot.py codeium.py # beta
continue_adapter.py droid.py kiro.py openclaw.py pi.py # beta
claude_teams.py # materialize_team_metadata() — ingests ~/.claude/teams/ + tasks/ (v013)
_streaming.py # size-capped streaming reader for large JSONL
api/ # Public Python API surface (list_projects/process/list_sessions)
etl/
normalize/ # Per-provider transforms messages → usage_events
base.py # Normalizer ABC + cost_source constants + _build_event helper
__init__.py # last-wins registry: register/get/all + 18 normalizers wire here
claude.py codex.py cursor.py cline.py # default-on
<14 beta normalizers; omp reuses the pi normalizer class>
marts/ # MartBuilder ABC + 8 builders
base.py # ABC; concrete rebuild_from_scratch default
__init__.py # last-wins registry; 8 builders wire here
daily.py session.py project.py provider_day.py model_day.py
tool.py command.py # v007 (Wave 5); tool gains calls_total in v012
message_tool.py # v011 — per-(message,tool,call_index) grain
backfill.py # Streams messages → events → marts; idempotent; --force rebuild
backfill_jobs.py # Process-local lock + single-slot job state for POST /api/etl/backfill
lock.py # fcntl/msvcrt watcher single-instance lock + stale-PID detection
watcher.py # watchfiles daemon; debounced 200 ms; per-adapter dispatch
watermark.py # get/set/refresh_all_marts; persists last_event_id + last_refresh_ts
status.py # Shared assembler for /api/etl/status + `stackunderflow etl status` (+ current_job/last_job/lock_held_by)
hooks/ # Opt-in Claude Code lifecycle hooks
_install.py # idempotent merge into .claude/settings.json (project or user scope) + backup
_repair.py # rewrite stale hook commands to the portable `stackunderflow hooks run <id>` form; --scope all walks $HOME
handlers.py # `hooks run <id>` body — reads payload on stdin, writes a captured_events row, always exits 0
templates.py # canonical hooks block emitted by `hooks install`
ingest/
writer.py # INSERT INTO messages + normalize+insert hook + messages_YYYYMM partition routing + claude-teams materialize call
enumerate.py # Discovery wrapper around all registered adapters
__init__.py # run_ingest(conn, adapters)
infra/
costs.py # compute_cost(tokens, model, provider, *, speed) → dict
cache.py # TieredCache — hot LRU + cold disk JSON
currency.py # Frankfurter live + 24h cache + ECB snapshot fallback
cursor_cache.py # Fingerprint cache for vscdb (3-8× cold-start speedup)
discovery.py # Filesystem scan helpers (legacy file-scan path — distinct from services/discovery.py)
providers/ # Per-provider Pricers (anthropic, openai, cursor, etc.)
mcp/
server.py # FastMCP server; 12 tools — 3 session/project + 3 discovery + 2 outcome + recommend_skills + recommend_mode + file_risk + get_burn_projection
store_reader.py # Read-only store helpers shared with the MCP server
reports/ # CLI report renderers (text/json/csv) + optimize patterns (mart-backed detectors via message_tool_mart)
routes/ # FastAPI routes — 23 modules, one file per concern
agent_teams.py bookmarks.py cfg.py commands.py compare.py context_budget.py
cost.py data.py etl.py export.py live.py meta_agent.py misc.py optimize.py
plan.py playback.py projects.py qa.py search.py sessions.py tags.py
webhooks.py yield_route.py
# misc.py also exposes /api/ollama-api/{path:path} — a thin httpx
# pass-through to a local Ollama daemon (default upstream
# http://localhost:11434). Powers the dashboard's chat sidebar.
# live.py — v0.9.0 SSE observability tab.
# webhooks.py — Spec 20 PR / CI webhook receiver.
# meta_agent.py — NDJSON tool-calling loop driving the right-docked sidebar.
services/ # compare, plans, yield_tracker, pricing, search, qa, tags, bookmarks, ...
discovery.py # discovery / outcome queries (shared by CLI + MCP); SessionMatch/OutcomeMatch/BudgetedResult; pack_within_budget
discovery_telemetry.py # loaded_count/cited_count recording + demote-uncited sweep
skill_synth.py # mine the store for project workflow patterns → auto-* SKILL.md files
skill_recommender.py # read-only skill suggestions (v0.9.0, #89)
risk.py # file-risk recommender (v0.9.0, #86)
burn.py # burn projector v2 — linear / weighted-7d forecast (v0.9.0, #87)
mode_recommender.py # cheapest-model heuristic v1 (v0.9.0, #88)
playback.py playback_fs.py # per-session tool-call timeline + virtual-FS reconstruction
agent_teams.py # Claude Code agent-team graph — v013-JOIN-backed, falls back to is_sidechain/raw_json heuristic
live.py # SSE event source for the Live tab
github_ingest.py # Spec 20 PR / CI ingest helpers
meta_agent.py # backend tool catalogue + dispatcher for the meta-agent sidebar; 13 read-only tools; 4 KB per-result cap
skills/ # Static SKILL.md files shipped with the package
check-prior-work/ find-related-sessions/ recall-past-decisions/ # one SKILL.md each
store/
schema.py # CURRENT_VERSION = 17; applies SQL + .py migrations idempotently; _ADD_COLUMN_GUARDS for v003/v012/v013
queries.py # Typed query helpers (one place for all SQL)
mart_queries.py # Read helpers used by route migrations
db.py types.py
migrations/ # v001 → v017 — v015 intentionally skipped; v005 + v008 are .py, rest are .sql
cli.py server.py deps.py settings.py __version__.py
cli_helpers/ # CLI-only helpers separated from cli.py
ingest.py # ensure_fresh() + is_stale() — read-only data commands' --ingest path
stackunderflow-ui/ # React dashboard (Vite); output → ../stackunderflow/static/react/
src/
pages/ # Overview, ProjectDashboard, Settings (Settings has the "Backfill now" button wired to POST /api/etl/backfill)
components/
common/ FilterBar, EtlStatusBadge (shows backfill / failure state), ExportButton, ...
dashboard/ one component per dashboard tab: Overview, Sessions, Cost, Compare,
Commands, Messages, Search, Q&A, Bookmarks, Tags, Yield, Agents, Playback
(plus the Optimize, Plan, and Live surfaces)
cost/ # Cost-tab widgets including CostByProviderCard
analytics/, charts/, layout/, qa/
services/ # API client (incl. EtlBackfillInProgressError) + format/currency/filters/providerStyle/navigation helpers
types/api.ts # Backend response shapes mirrored as TypeScript
tests/ # backend tests; integration/ has the slow-marker e2e + perf
docs/
HANDOFF.md # This file
specs/ # Architecture specs (multi-provider/, etl-architecture, agent-teams, messages-partitioning, session-schema-v1, adapter-contract, ...)
cli-reference.md api-reference.md mcp.md skills.md hooks.md multi-provider.md beta-normalizer-drift.md ...
Most behaviour is configured through ~/.stackunderflow/config.json (resolved by the descriptor pattern in settings.py: env → file → default). A few knobs are env-only or env-overridable:
- `STACKUNDERFLOW_BETA_<NAME>=1`: enable a beta adapter (e.g. `STACKUNDERFLOW_BETA_GEMINI=1`). Off by default.
- `STACKUNDERFLOW_DISABLE_WATCHER=1`: skip the filesystem watcher (same as `start --no-watcher`).
- `STACKUNDERFLOW_DISABLE_LOCK=1`: skip the watcher single-instance lock (same as `start --no-lock`).
- `STACKUNDERFLOW_DISCOVERY_BUDGET_TOKENS` (default `2000`): default `--context-budget` for the budget-aware discovery commands / MCP tools.
- `STACKUNDERFLOW_DISCOVERY_RANK_WEIGHTS` (default `0.5,0.2,0.3`): discovery ranking weights for `recency`, `cost`, `relevance`; malformed input falls back to the default.
- `STACKUNDERFLOW_DISCOVERY_TELEMETRY` (default on): set `0` to disable passive citation-feedback recording.
- `STACKUNDERFLOW_EMBED_MODEL`: sentence-transformers model id for `search-past-decisions --use-embeddings` (default `sentence-transformers/all-MiniLM-L6-v2`).
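The env → file → default resolution can be pictured as a small descriptor. This is a hypothetical sketch, not `settings.py` itself. The env var names and defaults come from the list above. The `config.json` key names (`discovery_budget_tokens`, `discovery_rank_weights`) are invented for illustration. The list documents the malformed-input fallback only for the rank-weights knob; the sketch applies it to every setting, which is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Optional


def parse_weights(raw: str) -> tuple:
    """'0.5,0.2,0.3' -> (0.5, 0.2, 0.3); raises ValueError on malformed input."""
    parts = tuple(float(p) for p in raw.split(","))
    if len(parts) != 3:
        raise ValueError(f"expected 3 weights, got {len(parts)}")
    return parts


class Setting:
    """Hypothetical descriptor: env var, then config.json key, then default."""

    def __init__(self, env: str, key: str, default: Any, parse: Callable[[str], Any] = str):
        self.env, self.key, self.default, self.parse = env, key, default, parse

    def __get__(self, obj: Optional["Settings"], owner: type) -> Any:
        if obj is None:
            return self  # class-level access returns the descriptor itself
        for raw in (os.environ.get(self.env), obj.file_values.get(self.key)):
            if raw is None:
                continue  # not set at this layer; try the next one
            try:
                return self.parse(str(raw))
            except ValueError:
                return self.default  # malformed input falls back to the default
        return self.default


class Settings:
    discovery_budget_tokens = Setting(
        "STACKUNDERFLOW_DISCOVERY_BUDGET_TOKENS", "discovery_budget_tokens", 2000, int)
    discovery_rank_weights = Setting(
        "STACKUNDERFLOW_DISCOVERY_RANK_WEIGHTS", "discovery_rank_weights",
        (0.5, 0.2, 0.3), parse_weights)

    def __init__(self, path: Path = Path.home() / ".stackunderflow" / "config.json"):
        self.file_values = json.loads(path.read_text()) if path.exists() else {}
```

Resolving on every attribute read means an env override takes effect without reloading the file. Caching would trade that for fewer lookups.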
The ETL migration shipped as v006_etl_layer.sql (v004 + v005 were already taken — synthetic-models cleanup + cursor-workspace redistribute). Migrations run from v001 through v017, all additive (no existing table mutated): v007 lower-grain marts, v008 messages partitioning, v009 discovery_telemetry, v010 captured_events, v011 message_tool_mart, v012 tool_mart.calls_total, v013 multi-agent session metadata + agent_teams, v014 discovery_embeddings, v016 mode_recommendations, v017 pr_outcomes + ci_runs. v015 was deliberately skipped — a reserved slot that was never needed and never created; the runner steps straight from v014 to v016. schema.CURRENT_VERSION = 17. schema.apply runs every migration whose number exceeds PRAGMA user_version, so they chain correctly regardless of merge order; _ADD_COLUMN_GUARDS makes the ALTER TABLE ADD COLUMN migrations (v003, v012, v013) idempotent against a crash mid-migration.
Every migrated route checks if its mart is populated. If yes → mart read. If no → original aggregator path. So the dashboard works even on a fresh install before backfill runs. After a backfill or a single watcher cycle, marts populate and the fast path takes over automatically. The post-v0.7 routes (/api/playback, /api/agent-teams) are pure read-side over messages.raw_json and don't have an empty-mart fallback — /api/agent-teams does fall back from the v013 JOIN to the is_sidechain/raw_json heuristic when the team artefacts aren't ingested.
cost_usd lives on every usage_events row. Marts SUM it, never re-apply rate cards. Currency conversion stays at the API boundary (already correct from v0.6.0). When pricing changes, re-normalize from raw messages — one code path.
Additive marts (daily, provider_day, model_day) can't simply SUM COUNT(DISTINCT session_id) across refresh windows (the same session can appear in two windows). Solution: after the additive INSERT...ON CONFLICT, a follow-up UPDATE recomputes session_count from usage_events for affected keys. Bounded by the number of distinct keys in the window — typically O(1)..O(few dozen). Tests lock this in.
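- `session_mart` and `project_mart` use INSERT OR REPLACE over a re-aggregated subquery for affected entities (totals stay correct when new events arrive for an existing session).
- `daily_mart`, `provider_day_mart`, `model_day_mart` use INSERT...ON CONFLICT DO UPDATE additively. This is safe because each event is folded in exactly once (the watermark only moves forward), so summed measures can be incremented even when the same `(day, …)` key shows up in several refresh windows. `session_count` is the one non-additive column and gets the recompute described above.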
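The two-step refresh can be sketched on a cut-down `daily_mart` keyed only by `(day, project_id)`. The real key also carries provider/model/speed. `refresh_daily` and this schema are illustrative, not the builder in `etl/marts/daily.py`. Step 1 is the additive upsert. Step 2 recomputes the distinct session count, but only for keys the window touched. The SQL needs SQLite ≥ 3.24 for UPSERT and ≥ 3.15 for the row-value `IN`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_mart (
    day TEXT, project_id INTEGER, cost_usd REAL, session_count INTEGER,
    PRIMARY KEY (day, project_id)
);
"""


def refresh_daily(conn: sqlite3.Connection, since_event_id: int) -> None:
    """Fold usage_events with id > since_event_id into the cut-down daily_mart."""
    with conn:
        # 1. Additive upsert: each event is summed exactly once, so cost can be added.
        #    (The WHERE clause also avoids SQLite's SELECT/UPSERT parsing ambiguity.)
        conn.execute(
            """
            INSERT INTO daily_mart (day, project_id, cost_usd, session_count)
            SELECT date(ts), project_id, SUM(cost_usd), 0
            FROM usage_events WHERE id > ?
            GROUP BY date(ts), project_id
            ON CONFLICT (day, project_id)
            DO UPDATE SET cost_usd = cost_usd + excluded.cost_usd
            """,
            (since_event_id,),
        )
        # 2. COUNT(DISTINCT) is not additive: recompute it from usage_events,
        #    limited to the keys this window touched.
        conn.execute(
            """
            UPDATE daily_mart SET session_count = (
                SELECT COUNT(DISTINCT e.session_id) FROM usage_events e
                WHERE date(e.ts) = daily_mart.day AND e.project_id = daily_mart.project_id
            )
            WHERE (day, project_id) IN (
                SELECT DISTINCT date(ts), project_id FROM usage_events WHERE id > ?
            )
            """,
            (since_event_id,),
        )
```

If a session spans two windows on the same day, summing per-window counts would report it twice. Step 2 keeps the count at one.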
Per spec — the registries live in stackunderflow/etl/normalize/__init__.py and stackunderflow/etl/marts/__init__.py, NOT in base.py. Last-wins (re-registering overwrites). _clear() for tests. The 18 normalizer + 8 mart-builder default registrations happen at package-import time via top-level register("name", Cls) calls. stackunderflow.etl.normalize.all() should list 18 keys (omp and pi both map to PiNormalizer); stackunderflow.etl.marts.all() 8.
Uses watchfiles (Rust-backed). Daemon thread spawned in lifespan. Catches every exception so a bad event never poisons the loop. --no-watcher / STACKUNDERFLOW_DISABLE_WATCHER=1 for headless mode. A fcntl/msvcrt lock at ~/.stackunderflow/server.lock (etl/lock.py) makes a second stackunderflow start against the same store serve HTTP without running the watcher — stale-PID detection via os.kill(pid, 0) clears a dead recorder's metadata; --no-lock / STACKUNDERFLOW_DISABLE_LOCK=1 skips the gate.
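Stale-PID detection with `os.kill(pid, 0)` works like this on POSIX. It is a hypothetical sketch, not `etl/lock.py`. The JSON metadata file holding a `pid` field is an invented format. One caveat worth checking against the real code: on Windows, Python's `os.kill` with any signal other than `CTRL_C_EVENT` / `CTRL_BREAK_EVENT` terminates the target process. The probe below is therefore only safe on POSIX.

```python
import json
import os
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks existence, delivers nothing."""
    if pid <= 0:
        return False  # 0 and negative pids address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process, so the recorder is dead
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def lock_holder(meta: Path) -> Optional[int]:
    """Return the live holder's pid from (hypothetical) JSON lock metadata.

    Clears the metadata when the recorded process is gone.
    """
    try:
        pid = int(json.loads(meta.read_text())["pid"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return None  # missing or unreadable metadata: treat the lock as free
    if pid_alive(pid):
        return pid
    meta.unlink(missing_ok=True)  # dead recorder: clear its metadata
    return None
```

PID reuse can make a dead holder look alive. A probe like this errs toward "held", which is the safer direction for a watcher lock.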
stackunderflow/infra/discovery.py is the legacy filesystem-scan helper (walks ~/.claude* JSONL). stackunderflow/services/discovery.py is the post-v0.7 store-backed query layer (SessionMatch / OutcomeMatch / BudgetedResult, pack_within_budget, the 5 find_sessions_* / search_past_decisions functions shared by the CLI commands and the MCP tools). They're unrelated — don't confuse them.
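To make the budget-aware idea concrete, here is a hypothetical sketch of ranking plus `pack_within_budget`. Only the function name, the weight order (`recency`, `cost`, `relevance`), the default weights `0.5,0.2,0.3`, and the 2000-token default come from this doc. Several things are assumptions: the `Match` fields, features pre-normalised to [0, 1], and a greedy packing strategy. The real function may rank and pack differently.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Match:  # hypothetical stand-in for SessionMatch
    session_id: str
    recency: float    # 0..1, assumed pre-normalised, higher = more recent
    cost: float       # 0..1, assumed pre-normalised
    relevance: float  # 0..1, higher = better match
    tokens: int       # estimated size of the rendered result


def score(m: Match, weights: Tuple[float, float, float] = (0.5, 0.2, 0.3)) -> float:
    w_recency, w_cost, w_relevance = weights
    return w_recency * m.recency + w_cost * m.cost + w_relevance * m.relevance


def pack_within_budget(
    matches: Sequence[Match], budget_tokens: int = 2000
) -> Tuple[List[Match], int]:
    """Greedy sketch: best-scored first; skip anything that no longer fits."""
    picked: List[Match] = []
    used = 0
    for m in sorted(matches, key=score, reverse=True):
        if used + m.tokens <= budget_tokens:
            picked.append(m)
            used += m.tokens
    return picked, used
```

Greedy packing is not an optimal knapsack. It always keeps the top-ranked result when it fits, which is a reasonable bias when the output is meant to prime an agent's context.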
stackunderflow skills generate writes to <project>/.claude/skills/auto-*/ (or ~/.claude/skills/ for --scope user) — never into stackunderflow/skills/ (which holds only the 3 hand-authored static SKILL.md files). pyproject.toml's hatch.build.exclude blocks **/auto-*/SKILL.md from the wheel/sdist; .gitignore ignores .claude/skills/auto-*/. This is a hard constraint.
# Run the dashboard
stackunderflow start # binds 127.0.0.1:8081
# ingest + watcher run in background
stackunderflow start --no-watcher # headless / profiling — no fs watcher
stackunderflow start --no-lock # skip the watcher single-instance lock
# ETL ops
stackunderflow etl status # health + watermarks (+ current_job / lock_held_by)
stackunderflow etl status --format json
stackunderflow etl backfill # incremental (skips already-converted msgs)
stackunderflow etl backfill --force # drops events + marts, rebuilds (re-derives tool_mart.calls_total too)
# Discovery (also exposed as MCP tools — see docs/mcp.md, docs/cli-reference.md)
stackunderflow find-sessions-in-path "$(pwd)" --since 30d
stackunderflow find-sessions-touching-file path/to/file.py --mode write
stackunderflow search-past-decisions "watchfiles inotify"
stackunderflow find-sessions-where-action-worked "add caching" --file stackunderflow/infra/cache.py
stackunderflow find-failure-modes-for-file stackunderflow/routes/cost.py
# Auto-generated skills (docs/skills.md) — project-scoped by default
stackunderflow skills generate --dry-run --format json
stackunderflow skills list
stackunderflow skills clean --older-than 30d # preview; --yes to delete
# Opt-in Claude Code hooks (docs/hooks.md) — nothing installs until you run this
stackunderflow hooks install # --scope project (cwd's git root) by default
stackunderflow hooks status
stackunderflow hooks repair --dry-run
# Discovery telemetry
stackunderflow discovery telemetry
stackunderflow discovery demote-uncited --dry-run
# Tests
pytest tests/ -q # fast suite (default; slow tests deselected)
pytest -m slow tests/stackunderflow/integration -q # slow e2e + perf
ruff check stackunderflow/
# Frontend
cd stackunderflow-ui
npm run typecheck
npm run build # output → ../stackunderflow/static/react/
node --test tests/services/*.test.ts # frontend unit tests (Node built-in runner; no vitest dep)

The Real-data + Audit-gap + Wave-2 + Roadmap sections at the top of this doc are the source of truth. This section enumerates the long-tail items not captured there.
| # | Item | Severity |
|---|---|---|
| A | Finish the dashboard audit. Open every tab on a real project, click through filters, watch for silent-render-but-wrong failures. The 5 fixed in v0.9.1 were just the ones I happened to hit. Untouched: Compare, Plan, Q&A, Tags, Bookmarks, Search UX, Playback event stream + FS panel end-to-end, Live tab SSE under load, meta-agent sidebar with real Ollama, `init --install-skills` first run, `--use-embeddings` first run. | HIGH (gates any new feature work) |
| B | Merge spec 21 (static analysis). Branch `feat/static-analysis-pass` at `dd1414d`, 59 tests, v018 migration. Adds Python (radon/mypy/ruff) + TS (tsc/eslint) + Go (go vet/gocyclo) analyzers + the meta-agent `get_session_quality` tool. Held off merging during the v0.9.1 audit-fix sweep. Wave 2 is otherwise complete. | medium (don't merge until A is well underway) |
| C | Close GitHub issues #86, #87, #88, #89, #90, #91, #92, #104. The work is finished, but the merge commits didn't include closing keywords. Close manually after verifying the merged work is functional (depends on A). | low |
| D | Windows test-fixture port (issue #101, spec 29). The build matrix runs on Ubuntu + macOS + Windows; the pytest matrix is Ubuntu-only because the first Windows run surfaced ~40 POSIX-shaped fixtures. Production code on Windows is fine; the test suite needs pathlib-friendly rewrites. A slog: a multi-day port plus several rounds of Windows-CI feedback. | low |
| E | Real-world beta-normalizer fixtures (issue #102, spec 30). Synthetic spec-accurate fixtures exist; real captures need actual provider sessions on the maintainer's machine (logistics, not code). | low |
| F | Apply v018 to the real store (after merging spec 21). v017 applied automatically; v018 will too. Then run `stackunderflow analyze backfill --since 30d` to populate `static_analysis_findings`. | low (post-B) |
| G | CHANGELOG hygiene. The v0.9.1 release notes were added late and out of order due to a file-modified race during the version-bump commit. Inspect `CHANGELOG.md` around the `## [0.9.1]` heading and tidy if needed. | low (cosmetic) |
- `docs/cli-reference.md`: every `stackunderflow` command, flag-for-flag (incl. discovery / skills / discovery-telemetry / hooks)
- `docs/mcp.md`: the 12 MCP tools + the citation-feedback loop
- `docs/skills.md`: the 3 static SKILL.md files + the `skills generate` / `list` / `clean` synthesis + the hard guardrails
- `docs/hooks.md`: the opt-in Claude Code lifecycle hooks (what's captured, the privacy model, the Windows status)
- `docs/specs/etl-architecture.md`: design contract for the pipeline
- `docs/specs/session-schema-v1.md` + `docs/specs/adapter-contract.md`: the first publishes the on-disk schema as a versioned spec other tools can target (pinned to `schema_version = 17`); the second covers the `SourceAdapter` Protocol for new integrations
- `docs/specs/messages-partitioning.md`: v008 design + rollback + ops rollout
- `docs/specs/agent-teams.md`: the multi-agent FS-recognition design (v013)
- `docs/beta-normalizer-drift.md`: per-provider verdicts from the Wave 5 normalizer audit
- `stackunderflow/services/discovery.py`: `SessionMatch` / `OutcomeMatch` / `BudgetedResult`, `pack_within_budget`, the 5 shared discovery / outcome query functions
- `stackunderflow/etl/normalize/base.py` + `stackunderflow/etl/marts/base.py`: the `Normalizer` / `MartBuilder` ABCs
- `stackunderflow/etl/backfill.py` + `backfill_jobs.py` + `lock.py` + `watcher.py`: orchestrator, the backfill-job slot, the single-instance lock, the watchfiles dispatch
- `stackunderflow/store/migrations/v006_etl_layer.sql` … `v017_pr_ci_outcomes.sql`: the schema progression (every migration header documents its own rationale)
- `stackunderflow/store/mart_queries.py`: every read helper used by routes
- `stackunderflow/ingest/writer.py`: partition routing helpers + the claude-teams materialize call
- Any `routes/*.py` for the JSON contracts the dashboard depends on (`playback.py` / `agent_teams.py` / `live.py` / `webhooks.py` are the newest)
- `tests/stackunderflow/integration/`: e2e + perf regression; the most useful single directory for understanding the whole pipeline at once
- No version bumps without
CHANGELOG.md+ git tag + GitHub release — done together as one PR (release: 0.7.0) - No external-library attribution in shipped code (the project is a clean rewrite)
- No backwards-compat shims — when an API shape changes, change the consumers in the same PR
- Tests must run on Linux CI (no macOS-only paths in non-platform-specific tests)
- Beta adapters opt in via `STACKUNDERFLOW_BETA_<NAME>=1`; never on by default
- Frontend tests use `node --test` (Node 22+ built-in runner), not vitest; no new dev dep
- Idempotent EVERYTHING in ETL: every `refresh`, `backfill`, and watcher cycle must be safe to re-run
- The user's `~/.stackunderflow/store.db` is sacred: tests use `tmp_path` or `:memory:`, never the real store
- Settings file: `~/.stackunderflow/config.json` (not `settings.json`); the descriptor pattern in `settings.py` resolves env → file → default
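The env → file → default resolution mentioned in the last rule can be done with a Python descriptor. This is a minimal sketch, not the contents of `settings.py`. `Setting`, `Settings`, the `STACKUNDERFLOW_PORT` variable and the `port` key are illustrative assumptions. The 8081 default simply mirrors the dashboard URL used elsewhere in this doc.

```python
import json
import os
from pathlib import Path


class Setting:
    """Descriptor: environment variable wins, then the JSON config file, then the default."""

    def __init__(self, env: str, key: str, default, cast=str):
        self.env, self.key, self.default, self.cast = env, key, default, cast

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class: return the descriptor itself
            return self
        raw = os.environ.get(self.env)
        if raw is not None:
            return self.cast(raw)
        file_values = obj._load_file()
        if self.key in file_values:
            return self.cast(file_values[self.key])
        return self.default


class Settings:
    # Hypothetical setting; names are illustrative only.
    port = Setting("STACKUNDERFLOW_PORT", "port", 8081, int)

    def __init__(self, path: Path):
        self.path = path  # e.g. ~/.stackunderflow/config.json

    def _load_file(self) -> dict:
        try:
            return json.loads(self.path.read_text())
        except FileNotFoundError:
            return {}
```

This sketch re-reads the file on every access, so edits to `config.json` take effect without a restart. A real implementation may cache instead.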
| Symptom | Likely cause | Where to look |
|---|---|---|
| `/api/etl/status` shows lag > 1000 events for minutes | Watcher not running, or normalizer raising | `stackunderflow start` log; `stackunderflow etl status` |
| Marts empty after backfill | Normalizer / mart-builder for that provider/mart not registered | `stackunderflow.etl.normalize.all()` should list 18 keys; `stackunderflow.etl.marts.all()` 8 |
| `/api/agent-teams` returns heuristic-only data | `~/.claude/teams/` artefacts not ingested, so no v013 team metadata in the store | Re-ingest so `claude_teams.materialize_team_metadata()` runs; the heuristic fallback is expected when there are no team artefacts |
| `stackunderflow hooks run` did nothing | Expected: it always exits 0. If no row appeared, the event didn't match a handler heuristic, or `captured_events` couldn't be created | `stackunderflow.hooks.handlers`; `stackunderflow hooks status` |
| Discovery commands return nothing on a fresh store | Store empty: no messages ingested yet | `stackunderflow etl backfill` first; `stackunderflow etl status` |
| Dashboard cost = $0 for a provider | Pricer for the model returns `None` | `rates_for()` in `infra/providers/<provider>.py` |
| Watcher spammed log: "Adapter raised in cycle" | Provider adapter has a parse bug | Look at the provider's adapter; run `adapter.read()` directly on the file |
| `health: error` in status | Mart watermark stuck + watcher dead | Restart server; `stackunderflow etl backfill --force` if persistent |
| Second `stackunderflow start` doesn't refresh data | First instance holds the watcher lock, by design (it still serves HTTP) | `stackunderflow etl status` (`watcher.lock_held_by`); `--no-lock` to override |
| New install, dashboard slow | Marts empty, so routes fall back to the aggregator | Run `stackunderflow etl backfill` to populate; one-time, then the watcher keeps it fresh |
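The "second `stackunderflow start`" row depends on a single-instance lock. Here is a minimal sketch of one. It assumes a lockfile holding the owner's PID, created with `O_EXCL` so that creation either succeeds atomically or fails. `try_acquire()` and `release()` are hypothetical names, and the returned PID only loosely mirrors what `watcher.lock_held_by` reports.

```python
from __future__ import annotations

import os
from pathlib import Path


def try_acquire(lock_path: Path) -> tuple[bool, int | None]:
    """Try to become the single watcher. Returns (acquired, holder_pid)."""
    try:
        # O_CREAT | O_EXCL is atomic: exactly one process can create the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Someone else holds it; report their PID if it has been written yet.
        try:
            holder = int(lock_path.read_text().strip() or 0) or None
        except (OSError, ValueError):
            holder = None
        return False, holder
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True, os.getpid()


def release(lock_path: Path) -> None:
    """Drop the lock so the next instance can take over the watcher."""
    lock_path.unlink(missing_ok=True)
```

A losing instance can still serve HTTP and skip only the watcher, which matches the behaviour described in the table. This sketch leaves a stale file behind if the holder crashes. OS advisory locks such as `fcntl.flock` are released automatically when the process exits, which is one reason an implementation might prefer them.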
1. Finish the dashboard audit (item A above). Run `stackunderflow start` (or `python -m stackunderflow start` from the repo for an editable-source run), open `http://127.0.0.1:8081`, and pick a project with substantive data (`-Users-yadkonrad-dev-dev-year26-feb26-SutroYaro` is a known good test target: 95 sessions, 374 Task tool calls). Open EVERY tab in order. For each: does it render, does the data make sense, do filters work, are there silent rendering bugs that look right but show wrong data? Use the audit script template from issue #104 if you want a starting point. Write findings to `## [Unreleased]` in CHANGELOG and to a new audit issue. Don't ship a release at the end of this; just fix bugs and accumulate.
2. Merge spec 21 (static analysis) once (1) is well underway. Branch `feat/static-analysis-pass`. `CURRENT_VERSION` bumps 17 → 18, and the `[analysis]` extra is added to `pyproject.toml`. After merge, run `stackunderflow analyze backfill --since 30d` to populate findings on the real store.
3. Close stale GitHub issues (#86–#103; #104 stays open because it tracks the audit). Close each one manually, and only after verifying the merged feature actually functions in the dashboard.
4. Consider what the next genuine release should mean. v0.9.1 was forced by the audit-bug discovery. The next bump should come when the dashboard is provably solid AND there's a story (e.g. wave 2 fully landed + audit clean, ship as v0.10.0).
5. DO NOT start wave 3 yet. Wave 3 (#94, #95) builds on wave 2 (#92 + #93). #93 is on a branch but not merged. Don't kick off wave 3 specs until #93 lands and the audit closes.
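The tab-by-tab audit is manual. However, the payload and latency failures that the post-v0.9.0 audit caught (a 49 MB messages payload, a 15 s yield timeout) are easy to screen for mechanically first. The following is a hedged sketch, not the issue #104 template. `check_routes()` and the `MAX_BYTES` / `MAX_SECONDS` budgets are assumptions. A clean run says nothing about whether a tab renders the right data.

```python
from __future__ import annotations

import time
import urllib.request
from dataclasses import dataclass
from typing import Callable

# Assumed budgets, not project policy.
MAX_BYTES = 1_000_000
MAX_SECONDS = 1.0


@dataclass
class RouteResult:
    path: str
    status: int | None
    size: int
    seconds: float
    problems: list[str]


def http_fetch(url: str, timeout: float = 15.0) -> tuple[int, bytes]:
    """Plain GET; urlopen raises on non-2xx, timeouts and refused connections."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_routes(base: str, paths: list[str],
                 fetch: Callable[[str], tuple[int, bytes]] = http_fetch) -> list[RouteResult]:
    """Fetch each route once and flag errors, oversized bodies and slow responses."""
    results = []
    for path in paths:
        problems: list[str] = []
        start = time.perf_counter()
        try:
            status, body = fetch(base + path)
        except Exception as exc:  # HTTP errors, timeouts, connection errors
            status, body = None, b""
            problems.append(f"error: {exc}")
        elapsed = time.perf_counter() - start
        if status is not None and status != 200:
            problems.append(f"status {status}")
        if len(body) > MAX_BYTES:
            problems.append(f"payload {len(body):,} B > {MAX_BYTES:,} B")
        if elapsed > MAX_SECONDS:
            problems.append(f"took {elapsed:.2f} s > {MAX_SECONDS} s")
        results.append(RouteResult(path, status, len(body), elapsed, problems))
    return results
```

With the server running, call it as `check_routes("http://127.0.0.1:8081", ["/api/etl/status", "/api/agent-teams"])` and look at any result whose `problems` list is non-empty. Run it once right after startup and once warm, since the optimize tab differed by roughly 10× between cold and warm.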
The roadmap (issue #103) is the long-term direction: outcome attribution + comparative benchmark + replay/fork + multi-device sync. Two design-gated items need maintainer input before any agent is dispatched on them:
- #99 Comparative benchmark engine. The maintainer writes the scoring rubric (`docs/specs/benchmark-rubric-v1.md`). It defines what counts as "model X won" against model Y on a replayed session, and the composition weights for static score + LLM grade. Without this rubric an agent will pick reasonable-looking defaults and you'll argue about them later.
- #100 Multi-device sync. This needs decisions on the crypto library (`age` vs `pynacl`), the wire format (per-device incremental snapshots vs append-log), and the conflict-resolution policy. The privacy contract is non-negotiable: get this wrong and you've shipped a destructive bug.
That's the picture. The repo-relative file paths in this doc live under `/Users/yadkonrad/dev_dev/year26/jan26/StackUnderflow/`. Welcome, but please finish the audit before shipping anything.