Releases: 0bserver07/StackUnderflow
v0.9.1 — five audit fixes (agents/messages/stats/optimize/yield)
Patch release. After v0.9.0 shipped, a real-data audit on the maintainer's 95-session / 374-Task-call project found 5 dashboard tabs were either broken or unusably slow. This release fixes all five.
No new features. No schema changes. Just real bugs found by actually running the dashboard.
Fixed
Agents tab — was showing sessions from unrelated projects
The v013-fallback _list_team_sessions_scan synthesised "teams" from any session in any project that had a sidechain message anywhere in the store. SutroYaro had 0 sidechain messages but 374 Task tool calls — the tab returned 10 unrelated sessions from mycelium-ecosystem and ComedyCatalouge.
Replaced with a real per-project Task / Agent tool-call detector. One row per parent session, ranked by sub-agent invocation count. Indexed v013 path gains an _indexed_teams_match_project guard so a globally-populated agent_teams table doesn't suppress the fallback for projects with no v013 rows. Route accepts ?project=<slug>; AgentsTab passes the active project's slug.
/api/messages — 49 MB unbounded response
26K+ messages returned with no limit cap. Frontend MessagesTab leaned on client-side pagination, which choked.
Mandatory pagination: per_page=100 default, max 500. Returns {messages, total, page, per_page, total_pages, start_index, end_index}. Frontend MessagesTab switched to server-side pagination with Prev/Next nav. 49 MB → 262 KB (188×).
/api/stats — 4 MB payload
The bulk wasn't daily_stats (27 KB) but user_interactions.command_details (2.65 MB of full prompts) + errors.assistant_details (1.2 MB).
New params: ?days=N (default 90, 0 disables) caps daily_stats; ?include=block&include=block for selective fetch; ?details=true opts back into the full body. Default 4 MB → 67.5 KB (98.4%).
/api/optimize — 1.5 s warm, 13 s cold
cProfile pinned _detect_unused_mcp_servers at 1.3 s of the 1.5 s — re-parsing every messages.tools_json row to roll up MCP tool names.
Replaced with one indexed tool_mart lookup. Added in-process per-project mtime-keyed cache + ?force=true bypass + warnings[] field that emits mart_empty so the UI can prompt a backfill. 1548 ms → 345 ms cold, 0 ms warm.
/api/yield — 15 s timeout
cProfile showed 87% of wall-clock in subprocess.run — _classify_session shelled out 4× per kept session for git log / git show / git merge-base. _first_cwd_for_session ran one SELECT LIMIT 1 per session × 13 monthly partition fan-outs.
Replaced with one _GitWorkspace cache built per request: per-cwd git rev-parse + windowed git log + reachability set + revert-grep set. Per-session classification answered entirely from in-memory data. _first_cwd_for_sessions replaced with one ROW_NUMBER() OVER window query batched in chunks of 500. New STACKUNDERFLOW_YIELD_MAX_SESSIONS_PER_PROJECT cap (default 200; unlimited to disable). 6.62s → 1.47s aggregate; per-project timeout → 0.35s.
Notes
- 2750 → 2779 fast tests (+29 across all five fixes).
- Frontend 168 tests preserved; typecheck + build clean.
- No schema migration. Schema stays at v016/v017/v018 depending on which post-v0.9.0 work has applied.
- Audit + tracking: #104.
Install
pip install -U stackunderflow
stackunderflow start
v0.9.0 — Wave 1 of world-class roadmap (6 specs)
First wave of the world-class-coding-agent-analytics roadmap (issue #103). Six independent foundation specs that move the product from passive cost dashboard to actively-recommending coding-agent decision layer.
Added — Real-time observability tab (Spec 13, #90)
New Live dashboard tab streams what's happening right now: usage_events, tool calls, rolling-burn ticker, P50/P95/P99 tool-latency by name. SSE-backed (GET /api/live/stream) with 100ms disconnect-aware loop, snapshot endpoint (GET /api/live/stats) for cold loads, watcher-not-running banner when the tab can't get fresh data.
Added — File-risk recommender (Spec 16, #86)
stackunderflow risk file <path> [--since 30d] [--format text|json] surfaces "this file has caused N reverts in M days" before an agent edits it. Aggregates the v0.7.2 outcome heuristic into four buckets per file (total_sessions, reverted, failed, worked) plus the five most-recent failure-mode session ids. CLI + MCP (file_risk) + meta-agent (get_file_risk) + per-file risk block on the Playback FS panel. No schema migration.
Added — Burn projector v2 (Spec 17, #87)
stackunderflow plan show gains a structured projection block: Projected: (with method + daily burn), Days to limit:, Alert: when crossing a configurable threshold. Default weighted-7d (decay 0.85) once enough samples; auto-falls-back to linear when weighted collapses to $0 on a quiet 7-day tail with non-zero period total. Configurable thresholds via stackunderflow plan thresholds {show,set,reset} (default [50, 75, 90]). MCP + meta-agent get_burn_projection. UI: PlanBudgetCard shows the forecast strip + amber/red alert banner.
Added — Mode recommender heuristic v1 (Spec 18, #88)
stackunderflow recommend mode --prompt "<task>" [--current-model X] returns the cheapest model that historically solved similar tasks for this user. Pattern-match: same intent (build/fix/refactor/test/explore) + same token-band (tiny<200/small<800/med<3000/large) + non-empty language overlap. Confidence in [0, 1] over sample-size × spread × cost-gap. Returns confidence=0.0 with a clean "no historical data" message when there's not enough signal. Schema migration v016 = mode_recommendations 24h pull-through cache. CLI + MCP + meta-agent recommend_mode. The full benchmark engine (Spec 26) is the v2.
Added — Skill recommender (Spec 19, #89)
stackunderflow recommend skills [--project SLUG] [--threshold N] [--window-days D] mines the local store for repeated workflow patterns the user could install as auto-generated Claude Code skills — "you ran pytest tests/ -q 7 times across 7 sessions, want a skill?". Reuses services.skill_synth.synthesize_skills (no second copy of the detectors). Already-installed filter walks <project>/.claude/skills/auto-*/ and ~/.claude/skills/, drops anything the user already has installed. CLI + MCP + meta-agent recommend_skills. Each row carries an accept_command the user pastes to install — surface is read-only, never auto-writes. Cache at ~/.stackunderflow/cache/skill_recommendations.json (6h TTL).
Added — Open session-schema spec (Spec 12, #91)
docs/specs/session-schema-v1.md and docs/specs/adapter-contract.md publish the on-disk SQL schema + the SourceAdapter Protocol as a versioned spec — pinned to schema_version = 16. Tools that want to read from / write to the StackUnderflow store can now do so without reverse-engineering. Conformance test in tests/stackunderflow/store/test_schema_v1_spec.py enforces the doc/schema bind.
Notes
- Schema versions used: v016 (mode-recommender cache). v015 stays unused — Spec 16 verified
session_mart.outcomeexists at query time and skipped its migration; the slot is reserved. - Cross-spec wiring: meta-agent
TOOL_CATALOGgains 4 new tools (get_file_risk,get_burn_projection,recommend_mode,recommend_skills). Every wave-1 surface is reachable from the local Ollama sidebar. - Tests: 2488 → 2682 fast (+194). Frontend 152 → 168 (+16). Ruff baseline preserved (38). Typecheck + build clean.
Install
pip install -U stackunderflow
# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'
# Recommended: pull a tools-capable Ollama model for the meta-agent
ollama pull qwen2.5-coder:7b
stackunderflow start
What's next — Wave 2
Issues #92 (PR/CI webhook ingest) and #93 (per-session static analysis pass). Two L-sized agents in parallel; ~2.5h wall-clock. These build the outcome-attribution rails that Wave 3 (issues #94, #95) and Wave 5 (the comparative-benchmark killer feature, #99) need. See the roadmap issue for the full plan.
v0.8.0 — Meta-agent sidebar + CLI cost-report fix + --ingest flag
Minor bump for three substantive deliverables. The CLI cost-report fix in particular is a high-priority correctness fix every user should pick up.
Fixed — CLI report cost commands now read from usage_events (HIGH PRIORITY)
stackunderflow status / today / month / report have been silently under-counting cost by ~6× since v0.7.0. The aggregator was recomputing cost from stale messages.input_tokens / output_tokens + messages.model via compute_cost(...), which missed every pricing fix since v0.7.2 (claude-opus-4-7, glm-5, composer-1, droid-auto, etc.) and dropped the Opus priority-tier 6× multiplier on rows whose model alias didn't round-trip through the live pricer.
Before / after on a real store:
stackunderflow statusreportedmonth: $502.53 (21575 msg)for May 2026.- True
SUM(usage_events.cost_usd)for the same window: $3072.74 (12972 events). - After this fix the CLI reports match the dashboard match the database. To within $0.01.
build_report now reads the stored usage_events.cost_usd value — the normalised, attributed cost written once on ingest. Function signature + return shape are byte-identical, so today, month, status, report, and plan show all pick up the fix transparently. Pre-backfill stores keep working via an empty-usage_events fallback to the legacy messages-based path.
Added — meta-agent sidebar with backend tool-calling (Ask StackUnderflow)
The chat surface is promoted from overlay drawer to permanent right-docked column, and the chat now drives a tool-calling loop against a local Ollama model that can read the same SQLite store the dashboard reads. The sidebar becomes a meta-agent that answers grounded questions about your own sessions, projects, costs, and file activity.
New route: POST /api/meta-agent/chat streams application/x-ndjson events (token / tool_call / tool_result / error / done). On each turn the route calls http://localhost:11434/api/chat with the tool catalogue; if the model emits a tool_calls array the route executes each one against the local store and re-calls Ollama. Hard cap at 5 hops keeps runaway loops bounded.
Tool catalogue (all read-only — the LLM cannot mutate state):
search_past_decisions(query, limit?, project?, since?)— substring search across transcriptsfind_sessions_in_path(path, since?, limit?)— sessions whose project root ispathor an ancestorfind_sessions_touching_file(file, mode?, limit?)— sessions where a file shows up in tool args / contentget_project_summary(slug?)— flat rollup for one project (sessions / messages / cost / first-last activity)get_cost_summary(period?)— cross-project cost rollup overtoday/7days/30days/month/allget_session_playback(session_id, at?)— files touched by a session up to a cutoff (metadata only)list_recent_sessions(project?, limit?)— most-recently-active sessions
Layout: expanded sidebar on viewports >= 1280px; icon rail at >= 768px; hidden below 768px with a header button to summon a fullscreen overlay. Collapse state persists in localStorage.stackunderflow_metaAgentSidebar.
Privacy: nothing leaves the machine. Route only opens HTTP to localhost:11434 (Ollama); tool executors only read ~/.stackunderflow/store.db. No fallback to a remote LLM. If Ollama is down the first NDJSON event is {"type": "error"} and the sidebar surfaces a banner. Recommended models: qwen2.5-coder:7b (default) and llama3.2:latest. See docs/meta-agent.md.
Added — --ingest / --auto-ingest flags on read-only CLI commands
Eight read-only commands (status, today, month, report, compare, yield, optimize, export) accept two new flags that force a fresh ingest pass before the command's query runs.
--ingest— synchronously walks every registered adapter, normalises new events, refreshes marts. Use whenstackunderflow startis not running in another terminal and you need authoritative numbers.--auto-ingest(default on) — when the store's newestusage_events.tsis older than 6 hours, the command prints[stale data — ingesting...]to stderr and runs the same pass.--no-auto-ingestdisables the staleness path entirely.
Empty stores are not considered stale — fresh installs require explicit --ingest so we don't silently walk every adapter root on the first call.
Added — docs/backup.md and docs/chat.md
Long-form docs for two features that shipped in earlier releases without long-form coverage:
docs/backup.md— thestackunderflow backupcommand tree (rsync--link-desthard-link efficiency, exclude list, restore semantics, macOS launchd auto-backup).docs/chat.md— the Ollama-backed chat sidebar (prerequisites, model dropdown UX, streaming flow, local-only privacy model).
HANDOFF marks spec 06 (backup) as closed.
Notes
- No schema migration.
- Tests: 2428 → 2488 fast (+60: 7 cost-fix + 22 meta-agent + 31 --ingest). Frontend 135 → 152 (+17 meta-agent NDJSON parser / viewport-state / tool-summary tests). Ruff baseline preserved (38).
- Recommended ops: after upgrade, run
stackunderflow statusto see the corrected cost number against your store. If the difference is non-trivial, your dashboard /--use-embeddingsqueries were already on the corrected mart path — only the CLI was wrong.
Install
pip install -U stackunderflow
# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'
# Recommended: pull a tools-capable Ollama model for the meta-agent
ollama pull qwen2.5-coder:7b
stackunderflow start
v0.7.4 — CI cleanup + drop beta on yield/qa/tags + API docs for playback /fs
Patch release closing the CI fallout from v0.7.3 and dropping the remaining beta flags on the dashboard.
Changed — CI: Windows in build, Ubuntu-only in test
The Windows test job in test.yml was added in v0.7.3 (HANDOFF #4). The first real run surfaced ~40 POSIX-shaped test fixtures (hard-coded /Users/... literals, Path.resolve() drive-prefixing, paths comparing across \ vs /) — the production code paths fixed in this same release (_is_ancestor separator normalisation, set_home_env helper) are correct, but the rest of the test suite needs a Windows-friendly port.
.github/workflows/test.ymlmatrix reverts toubuntu-latestonly (still 3.11 + 3.12)..github/workflows/build.ymlkeeps the full[ubuntu-latest, macos-latest, windows-latest]×[3.11, 3.12]matrix — it exercises the wheel install +stackunderflow --version/--help/cfg lssmoke test, which catches the real cross-platform import / packaging surface. PowerShell wheel-install fixed viaGet-ChildItemsodist/*.whlworks without POSIX glob expansion.
Runtime on Windows is unaffected — discovery, hook, lock, and watcher entry points all work. The blocker is test-fixture portability.
Fixed — Windows test suite collection + path-prefix matching
Three issues, all in tests / pure helpers — no schema or production-route change.
- HOME → USERPROFILE shim. New
tests/conftest.pyexposesset_home_env(monkeypatch, home)which sets HOME + USERPROFILE + HOMEDRIVE + HOMEPATH together soPath.home()redirects on both POSIX and Windows. Eight call sites switched over frommonkeypatch.setenv("HOME", ...)which was inert on Windows. _is_ancestorseparator normalisation instackunderflow/services/discovery.py. Compared paths with a hard-coded/. A query path like/Users/yad/dev/foo/srcresolves throughPath.resolve()on Windows toC:\Users\yad\dev\foo\src— the prefix check then never matched. The helper now normalises\\→/on both sides before comparing. macOS / Linux unchanged.- Embedding test collection without numpy. The three embedding test files imported
numpy as npat module top. numpy ships transitively with the[embeddings]extra. Fixed bynp = pytest.importorskip("numpy")ahead of the rest of the imports.
Changed — yield / qa / tags tabs drop the beta: true flag
Three tab entries in stackunderflow-ui/src/pages/ProjectDashboard.tsx lose beta: true. These features have been in stackunderflow since v0.6.x — ~2 weeks of real-store use — and the beta pill no longer signals anything an observer can act on. All three components already have their own EmptyState for the no-data case.
Added — API reference coverage for Playback + Agent-Teams
docs/api-reference.md gains the Playback section: GET /api/playback/{session_id} (event stream) + GET /api/playback/project/{slug} (cross-session timeline) + the v0.7.3 GET /api/playback/{session_id}/fs?at=<iso> (virtual-filesystem reconstruction) with full request/response shape + reconstruction semantics. Plus the three GET /api/agent-teams/... endpoints landed in v0.7.0's v013 round.
Notes
- No schema change.
- Tests: 2428 fast (unchanged from v0.7.3 baseline). Ruff: 37. Frontend 135 tests, typecheck + build clean.
Install
pip install -U stackunderflow
# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'
stackunderflow start
v0.7.3 — Playback v2 + opt-in semantic search + Windows CI + drop beta flag
Closes the remaining HANDOFF follow-ups left open after v0.7.2. One additive schema migration (v014) ships with the embeddings feature; existing stores apply it on next stackunderflow start.
Added — Playback v2 (HANDOFF #8)
Virtual-filesystem reconstruction at a point in time. The Playback tab now answers "what did every touched file look like at timestamp T?" by replaying the session's tool-call history.
Backend
- New route
GET /api/playback/{session_id}/fs?at=<iso>&paths=<csv>&include_content=true|false. Walks the session's messages in order, replays Read / Write / Edit / MultiEdit / NotebookEdit calls up toat, returns{files: {path: {content?, byte_count, last_modified_ts, operations_applied, reconstruction_complete}}, warnings}. - Reconstruction edge cases handled:
cat -nline-number stripping in Read results; Edit without prior Read flagged as partial; Editold_stringmiss → warning + state preserved; per-sub-edit MultiEdit handling; Write resets content; NotebookEdit accumulates{cell_id: source};replace_allhonoured. - Performance: ~150ms median on a real-store session (6900 messages, 455 touched files).
include_content=falsereturns metadata only for "which files changed" queries.
Frontend
- New side panel
PlaybackFsPanel.tsxon the Playback tab. File tree grouped by directory (root first, then alpha-sorted); monospace content viewer with line numbers; warnings banner above the body; warning icon on file rows wherereconstruction_complete: false. - Scrubber integration — 250ms debounce inside the panel's effect throttles fetches as the user scrubs through events. In-flight request key (
sessionId|at) drops stale responses so a fastj/kmash can't overwrite the newest snapshot with an older one. - Bandwidth optimisation — scrub fetches use
include_content=false; selecting a file triggers a targetedpaths=[that_file]&include_content=truefetch. - No new npm deps — reuses
@tabler/icons-react+ Tailwind + stock React.
Added — Discovery semantic search (HANDOFF #10)
Opt-in search-past-decisions --use-embeddings (and the matching MCP arg).
- Optional extra —
pip install stackunderflow[embeddings]pulls in sentence-transformers (+ torch + numpy transitives, ~500MB total). Users who never flag the mode pay zero. Without the extra,--use-embeddingsexits cleanly with the install hint. - Default model —
sentence-transformers/all-MiniLM-L6-v2(384 dims, 90MB). Override viaSTACKUNDERFLOW_EMBED_MODELenv or--embed-modelflag. - New v014 migration —
discovery_embeddingspull-through cache table keyed on(session_id, message_id, model_name). Rawnumpy.float32byte buffers;embedding_dimstored separately for corrupt-blob detection at read time. - Cosine re-rank —
encode(..., normalize_embeddings=True)makes cosine reduce to a dot product. Mapped[-1, 1] → [0, 1]to plug into the existingpack_within_budgetrank fn. When--use-embeddingsis on, the LIKE-density relevance term is replaced by the cosine score; recency + cost weights unchanged. - Substring still runs first —
--use-embeddingsonly re-ranks the candidate set, never widens it. - MCP
search_past_decisionsgains matchinguse_embeddings: bool = False+embed_model: str | None = Noneargs.
Added — Windows runner in the CI matrix (HANDOFF #4)
Both .github/workflows/test.yml and .github/workflows/build.yml gained windows-latest alongside Ubuntu and macOS at Python 3.11 + 3.12. fail-fast: false on each matrix. The msvcrt.locking() branch of stackunderflow/etl/lock.py and the Windows path of the watchfiles-backed watcher execute on every push for the first time.
Cross-platform test-marker housekeeping: 10 tests/stackunderflow/adapters/test_<provider>_defensive.py chmod-000 cases + 1 tests/stackunderflow/cli/test_export.py symlink case skip cleanly on Windows where the underlying OS primitive is a no-op (chmod) or requires elevation (symlink).
Known-gap documented in this changelog: the in-process msvcrt.locking(LK_NBLCK) second-acquire-returns-None contract is what CI actually validates here for the first time.
Changed — Playback + Agents tabs drop the beta flag (HANDOFF #11)
Both playback and agents tab entries in stackunderflow-ui/src/pages/ProjectDashboard.tsx drop beta: true. With v013 applied across the real store and the routes returning populated bodies, the beta pill no longer signalled anything. Empty-state UX is unchanged — each tab carries its own EmptyState component for the no-data case.
Notes
- Schema migration: v014 (
discovery_embeddings). Additive,IF NOT EXISTS-guarded. Applies on nextstackunderflow startagainst existing stores; no data migration required (the table starts empty and pull-through-fills on first--use-embeddingsquery). - Tests: 2355 → 2428 (+73). Ruff baseline preserved (37 errors). Frontend tests 110 → 135 (+25). Frontend typecheck + build clean.
Install
pip install -U stackunderflow
# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'
stackunderflow start
v0.7.2 — pricing coverage round 2 + outcome confidence + optimize parity lock-in
A focused follow-up release on v0.7.1 that closes most of the post-v0.7.1 HANDOFF backlog: pricing coverage goes to 100% on a real store, outcome-aware discovery gains a confidence ladder, and three more follow-up items get structural lock-in tests.
Fixed — pricing coverage gaps (cost_source='unknown' → 0%)
After applying v007–v013 to the maintainer's real ~/.stackunderflow/store.db and running a force re-derive, 21% of usage_events were stamped cost_source='unknown'. This release closes the gap — the same data on the same store post-release shows 0 unknown events.
claude-opus-4-7— the dominant gap (97% of the residual). Adds the model to the Anthropic pricer with the published rate ($5 input / $25 output / $6.25 5m-cache-write / $0.50 cache-read per MTok). The token-set heuristic checks the 4-7 combination before falling into the legacy Opus 4 ($15/$75) family. 1M context window is at the same per-token rate.- Gemini 3 preview ids —
gemini-3-pro-preview/gemini-3.1-pro-preview($2/$12, ≤200K tier),gemini-3-flash-preview($0.30/$2.50). Forward-lookinggemini-3.0-pro/gemini-3.1-proplaceholders now match the preview ids so a preview-to-GA swap is a no-op on cost. composer-1— corrected from the v0.7.1 Sonnet-tier estimate ($3/$15) to Cursor's published $1.25/$10.droid-auto/cline-auto— both adapter-default placeholders now route to Anthropic's Sonnet 4.5 rate ($3/$15) instead of returningNone → $0.glm-5/glm-5.1— ZhipuAI's GLM family proxied through a Claude-shape API ($1.00/$3.20 and $1.40/$4.40 respectively).
Latent overcounting bug fixed: v0.7.1's _PROVIDER_TO_PRICER was routing every beta provider that wasn't explicitly listed (qwen, gemini, copilot, codeium, droid, kiro, openclaw, pi, omp, continue, opencode, cursor-agent) to the Anthropic pricer, producing 3–4× over-estimates. Each beta provider now routes to its own pricer.
Sources cited per rate in code comments. Regression tests lock in cost_source='rate_card' at the normalizer level for every new id.
Changed — outcome-aware discovery confidence ladder (HANDOFF #9)
The transcript-fallback heuristic behind find-sessions-where-action-worked / find-failure-modes-for-file was too permissive — any session that simply ended without an explicit complaint was confidently surfaced as worked. v0.7.2 adds a confidence score to every OutcomeMatch plus a default filter.
OutcomeMatch.outcome_confidence: floatin[0.0, 1.0]. Additive — theoutcomestring contract is unchanged.1.0—captured_eventsdeterministic (future hook integration)0.8— explicit in-vocabulary user phrase within the lookahead window0.5— agent revert tool call (git revert/git reset --hard/git checkout --/git restore) on the same file0.3— silence-as-worked (the old too-permissive case; still emitted but filtered by default)0.0— anchor was already the last recorded turn
- Default
min_confidence=0.5filter on both service functions. Rows below threshold stay in the database but are filtered at the surface. - CLI
--min-confidence+--verbose/-vflags; MCPmin_confidencearg (None → service default; out-of-range values clamp). - Expanded signal vocabulary (47 → 67 phrases): negative additions (
broke/broken/failing/mistake/error/regression/❌/👎), positive (tests pass(ing)/fixed/solved/correct/+1/👍/🎉/✅/✓), revert (try again/try a different). Keyword regex now drops\bboundary on tokens that start/end with non-word characters so emoji and+1match. - 8 new adversarial tests lock the per-row confidence at each ladder step.
Changed — optimize detectors lock in message_tool_mart migration (HANDOFF #2)
The v011 migration that landed in unreleased commits ahead of v0.7.0 already put the four detectors (junk_reads, bash_output_limits, low_read_edit_ratio, ghost_agents) on message_tool_mart fast paths. v0.7.2 adds parity tests proving the mart and raw-scan paths produce byte-equivalent findings, plus a bench_optimize_mart.py perf benchmark.
On a real-store-shape DB (78,763 mart rows, 247,278 messages across 14 monthly partitions): junk_reads 270×, bash_output_limits 1242×, low_read_edit 490×, ghost_agents 822×. The mart path is also more precise for bash_output_limits — the raw scan sized the whole user-message content as the tool result (over-counted multi-tool turns); the mart's byte_count is the matched-by-tool_use_id size of the actual tool_result block.
Verified — /api/cost-data command_costs stays aggregator-driven (HANDOFF #5)
HANDOFF #5 asked whether command_costs could move to command_mart (mirroring Wave 5's tool_costs migration). The shape mismatch the HANDOFF flagged is structural, not stale: the aggregator emits per-Interaction rows (interaction_id, session_id, prompt_preview, timestamp, tools_used, steps, models_used, had_error, cost, tokens); the command_mart grain (day, project_id, command_name) discards those fields on ingest. Locked the aggregator path in place with three new tests; a future per-Interaction-grain mart could revisit this.
Notes
- Tests: 2310 → 2355 (+45). Ruff baseline preserved (37 errors). Frontend typecheck + build clean.
- No schema migration in this release.
- Maintainer's real
~/.stackunderflow/store.dbis atuser_version=13with all 8 marts populated andcost_source=unknownat 0 after a post-release backfill.
Install
pip install -U stackunderflow
stackunderflow start
v0.7.1 — init --install-skills + beta pricing coverage
A small follow-up release on top of v0.7.0 that closes two of the open HANDOFF follow-ups.
Added
stackunderflow init --install-skills
Automates the manual cp -r stackunderflow/skills/* ~/.claude/skills/ install path for the three static SKILL.md files (check-prior-work, find-related-sessions, recall-past-decisions).
--install-skills— copies the three SKILL.md files into~/.claude/skills/<name>/SKILL.md. Idempotent: byte-identical destinations are skipped silently; a destination that differs is preserved with a warning unless--skills-forceis set.--skills-dest <path>— overrides the destination directory (defaults to~/.claude/skills/).--skills-force— overwrite a destination SKILL.md that differs from the shipped copy.
Source path resolved via importlib.resources so it works in both source-checkout and installed-wheel layouts.
Beta-normalizer pricing coverage
RATE_CARD now carries 17 new entries spanning the qwen, gemini, and claude-3-5-sonnet alias surface so cost_source=unknown drops for the default beta-fixture models. Models added:
- Qwen (8):
qwen-max,qwen-max-longcontext,qwen-plus,qwen-coder,qwen-coder-plus,qwen3-coder,qwen-auto,qwen-turbo - Gemini (8):
gemini-2.5-pro,gemini-3.0-pro,gemini-3.1-pro,gemini-auto,gemini-2.5-flash,gemini-2.5-flash-lite,gemini-1.5-pro,gemini-1.5-flash - Anthropic (1):
claude-3-5-sonnet(un-dated alias for the 2024-10-22 release)
All rates are inherited from the per-provider pricer modules — the migration was a coverage gap, not a rate-discovery exercise. gemini-3.0-pro and gemini-3.1-pro are forward-looking placeholders pegged to gemini-2.5-pro until Google publishes definitive rates.
Fixed
A latent overcounting bug shipped at the same time as the pricing coverage. _PROVIDER_TO_PRICER in stackunderflow/etl/normalize/base.py was routing every non-listed provider (qwen, gemini, copilot, codeium, droid, kiro, openclaw, pi, omp, continue, opencode, cursor-agent) to the Anthropic pricer. That produced 3-4x-too-high cost numbers for those providers — the Anthropic Sonnet rate ($3/$15 per M) was being applied to qwen requests instead of the actual qwen rate (~$1.20/$3.60 per M). Each beta provider now routes to its own pricer.
Notes
- No schema migration in this release. All v007–v013 schema work landed in the unreleased commits between v0.7.0 and this release; those migrations stay additive and off-by-default on existing stores.
- Tests: 2272 → 2310 (+38). Ruff baseline preserved (37 errors). Frontend typecheck + build clean.
Install
pip install -U stackunderflow
stackunderflow init --install-skills
stackunderflow start
v0.7.0 — ETL pipeline + handoff
v0.7.0 — proper ETL pipeline.
The dashboard's per-request aggregator passes are gone. Every cost / dashboard / compare / yield / optimize / messages-summary route now reads from indexed materialized marts; a filesystem watcher syncs marts within ~400 ms of any source-file write. End-to-end on a 247K-message store: dashboard cold-load went from 2.5 s → <50 ms warm.
Architecture: three layers, one watcher.
messages → usage_events → 5 marts (daily/session/project/provider_day/model_day)
↑
filesystem watcher (200 ms debounce)
See docs/specs/etl-architecture.md for the design contract and docs/HANDOFF.md for the state-of-the-codebase walkthrough.
Numbers (real maintainer store)
- Backfilled 150,337 events in 226 s (idempotent re-runs ~29 s)
- Marts: daily 940, session 841, project 151, provider_day 146, model_day 184
- Watermarks all aligned to 150,337
- Dashboard route latencies (warm, dev box): cost-by-provider 1 ms · compare 2 ms · yield 1 ms · projects 6 ms · dashboard-data 7 ms · cost-data 12 ms · optimize 100 ms
What's new
- 11 PRs across 4 waves shipped: foundation (#72), 4 default normalizers (#73), 5 mart builders (#74), filesystem watcher (#75), hot-path route migration (#76), 12 beta normalizers (#79), analytical-route migration (#81), real backfill + writer hook (#80),
/api/etl/status+ CLI (#78), UI status badge + backfill button (#77), real-data e2e + perf regression (#82) - 16/16 codeburn-catalog providers now have
Normalizersubclasses - New CLI:
stackunderflow etl status,stackunderflow etl backfill [--force],stackunderflow start --no-watcher - New API:
GET /api/etl/status - New UI: status badge in dashboard header, "Backfill now" on /settings
Tests
1598 passing, 2 skipped, 11 deselected (slow suite — pytest -m slow)
Migration
Schema bumps 5 → 6 via v006_etl_layer.sql (additive — existing tables and routes keep working). First-run cost: a one-time backfill (~3 minutes per 100K messages) populates the marts; subsequent runs are watcher-driven and incremental.
Handoff
docs/HANDOFF.md walks an incoming agent through the architecture, recent history, gotchas, real-data state, and what's left.
v0.6.1
Polish + correctness pass on v0.6.0. See CHANGELOG.md.
Fixed
- Cursor sessions: $0 → real cost (composer-* + cursor-auto now estimated at Sonnet rates; vendor-prefixed models delegate to upstream pricer). Cursor total in /api/compare jumped from $0.0000 → $14.4367 on real data.
<synthetic>model sentinel no longer leaks into compare/cost reports. v004 migration cleans 221 existing polluted rows.- Currency conversion no longer silently degrades to USD-with-foreign-symbol when Frankfurter is unreachable. Embedded rate snapshot + UI warning banner.
Changed
- Cursor adapter derives per-workspace project_slug from bubble file paths (was collapsing 9 conversations into 1 "cursor" project; now 7 distinct workspaces).
- v005 migration reparents existing legacy cursor sessions.
Tests
- +96 defensive empty-source / malformed-data tests covering 10 beta adapters.
- 1328 passing total (was 1187 at v0.6.0).
v0.6.0
v0.6.0 — full multi-provider pipeline + cost correctness + UI surfaces
See CHANGELOG.md for the complete entry list.
Highlights
- Multi-currency, CSV/JSON export, model aliases (#33–#35)
- 7 new optimize patterns + plan budgets + compare mode + yield analysis + context-budget (#37, #39–#42)
- Fast-mode (Claude Opus priority tier) detection through the entire pipeline including SQLite (#44, #48)
- Streaming JSONL reader (185MB → <1KB peak) + Cursor parse cache (3-8× speedup) (#38, #43)
- Cursor + Cline adapters default-on; 12 others remain beta-opt-in (#49)
- Public Python API (
list_projects/process/list_sessions) reads the store, provider-tagged (#50) - MCP server multi-provider —
session_query+ newlist_sessions+list_projectstools (#51) - Cursor v3 vscdb fix — was ingesting 0 sessions on real Cursor data (#52)
- Dashboard surfaces for compare, yield, plan, optimize, context-budget (#47)
- Codeburn attribution scrub from shipped code (#36)
Pipeline verified end-to-end
After this release a clean ingest produces:
- 188 projects, 228K messages, 1106 sessions, 7 providers (claude, codex, cursor, cline, gemini, droid, qwen) in the local store
- Public Python API + MCP both surface the same data
Tests
1187 passing, 2 skipped (was 882 at v0.5.0 — +305 new tests)