Releases: 0bserver07/StackUnderflow

v0.9.1 — five audit fixes (agents/messages/stats/optimize/yield)

16 May 04:46

Patch release. After v0.9.0 shipped, a real-data audit on the maintainer's 95-session / 374-Task-call project found 5 dashboard tabs were either broken or unusably slow. This release fixes all five.

No new features. No schema changes. Just real bugs found by actually running the dashboard.

Fixed

Agents tab — was showing sessions from unrelated projects

The v013-fallback _list_team_sessions_scan synthesised "teams" from any session in any project that had a sidechain message anywhere in the store. SutroYaro had 0 sidechain messages but 374 Task tool calls — the tab returned 10 unrelated sessions from mycelium-ecosystem and ComedyCatalouge.

Replaced with a real per-project Task / Agent tool-call detector. One row per parent session, ranked by sub-agent invocation count. Indexed v013 path gains an _indexed_teams_match_project guard so a globally-populated agent_teams table doesn't suppress the fallback for projects with no v013 rows. Route accepts ?project=<slug>; AgentsTab passes the active project's slug.
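
A minimal sketch of the per-project detector, assuming each tool call is available as a (project_slug, session_id, tool_name) tuple. The tuple shape and the rank_agent_sessions name are illustrative, not StackUnderflow's code. The point it shows is that the project filter runs before anything is counted, which is what keeps sessions from unrelated projects out of the tab.

```python
from collections import Counter

# Tools whose calls count as sub-agent invocations.
SUBAGENT_TOOLS = {"Task", "Agent"}


def rank_agent_sessions(rows, project_slug):
    """rows: (project_slug, session_id, tool_name) tuples; returns [(session_id, count)]."""
    counts = Counter(
        session_id
        for slug, session_id, tool_name in rows
        if slug == project_slug and tool_name in SUBAGENT_TOOLS
    )
    # One row per parent session, most sub-agent calls first; ties sorted by id.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```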

/api/messages — 49 MB unbounded response

26K+ messages returned with no limit cap. Frontend MessagesTab leaned on client-side pagination, which choked.

Mandatory pagination: per_page=100 default, max 500. Returns {messages, total, page, per_page, total_pages, start_index, end_index}. Frontend MessagesTab switched to server-side pagination with Prev/Next nav. 49 MB → 262 KB (188×).
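
A sketch of the response envelope computed over an in-memory list. The real route presumably pushes LIMIT/OFFSET into SQL; clamping an oversized per_page down to 500 (rather than rejecting it) and the 1-based start_index / end_index convention are assumptions.

```python
import math

DEFAULT_PER_PAGE = 100
MAX_PER_PAGE = 500


def paginate(messages, page=1, per_page=DEFAULT_PER_PAGE):
    """Build the paginated envelope for one page of an already-ordered list."""
    per_page = max(1, min(per_page, MAX_PER_PAGE))  # clamp to [1, 500]
    total = len(messages)
    total_pages = max(1, math.ceil(total / per_page))
    page = max(1, min(page, total_pages))  # clamp page into range
    start = (page - 1) * per_page
    chunk = messages[start:start + per_page]
    return {
        "messages": chunk,
        "total": total,
        "page": page,
        "per_page": per_page,
        "total_pages": total_pages,
        # 1-based inclusive bounds for a "showing X-Y of N" label; 0 when empty.
        "start_index": start + 1 if chunk else 0,
        "end_index": start + len(chunk),
    }
```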

/api/stats — 4 MB payload

The bulk wasn't daily_stats (27 KB) but user_interactions.command_details (2.65 MB of full prompts) + errors.assistant_details (1.2 MB).

New params: ?days=N (default 90, 0 disables) caps daily_stats; ?include=block&include=block for selective fetch; ?details=true opts back into the full body. Default 4 MB → 67.5 KB (98.4%).

/api/optimize — 1.5 s warm, 13 s cold

cProfile pinned _detect_unused_mcp_servers at 1.3 s of the 1.5 s — re-parsing every messages.tools_json row to roll up MCP tool names.

Replaced with one indexed tool_mart lookup. Added in-process per-project mtime-keyed cache + ?force=true bypass + warnings[] field that emits mart_empty so the UI can prompt a backfill. 1548 ms → 345 ms cold, 0 ms warm.
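
A sketch of an in-process, per-project cache keyed on mtime with a force bypass. MtimeCache and its get signature are hypothetical; the notes don't say how the route derives the project mtime, so the caller passes it in.

```python
from typing import Any, Callable


class MtimeCache:
    """Per-project results, reused until the project's data mtime changes."""

    def __init__(self, compute: Callable[[str], Any]):
        self._compute = compute
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, project: str, mtime: float, force: bool = False) -> Any:
        cached = self._entries.get(project)
        if not force and cached is not None and cached[0] == mtime:
            return cached[1]  # warm hit: no recomputation
        # Cold, stale (mtime moved), or ?force=true: recompute and store.
        result = self._compute(project)
        self._entries[project] = (mtime, result)
        return result
```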

/api/yield — 15 s timeout

cProfile showed 87% of wall-clock in subprocess.run: _classify_session shelled out 4× per kept session for git log / git show / git merge-base. Separately, _first_cwd_for_session ran one LIMIT 1 SELECT per session, fanned out across 13 monthly partitions.

Replaced with one _GitWorkspace cache built per request: per-cwd git rev-parse + windowed git log + reachability set + revert-grep set. Per-session classification answered entirely from in-memory data. _first_cwd_for_sessions replaced with one ROW_NUMBER() OVER window query batched in chunks of 500. New STACKUNDERFLOW_YIELD_MAX_SESSIONS_PER_PROJECT cap (default 200; unlimited to disable). 6.62s → 1.47s aggregate; per-project timeout → 0.35s.
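
A sketch of the batched first-cwd lookup using SQLite's ROW_NUMBER() window function (SQLite 3.25+). It assumes a single messages(session_id, ts, cwd) table, whereas the real store is split into monthly partitions; the function name is illustrative.

```python
import sqlite3

CHUNK = 500


def first_cwd_for_sessions(conn: sqlite3.Connection, session_ids):
    """Return {session_id: cwd} from each session's earliest message that has a cwd."""
    result = {}
    ids = list(session_ids)
    for i in range(0, len(ids), CHUNK):  # one query per 500 ids, not one per session
        chunk = ids[i:i + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"""
            SELECT session_id, cwd FROM (
                SELECT session_id, cwd,
                       ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY ts) AS rn
                FROM messages
                WHERE cwd IS NOT NULL AND session_id IN ({placeholders})
            ) WHERE rn = 1
            """,
            chunk,
        )
        result.update(rows)  # rows are (session_id, cwd) pairs
    return result
```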

Notes

  • 2750 → 2779 fast tests (+29 across all five fixes).
  • Frontend 168 tests preserved; typecheck + build clean.
  • No schema migration. Schema stays at v016/v017/v018 depending on which post-v0.9.0 work has applied.
  • Audit + tracking: #104.

Install

pip install -U stackunderflow
stackunderflow start

v0.9.0 — Wave 1 of world-class roadmap (6 specs)

16 May 02:00

First wave of the world-class-coding-agent-analytics roadmap (issue #103). Six independent foundation specs that move the product from passive cost dashboard to actively-recommending coding-agent decision layer.

Added — Real-time observability tab (Spec 13, #90)

New Live dashboard tab streams what's happening right now: usage_events, tool calls, rolling-burn ticker, P50/P95/P99 tool-latency by name. SSE-backed (GET /api/live/stream) with 100ms disconnect-aware loop, snapshot endpoint (GET /api/live/stats) for cold loads, watcher-not-running banner when the tab can't get fresh data.
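
The notes don't say which percentile definition the Live tab uses. This sketch assumes the nearest-rank method over (tool_name, latency_ms) samples, using integer ceiling division so the ranks are exact.

```python
from collections import defaultdict


def latency_percentiles(samples, quantiles=(50, 95, 99)):
    """samples: (tool_name, latency_ms) pairs; nearest-rank percentiles per tool name."""
    by_tool = defaultdict(list)
    for tool, ms in samples:
        by_tool[tool].append(ms)
    out = {}
    for tool, values in by_tool.items():
        values.sort()
        n = len(values)
        out[tool] = {}
        for q in quantiles:
            rank = -(-q * n // 100)  # ceil(q * n / 100) in integer arithmetic
            out[tool][f"p{q}"] = values[max(rank, 1) - 1]
    return out
```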

Added — File-risk recommender (Spec 16, #86)

stackunderflow risk file <path> [--since 30d] [--format text|json] surfaces "this file has caused N reverts in M days" before an agent edits it. Aggregates the v0.7.2 outcome heuristic into four buckets per file (total_sessions, reverted, failed, worked) plus the five most-recent failure-mode session ids. CLI + MCP (file_risk) + meta-agent (get_file_risk) + per-file risk block on the Playback FS panel. No schema migration.
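
A sketch of the four-bucket rollup, assuming each session arrives as (session_id, ended_at, outcome, touched_paths) with ISO-8601 timestamps. Counting both reverted and failed as failure modes for the recent-ids list is an assumption.

```python
OUTCOMES = ("reverted", "failed", "worked")
FAILURE_MODES = ("reverted", "failed")  # assumption: both count as failure modes


def file_risk(path, sessions, limit=5):
    """sessions: (session_id, ended_at_iso, outcome, touched_paths) tuples."""
    risk = {"total_sessions": 0, "reverted": 0, "failed": 0, "worked": 0}
    failures = []
    for session_id, ended_at, outcome, touched in sessions:
        if path not in touched:
            continue
        risk["total_sessions"] += 1
        if outcome in OUTCOMES:
            risk[outcome] += 1
        if outcome in FAILURE_MODES:
            failures.append((ended_at, session_id))
    failures.sort(reverse=True)  # ISO-8601 strings sort chronologically
    risk["recent_failures"] = [sid for _, sid in failures[:limit]]
    return risk
```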

Added — Burn projector v2 (Spec 17, #87)

stackunderflow plan show gains a structured projection block: Projected: (with method + daily burn), Days to limit:, and Alert: when a configurable threshold is crossed. The default method is weighted-7d (decay 0.85) once there are enough samples; it auto-falls-back to linear when the weighted estimate collapses to $0 on a quiet 7-day tail with a non-zero period total. Thresholds are configurable via stackunderflow plan thresholds {show,set,reset} (default [50, 75, 90]). MCP + meta-agent get_burn_projection. UI: PlanBudgetCard shows the forecast strip + amber/red alert banner.
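
A sketch of the projection maths under stated assumptions: weighted-7d is taken to be an exponentially weighted mean of the last seven daily costs (weight 0.85^age, newest day age 0), linear is period total over elapsed days, and the alert is the highest threshold the projected percentage reaches. The sample-count gate is not modelled, and project is a hypothetical name.

```python
DECAY = 0.85
DEFAULT_THRESHOLDS = (50, 75, 90)


def weighted_daily_burn(last_7_days, decay=DECAY):
    """last_7_days: daily costs, oldest first; weight = decay ** age, newest age 0."""
    n = len(last_7_days)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total_weight = sum(weights)
    if not total_weight:
        return 0.0
    return sum(w * c for w, c in zip(weights, last_7_days)) / total_weight


def project(spent, limit, days_elapsed, days_in_period, last_7_days,
            thresholds=DEFAULT_THRESHOLDS):
    method, burn = "weighted-7d", weighted_daily_burn(last_7_days)
    if burn == 0 and spent > 0:
        # Quiet tail but money was spent this period: fall back to linear.
        method, burn = "linear", spent / max(days_elapsed, 1)
    projected = spent + burn * (days_in_period - days_elapsed)
    days_to_limit = max(limit - spent, 0) / burn if burn > 0 else None
    pct = 100 * projected / limit
    crossed = [t for t in thresholds if pct >= t]
    return {
        "method": method,
        "daily_burn": burn,
        "projected": projected,
        "days_to_limit": days_to_limit,
        "alert": max(crossed) if crossed else None,
    }
```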

Added — Mode recommender heuristic v1 (Spec 18, #88)

stackunderflow recommend mode --prompt "<task>" [--current-model X] returns the cheapest model that historically solved similar tasks for this user. Pattern-match: same intent (build/fix/refactor/test/explore) + same token-band (tiny<200/small<800/med<3000/large) + non-empty language overlap. Confidence in [0, 1] over sample-size × spread × cost-gap. Returns confidence=0.0 with a clean "no historical data" message when there's not enough signal. Schema migration v016 = mode_recommendations 24h pull-through cache. CLI + MCP + meta-agent recommend_mode. The full benchmark engine (Spec 26) is the v2.
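
A sketch of the three-way similarity rule (same intent, same token band, overlapping languages) using the band cut-offs above. Picking the single cheapest solved run, rather than averaging cost per model, is an assumption; the confidence score is not modelled, and None stands in for the confidence=0.0 path.

```python
BANDS = ((200, "tiny"), (800, "small"), (3000, "med"))


def token_band(tokens):
    for upper, name in BANDS:
        if tokens < upper:
            return name
    return "large"


def similar(task, past):
    """Same intent, same token band, and at least one shared language."""
    return (
        task["intent"] == past["intent"]
        and token_band(task["tokens"]) == token_band(past["tokens"])
        and bool(task["languages"] & past["languages"])
    )


def cheapest_model(task, history):
    """history rows carry intent/tokens/languages plus model, cost_usd, solved."""
    candidates = [h for h in history if h["solved"] and similar(task, h)]
    if not candidates:
        return None  # the "no historical data" path
    return min(candidates, key=lambda h: h["cost_usd"])["model"]
```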

Added — Skill recommender (Spec 19, #89)

stackunderflow recommend skills [--project SLUG] [--threshold N] [--window-days D] mines the local store for repeated workflow patterns the user could install as auto-generated Claude Code skills — "you ran pytest tests/ -q 7 times across 7 sessions, want a skill?". Reuses services.skill_synth.synthesize_skills (no second copy of the detectors). Already-installed filter walks <project>/.claude/skills/auto-*/ and ~/.claude/skills/, drops anything the user already has installed. CLI + MCP + meta-agent recommend_skills. Each row carries an accept_command the user pastes to install — surface is read-only, never auto-writes. Cache at ~/.stackunderflow/cache/skill_recommendations.json (6h TTL).
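
A sketch of the repeated-command miner plus the already-installed filter. It counts distinct sessions per command, in line with the "7 times across 7 sessions" example; the auto-&lt;slug&gt; naming used to compare against installed skill directories is hypothetical.

```python
import re
from collections import defaultdict


def skill_name(command):
    """Hypothetical slug: 'pytest tests/ -q' -> 'auto-pytest-tests-q'."""
    return "auto-" + re.sub(r"[^a-z0-9]+", "-", command.lower()).strip("-")


def repeated_commands(events, threshold=3):
    """events: (session_id, command) pairs; keep commands seen in >= threshold sessions."""
    sessions = defaultdict(set)
    for session_id, command in events:
        sessions[command.strip()].add(session_id)
    hits = [(cmd, len(ids)) for cmd, ids in sessions.items() if len(ids) >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def recommend(events, installed, threshold=3):
    """Drop anything whose skill directory name is already installed."""
    return [(cmd, n) for cmd, n in repeated_commands(events, threshold)
            if skill_name(cmd) not in installed]
```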

Added — Open session-schema spec (Spec 12, #91)

docs/specs/session-schema-v1.md and docs/specs/adapter-contract.md publish the on-disk SQL schema + the SourceAdapter Protocol as a versioned spec — pinned to schema_version = 16. Tools that want to read from / write to the StackUnderflow store can now do so without reverse-engineering. Conformance test in tests/stackunderflow/store/test_schema_v1_spec.py enforces the doc/schema bind.

Notes

  • Schema versions used: v016 (mode-recommender cache). v015 stays unused — Spec 16 verified session_mart.outcome exists at query time and skipped its migration; the slot is reserved.
  • Cross-spec wiring: meta-agent TOOL_CATALOG gains 4 new tools (get_file_risk, get_burn_projection, recommend_mode, recommend_skills). Every wave-1 surface is reachable from the local Ollama sidebar.
  • Tests: 2488 → 2682 fast (+194). Frontend 152 → 168 (+16). Ruff baseline preserved (38). Typecheck + build clean.

Install

pip install -U stackunderflow

# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'

# Recommended: pull a tools-capable Ollama model for the meta-agent
ollama pull qwen2.5-coder:7b

stackunderflow start

What's next — Wave 2

Issues #92 (PR/CI webhook ingest) and #93 (per-session static analysis pass). Two L-sized agents in parallel; ~2.5h wall-clock. These build the outcome-attribution rails that Wave 3 (issues #94, #95) and Wave 5 (the comparative-benchmark killer feature, #99) need. See the roadmap issue for the full plan.

v0.8.0 — Meta-agent sidebar + CLI cost-report fix + --ingest flag

15 May 13:55

Minor bump for three substantive deliverables. The CLI cost-report fix in particular is a high-priority correctness fix every user should pick up.

Fixed — CLI report cost commands now read from usage_events (HIGH PRIORITY)

stackunderflow status / today / month / report have been silently under-counting cost by ~6× since v0.7.0. The aggregator was recomputing cost from stale messages.input_tokens / output_tokens + messages.model via compute_cost(...), which missed every pricing fix since v0.7.2 (claude-opus-4-7, glm-5, composer-1, droid-auto, etc.) and dropped the Opus priority-tier 6× multiplier on rows whose model alias didn't round-trip through the live pricer.

Before / after on a real store:

  • stackunderflow status reported month: $502.53 (21575 msg) for May 2026.
  • True SUM(usage_events.cost_usd) for the same window: $3072.74 (12972 events).
  • After this fix the CLI reports, the dashboard, and the database agree to within $0.01.

build_report now reads the stored usage_events.cost_usd value — the normalised, attributed cost written once on ingest. Function signature + return shape are byte-identical, so today, month, status, report, and plan show all pick up the fix transparently. Pre-backfill stores keep working via an empty-usage_events fallback to the legacy messages-based path.
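
A sketch of that read path: sum the stored cost_usd when usage_events has rows, otherwise delegate to the legacy calculation. The usage_events(ts, cost_usd) column names come from these notes, but the query and the legacy_cost callable are illustrative.

```python
import sqlite3


def total_cost(conn: sqlite3.Connection, start_ts, end_ts, legacy_cost):
    """Return (cost, source) for the half-open window [start_ts, end_ts)."""
    (n_events,) = conn.execute("SELECT COUNT(*) FROM usage_events").fetchone()
    if n_events == 0:
        # Pre-backfill store: nothing normalised yet, use the old messages path.
        return legacy_cost(conn, start_ts, end_ts), "messages"
    (cost,) = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM usage_events WHERE ts >= ? AND ts < ?",
        (start_ts, end_ts),
    ).fetchone()
    return cost, "usage_events"
```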

Added — meta-agent sidebar with backend tool-calling (Ask StackUnderflow)

The chat surface is promoted from overlay drawer to permanent right-docked column, and the chat now drives a tool-calling loop against a local Ollama model that can read the same SQLite store the dashboard reads. The sidebar becomes a meta-agent that answers grounded questions about your own sessions, projects, costs, and file activity.

New route: POST /api/meta-agent/chat streams application/x-ndjson events (token / tool_call / tool_result / error / done). On each turn the route calls http://localhost:11434/api/chat with the tool catalogue; if the model emits a tool_calls array the route executes each one against the local store and re-calls Ollama. Hard cap at 5 hops keeps runaway loops bounded.
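
A sketch of the bounded tool-calling loop. The chat callable stands in for the HTTP call to localhost:11434/api/chat and returns a dict shaped like Ollama's chat response (message.content, message.tool_calls[].function.name / arguments). run_turn and its returned event dicts are hypothetical simplifications of the NDJSON stream.

```python
import json

MAX_HOPS = 5


def run_turn(chat, tools, messages):
    """chat(messages) -> {"message": {...}}; tools maps name -> read-only callable."""
    for _ in range(MAX_HOPS):
        reply = chat(messages)["message"]
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return {"type": "done", "content": reply.get("content", "")}
        for call in calls:
            fn = call["function"]
            tool = tools.get(fn["name"])
            if tool is None:
                result = {"error": f"unknown tool {fn['name']}"}
            else:
                result = tool(**(fn.get("arguments") or {}))
            messages.append({"role": "tool", "content": json.dumps(result)})
    # Hop cap reached while the model still wanted tools: stop instead of looping.
    return {"type": "error", "content": f"stopped after {MAX_HOPS} hops"}
```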

Tool catalogue (all read-only — the LLM cannot mutate state):

  • search_past_decisions(query, limit?, project?, since?) — substring search across transcripts
  • find_sessions_in_path(path, since?, limit?) — sessions whose project root is path or an ancestor
  • find_sessions_touching_file(file, mode?, limit?) — sessions where a file shows up in tool args / content
  • get_project_summary(slug?) — flat rollup for one project (sessions / messages / cost / first-last activity)
  • get_cost_summary(period?) — cross-project cost rollup over today / 7days / 30days / month / all
  • get_session_playback(session_id, at?) — files touched by a session up to a cutoff (metadata only)
  • list_recent_sessions(project?, limit?) — most-recently-active sessions

Layout: expanded sidebar on viewports >= 1280px; icon rail at >= 768px; hidden below 768px with a header button to summon a fullscreen overlay. Collapse state persists in localStorage.stackunderflow_metaAgentSidebar.

Privacy: nothing leaves the machine. Route only opens HTTP to localhost:11434 (Ollama); tool executors only read ~/.stackunderflow/store.db. No fallback to a remote LLM. If Ollama is down the first NDJSON event is {"type": "error"} and the sidebar surfaces a banner. Recommended models: qwen2.5-coder:7b (default) and llama3.2:latest. See docs/meta-agent.md.

Added — --ingest / --auto-ingest flags on read-only CLI commands

Eight read-only commands (status, today, month, report, compare, yield, optimize, export) accept two new flags that force a fresh ingest pass before the command's query runs.

  • --ingest — synchronously walks every registered adapter, normalises new events, refreshes marts. Use when stackunderflow start is not running in another terminal and you need authoritative numbers.
  • --auto-ingest (default on) — when the store's newest usage_events.ts is older than 6 hours, the command prints [stale data — ingesting...] to stderr and runs the same pass. --no-auto-ingest disables the staleness path entirely.

Empty stores are not considered stale — fresh installs require explicit --ingest so we don't silently walk every adapter root on the first call.
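
The staleness rule can be sketched as a pure function. The six-hour window and the empty-store exemption come from the text; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=6)


def should_auto_ingest(newest_event_ts, now=None, auto_ingest=True):
    """newest_event_ts: aware datetime of the newest usage_events row, or None if empty."""
    if not auto_ingest:
        return False  # --no-auto-ingest
    if newest_event_ts is None:
        return False  # empty store: fresh installs need an explicit --ingest
    now = now or datetime.now(timezone.utc)
    return now - newest_event_ts > STALE_AFTER
```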

Added — docs/backup.md and docs/chat.md

Long-form docs for two features that shipped in earlier releases without long-form coverage:

  • docs/backup.md — the stackunderflow backup command tree (rsync --link-dest hard-link efficiency, exclude list, restore semantics, macOS launchd auto-backup).
  • docs/chat.md — the Ollama-backed chat sidebar (prerequisites, model dropdown UX, streaming flow, local-only privacy model).

HANDOFF marks spec 06 (backup) as closed.

Notes

  • No schema migration.
  • Tests: 2428 → 2488 fast (+60: 7 cost-fix + 22 meta-agent + 31 --ingest). Frontend 135 → 152 (+17 meta-agent NDJSON parser / viewport-state / tool-summary tests). Ruff baseline preserved (38).
  • Recommended ops: after upgrade, run stackunderflow status to see the corrected cost number against your store. If the difference is non-trivial, your dashboard / --use-embeddings queries were already on the corrected mart path — only the CLI was wrong.

Install

pip install -U stackunderflow

# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'

# Recommended: pull a tools-capable Ollama model for the meta-agent
ollama pull qwen2.5-coder:7b

stackunderflow start

v0.7.4 — CI cleanup + drop beta on yield/qa/tags + API docs for playback /fs

14 May 16:37

Patch release closing the CI fallout from v0.7.3 and dropping the remaining beta flags on the dashboard.

Changed — CI: Windows in build, Ubuntu-only in test

The Windows test job in test.yml was added in v0.7.3 (HANDOFF #4). The first real run surfaced ~40 POSIX-shaped test fixtures (hard-coded /Users/... literals, Path.resolve() drive-prefixing, paths comparing across \ vs /) — the production code paths fixed in this same release (_is_ancestor separator normalisation, set_home_env helper) are correct, but the rest of the test suite needs a Windows-friendly port.

  • .github/workflows/test.yml matrix reverts to ubuntu-latest only (still 3.11 + 3.12).
  • .github/workflows/build.yml keeps the full [ubuntu-latest, macos-latest, windows-latest] × [3.11, 3.12] matrix — it exercises the wheel install + stackunderflow --version / --help / cfg ls smoke test, which catches the real cross-platform import / packaging surface. PowerShell wheel-install fixed via Get-ChildItem so dist/*.whl works without POSIX glob expansion.

Runtime on Windows is unaffected — discovery, hook, lock, and watcher entry points all work. The blocker is test-fixture portability.

Fixed — Windows test suite collection + path-prefix matching

Three issues, all in tests / pure helpers — no schema or production-route change.

  • HOME → USERPROFILE shim. New tests/conftest.py exposes set_home_env(monkeypatch, home) which sets HOME + USERPROFILE + HOMEDRIVE + HOMEPATH together so Path.home() redirects on both POSIX and Windows. Eight call sites switched over from monkeypatch.setenv("HOME", ...) which was inert on Windows.
  • _is_ancestor separator normalisation in stackunderflow/services/discovery.py. The helper compared paths with a hard-coded /. A query path like /Users/yad/dev/foo/src resolves through Path.resolve() on Windows to C:\Users\yad\dev\foo\src, so the prefix check never matched. The helper now normalises \ to / on both sides before comparing. macOS / Linux unchanged.
  • Embedding test collection without numpy. The three embedding test files imported numpy as np at module top, but numpy only ships transitively with the [embeddings] extra, so collection failed in environments without it. Fixed by np = pytest.importorskip("numpy") ahead of the rest of the imports.
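
A sketch of separator-insensitive ancestry checking. It normalises \ to / and then compares on a path-component boundary so /foo is not treated as an ancestor of /foobar. The boundary check and the is_ancestor name are assumptions, and Windows case-insensitivity is deliberately not handled.

```python
def _normalise(path):
    # Treat \ and / as the same separator, and ignore a trailing separator.
    return path.replace("\\", "/").rstrip("/")


def is_ancestor(ancestor, path):
    """True if `path` equals `ancestor` or lives beneath it, whatever the separators."""
    a, p = _normalise(ancestor), _normalise(path)
    # Compare on a component boundary so /foo is not an ancestor of /foobar.
    return p == a or p.startswith(a + "/")
```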

Changed — yield / qa / tags tabs drop the beta: true flag

Three tab entries in stackunderflow-ui/src/pages/ProjectDashboard.tsx lose beta: true. These features have been in stackunderflow since v0.6.x — ~2 weeks of real-store use — and the beta pill no longer signals anything an observer can act on. All three components already have their own EmptyState for the no-data case.

Added — API reference coverage for Playback + Agent-Teams

docs/api-reference.md gains the Playback section: GET /api/playback/{session_id} (event stream) + GET /api/playback/project/{slug} (cross-session timeline) + the v0.7.3 GET /api/playback/{session_id}/fs?at=<iso> (virtual-filesystem reconstruction) with full request/response shape + reconstruction semantics. Plus the three GET /api/agent-teams/... endpoints landed in v0.7.0's v013 round.

Notes

  • No schema change.
  • Tests: 2428 fast (unchanged from v0.7.3 baseline). Ruff: 37. Frontend 135 tests, typecheck + build clean.

Install

pip install -U stackunderflow

# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'

stackunderflow start

v0.7.3 — Playback v2 + opt-in semantic search + Windows CI + drop beta flag

14 May 11:53

Closes the remaining HANDOFF follow-ups left open after v0.7.2. One additive schema migration (v014) ships with the embeddings feature; existing stores apply it on next stackunderflow start.

Added — Playback v2 (HANDOFF #8)

Virtual-filesystem reconstruction at a point in time. The Playback tab now answers "what did every touched file look like at timestamp T?" by replaying the session's tool-call history.

Backend

  • New route GET /api/playback/{session_id}/fs?at=<iso>&paths=<csv>&include_content=true|false. Walks the session's messages in order, replays Read / Write / Edit / MultiEdit / NotebookEdit calls up to at, returns {files: {path: {content?, byte_count, last_modified_ts, operations_applied, reconstruction_complete}}, warnings}.
  • Reconstruction edge cases handled: cat -n line-number stripping in Read results; Edit without prior Read flagged as partial; Edit old_string miss → warning + state preserved; per-sub-edit MultiEdit handling; Write resets content; NotebookEdit accumulates {cell_id: source}; replace_all honoured.
  • Performance: ~150ms median on a real-store session (6900 messages, 455 touched files). include_content=false returns metadata only for "which files changed" queries.
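
A simplified replay loop covering several of the edge cases listed above: cat -n prefix stripping on Read, Write resetting content, Edit / MultiEdit misses kept as warnings with state preserved, replace_all, and Edit-before-Read flagged as partial. The call dict shape is an assumption, NotebookEdit is omitted, and calls are assumed to be pre-filtered to those at or before the at cutoff.

```python
import re

# Assumes Read results arrive in `cat -n` form: padded line number, then a tab.
_CAT_N_PREFIX = re.compile(r"^ *\d+\t", re.MULTILINE)


def replay(calls):
    """calls: ordered tool-call dicts up to the cutoff. Returns (files, partial, warnings)."""
    files, partial, warnings = {}, set(), []
    for call in calls:
        tool, path = call["tool"], call["path"]
        if tool == "Read":
            files[path] = _CAT_N_PREFIX.sub("", call["content"])
            partial.discard(path)
        elif tool == "Write":
            files[path] = call["content"]  # Write resets the file wholesale
            partial.discard(path)
        elif tool in ("Edit", "MultiEdit"):
            if path not in files:
                files[path] = ""
                partial.add(path)  # edited before we ever saw its content
                warnings.append(f"{path}: edit without prior Read/Write")
            edits = call["edits"] if tool == "MultiEdit" else [call]
            for edit in edits:  # MultiEdit applies its sub-edits in order
                text, old = files[path], edit["old_string"]
                if old not in text:
                    warnings.append(f"{path}: old_string not found, state preserved")
                    continue
                count = -1 if edit.get("replace_all") else 1
                files[path] = text.replace(old, edit["new_string"], count)
    return files, partial, warnings
```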

Frontend

  • New side panel PlaybackFsPanel.tsx on the Playback tab. File tree grouped by directory (root first, then alpha-sorted); monospace content viewer with line numbers; warnings banner above the body; warning icon on file rows where reconstruction_complete: false.
  • Scrubber integration — 250ms debounce inside the panel's effect throttles fetches as the user scrubs through events. In-flight request key (sessionId|at) drops stale responses so a fast j/k mash can't overwrite the newest snapshot with an older one.
  • Bandwidth optimisation — scrub fetches use include_content=false; selecting a file triggers a targeted paths=[that_file]&include_content=true fetch.
  • No new npm deps — reuses @tabler/icons-react + Tailwind + stock React.

Added — Discovery semantic search (HANDOFF #10)

Opt-in search-past-decisions --use-embeddings (and the matching MCP arg).

  • Optional extra: pip install stackunderflow[embeddings] pulls in sentence-transformers (+ torch + numpy transitives, ~500MB total). Users who never enable the mode pay nothing. Without the extra, --use-embeddings exits cleanly with the install hint.
  • Default model: sentence-transformers/all-MiniLM-L6-v2 (384 dims, 90MB). Override via STACKUNDERFLOW_EMBED_MODEL env or --embed-model flag.
  • New v014 migration: discovery_embeddings pull-through cache table keyed on (session_id, message_id, model_name). Raw numpy.float32 byte buffers; embedding_dim stored separately for corrupt-blob detection at read time.
  • Cosine re-rank: encode(..., normalize_embeddings=True) makes cosine reduce to a dot product. Mapped [-1, 1] → [0, 1] to plug into the existing pack_within_budget rank fn. When --use-embeddings is on, the LIKE-density relevance term is replaced by the cosine score; recency + cost weights unchanged.
  • Substring still runs first: --use-embeddings only re-ranks the candidate set, never widens it.
  • MCP search_past_decisions gains matching use_embeddings: bool = False + embed_model: str | None = None args.
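
A pure-Python sketch of the re-rank step: normalise, take the dot product as cosine similarity, and map [-1, 1] to [0, 1]. The real path uses sentence-transformers with normalize_embeddings=True and blends the score with recency and cost weights, which this sketch leaves out; the function names are illustrative.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_score(query, doc):
    """Both vectors pre-normalised, so cosine similarity is a plain dot product."""
    dot = sum(q * d for q, d in zip(query, doc))
    return (dot + 1) / 2  # map [-1, 1] -> [0, 1]


def rerank(query_vec, candidates):
    """candidates: (id, vector) pairs from the substring pass; never adds new ids."""
    q = normalize(query_vec)
    scored = [(cid, cosine_score(q, normalize(v))) for cid, v in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```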

Added — Windows runner in the CI matrix (HANDOFF #4)

Both .github/workflows/test.yml and .github/workflows/build.yml gained windows-latest alongside Ubuntu and macOS at Python 3.11 + 3.12. fail-fast: false on each matrix. The msvcrt.locking() branch of stackunderflow/etl/lock.py and the Windows path of the watchfiles-backed watcher execute on every push for the first time.

Cross-platform test-marker housekeeping: 10 tests/stackunderflow/adapters/test_<provider>_defensive.py chmod-000 cases + 1 tests/stackunderflow/cli/test_export.py symlink case skip cleanly on Windows where the underlying OS primitive is a no-op (chmod) or requires elevation (symlink).

Known-gap documented in this changelog: the in-process msvcrt.locking(LK_NBLCK) second-acquire-returns-None contract is what CI actually validates here for the first time.

Changed — Playback + Agents tabs drop the beta flag (HANDOFF #11)

Both playback and agents tab entries in stackunderflow-ui/src/pages/ProjectDashboard.tsx drop beta: true. With v013 applied across the real store and the routes returning populated bodies, the beta pill no longer signalled anything. Empty-state UX is unchanged — each tab carries its own EmptyState component for the no-data case.

Notes

  • Schema migration: v014 (discovery_embeddings). Additive, IF NOT EXISTS-guarded. Applies on next stackunderflow start against existing stores; no data migration required (the table starts empty and pull-through-fills on first --use-embeddings query).
  • Tests: 2355 → 2428 (+73). Ruff baseline preserved (37 errors). Frontend tests 110 → 135 (+25). Frontend typecheck + build clean.

Install

pip install -U stackunderflow

# Optional: enable semantic search
pip install -U 'stackunderflow[embeddings]'

stackunderflow start

v0.7.2 — pricing coverage round 2 + outcome confidence + optimize parity lock-in

14 May 03:46

A focused follow-up release on v0.7.1 that closes most of the post-v0.7.1 HANDOFF backlog: pricing coverage goes to 100% on a real store, outcome-aware discovery gains a confidence ladder, and three more follow-up items get structural lock-in tests.

Fixed — pricing coverage gaps (cost_source='unknown' → 0%)

After applying v007–v013 to the maintainer's real ~/.stackunderflow/store.db and running a force re-derive, 21% of usage_events were stamped cost_source='unknown'. This release closes the gap — the same data on the same store post-release shows 0 unknown events.

  • claude-opus-4-7 — the dominant gap (97% of the residual). Adds the model to the Anthropic pricer with the published rate ($5 input / $25 output / $6.25 5m-cache-write / $0.50 cache-read per MTok). The token-set heuristic checks the 4-7 combination before falling into the legacy Opus 4 ($15/$75) family. 1M context window is at the same per-token rate.
  • Gemini 3 preview ids: gemini-3-pro-preview / gemini-3.1-pro-preview ($2/$12, ≤200K tier), gemini-3-flash-preview ($0.30/$2.50). Forward-looking gemini-3.0-pro / gemini-3.1-pro placeholders now match the preview ids so a preview-to-GA swap is a no-op on cost.
  • composer-1 — corrected from the v0.7.1 Sonnet-tier estimate ($3/$15) to Cursor's published $1.25/$10.
  • droid-auto / cline-auto — both adapter-default placeholders now route to Anthropic's Sonnet 4.5 rate ($3/$15) instead of returning None → $0.
  • glm-5 / glm-5.1 — ZhipuAI's GLM family proxied through a Claude-shape API ($1.00/$3.20 and $1.40/$4.40 respectively).

Latent overcounting bug fixed: v0.7.1's _PROVIDER_TO_PRICER was routing every beta provider that wasn't explicitly listed (qwen, gemini, copilot, codeium, droid, kiro, openclaw, pi, omp, continue, opencode, cursor-agent) to the Anthropic pricer, producing 3–4× over-estimates. Each beta provider now routes to its own pricer.
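
A sketch of explicit provider-to-pricer routing, where an unmapped provider or model yields cost_source='unknown' instead of silently borrowing Anthropic rates. The claude-opus-4-7 and composer-1 figures are the per-MTok rates quoted above; the table layout, provider keys, and cost_usd helper are illustrative.

```python
# Per-MTok USD rates (only models quoted in these notes; layout is illustrative).
RATES = {
    "anthropic": {
        "claude-opus-4-7": {"input": 5.00, "output": 25.00,
                            "cache_write_5m": 6.25, "cache_read": 0.50},
    },
    "cursor": {"composer-1": {"input": 1.25, "output": 10.00}},
}
# Explicit routing: a provider missing here gets no pricer, not Anthropic's.
PROVIDER_TO_PRICER = {"claude": "anthropic", "cursor": "cursor"}


def cost_usd(provider, model, tokens):
    """tokens maps a rate kind (input, output, ...) to a token count."""
    rate = RATES.get(PROVIDER_TO_PRICER.get(provider), {}).get(model)
    if rate is None:
        return None, "unknown"
    total = sum(rate[kind] * count for kind, count in tokens.items()) / 1_000_000
    return total, "rate_card"
```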

Sources cited per rate in code comments. Regression tests lock in cost_source='rate_card' at the normalizer level for every new id.

Changed — outcome-aware discovery confidence ladder (HANDOFF #9)

The transcript-fallback heuristic behind find-sessions-where-action-worked / find-failure-modes-for-file was too permissive — any session that simply ended without an explicit complaint was confidently surfaced as worked. v0.7.2 adds a confidence score to every OutcomeMatch plus a default filter.

  • OutcomeMatch.outcome_confidence: float in [0.0, 1.0]. Additive — the outcome string contract is unchanged.
    • 1.0: captured_events deterministic (future hook integration)
    • 0.8 — explicit in-vocabulary user phrase within the lookahead window
    • 0.5 — agent revert tool call (git revert / git reset --hard / git checkout -- / git restore) on the same file
    • 0.3 — silence-as-worked (the old too-permissive case; still emitted but filtered by default)
    • 0.0 — anchor was already the last recorded turn
  • Default min_confidence=0.5 filter on both service functions. Rows below threshold stay in the database but are filtered at the surface.
  • CLI --min-confidence + --verbose / -v flags; MCP min_confidence arg (None → service default; out-of-range values clamp).
  • Expanded signal vocabulary (47 → 67 phrases): negative additions (broke / broken / failing / mistake / error / regression / / 👎), positive (tests pass(ing) / fixed / solved / correct / +1 / 👍 / 🎉 / / ), revert (try again / try a different). Keyword regex now drops \b boundary on tokens that start/end with non-word characters so emoji and +1 match.
  • 8 new adversarial tests lock the per-row confidence at each ladder step.
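
A sketch of the keyword-regex change and the default filter: \b is only added on a side of the phrase that is a word character, so +1 and emoji can still match. The (session_id, outcome, confidence) tuple shape and the clamping helper are assumptions.

```python
import re


def keyword_pattern(phrases):
    """Compile phrases, adding \\b only on sides that begin/end with a word character."""
    parts = []
    for phrase in phrases:
        body = re.escape(phrase)
        if re.match(r"\w", phrase):
            body = r"\b" + body
        if re.search(r"\w$", phrase):
            body += r"\b"
        parts.append(body)
    return re.compile("|".join(parts), re.IGNORECASE)


def filter_matches(matches, min_confidence=0.5):
    """matches: (session_id, outcome, confidence) tuples; threshold clamped to [0, 1]."""
    threshold = min(max(min_confidence, 0.0), 1.0)
    return [m for m in matches if m[2] >= threshold]
```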

Changed — optimize detectors lock in message_tool_mart migration (HANDOFF #2)

The v011 migration that landed in unreleased commits ahead of v0.7.0 already put the four detectors (junk_reads, bash_output_limits, low_read_edit_ratio, ghost_agents) on message_tool_mart fast paths. v0.7.2 adds parity tests proving the mart and raw-scan paths produce byte-equivalent findings, plus a bench_optimize_mart.py perf benchmark.

On a real-store-shape DB (78,763 mart rows, 247,278 messages across 14 monthly partitions): junk_reads 270×, bash_output_limits 1242×, low_read_edit 490×, ghost_agents 822×. The mart path is also more precise for bash_output_limits — the raw scan sized the whole user-message content as the tool result (over-counted multi-tool turns); the mart's byte_count is the matched-by-tool_use_id size of the actual tool_result block.

Verified — /api/cost-data command_costs stays aggregator-driven (HANDOFF #5)

HANDOFF #5 asked whether command_costs could move to command_mart (mirroring Wave 5's tool_costs migration). The shape mismatch the HANDOFF flagged is structural, not stale: the aggregator emits per-Interaction rows (interaction_id, session_id, prompt_preview, timestamp, tools_used, steps, models_used, had_error, cost, tokens); the command_mart grain (day, project_id, command_name) discards those fields on ingest. Locked the aggregator path in place with three new tests; a future per-Interaction-grain mart could revisit this.

Notes

  • Tests: 2310 → 2355 (+45). Ruff baseline preserved (37 errors). Frontend typecheck + build clean.
  • No schema migration in this release.
  • Maintainer's real ~/.stackunderflow/store.db is at user_version=13 with all 8 marts populated and cost_source=unknown at 0 after a post-release backfill.

Install

pip install -U stackunderflow
stackunderflow start

v0.7.1 — init --install-skills + beta pricing coverage

14 May 02:24

A small follow-up release on top of v0.7.0 that closes two of the open HANDOFF follow-ups.

Added

stackunderflow init --install-skills

Automates the manual cp -r stackunderflow/skills/* ~/.claude/skills/ install path for the three static SKILL.md files (check-prior-work, find-related-sessions, recall-past-decisions).

  • --install-skills — copies the three SKILL.md files into ~/.claude/skills/<name>/SKILL.md. Idempotent: byte-identical destinations are skipped silently; a destination that differs is preserved with a warning unless --skills-force is set.
  • --skills-dest <path> — overrides the destination directory (defaults to ~/.claude/skills/).
  • --skills-force — overwrite a destination SKILL.md that differs from the shipped copy.

Source path resolved via importlib.resources so it works in both source-checkout and installed-wheel layouts.
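
A sketch of the idempotent copy semantics: skip byte-identical files, preserve and warn on a differing file unless forced. It takes an explicit source directory instead of resolving one through importlib.resources, and the install_skills name and return shape are hypothetical.

```python
import shutil
import warnings
from pathlib import Path


def install_skills(src_root: Path, dest_root: Path, names, force=False):
    """Copy <src_root>/<name>/SKILL.md to <dest_root>/<name>/SKILL.md, idempotently."""
    installed, skipped, preserved = [], [], []
    for name in names:
        src = src_root / name / "SKILL.md"
        dest = dest_root / name / "SKILL.md"
        if dest.exists():
            if dest.read_bytes() == src.read_bytes():
                skipped.append(name)  # byte-identical: skip silently
                continue
            if not force:
                warnings.warn(f"{dest} differs from the shipped copy; keeping it")
                preserved.append(name)
                continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        installed.append(name)
    return installed, skipped, preserved
```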

Beta-normalizer pricing coverage

RATE_CARD now carries 17 new entries spanning the qwen, gemini, and claude-3-5-sonnet alias surface so cost_source=unknown drops for the default beta-fixture models. Models added:

  • Qwen (8): qwen-max, qwen-max-longcontext, qwen-plus, qwen-coder, qwen-coder-plus, qwen3-coder, qwen-auto, qwen-turbo
  • Gemini (8): gemini-2.5-pro, gemini-3.0-pro, gemini-3.1-pro, gemini-auto, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-1.5-pro, gemini-1.5-flash
  • Anthropic (1): claude-3-5-sonnet (un-dated alias for the 2024-10-22 release)

All rates are inherited from the per-provider pricer modules — the migration was a coverage gap, not a rate-discovery exercise. gemini-3.0-pro and gemini-3.1-pro are forward-looking placeholders pegged to gemini-2.5-pro until Google publishes definitive rates.

Fixed

A fix for a latent overcounting bug shipped alongside the pricing coverage. _PROVIDER_TO_PRICER in stackunderflow/etl/normalize/base.py was routing every non-listed provider (qwen, gemini, copilot, codeium, droid, kiro, openclaw, pi, omp, continue, opencode, cursor-agent) to the Anthropic pricer. That produced 3-4x-too-high cost numbers for those providers: the Anthropic Sonnet rate ($3/$15 per M) was being applied to qwen requests instead of the actual qwen rate (~$1.20/$3.60 per M). Each beta provider now routes to its own pricer.

Notes

  • No schema migration in this release. All v007–v013 schema work landed in the unreleased commits between v0.7.0 and this release; those migrations stay additive and off-by-default on existing stores.
  • Tests: 2272 → 2310 (+38). Ruff baseline preserved (37 errors). Frontend typecheck + build clean.

Install

pip install -U stackunderflow
stackunderflow init --install-skills
stackunderflow start

v0.7.0 — ETL pipeline + handoff

06 May 21:29

v0.7.0 — proper ETL pipeline.

The dashboard's per-request aggregator passes are gone. Every cost / dashboard / compare / yield / optimize / messages-summary route now reads from indexed materialized marts; a filesystem watcher syncs marts within ~400 ms of any source-file write. End-to-end on a 247K-message store: dashboard cold-load went from 2.5 s → <50 ms warm.

Architecture: three layers, one watcher.

messages → usage_events → 5 marts (daily/session/project/provider_day/model_day)
                            ↑
                       filesystem watcher (200 ms debounce)

See docs/specs/etl-architecture.md for the design contract and docs/HANDOFF.md for the state-of-the-codebase walkthrough.
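
The watcher's 200 ms debounce can be simulated over event timestamps. This sketch assumes trailing-edge semantics (a sync fires once a burst of writes has been quiet for the debounce window); flush_times is a hypothetical helper, not the watcher's API.

```python
DEBOUNCE_MS = 200


def flush_times(event_times_ms, debounce_ms=DEBOUNCE_MS):
    """Return when each batched sync would fire: debounce_ms after a burst's last event."""
    flushes = []
    last = None
    for t in sorted(event_times_ms):
        if last is not None and t - last >= debounce_ms:
            flushes.append(last + debounce_ms)  # previous burst went quiet
        last = t
    if last is not None:
        flushes.append(last + debounce_ms)  # flush the final burst
    return flushes
```

Five writes in two bursts collapse into two mart syncs, which is the point of debouncing a watcher that sees every source-file write.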

Numbers (real maintainer store)

  • Backfilled 150,337 events in 226 s (idempotent re-runs ~29 s)
  • Marts: daily 940, session 841, project 151, provider_day 146, model_day 184
  • Watermarks all aligned to 150,337
  • Dashboard route latencies (warm, dev box): cost-by-provider 1 ms · compare 2 ms · yield 1 ms · projects 6 ms · dashboard-data 7 ms · cost-data 12 ms · optimize 100 ms

What's new

  • 11 PRs across 4 waves shipped: foundation (#72), 4 default normalizers (#73), 5 mart builders (#74), filesystem watcher (#75), hot-path route migration (#76), 12 beta normalizers (#79), analytical-route migration (#81), real backfill + writer hook (#80), /api/etl/status + CLI (#78), UI status badge + backfill button (#77), real-data e2e + perf regression (#82)
  • 16/16 codeburn-catalog providers now have Normalizer subclasses
  • New CLI: stackunderflow etl status, stackunderflow etl backfill [--force], stackunderflow start --no-watcher
  • New API: GET /api/etl/status
  • New UI: status badge in dashboard header, "Backfill now" on /settings

Tests

1598 passing, 2 skipped, 11 deselected (slow suite — pytest -m slow)

Migration

Schema bumps 5 → 6 via v006_etl_layer.sql (additive — existing tables and routes keep working). First-run cost: a one-time backfill (~3 minutes per 100K messages) populates the marts; subsequent runs are watcher-driven and incremental.

Handoff

docs/HANDOFF.md walks an incoming agent through the architecture, recent history, gotchas, real-data state, and what's left.

v0.6.1

02 May 03:36

Polish + correctness pass on v0.6.0. See CHANGELOG.md.

Fixed

  • Cursor sessions: $0 → real cost (composer-* + cursor-auto now estimated at Sonnet rates; vendor-prefixed models delegate to upstream pricer). Cursor total in /api/compare jumped from $0.0000 → $14.4367 on real data.
  • <synthetic> model sentinel no longer leaks into compare/cost reports. v004 migration cleans 221 existing polluted rows.
  • Currency conversion no longer silently degrades to USD-with-foreign-symbol when Frankfurter is unreachable. Embedded rate snapshot + UI warning banner.

Changed

  • Cursor adapter derives per-workspace project_slug from bubble file paths (was collapsing 9 conversations into 1 "cursor" project; now 7 distinct workspaces).
  • v005 migration reparents existing legacy cursor sessions.

Tests

  • +96 defensive empty-source / malformed-data tests covering 10 beta adapters.
  • 1328 passing total (was 1187 at v0.6.0).

v0.6.0

01 May 05:04

v0.6.0 — full multi-provider pipeline + cost correctness + UI surfaces

See CHANGELOG.md for the complete entry list.

Highlights

  • Multi-currency, CSV/JSON export, model aliases (#33–#35)
  • 7 new optimize patterns + plan budgets + compare mode + yield analysis + context-budget (#37, #39–#42)
  • Fast-mode (Claude Opus priority tier) detection through the entire pipeline including SQLite (#44, #48)
  • Streaming JSONL reader (185MB → <1KB peak) + Cursor parse cache (3-8× speedup) (#38, #43)
  • Cursor + Cline adapters default-on; 12 others remain beta-opt-in (#49)
  • Public Python API (list_projects / process / list_sessions) reads the store, provider-tagged (#50)
  • MCP server multi-provider — session_query + new list_sessions + list_projects tools (#51)
  • Cursor v3 vscdb fix — was ingesting 0 sessions on real Cursor data (#52)
  • Dashboard surfaces for compare, yield, plan, optimize, context-budget (#47)
  • Codeburn attribution scrub from shipped code (#36)
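
A sketch of the streaming-reader idea: iterating a file object yields one line at a time, so peak memory tracks the longest line rather than the file size. Skipping undecodable lines is an assumption about how corrupt records are handled, and iter_jsonl is an illustrative name.

```python
import json


def iter_jsonl(stream):
    """Yield one parsed record per line without loading the whole file into memory."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Skip a truncated or corrupt line instead of aborting the whole session.
            continue
```

In practice the stream would be an open file, e.g. `with open(path, encoding="utf-8") as f: for record in iter_jsonl(f): ...`.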

Pipeline verified end-to-end

After this release a clean ingest produces:

  • 188 projects, 228K messages, 1106 sessions, 7 providers (claude, codex, cursor, cline, gemini, droid, qwen) in the local store
  • Public Python API + MCP both surface the same data

Tests

1187 passing, 2 skipped (was 882 at v0.5.0 — +305 new tests)