Roadmap: world-class coding-agent analytics — 14 specs in 6 waves #103

@0bserver07

What this is

A 6-wave roadmap to take StackUnderflow from "great local cost dashboard with self-referential discovery" to world-class coding-agent analytics plus a blackbox. 14 spec issues, each scoped for a single agent run, with pre-assigned schema slots so specs don't collide on version numbers.

The thesis

Today the project is a passive viewer + a Q&A meta-agent. The leap is:

  1. Outcome attribution that holds up (PR / CI / static analysis / LLM grading)
  2. Comparative benchmark + mode recommender (the killer feature: empirical "model X wins on YOUR work at $Y per outcome")
  3. Replay + fork (the blackbox: rebuild what the model saw, branch from any point)
  4. Active brain (proactive nudges, multi-device, open exchange)
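As a rough illustration of the "$Y per outcome" figure in item 2, here is a minimal sketch that divides each model's total spend by its number of successful outcomes. The `Session` record, its field names, and the sample numbers are all hypothetical; the real metric would depend on how outcome attribution ends up being defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    model: str       # hypothetical model label
    cost_usd: float  # total spend for the session
    succeeded: bool  # whether the attributed outcome counted as a success


def cost_per_outcome(sessions):
    """Return {model: total cost / successful outcomes}, or None with no successes."""
    totals = {}
    for s in sessions:
        cost, wins = totals.get(s.model, (0.0, 0))
        totals[s.model] = (cost + s.cost_usd, wins + int(s.succeeded))
    return {
        model: (cost / wins if wins else None)
        for model, (cost, wins) in totals.items()
    }


# Made-up sample data, for illustration only.
sessions = [
    Session("model-a", 1.0, True),
    Session("model-a", 3.0, False),
    Session("model-b", 2.0, True),
    Session("model-b", 2.0, True),
]
result = cost_per_outcome(sessions)
print(result)
```

Failed sessions still count toward cost, so a cheap model that rarely succeeds can come out more expensive per outcome than a pricier one.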

Wave structure

Each wave can run 2-4 agents in parallel on independent specs. Waves themselves run sequentially so the dependency graph holds.

Wave 1 — Independent foundations (4 agents)

Closes the lowest-hanging fruit; nothing depends on anything else.

Wall-clock estimate: ~1.5h.

Wave 2 — Outcome-attribution rails (3 agents)

Brings external signals into the store. Foundation for waves 3 + 5.

Wall-clock estimate: ~2.5h.

Wave 3 — Outcome attribution + grading (2 agents, mostly sequential)

Combines wave 2's data into trustworthy attribution + adds the LLM grading dimension.

Wall-clock estimate: ~3-4h.

Wave 4 — Replay + active surfacing (3 agents)

The blackbox + the proactive nudge.

Wall-clock estimate: ~2.5h.

Wave 5 — Fork + comparative benchmark (2 agents, sequential)

The two big swings. Each is XL.

Wall-clock estimate: ~6-8h. Spec 26 needs a maintainer-written scoring rubric before dispatch.

Wave 6 — Sensitive / long-tail (sequential, with design pauses)

Wall-clock estimate: open-ended, depends on design pace.
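Because waves run one after another, the per-wave estimates above simply add up. A small sketch of that arithmetic, using only the figures listed for waves 1-5 (wave 6 is open-ended and left out):

```python
# (wave, low_hours, high_hours), copied from the per-wave estimates above.
# Wave 6 is open-ended, so it is omitted.
WAVE_ESTIMATES = [
    (1, 1.5, 1.5),
    (2, 2.5, 2.5),
    (3, 3.0, 4.0),
    (4, 2.5, 2.5),
    (5, 6.0, 8.0),
]


def total_wall_clock(estimates):
    """Sum the low and high bounds across sequential waves."""
    low = sum(lo for _, lo, _ in estimates)
    high = sum(hi for _, _, hi in estimates)
    return low, high


low, high = total_wall_clock(WAVE_ESTIMATES)
print(f"Waves 1-5: ~{low}-{high}h of agent wall-clock time")
```

This counts agent run time only; maintainer review and design calls between waves are not included.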

Schema-version pre-assignment

| Spec | Slot | Tables / columns added |
|------|------|------------------------|
| 16 | v015 (only if needed) | maybe session_mart.outcome ALTER |
| 18 | v016 | mode_recommendations |
| 20 | v017 | pr_outcomes, ci_runs |
| 21 | v018 | static_analysis_findings |
| 22 | v019 | commit_session_link + pr_outcomes extensions |
| 23 | v020 | session_quality_grades |
| 25 | v021 | session_forks + sessions.is_fork / parent_session_id |
| 26 | v022 | benchmark_runs, benchmark_outcomes |
| 28 | v023 | sync_state |
28 v023 sync_state

Specs 12, 13, 17, 19, 24, 27, 29, 30 introduce no schema. Total new schema versions: 9 (v015 through v023).
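The pre-assignment can be checked mechanically. The sketch below encodes the spec-to-slot mapping from the table and verifies that no two specs share a slot, that the slots are contiguous, and that no spec appears in both lists. The function name `validate_slots` is hypothetical and not part of the project.

```python
# Spec number -> schema version, copied from the table above.
SCHEMA_SLOTS = {
    16: 15, 18: 16, 20: 17, 21: 18, 22: 19,
    23: 20, 25: 21, 26: 22, 28: 23,
}
# Specs listed as introducing no schema.
NO_SCHEMA_SPECS = {12, 13, 17, 19, 24, 27, 29, 30}


def validate_slots(slots, no_schema):
    """Return the sorted slot labels, raising ValueError on any inconsistency."""
    versions = sorted(slots.values())
    if len(set(versions)) != len(versions):
        raise ValueError("two specs share a schema slot")
    if versions != list(range(versions[0], versions[-1] + 1)):
        raise ValueError("schema slots are not contiguous")
    overlap = set(slots) & set(no_schema)
    if overlap:
        raise ValueError(f"specs listed in both groups: {sorted(overlap)}")
    return [f"v{v:03d}" for v in versions]


labels = validate_slots(SCHEMA_SLOTS, NO_SCHEMA_SPECS)
print(len(labels), labels[0], labels[-1])
```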

Hard rules every implementing agent must follow

These are duplicated in each spec issue body for safety:

  • DO NOT touch versions (__version__.py, pyproject.toml, package.json, package-lock.json)
  • DO NOT move CHANGELOG ## [N.N.N] headings — entries go under ## [Unreleased] only
  • DO NOT touch ~/.stackunderflow/store.db — tests use tmp_path / :memory:
  • Use the pre-assigned schema slot in the spec body — do not invent another version number
  • Branch off main, named feat/<spec-slug> or fix/<spec-slug>
  • DO NOT open a PR — maintainer handles the merge
  • Preserve ruff baseline (38) — no new lint errors
  • All tests green before push: pytest tests/ -q, cd stackunderflow-ui && npm run typecheck && npm run build && node --test tests/services/*.test.ts
  • See docs/HANDOFF.md for the standing rules
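To illustrate the store.db rule, here is a minimal sketch of a test helper that opens a throwaway database and refuses to open the real user store. It assumes the store is a SQLite file, which the `.db` name and the `:memory:` hint suggest but this issue does not state. `open_test_store` and the `demo` table are hypothetical names, not project APIs.

```python
import sqlite3
from pathlib import Path

# The real user store that tests must never touch.
REAL_STORE = Path.home() / ".stackunderflow" / "store.db"


def open_test_store(target=":memory:"):
    """Open a throwaway SQLite database; refuse to open the real user store."""
    if target != ":memory:":
        resolved = Path(target).expanduser().resolve()
        if resolved == REAL_STORE.resolve():
            raise ValueError("tests must not open ~/.stackunderflow/store.db")
    return sqlite3.connect(str(target))


def test_store_roundtrip(tmp_path):
    """pytest-style test: tmp_path is pytest's per-test temporary directory."""
    conn = open_test_store(tmp_path / "store.db")
    try:
        conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
        conn.execute("INSERT INTO demo (note) VALUES (?)", ("hello",))
        rows = conn.execute("SELECT note FROM demo").fetchall()
        assert rows == [("hello",)]
    finally:
        conn.close()
```

The guard raises before any connection is made, so a misconfigured test fails loudly instead of writing to the real store.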

What to expect

Each wave ships a release (v0.9.0 → v0.14.0 roughly), bundling its specs into one PyPI publish. Total: ~6 releases over 4-7 calendar days at typical agent-orchestration pace, with maintainer review + design calls between waves.

When to stop / redirect

This is a roadmap, not a contract. After each wave, reassess:

  • Did the wave's specs deliver real user value, or just code?
  • Did downstream waves' assumptions hold?
  • Is something more important emerging that should reprioritize?
