Goal
Replace synthetic-but-spec-accurate beta-normalizer fixtures with real session data captured from each beta provider. The current fixtures catch structural failure modes; they don't catch the next "Cursor v3 conversationId-in-the-key" until real local data hits the normalizer.
Why now
HANDOFF #6 has been open since v0.7.0. The defensive empty/malformed coverage from v0.6.1 and the spec-shape coverage from Wave 5 do not establish normalizer correctness on real data, because the codeburn catalog spec doesn't capture every edge case in the wild.
Schema
None. Test-fixture additions only.
User-visible surface
None. CI-only correctness improvement.
Implementation plan
- Set up minimal accounts / installs for each beta provider where reasonable (the maintainer or a contributor needs to do this on their machine):
  - Qwen, Gemini, Copilot, Codeium, Continue, Droid, Kiro, OpenClaw, Pi/OMP, OpenCode, Cursor Agent, KiloCode, Roo Code (13 in total).
- Run a representative session in each (5-10 turns, mix of prompts + tool calls).
- Copy the resulting source files (each adapter's enumerate() knows where) into tests/fixtures/beta_normalizers/<provider>/real_session_<id>.<ext>.
- Strip PII / secrets: write a redaction helper that zeroes API keys, file paths in /Users/<name>/, etc.
- Add real-shape parity tests: run each provider's normalizer on each real fixture and assert that cost_source != "unknown" (or document which models legitimately are unknown), model is non-empty, and tokens_in / tokens_out are non-zero.
- Update docs/beta-normalizer-drift.md with new findings.
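A minimal sketch of what the redaction helper could look like, assuming fixtures are text files (JSON / JSONL) and that secrets follow a few common shapes. The function name redact and every pattern below are illustrative assumptions, not the project's actual helper; real providers may use key formats these patterns miss, which is why the manual review step still applies.

```python
import re

# Hypothetical patterns: each pair is (compiled regex, replacement).
# They cover a few common shapes only and are NOT an exhaustive list.
_PATTERNS = [
    # "sk-..." style API keys (assumed format).
    (re.compile(r"sk-[A-Za-z0-9_-]{16,}"), "sk-REDACTED"),
    # Bearer tokens in captured headers.
    (re.compile(r"(bearer\s+)[A-Za-z0-9._~+/-]{16,}=*", re.IGNORECASE), r"\1REDACTED"),
    # Home-directory user names on macOS and Linux.
    (re.compile(r"/Users/[^/\s\"']+"), "/Users/REDACTED"),
    (re.compile(r"/home/[^/\s\"']+"), "/home/REDACTED"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

The replacements are chosen so that running redact twice gives the same result as running it once, which makes it safe to re-run over fixtures that were already scrubbed.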
Tests
- Per-provider parity: run normalizer on each real fixture, assert key invariants.
- Redaction helper: smoke-test that no secrets / personal paths leak into the committed fixture.
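The per-provider invariants can be sketched as a small checker that a parity test calls on each normalized record. This assumes the normalizer output can be viewed as a mapping with the keys cost_source, model, tokens_in and tokens_out; the function name invariant_violations and the unknown_cost_ok allowlist are hypothetical names introduced for illustration.

```python
from typing import Iterable, Mapping


def invariant_violations(record: Mapping, unknown_cost_ok: Iterable[str] = ()) -> list:
    """Return a list of human-readable invariant failures for one record.

    unknown_cost_ok is an assumed allowlist of model names for which
    cost_source == "unknown" has been documented as legitimate.
    """
    problems = []
    model = record.get("model") or ""
    if not model:
        problems.append("model is empty")
    # A missing cost_source is treated the same as "unknown".
    if record.get("cost_source", "unknown") == "unknown" and model not in set(unknown_cost_ok):
        problems.append("cost_source is unknown")
    for field in ("tokens_in", "tokens_out"):
        if not record.get(field):
            problems.append(f"{field} is zero or missing")
    return problems
```

A parity test would then assert that invariant_violations(record) is an empty list for every record produced from a real fixture, so a failure message names every broken invariant at once.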
Hard parts
- Most beta providers don't have real local sessions on the maintainer's machine. This is a logistics problem, not a code problem. The agent can write the parity test infrastructure; the user (or a contributor) has to capture the real fixtures.
- PII redaction is high-stakes. Get it wrong, you commit secrets. Use multiple redaction passes; require a manual review step before any commit.
- Some providers may have changed their format since the synthetic fixtures were written — those are exactly the regressions this finds.
Out of scope
- Real-session capture from providers the maintainer doesn't have access to (contributors can fill in).
- Stress / load testing — different effort.
Dependencies
Estimated effort
Size M — agent does ~1-2 hr of test-infra + redaction-helper work; the real fixture capture is asynchronous and depends on the maintainer's machine state.
Hard rules
- DO NOT touch versions / CHANGELOG headings.
- ANY committed fixture must pass through the redaction helper FIRST.
- Branch: feat/real-beta-fixtures off main.
- Document explicitly which providers have real fixtures vs synthetic-only.
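One way to keep the real-versus-synthetic record honest is to derive it from the fixture tree instead of maintaining it by hand. The sketch below assumes the tests/fixtures/beta_normalizers/<provider>/real_session_<id>.<ext> layout from the implementation plan; the function name fixture_coverage and the provider directory names passed to it are hypothetical.

```python
from pathlib import Path


def fixture_coverage(root: Path, providers: list) -> dict:
    """Map each provider directory name to "real" or "synthetic-only".

    A provider counts as "real" when its directory holds at least one
    file whose name starts with "real_session_".
    """
    coverage = {}
    for provider in providers:
        provider_dir = root / provider
        has_real = provider_dir.is_dir() and any(provider_dir.glob("real_session_*"))
        coverage[provider] = "real" if has_real else "synthetic-only"
    return coverage
```

The resulting mapping could be printed in CI or pasted into docs/beta-normalizer-drift.md when the findings are updated.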