Spec 24: context-window replay — reconstruct what the model 'saw' at time T

## Goal
For any moment in a session, reconstruct what was actually in the model's context window: the system prompt, tool definitions, prior messages, and the file contents the model was looking at. This is the "black-box flight recorder" view: it shows where the model lost the thread.

## Why now
Playback v2 (v0.7.3) reconstructs the filesystem. The next layer up is the agent's mental state. Together they support diagnoses such as "the model lost the original requirement at turn 7 because it never saw it after the file got truncated."

## Schema
**None.** Pure read-side over `messages.raw_json` + a context-window estimator.

## User-visible surface
- **API**: `GET /api/playback/{session_id}/context?at=<iso>&model=<id>` returns:
  ```json
  {
    "session_id": "...", "snapshot_ts": "...",
    "model_id": "claude-opus-4-7",
    "system_prompt": "...", "system_prompt_tokens": 1234,
    "tools_definition_tokens": 567,
    "prior_messages": [
      {"role": "user", "ts": "...", "content_excerpt": "...", "tokens_estimate": 234}
    ],
    "context_used_tokens": 123456,
    "context_max_tokens": 200000,
    "context_used_pct": 61.7,
    "warnings": ["truncation_likely_at_msg_42"]
  }
  ```
- **Meta-agent tool**: `get_context_at(session_id, at)` — feed the meta-agent so it can answer "did Claude see the original spec when it made the change at minute 30?".
- **UI**: extend PlaybackTab with a "Context" panel (toggle alongside the FS panel) showing tokens-used / tokens-max bar + collapsible message list at that point.
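
A minimal client-side sketch of the endpoint above, assuming a hypothetical base URL and two helper names (`build_context_url`, `context_used_pct`) that are not part of the spec. The main point is that the `at` timestamp must be percent-encoded: an ISO offset such as `+00:00` contains a `+`, which a query-string parser would otherwise read as a space.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def build_context_url(base_url: str, session_id: str, at: datetime, model: str) -> str:
    """Build the GET URL for the context endpoint (helper name is hypothetical)."""
    # urlencode percent-encodes ':' and '+' in the ISO timestamp.
    query = urlencode({"at": at.isoformat(), "model": model})
    path = f"/api/playback/{quote(session_id, safe='')}/context"
    return f"{base_url.rstrip('/')}{path}?{query}"


def context_used_pct(used_tokens: int, max_tokens: int) -> float:
    """Share of the window in use, rounded to one decimal as in the sample payload."""
    return round(100.0 * used_tokens / max_tokens, 1)


if __name__ == "__main__":
    # The base URL is an assumption for illustration only.
    at = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(build_context_url("http://localhost:8000", "abc123", at, "claude-opus-4-7"))
    print(context_used_pct(123456, 200000))
```

With the sample payload's numbers (123456 used of 200000), the helper reproduces the `61.7` shown in `context_used_pct`.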

## Implementation plan
1. New service `stackunderflow/services/playback_context.py`:
   - `reconstruct_context_at(conn, session_id, *, at, model_id) -> dict`.
   - Walks `messages` for the session in `seq` order up to `at`.
   - Estimates tokens per message (`tiktoken`'s `cl100k_base`, or Anthropic's tokenizer if available; fall back to chars/4).
   - Looks up the model's context-max from `infra/providers/<provider>.py` config.
   - Detects a "truncation event": once cumulative tokens exceed the model's context max, the earlier history was almost certainly compacted. Marks the message immediately before the next user turn after the breach.
2. New route in `routes/playback.py`.
3. Frontend Context panel.
4. Meta-agent tool entry.
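
Step 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the final service:

- The `messages` table is assumed to have `session_id`, `seq`, `ts`, `role`, and `raw_json` columns, with `raw_json` decoding to an object that has a `content` field.
- `ts` and `at` are assumed to be ISO strings in one consistent format, so that string comparison orders them correctly.
- `context_max` is passed in directly instead of being looked up from the provider config.
- The running total is a raw cumulative sum. The sketch does not model what a compaction kept.

```python
import json
import sqlite3


def estimate_tokens(text: str) -> int:
    """chars/4 fallback estimate, rounded up so non-empty text never scores 0."""
    return -(-len(text) // 4)


def reconstruct_context_at(conn, session_id, *, at, model_id, context_max):
    """Walk messages in seq order up to `at` and flag likely compaction points."""
    rows = conn.execute(
        "SELECT seq, ts, role, raw_json FROM messages "
        "WHERE session_id = ? AND ts <= ? ORDER BY seq",
        (session_id, at),
    ).fetchall()

    prior_messages, warnings = [], []
    used = 0
    breach_pending = False  # True between a breach and the next user turn
    prev_seq = None

    for seq, ts, role, raw_json in rows:
        # A user turn after a breach: flag the message just before it.
        if breach_pending and role == "user" and prev_seq is not None:
            warnings.append(f"truncation_likely_at_msg_{prev_seq}")
            breach_pending = False

        # Assumed shape: raw_json is an object with a "content" field.
        content = str(json.loads(raw_json).get("content", ""))
        tokens = estimate_tokens(content)
        crossed = used <= context_max < used + tokens
        used += tokens
        if crossed:
            breach_pending = True

        prior_messages.append({
            "role": role,
            "ts": ts,
            "content_excerpt": content[:200],
            "tokens_estimate": tokens,
        })
        prev_seq = seq

    return {
        "session_id": session_id,
        "snapshot_ts": at,
        "model_id": model_id,
        "prior_messages": prior_messages,
        "context_used_tokens": used,
        "context_max_tokens": context_max,
        "context_used_pct": round(100.0 * used / context_max, 1),
        "warnings": warnings,
    }
```

The sketch leaves out system-prompt and tool-definition tokens, and it does not distinguish an unknown session from one with no messages before `at`; the route would need both.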

## Tests
- 5-message synthetic session → assert per-message token estimates are sane (not zero, not 100x off).
- A session with a known compaction (look for "context-truncated" or model-side compaction signals in `raw_json`) → assert the warning fires.
- 404 on unknown session.
- `at` cutoff is honoured (messages after `at` not included).
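
One way to make the "sane estimate" check in the first test concrete is to compare each estimate against the chars/4 baseline. The helper name `is_sane_estimate` and the choice of chars/4 as the reference are assumptions for illustration:

```python
def is_sane_estimate(text: str, tokens: int, *, factor: float = 100.0) -> bool:
    """True when `tokens` is non-zero and within `factor`x of the chars/4 baseline."""
    if not text:
        # Empty content should estimate to exactly zero tokens.
        return tokens == 0
    baseline = max(1.0, len(text) / 4)
    return tokens > 0 and baseline / factor <= tokens <= baseline * factor
```

A test could then assert `is_sane_estimate(content, msg["tokens_estimate"])` for each of the five synthetic messages.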

## Hard parts
- Token estimation is the load-bearing piece. Use `tiktoken` with `cl100k_base` for OpenAI-family models and fall back to chars/4 for everyone else (acceptable for v1; refine later). Add `tiktoken` as an optional dep under the `[tokenizer]` extra.
- Model context limits vary (e.g., Opus 4.7 1M-context vs 200K). Look up via the existing `infra/providers/<provider>.py:rates_for(model)` family — extend to return `context_max` if not already there.
- "What the model saw" includes tool definitions. Estimate ~500-1500 tokens per tool block (depends on schema verbosity). Pull from the meta-agent tool catalogue + the user's installed MCP tool list (see `context_budget` route — reuse).
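
The estimator described in the bullets above might look roughly like this. It is a sketch under assumptions: the `openai_family` flag, both function names, and the idea of sizing a tool block by serialising its schema to compact JSON are illustrative choices, not decisions made by this spec. The `tiktoken` import is optional and the code falls back to chars/4 when it is missing or the encoding cannot be loaded.

```python
import functools
import json
import math

try:  # optional dependency, installed via the `[tokenizer]` extra
    import tiktoken
except ImportError:
    tiktoken = None


@functools.lru_cache(maxsize=1)
def _cl100k():
    """Return the cl100k_base encoder, or None when it is unavailable."""
    if tiktoken is None:
        return None
    try:
        return tiktoken.get_encoding("cl100k_base")
    except Exception:
        # Loading can fail (e.g. the encoding file cannot be fetched); use the fallback.
        return None


def estimate_tokens(text: str, *, openai_family: bool = False) -> int:
    """Estimate tokens: cl100k_base for OpenAI-family models, chars/4 otherwise."""
    if not text:
        return 0
    enc = _cl100k() if openai_family else None
    if enc is not None:
        # disallowed_special=() treats strings like "<|endoftext|>" as plain text
        # instead of raising.
        return len(enc.encode(text, disallowed_special=()))
    return math.ceil(len(text) / 4)


def estimate_tool_tokens(tools: list[dict]) -> int:
    """Rough size of the tool definitions, from their compact JSON serialisation."""
    return sum(
        estimate_tokens(json.dumps(tool, separators=(",", ":"))) for tool in tools
    )
```

Rounding up in the fallback keeps short non-empty messages from estimating to zero, which matters for the "not zero" assertion in the tests.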

## Out of scope
- Exact tokenizer parity with closed-source models (Anthropic's tokenizer isn't public).
- Reconstructing the file contents the model saw inline (Playback v2's `/fs` already covers this; the Context panel just links).

## Dependencies
- Builds on Playback v2 (shipped).
- Consumed by Spec 25 (fork mode — needs to recreate the context to re-prompt).

## Estimated effort
**Size L** — single agent, ~2 hr.

## Hard rules
- DO NOT touch versions / CHANGELOG headings.
- No schema migration.
- New optional dep `[tokenizer]` (with `tiktoken`) is allowed.
- Branch: `feat/context-replay` off main.
