Goal
For any moment in a session, reconstruct what was actually in the model's context window: system prompt, tool definitions, prior messages, and the file contents the model was looking at. This is the "black-box flight recorder" angle: see where the model lost the thread.
Why now
Playback v2 (v0.7.3) reconstructs the filesystem. The next layer up is the agent's mental state. Together they let you answer "the model lost the original requirement at turn 7 because it never saw it after the file got truncated."
Schema
None. Pure read-side over messages.raw_json + a context-window estimator.
User-visible surface
- API: GET /api/playback/{session_id}/context?at=<iso>&model=<id> returns:
    {
      "session_id": "...", "snapshot_ts": "...",
      "model_id": "claude-opus-4-7",
      "system_prompt": "...", "system_prompt_tokens": 1234,
      "tools_definition_tokens": 567,
      "prior_messages": [
        {"role": "user", "ts": "...", "content_excerpt": "...", "tokens_estimate": 234}
      ],
      "context_used_tokens": 123456,
      "context_max_tokens": 200000,
      "context_used_pct": 61.7,
      "warnings": ["truncation_likely_at_msg_42"]
    }
- Meta-agent tool: get_context_at(session_id, at). Feed this to the meta-agent so it can answer "did Claude see the original spec when it made the change at minute 30?".
- UI: extend PlaybackTab with a "Context" panel (toggle alongside the FS panel) showing tokens-used / tokens-max bar + collapsible message list at that point.
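The summary fields in the response can be derived from the per-part estimates. The sketch below assumes context_used_tokens is simply the sum of the system prompt, tool definitions, and prior-message estimates, and that context_used_pct is rounded to one decimal; the spec does not pin either down, and summarize_usage is a hypothetical helper name.

```python
def summarize_usage(system_prompt_tokens, tools_definition_tokens,
                    prior_messages, context_max_tokens):
    """Derive the context_used_* fields from per-part token estimates.

    Assumption: usage is the plain sum of system prompt, tool definitions,
    and every prior message's tokens_estimate.
    """
    used = (
        system_prompt_tokens
        + tools_definition_tokens
        + sum(m["tokens_estimate"] for m in prior_messages)
    )
    # Guard against a missing or zero limit instead of dividing by zero.
    pct = round(100.0 * used / context_max_tokens, 1) if context_max_tokens else 0.0
    return {
        "context_used_tokens": used,
        "context_max_tokens": context_max_tokens,
        "context_used_pct": pct,
    }


example = summarize_usage(
    1234, 567,
    [{"role": "user", "tokens_estimate": 234}],
    200000,
)
```

The UI bar (tokens-used / tokens-max) could read these three fields directly without recomputing anything client-side.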
Implementation plan
- New service stackunderflow/services/playback_context.py: reconstruct_context_at(conn, session_id, *, at, model_id) -> dict.
- Walks messages for the session in seq order up to at.
- Estimates tokens per message (cl100k_base or Anthropic's tokenizer if available; fall back to chars/4).
- Looks up the model's context-max from infra/providers/<provider>.py config.
- Detects a "truncation event": once cumulative tokens exceed the model's context limit, the earlier history was almost certainly compacted. After the breach, marks the message immediately before the next user turn.
- New route in routes/playback.py.
- Frontend Context panel.
- Meta-agent tool entry.
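The walk and truncation-detection steps above might look roughly like the following. This is a sketch under several assumptions: messages arrive as an in-memory list of dicts rather than through conn, each has seq, role, ts (ISO 8601), and content keys, the number in the warning string is the message's seq, and only the first breach is flagged. The names walk_messages and _chars_over_four are hypothetical.

```python
from datetime import datetime


def _chars_over_four(text: str) -> int:
    # Fallback heuristic from the plan: roughly four characters per token.
    return max(1, len(text) // 4) if text else 0


def walk_messages(messages, *, at, context_max_tokens, estimate=_chars_over_four):
    """Accumulate token estimates up to `at` and flag a likely compaction."""
    cutoff = datetime.fromisoformat(at)
    prior, warnings = [], []
    running = 0
    breached = flagged = False
    prev_seq = None
    for msg in sorted(messages, key=lambda m: m["seq"]):
        if datetime.fromisoformat(msg["ts"]) > cutoff:
            continue  # messages after the cutoff are never included
        # First user turn after the breach: mark the message just before it.
        if breached and not flagged and msg["role"] == "user":
            warnings.append(f"truncation_likely_at_msg_{prev_seq}")
            flagged = True
        tokens = estimate(msg["content"])
        running += tokens
        breached = breached or running > context_max_tokens
        prior.append({
            "role": msg["role"],
            "ts": msg["ts"],
            "content_excerpt": msg["content"][:200],
            "tokens_estimate": tokens,
        })
        prev_seq = msg["seq"]
    return {
        "prior_messages": prior,
        "context_used_tokens": running,
        "warnings": warnings,
    }
```

The sketch does not model what the running total should become after a compaction; that is left open here, as it is in the plan. Note also that datetime.fromisoformat only accepts a trailing "Z" on Python 3.11 and later, so the route may need to normalise the at parameter first.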
Tests
- 5-message synthetic session → assert per-message token estimates are sane (not zero, not 100x off).
- A session with a known compaction (look for "context-truncated" or model-side compaction signals in raw_json) → assert the warning fires.
- 404 on unknown session.
- The at cutoff is honoured (messages after at are not included).
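One way to make "sane (not zero, not 100x off)" concrete is to compare each estimate against the chars/4 baseline. The sketch below assumes that baseline is the reference point, which the spec does not state; is_sane_estimate and make_synthetic_session are hypothetical helper names, and the message contents and timestamps are made-up fixture data.

```python
def is_sane_estimate(text: str, estimate: int, factor: float = 100.0) -> bool:
    """True when the estimate is non-zero and within `factor` of chars/4."""
    baseline = max(1, len(text) // 4)
    return estimate > 0 and baseline / factor <= estimate <= baseline * factor


def make_synthetic_session() -> list:
    """Build a 5-message session with alternating user/assistant roles."""
    contents = [
        "Please add a retry wrapper around the upload call.",
        "I will read uploader.py first and then propose a change.",
        "Keep the public signature unchanged.",
        "Done. The wrapper retries three times with backoff.",
        "Thanks, now add a unit test for the failure path.",
    ]
    return [
        {
            "seq": i + 1,
            "role": "user" if i % 2 == 0 else "assistant",
            "ts": f"2025-01-01T00:0{i}:00",
            "content": text,
        }
        for i, text in enumerate(contents)
    ]


def check_session_estimates(estimator) -> bool:
    """Run any estimator over the synthetic session and check every message."""
    return all(
        is_sane_estimate(m["content"], estimator(m["content"]))
        for m in make_synthetic_session()
    )
```

A pytest case could then assert check_session_estimates(...) with whichever estimator the service actually uses.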
Hard parts
- Token estimation is the load-bearing piece. Use tiktoken with cl100k_base for OpenAI-family, fall back to chars/4 for everyone else (acceptable v1; refine later). Add as an optional dep [tokenizer].
- Model context limits vary (e.g., Opus 4.7 1M-context vs 200K). Look up via the existing infra/providers/<provider>.py:rates_for(model) family; extend it to return context_max if not already there.
- "What the model saw" includes tool definitions. Estimate ~500-1500 tokens per tool block (depends on schema verbosity). Pull from the meta-agent tool catalogue + the user's installed MCP tool list (reuse the context_budget route).
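The estimator with its optional dependency could be sketched as below. It uses tiktoken only when the package is importable and the model family is OpenAI, and otherwise falls back to chars/4. The family parameter, the function names, and the choice to estimate a tool block from its serialised JSON are assumptions for illustration, not part of the existing codebase.

```python
import json
import math

try:
    import tiktoken  # optional dependency, installed via the [tokenizer] extra
except ImportError:
    tiktoken = None


def estimate_tokens(text: str, *, family: str = "other") -> int:
    """Estimate the token count for `text` (assumed signature)."""
    if not text:
        return 0
    if family == "openai" and tiktoken is not None:
        try:
            enc = tiktoken.get_encoding("cl100k_base")
            # disallowed_special=() treats special-token strings as plain text.
            return len(enc.encode(text, disallowed_special=()))
        except Exception:
            pass  # encoding data unavailable; use the heuristic below
    # v1 fallback for every other family: about four characters per token.
    return max(1, math.ceil(len(text) / 4))


def estimate_tools_tokens(tool_definitions, *, family: str = "other") -> int:
    """Sum per-tool estimates over each definition's serialised JSON."""
    return sum(
        estimate_tokens(json.dumps(tool, sort_keys=True), family=family)
        for tool in tool_definitions
    )
```

Because the fallback path needs nothing beyond the standard library, installs without the [tokenizer] extra still get an estimate, just a coarser one.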
Out of scope
- Exact tokenizer parity with closed-source models (Anthropic's tokenizer isn't public).
- Reconstructing the file contents the model saw inline (Playback v2's /fs already covers this; the Context panel just links).
Dependencies
- Builds on Playback v2 (shipped).
- Consumed by Spec 25 (fork mode — needs to recreate the context to re-prompt).
Estimated effort
Size L — single agent, ~2 hr.
Hard rules
- DO NOT touch versions / CHANGELOG headings.
- No schema migration.
- New optional dep [tokenizer] (with tiktoken) is allowed.
- Branch: feat/context-replay off main.