
Spec 25: fork mode — replay session up to message K, swap model, regenerate #98

@0bserver07

Goal

Fork mode builds the "what-if" tree: replay a session up to message K, swap the model (or system prompt, or tools), regenerate from there, and persist the fork as a child session linked to the parent. This lets you ask "would Sonnet have done this in fewer turns?" empirically.

Why now

Comparative benchmark (Spec 26) needs this primitive. Separately, this is the killer "blackbox" feature: replay plus fork is what makes a flight recorder useful.

Schema

v021 session_forks:

CREATE TABLE session_forks (
  fork_session_id TEXT PRIMARY KEY,    -- the new session id
  parent_session_id TEXT NOT NULL,
  forked_at_message_seq INTEGER NOT NULL,
  forked_at_ts TEXT NOT NULL,
  fork_reason TEXT NOT NULL,            -- 'manual' | 'benchmark' | 'what-if'
  swap_model TEXT,                      -- new model id (NULL = unchanged)
  swap_system_prompt TEXT,              -- new system prompt (NULL = unchanged)
  swap_tools_json TEXT,                 -- new tool list (NULL = unchanged)
  status TEXT NOT NULL,                 -- 'running' | 'complete' | 'failed' | 'cancelled'
  created_ts TEXT NOT NULL,
  completed_ts TEXT,
  raw_outcome_json TEXT
);
CREATE INDEX idx_sf_parent ON session_forks(parent_session_id);

Also add is_fork BOOLEAN and parent_session_id TEXT columns to sessions (additive; no existing columns change).
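
A minimal sketch of applying the v021 migration, assuming the store is SQLite (the TEXT timestamps and the conn parameter suggest it, but the spec does not say so). The stand-in sessions table with a session_id column and the names apply_v021 and column_names are hypothetical; the real migration runner may look different.

```python
import sqlite3

# DDL copied from the schema above, plus the two additive columns on sessions.
V021_DDL = """
CREATE TABLE session_forks (
  fork_session_id TEXT PRIMARY KEY,
  parent_session_id TEXT NOT NULL,
  forked_at_message_seq INTEGER NOT NULL,
  forked_at_ts TEXT NOT NULL,
  fork_reason TEXT NOT NULL,
  swap_model TEXT,
  swap_system_prompt TEXT,
  swap_tools_json TEXT,
  status TEXT NOT NULL,
  created_ts TEXT NOT NULL,
  completed_ts TEXT,
  raw_outcome_json TEXT
);
CREATE INDEX idx_sf_parent ON session_forks(parent_session_id);
ALTER TABLE sessions ADD COLUMN is_fork BOOLEAN;
ALTER TABLE sessions ADD COLUMN parent_session_id TEXT;
"""


def apply_v021(conn: sqlite3.Connection) -> None:
    """Create session_forks and add the additive columns to sessions."""
    conn.executescript(V021_DDL)
    conn.commit()


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Hypothetical pre-existing sessions table, only so the sketch runs standalone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")
apply_v021(conn)
```

Existing sessions rows get NULL in both new columns after this migration, so readers of is_fork would probably want to treat NULL as "not a fork".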

User-visible surface

  • CLI: stackunderflow fork session <id> --at-message <seq> --swap-model <new> [--swap-system-prompt FILE] [--reason what-if].
  • CLI: stackunderflow fork list <parent_session_id>.
  • API: POST /api/playback/{id}/fork with body {at_message_seq, swap_model?, swap_system_prompt?, reason} returns the new fork session id; GET /api/playback/{id}/forks lists children.
  • Meta-agent tool: fork_session(id, at_message_seq, swap_model).
  • UI: extend Playback tab with a "Fork from here" button on each event.
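
The two CLI commands above could be wired roughly as follows. This is a sketch that assumes argparse; the spec does not say which CLI framework stackunderflow uses. The default of manual for --reason and the destination name at_message_seq are assumptions. --swap-model is treated as required only because it is unbracketed in the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="stackunderflow")
    commands = parser.add_subparsers(dest="command", required=True)

    fork = commands.add_parser("fork")
    fork_commands = fork.add_subparsers(dest="fork_command", required=True)

    # stackunderflow fork session <id> --at-message <seq> --swap-model <new> ...
    session = fork_commands.add_parser("session")
    session.add_argument("session_id")
    session.add_argument("--at-message", dest="at_message_seq", type=int, required=True)
    session.add_argument("--swap-model", required=True)
    session.add_argument("--swap-system-prompt", metavar="FILE", default=None)
    # Choices mirror the fork_reason values in the schema; the default is an assumption.
    session.add_argument("--reason", choices=["manual", "benchmark", "what-if"],
                         default="manual")

    # stackunderflow fork list <parent_session_id>
    listing = fork_commands.add_parser("list")
    listing.add_argument("parent_session_id")
    return parser


args = build_parser().parse_args(
    ["fork", "session", "abc123", "--at-message", "2", "--swap-model", "other-model"]
)
```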

Implementation plan

  1. v021 migration.
  2. New service stackunderflow/services/session_fork.py:
    • fork(conn, parent_id, at_seq, *, swap_model, swap_system_prompt, swap_tools, reason): uses Spec 24 (reconstruct_context_at) to rebuild the model context, calls Ollama (or queues the request for a cloud API, depending on swap_model), and persists the new assistant turns into a new session.
    • Background-task runner (similar to the backfill_jobs.py pattern): a fork can take minutes, so it must not block the API.
  3. CLI + API + meta-agent.
  4. UI button.
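
The service in step 2 might look like the sketch below. It is deliberately simplified: it runs synchronously and generates a single turn, whereas the plan calls for a background runner and a full LLM loop. Several things are assumptions and not part of the spec: the messages table layout, the generate callable (injected here in place of the Ollama call so the sketch runs without a model), and the plain SELECT that stands in for Spec 24's reconstruct_context_at.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Callable, Optional

SCHEMA = """
CREATE TABLE sessions (session_id TEXT PRIMARY KEY, is_fork BOOLEAN, parent_session_id TEXT);
-- Hypothetical message store; the real table layout is not given in the spec.
CREATE TABLE messages (session_id TEXT, seq INTEGER, role TEXT, content TEXT);
CREATE TABLE session_forks (
  fork_session_id TEXT PRIMARY KEY, parent_session_id TEXT NOT NULL,
  forked_at_message_seq INTEGER NOT NULL, forked_at_ts TEXT NOT NULL,
  fork_reason TEXT NOT NULL, swap_model TEXT, swap_system_prompt TEXT,
  swap_tools_json TEXT, status TEXT NOT NULL, created_ts TEXT NOT NULL,
  completed_ts TEXT, raw_outcome_json TEXT);
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def _finish(conn: sqlite3.Connection, fork_id: str, status: str) -> None:
    conn.execute(
        "UPDATE session_forks SET status = ?, completed_ts = ? WHERE fork_session_id = ?",
        (status, _now(), fork_id))
    conn.commit()


def fork(conn: sqlite3.Connection, parent_id: str, at_seq: int, *,
         swap_model: Optional[str] = None,
         swap_system_prompt: Optional[str] = None,
         swap_tools: Optional[list] = None,
         reason: str = "manual",
         generate: Callable[..., str]) -> str:
    """Create a child session, copy messages up to at_seq, generate one new turn."""
    fork_id = uuid.uuid4().hex
    now = _now()
    conn.execute(
        "INSERT INTO sessions (session_id, is_fork, parent_session_id) VALUES (?, 1, ?)",
        (fork_id, parent_id))
    conn.execute(
        "INSERT INTO session_forks (fork_session_id, parent_session_id,"
        " forked_at_message_seq, forked_at_ts, fork_reason, swap_model,"
        " swap_system_prompt, swap_tools_json, status, created_ts)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, 'running', ?)",
        (fork_id, parent_id, at_seq, now, reason, swap_model, swap_system_prompt,
         json.dumps(swap_tools) if swap_tools is not None else None, now))
    # Copy the parent's messages up to and including at_seq verbatim.
    conn.execute(
        "INSERT INTO messages (session_id, seq, role, content)"
        " SELECT ?, seq, role, content FROM messages"
        " WHERE session_id = ? AND seq <= ?",
        (fork_id, parent_id, at_seq))
    conn.commit()
    # Stand-in for reconstruct_context_at (Spec 24): just the copied messages.
    context = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY seq",
        (fork_id,)).fetchall()
    try:
        reply = generate(context, swap_model)
    except Exception:
        _finish(conn, fork_id, "failed")
        raise
    conn.execute(
        "INSERT INTO messages (session_id, seq, role, content) VALUES (?, ?, 'assistant', ?)",
        (fork_id, at_seq + 1, reply))
    _finish(conn, fork_id, "complete")
    return fork_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sessions (session_id) VALUES ('parent')")
conn.executemany(
    "INSERT INTO messages VALUES ('parent', ?, ?, ?)",
    [(1, "assistant", "How can I help?"),
     (2, "user", "Summarize the log"),
     (3, "assistant", "original answer")])
child = fork(conn, "parent", 2, swap_model="other-model",
             generate=lambda context, model: f"reply from {model}")
child_rows = conn.execute(
    "SELECT seq, role, content FROM messages WHERE session_id = ? ORDER BY seq",
    (child,)).fetchall()
fork_status = conn.execute(
    "SELECT status FROM session_forks WHERE fork_session_id = ?", (child,)).fetchone()[0]
```

Injecting generate also lines up with the mocked-LLM test: the test can pass a recording stub and assert on the context it received.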

Tests

  • Simple fork: 3-message parent, fork at msg 2, swap model → assert child session exists with parent link, msg 1 + msg 2 copied verbatim, msg 3+ are new.
  • Mocked LLM: assert the LLM was called with the reconstructed context (per Spec 24).
  • Cancellation: fork running, user cancels → status='cancelled', no zombie data (no half-written fork messages, no row stuck in 'running').
  • Idempotency: forking the same parent at the same seq with the same swap_model returns the existing fork instead of creating a duplicate.
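
The idempotency case could be backed by a lookup like the one sketched here, again assuming SQLite. The helper name find_existing_fork is hypothetical, and skipping failed and cancelled forks (so a retry is still possible) is a design assumption the spec does not state. The table below is trimmed to the columns the query touches.

```python
import sqlite3
from typing import Optional


def find_existing_fork(conn: sqlite3.Connection, parent_id: str, at_seq: int,
                       swap_model: Optional[str]) -> Optional[str]:
    """Return the id of a live fork with the same key, or None."""
    row = conn.execute(
        "SELECT fork_session_id FROM session_forks"
        " WHERE parent_session_id = ? AND forked_at_message_seq = ?"
        # IS (not =) so that a NULL swap_model ("unchanged") matches NULL.
        " AND swap_model IS ?"
        " AND status NOT IN ('failed', 'cancelled')"
        " ORDER BY created_ts LIMIT 1",
        (parent_id, at_seq, swap_model)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_forks (fork_session_id TEXT PRIMARY KEY,"
    " parent_session_id TEXT NOT NULL, forked_at_message_seq INTEGER NOT NULL,"
    " swap_model TEXT, status TEXT NOT NULL, created_ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO session_forks VALUES (?, ?, ?, ?, ?, ?)",
    [("f1", "p1", 2, "model-a", "complete", "2024-01-01T00:00:00"),
     ("f2", "p1", 2, "model-b", "failed", "2024-01-01T00:01:00"),
     ("f3", "p1", 2, None, "running", "2024-01-01T00:02:00")])
```

Note that the key in the test covers only parent, seq, and swap_model; two forks that differ only in system prompt or tools would collide under this lookup.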

Hard parts

  • LLM call orchestration. Forks against Ollama are simple (use the meta-agent route's machinery). Forks against cloud APIs need credentials and a cost cap (refuse cloud by default; require explicit --allow-cloud plus a budget cap).
  • Tool execution during fork. If the original session called Bash, what does the fork do? v1 answer: don't actually execute tools — record the model's tool-call intent only. v2 could execute in a sandboxed dir. Document this constraint loudly.
  • Context cost. Reconstructing 200K tokens of context per fork costs real money. Surface estimated cost before forking; refuse if over a threshold.
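
The cloud opt-in and cost threshold described above could be enforced by a pre-flight check along these lines. All names here (ForkRefused, estimate_cost_usd, check_fork_allowed) are hypothetical, the price is a placeholder and not a real rate, and the estimate covers input context only. Local forks are treated as having no API charge.

```python
from typing import Optional


class ForkRefused(Exception):
    """Raised when a fork must not start (cloud not allowed, or over budget)."""


def estimate_cost_usd(context_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Estimated input cost of sending the reconstructed context once."""
    return context_tokens / 1_000_000 * usd_per_million_input_tokens


def check_fork_allowed(*, is_cloud: bool, allow_cloud: bool, context_tokens: int,
                       usd_per_million_input_tokens: float,
                       budget_cap_usd: Optional[float]) -> float:
    """Return the estimated cost if the fork may proceed, else raise ForkRefused."""
    if not is_cloud:
        return 0.0  # local Ollama: no API charge
    if not allow_cloud:
        raise ForkRefused("cloud forks require explicit --allow-cloud")
    if budget_cap_usd is None:
        raise ForkRefused("cloud forks require a budget cap")
    cost = estimate_cost_usd(context_tokens, usd_per_million_input_tokens)
    if cost > budget_cap_usd:
        raise ForkRefused(f"estimated ${cost:.2f} exceeds cap ${budget_cap_usd:.2f}")
    return cost


def refused(**kwargs) -> bool:
    """Small helper for the examples: True if the check raises ForkRefused."""
    try:
        check_fork_allowed(**kwargs)
    except ForkRefused:
        return True
    return False


# 200K tokens at a placeholder 3.00 USD per million input tokens.
example_cost = check_fork_allowed(
    is_cloud=True, allow_cloud=True, context_tokens=200_000,
    usd_per_million_input_tokens=3.0, budget_cap_usd=1.0)
```

Because a fork regenerates multiple turns, the real cost is likely a multiple of this single-call estimate; the surfaced number is better read as a lower bound.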

Out of scope

  • Sandboxed tool execution during fork (defer).
  • Cloud-API forks by default (opt-in only, with budget cap).
  • "Fork tree" visualization (defer to v2).

Dependencies

  • Blocked by: Spec 24 (context-window replay).
  • Consumed by Spec 26 (comparative benchmark).

Estimated effort

Size XL — single agent, ~3-4 hr. Background-job orchestration + LLM-loop machinery is the bulk.

Hard rules

  • DO NOT touch versions / CHANGELOG headings.
  • Pre-assigned schema slot: v021.
  • Branch: feat/session-fork-mode off main.
  • Default to LOCAL Ollama only; cloud requires explicit opt-in flag.

Metadata

Labels

  • size-xl: ~3-4 hr agent run
  • spec: Spec/feature for an agent to implement
  • wave-5: Wave 5: fork + comparative benchmark
