

ProcessFork

git for AI agents. Snapshot, fork, and merge live LLM sessions in 8 ms.

snapshot a 4-hour Claude Code session in 8 ms, fork into 12 attempts, merge the winner back, push to a registry

60-second demo: pf snapshot → pf fork ×12 → pf merge → pf push file:// → pf clone on a fresh store
Replay it locally: asciinema play demo/processfork-demo.cast

crates.io · PyPI · npm · release · MIT | CI · 200 tests · 8 ms snapshot · 88.96% line coverage · Rust + Py + TS


Why

You're 4 hours into a refactor with Claude Code. The agent has read 200 files, run 47 tests, opened a database, started a dev server. Then it suggests a destructive change.

Today: lose everything, undo by hand, or restart. With ProcessFork: pf snapshot → 8 ms → safe. Try 12 alternatives in parallel, merge the winner back, ship the whole session to a teammate.

It's git — snapshot, branch, merge, push, clone — but for live AI agent state.

Highlights

  • 8 ms snapshots. Full agent state — model + KV-cache + files + tools + reasoning — into one content-addressed .pfimg.
  • 🌳 Real fork & merge. 12 parallel attempts share storage automatically (CoW). Merge the winner with a real 3-way diff (files, tools, trace) — git-style <<<<<<< markers and all.
  • 🔒 Won't double-send your email. HMAC-chained tool-call ledger; restored agents see prior side-effects as facts, not as actions to re-issue (ACRFence guards against semantic rollback).
  • 🤝 Drop-in for Claude Code, LangGraph, OpenInterpreter, vLLM, SGLang, AutoGen, CrewAI.
  • 📦 Single binary, MIT, Rust core, Python + TypeScript SDKs. 200+ tests.
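The content addressing behind those .pfimg snapshots can be sketched in a few lines. This is a conceptual illustration using only the standard library, not ProcessFork's actual on-disk format:

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive a CID from the bytes themselves (cf. `sha256:1c2497b0...`)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

v1 = content_id(b"fn main() {}")
v2 = content_id(b'fn main() { println!("hi") }')

# Identical content always hashes to the same CID, so a layer that
# appears in many snapshots is stored exactly once.
assert content_id(b"fn main() {}") == v1
assert v1 != v2
```

Because the address is derived from the content, dedup across snapshots and forks falls out for free.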

Quick start (60 seconds)

# install the CLI:
cargo install processfork                      # → `pf` on your $PATH

# snapshot a directory:
mkdir /tmp/sandbox && echo "fn main() {}" > /tmp/sandbox/main.rs
pf snapshot --agent-id demo --fs-root /tmp/sandbox
# → sha256:1c2497b0…   ⏱ 8 ms

# edit something, snapshot again, see the diff:
echo "fn main() { println!(\"hi\") }" > /tmp/sandbox/main.rs
pf snapshot --agent-id demo --fs-root /tmp/sandbox --name v2
pf log
pf diff <first-cid> <second-cid>

Prefer Python? pip install processfork. TypeScript? npm install @processfork/sdk.

The full 60-second demo (snapshot → fork ×12 → merge → push → clone on a fresh store) is bash demo/script.sh. Runs end-to-end on a laptop. No GPU, no API keys.

When you'd reach for it

  • Agent about to do something destructive: pf snapshot pre-rm-rf
  • Stuck and want to try 12 approaches in parallel: pf fork -n 12 --explore "fix bug"
  • Hand a complex session to a teammate: pf push hf://you/session-name
  • Time-travel debugging ("when did it go wrong?"): pf log, then pf checkout <CID>
  • RL rollout fabric for agent training: snapshot, fan out, score, merge

Use it with your stack

  • Claude Code (✅ ships v1.0): /snapshot, /fork, /merge slash-commands inside any session
  • LangGraph (✅ ships v1.0): drop-in BaseCheckpointSaver over the FS + env + trace + effects layers
  • OpenInterpreter (✅ ships v1.0): interpreter.snapshot("pre-rm-rf"), then .checkout("pre-rm-rf")
  • AutoGen (✅ ships v1.0): atomic FS + env + trace + effects snapshot across an agent group
  • CrewAI (✅ ships v1.0): CrewMemory drop-in; every step is time-travelable
  • vLLM (🟡 mock ships v1.0; live = Modal lane): mock persists K/V page bytes + manifest and restores them via the SDK; live (Modal A10G) is V0-engine bit-exact and V1-engine output-equivalent (see "What does and doesn't ship in v1.0.x" below)
  • SGLang (🟡 mock ships v1.0; live = Modal lane): mock round-trips RadixCache k_buffer/v_buffer pages; live is scaffolded — the Modal lane reaches the parity stub, but full radix-tree replay lands in v1.1

How it works

ProcessFork captures the five things that together make up a live agent — atomically — into one content-addressed file. Each layer ships at a different maturity level in v1.0.x:

  • World: filesystem (full), env (default-redacted), browser DOM (CDP). In-flight subprocesses: --criu-pid <PID> for full state on Linux + CRIU (v1.0.12); --respawn-pid <PID> for portable argv/cwd/env/fd-paths on macOS + Linux + Windows (v1.0.14). Status: ✅ FS + env ship; ✅ procs ship portably (--respawn-pid); 🟡 register state via --criu-pid (Linux only).
  • Effects: append-only ledger of tool calls, HMAC-chained per entry (ACRFence). Status: ✅ ships (CLI + Python SDK + TS SDK + 5 adapters).
  • Trace: chat + tool-call message log. Status: ✅ ships.
  • Model: LoRA / IA³ / full-finetune weight diffs, in-place TTT updates. The format and the TIES + DARE merge math ship and are exercised on the Modal A10G lane; the generic CLI snapshot path produces an empty LoRA envelope because the layer is populated by adapters (vLLM/SGLang/etc.), not by walking a directory. Status: 🟡 format ships; the CLI path is a placeholder; adapter-populated.
  • Cache: paged KV-cache, content-addressed per page (CoW across forks). Same shape: the format + page math ship; the generic CLI snapshot produces an empty page manifest; the vLLM/SGLang adapters populate it for real. Status: 🟡 format ships; the CLI path is a placeholder; adapter-populated.

Identical content shares storage automatically — 12 parallel forks use ~1.004× the space of one in the operator's matrix run, well under the < 1.5× budget. The merge engine handles each layer with the right algorithm: git-style 3-way diff for files (conflict markers materialize; resolution UI is v1.1), TIES + DARE for model weights, the HMAC effects chain that defends against semantic-rollback attacks (ACRFence), and an LLM-summarized "what branch B learned" patch injected into branch A's reasoning trace without re-prefilling the cache.
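The file-layer merge is an ordinary 3-way merge. A minimal line-wise sketch of how the conflict markers materialize (naive, and assuming equal line counts; the real engine aligns lines with a proper diff first):

```python
def merge3(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Naive line-wise 3-way merge emitting git-style conflict markers."""
    out = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            out.append(o)            # both sides agree
        elif o == b:
            out.append(t)            # only theirs changed this line
        elif t == b:
            out.append(o)            # only ours changed this line
        else:                        # both changed it, differently: conflict
            out += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return out

merged = merge3(
    base=["a", "b", "c"],
    ours=["a", "B", "c"],
    theirs=["a", "b!", "c"],
)
assert "<<<<<<< ours" in merged      # line 2 conflicts, so markers materialize
```

Non-conflicting edits flow through silently; only the doubly-edited line produces a marker block for a human (or pf merge-resolve) to settle.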

What does and doesn't ship in v1.0.x

Production-credible today (independent retest, 12/12 matrix passing):

  • pf snapshot / pf checkout for filesystem sandboxes, with default secret-shaped env redaction.
  • HMAC-chained effects ledger end-to-end (CLI + Python + TS), tamper detected by pf verify.
  • Fork & merge: 12 forks at ~1.004× storage; clean and conflicting merges produce content-addressed merged CIDs with Git-style markers in conflict files.
  • file:// (and OCI / S3 / HF) registry transport.
  • 5 adapters (Claude Code, LangGraph, OpenInterpreter, AutoGen, CrewAI) over the FS + env + trace + effects layers.
  • vLLM/SGLang mock mode: K/V page bytes + manifest persist into the store and read back on checkout.
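The effects-ledger guarantee reduces to a classic HMAC chain: each entry's MAC covers the entry plus the previous MAC, so editing history anywhere breaks verification from that point on. A stdlib sketch (illustrative; not the pf-effects wire format):

```python
import hashlib
import hmac
import json

SECRET = b"session-secret"   # held outside the snapshot, per the ACRFence model

def append(ledger: list, entry: dict) -> None:
    """MAC covers this entry plus the previous MAC, chaining the history."""
    prev = ledger[-1]["mac"] if ledger else ""
    msg = (prev + json.dumps(entry, sort_keys=True)).encode()
    ledger.append({"entry": entry,
                   "mac": hmac.new(SECRET, msg, hashlib.sha256).hexdigest()})

def verify(ledger: list, secret: bytes) -> bool:
    prev = ""
    for rec in ledger:
        msg = (prev + json.dumps(rec["entry"], sort_keys=True)).encode()
        want = hmac.new(secret, msg, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(rec["mac"], want):
            return False
        prev = rec["mac"]
    return True

ledger: list = []
append(ledger, {"tool": "send_email", "to": "alice@example.com"})
append(ledger, {"tool": "rm", "path": "/tmp/x"})
assert verify(ledger, SECRET)

ledger[0]["entry"]["to"] = "mallory@example.com"   # tamper with history
assert not verify(ledger, SECRET)                  # chain breaks at entry 0
```

The same structure explains why a wrong secret must fail loudly rather than skip: verification with any secret other than the signing one produces an HMAC mismatch on the very first entry.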

Tractable today (v1.0.12 closed three of these from v1.0.11):

  • Live in-flight subprocess capture ✅ via processfork-criu (Linux + criu CLI required). pf snapshot --criu-pid <PID> writes a real procs.criu.v1 bundle (CRIU images tarball + JSON header). On macOS / Windows / non-criu Linux, the command exits with a clear "CRIU unavailable" message — no silent half-state. Validation runs on the operator's Linux box; the format + gating + non-Linux skip path is unit-tested everywhere. See adapters/pf-criu/README.md for the full caveat list (this lane has the same shape as the Modal vLLM lane: code ships, validation lives on a host the upstream CI doesn't have).
  • Conflict-merge resolution UI ✅ via pf merge-resolve / pf merge-finalize. When pf merge produces conflicts, drop the merged FS into a workdir, hand-edit the marker files, then finalize into a single-parent image. The finalize step refuses to ship as long as files still contain <<<<<<< (override with --force). Round-trip regression-tested on macOS.
  • Loud warning on empty engine layers ✅. The generic CLI snapshot now emits a multi-line stderr warning that model + cache are placeholders and only the FS+env+trace+effects layers were captured. Suppress with --allow-empty-engine-layers once your CI has internalized the boundary.
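The finalize gate is simple to model: refuse while any file still carries an unresolved marker. A sketch of the documented rule (a hypothetical helper, not the pf merge-finalize source):

```python
def blocks_finalize(file_text: str) -> bool:
    """True while a file still contains an unresolved git-style conflict,
    mirroring the documented rule that finalize refuses on `<<<<<<<`."""
    return any(line.startswith("<<<<<<<") for line in file_text.splitlines())

conflicted = "<<<<<<< ours\nprint('A')\n=======\nprint('B')\n>>>>>>> theirs\n"
assert blocks_finalize(conflicted)          # still needs hand-editing
assert not blocks_finalize("print('A')\n")  # resolved: safe to finalize
```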

Not yet production-ready, though the format and code paths exist:

  • Local PF_HAS_GPU=1 vLLM/SGLang test (examples/06, examples/07, pf-cache/tests/cache_bit_exact_vllm.rs). These exit 2 with a "use the adapter packages directly + Modal lane" pointer — they were operator-runs-it skeletons that never got a self-contained subprocess flow. The Modal A10G lane (scripts/gpu-validate-modal.py) does run vLLM end-to-end and emits the JSONs in benchmarks/gpu-validation/.
  • Bit-exact KV-cache restore on vLLM V1 engine. The Modal lane shows V0 engine bit_exact: true for 38 619 KV pages but V1 engine = output-equivalent (first-80-chars match), not bit-exact, on TinyLlama-1.1B. V1's collective_rpc worker scheduling has internal non-determinism that ProcessFork cannot eliminate from the outside. Workaround for bit-exact replay today: pin to vLLM V0 engine + pass enforce_eager=True to disable CUDA graphs, e.g. LLM(..., enforce_eager=True) or vllm serve ... --enforce-eager. The V1 path stays output-equivalent until upstream lands deterministic batch scheduling — see adapters/pf-vllm/README.md for the full workaround note.
  • Generic CLI model/cache layer capture. The generic pf snapshot produces empty model + cache envelopes — these layers are populated through adapters (vLLM/SGLang/etc.), not by walking a directory. If you want the model+cache layers populated, use the vLLM or SGLang adapter from inside your engine process. The empty path now warns loudly by default.
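The bit-exact versus output-equivalent distinction above is worth making precise. One way to classify a restore, with the 80-character prefix check mirroring how the Modal lane reports the V1 result (illustrative only, not the validation harness):

```python
def classify_restore(out_a: str, out_b: str) -> str:
    """Compare pre-snapshot output with post-restore output."""
    if out_a == out_b:
        return "bit_exact"            # the V0-engine result
    if out_a[:80] == out_b[:80]:
        return "output_equivalent"    # the V1-engine result: prefix matches
    return "mismatch"

assert classify_restore("same text", "same text") == "bit_exact"
assert classify_restore("x" * 80 + "tail-a", "x" * 80 + "tail-b") == "output_equivalent"
```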

Architecture deep-dive · Three-way merge protocol · Engineering specs

Status

v1.0.15 tagged. Closes the one production caveat from the v1.0.14 retest: pf verify now accepts the operator's session secret out-of-band. New pf verify --session-secret-hex <HEX> flag (also reads PF_SESSION_SECRET env var). Operator-supplied secret WINS over any embedded one — true ACRFence requires the secret to live outside the blob, so trusting only the embedded copy means a tampering attacker who can rewrite the blob can also re-sign it. New --fail-on-unverifiable-ledgers opt-in promotes "skipped" to a hard failure for CI. New verify-line telemetry shows how many chains were verified via the operator secret. Wrong-secret case now correctly fails with HMAC mismatch (was: silently skipped). All earlier audit-round fixes (v1.0.7 chain wiring, v1.0.9 Python SDK chain, v1.0.10 TS SDK chain, v1.0.13 ignore + GC + Python parents, v1.0.14 portable respawn-pid + symlink restore + runnable adapter examples) still stand. cargo deny check: still advisories ok, bans ok, licenses ok, sources ok.

v1.0.14 status (kept for reference)

v1.0.14 closed the three "left as-is" limitations from v1.0.13. (1) examples/06+07 are no longer exit-2 stubs — they run a byte-identical mock-mode K/V page round-trip end-to-end through the adapter on every host with processfork-vllm / processfork-sglang installed (no GPU required); the Modal lane stays the bit-exact validation. (2) New pf snapshot --respawn-pid <PID> portable subprocess capture works on macOS + Linux + Windows — emits a procs.respawn.v1 blob with argv/cwd/env/fd-paths, complementary to the Linux-only --criu-pid. The two flags are mutually exclusive (different fidelity tiers; pick the one matching your needs). (3) pf checkout no longer hard-errors on absolute-target symlinks — by default they're skipped with a stderr warning while the rest of the tree restores normally (matches tar/rsync); pass --allow-absolute-symlinks to restore them verbatim. The v1.0.3 "Zip Slip" CVE protection (PF-SA-2026-001) is unchanged: relative-target escapes and absolute paths in the FS tree itself are still hard-refused. cargo deny check: still advisories ok, bans ok, licenses ok, sources ok. v1.0.13 fixes (false-conflict ignores, GC marker pruning, Python SDK parents=) all stand.
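A respawn blob of that shape is easy to picture: metadata sufficient to re-launch a process, not its live memory. A hypothetical sketch for the current process (the field names are illustrative, not the actual procs.respawn.v1 schema):

```python
import os
import sys

def respawn_blob() -> dict:
    """Capture just enough to re-spawn this process: argv, cwd, env.
    (fd-paths are omitted here; reading them portably needs per-OS code.)"""
    return {
        "format": "procs.respawn.v1",   # illustrative tag, not the real schema
        "argv": list(sys.argv),
        "cwd": os.getcwd(),
        "env": dict(os.environ),
    }

blob = respawn_blob()
assert blob["cwd"] == os.getcwd()
assert isinstance(blob["argv"], list)
```

Re-spawning from such a blob restarts the program with the same arguments and environment, which is exactly the lower-fidelity tier the flag advertises compared to CRIU's full register-state capture.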

v1.0.13 status (kept for reference)

v1.0.13 closed the two confirmed bugs and the one Python SDK lineage gap the v1.0.12 retest flagged (matrix had been 10 PASS / 2 ISSUE / 3 LIMITATION). (1) False merge conflicts from generated test artifacts are gone — WalkFsCapture now ships a default-extra ignore set covering __pycache__, .pytest_cache, .mypy_cache, .ruff_cache, .tox, .coverage, .venv, .DS_Store, *.pyc, *.pyo; new --ignore <PAT> (with glob support via globset), --ignore-from <PATH> (default: <fs_root>/.pfignore.gitignore), and --no-default-ignores flags; full gitignore-style file parsing (comments, blank lines, trailing slash; negation logged-and-skipped). (2) pf gc --retain-recent N no longer leaves dangling images/<cid>.json markers — GC now prunes them alongside the layer blobs and reports the count, closing the referential-integrity bug where pf log listed CIDs whose pf checkout would fail. (3) Python SDK now exposes parents=[...] lineage on snapshot_filesystem so SDK-only divergent forks can 3-way merge without routing through the CLI; bad CIDs surface as ValueError. cargo deny check: still advisories ok, bans ok, licenses ok, sources ok. The v1.0.12 limitations the auditor noted as scope/environment (live PF_HAS_GPU=1 vLLM, CRIU Linux-only, absolute-symlink restore safety) stand as-is — first two are infrastructure, last one is the v1.0.3 "Zip Slip" hardening that we're not unwinding.
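The default-ignore behavior is straightforward to model with glob matching. A simplified sketch over a subset of the documented patterns (the real WalkFsCapture uses the globset crate and full gitignore-style parsing):

```python
from fnmatch import fnmatch

# Subset of the documented default ignore set.
DEFAULT_IGNORES = ("__pycache__", ".pytest_cache", ".venv", ".DS_Store",
                   "*.pyc", "*.pyo")

def is_ignored(name: str, extra: tuple = ()) -> bool:
    """Would this path component be skipped by the capture walk?"""
    return any(fnmatch(name, pat) for pat in DEFAULT_IGNORES + extra)

assert is_ignored("__pycache__")
assert is_ignored("module.pyc")
assert not is_ignored("main.rs")
assert is_ignored("dist", extra=("dist",))   # like passing --ignore dist
```

Filtering generated artifacts out of the capture is what makes the false merge conflicts disappear: two forks that each regenerated __pycache__ no longer diverge on files nobody authored.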

  • Snapshot p50, synthetic 4-layer fixture (macOS arm64): 7.9 ms observed (target: < 500 ms p99)
  • Snapshot p50, real GPU host (Modal A10G, 64 × 4 KiB): 42 ms warm (target: < 500 ms p99)
  • KV-cache restore, vLLM V0 engine + TinyLlama-1.1B on A10G: bit_exact: true across 38 619 KV pages, regenerated text byte-identical (JSON) (target: out_a == out_b byte-equal)
  • KV-cache restore, vLLM V1 engine (collective_rpc): output-equivalent, not bit-exact; first-80-chars match across snapshot/restore on 38 599 KV pages (JSON), and the bit_exact: false field is the source of truth (target: out_a == out_b byte-equal; unmet on V1)
  • Cache capture, 64 pages: 531 µs
  • 12-fork ÷ 1-fork storage ratio (auditor's matrix): 1.004× (target: ≤ 1.5×)
  • Total Rust tests passing: 217 (incl. the v1.0.15 verify-with-operator-secret regression)
  • Python SDK + Claude + CRIU + vLLM + SGLang adapter tests: 42 passed, 4 skipped (CRIU Linux + live-GPU paths)
  • TS SDK smoke tests: 8 (incl. 3 v1.0.10 regressions)

Synthetic-fixture numbers come from cargo bench --workspace. GPU numbers come from modal run scripts/gpu-validate-modal.py; raw JSON lives in benchmarks/gpu-validation/ and the breakdown in benchmarks/RESULTS.md. The local PF_HAS_GPU=1 paths in examples/06 and examples/07 are not the validation path — they exit 2 with a Modal-lane pointer; the validation IS the Modal lane, and the JSONs above are its output.

Install

cargo install processfork                          # Rust CLI (the `pf` binary)
pip   install processfork                          # Python SDK
npm   install @processfork/sdk                     # TypeScript SDK

Per-adapter packages (one each on PyPI):

pip install processfork-claude-code
pip install processfork-langgraph
pip install processfork-openinterpreter
pip install "processfork-vllm[vllm]"               # needs CUDA + vllm ≥ 0.10
pip install "processfork-sglang[sglang]"           # needs CUDA + sglang ≥ 0.5
pip install "processfork-autogen[autogen]"
pip install "processfork-crewai[crewai]"
pip install processfork-criu                       # Linux only; needs `criu` CLI on $PATH

Build from source if you want to hack on it:

git clone https://github.com/manav8498/processfork && cd processfork
cargo build --release -p processfork               # → target/release/pf

Full build-from-source instructions in docs/install.md. Pre-built wheels cover macOS arm64, Linux x86_64, and Linux aarch64; macOS Intel + Windows wheels arrive in a later release (same package, just more platforms).

Repo layout

crates/      Rust workspace (10 crates: pf-core, pf-cache, pf-world, pf-effects,
             pf-model, pf-merge, pf-registry, processfork (CLI, the `pf` binary), pf-py, pf-ts)
adapters/    7 first-party integration packages
benchmarks/  PFBench harness + Criterion microbench
docs/        mdBook source (25+ pages)
examples/    8 self-contained runnable examples
demo/        60-second demo recording script

Docs

Your first fork (5 min) · 60-second demo · Architecture · Merge protocol · Security model · Performance tuning · Engineering specs

Contributing

PRs welcome. The bar is cargo fmt, cargo clippy --all-targets -- -D warnings, cargo test --workspace, plus a green coverage delta. See CONTRIBUTING.md.

License

MIT.