Skip to content

chore(vendor): refresh FC patch to v1.12 with full build-clean diff#188

Merged
WaylandYang merged 2 commits into
mainfrom
chore/v0.4-vendored-fc-v1.12-patch
May 29, 2026
Merged

chore(vendor): refresh FC patch to v1.12 with full build-clean diff#188
WaylandYang merged 2 commits into
mainfrom
chore/v0.4-vendored-fc-v1.12-patch

Conversation

@WaylandYang
Copy link
Copy Markdown
Contributor

Summary

  • Regenerate 0001-feat-mem-backend-shared-option-for-MAP-SHARED.patch against the v1.12.0 tag so a fresh tools/devtool build --release of the vendored fork produces a working binary in one shot.
  • Bundle in the two follow-ups that had to be applied on the dev box after the original patch failed to compile and then failed at runtime:
    • shared: false defaults at six unpatched MemBackendConfig struct literals (build fix).
    • O_RDWR on the mem_file when shared=true (runtime fix — otherwise mmap(MAP_SHARED, PROT_WRITE) rejects the RO fd with EACCES).
  • Update docs/VENDORED-FIRECRACKER.md to point at the new forkd-v0.4-mem-backend-shared-v1.12 branch on deeplethe/firecracker, list the three constituent commits, and flip Phases 5a/5b/5c to done with the dates they shipped.

Test plan

  • git diff v1.12.0..forkd-v0.4-mem-backend-shared-v1.12 matches the contents of the regenerated patch file (4 files, 44 insertions, 5 deletions).
  • On the dev box: tools/devtool build --release succeeds clean from a fresh clone of forkd-v0.4-mem-backend-shared-v1.12.
  • test-shared.sh with the resulting binary shows shared=false → rw-p (MAP_PRIVATE) and shared=true → rw-s (MAP_SHARED) — see commit message for full run output.

🤖 Generated with Claude Code

The committed patch was the original ~33-line v1.16-dev diff and only
covered the main MemBackendConfig field + mmap-flag plumbing. Two
follow-ups had to be applied on the dev box to get a buildable tree
that actually mmaps MAP_SHARED:

  1. `shared: false` defaults at six unpatched MemBackendConfig struct
     literals (build fix — without this the tree refuses to compile).
  2. `O_RDWR` on the mem_file when shared=true (otherwise the kernel
     rejects MAP_SHARED + PROT_WRITE on a read-only fd with EACCES).

Regenerate as a v1.12.0 ↔ tip-of-branch combined diff so a fresh
`tools/devtool build --release` produces a working binary in one shot.
VENDORED-FIRECRACKER.md updated to point at the new
`forkd-v0.4-mem-backend-shared-v1.12` branch and list the three
commits that make up the patch.

Verified end-to-end on the dev box: shared=false yields `rw-p`
(MAP_PRIVATE, stock behavior) and shared=true yields `rw-s`
(MAP_SHARED, the patch enabling the v0.4 live-fork primitive).
Three stale pointers caught during self-review of the PR:

  1. crates/forkd-vmm/src/lib.rs:315 — the rustdoc comment for
     `MemoryBackend::MemfdShared` named the old `-mem-backend-shared`
     branch. After this PR makes `-v1.12` the canonical pointer the
     published API docs would mislead readers.

  2. DESIGN-v0.4-PHASE3-SPIKE.md:3 — the "Status: RESOLVED" banner
     cited the old branch + commit b30f1ba. Updated to the v1.12
     branch and described what the three commits there mean, so the
     spike doc and VENDORED-FIRECRACKER.md agree.

  3. docs/VENDORED-FIRECRACKER.md:38 (Smoke-checking) — pointed at
     `firecracker-patch/test-shared.sh` "in space/ next to forkd/"
     which is the author's local sibling-directory layout, not
     anything in the forkd repo. Commit the script into
     `scripts/dev/test-shared.sh` so the documented verification
     path actually works from a fresh forkd clone.

All three are docs/comment-level — no runtime change.
WaylandYang added a commit that referenced this pull request May 29, 2026
All discovered during self-review post-scope-correction. None are
runtime-impacting; the goal is to make the blueprint actually
match what the next four PRs will look like.

Highlights:

  1. Integration sketch (line 90) was still in the pre-correction
     shape — calling WpBranch::begin(memfd, ...) which would arm WP
     in the controller's process and miss every guest write. Rewrote
     to show request_wp_uffd + begin_with_external_uffd, the
     constructor the correction implies, and added a PauseGuard
     bracket. This was the most consequential bug — anyone reading
     the sketch to implement 6.3 would have shipped broken code.

  2. The main `## PR breakdown` table only listed 4 PRs while the
     scope-correction inset said 5. Added a 6.1.5 row and rewrote
     6.2's Done-when to capture the SCM_RIGHTS receiver scope that
     the correction added.

  3. Phase 6.1's patch shape claimed "2-3 files including
     api_server/request/snapshot.rs in Optional form" — what
     actually shipped touched 5 files (rpc_interface.rs +
     api_server/mod.rs + vstate/vm.rs needed exhaustive-match
     branches) and kept mem_file_path as PathBuf. Corrected so
     future rebases follow what the patch actually does.

  4. The ~50 LOC estimate for 6.1.5 ignored the seccomp filter
     update and the UFFDIO_REGISTER EINVAL we hit on first contact.
     Rewrote to ~100 LOC + seccomp work + a half-day diagnosis
     budget; documented the specific failure modes future
     implementers will hit.

  5. The "USER-API open Q #5" reference was pointing at the wrong
     section. Q5 there is about Python SDK wait=False semantics;
     the backward-compatibility claim lives in §Backward
     compatibility. Fixed both the in-text reference and the 6.4
     Done-when so the engineer edits the right section.

  6. Open question #2 was framed against the old "begin after
     pause" flow. Rewrote it to target the actual remaining
     vulnerable region — snapshot_vmstate_only between pause and
     resume.

Self-review used the same xhigh-effort code-review pass that
caught the three stale-reference bugs in #188.
@WaylandYang WaylandYang merged commit 7765f63 into main May 29, 2026
2 checks passed
@WaylandYang WaylandYang deleted the chore/v0.4-vendored-fc-v1.12-patch branch May 29, 2026 08:35
WaylandYang added a commit that referenced this pull request May 29, 2026
* docs(v0.4): add Phase 6 implementation blueprint

Phases 5a/5b/5c (memfd + MAP_SHARED) are done. The next chunk is
mode="live" — actually exercising WpBranch from branch_sandbox. That
work splits into 4 PRs (6.1-6.4) which share several non-obvious
decisions worth pinning down before any of them starts:

  - FC needs a SnapshotType::VmstateOnly variant. Already vendoring
    FC for MAP_SHARED; cheaper than the tee approach and unblocks
    every downstream PR.
  - memory.bin written by --live is byte-identical to --full — same
    dense layout, no sparse holes. Restore path is untouched.
    (USER-API doc's open Q5 will be flipped from "not compatible"
    to "compatible" by PR 6.4.)
  - Two pieces of plumbing precede the live path itself:
    Vm::memfd_handle() getter + Vm::snapshot_vmstate_only wrapper.
  - In-flight branch tracking for wait:false stays in-memory only
    for v0.4 (daemon restart marks them failed; persistence is v0.5+).

The doc also captures the open questions worth re-evaluating during
implementation (concurrent --live on same parent, RAII pause guard,
sync-vs-async copy thread, live-of-live chaining).

Companion to DESIGN-v0.4.md, DESIGN-v0.4-USER-API.md, and the
RESOLVED Phase 3 spike.

* docs(v0.4): Phase 6 scope correction — add 6.1.5 FC uffd endpoint

UFFDIO_REGISTER is per-process. Controller can't WP-arm FC's guest
memory from outside; the registration has to happen inside FC.
Pattern: FC creates uffd + UFFDIO_REGISTER (WP), sends fd to
controller via SCM_RIGHTS, controller arms + handles events.

Adds Phase 6.1.5 PR (FC patch for POST /uffd/wp endpoint) in front
of the original 6.2. Total estimate moves from ~2 weeks to ~2.5
weeks. 6.3 and 6.4 unchanged in shape; 6.3's WpBranch::begin will
take an externally-registered uffd instead of creating one.

Discovered while reading wp_snapshot.rs's begin() signature against
KVM's real architecture (KVM runs in FC's process, not controller).
The Phase 2 PoC works because it had KVM + handler in the same
process — that simplification doesn't survive contact with forkd's
controller/FC split.

* docs(v0.4): close 8 review findings on the Phase 6 blueprint

All discovered during self-review post-scope-correction. None are
runtime-impacting; the goal is to make the blueprint actually
match what the next four PRs will look like.

Highlights:

  1. Integration sketch (line 90) was still in the pre-correction
     shape — calling WpBranch::begin(memfd, ...) which would arm WP
     in the controller's process and miss every guest write. Rewrote
     to show request_wp_uffd + begin_with_external_uffd, the
     constructor the correction implies, and added a PauseGuard
     bracket. This was the most consequential bug — anyone reading
     the sketch to implement 6.3 would have shipped broken code.

  2. The main `## PR breakdown` table only listed 4 PRs while the
     scope-correction inset said 5. Added a 6.1.5 row and rewrote
     6.2's Done-when to capture the SCM_RIGHTS receiver scope that
     the correction added.

  3. Phase 6.1's patch shape claimed "2-3 files including
     api_server/request/snapshot.rs in Optional form" — what
     actually shipped touched 5 files (rpc_interface.rs +
     api_server/mod.rs + vstate/vm.rs needed exhaustive-match
     branches) and kept mem_file_path as PathBuf. Corrected so
     future rebases follow what the patch actually does.

  4. The ~50 LOC estimate for 6.1.5 ignored the seccomp filter
     update and the UFFDIO_REGISTER EINVAL we hit on first contact.
     Rewrote to ~100 LOC + seccomp work + a half-day diagnosis
     budget; documented the specific failure modes future
     implementers will hit.

  5. The "USER-API open Q #5" reference was pointing at the wrong
     section. Q5 there is about Python SDK wait=False semantics;
     the backward-compatibility claim lives in §Backward
     compatibility. Fixed both the in-text reference and the 6.4
     Done-when so the engineer edits the right section.

  6. Open question #2 was framed against the old "begin after
     pause" flow. Rewrote it to target the actual remaining
     vulnerable region — snapshot_vmstate_only between pause and
     resume.

Self-review used the same xhigh-effort code-review pass that
caught the three stale-reference bugs in #188.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant