
feat: virtio-blk filesystem emulation + overlay CoW block device #10

Open

chaosreload wants to merge 17 commits into zerobootdev:main from chaosreload:feat/virtio-blk-filesystem

Conversation

@chaosreload

Summary

This PR implements Step 1 of the zeroboot enhancement plan: virtio-blk MMIO emulation with overlay CoW isolation, enabling full filesystem access in forked VMs without relying on Firecracker at runtime.

Changes

New: src/vmm/virtio_blk.rs

  • VirtioBlk: Full virtio-blk MMIO emulator handling KVM_EXIT_MmioWrite/Read
    • Processes QueueNotify → reads virtq descriptors from guest memory
    • Handles read/write/flush requests → updates used ring → injects IRQ (GSI 5)
    • Fixes VIRTIO_F_EVENT_IDX suppression: updates avail_event after queue drain
  • OverlayBlockDevice: Per-fork in-memory CoW layer
    • Read: check HashMap<sector, Vec<u8>> first, fall back to pread() on base image
    • Write: store in overlay only — base image is never modified (Arc<File> shared read-only)
    • Isolation: each fork's overlay is independent and discarded on drop
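
The overlay read/write path described above can be sketched as follows. This is a simplified illustration, not the PR's code: the real device reads the base image via `pread()` on a shared `Arc<File>`, while here an in-memory `Vec<u8>` stands in for the base so the example is self-contained.

```rust
use std::collections::HashMap;

const SECTOR_SIZE: usize = 512;

/// Simplified per-fork CoW layer: writes land only in the overlay map;
/// reads prefer the overlay and fall back to the read-only base.
struct OverlayBlockDevice {
    base: Vec<u8>,                  // stand-in for the shared base image
    overlay: HashMap<u64, Vec<u8>>, // sector -> fork-local modified data
}

impl OverlayBlockDevice {
    fn read_sector(&self, sector: u64) -> Vec<u8> {
        if let Some(data) = self.overlay.get(&sector) {
            return data.clone(); // fork-local write wins
        }
        let off = sector as usize * SECTOR_SIZE;
        self.base[off..off + SECTOR_SIZE].to_vec()
    }

    fn write_sector(&mut self, sector: u64, data: Vec<u8>) {
        // The base is never modified; dropping the overlay discards
        // every write this fork made.
        self.overlay.insert(sector, data);
    }
}
```

Because the overlay is plain per-fork state, isolation falls out for free: no locking against sibling forks is needed on the write path, and teardown is just `drop`.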

Modified: src/vmm/vmstate.rs

  • Virtio queue address detection: supports both Firecracker v1.12 (raw u64) and v1.15+ (Versionize [0x02][u32] prefix format)
  • XSAVE detection: relaxed MXCSR validation to tolerate exception flags set by numpy/Python
  • Multi-anchor offset detection: EFER + CR0 + APIC_BASE cross-validation instead of single IOAPIC anchor
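
The dual-format queue-address handling can be sketched roughly as below. This is hypothetical code, not the PR's implementation: it only assumes what the bullet states, namely that v1.12 stores a raw little-endian u64 while v1.15+ snapshots prepend a `[0x02]` tag byte plus a u32 length. Note the ambiguity if a raw v1.12 address happened to start with byte 0x02; the real detection presumably uses more context.

```rust
/// Parse a virtio queue address from snapshot bytes, accepting both the
/// Firecracker v1.12 raw-u64 layout and the v1.15+ Versionize layout
/// ([0x02] tag + u32 length prefix before the u64). Illustrative only.
fn parse_queue_addr(bytes: &[u8]) -> Option<u64> {
    if bytes.len() >= 13 && bytes[0] == 0x02 {
        // v1.15+: skip the 1-byte tag and 4-byte length prefix
        let raw: [u8; 8] = bytes[5..13].try_into().ok()?;
        return Some(u64::from_le_bytes(raw));
    }
    // v1.12: the address is a bare little-endian u64
    let raw: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(u64::from_le_bytes(raw))
}
```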

Modified: src/vmm/kvm.rs

  • Integrate VirtioBlk into ForkedVm
  • Route MmioRead/MmioWrite exits to virtio-blk emulator
  • fork_cow() accepts block_file: Option<Arc<File>> parameter
  • Memory loading uses sendfile instead of std::fs::read to avoid double-buffering

Modified: guest/init.c

  • Added CODE:<python_code> command: executes via popen("python3 /tmp/zb_code.py 2>&1")
  • cat <path> now works with full filesystem access
  • Statically linked for minimal rootfs dependency

Modified: src/main.rs, src/api/handlers.rs

  • Auto-load block device from {workdir}/rootfs_path or ZEROBOOT_ROOTFS env var
  • cmd_template() saves rootfs absolute path after snapshot creation
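
The auto-load order above might look like this sketch (assumed precedence: the `ZEROBOOT_ROOTFS` env var wins over the `{workdir}/rootfs_path` marker file written by `cmd_template()`; error handling is simplified and the function name is illustrative):

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// Resolve the rootfs image to attach at startup: prefer the env var,
/// then fall back to the path recorded next to the snapshot.
fn resolve_rootfs(workdir: &Path) -> Option<PathBuf> {
    if let Ok(p) = env::var("ZEROBOOT_ROOTFS") {
        return Some(PathBuf::from(p));
    }
    let marker = workdir.join("rootfs_path");
    fs::read_to_string(marker)
        .ok()
        .map(|s| PathBuf::from(s.trim()))
}
```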

Modified: docs/, README.md

  • New "Virtio-Blk Filesystem Emulation" section in ARCHITECTURE.md
  • Filesystem Access section in API.md
  • Updated README benchmarks with measured numbers

Test Results

Tested on c8i.xlarge with nested virtualization enabled (Ubuntu 22.04, Python 3.10, numpy 2.2.6, pandas 2.3.3):

| Test | Result | Latency |
|---|---|---|
| Fork (KVM + CoW) P50 | | 655 µs |
| Fork (KVM + CoW) P99 | | 996 µs |
| Fork + echo hello P50 | | 5.8 ms |
| CODE:print(1+1) | 2 | 205 ms |
| CODE:import numpy as np; print(np.array([1,2,3]).mean()) | 2.0 | 450 ms |
| cat /etc/os-release | ✅ full output | 30 ms |
| Memory per fork (100 concurrent) | ~169 KB | |

Known Limitations

  • Python exec latency (200–450ms) includes on-demand .so loading through virtio-blk. Future: pre-warm pages before snapshot to eliminate disk I/O post-fork.
  • MMIO virtio uses polling (KVM_EXIT_MMIO). Future: KVM_IOEVENTFD + KVM_IRQFD for lower QueueNotify latency.
  • CPUID snapshot restore falls back to host CPUID under nested virtualization (cosmetic warning only).

Rootfs Build

See ARCHITECTURE.md for full rootfs build instructions (Ubuntu 22.04 + Python 3.10 + numpy/pandas + static init binary).

Ubuntu added 17 commits March 23, 2026 16:33
Step 1 complete:
- virtio_blk.rs: KVM_EXIT_MMIO virtio-blk emulator with overlay CoW
- vmstate.rs: robust offset detection for FC v1.15+, virtio queue addr parsing
- kvm.rs: integrate VirtioBlk into fork_cow, sendfile-based memfd loading
- guest/init.c: C init with popen python3 for CODE: execution
- Verified: echo/cat/CODE:numpy all work end-to-end

Test: Ubuntu 22.04 rootfs, Python 3.10, numpy 2.2.6, pandas 2.3.3
Fork latency: ~1ms, Python exec: ~200ms, numpy exec: ~450ms
- ARCHITECTURE.md: new section covering virtio-blk design, overlay CoW,
  EVENT_IDX fix, vmstate detection, guest init protocol, rootfs build guide,
  template creation, and performance benchmarks table
- API.md: added Filesystem Access section explaining isolated CoW per fork
- README.md: updated How It Works to include filesystem step, refreshed
  benchmark table with actual measured numbers

Closes Step 1 documentation.
@chaosreload
Author

Hi, just a heads-up on the related PRs I have opened to keep things reviewable in smaller pieces.

This PR (#10) contains the core feature implementation. The docs in this PR overlap with #11 — if you prefer to review docs separately, feel free to merge #11 first and I can rebase this PR to remove the doc changes.

Happy to make any adjustments. Thanks for your time!

