From 1e0e509376a96d4e4c2c24d8f6ee1371d57c8703 Mon Sep 17 00:00:00 2001
From: Ubuntu
Date: Tue, 24 Mar 2026 09:03:40 +0000
Subject: [PATCH] docs: add QUICKSTART guide, update ARCHITECTURE and API docs for virtio-blk filesystem support

---
 README.md            |  16 +-
 docs/API.md          |  17 ++
 docs/ARCHITECTURE.md | 112 ++++++++++++-
 docs/QUICKSTART.md   | 384 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 520 insertions(+), 9 deletions(-)
 create mode 100644 docs/QUICKSTART.md

diff --git a/README.md b/README.md
index 1658518..66abbd2 100644
--- a/README.md
+++ b/README.md
@@ -33,11 +33,12 @@ curl -X POST https://api.zeroboot.dev/v1/exec \
 
 | Metric | Zeroboot | E2B | microsandbox | Daytona |
 |---|---|---|---|---|
-| Spawn latency p50 | **0.79ms** | ~150ms | ~200ms | ~27ms |
-| Spawn latency p99 | 1.74ms | ~300ms | ~400ms | ~90ms |
-| Memory per sandbox | ~265KB | ~128MB | ~50MB | ~50MB |
-| Fork + exec (Python) | **~8ms** | - | - | - |
-| 1000 concurrent forks | 815ms | - | - | - |
+| Spawn latency p50 | **0.65ms** | ~150ms | ~200ms | ~27ms |
+| Spawn latency p99 | 1.00ms | ~300ms | ~400ms | ~90ms |
+| Memory per sandbox | ~169KB | ~128MB | ~50MB | ~50MB |
+| Fork + echo (serial) | **~5.8ms** | - | - | - |
+| Fork + Python exec | **~205ms** | - | - | - |
+| Filesystem (cat file) | **~30ms** | N/A | - | - |
 
 Each sandbox is a real KVM virtual machine with hardware-enforced memory isolation.
 
@@ -49,8 +50,9 @@ Each sandbox is a real KVM virtual machine with hardware-enforced memory isolati
 ```
 
 1. **Template** (one-time): Firecracker boots a VM, pre-loads your runtime, and snapshots memory + CPU state
-2. **Fork** (~0.8ms): Creates a new KVM VM, maps snapshot memory as CoW, restores all CPU state
-3. **Isolation**: Each fork is a separate KVM VM with hardware-enforced memory isolation
+2. **Fork** (~0.8ms): Creates a new KVM VM, maps snapshot memory as CoW, restores CPU state + virtio-blk device
+3. **Filesystem**: Each fork has an independent overlay block device — reads hit the shared base image, writes are isolated in-memory
+4. **Isolation**: Each fork is a separate KVM VM with hardware-enforced memory isolation
 
 ## SDKs
 
diff --git a/docs/API.md b/docs/API.md
index 2b7047a..80d6e5d 100644
--- a/docs/API.md
+++ b/docs/API.md
@@ -113,3 +113,20 @@ Authorization: Bearer zb_live_key1
 - If no keys file exists, auth is disabled
 - Invalid or missing keys return **HTTP 401**
 - Rate limited at **100 req/s per key** (HTTP 429)
+
+## Filesystem Access
+
+Sandboxes have a full read-write filesystem backed by the rootfs image specified at template
+creation time. Each fork gets an independent overlay so writes are isolated:
+
+```python
+# Works inside CODE: commands
+import os
+os.makedirs("/tmp/mydir", exist_ok=True)
+with open("/tmp/mydir/output.txt", "w") as f:
+    f.write("hello from sandbox")
+print(open("/tmp/mydir/output.txt").read())  # hello from sandbox
+```
+
+Writes are discarded when the fork ends — the base image is never modified.
+To persist outputs, include them in the stdout response.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 43a54cb..ee46594 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -17,6 +17,7 @@
               │  4. Restore CPU: sregs → XCRS → XSAVE   │
               │     → regs → LAPIC → MSRs → MP state    │
               │  5. Serial I/O via 16550 UART emulation │
+              │  6. Virtio-blk MMIO + overlay CoW disk  │
               └──────────────┬──────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
@@ -54,12 +55,14 @@ Each fork gets its own KVM VM with private memory pages. Writes trigger CoW page
 
 | File | Purpose |
 |---|---|
-| `src/vmm/kvm.rs` | Fork engine: KVM VM + CoW mmap + CPU state restore |
-| `src/vmm/vmstate.rs` | Firecracker vmstate parser with auto-detect offsets |
+| `src/vmm/kvm.rs` | Fork engine: KVM VM + CoW mmap + CPU state restore + virtio-blk integration |
+| `src/vmm/vmstate.rs` | Firecracker vmstate parser: auto-detect offsets + virtio queue addr detection |
+| `src/vmm/virtio_blk.rs` | **Virtio-blk MMIO emulator + overlay CoW block device** |
 | `src/vmm/firecracker.rs` | Template creation via Firecracker API |
 | `src/vmm/serial.rs` | 16550 UART emulation for guest I/O |
 | `src/api/handlers.rs` | HTTP API: exec, batch, health, metrics, auth |
 | `src/main.rs` | CLI: template, test-exec, bench, serve |
+| `guest/init.c` | Guest PID 1: serial command dispatcher + CODE: execution via popen(python3) |
 | `sdk/python/` | Python SDK (zero dependencies) |
 | `sdk/node/` | TypeScript SDK (zero dependencies, uses fetch) |
 | `deploy/` | systemd service + fleet deploy script |
@@ -81,3 +84,108 @@ Firecracker's CPUID filtering confuses numpy's runtime CPU feature detection. Se
 ### IOAPIC Restore Pattern
 
 Don't zero-init `kvm_irqchip`. Use `KVM_GET_IRQCHIP` first, then overwrite the redirect table entries from the snapshot, then `KVM_SET_IRQCHIP`. Zero-initializing corrupts other irqchip state and causes interrupt routing failures.
+
+## Virtio-Blk Filesystem Emulation
+
+Zeroboot implements a full virtio-blk MMIO device emulator (`src/vmm/virtio_blk.rs`) so each
+forked VM has a working filesystem without relying on Firecracker at runtime.
+
+### Data Flow
+
+```
+  rootfs.ext4 (read-only, shared across all forks via Arc)
+       |
+       v
+  OverlayBlockDevice (per-fork in-memory CoW layer)
+       |  read: check overlay HashMap> first
+       |        on miss: pread() from shared base image
+       |  write: insert sector into overlay (base image untouched)
+       v
+  VirtioBlk MMIO emulator (handles KVM_EXIT_MmioWrite / MmioRead)
+       |  guest writes QueueNotify to 0xC0001000+0x050
+       |  -> read avail ring -> parse descriptor chain
+       |  -> dispatch read/write/flush -> update used ring -> inject IRQ (GSI 5)
+       v
+  Guest kernel ext4 (transparent block device /dev/vda)
+```
+
+### Key Algorithms
+
+**Overlay CoW isolation**
+- Each fork owns a `HashMap>` keyed by 512-byte sector number
+- Writes only touch the overlay; the shared base image is opened O_RDONLY
+- Forks never observe each other's writes; overlay is freed when the fork is dropped
+
+**VIRTIO_F_EVENT_IDX suppression fix**
+- The guest uses event index suppression to avoid redundant `QueueNotify` writes
+- After draining the queue, the emulator writes `last_avail_idx` into the `avail_event`
+  field of the used ring header — this tells the guest "notify me for the next request"
+- Without this update, only the first I/O per wake-up would be processed
+
+**Vmstate queue address detection (Firecracker v1.12 + v1.15)**
+- v1.15 serializes `GuestAddress` with a 2-byte Versionize prefix: `[0x02][u32_LE]`
+- Parser searches for pattern `[02][u32][02][u32][02][u32]` (desc/avail/used ring GPA)
+- Falls back to 3-consecutive-raw-u64 format for Firecracker v1.12 compatibility
+
+### Guest Init Protocol
+
+`guest/init.c` is a statically-linked PID 1 that mounts filesystems and listens on `/dev/ttyS0`:
+
+| Host sends | Guest action | Guest responds |
+|---|---|---|
+| `CODE:\n` | Writes code to `/tmp/zb_code.py`, runs `python3 /tmp/zb_code.py 2>&1` | stdout + stderr |
+| `echo \n` | Writes text to serial | text |
+| `cat \n` | Opens file, reads to serial | file contents |
+| *(any command)* | After response | `ZEROBOOT_DONE\n` |
+
+### Building a Custom Rootfs
+
+```bash
+# 1. Bootstrap Ubuntu 22.04 minimal
+sudo debootstrap --arch=amd64 jammy /tmp/rootfs http://archive.ubuntu.com/ubuntu/
+echo "deb http://archive.ubuntu.com/ubuntu jammy main universe" | sudo tee /tmp/rootfs/etc/apt/sources.list
+
+# 2. Install Python 3 + scientific packages
+sudo chroot /tmp/rootfs apt-get update -qq
+sudo chroot /tmp/rootfs apt-get install -y python3 python3-pip gcc
+sudo chroot /tmp/rootfs pip3 install numpy pandas
+
+# 3. Compile and install guest init (statically linked)
+sudo cp guest/init.c /tmp/rootfs/init.c
+sudo chroot /tmp/rootfs gcc -O2 -static -o /init /init.c
+sudo rm /tmp/rootfs/init.c
+
+# 4. Package as ext4 image (~1.5 GB)
+dd if=/dev/zero of=rootfs.ext4 bs=1M count=1500
+mkfs.ext4 -F rootfs.ext4
+sudo mount -o loop rootfs.ext4 /mnt/out
+sudo cp -a /tmp/rootfs/. /mnt/out/
+sudo umount /mnt/out
+```
+
+### Creating a Template
+
+```bash
+# Boot the VM via Firecracker, wait for guest to reach the serial listen loop, snapshot
+./zeroboot template vmlinux.bin rootfs.ext4 ./workdir 10 /init 512
+#
+```
+
+`workdir/rootfs_path` is written automatically and picked up by `bench` / `serve` / `test-exec`.
+
+### Observed Performance (c8i.xlarge, nested virtualization)
+
+| Metric | Value |
+|---|---|
+| Pure CoW mmap P50 | 0.7 µs |
+| Full fork (KVM + CPU restore) P50 | **655 µs** |
+| Full fork P99 | **996 µs** |
+| Fork + echo hello P50 | 5.8 ms |
+| Fork + CODE:print(1+1) | ~205 ms |
+| Fork + CODE:import numpy; ... | ~450 ms |
+| Fork + cat /etc/os-release | ~30 ms |
+| Memory per fork (100 concurrent) | ~169 KB |
+
+Python exec latency (200–450 ms) reflects on-demand `.so` loading through virtio-blk.
+This is a one-time cost per fork; pages are reused across CoW forks once warm.
+**Planned optimization**: pre-warm all `.so` pages before snapshotting → zero disk I/O after fork.
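
Note on the "Overlay CoW isolation" bullets in the docs/ARCHITECTURE.md hunk above: the behaviour they describe can be illustrated with a minimal Python sketch. This is not the project's code (the emulator itself lives in Rust in `src/vmm/virtio_blk.rs`). The names `OverlaySketch`, `read_sector`, `write_sector` and `demo` are hypothetical, and the sketch assumes only what the doc states: a per-fork map keyed by 512-byte sector number, reads that fall through to a shared read-only base image via `pread()`, and writes that never reach the base.

```python
import os
import tempfile

SECTOR_SIZE = 512  # sector size stated in the architecture notes


class OverlaySketch:
    """Hypothetical per-fork copy-on-write layer over a shared read-only image."""

    def __init__(self, base_fd):
        self.base_fd = base_fd  # shared descriptor, opened O_RDONLY
        self.sectors = {}       # per-fork overlay: sector number -> 512 bytes

    def read_sector(self, sector):
        # Check the overlay first; on a miss, read from the base image.
        data = self.sectors.get(sector)
        if data is None:
            data = os.pread(self.base_fd, SECTOR_SIZE, sector * SECTOR_SIZE)
            data = data.ljust(SECTOR_SIZE, b"\0")  # zero-pad short reads at end of image
        return data

    def write_sector(self, sector, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("writes must be exactly one sector")
        self.sectors[sector] = bytes(data)  # the base image is never modified


def demo():
    """Two forks over one base image: a write in fork A stays invisible to fork B."""
    with tempfile.NamedTemporaryFile() as img:
        img.write(b"A" * SECTOR_SIZE + b"B" * SECTOR_SIZE)
        img.flush()
        fd = os.open(img.name, os.O_RDONLY)
        try:
            fork_a, fork_b = OverlaySketch(fd), OverlaySketch(fd)
            fork_a.write_sector(1, b"Z" * SECTOR_SIZE)
            return (fork_a.read_sector(1)[:1],     # fork A sees its own write
                    fork_b.read_sector(1)[:1],     # fork B still sees the base
                    os.pread(fd, 1, SECTOR_SIZE))  # base image is unchanged
        finally:
            os.close(fd)
```

Dropping a fork's `OverlaySketch` instance discards its writes, which mirrors the doc's statement that the overlay is freed when the fork is dropped.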
diff --git a/docs/QUICKSTART.md b/docs/QUICKSTART.md
new file mode 100644
index 0000000..40e1d58
--- /dev/null
+++ b/docs/QUICKSTART.md
@@ -0,0 +1,384 @@
+# Zeroboot Quick Start Guide
+
+> Based on chaosreload/zeroboot (feat/virtio-blk-filesystem branch)
+> Environment: AWS c8i.xlarge, Ubuntu 22.04, nested virtualization enabled
+
+---
+
+## 1. Machine Setup
+
+### 1.1 Launch a New Instance (Recommended)
+
+Enable nested virtualization at launch time via `--cpu-options` — no need for a stop/modify/start cycle.
+
+> ⚠️ Requires AWS CLI >= v2.34. Older versions don't support the `NestedVirtualization` parameter.
+
+```bash
+# Get the latest Ubuntu 22.04 AMI (ap-southeast-1)
+AMI_ID=$(aws ec2 describe-images \
+  --owners 099720109477 \
+  --filters 'Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*' \
+            'Name=state,Values=available' \
+  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
+  --region ap-southeast-1 --output text)
+
+# Launch with nested virtualization enabled in one step
+INSTANCE_ID=$(aws ec2 run-instances \
+  --image-id $AMI_ID \
+  --instance-type c8i.xlarge \
+  --key-name \
+  --security-group-ids \
+  --cpu-options "NestedVirtualization=enabled" \
+  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=zeroboot-fresh}]' \
+  --region ap-southeast-1 \
+  --query 'Instances[0].InstanceId' --output text)
+
+echo "Instance ID: $INSTANCE_ID"
+
+# Wait until running
+aws ec2 wait instance-running --instance-ids $INSTANCE_ID --region ap-southeast-1
+```
+
+> ⚠️ Only c8i / m8i / r8i instance families support nested virtualization (Intel 8th-gen platform only).
+
+### 1.2 Enable Nested Virtualization on an Existing Instance
+
+If you already have a running C8i instance, stop it first:
+
+```bash
+aws ec2 stop-instances --region ap-southeast-1 --instance-ids $INSTANCE_ID
+aws ec2 wait instance-stopped --region ap-southeast-1 --instance-ids $INSTANCE_ID
+
+aws ec2 modify-instance-cpu-options \
+  --region ap-southeast-1 \
+  --instance-id $INSTANCE_ID \
+  --nested-virtualization enabled
+
+aws ec2 start-instances --region ap-southeast-1 --instance-ids $INSTANCE_ID
+```
+
+### 1.3 Verify KVM is Available
+
+```bash
+ssh ubuntu@
+ls -la /dev/kvm
+# Expected: crw-rw-rw- 1 root kvm 10, 232 ...
+
+# If permission denied:
+sudo chmod 666 /dev/kvm
+```
+
+---
+
+## 2. Install Firecracker
+
+```bash
+curl -L -o fc.tgz https://github.com/firecracker-microvm/firecracker/releases/download/v1.15.0/firecracker-v1.15.0-x86_64.tgz
+tar xzf fc.tgz
+sudo mv release-v1.15.0-x86_64/firecracker-v1.15.0-x86_64 /usr/local/bin/firecracker
+sudo mv release-v1.15.0-x86_64/jailer-v1.15.0-x86_64 /usr/local/bin/jailer
+sudo chmod +x /usr/local/bin/firecracker /usr/local/bin/jailer
+firecracker --version
+# Expected: Firecracker v1.15.0
+```
+
+---
+
+## 3. Download Kernel
+
+```bash
+mkdir -p ~/fc-exp
+cd ~/fc-exp
+
+# Firecracker official quickstart kernel (4.14.174, fast boot)
+curl -fsSL -o vmlinux.bin \
+  https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/x86_64/kernels/vmlinux.bin
+
+ls -lh vmlinux.bin
+# Expected: ~21MB
+```
+
+---
+
+## 4. Build Rootfs with Docker
+
+Building with Docker is more reproducible and maintainable than debootstrap (similar to how E2B sandbox templates work).
+
+### 4.1 Install Docker
+
+```bash
+# Remove conflicting packages
+sudo apt remove $(dpkg --get-selections docker.io docker-compose docker-compose-v2 docker-doc podman-docker containerd runc 2>/dev/null | cut -f1) 2>/dev/null || true
+
+# Add Docker's official GPG key
+sudo apt update
+sudo apt install -y ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+# Add Docker repository
+sudo tee /etc/apt/sources.list.d/docker.sources < Dockerfile << 'EOF'
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update -qq && \
+    apt-get install -y --no-install-recommends \
+        python3 python3-pip gcc libc6-dev && \
+    pip3 install --no-cache-dir numpy pandas && \
+    apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# Compile statically-linked guest init (PID 1 inside the VM)
+COPY init.c /init.c
+RUN gcc -O2 -static -o /init /init.c && rm /init.c
+EOF
+```
+
+### 4.3 Build and Export to ext4
+
+```bash
+cd ~/zeroboot-rootfs
+
+# Build image (~3 min, mostly pip install)
+sudo docker build -t zeroboot-rootfs .
+
+# Verify static linking
+sudo docker run --rm zeroboot-rootfs ldd /init
+# Expected: not a dynamic executable
+
+# Export as tar
+sudo docker create --name tmp-rootfs zeroboot-rootfs
+sudo docker export tmp-rootfs -o rootfs.tar
+sudo docker rm tmp-rootfs
+
+# Pack into ext4 image
+cd ~/fc-exp
+dd if=/dev/zero of=rootfs.ext4 bs=1M count=1500 status=progress
+mkfs.ext4 -F rootfs.ext4
+
+sudo mkdir -p /mnt/rootfs_out
+sudo mount -o loop rootfs.ext4 /mnt/rootfs_out
+sudo tar xf ~/zeroboot-rootfs/rootfs.tar -C /mnt/rootfs_out
+sudo umount /mnt/rootfs_out
+
+ls -lh rootfs.ext4
+# Expected: ~1.5GB
+```
+
+### 4.4 Verify Rootfs
+
+```bash
+sudo mount -o loop,ro rootfs.ext4 /mnt/rootfs_out
+sudo chroot /mnt/rootfs_out python3 -c "import numpy, pandas; print('numpy', numpy.__version__, 'pandas', pandas.__version__)"
+sudo chroot /mnt/rootfs_out ldd /init
+sudo umount /mnt/rootfs_out
+```
+
+---
+
+## 5. Build Zeroboot
+
+```bash
+# Install dependencies (C toolchain + Rust)
+sudo apt-get install -y build-essential
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+source ~/.cargo/env
+
+# Clone the repo
+git clone -b feat/virtio-blk-filesystem \
+  https://github.com/chaosreload/zeroboot.git ~/zeroboot
+cd ~/zeroboot
+
+# Build in release mode (~30s)
+cargo build --release
+
+ls -lh target/release/zeroboot
+# Expected: ~1.8MB ELF binary
+```
+
+---
+
+## 6. Create Template (Take Snapshot)
+
+```bash
+# Drop page cache to avoid OOM during snapshot
+echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
+
+mkdir -p ~/zeroboot-work
+
+# Take snapshot (~10-15 seconds)
+# Args:
+~/zeroboot/target/release/zeroboot template \
+  ~/fc-exp/vmlinux.bin \
+  ~/fc-exp/rootfs.ext4 \
+  ~/zeroboot-work \
+  10 /init 512
+
+# Expected output:
+# Starting Firecracker...
+# Firecracker VM started
+# Waiting 10s for guest to boot...
+# Pausing VM...
+# Creating snapshot...
+# Snapshot created: state=14312B, mem=512MB
+# Template created in 13.xx s
+```
+
+Verify the output:
+
+```bash
+ls -lh ~/zeroboot-work/snapshot/
+# vmstate (~14KB, CPU register state)
+# mem (~512MB, memory image)
+
+cat ~/zeroboot-work/rootfs_path
+# Should show the absolute path to rootfs.ext4
+```
+
+---
+
+## 7. Test Execution
+
+### 7.1 Basic echo
+
+```bash
+echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
+
+~/zeroboot/target/release/zeroboot test-exec ~/zeroboot-work "echo hello"
+# Expected:
+# Fork time: ~1ms
+# === Output ===
+# echo hello
+# hello
+# ZEROBOOT_DONE
+```
+
+### 7.2 Read a file (verify filesystem access)
+
+```bash
+~/zeroboot/target/release/zeroboot test-exec ~/zeroboot-work "cat /etc/os-release"
+# Expected: Ubuntu 22.04 release info
+```
+
+### 7.3 Execute Python code
+
+```bash
+# Simple calculation
+~/zeroboot/target/release/zeroboot test-exec ~/zeroboot-work "CODE:print(1+1)"
+# Expected: 2
+
+# Using numpy
+~/zeroboot/target/release/zeroboot test-exec ~/zeroboot-work \
+  "CODE:import numpy as np; print(np.array([1,2,3]).mean())"
+# Expected: 2.0
+
+# Write a file (verify CoW isolation — base image is never modified)
+~/zeroboot/target/release/zeroboot test-exec ~/zeroboot-work \
+  "CODE:open('/tmp/test','w').write('hello'); print(open('/tmp/test').read())"
+# Expected: hello
+```
+
+---
+
+## 8. Benchmark
+
+```bash
+echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
+~/zeroboot/target/release/zeroboot bench ~/zeroboot-work 2>/dev/null
+# Reference numbers (c8i.xlarge, warm page cache):
+# Fork P50: ~655µs ← sub-millisecond!
+# Fork P99: ~996µs
+# Fork + echo P50: ~5.8ms
+# Memory per fork (100 concurrent): ~169KB
+```
+
+---
+
+## 9. Start the API Server
+
+```bash
+~/zeroboot/target/release/zeroboot serve ~/zeroboot-work 8080
+# Zeroboot API server listening on port 8080
+```
+
+In another terminal:
+
+```bash
+# Health check
+curl localhost:8080/v1/health
+
+# Execute Python
+curl -X POST localhost:8080/v1/exec \
+  -H 'Content-Type: application/json' \
+  -d '{"code": "print(1+1)"}'
+
+# Expected response:
+# {"id":"...","stdout":"2","stderr":"","exit_code":0,"fork_time_ms":0.65,...}
+```
+
+---
+
+## 10. How It Works
+
+```
+test-exec / serve
+    │
+    ├─ load_snapshot()
+    │    ├─ sendfile(mem_file → memfd)   [512MB, kernel-to-kernel, no user buffer]
+    │    └─ parse_vmstate()              [CPU registers, virtio queue addresses]
+    │
+    └─ fork_cow()                        [~1ms]
+         ├─ KVM: create_vm + create_irq_chip
+         ├─ mmap(memfd, MAP_PRIVATE)     [CoW: shared reads, page fault on write]
+         ├─ Restore CPU state: sregs → XCRS → XSAVE → regs → LAPIC → MSRs
+         ├─ Create OverlayBlockDevice    [per-fork in-memory CoW layer]
+         └─ VM run loop
+              ├─ IoOut/IoIn → 16550 UART (serial I/O)
+              └─ MmioWrite → VirtioBlk (filesystem I/O)
+```
+
+---
+
+## Troubleshooting
+
+**Q: `ls: cannot access '/dev/kvm'`**
+A: Nested virtualization is not enabled. Follow Step 1.
+
+**Q: `OOM Killed`**
+A: Run `echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null` to free page cache before taking the snapshot.
+
+**Q: `echo hello` works but `CODE:` hangs**
+A: The snapshot was taken before Python finished booting. Increase `wait_secs` in Step 6 to 15.
+
+**Q: `Warning: snapshot CPUID rejected`**
+A: Expected under nested virtualization. Harmless — suppress with `2>/dev/null`.
+
+**Q: `Too many open files (os error 24)` at 1000-concurrent bench**
+A: Run `ulimit -n 65535` to raise the file descriptor limit.
+
+**Q: AWS CLI error `Unknown parameter in CpuOptions: NestedVirtualization`**
+A: Upgrade AWS CLI to v2.34+.
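
Note on section 9 of the new QUICKSTART.md: the `curl` call to `/v1/exec` can also be issued from Python's standard library. The sketch below is illustrative only and does not describe the bundled SDK. The function names `build_exec_request` and `run_code` are hypothetical. The endpoint path, the `{"code": ...}` body and the `Authorization: Bearer` header format are taken from the docs in this patch, and the header is presumably only needed when the server has a keys file.

```python
import json
import urllib.request


def build_exec_request(code, base_url="http://localhost:8080", api_key=None):
    """Build the POST /v1/exec request shown in the curl example (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # assumption: only required when auth is enabled on the server
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(base_url + "/v1/exec", data=body,
                                  headers=headers, method="POST")


def run_code(code, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_exec_request(code, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With the server from section 9 running, `run_code("print(1+1)")` should return a dict with the fields shown in the expected curl response (`stdout`, `stderr`, `exit_code`, `fork_time_ms`).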