61 changes: 61 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,61 @@
name: Release

on:
  push:
    tags:
      - 'v*'

permissions:
  contents: write

jobs:
  build-and-release:
    name: Build and Release
    runs-on: ubuntu-22.04

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: x86_64-unknown-linux-musl

      - name: Install musl tools
        run: sudo apt-get install -y musl-tools

      - name: Build (static binary)
        run: |
          cargo build --release --target x86_64-unknown-linux-musl
          strip target/x86_64-unknown-linux-musl/release/zeroboot

      - name: Rename binary
        run: |
          VERSION=${GITHUB_REF_NAME}
          cp target/x86_64-unknown-linux-musl/release/zeroboot \
            zeroboot-${VERSION}-x86_64-linux

      - name: Create Release
        uses: softprops/action-gh-release@v2
        with:
          name: "${{ github.ref_name }}"
          body: |
            ## Install

            ```bash
            VERSION=${{ github.ref_name }}
            curl -fsSL "https://github.com/${{ github.repository }}/releases/download/${VERSION}/zeroboot-${VERSION}-x86_64-linux" \
              -o /usr/local/bin/zeroboot
            chmod +x /usr/local/bin/zeroboot
            zeroboot --help
            ```

            ## Requirements
            - Linux x86_64
            - `/dev/kvm` available (bare metal or EC2 c8i/m8i/r8i with nested virtualization)
            - Firecracker: `v1.12+` (see [install guide](docs/QUICKSTART.md))
          files: |
            zeroboot-${{ github.ref_name }}-x86_64-linux
          draft: false
          prerelease: false
16 changes: 9 additions & 7 deletions README.md
@@ -33,11 +33,12 @@ curl -X POST https://api.zeroboot.dev/v1/exec \

| Metric | Zeroboot | E2B | microsandbox | Daytona |
|---|---|---|---|---|
-| Spawn latency p50 | **0.79ms** | ~150ms | ~200ms | ~27ms |
-| Spawn latency p99 | 1.74ms | ~300ms | ~400ms | ~90ms |
-| Memory per sandbox | ~265KB | ~128MB | ~50MB | ~50MB |
-| Fork + exec (Python) | **~8ms** | - | - | - |
-| 1000 concurrent forks | 815ms | - | - | - |
+| Spawn latency p50 | **0.65ms** | ~150ms | ~200ms | ~27ms |
+| Spawn latency p99 | 1.00ms | ~300ms | ~400ms | ~90ms |
+| Memory per sandbox | ~169KB | ~128MB | ~50MB | ~50MB |
+| Fork + echo (serial) | **~5.8ms** | - | - | - |
+| Fork + Python exec | **~205ms** | - | - | - |
+| Filesystem (cat file) | **~30ms** | N/A | - | - |

Each sandbox is a real KVM virtual machine with hardware-enforced memory isolation.

@@ -49,8 +50,9 @@ Each sandbox is a real KVM virtual machine with hardware-enforced memory isolati
```

1. **Template** (one-time): Firecracker boots a VM, pre-loads your runtime, and snapshots memory + CPU state
-2. **Fork** (~0.8ms): Creates a new KVM VM, maps snapshot memory as CoW, restores all CPU state
-3. **Isolation**: Each fork is a separate KVM VM with hardware-enforced memory isolation
+2. **Fork** (~0.8ms): Creates a new KVM VM, maps snapshot memory as CoW, restores CPU state + virtio-blk device
+3. **Filesystem**: Each fork has an independent overlay block device — reads hit the shared base image, writes are isolated in-memory
+4. **Isolation**: Each fork is a separate KVM VM with hardware-enforced memory isolation

## SDKs

17 changes: 17 additions & 0 deletions docs/API.md
@@ -113,3 +113,20 @@ Authorization: Bearer zb_live_key1
- If no keys file exists, auth is disabled
- Invalid or missing keys return **HTTP 401**
- Rate limited at **100 req/s per key** (HTTP 429)

## Filesystem Access

Sandboxes have a full read-write filesystem backed by the rootfs image specified at template
creation time. Each fork gets an independent overlay so writes are isolated:

```python
# Works inside CODE: commands
import os
os.makedirs("/tmp/mydir", exist_ok=True)
with open("/tmp/mydir/output.txt", "w") as f:
    f.write("hello from sandbox")
print(open("/tmp/mydir/output.txt").read()) # hello from sandbox
```

Writes are discarded when the fork ends — the base image is never modified.
To persist outputs, include them in the stdout response.
112 changes: 110 additions & 2 deletions docs/ARCHITECTURE.md
@@ -17,6 +17,7 @@
│ 4. Restore CPU: sregs → XCRS → XSAVE │
│ → regs → LAPIC → MSRs → MP state │
│ 5. Serial I/O via 16550 UART emulation │
│ 6. Virtio-blk MMIO + overlay CoW disk │
└──────────────┬──────────────────────────┘
┌───────────────────┼───────────────────┐
@@ -54,12 +55,14 @@ Each fork gets its own KVM VM with private memory pages. Writes trigger CoW page

| File | Purpose |
|---|---|
-| `src/vmm/kvm.rs` | Fork engine: KVM VM + CoW mmap + CPU state restore |
-| `src/vmm/vmstate.rs` | Firecracker vmstate parser with auto-detect offsets |
+| `src/vmm/kvm.rs` | Fork engine: KVM VM + CoW mmap + CPU state restore + virtio-blk integration |
+| `src/vmm/vmstate.rs` | Firecracker vmstate parser: auto-detect offsets + virtio queue addr detection |
+| `src/vmm/virtio_blk.rs` | **Virtio-blk MMIO emulator + overlay CoW block device** |
| `src/vmm/firecracker.rs` | Template creation via Firecracker API |
| `src/vmm/serial.rs` | 16550 UART emulation for guest I/O |
| `src/api/handlers.rs` | HTTP API: exec, batch, health, metrics, auth |
| `src/main.rs` | CLI: template, test-exec, bench, serve |
+| `guest/init.c` | Guest PID 1: serial command dispatcher + CODE: execution via popen(python3) |
| `sdk/python/` | Python SDK (zero dependencies) |
| `sdk/node/` | TypeScript SDK (zero dependencies, uses fetch) |
| `deploy/` | systemd service + fleet deploy script |
@@ -81,3 +84,108 @@ Firecracker's CPUID filtering confuses numpy's runtime CPU feature detection. Se
### IOAPIC Restore Pattern

Don't zero-init `kvm_irqchip`. Use `KVM_GET_IRQCHIP` first, then overwrite the redirect table entries from the snapshot, then `KVM_SET_IRQCHIP`. Zero-initializing corrupts other irqchip state and causes interrupt routing failures.
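
The get, overwrite, set sequence above can be sketched as follows. This is a minimal illustration only: `IoapicState` and `MockVm` are hypothetical stand-ins for the real `kvm_irqchip` structure and the KVM VM handle, and `get_irqchip` / `set_irqchip` stand in for the `KVM_GET_IRQCHIP` / `KVM_SET_IRQCHIP` ioctls.

```rust
/// Hypothetical, simplified stand-in for the IOAPIC part of `kvm_irqchip`.
#[derive(Clone, Debug, PartialEq)]
pub struct IoapicState {
    pub base_address: u64,
    pub id: u32,
    pub redirtbl: [u64; 24],
}

/// Hypothetical VM handle that only models the irqchip state.
pub struct MockVm {
    ioapic: IoapicState,
}

impl MockVm {
    pub fn new() -> Self {
        MockVm {
            ioapic: IoapicState { base_address: 0xFEC0_0000, id: 0, redirtbl: [0; 24] },
        }
    }

    /// Stand-in for KVM_GET_IRQCHIP.
    pub fn get_irqchip(&self) -> IoapicState {
        self.ioapic.clone()
    }

    /// Stand-in for KVM_SET_IRQCHIP.
    pub fn set_irqchip(&mut self, state: IoapicState) {
        self.ioapic = state;
    }
}

/// Restore only the redirect table, leaving all other irqchip state intact.
pub fn restore_redirect_table(vm: &mut MockVm, snapshot: &[u64; 24]) {
    let mut state = vm.get_irqchip(); // 1. start from the live state, not zeroes
    state.redirtbl = *snapshot;       // 2. overwrite only the redirect entries
    vm.set_irqchip(state);            // 3. write the merged state back
}
```

Starting from the live state is what keeps fields such as the base address untouched; a zero-initialized structure would overwrite them with zeroes on the set call.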

## Virtio-Blk Filesystem Emulation

Zeroboot implements a full virtio-blk MMIO device emulator (`src/vmm/virtio_blk.rs`) so each
forked VM has a working filesystem without relying on Firecracker at runtime.

### Data Flow

```
rootfs.ext4 (read-only, shared across all forks via Arc<File>)
    |
    v
OverlayBlockDevice (per-fork in-memory CoW layer)
    |  read:  check overlay HashMap<sector, Vec<u8>> first
    |         on miss: pread() from shared base image
    |  write: insert sector into overlay (base image untouched)
    v
VirtioBlk MMIO emulator (handles KVM_EXIT_MmioWrite / MmioRead)
    |  guest writes QueueNotify to 0xC0001000+0x050
    |    -> read avail ring -> parse descriptor chain
    |    -> dispatch read/write/flush -> update used ring -> inject IRQ (GSI 5)
    v
Guest kernel ext4 (transparent block device /dev/vda)
```
```
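
The "dispatch read/write/flush" step depends on the first descriptor of each chain, which carries a 16-byte request header defined by the virtio specification: a little-endian `u32` type, a `u32` reserved field, and a `u64` sector number. A minimal sketch of decoding it follows; `BlkRequest` and `parse_blk_header` are illustrative names, not taken from `src/vmm/virtio_blk.rs`.

```rust
/// Decoded virtio-blk request kind (illustrative type, not from the source tree).
#[derive(Debug, PartialEq)]
pub enum BlkRequest {
    Read { sector: u64 },  // VIRTIO_BLK_T_IN = 0
    Write { sector: u64 }, // VIRTIO_BLK_T_OUT = 1
    Flush,                 // VIRTIO_BLK_T_FLUSH = 4
    Unsupported(u32),
}

/// Parse the 16-byte header: type (le32), reserved (le32), sector (le64).
/// Returns None if the buffer is too short to hold a header.
pub fn parse_blk_header(hdr: &[u8]) -> Option<BlkRequest> {
    if hdr.len() < 16 {
        return None;
    }
    let mut type_bytes = [0u8; 4];
    type_bytes.copy_from_slice(&hdr[0..4]);
    let mut sector_bytes = [0u8; 8];
    sector_bytes.copy_from_slice(&hdr[8..16]);

    let req_type = u32::from_le_bytes(type_bytes);
    let sector = u64::from_le_bytes(sector_bytes);
    Some(match req_type {
        0 => BlkRequest::Read { sector },
        1 => BlkRequest::Write { sector },
        4 => BlkRequest::Flush,
        other => BlkRequest::Unsupported(other),
    })
}
```

The sector field counts 512-byte units, which lines up with the sector-keyed overlay described under Key Algorithms.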

### Key Algorithms

**Overlay CoW isolation**
- Each fork owns a `HashMap<u64, Vec<u8>>` keyed by 512-byte sector number
- Writes only touch the overlay; the shared base image is opened O_RDONLY
- Forks never observe each other's writes; overlay is freed when the fork is dropped
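
A minimal sketch of the overlay lookup described above. To stay self-contained it assumes an in-memory `Arc<Vec<u8>>` as a stand-in for the shared `Arc<File>` base image (where the miss path would be a `pread()`); the method names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::Arc;

const SECTOR_SIZE: usize = 512;

/// Per-fork copy-on-write view over a shared, read-only base image.
pub struct OverlayBlockDevice {
    base: Arc<Vec<u8>>,             // stand-in for the shared O_RDONLY image
    overlay: HashMap<u64, Vec<u8>>, // sector number -> private copy
}

impl OverlayBlockDevice {
    pub fn new(base: Arc<Vec<u8>>) -> Self {
        OverlayBlockDevice { base, overlay: HashMap::new() }
    }

    /// Read one sector: overlay first, then fall through to the base image.
    pub fn read_sector(&self, sector: u64) -> Option<Vec<u8>> {
        if let Some(data) = self.overlay.get(&sector) {
            return Some(data.clone());
        }
        let start = (sector as usize).checked_mul(SECTOR_SIZE)?;
        let end = start.checked_add(SECTOR_SIZE)?;
        self.base.get(start..end).map(|s| s.to_vec())
    }

    /// Write one sector into the overlay; the base image is never touched.
    /// Returns false if `data` is not exactly one sector long.
    pub fn write_sector(&mut self, sector: u64, data: &[u8]) -> bool {
        if data.len() != SECTOR_SIZE {
            return false;
        }
        self.overlay.insert(sector, data.to_vec());
        true
    }
}
```

Two devices built from the same `Arc` share the base bytes but keep separate maps, so a write through one is invisible to the other, and dropping a device frees its overlay.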

**VIRTIO_F_EVENT_IDX suppression fix**
- The guest uses event index suppression to avoid redundant `QueueNotify` writes
- After draining the queue, the emulator writes `last_avail_idx` into the `avail_event`
field that follows the last entry of the used ring, which tells the guest "notify me for the next request"
- Without this update, only the first I/O per wake-up would be processed
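
The effect of a stale `avail_event` can be checked with the guest-side test from the virtio specification (`vring_need_event`). The sketch below ports that test to Rust and shows where the field sits in a split virtqueue's used ring; `publish_avail_event` is a hypothetical helper operating on a plain byte slice rather than real guest memory.

```rust
/// Byte offset of `avail_event` inside the used ring:
/// flags (2) + idx (2) + `queue_size` entries of 8 bytes each.
pub fn avail_event_offset(queue_size: u16) -> u64 {
    4 + 8 * queue_size as u64
}

/// Guest-side decision from the virtio spec: notify the device only if
/// `event_idx` falls within the batch of entries just published.
pub fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}

/// Device side (hypothetical helper): after draining the queue, store the
/// next index the device expects, little-endian, into `avail_event`.
pub fn publish_avail_event(used_ring: &mut [u8], queue_size: u16, last_avail_idx: u16) {
    let off = avail_event_offset(queue_size) as usize;
    used_ring[off..off + 2].copy_from_slice(&last_avail_idx.to_le_bytes());
}
```

With `avail_event` equal to the index the device drained up to, the next single request satisfies the test and triggers a `QueueNotify`. With an older value the test is false and the guest stays silent.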

**Vmstate queue address detection (Firecracker v1.12 + v1.15)**
- v1.15 serializes `GuestAddress` with a 2-byte Versionize prefix: `[0x02][u32_LE]`
- Parser searches for pattern `[02][u32][02][u32][02][u32]` (desc/avail/used ring GPA)
- Falls back to 3-consecutive-raw-u64 format for Firecracker v1.12 compatibility
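
A rough sketch of the tagged-pattern search, under stated assumptions: each address is a `0x02` byte followed by a little-endian `u32`, the three addresses appear back to back, and a candidate is accepted only if all three are nonzero and inside guest memory. That plausibility filter, `QueueAddrs`, and `find_queue_addrs` are assumptions for illustration, and the raw-`u64` fallback is not shown.

```rust
/// Guest-physical addresses of the three split-virtqueue rings (illustrative type).
#[derive(Debug, PartialEq)]
pub struct QueueAddrs {
    pub desc: u64,
    pub avail: u64,
    pub used: u64,
}

/// Read one `[0x02][u32 LE]` group starting at `at`, if present.
fn read_tagged_u32(buf: &[u8], at: usize) -> Option<u32> {
    let chunk = buf.get(at..at + 5)?;
    if chunk[0] != 0x02 {
        return None;
    }
    Some(u32::from_le_bytes([chunk[1], chunk[2], chunk[3], chunk[4]]))
}

/// Scan for three consecutive tagged addresses and return the first
/// candidate that passes the (assumed) plausibility filter.
pub fn find_queue_addrs(buf: &[u8], mem_size: u64) -> Option<QueueAddrs> {
    for i in 0..buf.len() {
        let groups = (
            read_tagged_u32(buf, i),
            read_tagged_u32(buf, i + 5),
            read_tagged_u32(buf, i + 10),
        );
        let (desc, avail, used) = match groups {
            (Some(d), Some(a), Some(u)) => (d as u64, a as u64, u as u64),
            _ => continue,
        };
        let plausible = |addr: u64| addr != 0 && addr < mem_size;
        if plausible(desc) && plausible(avail) && plausible(used) {
            return Some(QueueAddrs { desc, avail, used });
        }
    }
    None
}
```

A byte-pattern scan like this can produce false positives on unrelated data, which is why some sanity check on the decoded addresses is worth keeping.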

### Guest Init Protocol

`guest/init.c` is a statically-linked PID 1 that mounts filesystems and listens on `/dev/ttyS0`:

| Host sends | Guest action | Guest responds |
|---|---|---|
| `CODE:<python_code>\n` | Writes code to `/tmp/zb_code.py`, runs `python3 /tmp/zb_code.py 2>&1` | stdout + stderr |
| `echo <text>\n` | Writes text to serial | text |
| `cat <path>\n` | Opens file, reads to serial | file contents |
| *(any command)* | After response | `ZEROBOOT_DONE\n` |
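
On the host side, the table above implies a simple framing: send one newline-terminated command, then read serial output until the `ZEROBOOT_DONE\n` marker arrives. A minimal sketch follows; the function names are hypothetical, and how multi-line Python source is framed is not stated in this document, so the sketch does not address it.

```rust
/// Marker the guest emits after every response.
pub const DONE_MARKER: &str = "ZEROBOOT_DONE\n";

/// Build a `CODE:` request line for the guest init (hypothetical helper).
pub fn encode_code_request(code: &str) -> String {
    format!("CODE:{}\n", code)
}

/// Given everything read from the serial port so far, return the command
/// output once the completion marker has arrived, or None if the response
/// is still incomplete.
pub fn extract_response(serial_buf: &str) -> Option<&str> {
    serial_buf.find(DONE_MARKER).map(|end| &serial_buf[..end])
}
```

A caller would keep appending serial bytes to a buffer and call `extract_response` after each read, which also copes with the marker being split across reads.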

### Building a Custom Rootfs

```bash
# 1. Bootstrap Ubuntu 22.04 minimal
sudo debootstrap --arch=amd64 jammy /tmp/rootfs http://archive.ubuntu.com/ubuntu/
echo "deb http://archive.ubuntu.com/ubuntu jammy main universe" | sudo tee /tmp/rootfs/etc/apt/sources.list

# 2. Install Python 3 + scientific packages
sudo chroot /tmp/rootfs apt-get update -qq
sudo chroot /tmp/rootfs apt-get install -y python3 python3-pip gcc
sudo chroot /tmp/rootfs pip3 install numpy pandas

# 3. Compile and install guest init (statically linked)
sudo cp guest/init.c /tmp/rootfs/init.c
sudo chroot /tmp/rootfs gcc -O2 -static -o /init /init.c
sudo rm /tmp/rootfs/init.c

# 4. Package as ext4 image (~1.5 GB)
dd if=/dev/zero of=rootfs.ext4 bs=1M count=1500
mkfs.ext4 -F rootfs.ext4
sudo mount -o loop rootfs.ext4 /mnt/out
sudo cp -a /tmp/rootfs/. /mnt/out/
sudo umount /mnt/out
```

### Creating a Template

```bash
# Boot the VM via Firecracker, wait for guest to reach the serial listen loop, snapshot
./zeroboot template vmlinux.bin rootfs.ext4 ./workdir 10 /init 512
# <kernel> <rootfs> <workdir> <wait_s> <init> <mem_mib>
```

`workdir/rootfs_path` is written automatically and picked up by `bench` / `serve` / `test-exec`.

### Observed Performance (c8i.xlarge, nested virtualization)

| Metric | Value |
|---|---|
| Pure CoW mmap P50 | 0.7 µs |
| Full fork (KVM + CPU restore) P50 | **655 µs** |
| Full fork P99 | **996 µs** |
| Fork + echo hello P50 | 5.8 ms |
| Fork + CODE:print(1+1) | ~205 ms |
| Fork + CODE:import numpy; ... | ~450 ms |
| Fork + cat /etc/os-release | ~30 ms |
| Memory per fork (100 concurrent) | ~169 KB |

Python exec latency (200–450 ms) reflects on-demand `.so` loading through virtio-blk.
This is a one-time cost per fork; pages are reused across CoW forks once warm.
**Planned optimization**: pre-warm all `.so` pages before snapshotting → zero disk I/O after fork.