Commit f299a9d

README: add install instructions, honest limitations, when-to-use guide
- Add Install section (binary curl + pip install)
- Fix misleading "30-50x network storage" claim — note it's situational and only helps on cold reads from NFS/JuiceFS, not warm FUSE
- Add "When to Use It" section with honest guidance on where zerostart wins (large GPU packages, repeated cold starts) and where it doesn't (small packages, one-off scripts, local NVMe)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent afebcba commit f299a9d

File tree: 1 file changed (+29, −1)

README.md — 29 additions, 1 deletion
@@ -78,6 +78,19 @@ No resolution, no environment setup, no uv involved.
 
 CUDA libraries (nvidia-cublas, nvidia-cudnn, nvidia-nccl, etc.) are ~6GB and identical across torch, vllm, and diffusers environments. zerostart caches extracted wheels at `$ZEROSTART_CACHE/shared_wheels/` and hardlinks them into new environments — so the second torch-based environment skips downloading those 6GB entirely.
 
+## Install
+
+```bash
+# Rust binary (fast wheel installation + streaming)
+curl -fsSL https://github.com/gpu-cli/zerostart/releases/latest/download/zerostart-linux-x86_64 \
+  -o /usr/local/bin/zerostart && chmod +x /usr/local/bin/zerostart
+
+# Python SDK (model acceleration, integrations)
+pip install zerostart
+```
+
+Requires Linux + Python 3.10+ + `uv` (pre-installed on most GPU containers).
+
 ## Quick Start
 
 ```bash
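The shared-wheel hardlinking kept in the context lines above can be sketched in a few lines of Python. The paths and filenames below are illustrative stand-ins, not zerostart's actual cache layout; the point is only that a hardlink gives a new environment the cached file at zero extra disk cost.

```python
import os
import tempfile

# Illustrative stand-ins for a wheel cache and a fresh environment
# (both under /tmp, so they share a filesystem and can hardlink).
cache_dir = tempfile.mkdtemp(prefix="shared_wheels_")
env_dir = tempfile.mkdtemp(prefix="env_")

# Stand-in for a cached, extracted wheel (a real one would be GBs).
cached = os.path.join(cache_dir, "nvidia_cublas.whl")
with open(cached, "wb") as f:
    f.write(b"\x00" * 1024)

# Hardlink into the new environment: same inode, no copy, no download.
linked = os.path.join(env_dir, "nvidia_cublas.whl")
os.link(cached, linked)

# Both names resolve to the same underlying data.
assert os.stat(cached).st_ino == os.stat(linked).st_ino
```

Hardlinks require both paths to be on the same filesystem, which is presumably why the cache and environments live under one `$ZEROSTART_CACHE` root.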
@@ -141,9 +154,11 @@ Three transparent hooks eliminate the bottlenecks in standard model loading:
 |------|--------|---------------|---------|
 | Meta device init | `from_pretrained` | Skips random weight initialization (75% of load time) | ~4x |
 | Auto-cache | `from_pretrained` | Snapshots model on first load, mmap hydrate on repeat | ~9x |
-| Network volume fix | `safetensors.load_file` | Eager read instead of mmap on FUSE/NFS volumes | 30-50x on network storage |
+| Network volume fix | `safetensors.load_file` | Eager read instead of mmap on NFS/JuiceFS (cold reads) | situational* |
 | .bin conversion | `torch.load` | Converts legacy checkpoints to safetensors, mmaps on repeat | ~2x |
 
+*Network volume fix only helps on cold reads from network-backed filesystems where mmap page faults trigger network round-trips. On FUSE with warm page cache (most container providers), mmap is already fast.
+
 ### Benchmarks (model loading)
 
 Measured on RTX A6000 with Qwen2.5-7B (15.2GB):
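The "eager read instead of mmap" distinction in the footnote above boils down to two access patterns. The toy below shows both on a local file; it is not zerostart's hook implementation, just the I/O contrast: mmap faults pages in lazily as they are touched (each fault can mean a network round-trip on NFS/JuiceFS), while an eager read fetches the file in one sequential pass.

```python
import mmap
import os
import tempfile

# 1 MiB stand-in for a safetensors shard.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))

# mmap path: pages are faulted in lazily as they are touched.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lazy = bytes(m)

# Eager path: one sequential read into memory.
with open(path, "rb") as f:
    eager = f.read()

# Same bytes either way; only the I/O pattern differs.
assert lazy == eager
```

On local NVMe with a warm page cache the two are equally fast, which matches the footnote's caveat that the fix only pays off on cold network reads.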
@@ -237,6 +252,19 @@ Performance knobs via environment variables:
 ZS_PARALLEL_DOWNLOADS=32 ZS_CHUNK_MB=32 zerostart run -v -p torch test.py
 ```
 
+## When to Use It
+
+**Use zerostart when:**
+- Deploying large GPU packages (torch, vllm, diffusers) on container providers
+- Cold starts matter — spot instances, CI/CD, autoscaling
+- Loading large models (7B+) repeatedly — `accelerate()` gives 8-9x speedup
+- You want warm starts under 4 seconds for 177-package stacks
+
+**Don't bother when:**
+- Small packages (ruff, black, httpie) — uvx is fast enough, zerostart adds ~0.5s overhead
+- One-off scripts that don't repeat — cold start optimization doesn't pay off
+- You're already on local NVMe with models in page cache — mmap is already fast
+
 ## Requirements
 
 - Linux (container GPU providers: RunPod, Vast.ai, Lambda, etc.)
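The `ZS_PARALLEL_DOWNLOADS` / `ZS_CHUNK_MB` knobs in the context lines above suggest a chunked, pooled fetch. The sketch below shows that general pattern only; it reads a local file so it is self-contained, whereas a real downloader would issue HTTP range requests, and the variable names here are not zerostart internals.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PARALLEL = 4          # analogous to a parallel-downloads knob
CHUNK = 64 * 1024     # analogous to a chunk-size knob (64 KiB demo)

# 256 KiB blob standing in for a remote wheel or model shard.
path = os.path.join(tempfile.mkdtemp(), "blob.bin")
data = os.urandom(256 * 1024)
with open(path, "wb") as f:
    f.write(data)

def fetch_chunk(offset: int) -> bytes:
    """Fetch one fixed-size chunk at the given offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(CHUNK)

# Fan the chunk offsets out to a worker pool, reassemble in order.
offsets = range(0, os.path.getsize(path), CHUNK)
with ThreadPoolExecutor(max_workers=PARALLEL) as pool:
    chunks = list(pool.map(fetch_chunk, offsets))

assert b"".join(chunks) == data
```

`pool.map` preserves input order, so the chunks concatenate back into the original blob even though they were fetched concurrently.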
