- Add Install section (binary curl + pip install)
- Fix misleading "30-50x network storage" claim — note it's situational
and only helps on cold reads from NFS/JuiceFS, not warm FUSE
- Add "When to Use It" section with honest guidance on where zerostart
wins (large GPU packages, repeated cold starts) and where it doesn't
(small packages, one-off scripts, local NVMe)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md (29 additions, 1 deletion)
@@ -78,6 +78,19 @@ No resolution, no environment setup, no uv involved.
CUDA libraries (nvidia-cublas, nvidia-cudnn, nvidia-nccl, etc.) are ~6GB and identical across torch, vllm, and diffusers environments. zerostart caches extracted wheels at `$ZEROSTART_CACHE/shared_wheels/` and hardlinks them into new environments — so the second torch-based environment skips downloading those 6GB entirely.
Requires Linux + Python 3.10+ + `uv` (pre-installed on most GPU containers).
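The hardlink-based wheel sharing described above can be sketched in a few lines of stdlib Python. This is an illustrative sketch under the cache layout named in the text (`shared_wheels/`); the function name is hypothetical, not zerostart's actual code:

```python
import os
from pathlib import Path

def link_shared_wheel(cache_dir: str, env_dir: str, wheel_name: str) -> None:
    """Hardlink every file of an extracted wheel from the shared cache into
    a new environment: no bytes are copied, only directory entries."""
    src = Path(cache_dir) / "shared_wheels" / wheel_name
    dst = Path(env_dir) / wheel_name
    for f in src.rglob("*"):
        target = dst / f.relative_to(src)
        if f.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            os.link(f, target)  # same inode, so zero extra disk space
```

Hardlinks require the cache and the environment to live on the same filesystem; a real implementation would fall back to copying otherwise.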
## Quick Start
```bash
@@ -141,9 +154,11 @@ Three transparent hooks eliminate the bottlenecks in standard model loading:
|------|--------|---------------|---------|
| Meta device init |`from_pretrained`| Skips random weight initialization (75% of load time) |~4x |
| Auto-cache |`from_pretrained`| Snapshots model on first load, mmap hydrate on repeat |~9x |
-| Network volume fix |`safetensors.load_file`| Eager read instead of mmap on FUSE/NFS volumes | 30-50x on network storage|
+| Network volume fix |`safetensors.load_file`| Eager read instead of mmap on NFS/JuiceFS (cold reads) | situational*|
| .bin conversion |`torch.load`| Converts legacy checkpoints to safetensors, mmaps on repeat |~2x |
+
+*Network volume fix only helps on cold reads from network-backed filesystems, where mmap page faults trigger network round-trips. On FUSE with a warm page cache (most container providers), mmap is already fast.
+
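One plausible way such a hook can decide between mmap and an eager read is to look up the file's filesystem type in `/proc/mounts`. The helper below is a sketch only: the function name and the set of filesystem types are assumptions, not zerostart's implementation.

```python
import os

NETWORK_FS = {"nfs", "nfs4", "cifs", "lustre", "fuse.juicefs"}

def is_network_fs(path: str, mounts: str = "/proc/mounts") -> bool:
    """Return True when `path` lives on a network-backed filesystem,
    where mmap page faults on a cold cache become network round-trips."""
    path = os.path.realpath(path)
    best_len, best_type = -1, ""
    with open(mounts) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            mountpoint, fstype = parts[1], parts[2]
            covers = (mountpoint == "/" or path == mountpoint
                      or path.startswith(mountpoint + "/"))
            # longest matching mountpoint wins, as in kernel path resolution
            if covers and len(mountpoint) > best_len:
                best_len, best_type = len(mountpoint), fstype
    return best_type in NETWORK_FS
```

JuiceFS mounts typically report `fuse.juicefs` as their type; the list above would need tuning per deployment.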
### Benchmarks (model loading)
Measured on RTX A6000 with Qwen2.5-7B (15.2GB):
@@ -237,6 +252,19 @@ Performance knobs via environment variables:
ZS_PARALLEL_DOWNLOADS=32 ZS_CHUNK_MB=32 zerostart run -v -p torch test.py
```
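A sketch of how the two knobs above could drive a range-based fetcher, shown here as a chunked parallel copy so it stays self-contained. Only the env var names come from the README; the function and its defaults are illustrative assumptions:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_fetch(src_path: str, dst_path: str) -> None:
    """Copy `src_path` in fixed-size chunks on a thread pool -- the same
    shape as a ranged-GET downloader. Illustrative sketch only."""
    workers = int(os.environ.get("ZS_PARALLEL_DOWNLOADS", "8"))   # default assumed
    chunk = int(os.environ.get("ZS_CHUNK_MB", "16")) * 1024 * 1024  # default assumed
    size = os.path.getsize(src_path)
    # Pre-size the destination so each worker can write its own range.
    with open(dst_path, "wb") as f:
        f.truncate(size)
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        def copy_range(offset: int) -> None:
            data = os.pread(src, chunk, offset)  # positional read, thread-safe
            os.pwrite(dst, data, offset)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(copy_range, range(0, size, chunk)))
    finally:
        os.close(src)
        os.close(dst)
```

`os.pread`/`os.pwrite` take an explicit offset, so workers never contend on a shared file position.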
+
+## When to Use It
+
+**Use zerostart when:**
+- Deploying large GPU packages (torch, vllm, diffusers) on container providers