Stateless GPU workstations on GCP. Pay only when you compute — $0.00 when idle.
FrostGPU implements a freeze-and-thaw pattern for GPU-heavy workloads. Your environment and data are frozen into cheap cold storage between sessions. When you're ready to work, thaw a fresh Spot VM in seconds — fully loaded, ready to run.
Provision → Work → Persist → Destroy. All in one lifecycle.
Most cloud GPUs bill you for the persistent disk as long as the VM exists, even when it's "Stopped." FrostGPU breaks the workstation into three cost-optimized layers:
- ❄️ Immutable Environment (Snapshots): Your OS, drivers, and uv/Python environments are baked into "Golden Images." Snapshots are compressed: a 50GB disk often results in a 10GB snapshot.
- ⚡ Ephemeral Compute (Spot GPU): High-performance GPUs at Spot rates (~$0.18/hr for a T4). Destroyed immediately after your session.
- 🗄️ Cold Progress (GCS Sync): Datasets and training outputs are synced to Regional GCS buckets (~$0.02/GB/month) and mapped back to the VM disk on every boot.
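The ephemeral layer is an ordinary Spot instance. A hand-rolled equivalent of the provisioning step might look like the following sketch (instance name, zone, and machine type are illustrative, not FrostGPU's actual values):

```sh
# Illustrative Spot GPU provisioning; preemptible, billed at Spot rates
gcloud compute instances create frostgpu-vm \
    --zone=europe-west2-b \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE
```

With `--instance-termination-action=DELETE`, a preempted VM is deleted rather than stopped, so no persistent disk lingers to accrue charges.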
For a standard workspace in europe-west2 (London) with a 10GB OS Snapshot and 90GB of GCS Data:
| Persistence Mode | Status | Monthly Cost |
|---|---|---|
| GCP Always On | Running 24/7 | ~$129.60 |
| RunPod/Lambda | Stopped Pod | ~$20.00 |
| Traditional GCP | Stopped VM (Disk only) | ~$10.00 |
| ❄️ FrostGPU | Snapshot + GCS (frozen) | ~$2.30 |
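The FrostGPU row is easy to sanity-check. Assuming snapshot storage at roughly $0.05/GB/month (check current GCP pricing) and Regional GCS at ~$0.02/GB/month:

```shell
# Back-of-envelope frozen cost: 10GB snapshot + 90GB GCS data
# ($0.05/GB snapshot rate is an assumption; verify against GCP pricing)
awk -v snap_gb=10 -v gcs_gb=90 \
    'BEGIN { printf "$%.2f/month\n", snap_gb * 0.05 + gcs_gb * 0.02 }'
# → $2.30/month
```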
- `make up` — Thaw your workstation
  - Finds the latest timestamped Golden Image
  - Provisions a fresh Spot VM with your configured hardware
  - Rsyncs your models/datasets from GCS to the local disk
- `make tunnel` — Start working
  - Opens an SSH session with multi-port forwarding (Jupyter, WebUIs, TensorBoard) based on `SSH_FORWARDS`
- `make sync` — Save mid-session
  - Pushes current progress to GCS
- `make down` — Freeze and destroy
  - Final rsync to GCS and destruction of the VM
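Under the hood, multi-port forwarding is plain `ssh -L`. A sketch of how an `SSH_FORWARDS` list could expand into flags (the real Makefile may differ; `user@vm` is a placeholder):

```shell
# Expand "local:remote" pairs into ssh -L forwards (illustrative)
SSH_FORWARDS="8888:8888 6006:6006 7860:7860"
args=""
for fwd in $SSH_FORWARDS; do
  args="$args -L ${fwd%%:*}:localhost:${fwd##*:}"
done
echo "ssh$args user@vm"
# prints: ssh -L 8888:localhost:8888 -L 6006:localhost:6006 -L 7860:localhost:7860 user@vm
```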
Before you begin, ensure you have:
- GCP Account: A project with billing enabled.
- gcloud CLI: Installed and authenticated (`gcloud auth login`).
- Project Quota: GPU quota in your target zone (e.g., `NVIDIA_T4_GPUS`, `NVIDIA_L4_GPUS`).
- APIs Enabled: The Compute Engine and Cloud Storage APIs must be active.
1. Configure

Copy the base environment file and fill in your values.

```sh
cp .env.example .env
vi .env
```

2. Initialize & Bake your Golden Image

One-time setup: creates your infrastructure and bakes the first frozen environment.

```sh
make init      # Creates GCS bucket + base VM
make ssh       # Install your tools/libraries (ComfyUI, PyTorch, etc.)
make snapshot  # Bakes the Golden Image and destroys the VM
```

3. Daily Workflow
```sh
make up      # Thaw — spin up your workstation
make tunnel  # Work — open port tunnels (7860, 8888, etc.)
make down    # Freeze — sync to GCS and destroy the VM
```

4. Model Downloader (Cost Optimization)
Download large models without paying GPU rates. The downloader VM uses GCS FUSE mounting (gcsfuse): directories defined in `SYNC_DIRS` are mounted directly onto GCS, so anything written to the local path lands in GCS in real time, with no manual sync required.
```sh
make dl-up    # Launch a cheap e2-small VM with FUSE-mounted GCS dirs
make dl-ssh   # SSH in and download models (writes go straight to GCS)
make dl-down  # Unmount and destroy the instance
```

Note: `make dl-sync` is a no-op for downloader VMs; it exits early with a warning, since files are already in GCS via the FUSE mount.
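For intuition, each `SYNC_DIRS` pair could map onto a `gcsfuse` invocation roughly like this sketch (bucket name is a placeholder; the real Makefile may differ):

```shell
# Print the gcsfuse mount command each SYNC_DIRS pair would imply
BUCKET="my-frostgpu-bucket"   # placeholder bucket name
SYNC_DIRS="/home/user/models:models /home/user/outputs:outputs"
for pair in $SYNC_DIRS; do
  local_dir=${pair%%:*}    # VM-side path
  gcs_subdir=${pair##*:}   # subdirectory inside the bucket
  echo gcsfuse --implicit-dirs --only-dir "$gcs_subdir" "$BUCKET" "$local_dir"
done
```

`--only-dir` scopes the mount to one bucket subdirectory, and `--implicit-dirs` makes GCS "folders" visible even without explicit directory objects.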
Maintain multiple environments (e.g., L4 in Seoul vs. T4 in London) with multiple .env files.
1. Create a new environment

```sh
cp .env .env.t4
vi .env.t4   # Update ZONE, MACHINE_TYPE, ACCELERATOR, and VM_NAME
```

2. Switch environments
Change the `ENV` variable at the top of the Makefile:

```make
ENV ?= .env.t4   # Point to your new environment
```

All commands (`make up`, `make down`, etc.) will automatically target that environment.
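For reference, the `.env.t4` file from step 1 might end up looking like this (all values here are illustrative, not defaults):

```sh
# .env.t4 (illustrative values only)
ZONE=europe-west2-b
MACHINE_TYPE=n1-standard-4
ACCELERATOR=nvidia-tesla-t4
VM_NAME=frostgpu-t4-london
```

Because `ENV` is assigned with `?=`, it can also be overridden per invocation on the make command line (e.g., `make up ENV=.env.t4`) without editing the Makefile.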
If you install new system libraries (apt) or global Python packages:
```sh
make up        # thaw the current environment
make ssh       # install the new tools
make snapshot  # bake a new Golden Image
```
A new timestamped image is created and used for all future boots.
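The bake step presumably boils down to something like the following sketch (image prefix, VM name, and zone are assumptions; timestamped names make the latest image sort last):

```sh
# Hypothetical bake: image the boot disk under a timestamped name,
# then delete the VM so nothing keeps billing
gcloud compute images create "frostgpu-golden-$(date +%Y%m%d-%H%M%S)" \
    --source-disk=frostgpu-vm \
    --source-disk-zone=europe-west2-b
gcloud compute instances delete frostgpu-vm \
    --zone=europe-west2-b --quiet
```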
Define ports in your `.env` to open them during `make tunnel`:

```sh
SSH_FORWARDS=8888:8888 6006:6006 7860:7860
```

Map VM directories to GCS subdirectories in your `.env`:
```sh
SYNC_DIRS=/home/user/models:models /home/user/outputs:outputs
```

- Slow init: SSH in and run `tail -f /var/log/gpu-driver-install.log`.
- Spot preemption: Your work is safe in GCS up to the last `make sync`; just run `make up` to thaw a fresh VM.
- Full cleanup: Delete all cloud resources with `make teardown`.