❄️ FrostGPU

Stateless GPU workstations on GCP. Pay only when you compute — $0.00 when idle.

FrostGPU implements a freeze-and-thaw pattern for GPU-heavy workloads. Your environment and data are frozen into cheap cold storage between sessions. When you're ready to work, thaw a fresh Spot VM in seconds — fully loaded, ready to run.

Provision → Work → Persist → Destroy. All in one lifecycle.

Architecture: Freeze the State, Rent the Compute

Most cloud GPUs bill you for the persistent disk as long as the VM exists, even when it's "Stopped." FrostGPU breaks the workstation into three cost-optimized layers:

❄️ Immutable Environment (Snapshots): Your OS, drivers, and uv/python environments are baked into "Golden Images." Snapshots are compressed — a 50GB disk often results in a 10GB snapshot.
⚡ Ephemeral Compute (Spot GPU): High-performance GPUs at Spot rates (~$0.18/hr for T4). Destroyed immediately after your session.
🗄️ Cold Progress (GCS Sync): Datasets and training outputs are synced to Regional GCS buckets ($0.02/GB) and mapped back to the VM disk on every boot.

The Economics of Cold Persistence

For a standard workspace in europe-west2 (London) with a 10GB OS Snapshot and 90GB of GCS Data:

Persistence Mode	Status	Monthly Cost
GCP Always On	Running 24/7	~$129.60
RunPod/Lambda	Stopped Pod	~$20.00
Traditional GCP	Stopped VM (Disk only)	~$10.00
❄️ FrostGPU	Snapshot + GCS (frozen)	~$2.30

Session Lifecycle

make up — Thaw your workstation
- Finds the latest timestamped Golden Image
- Provisions a fresh Spot VM with your configured hardware
- Rsyncs your models/datasets from GCS to the local disk
make tunnel — Start working
- Opens an SSH session with multi-port forwarding (Jupyter, WebUIs, Tensorboard) based on SSH_FORWARDS
make sync — Save mid-session
- Pushes current progress to GCS
make down — Freeze and destroy
- Final rsync to GCS and destruction of the VM

📋 Prerequisites

Before you begin, ensure you have:

GCP Account: A project with billing enabled.
gcloud CLI: Installed and authenticated (gcloud auth login).
Project Quota: GPU quota in your target zone (e.g., NVIDIA_T4_GPUS, NVIDIA_L4_GPUS).
APIs Enabled: Compute Engine and Cloud Storage APIs must be active.

Getting Started

1. Configure Copy the base environment file and fill in your values.

cp .env.example .env
vi .env

2. Initialize & Bake your Golden Image One-time setup — creates your infrastructure and bakes the first frozen environment.

make init     # Creates GCS bucket + base VM
make ssh      # Install your tools/libraries (ComfyUI, PyTorch, etc.)
make snapshot # Bakes the Golden Image and destroys the VM

3. Daily Workflow

make up       # Thaw — spin up your workstation
make tunnel   # Work — open port tunnels (7860, 8888, etc.)
make down     # Freeze — sync to GCS and destroy the VM

4. Model Downloader (Cost Optimization) Download large models without paying GPU rates. The downloader VM uses GCS FUSE mounting (gcsfuse) — directories defined in SYNC_DIRS are mounted directly onto GCS. Anything written to the local path lands in GCS in real-time, no manual sync required.

make dl-up    # Launch a cheap e2-small VM with FUSE-mounted GCS dirs
make dl-ssh   # SSH in and download models (writes go straight to GCS)
make dl-down  # Unmount and destroy the instance

Note: make dl-sync is a no-op for downloader VMs — it exits early with a warning since files are already in GCS via the FUSE mount.

🌐 Multi-Environment Management

Maintain multiple environments (e.g., L4 in Seoul vs. T4 in London) with multiple .env files.

1. Create a new environment

cp .env .env.t4
vi .env.t4  # Update ZONE, MACHINE_TYPE, ACCELERATOR, and VM_NAME

2. Switch environments Change the ENV variable at the top of the Makefile:

ENV ?= .env.t4  # Point to your new environment

All commands (make up, make down, etc.) will automatically target that environment.

⚙️ Advanced: Power User Workflows

🔄 Re-baking the Golden Image

If you install new system libraries (apt) or global Python packages:

make up
make ssh → install new tools
make snapshot

A new timestamped image is created and used for all future boots.

🔌 Arbitrary Port Tunneling

Define ports in your .env to open them during make tunnel:

SSH_FORWARDS=8888:8888 6006:6006 7860:7860

📂 Directory Syncing

Map VM directories to GCS subdirectories in your .env:

SYNC_DIRS=/home/user/models:models /home/user/outputs:outputs

🛠️ Troubleshooting & Logs

Slow init: SSH in and run tail -f /var/log/gpu-driver-install.log.
Spot preemption: Your work is safe in GCS up to the last make sync. Just run make up to thaw a fresh VM.
Full cleanup: Delete all cloud resources with make teardown.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

❄️ FrostGPU

Architecture: Freeze the State, Rent the Compute

The Economics of Cold Persistence

Session Lifecycle

📋 Prerequisites

Getting Started

🌐 Multi-Environment Management

⚙️ Advanced: Power User Workflows

🔄 Re-baking the Golden Image

🔌 Arbitrary Port Tunneling

📂 Directory Syncing

🛠️ Troubleshooting & Logs

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

❄️ FrostGPU

Architecture: Freeze the State, Rent the Compute

The Economics of Cold Persistence

Session Lifecycle

📋 Prerequisites

Getting Started

🌐 Multi-Environment Management

⚙️ Advanced: Power User Workflows

🔄 Re-baking the Golden Image

🔌 Arbitrary Port Tunneling

📂 Directory Syncing

🛠️ Troubleshooting & Logs

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages