# Container Persistence Design

## Problem Statement

Currently, workspaces lose user environment state on stop/start because:
1. `docker run --rm` destroys the container when it exits
2. Only `/workspaces` is mounted from persistent storage
3. Home directory changes (`.claude`, `.ssh`, etc.) are lost
4. Installed packages (`apt install`, `pip install`) are lost

The current `persist-home.sh` approach only symlinks items that exist at build time, so dotfiles created later are not persisted.

## Goals

1. **Stop/Start preserves ALL state** - home directory, packages, configs
2. **Rebuild from scratch still works** - an explicit rebuild discards the container, resetting everything outside `/workspaces`
3. **Fresh environment variables on restart** - tokens such as `CODER_AGENT_TOKEN` change on each start
4. **No manual whitelist maintenance** - new dotfiles persist automatically

## Proposed Solution: Container Persistence

Instead of destroying the container on stop (`docker run --rm`), give it a stable name, keep it across VM stop/start, and restart it with `docker start`.

### Architecture

```
First Boot:
  VM Start → docker run --name <container> envbuilder → Build → Init

Restart:
  VM Start → docker start <container> → Envbuilder runs again → Skip build → Init

Rebuild:
  VM Start → docker rm <container> → docker run --name <container> → Full build
```
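
The choice between the three paths can be modeled as a small function. This is a hypothetical Go sketch: the names `bootPath`, `chooseBootPath`, and `dockerSteps` are illustrative and not part of the template (which makes the same decision in shell), and it assumes the rebuild path applies only when a rebuild is requested and a container already exists.

```go
package main

// bootPath identifies one of the three lifecycle paths in the diagram.
type bootPath string

const (
	firstBoot bootPath = "first-boot" // docker run --name <container> envbuilder
	restart   bootPath = "restart"    // docker start <container>
	rebuild   bootPath = "rebuild"    // docker rm <container>, then docker run
)

// chooseBootPath picks the path for a VM start (hypothetical helper).
func chooseBootPath(containerExists, forceRebuild bool) bootPath {
	switch {
	case containerExists && forceRebuild:
		// An explicit rebuild discards the old container first.
		return rebuild
	case containerExists:
		// Stop/start: the container keeps home directory, packages, configs.
		return restart
	default:
		// No container yet: build from scratch.
		return firstBoot
	}
}

// dockerSteps lists the docker subcommands each path runs, in order,
// as drawn in the diagram (flags and image arguments omitted).
func dockerSteps(p bootPath) []string {
	switch p {
	case restart:
		return []string{"start"}
	case rebuild:
		return []string{"rm", "run"}
	default:
		return []string{"run"}
	}
}
```

A rebuild requested when no container exists falls back to the first-boot path, since there is nothing to remove.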

## Implementation Plan

### Phase 1: Template Changes (No Envbuilder Modifications)

**File: `scripts/vm/run-envbuilder.sh`**

```bash
CONTAINER_NAME="coder-${workspace_name}"
ENV_FILE="/home/${linux_user}/env.txt"
# Host-side path of the persistent env copy. /home/${linux_user}/envbuilder is
# mounted at /workspaces, so the container sees it as /workspaces/.envbuilder-env.
HOST_ENV_COPY="/home/${linux_user}/envbuilder/.envbuilder-env"

# Refresh the persistent copy on every VM start (read again on restarts)
cp "$ENV_FILE" "$HOST_ENV_COPY"

# Reuse the container if it already exists (stop/start); otherwise create it
if docker container inspect "$CONTAINER_NAME" >/dev/null 2>&1; then
  echo "Container exists, starting..."
  docker start "$CONTAINER_NAME"
  # Block until the container exits, like the foreground `docker run` below
  # (envbuilder execs into the init process, which keeps running)
  docker wait "$CONTAINER_NAME"
else
  echo "Creating new container..."
  # No --rm: the container must survive VM stop/start.
  # The env copy needs no extra mount; it arrives via the /workspaces mount.
  docker run \
    --name "$CONTAINER_NAME" \
    --net=host \
    -h "${workspace_name}" \
    -v "/home/${linux_user}/envbuilder:/workspaces" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    ${ssh_agent_mount} \
    --env-file "$ENV_FILE" \
    "$image"
fi
```
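
**Challenge:** `docker start` reuses the configuration recorded when the container was created, including the variables read from `--env-file`. A restarted container therefore still sees the first boot's `CODER_AGENT_TOKEN` instead of the fresh value in `env.txt`.
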
### Phase 2: Envbuilder - Fresh Env on Restart

Add a new option to envbuilder, backed by a new `EnvFile string` field on the options struct:

```go
// options/options.go
{
	Flag:  "env-file",
	Env:   WithEnvPrefix("ENV_FILE"),
	Value: serpent.StringOf(&o.EnvFile),
	Description: "Path to an environment file (KEY=VALUE per line) to read at startup. " +
		"It is read on every run, allowing fresh env vars on container restart.",
},
```
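
**Behavior:**
1. At startup, before any other processing, check whether `ENVBUILDER_ENV_FILE` is set.
2. If it is set and the file exists, parse it and override the current environment variables with its values.
3. This lets a restarted container pick up a fresh `CODER_AGENT_TOKEN` and other per-start values. Because `opts` is already populated when `run()` starts, options parsed from the environment are not refreshed by this step; only later reads of the process environment (for example, by child processes started afterwards) see the new values.

**File: `envbuilder.go` (early in `run()`):**
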
```go
// Load fresh environment from file if specified.
// This enables container restart with new tokens.
if opts.EnvFile != "" {
	if err := loadEnvFile(opts.Filesystem, opts.EnvFile); err != nil {
		opts.Logger(log.LevelWarn, "Failed to load env file %s: %v", opts.EnvFile, err)
	} else {
		opts.Logger(log.LevelInfo, "Loaded fresh environment from %s", opts.EnvFile)
	}
}
```
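
`loadEnvFile` is not defined in this design. One possible shape is sketched below, assuming the file uses the same format as the `docker run --env-file` input it is copied from: `KEY=VALUE` per line, `#` comments and blank lines ignored, values taken verbatim with no quote stripping or `$VAR` expansion. The sketch reads from the host filesystem with `os.ReadFile`; the call above passes `opts.Filesystem`, so a real version would read through that abstraction instead. `parseEnvFile` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"unicode"
)

// parseEnvFile parses docker --env-file style data into a map.
// Leading whitespace is trimmed, blank lines and '#' comments are skipped,
// and everything after the first '=' is the value, taken verbatim.
func parseEnvFile(data string) (map[string]string, error) {
	env := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(data))
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := strings.TrimLeftFunc(scanner.Text(), unicode.IsSpace)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if key == "" || strings.IndexFunc(key, unicode.IsSpace) >= 0 {
			return nil, fmt.Errorf("line %d: invalid variable name %q", lineNo, key)
		}
		if !found {
			// A bare KEY means "keep the current value": nothing to override.
			continue
		}
		env[key] = value
	}
	return env, scanner.Err()
}

// loadEnvFile reads path and overrides the process environment with its
// values, so later reads of the environment see the fresh tokens.
func loadEnvFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	env, err := parseEnvFile(string(data))
	if err != nil {
		return fmt.Errorf("%s: %w", path, err)
	}
	for k, v := range env {
		if err := os.Setenv(k, v); err != nil {
			return err
		}
	}
	return nil
}
```

Docker resolves a bare `KEY` line by copying the value from the calling environment; inside the already-running process that value is already present, so skipping such lines has the same effect.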

### Phase 3: Explicit Rebuild Mechanism

**Template Parameter:**
```hcl
data "coder_parameter" "force_rebuild" {
  name         = "force_rebuild"
  display_name = "Force Rebuild"
  description  = "Delete container and rebuild from scratch. Use after devcontainer.json changes."
  type         = "bool"
  default      = "false"
  mutable      = true
  ephemeral    = true # Resets after each build
  order        = 30
}
```

**Startup Script:**
```bash
# Run this before the container-existence check in run-envbuilder.sh so the
# check falls through to a fresh `docker run`. Only the container is removed;
# the host directory mounted at /workspaces is kept.
if [ "${force_rebuild}" = "true" ]; then
  echo "Force rebuild requested, removing existing container..."
  docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
fi
```

### Phase 4: Caching Wins (Future Enhancement)

With container persistence, we could add:

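1. **Commit on graceful stop** - Save the container's writable layer to an image. `docker commit` does not capture bind-mounted data such as `/workspaces`, and the image inherits the container's environment, including tokens from `--env-file`.
   ```bash
   # In a shutdown hook
   docker commit "$CONTAINER_NAME" "${cache_repo}:${workspace_id}-runtime"
   # Push so the `docker pull` check on the next start can find the image
   docker push "${cache_repo}:${workspace_id}-runtime"
   ```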

2. **Use runtime image on next start** - Faster than rebuilding
   ```bash
   # Check for runtime cache
   if docker pull "${cache_repo}:${workspace_id}-runtime" 2>/dev/null; then
     builder_image="${cache_repo}:${workspace_id}-runtime"
   fi
   ```

## Migration Path

1. **v0.2.0**: Add `ENVBUILDER_ENV_FILE` support
2. **Template update**: Remove `--rm`, add container naming and lifecycle
3. **v0.3.0** (optional): Add commit-on-stop for runtime caching

## Backward Compatibility

- `ENVBUILDER_ENV_FILE` is optional - existing deployments work unchanged
- Template changes are opt-in per workspace
- Persistent containers should set `ENVBUILDER_SKIP_REBUILD=true` so envbuilder skips the build phase when an existing container is started again

## Testing Plan

1. Create a workspace, then install `claude` (creates `~/.claude`) and an `apt` package
2. Stop the workspace
3. Start the workspace
4. Verify `~/.claude` exists with all its contents and the package is still installed
5. Verify the agent connects using the fresh `CODER_AGENT_TOKEN`
6. Run a forced rebuild and verify it resets the container (home directory, packages) while `/workspaces` is kept

## Open Questions

1. **Container cleanup on delete** - Need to ensure the container is removed when the workspace is deleted
2. **Disk space** - Container filesystems can grow; may need periodic cleanup
3. **Image updates** - How should base image updates be handled? Force rebuild?