This document helps AI coding agents work effectively with the envbox project. Read this first before making changes.
ALWAYS pull latest main and create a feature branch. NEVER push directly to main.
Before starting any work, ALWAYS do this:
# 1. Get latest main
git checkout main
git pull origin main
# 2. Create feature branch from latest main
git checkout -b your-branch-name
# 3. Make your changes and commit them
# ... do your work ...
git add .
git commit -m "your message"
# 4. Push to YOUR branch (not main!)
git push origin your-branch-name
# 5. Create a Pull Request for review
DO NOT:
- ❌ Start work without pulling latest main first
- ❌ Run git push origin main - this pushes directly to main and bypasses code review
- ❌ Create branches from stale/outdated main branches
envbox enables running non-privileged containers capable of running system-level software (e.g., dockerd, systemd) in Kubernetes. It wraps Nestybox sysbox to provide secure Docker-in-Docker capabilities.
Primary use case: Coder workspaces, though the project is general-purpose.
┌─────────────────────────────────────────────────────────────┐
│ Outer Container (Privileged) │
│ - Runs on the Kubernetes node │
│ - Starts sysbox-mgr, sysbox-fs, and dockerd │
│ - Managed by the envbox binary (/envbox) │
│ - Has elevated privileges to manage namespaces │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Inner Container (Unprivileged - MUST STAY SECURE) │ │
│ │ - User's actual workspace/workload │ │
│ │ - Created via sysbox runtime │ │
│ │ - Runs dockerd, systemd, or other system software │ │
│ │ - NEVER privileged - this is the security model │ │
│ │ - Name: "workspace_cvm" (InnerContainerName) │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Two-tier container model: Outer (privileged) manages inner (unprivileged)
- Security boundary: The inner container must remain unprivileged - this is non-negotiable
- Sysbox integration: The outer container runs sysbox components that enable the inner container to run system-level software securely
- User namespace mapping: UID/GID offset of 100000 (UserNamespaceOffset constant)
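As a rough illustration of the user namespace mapping above, the sketch below assumes the mapping is a plain fixed offset, so that an ID inside the inner container corresponds to that ID plus 100000 on the outer side. The names `userNamespaceOffset` and `hostID` are hypothetical and exist only for this example; the real constant is `UserNamespaceOffset` in the envbox source.

```go
package main

import "fmt"

// userNamespaceOffset mirrors the 100000 offset described above.
// It is an illustrative copy, not the constant from the envbox source.
const userNamespaceOffset = 100000

// hostID returns the outer-side ID that an inner-container UID or GID
// would map to, assuming a simple fixed-offset mapping.
func hostID(containerID int) (int, error) {
	if containerID < 0 {
		return 0, fmt.Errorf("invalid container ID %d", containerID)
	}
	return containerID + userNamespaceOffset, nil
}
```

Under that assumption, `hostID(0)` gives 100000 and `hostID(1000)` gives 101000, which is why changing the offset would leave files owned by earlier mappings inaccessible.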
NEVER make the inner container privileged.
The entire premise of envbox is providing secure system-level capabilities without granting actual privilege. The inner container is explicitly set to Privileged: false (see cli/docker.go:732). Any change that compromises this breaks the security model.
.
├── cmd/envbox/ # Main entry point
├── cli/ # CLI commands (docker.go is the main orchestrator - 1000+ lines)
├── dockerutil/ # Docker API client wrappers
├── xunix/ # Linux-specific utilities (GPU, mounts, devices)
├── background/ # Process management for sysbox components
├── sysboxutil/ # Sysbox manager interaction
├── buildlog/ # Build logging utilities
├── slogkubeterminate/ # Kubernetes termination signal handling
├── integration/ # Integration tests (require VM/physical machine)
│ └── integrationtest/ # Test helper package - maintain and improve this API
├── deploy/ # Dockerfile and deployment files
├── scripts/ # Build and utility scripts
└── xhttp/xio/ # HTTP and I/O utilities
- cli/docker.go: Main orchestration logic - starts sysbox, manages inner container lifecycle
- cmd/envbox/main.go: Entry point that calls cli.Root()
- integration/docker_test.go: Primary integration test suite
- Makefile: Build targets and sysbox version pinning
- deploy/Dockerfile: Multi-stage build for envbox image
- Go 1.24+
- Docker installed
- VM or physical machine for integration tests (Docker-in-Docker won't work for testing envbox)
- Linux kernel with seccomp API level >= 5
# Build the envbox binary
make build/envbox
# Run unit tests
make test
# Run integration tests (REQUIRED before PR)
CODER_TEST_INTEGRATION=1 make test-integration
# Format code (gofumpt + markdownfmt)
make fmt
# Build Docker image
make build/image/envbox
# Clean build artifacts
make clean
✅ Run make fmt to format code
✅ Run make test for unit tests
✅ Run CODER_TEST_INTEGRATION=1 make test-integration (critical!)
✅ Verify golangci-lint passes (CI will check)
✅ Update documentation if adding features
✅ If changing environment variables, update README.md table
✅ Push to feature branch, NOT main
Integration tests validate actual container behavior and are the most important validation. Unit tests are for input/output validation, but integration tests ensure correctness.
Why integration tests matter:
- They test the actual outer/inner container interaction
- They validate sysbox integration
- They catch subtle namespace, mount, and device issues
- They ensure GPU passthrough works correctly
# Set the environment variable to enable integration tests
export CODER_TEST_INTEGRATION=1
# Run all integration tests
make test-integration
# Run specific test
go test -v -count=1 ./integration/ -run TestDocker/Dockerd
Use the integration/integrationtest package helpers:
import (
	"os"
	"testing"
	"time"

	"github.com/ory/dockertest/v3"
	"github.com/stretchr/testify/require"

	"github.com/coder/envbox/integration/integrationtest"
)
func TestMyFeature(t *testing.T) {
t.Parallel()
if val, ok := os.LookupEnv("CODER_TEST_INTEGRATION"); !ok || val != "1" {
t.Skip("integration tests are skipped unless CODER_TEST_INTEGRATION=1")
}
pool, err := dockertest.NewPool("")
require.NoError(t, err)
tmpdir := integrationtest.TmpDir(t)
binds := integrationtest.DefaultBinds(t, tmpdir)
// Run envbox
resource := integrationtest.RunEnvbox(t, pool, &integrationtest.CreateDockerCVMConfig{
Image: integrationtest.DockerdImage,
Username: "root",
OuterMounts: binds,
})
// Wait for inner container's docker daemon
integrationtest.WaitForCVMDocker(t, pool, resource, time.Minute)
// Your test logic here
}
Integration test helpers:
- TmpDir(t) - Creates temporary directory (handles cleanup)
- MkdirAll(t, paths...) - Creates directories safely
- WriteFile(t, path, contents) - Writes test files
- RunEnvbox(t, pool, config) - Starts envbox container
- WaitForCVMDocker(t, pool, resource, timeout) - Waits for inner dockerd
- DefaultBinds(t, tmpdir) - Creates standard volume binds
Unit tests are for:
- Input validation and parsing (e.g., mount string parsing)
- Pure functions without side effects
- Mock-based testing of Docker API calls
Patterns:
- Use table-driven tests
- Mock external dependencies (see dockerutil/dockerfake/)
- Test edge cases and error conditions
- Define constant in cli/docker.go: EnvMyNewFeature = "CODER_MY_NEW_FEATURE"
- Add to dockerCmd.Flags() section: cliflag.StringVarP(cmd.Flags(), &flags.myFeature, "my-feature", "", EnvMyNewFeature, "default", "Description")
- Use in container creation logic (around line 730+ in cli/docker.go)
- Update README.md environment variable table
- Add integration test to verify behavior
- Parse mount in parseMounts() function
- Add mount to xunix.Mount slice
- Pass to inner container via Mounts field
- Test with integration test
- Detection logic goes in xunix/gpu.go or xunix/device.go
- Use regex patterns to identify relevant mounts/libraries
- Pass devices via Resources.Devices in container config
- Mount libraries via Mounts
- Test with actual GPU hardware (integration test may need special setup)
- Reproduce: Write integration test that fails with the bug
- Fix: Make minimal changes to fix the issue
- Verify: Ensure integration test now passes
- Check: Run full test suite (make test && make test-integration)
- Document: Add comments explaining non-obvious fixes
- Update README.md for user-facing changes
- Add inline comments for complex logic
- Update this AGENTS.md if development patterns change
- Keep examples current with actual code
The main orchestration logic (1000+ lines). Key responsibilities:
- Starts sysbox-mgr and sysbox-fs background processes
- Starts dockerd in outer container
- Waits for sysbox manager to be ready
- Pulls inner container image
- Creates and starts inner container with proper configuration
- Forwards signals to inner container
- Handles bootstrap script execution
Important functions:
- dockerCmd() - Main command logic
- dockerdArgs() - Generates dockerd arguments
- parseMounts() - Parses CODER_MOUNTS environment variable
- Inner container creation (around line 730)
Docker API client wrappers and utilities:
- client.go - Docker client creation and management
- container.go - Container operations
- image.go - Image pulling and metadata (including OS detection for GPU passthrough)
- daemon.go - Dockerd process management
- registry.go - Registry authentication and image pull secrets
- exec.go - Container exec operations
- network.go - Network configuration
Architecture-specific files:
- image_linux_amd64.go - AMD64-specific usr lib detection
- image_linux_arm64.go - ARM64-specific usr lib detection
Linux-specific utilities for system interactions:
- gpu.go - GPU detection and mount identification (regex-based)
- device.go - Device handling (/dev/fuse, /dev/net/tun, etc.)
- mount.go - Mount point handling
- sys.go - System information (kernel version, etc.)
- user.go - User/group operations
- exec.go - Process execution
- fs.go - Filesystem abstractions (can be mocked with xunixfake/)
- env.go - Environment variable utilities
- error.go - "No space left on device" detection
GPU detection patterns (see gpu.go):
- Mounts matching: (?i)(nvidia|vulkan|cuda)
- Libraries matching: (?i)(libgl(e|sx|\.)|nvidia|vulkan|cuda)
- Shared objects: \.so(\.[0-9\.]+)?$
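To make the three patterns above concrete, here is a minimal sketch that compiles them with Go's regexp package and applies them to paths and file names. The variable and function names are hypothetical, and treating a GPU library as a name that matches both the library pattern and the shared-object suffix is an assumption for illustration; the actual logic in xunix/gpu.go may combine them differently.

```go
package main

import "regexp"

// Patterns copied from the list above; the variable names are illustrative.
var (
	gpuMountRegex     = regexp.MustCompile(`(?i)(nvidia|vulkan|cuda)`)
	gpuLibraryRegex   = regexp.MustCompile(`(?i)(libgl(e|sx|\.)|nvidia|vulkan|cuda)`)
	sharedObjectRegex = regexp.MustCompile(`\.so(\.[0-9\.]+)?$`)
)

// isGPUMount reports whether a mount path looks GPU-related.
func isGPUMount(path string) bool {
	return gpuMountRegex.MatchString(path)
}

// isGPULibrary reports whether a file name looks like a GPU shared object.
// Requiring both patterns to match is an assumption made for this sketch.
func isGPULibrary(name string) bool {
	return gpuLibraryRegex.MatchString(name) && sharedObjectRegex.MatchString(name)
}
```

With these definitions, `isGPULibrary("libcuda.so.1")` and `isGPULibrary("libGLESv2.so.2")` are true, while `isGPULibrary("libc.so.6")` is false because the name matches only the shared-object suffix.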
Process management for long-running background processes (sysbox-mgr, sysbox-fs, dockerd):
- process.go - Process abstraction with stdout/stderr capture
- Handles process lifecycle and monitoring
- Logs process output for debugging
Important: Maintain and improve this API to make integration tests easier to write.
Current helpers:
- TmpDir(t) - Temporary directory creation
- MkdirAll(t, paths...) - Directory creation
- WriteFile(t, path, contents) - File writing
- RunEnvbox(t, pool, config) - Envbox container startup
- WaitForCVMDocker(t, pool, resource, timeout) - Dockerd readiness check
- DefaultBinds(t, tmpdir) - Standard volume binds
- CreateCoderToken(t) - Coder agent token creation
- Certificate handling utilities
Future improvements should make common testing patterns easier.
All configuration uses CODER_* prefixed environment variables. See README.md for complete list.
- CODER_INNER_IMAGE - Inner container image (required)
- CODER_INNER_USERNAME - Inner container username (required)
- CODER_AGENT_TOKEN - Coder agent token (required for Coder integration)
- CODER_AGENT_URL - Coder deployment URL
- CODER_BOOTSTRAP_SCRIPT - Script to run in inner container (typically starts agent)
- CODER_MOUNTS - Mount paths (format: src:dst[:ro],src:dst[:ro])
- CODER_ADD_GPU - Enable GPU passthrough (true/false)
- CODER_ADD_TUN - Add TUN device (true/false)
- CODER_ADD_FUSE - Add FUSE device (true/false)
- CODER_CPUS - CPU limit for inner container
- CODER_MEMORY - Memory limit for inner container (bytes)
- CODER_INNER_ENVS - Environment variables to pass to inner container (supports wildcards)
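The CODER_MOUNTS format above (src:dst[:ro], comma-separated) can be sketched as a small parser. This is not the project's parseMounts() implementation: the `mount` type and `parseMountSpec` function are hypothetical, and rejecting any option other than `ro` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// mount is a hypothetical stand-in for the project's mount type.
type mount struct {
	Source   string
	Dest     string
	ReadOnly bool
}

// parseMountSpec parses "src:dst[:ro],src:dst[:ro]" into mounts.
// An empty spec yields no mounts and no error.
func parseMountSpec(spec string) ([]mount, error) {
	if strings.TrimSpace(spec) == "" {
		return nil, nil
	}
	var mounts []mount
	for _, entry := range strings.Split(spec, ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || len(parts) > 3 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid mount %q: want src:dst[:ro]", entry)
		}
		m := mount{Source: parts[0], Dest: parts[1]}
		if len(parts) == 3 {
			// Assumption: "ro" is the only option accepted in the third field.
			if parts[2] != "ro" {
				return nil, fmt.Errorf("invalid mount option %q in %q", parts[2], entry)
			}
			m.ReadOnly = true
		}
		mounts = append(mounts, m)
	}
	return mounts, nil
}
```

For example, `parseMountSpec("/home/coder:/home/coder,/data:/data:ro")` returns two mounts, the second with `ReadOnly` set. Edge cases like these are a natural fit for the table-driven unit tests mentioned earlier.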
- Go 1.24+ required (see go.mod)
- Docker API client pinned to specific version (avoid breaking changes)
- coder/coder v2.14.4 - Main Coder integration
- coder/tailscale fork - Not important to agents; version should match coder/coder's go.mod
- sysbox 0.6.7 - Exact version with SHA pinned in Makefile
- Docker CE 27.3.1 - Pinned in Dockerfile
- sysbox 0.6.7 - Downloaded and verified via SHA256 in Dockerfile
- Linux kernel - Requires seccomp API level >= 5
- Ubuntu 22.04 (Jammy) - Base image
Updating sysbox:
- Update SYSBOX_VERSION in Makefile
- Update SYSBOX_SHA in Makefile (get from nestybox releases)
- Update both ARG values in deploy/Dockerfile
- Test thoroughly with integration tests
Updating coder/coder dependency:
- Check coder/coder's go.mod for tailscale version
- Update both dependencies in go.mod
- Run go mod tidy
- Verify integration tests pass
- Template: Coder templates define envbox containers as Kubernetes pods
- Agent: Bootstrap script installs and starts Coder agent in inner container
- Token: CODER_AGENT_TOKEN authenticates agent with Coder deployment
- Workspace: Inner container becomes user's workspace environment
- Environment variables follow Coder naming conventions
- Agent must start successfully for workspace to be usable
- GPU passthrough important for ML/AI workspaces
- Mount handling critical for persistent home directories
- Network configuration affects agent connectivity
See coder/coder repo examples for reference templates.
- Push directly to main - Always use feature branches and PRs
- Make inner container privileged - Breaks entire security model
- Skip integration tests - They catch real-world issues unit tests miss
- Test envbox inside envbox - Won't work; requires VM/physical machine
- Change sysbox version without updating SHA - Build will fail verification
- Ignore "no space left on device" errors - These have special handling (see noSpaceDataDir)
- Modify user namespace offset (100000) - Will break existing container mappings
- Remove signal forwarding - Inner container won't receive termination signals
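To show why signal forwarding matters, the sketch below relays signals from a channel to a callback until the channel closes or the context is cancelled. It is an illustration, not the code in cli/docker.go: `forwardSignals` and `demoForward` are hypothetical names, and in a real program the channel would be registered with signal.Notify and the callback would deliver the signal to the inner container.

```go
package main

import (
	"context"
	"os"
)

// forwardSignals relays every signal received on sigs to forward until the
// context is cancelled or sigs is closed. Registering sigs with
// signal.Notify and delivering to the inner container are left to the caller.
func forwardSignals(ctx context.Context, sigs <-chan os.Signal, forward func(os.Signal)) {
	for {
		select {
		case <-ctx.Done():
			return
		case sig, ok := <-sigs:
			if !ok {
				return
			}
			forward(sig)
		}
	}
}

// demoForward feeds one interrupt through forwardSignals and returns the
// names of the signals that reached the forward callback.
func demoForward() []string {
	sigs := make(chan os.Signal, 1)
	sigs <- os.Interrupt
	close(sigs)

	var seen []string
	forwardSignals(context.Background(), sigs, func(s os.Signal) {
		seen = append(seen, s.String())
	})
	return seen
}
```

If the relay loop is removed, nothing calls the callback, which corresponds to the inner container never seeing a termination signal and being killed without a chance to shut down cleanly.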
- Always use feature branches - Create branch, push to it, then PR
- Run integration tests on every change - They're the source of truth
- Use integrationtest helpers - Consistent, reliable test setup
- Preserve backward compatibility - Existing workspaces depend on envbox behavior
- Test GPU passthrough on real hardware - Mock tests won't catch driver issues
- Log important events - Helps debugging in production
- Handle errors gracefully - Users should understand what went wrong
- Update documentation - Keep README.md and this file current
- Kernel compatibility: sysbox requires seccomp API level >= 5
- Storage drivers: Overlay2 doesn't work on top of overlay (use vfs fallback)
- Disk space: Special handling when user PVC is full (noSpaceDataDir)
- GPU libraries: Must mount symlinked shared objects correctly
- AWS EKS: Special handling for web identity tokens
- Idmapped mounts: May need disabling on some systems (CODER_DISABLE_IDMAPPED_MOUNT)
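The disk-space case above depends on recognizing a "no space left on device" failure. One plausible way to detect it is sketched here, assuming a check against syscall.ENOSPC with a fallback to matching the message text; the function names are hypothetical, and the detection in xunix/error.go may work differently.

```go
package main

import (
	"errors"
	"strings"
	"syscall"
)

// noSpaceMessage is the text the kernel reports for ENOSPC.
const noSpaceMessage = "no space left on device"

// noSpaceInMessage reports whether a log line or error string mentions
// the out-of-space condition, ignoring case.
func noSpaceInMessage(msg string) bool {
	return strings.Contains(strings.ToLower(msg), noSpaceMessage)
}

// isNoSpaceErr reports whether err indicates a full disk, either because it
// wraps syscall.ENOSPC or because its message contains the ENOSPC text.
func isNoSpaceErr(err error) bool {
	if err == nil {
		return false
	}
	return errors.Is(err, syscall.ENOSPC) || noSpaceInMessage(err.Error())
}
```

The string fallback is there because errors surfaced through dockerd output arrive as plain text rather than as wrapped errno values.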
- Bug fixes - Especially:
  - Container lifecycle issues
  - Mount and volume problems
  - GPU passthrough failures
  - Signal handling bugs
  - Network configuration issues
- Integration tests - Add tests for:
  - New features
  - Bug reproductions
  - Edge cases
  - Different inner images
- Documentation - Improve:
  - README.md clarity
  - Code comments in complex sections
  - Integration test examples
  - This AGENTS.md file
- Small features - Incremental improvements:
  - New environment variables
  - Additional mount options
  - Device passthrough enhancements
  - Better error messages
- Large architectural changes (discuss with maintainers first)
- Performance optimizations (profile first)
- Refactoring (ensure integration tests cover affected code)
# Outer container logs
kubectl logs <pod-name>
# Inner container logs (from outer container)
docker logs workspace_cvm
# Sysbox manager status (the socket should exist once sysbox-mgr is running)
ls -l /run/sysbox/sysmgr.sock
- Sysbox not starting: Check kernel version and seccomp support
- Inner container fails to start: Check image pull secrets and registry auth
- GPU not detected: Verify mounts, check xunix/gpu.go regex patterns
- Bootstrap script fails: Examine script execution logs
- Out of space: Check if vfs fallback is being used
// Keep test containers around on failure:
// modify the test so it skips cleanup when it has failed
if !t.Failed() {
os.RemoveAll(tmpdir)
}
# Inspect running test container
docker ps | grep envbox
docker exec -it <container-id> bash
# Check inner container
docker exec -it <outer-container-id> docker ps
docker exec -it <outer-container-id> docker logs workspace_cvm
- golangci-lint v1.64.8 runs in CI
- shellcheck for shell scripts
- gofumpt for Go formatting (stricter than gofmt)
- markdownfmt for Markdown files
Run locally:
make fmt # Format code
# golangci-lint runs automatically in CI
- Follow standard Go conventions
- Use descriptive variable names
- Add comments for non-obvious logic
- Keep functions focused and reasonably sized
- Use error wrapping with xerrors.Errorf
- Structured logging with slog
// ✅ Good
if err != nil {
return xerrors.Errorf("pull inner image: %w", err)
}
// ❌ Bad
if err != nil {
return err // Lost context
}
// Use structured logging
log.Info(ctx, "starting inner container",
slog.F("image", innerImage),
slog.F("username", username))
// Don't use fmt.Println
- ci.yaml - Main CI pipeline:
  - Linting (golangci-lint, shellcheck)
  - Formatting checks
  - Unit tests
  - Integration tests
  - Security scanning
- release.yaml - Release automation
- latest.yaml - Latest tag updates
Integration tests run on ubuntu-latest-8-cores runners with proper permissions. They're the gate for merging PRs.
- README.md - User documentation and configuration reference
- This AGENTS.md - Developer/agent guidance
- Integration tests - Examples of correct usage patterns
- Sysbox docs - https://github.com/nestybox/sysbox/tree/master/docs
- Coder docs - https://coder.com/docs
- Coder template example - https://github.com/coder/coder/tree/main/examples/templates/envbox
- Read relevant code sections first
- Check existing tests for patterns
- Start with small, focused changes
- Write integration test to verify behavior
- Run full test suite before submitting
- Update documentation as needed
- Push to feature branch and create PR
Remember: Never push to main. The security of the inner container is paramount. Integration tests are mandatory. Make incremental changes. Help maintain the integrationtest package API.