Tako VM implements defense-in-depth to safely execute untrusted code.
┌─────────────────────────────────────────────────────────────┐
│ Input Validation │
│ (Size limits, sanitization) │
├─────────────────────────────────────────────────────────────┤
│ Container Isolation │
│ (Docker with security restrictions) │
├─────────────────────────────────────────────────────────────┤
│ Syscall Filtering │
│ (Seccomp whitelist) │
├─────────────────────────────────────────────────────────────┤
│ Resource Limits │
│ (Memory, CPU, time, file size) │
├─────────────────────────────────────────────────────────────┤
│ Output Sanitization │
│ (Capped output, error filtering) │
└─────────────────────────────────────────────────────────────┘
By default, containers have no network access:
docker run --network=none ...
This prevents:
- Data exfiltration
- Command & control communication
- Attacks on internal services
- Cryptocurrency mining pools
Selective Network Access
For jobs that need network (e.g., API calls), configure per job type:
job_types:
  - name: api-client
    network_enabled: true
When network_enabled: true, containers can access any external host. For strict egress control, use external firewalls or Kubernetes NetworkPolicy.
The container root filesystem is mounted read-only:
docker run --read-only ...
Writable locations:
- /output/ - For results
- /tmp/ - Temporary files (noexec)
All Linux capabilities are dropped except those required for privilege dropping:
docker run --cap-drop=ALL --cap-add=SETUID --cap-add=SETGID ...
Note on no-new-privileges: Tako VM does NOT use --security-opt=no-new-privileges because it conflicts with gosu, which is used to drop from root to the sandbox user after installing dependencies. The privilege drop flow is:
- Container starts as root (required for dependency installation)
- gosu drops privileges to sandbox user (uid 1000)
- User code executes as unprivileged sandbox user
This trade-off is necessary because:
- Dependencies may require root to install (e.g., system packages)
- gosu uses setuid to switch users securely
- no-new-privileges blocks setuid, breaking the privilege drop
The risk is mitigated by:
- gVisor runtime (userspace kernel) blocks most privilege escalation
- Seccomp profile restricts dangerous syscalls
- Code runs as non-root after the privilege drop
Code runs as unprivileged user (uid 1000) inside the container:
# In Dockerfile
USER sandbox
# At runtime (enforced by Tako VM)
docker run --user=1000:1000 ...
This is controlled by enable_userns: true (default). Even if container code somehow modifies the Dockerfile or image, the --user flag at runtime ensures non-root execution.
Containers are destroyed after each execution:
docker run --rm ...
No persistent state survives between executions.
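The container isolation flags described above can be collected into one argument list. The following is a minimal sketch, not Tako VM's actual implementation: `IsolationOptions` and `build_isolation_args` are hypothetical names introduced for illustration, and only flags quoted in this section are used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IsolationOptions:
    # Hypothetical settings object; mirrors the per-job-type
    # `network_enabled` option described in this section.
    network_enabled: bool = False


def build_isolation_args(opts: IsolationOptions) -> list[str]:
    """Assemble the isolation-related `docker run` flags from this section."""
    args = [
        "docker", "run",
        "--rm",               # destroy the container after execution
        "--read-only",        # read-only root filesystem
        "--cap-drop=ALL",     # drop every Linux capability...
        "--cap-add=SETUID",   # ...except those needed for the privilege drop
        "--cap-add=SETGID",
    ]
    if not opts.network_enabled:
        # Default: no network access at all.
        args.append("--network=none")
    return args


if __name__ == "__main__":
    print(" ".join(build_isolation_args(IsolationOptions())))
```

Keeping the flags in one function makes it easy to check that a job type with `network_enabled: true` differs from the default only in the network flag.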
Seccomp (Secure Computing Mode) restricts available syscalls.
The whitelist includes safe operations:
- File I/O (read, write, open, close)
- Memory (mmap, brk)
- Process (exit, getpid)
- Time (clock_gettime)
Dangerous syscalls are blocked:
- ptrace - Process debugging
- mount - Filesystem mounting
- reboot - System reboot
- sethostname - Hostname changes
- init_module - Kernel modules
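How an allowlist profile resolves a given syscall can be sketched in a few lines. This is an illustration only: the profile below is a shortened, assumed example (not the contents of the real profile file), `action_for` is a hypothetical helper, and real seccomp evaluation happens in the kernel and can also match on syscall arguments.

```python
import json

# Assumed, shortened example profile: deny by default, allow a named set.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "mmap", "brk"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
"""


def action_for(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # Anything not explicitly listed falls through to the default action.
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)

if __name__ == "__main__":
    for name in ("read", "ptrace", "mount"):
        print(name, action_for(profile, name))
```

Because the default action is an error, syscalls such as ptrace or mount are blocked simply by being absent from the allowed names.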
The profile is at tako_vm/seccomp_profile.json:
{
"defaultAction": "SCMP_ACT_ERRNO",
"syscalls": [
{
"names": ["read", "write", "open", ...],
"action": "SCMP_ACT_ALLOW"
}
]
}
docker run --memory=512m --memory-swap=512m ...
Prevents:
- Memory exhaustion attacks
- Fork bombs consuming RAM
docker run --cpus=1.0 ...
Prevents CPU starvation of other processes.
docker run --pids-limit=100 ...
Prevents fork bombs.
docker run --ulimit=fsize=104857600 ... # 100MB
Prevents disk-filling attacks.
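The resource flags above can be derived from a handful of numbers. A minimal sketch, assuming a hypothetical helper `build_resource_args` whose parameter names are invented for illustration; the defaults reproduce the values quoted in this section.

```python
from __future__ import annotations


def build_resource_args(
    memory_mb: int = 512,
    cpus: float = 1.0,
    pids_limit: int = 100,
    max_file_mb: int = 100,
) -> list[str]:
    """Build the resource-limit `docker run` flags shown in this section."""
    # 100 MB expressed in bytes is 100 * 1024 * 1024 = 104857600.
    fsize_bytes = max_file_mb * 1024 * 1024
    return [
        f"--memory={memory_mb}m",
        # Setting memory-swap equal to memory leaves no extra swap available.
        f"--memory-swap={memory_mb}m",
        f"--cpus={cpus}",
        f"--pids-limit={pids_limit}",
        f"--ulimit=fsize={fsize_bytes}",
    ]


if __name__ == "__main__":
    print(" ".join(build_resource_args()))
```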
Enforced timeout kills long-running processes:
timeout = job.get("timeout", 30)
subprocess.run(..., timeout=timeout)
| Input | Limit | Configuration |
|---|---|---|
| Code | 100KB | max_code_bytes |
| Input data | 1MB | max_input_bytes |
| Timeout | 300s | max_timeout |
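A request-time check against the input limits in this table might look like the sketch below. It is not Tako VM's code: the job keys "code" and "input", the `LimitExceeded` exception, and the choice to reject (rather than clamp) an oversized timeout are all assumptions, as is the reading of KB and MB as binary units.

```python
# Limits from the table above, assuming 1KB = 1024 bytes.
MAX_CODE_BYTES = 100 * 1024
MAX_INPUT_BYTES = 1024 * 1024
MAX_TIMEOUT = 300
DEFAULT_TIMEOUT = 30  # default used when the job sets no timeout


class LimitExceeded(ValueError):
    """Hypothetical error raised when a job violates an input limit."""


def check_job_limits(job: dict) -> int:
    """Validate a job against the input limits and return its timeout."""
    code_size = len(job.get("code", "").encode("utf-8"))
    input_size = len(job.get("input", "").encode("utf-8"))
    timeout = job.get("timeout", DEFAULT_TIMEOUT)

    if code_size > MAX_CODE_BYTES:
        raise LimitExceeded(f"code is {code_size} bytes, limit {MAX_CODE_BYTES}")
    if input_size > MAX_INPUT_BYTES:
        raise LimitExceeded(f"input is {input_size} bytes, limit {MAX_INPUT_BYTES}")
    if not 0 < timeout <= MAX_TIMEOUT:
        raise LimitExceeded(f"timeout {timeout}s outside (0, {MAX_TIMEOUT}]")
    return timeout
```

Sizes are measured in encoded bytes rather than characters, so multi-byte text counts at its real size.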

| Output | Limit | Configuration |
|---|---|---|
| stdout | 64KB | max_stdout_bytes |
| stderr | 64KB | max_stderr_bytes |
| Single artifact | 10MB | max_artifact_bytes |
| Total artifacts | 50MB | max_total_artifacts_bytes |
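The output caps can be illustrated with two small helpers. This is a sketch under assumptions: `cap_output` and `artifacts_within_limits` are hypothetical names, KB and MB are read as binary units, and how Tako VM actually reports truncation or handles oversized artifacts is not specified here.

```python
from __future__ import annotations

# Limits from the table above, assuming 1KB = 1024 bytes.
MAX_STDOUT_BYTES = 64 * 1024
MAX_ARTIFACT_BYTES = 10 * 1024 * 1024
MAX_TOTAL_ARTIFACTS_BYTES = 50 * 1024 * 1024


def cap_output(data: bytes, limit: int = MAX_STDOUT_BYTES) -> tuple[bytes, bool]:
    """Truncate captured output to `limit` bytes; flag whether it was cut."""
    if len(data) <= limit:
        return data, False
    return data[:limit], True


def artifacts_within_limits(sizes: dict[str, int]) -> bool:
    """Check per-file and total artifact sizes (name -> size in bytes)."""
    if any(size > MAX_ARTIFACT_BYTES for size in sizes.values()):
        return False
    return sum(sizes.values()) <= MAX_TOTAL_ARTIFACTS_BYTES
```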
When building job type containers, Tako VM validates all inputs to prevent injection attacks:
| Validation | Function | Description |
|---|---|---|
| Docker image | validate_docker_image() | Rejects shell injection, newlines, special characters |
| Python version | validate_python_version() | Only allows 3.8, 3.9, 3.10, 3.11, 3.12, etc. |
| Pip packages | validate_pip_requirement() | Rejects URLs, path specifiers, shell characters |
| Environment keys | validate_env_key() | POSIX-compliant variable names only |
| Environment values | validate_env_value() | Rejects control characters, backticks, $ |
| Shared code paths | Path validation | Prevents directory traversal |
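To make the rules in this table concrete, here is a rough sketch of three such checks. The functions below are hypothetical stand-ins written for illustration and are deliberately named differently from the real validators; the exact patterns Tako VM uses are not shown in this document and may be stricter.

```python
import re

# POSIX-style name: letters, digits, underscore; must not start with a digit.
_ENV_KEY_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Assumed shape of an allowed version string: "3.<minor>".
_PY_VERSION_RE = re.compile(r"3\.\d{1,2}")


def is_valid_env_key(key: str) -> bool:
    """Accept only POSIX-style environment variable names."""
    return _ENV_KEY_RE.fullmatch(key) is not None


def is_valid_env_value(value: str) -> bool:
    """Reject backticks, '$', and ASCII control characters."""
    if "`" in value or "$" in value:
        return False
    return not any(ord(ch) < 32 or ord(ch) == 127 for ch in value)


def is_valid_python_version(version: str) -> bool:
    """Accept strings like '3.11'; anything with extra text is rejected."""
    return _PY_VERSION_RE.fullmatch(version) is not None
```

Using `fullmatch` rather than `match` matters here: the whole string must fit the pattern, so trailing text such as a newline or `; apt install ...` causes rejection.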
Example attack prevention:
# These malicious inputs are rejected:
# Docker image injection
base_image = "python:3.11\nRUN rm -rf /" # ❌ Rejected
# Python version injection
python_version = "3.11; apt install malware" # ❌ Rejected
# Pip package injection
requirements = ["numpy; rm -rf /"] # ❌ Rejected
# Environment variable injection
environment = {"PATH": "$HOME/malware"} # ❌ Rejected
Output artifacts are validated before collection:
# is_safe_filename() rejects:
- Path separators (/, \)
- Parent directory references (..)
- Hidden files (.filename)
This prevents containers from creating artifacts that could overwrite or read unauthorized files.
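The three rejection rules listed for artifact filenames can be sketched as follows. `is_safe_filename_sketch` is a hypothetical illustration, not the body of the real `is_safe_filename()`; rejecting the empty string is an added assumption.

```python
def is_safe_filename_sketch(name: str) -> bool:
    """Apply the three artifact filename rules described above."""
    if not name:
        return False  # assumption: an empty name is never valid
    if "/" in name or "\\" in name:
        return False  # path separators
    if ".." in name:
        return False  # parent directory references
    if name.startswith("."):
        return False  # hidden files
    return True


if __name__ == "__main__":
    for candidate in ("result.csv", "../etc/passwd", ".bashrc", "a\\b"):
        print(candidate, is_safe_filename_sketch(candidate))
```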
Stack traces are sanitized to prevent information leakage:
# Internal path: /var/lib/tako-vm/workspace/job-123/code/main.py
# Sanitized: /code/main.py
Always use TLS in production:
listen 443 ssl http2;
ssl_protocols TLSv1.2 TLSv1.3;
Tako VM protects against:
| Threat | Mitigation |
|---|---|
| Code execution escape | Container isolation, seccomp |
| Resource exhaustion | Memory, CPU, time limits |
| Data exfiltration | Network isolation |
| Disk filling | File size limits |
| Information leakage | Output sanitization |
Tako VM does NOT protect against:
| Threat | Reason |
|---|---|
| Docker daemon compromise | Requires Docker access |
| Host kernel exploits | Containers share kernel |
| Side-channel attacks | Shared CPU/memory |
| Timing attacks | Execution time visible |
For higher security, consider:
- gVisor (supported by Tako VM)
- Kata Containers
- Dedicated execution hosts
- VM-based isolation
Tako VM supports gVisor (runsc) for strong container isolation. gVisor provides a userspace kernel that intercepts and emulates syscalls, adding a significant security boundary beyond standard Docker. By default, Tako VM runs in permissive mode, which falls back to runc if gVisor is not installed.
| Benefit | Description |
|---|---|
| Userspace kernel | Syscalls handled in userspace, not host kernel |
| Reduced attack surface | Most kernel vulnerabilities don't affect gVisor |
| Container escape prevention | Much harder to escape to host |
| Production-tested | Used by Google Cloud Run, GKE Sandbox |
gVisor is required for strict security mode. Install it following the official gVisor installation guide.
Ubuntu/Debian:
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
sudo apt-get update && sudo apt-get install -y runsc
sudo runsc install
sudo systemctl restart docker
Verify installation:
docker run --runtime=runsc --rm hello-world
# tako_vm.yaml
container_runtime: runsc # 'runsc' (gVisor) or 'runc' (standard Docker)
security_mode: strict # 'strict' (require gVisor) or 'permissive' (fallback)
Security modes:
- permissive (default): Falls back to standard runc runtime with a warning. Works on all platforms.
- strict: Fails with RuntimeUnavailableError if gVisor is not available. Recommended for production.
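The difference between the two modes comes down to what happens when gVisor is missing. A minimal sketch of that decision, assuming a hypothetical `select_runtime` function; the `RuntimeUnavailableError` class here is a local stand-in for the error named above, and gVisor detection itself is left to the caller.

```python
import warnings


class RuntimeUnavailableError(RuntimeError):
    """Local stand-in for the error raised in strict mode."""


def select_runtime(security_mode: str, gvisor_available: bool) -> str:
    """Pick the container runtime for the given security mode."""
    if security_mode not in ("strict", "permissive"):
        raise ValueError(f"unknown security_mode: {security_mode!r}")
    if gvisor_available:
        return "runsc"
    if security_mode == "strict":
        # Strict mode refuses to run without gVisor.
        raise RuntimeUnavailableError("gVisor (runsc) is required in strict mode")
    # Permissive mode falls back to the standard runtime with a warning.
    warnings.warn("gVisor not available; falling back to runc")
    return "runc"
```

Failing closed in strict mode means a misconfigured production host rejects jobs instead of silently running them with weaker isolation.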
Environment variable override (useful for testing):
TAKO_VM_SECURITY_MODE=permissive pytest tests/ -v
gVisor only runs on Linux. For macOS/Windows development, Tako VM includes a Lima VM configuration with gVisor pre-installed:
# Start the VM
limactl start lima-gvisor.yaml
# Enter the VM
limactl shell tako-gvisor
# Run Tako VM with gVisor
cd ~/tako-vm
pytest tests/ -v
The Lima VM provides:
- Ubuntu 24.04 with Docker and gVisor pre-installed
- 4 CPUs, 8GB RAM, 50GB disk
- Home directory mounted for code access
| Aspect | gVisor (runsc) | Standard (runc) |
|---|---|---|
| Security | Strong (userspace kernel) | Good (kernel namespaces) |
| Performance | ~5-15% overhead | Native speed |
| Compatibility | Most Python code works | Full compatibility |
| Kernel exploits | Protected | Vulnerable |
| Setup complexity | Requires installation | Built into Docker |
Recommendation: Use gVisor (strict mode) for production and any environment running untrusted or AI-generated code. Use permissive mode only for development when gVisor is not available.
Docker containers share the host kernel, which has security implications:
| Protection | Level | Notes |
|---|---|---|
| Filesystem isolation | Good | Separate root filesystem |
| Process isolation | Good | Separate PID namespace |
| Network isolation | Good | --network=none blocks all |
| User isolation | Moderate | UID mapping available |
| Syscall filtering | Good | Seccomp whitelist |
| Risk | Description | Mitigation |
|---|---|---|
| Kernel exploits | Container escapes via kernel bugs | Keep kernel updated, use gVisor |
| Resource side-channels | CPU cache timing attacks | Dedicated hosts |
| /proc information | Process info leakage | Restrict /proc access |
| Device access | Hardware access if not restricted | --cap-drop=ALL |
For high-security environments:
1. gVisor (Google)
- User-space kernel that intercepts syscalls
- Significant performance overhead
- Strong isolation without VMs
docker run --runtime=runsc ...
2. Kata Containers
- Lightweight VMs with container UX
- Hardware-level isolation
- Higher resource overhead
3. Firecracker (AWS)
- MicroVMs for serverless
- Used by AWS Lambda
- Sub-second boot times
4. Dedicated Hosts
- Run Tako VM on isolated machines
- Network segmentation
- Physical separation
| Use Case | Recommended Isolation |
|---|---|
| Development | Docker (default) |
| Internal tools | Docker + seccomp |
| Multi-tenant SaaS | gVisor or Kata |
| High-security | Firecracker or dedicated VMs |
Short answer: No. The API server needs Docker socket access to spawn executor containers. Docker socket access effectively grants root privileges on the host, so containerizing the server doesn't create a meaningful security boundary.
Current Model (adequate for most cases):
┌─────────────────────────────────────────┐
│ Host/VM │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ Tako VM │───▶│ Executor │ │
│ │ Server │ │ Container │ │
│ │ (trusted) │ │ (untrusted) │ │
│ └──────────────┘ └───────────────┘ │
└─────────────────────────────────────────┘
Why containerize the server anyway?
- Easier deployment (Docker Compose, Kubernetes)
- Consistent environment across machines
- Simpler updates and rollbacks
For true separation (future consideration):
High-Security Model (separate hosts):
┌─────────────┐ ┌─────────────────────────────┐
│ Host A │ │ Host B │
│ ┌───────┐ │ │ ┌───────┐ ┌──────────┐ │
│ │ API │──┼────▶│ │Docker │──▶│ Executor │ │
│ │Server │ │ RPC │ │ Agent │ │Container │ │
│ └───────┘ │ │ └───────┘ └──────────┘ │
└─────────────┘ └─────────────────────────────┘
This separates the API server from the execution environment entirely, but adds significant complexity.
- Install gVisor and use security_mode: strict
- Enable enable_seccomp: true
- Use HTTPS in production
- Set appropriate resource limits
- Keep Docker and gVisor updated
- Minimize use of network_enabled: true jobs
- Monitor for anomalies
- Review execution logs
- Test security controls regularly
- Verify gVisor is working: docker run --runtime=runsc --rm hello-world
- Set container_runtime: runsc in config
- Set security_mode: strict for production
- Test your workloads with gVisor (some edge cases may differ)