KVM-isolated VMs (or rootless containers) for long-running agent sessions. Management server with gRPC, WebSocket, and HTTP interfaces. Web dashboard, CLI, and REST API. Runs on your hardware; no hosted control plane.
git clone https://github.com/jmagly/agentic-sandbox.git
cd agentic-sandbox && make build && cd management && ./dev.sh
# open http://localhost:8122 → "+ Create Instance" → Container → Create → done

New here? Walk through Getting Started: prerequisite check, ~15 min to first running agent.
Features · Quick Start · Architecture · API
- Persistent sessions. Each agent runs inside its own VM (or container) with a persistent gRPC link to the management server. Closing your terminal does not stop the agent.
- Hardware isolation. Full KVM virtualization — each agent gets its own kernel. Rootless Docker is supported as a lighter-weight alternative.
- Shared storage with explicit namespaces. virtiofs-backed global (read-only) and inbox (read-write per-agent) mounts.
- Live terminal observability. Server streams every PTY chunk to the dashboard; server-side virtual terminal snapshots available via REST.
- Human-in-the-loop. PTY heuristics detect (y/n) and similar pauses, file a HITL request, and inject your response back into stdin.
- Restart-safe. Session reconciliation, crash-loop detection, and ephemeral per-VM secrets.
- Resource governance. Declarative quotas and per-VM CPU/memory/disk limits.
Agentic Sandbox is the runtime substrate for the AIWG SDLC suite. AIWG provides the agents, skills, and workflow scaffolding; Agentic Sandbox provides the isolated execution environment. Either can be used independently.
Full walkthrough — including prerequisite verification, build-time expectations, and troubleshooting — is in docs/getting-started.md. The summary below assumes the prerequisites are already installed.
Prerequisites: Linux host. For the container path (fastest): Rust 1.75+, protoc, Docker. For the VM path (full isolation): all of the above plus KVM (egrep -c '(vmx|svm)' /proc/cpuinfo must print a value > 0), libvirt + QEMU (apt install qemu-kvm libvirt-daemon-system), and an Ubuntu 24.04 base image (cd images/qemu && ./build-base-image.sh 24.04).
The recommended path launches the full system — management server + dashboard. From the dashboard you can create VM or container instances, attach terminal panes, and watch live events without ever touching a shell. Power-user shortcuts for skipping the dashboard are below.
# 1. Build all three crates (management server, agent client, CLI)
make build # or: ( cd management && cargo build --release ) && \
# ( cd agent-rs && cargo build --release ) && \
# ( cd cli && cargo build --release )
# 2. Start the management server. Dashboard is at http://localhost:8122,
# WebSocket at ws://localhost:8121, gRPC at :8120.
cd management && ./dev.sh
# 3. Open the dashboard in a browser:
# http://localhost:8122

In the dashboard:
- Click + Create Instance in the sidebar header.
- Pick Runtime:
  - Container: fast (~2s), backed by Docker. Choose an agent image from the dropdown (agentic/claude:latest, codex, opencode).
  - VM: full hardware isolation, ~30s–10m to provision depending on loadout. Pick a loadout (claude-only, full-suite, dual-review, etc.).
- Name it (agent-01, my-codex, anything matching [a-z0-9-]+).
- Click Create. The instance appears in the sidebar with a [VM] or [CT] badge.
- Click the row → click 📺 Pane to attach a terminal session.
Stop / Restart / Force off / Delete are all per-row buttons; the pane has a ⟳ Resync button if the terminal ever drifts.
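If you generate instance names from a script, you can check them against the pattern quoted above before submitting. This is a minimal sketch: the helper name is hypothetical, and whole-string matching is an assumption about how strictly the server applies the pattern.

```python
import re

# Pattern quoted in the dashboard steps. Matching the whole string is an
# assumption; the server-side rule may differ (e.g. length limits).
INSTANCE_NAME = re.compile(r"[a-z0-9-]+")


def is_valid_instance_name(name: str) -> bool:
    """Return True when the entire name matches [a-z0-9-]+."""
    return INSTANCE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("agent-01", "my-codex", "Agent_01"):
        print(candidate, is_valid_instance_name(candidate))
```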
If you'd rather not open a browser, the sandboxctl CLI (also installed as agentic-sandbox) does everything the dashboard does:
# After `make build`, install or symlink the binary:
ln -sf "$(pwd)/cli/target/release/sandboxctl" ~/.local/bin/
# Configure a context pointing at the local management server (one-time)
sandboxctl config set-context local --server http://localhost:8122
# Spawn a container-runtime agent
sandboxctl container create agent-01 --image agentic/claude:latest
# Or a VM-runtime agent
sandboxctl vm create agent-02 --loadout profiles/claude-only.yaml --agentshare --start
# List instances
sandboxctl agent list
# Find a session on the agent, then attach (Ctrl-A d to detach)
sandboxctl session list --agent agent-01
sandboxctl session attach <session-id> --write
# Submit a long-running task from a manifest file
cat > task.yaml <<'EOF'
version: "1"
kind: Task
metadata:
  id: ""
  name: "Refactor authentication"
repository:
  url: "https://github.com/myorg/myapp.git"
  branch: "main"
claude:
  prompt: "Refactor the authentication module to use JWT refresh tokens"
  model: "claude-sonnet-4-5-20250929"
lifecycle:
  timeout: "2h"
EOF
sandboxctl task submit --file task.yaml --wait

Run sandboxctl --help for the full noun-first verb tree (agent / session / container / vm / task / hitl / loadout / storage / event / health / ops).
For air-gapped boxes, scripted environments, or when you want a single VM without running the management server, drive the provisioner directly:
./images/qemu/provision-vm.sh agent-01 \
--loadout profiles/claude-only.yaml \
--agentshare \
--start
# The agent inside the VM will try to dial host.internal:8120 in a loop.
# Start the management server first if you want gRPC + the dashboard;
# otherwise the VM is still SSH-reachable as a plain isolated environment:
ssh -i /var/lib/agentic-sandbox/secrets/ssh-keys/agent-01 agent@<vm-ip>

Useful flags: --profile basic (minimal cloud-init), --cpus 8 --memory 16G --disk 100G, --network-mode isolated|allowlist|full. See images/qemu/README.md for the full reference.
If you're scripting against the API directly:
curl -X POST http://localhost:8122/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"manifest": {
"version": "1",
"kind": "Task",
"metadata": {
"id": "",
"name": "Refactor authentication"
},
"repository": {
"url": "https://github.com/myorg/myapp.git",
"branch": "main"
},
"claude": {
"prompt": "Refactor the authentication module to use JWT refresh tokens",
"model": "claude-sonnet-4-5-20250929"
},
"lifecycle": {
"timeout": "2h"
}
}
}'

For the full provisioning, profile, and loadout reference, see docs/LOADOUTS.md and the Provisioning section below.
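The same task submission can be assembled from a script instead of curl. The sketch below only builds the request object with the standard library; the function name and parameters are hypothetical, and the manifest fields are copied from the curl example rather than from a schema.

```python
import json
import urllib.request


def build_task_request(base_url, name, repo_url, branch, prompt, model, timeout):
    """Build (but do not send) a POST /api/v1/tasks request."""
    manifest = {
        "version": "1",
        "kind": "Task",
        "metadata": {"id": "", "name": name},  # empty id, as in the curl example
        "repository": {"url": repo_url, "branch": branch},
        "claude": {"prompt": prompt, "model": model},
        "lifecycle": {"timeout": timeout},
    }
    body = json.dumps({"manifest": manifest}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_task_request(
    "http://localhost:8122",
    "Refactor authentication",
    "https://github.com/myorg/myapp.git",
    "main",
    "Refactor the authentication module to use JWT refresh tokens",
    "claude-sonnet-4-5-20250929",
    "2h",
)
```

Passing `request` to `urllib.request.urlopen` would send it. The shape of the response body is not described here, so the sketch stops before parsing it.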
Host
├── agent-01 (KVM VM) 192.168.122.201
│ ├── Claude Code
│ ├── Rust toolchain
│ └── agent-client → gRPC → Management Server
├── agent-02 (KVM VM) 192.168.122.202
│ └── agent-client → gRPC → Management Server
└── Management Server :8120 gRPC :8121 WS :8122 HTTP
Each agent runs in a QEMU/KVM virtual machine provisioned from a cloud-init manifest. VMs are first-class objects with independent CPU, memory, and disk quotas, isolated libvirt networking, and ephemeral per-VM secrets. Docker containers are supported as a lighter-weight alternative for faster iteration.
A Rust async server (Tokio, Tonic, Axum) that coordinates all connected agents:
┌─────────────────────────────────────────────────────────────┐
│ Management Server (Rust) │
│ │
│ gRPC :8120 WebSocket :8121 HTTP :8122 │
│ ┌──────────────┐ ┌───────────────┐ ┌──────────────┐ │
│ │ AgentService │ │ WebSocketHub │ │ HTTP API │ │
│ │ Connect() │ │ terminal I/O │ │ dashboard │ │
│ │ Exec() │ │ metrics push │ │ REST CRUD │ │
│ └──────────────┘ └───────────────┘ └──────────────┘ │
│ │
│ AgentRegistry CommandDispatcher OutputAggregator │
│ HitlStore ScreenRegistry CrashLoopDetector │
│ TaskOrchestrator AiwgServeHandle │
└─────────────────────────────────────────────────────────────┘
Agent state — heartbeats, metrics, setup progress, loadout metadata — is tracked in-memory via DashMap and exposed through all three interfaces.
Submit long-running AI tasks that get assigned to available VMs, monitored through completion, and stream their logs via SSE:
PENDING → STAGING → PROVISIONING → READY → RUNNING → COMPLETING → COMPLETED
↘ ↘
FAILED CANCELLED
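The lifecycle above can be modelled as a small transition table, for example to sanity-check status values seen by a client. This is a sketch, not the orchestrator's code: the diagram does not say exactly which states can branch to FAILED or CANCELLED, so the assumption here is that every non-terminal state can.

```python
HAPPY_PATH = [
    "PENDING", "STAGING", "PROVISIONING", "READY",
    "RUNNING", "COMPLETING", "COMPLETED",
]
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def allowed_next(state: str) -> set:
    """Return the states reachable in one step from `state`."""
    if state in TERMINAL:
        return set()
    following = HAPPY_PATH[HAPPY_PATH.index(state) + 1]
    # Assumption: failure and cancellation are possible from any live state.
    return {following, "FAILED", "CANCELLED"}


def is_valid_transition(src: str, dst: str) -> bool:
    """Return True when moving from `src` to `dst` is a single legal step."""
    return dst in allowed_next(src)
```

See docs/task-run-lifecycle.md for the authoritative state machine.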
Tasks receive a dedicated workspace in agentshare:
/srv/agentshare/
├── tasks/{task_id}/manifest.yaml # Task metadata
├── inbox/{task_id}/ # Input files (read-only inside VM)
└── outbox/{task_id}/ # Artifacts written by agent
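A host-side script that collects artifacts can derive these paths from a task ID. The helper below is a hypothetical sketch that mirrors the layout shown above and rejects IDs that would escape the task directory.

```python
from pathlib import PurePosixPath

AGENTSHARE_ROOT = PurePosixPath("/srv/agentshare")


def task_workspace(task_id: str) -> dict:
    """Return the manifest, inbox, and outbox paths for one task."""
    if not task_id or "/" in task_id or task_id in (".", ".."):
        raise ValueError(f"unsafe task id: {task_id!r}")
    return {
        "manifest": AGENTSHARE_ROOT / "tasks" / task_id / "manifest.yaml",
        "inbox": AGENTSHARE_ROOT / "inbox" / task_id,
        "outbox": AGENTSHARE_ROOT / "outbox" / task_id,
    }
```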
VMs get virtiofs-mounted shared storage with separate read-only and read-write namespaces:
| Mount | VM Path | Mode | Purpose |
|---|---|---|---|
| Global | /mnt/global (~/global) | Read-only | Shared tools, prompts, configs |
| Inbox | /mnt/inbox (~/inbox) | Read-write | Task inputs, run logs, outputs |
The inbox layout provides structured access patterns — agents find their task workspace at ~/inbox/current/ without needing to know task IDs.
The management server monitors PTY output and automatically detects when an agent is waiting for human input. Detection runs after every output chunk through a scored heuristic that recognizes patterns like (y/n), [Y/n], Human:, ❯, and explicit confirmation phrases.
Agent PTY output
│
▼
prompt_detector::detect_prompt() ← scores output chunk
│
score ≥ 0.85
│
▼
HitlStore::create() ← deduplicates per session
│
├── REST: GET /api/v1/hitl (operator polls)
├── Dashboard: pending requests UI
└── AiwgServeHandle::emit() (if aiwg serve wired in)
│
operator responds
│
▼
POST /api/v1/hitl/{id}/respond ← injects text into PTY stdin
One pending request per session at a time — duplicate detections are suppressed until the active request is resolved.
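The detect, threshold, and deduplicate flow above can be illustrated in a few lines. This is not the server's prompt_detector: the patterns and weights below are invented for illustration, and only the 0.85 threshold and the one-pending-request-per-session rule come from the description.

```python
import re

THRESHOLD = 0.85  # from the flow above

# Hypothetical patterns and weights; the real heuristic is not documented here.
PATTERNS = [
    (re.compile(r"\(y/n\)\s*$", re.IGNORECASE), 0.90),
    (re.compile(r"\[Y/n\]\s*$"), 0.90),
    (re.compile(r"^Human:\s*$", re.MULTILINE), 0.85),
    (re.compile(r"❯\s*$"), 0.85),
]


def score_chunk(chunk: str) -> float:
    """Return the highest weight among matching patterns, or 0.0."""
    return max((w for p, w in PATTERNS if p.search(chunk)), default=0.0)


class PendingStore:
    """Holds at most one pending request per session."""

    def __init__(self):
        self._pending = {}  # session_id -> prompt text

    def create(self, session_id: str, prompt: str) -> bool:
        if session_id in self._pending:
            return False  # duplicate detection suppressed
        self._pending[session_id] = prompt
        return True

    def resolve(self, session_id: str) -> None:
        self._pending.pop(session_id, None)


def on_output(store: PendingStore, session_id: str, chunk: str) -> bool:
    """Return True when this chunk caused a new pending request."""
    if score_chunk(chunk) >= THRESHOLD:
        return store.create(session_id, chunk)
    return False
```

Taking the maximum weight rather than a sum keeps a single strong signal sufficient. The real scorer may combine signals differently.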
When AIWG_SERVE_ENDPOINT is set, the management server registers with an aiwg serve dashboard and streams live sandbox events over a persistent authenticated WebSocket. The integration reconnects with exponential backoff (1 s → 30 s) and never blocks server startup.
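As a sketch of what a 1 s to 30 s backoff schedule looks like, the generator below doubles the delay on each attempt and caps it at 30 seconds. The doubling factor is an assumption; only the 1 s and 30 s bounds are stated above.

```python
from itertools import islice


def backoff_delays(initial=1.0, cap=30.0, factor=2.0):
    """Yield reconnect delays in seconds, growing by `factor` up to `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First seven delays under the assumed doubling factor.
schedule = list(islice(backoff_delays(), 7))
```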
The sandbox additionally registers as an AIWG executor (per executor.v1.md), accepting mission dispatches via POST /api/v1/sessions/:id/dispatch and reporting the full mission.* lifecycle (assigned → started → completed/failed/aborted, with HITL and resumability) over a second WS at /ws/executors/{id}. Mission state persists across mgmt-server restarts in <secrets_dir>/../missions.json. Full integration spec: docs/aiwg-executor.md.
| Event | Trigger |
|---|---|
| agent.connected | gRPC stream registered |
| agent.disconnected | gRPC stream closed or timed out |
| agent.ready | cloud-init provisioning complete |
| agent.provisioning | loadout step progress |
| session.start / session.end | PTY/exec session lifecycle |
| hitl.input_required | HITL prompt detected |
What a typical autonomous coding task looks like end to end.
./images/qemu/provision-vm.sh agent-01 \
--loadout profiles/claude-only.yaml \
--agentshare \
--start

The VM boots, cloud-init runs the loadout manifest, agent-client registers via gRPC, and status transitions Starting → Provisioning → Ready. If aiwg serve is configured, agent.ready fires.
curl -X POST http://localhost:8122/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"manifest": {
"version": "1",
"kind": "Task",
"metadata": {
"id": "",
"name": "Refactor authentication"
},
"repository": {
"url": "https://github.com/myorg/myapp.git",
"branch": "main"
},
"claude": {
"prompt": "Refactor the authentication module to use JWT refresh tokens",
"model": "claude-sonnet-4-5-20250929"
},
"lifecycle": {
"timeout": "2h"
}
}
}'

The task is assigned to agent-01, the repository is cloned into the inbox, and Claude Code is launched inside the VM.
Open http://localhost:8122 for the live terminal stream, or:
curl http://localhost:8122/api/v1/tasks/{task_id}/logs

An hour in, Claude Code hits an ambiguous refactor decision and prints a confirmation prompt. The dashboard shows a pending HITL request. Respond without opening a terminal:
curl -X POST http://localhost:8122/api/v1/hitl/{hitl_id}/respond \
-H "Content-Type: application/json" \
-d '{"response": "yes, update all callers"}'

The response text is injected into the agent's PTY stdin and the agent continues.
ls /srv/agentshare/outbox/{task_id}/
# auth-module/ jwt-refresh.ts test-results.json SUMMARY.md

Pre-built profiles for common setups:
| Profile | Tools | Use Case |
|---|---|---|
| agentic-dev | Python (uv), Node.js (fnm), Go, Rust, Claude Code, Aider, Docker, ripgrep, fd, jq | Full development environment |
| basic | SSH, basic utilities | Minimal: custom setup via cloud-init |
./images/qemu/provision-vm.sh my-agent \
--profile agentic-dev \
--cpus 8 \
--memory 16384 \
--disk 100G \
--agentshare \
--start

Declarative YAML manifests for composable provisioning. Loadouts specify tools, runtimes, AI providers, and AIWG frameworks without modifying base profiles:
apiVersion: loadout/v1
kind: loadout
metadata:
  name: claude-only
extends:
  - layers/base-dev.yaml
  - providers/claude-code.yaml
aiwg:
  enabled: true
  frameworks:
    - name: all
      providers: [claude]

See docs/LOADOUTS.md for the full manifest schema and available options.
Submit tasks to agents via the REST API. The orchestrator assigns tasks to available VMs, manages the workspace, and tracks lifecycle state.
# Submit a task
curl -X POST http://localhost:8122/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"manifest": {
"version": "1",
"kind": "Task",
"metadata": {
"id": "",
"name": "SQL injection audit"
},
"repository": {
"url": "https://github.com/myorg/myapp.git",
"branch": "main"
},
"claude": {
"prompt": "Audit the API for SQL injection vulnerabilities",
"model": "claude-sonnet-4-5-20250929"
},
"lifecycle": {
"timeout": "1h"
}
}
}'
# Check status
curl http://localhost:8122/api/v1/tasks/{task_id}
# Stream logs (SSE)
curl http://localhost:8122/api/v1/tasks/{task_id}/logs
# List artifacts
curl http://localhost:8122/api/v1/tasks/{task_id}/artifacts
# List A2A task artifacts captured by messages:send
curl http://localhost:8122/agents/{instance_id}/v1/tasks/{task_id}/artifacts

See docs/task-orchestration-api.md for full API details and docs/task-run-lifecycle.md for the lifecycle state machine.
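The logs endpoint streams Server-Sent Events, so a client needs to split the stream into events. The sketch below handles only standard SSE framing (data lines, comment lines, blank-line terminators). What each data payload contains is not described here, so payloads are returned as opaque strings.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete SSE event.

    `lines` is any iterable of text lines, such as a decoded HTTP response body.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event left unterminated at end of stream is discarded.


sample = [
    "data: cloning repo\n",
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
    "data: unterminated\n",
]
events = list(iter_sse_data(sample))
```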
The server monitors agent PTY output and automatically detects when an agent is waiting for human input. When detected, a HITL request is created and held until resolved.
# List pending requests
curl http://localhost:8122/api/v1/hitl
# Respond — text is injected directly into the agent's PTY stdin
curl -X POST http://localhost:8122/api/v1/hitl/a3f1b2.../respond \
-H "Content-Type: application/json" \
-d '{"response": "y"}'

Requests are deduplicated per session: a second prompt won't fire while the first is pending. Once resolved, the slot opens again.
# Provision and start
./images/qemu/provision-vm.sh agent-01 --profile agentic-dev --agentshare --start
# Lifecycle management
virsh start agent-01 # start stopped VM
virsh shutdown agent-01 # graceful stop
virsh destroy agent-01 # force stop
# Rebuild (preserves IP and config)
./scripts/reprovision-vm.sh agent-01 --profile agentic-dev
# Remove completely
./scripts/destroy-vm.sh agent-01
# Deploy updated agent binary to running VM
./scripts/deploy-agent.sh agent-01 --debug

See docs/vm-lifecycle.md for the state machine and docs/LIFECYCLE.md for the full operations reference.
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/agents | GET | List registered agents with metrics and loadout info |
| /api/v1/agents/{id} | GET | Get agent details |
| /api/v1/agents/{id} | DELETE | Remove agent |
| /api/v1/agents/{id}/start | POST | Start agent VM |
| /api/v1/agents/{id}/stop | POST | Stop agent VM |
| /api/v1/agents/{id}/destroy | POST | Force destroy agent VM |
| /api/v1/agents/{id}/reprovision | POST | Reprovision agent VM |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/tasks | GET | List tasks |
| /api/v1/tasks | POST | Submit new task |
| /api/v1/tasks/{id} | GET | Get task status and metadata |
| /api/v1/tasks/{id} | DELETE | Cancel task |
| /api/v1/tasks/{id}/logs | GET | Stream task logs (SSE) |
| /api/v1/tasks/{id}/artifacts | GET | List task artifacts |
| /agents/{instance_id}/v1/tasks/{task_id}/artifacts | GET | List persisted A2A task artifacts |
| /agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id} | GET | Return one persisted A2A task artifact |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/vms | GET | List all VMs |
| /api/v1/vms | POST | Create VM |
| /api/v1/vms/{name} | GET | Get VM details |
| /api/v1/vms/{name}/start | POST | Start VM |
| /api/v1/vms/{name}/stop | POST | Graceful stop |
| /api/v1/vms/{name}/destroy | POST | Force stop |
| /api/v1/vms/{name} | DELETE | Delete VM |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/hitl | GET | List pending HITL requests |
| /api/v1/agents/{id}/hitl | POST | Create HITL request for agent (returns 409 on duplicate) |
| /api/v1/hitl/{id}/respond | POST | Submit response; injects text into PTY stdin |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/sessions/{id}/screen | GET | Current PTY screen snapshot (no WebSocket needed) |
| /ws/sessions/{id}/orchestrate | WS | Live screen updates; defaults to observer/read-only. Add ?role=controller to allow write/resize/signal frames. |
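A client picks observer or controller mode purely through the query string. The helper below is a hypothetical sketch that builds the orchestrate URL. The table does not say which port serves this path, so the base URL is left to the caller, and ws://localhost:8121 in the example is an assumption.

```python
from urllib.parse import quote, urlencode


def orchestrate_url(base: str, session_id: str, controller: bool = False) -> str:
    """Build the WebSocket URL for a session, read-only unless `controller`."""
    url = f"{base.rstrip('/')}/ws/sessions/{quote(session_id, safe='')}/orchestrate"
    if controller:
        # Opt in to write/resize/signal frames.
        url += "?" + urlencode({"role": "controller"})
    return url


observer = orchestrate_url("ws://localhost:8121", "s-1")
writer = orchestrate_url("ws://localhost:8121", "s-1", controller=True)
```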
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/secrets | GET / POST / DELETE | Manage agent authentication secrets |
| /api/v1/events | GET | VM lifecycle event stream (SSE) |
| /healthz | GET | Liveness probe |
| /readyz | GET | Readiness probe |
| /metrics | GET | Prometheus metrics |
service AgentService {
rpc Connect(stream AgentMessage) returns (stream ManagementMessage);
rpc Exec(ExecRequest) returns (stream ExecOutput);
}

Real-time push of agent metrics, PTY output, session events, and task progress. Used by the dashboard and external monitoring clients.
| Variable | Default | Description |
|---|---|---|
| LISTEN_ADDR | 0.0.0.0:8120 | gRPC listen address (WS = port+1, HTTP = port+2) |
| SECRETS_DIR | .run/secrets | Directory containing agent-hashes.json |
| RUST_LOG | info | Log level: trace, debug, info, warn, error |
| LOG_FORMAT | pretty | Log format: pretty, json, compact |
| HEARTBEAT_TIMEOUT | 90 | Seconds before marking agent disconnected |
| METRICS_ENABLED | true | Enable Prometheus metrics export |
| AIWG_SERVE_ENDPOINT | (none) | aiwg serve base URL (integration disabled if unset) |
| AIWG_SERVE_NAME | agentic-sandbox | Display name in aiwg serve dashboard |
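Because the WebSocket and HTTP ports are derived from LISTEN_ADDR, changing the gRPC port moves all three. The sketch below applies the port+1 / port+2 rule from the table; the function name is hypothetical, and IPv6 bracket syntax is not handled.

```python
def derive_ports(listen_addr: str = "0.0.0.0:8120") -> dict:
    """Split host:port and derive the WS and HTTP ports from the gRPC port."""
    host, _, port = listen_addr.rpartition(":")
    grpc = int(port)
    return {"host": host, "grpc": grpc, "ws": grpc + 1, "http": grpc + 2}
```

With the default address this gives gRPC 8120, WebSocket 8121, and HTTP 8122, matching the ports used throughout this document.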
| Variable | Required | Description |
|---|---|---|
| AGENT_ID | Yes | Unique identifier for this agent |
| AGENT_SECRET | Yes | 256-bit shared secret for authentication |
| MANAGEMENT_SERVER | Yes | Server address, e.g. 192.168.122.1:8120 |
| HEARTBEAT_INTERVAL | No | Seconds between heartbeats (default: 30) |
Override settings in management/.run/dev.env without modifying your shell environment.
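The two heartbeat settings interact: with the defaults, an agent sends a heartbeat every 30 seconds and is marked disconnected after 90 seconds of silence, which leaves room for three intervals. The sketch below shows that arithmetic. Whether the server compares with a strict or non-strict inequality is an assumption.

```python
HEARTBEAT_TIMEOUT = 90   # server default, seconds
HEARTBEAT_INTERVAL = 30  # agent default, seconds


def is_disconnected(last_heartbeat: float, now: float,
                    timeout: float = HEARTBEAT_TIMEOUT) -> bool:
    """Return True once more than `timeout` seconds have passed in silence."""
    return (now - last_heartbeat) > timeout


def intervals_before_timeout(timeout: int = HEARTBEAT_TIMEOUT,
                             interval: int = HEARTBEAT_INTERVAL) -> int:
    """How many whole heartbeat intervals fit inside the timeout window."""
    return timeout // interval
```

If you raise HEARTBEAT_INTERVAL on the agent, raise HEARTBEAT_TIMEOUT on the server as well so that a single delayed heartbeat does not trip the timeout.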
The management server exports Prometheus metrics at /metrics:
agentic_agents_connected # Connected agent count
agentic_agents_ready # Ready agents
agentic_tasks_running # Active tasks
agentic_tasks_completed_total # Total completed tasks
agentic_commands_total # Commands dispatched
agentic_commands_duration_ms # Command execution latency (histogram)
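For a quick check without a Prometheus server, the text served at /metrics can be parsed directly. The sketch below reads the standard Prometheus text exposition format. The sample input is invented for illustration: its values, HELP text, and the bucket label are not real server output.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample (name plus any label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            head, tail = line.rsplit("}", 1)
            key, fields = head + "}", tail.split()
        else:
            parts = line.split()
            key, fields = parts[0], parts[1:]
        samples[key] = float(fields[0])  # any trailing timestamp is ignored
    return samples


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP agentic_agents_connected Connected agent count
# TYPE agentic_agents_connected gauge
agentic_agents_connected 2
agentic_tasks_completed_total 17
agentic_commands_duration_ms_bucket{le="100"} 5
"""
metrics = parse_metrics(SAMPLE)
```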
Set up Prometheus and AlertManager:
cd scripts/prometheus && ./deploy.sh
# Prometheus: http://localhost:9090
# AlertManager: http://localhost:9093

See docs/monitoring.md and docs/observability/ for alerting rules and dashboards.
# Full cycle: rebuild server + agent, deploy to all running VMs
./scripts/dev-deploy-all.sh --debug
# Deploy agent binary to a specific VM
./scripts/deploy-agent.sh agent-01 --debug
# Management server live-reload
cd management && ./dev.sh
# E2E tests
./scripts/run-e2e-tests.sh
# Chaos tests
./scripts/chaos/run-all.sh
# Unit tests
cd management && cargo test
cd agent-rs && cargo test

agentic-sandbox/
├── management/ # Management server (Rust)
│ ├── src/
│ │ ├── http/ # REST API handlers
│ │ ├── orchestrator/ # Task orchestration engine
│ │ ├── telemetry/ # Logging, metrics, tracing
│ │ ├── ws/ # WebSocket hub and connections
│ │ ├── hitl.rs # HITL request store
│ │ ├── aiwg_serve.rs # Outbound aiwg serve integration
│ │ ├── screen_state.rs # PTY screen observer
│ │ ├── prompt_detector.rs # HITL prompt heuristics
│ │ └── crash_loop.rs # Crash loop detection
│ └── ui/ # Embedded web dashboard
├── agent-rs/ # Agent client (Rust)
├── cli/ # CLI tool — VM management
├── proto/ # gRPC protocol definitions
├── images/qemu/ # VM provisioning scripts and loadout profiles
├── scripts/ # Utility and deployment scripts
├── configs/ # Security profiles (seccomp)
├── docs/ # Reference documentation
└── tests/e2e/ # End-to-end tests (pytest)
| Document | Description |
|---|---|
| Architecture | System design and component relationships |
| Positioning | Design axes and when this is (or isn't) a good fit |
| API Reference | Complete HTTP, gRPC, and WebSocket API |
| WebSocket Protocol | Per-message reference: legacy agent-scoped + formal session-registry protocols |
| CLI Design | sandboxctl operator/admin CLI taxonomy and acceptance criteria |
| Deployment Guide | Installation and production configuration |
| Operations Guide | Day-to-day operations and runbooks |
| Loadouts | Declarative VM provisioning manifests |
| Agentshare Storage | virtiofs storage layout and usage |
| Task Orchestration | Task API and lifecycle |
| Task Run Lifecycle | State machine and transitions |
| Session Reconciliation | Session recovery after restarts |
| VM Lifecycle | VM state machine and management |
| Troubleshooting | Common issues and fixes |
| Monitoring | Prometheus metrics and alerting |
| Observability | Full observability setup |
| Reliability | Reliability patterns and quickstart |
- QEMU/KVM provisioning with cloud-init
- Management server (Rust/gRPC/WebSocket/HTTP)
- Agent client with registration, heartbeat, and metrics
- virtiofs shared storage (global/inbox)
- Web dashboard with live terminal access
- Task orchestration with artifact collection
- Claude Code integration
- sandboxctl operator/admin CLI (design)
- Declarative loadout manifest system
- Prometheus metrics and AlertManager alerting
- Session reconciliation after server restart
- VM pooling and resource quotas
- PTY screen observer (server-side virtual terminal snapshots)
- Human-in-the-Loop detection and REST API
- aiwg serve outbound registration and event streaming
- Crash loop detection and alerting
- Docker runtime with rootless containers
- Multi-host orchestration
- Kubernetes operator
AGPL-3.0-only — see LICENSE