Skip to content

Latest commit

 

History

History
1694 lines (1345 loc) · 39 KB

File metadata and controls

1694 lines (1345 loc) · 39 KB

API Reference

Comprehensive API documentation for the agentic-sandbox management server.

Overview

The management server exposes three network interfaces:

Port Protocol Purpose
8120 gRPC Agent bidirectional communication
8121 WebSocket Real-time output streaming for dashboard
8122 HTTP REST API and web dashboard

Authentication

gRPC (Agents): Agents authenticate using x-agent-id and x-agent-secret headers. Secrets are generated during VM provisioning and stored as SHA256 hashes on the host.

HTTP/WebSocket: No authentication required for local-host operator access. Exception: the AIWG executor-contract route POST /api/v1/sessions/:id/dispatch requires Authorization: Bearer <token> where the token is issued by aiwg serve at executor registration. See AIWG Executor Contract for the full integration.

Common Response Format

All HTTP endpoints return JSON. Error responses follow this structure:

{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable error message"
  }
}

HTTP REST API

Base URL: http://localhost:8122

Health & Monitoring

GET /healthz

Simple liveness probe. Returns 200 if server is running.

Response: 200 OK with JSON body {"status":"alive"}

Example:

curl http://localhost:8122/healthz

GET /readyz

Readiness probe. Returns 200 if server is ready to accept traffic.

Response:

{
  "ready": true,
  "reason": "agents_connected"
}

Status Codes:

  • 200 - Ready
  • 503 - Not ready (returns reason)

Example:

curl http://localhost:8122/readyz

GET /healthz/deep

Detailed health check with metrics.

Response:

{
  "status": "healthy",
  "uptime_seconds": 0,
  "agent_count": 2,
  "active_tasks": 0
}

Example:

curl http://localhost:8122/healthz/deep

GET /healthz/libvirt

Bounded libvirt RPC health probe. Returns 200 when libvirt answers within the read budget, or 503 when libvirt is down, slow, or the fail-fast circuit is open.

Response:

{
  "status": "healthy",
  "libvirt": "alive"
}

On timeout the response uses the same structured VM error body as /api/v1/vms and includes Retry-After.

Example:

curl -i http://localhost:8122/healthz/libvirt

GET /metrics

Prometheus metrics endpoint.

Response: Prometheus text format

Example:

curl http://localhost:8122/metrics

Agents

GET /api/v1/agents

List all connected agents with their status and metrics.

Response:

{
  "agents": [
    {
      "id": "agent-01",
      "hostname": "agent-01",
      "ip_address": "192.168.122.201",
      "status": "Ready",
      "connected_at": 1706572800000,
      "last_heartbeat": 1706572830000,
      "metrics": {
        "cpu_percent": 2.3,
        "memory_used_bytes": 536870912,
        "memory_total_bytes": 8589934592,
        "disk_used_bytes": 2147483648,
        "disk_total_bytes": 53687091200,
        "load_avg": [0.15, 0.20, 0.18],
        "uptime_seconds": 3600
      },
      "system_info": {
        "os": "Ubuntu 24.04",
        "kernel": "6.8.0-generic",
        "cpu_cores": 4,
        "memory_bytes": 8589934592,
        "disk_bytes": 53687091200
      }
    }
  ]
}

Field Descriptions:

  • status: "Starting", "Ready", "Busy", "Error", "ShuttingDown", "Stale", "Disconnected"
  • connected_at: Unix timestamp (milliseconds)
  • last_heartbeat: Unix timestamp (milliseconds)
  • metrics: Optional, current resource usage
  • system_info: Optional, VM hardware information

Example:

curl http://localhost:8122/api/v1/agents

Virtual Machines

VM endpoints are QEMU-specific.

GET /api/v1/vms

List all VMs managed by libvirt.

Query Parameters:

  • state (string, default: "all") - Filter by state: "running", "stopped", "all"
  • prefix (string, default: "agent-") - Filter by name prefix. Use "*" for all VMs.

Response:

{
  "vms": [
    {
      "name": "agent-01",
      "state": "running",
      "uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "vcpus": 4,
      "memory_mb": 8192,
      "ip_address": "192.168.122.201",
      "uptime_seconds": null
    }
  ],
  "total": 1
}

States:

  • "running", "stopped", "paused", "shutdown", "crashed", "suspended", "unknown"

Example:

# List all agent VMs
curl http://localhost:8122/api/v1/vms

# List only running VMs
curl http://localhost:8122/api/v1/vms?state=running

# List all VMs (including non-agent VMs)
curl http://localhost:8122/api/v1/vms?prefix=*

GET /api/v1/vms/{name}

Get detailed information about a specific VM.

Response:

{
  "name": "agent-01",
  "state": "running",
  "uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "vcpus": 4,
  "memory_mb": 8192,
  "ip_address": "192.168.122.201",
  "uptime_seconds": null,
  "agent": {
    "connected": true,
    "connected_at": 1706572800000,
    "hostname": "agent-01"
  }
}

Status Codes:

  • 200 - Success
  • 404 - VM not found

Example:

curl http://localhost:8122/api/v1/vms/agent-01

POST /api/v1/vms

Create a new VM using the provisioning script.

Request Body:

{
  "name": "agent-03",
  "profile": "agentic-dev",
  "vcpus": 4,
  "memory_mb": 8192,
  "disk_gb": 50,
  "agentshare": true,
  "start": true,
  "ssh_key": "/home/user/.ssh/id_ed25519.pub"
}

Field Descriptions:

  • name (string, required) - VM name (must match ^agent-[a-z0-9-]+$)
  • profile (string, default: "agentic-dev") - Provisioning profile: "agentic-dev", "basic"
  • vcpus (u32, default: 4) - Number of CPU cores
  • memory_mb (u64, default: 8192) - Memory in megabytes
  • disk_gb (u64, default: 50) - Disk size in gigabytes
  • agentshare (bool, default: true) - Enable virtiofs shared storage
  • start (bool, default: true) - Start VM after provisioning
  • ssh_key (string, optional) - Path to SSH public key (auto-detected if omitted)

Response: 202 Accepted

{
  "operation": {
    "id": "op-12345678-1234-1234-1234-123456789abc",
    "type": "vm_create",
    "status": "pending",
    "target": "agent-03",
    "created_at": "2024-01-30T12:00:00Z",
    "progress_percent": 0
  },
  "vm": null
}

Status Codes:

  • 202 - Accepted (provisioning started)
  • 400 - Invalid request (e.g., invalid VM name)
  • 409 - VM already exists

Error Codes:

  • INVALID_VM_NAME - Name doesn't match required pattern
  • VM_ALREADY_EXISTS - VM with this name already exists
  • PROVISIONING_ERROR - Provisioning script failed

Example:

curl -X POST http://localhost:8122/api/v1/vms \
  -H "Content-Type: application/json" \
  -d '{
    "name": "agent-03",
    "profile": "agentic-dev",
    "vcpus": 4,
    "memory_mb": 8192,
    "disk_gb": 50,
    "agentshare": true,
    "start": true
  }'

# Minimal request (uses all defaults)
curl -X POST http://localhost:8122/api/v1/vms \
  -H "Content-Type: application/json" \
  -d '{"name": "agent-04"}'

POST /api/v1/vms/{name}/start

Start a stopped VM.

Response:

{
  "vm": {
    "name": "agent-01",
    "state": "running"
  },
  "message": null
}

Status Codes:

  • 200 - Success (idempotent - returns 200 even if already running)

Example:

curl -X POST http://localhost:8122/api/v1/vms/agent-01/start

POST /api/v1/vms/{name}/stop

Gracefully stop a running VM (ACPI shutdown).

Response:

{
  "vm": {
    "name": "agent-01",
    "state": "shutdown"
  },
  "message": "Graceful shutdown initiated"
}

Status Codes:

  • 200 - Success (idempotent)

Example:

curl -X POST http://localhost:8122/api/v1/vms/agent-01/stop

POST /api/v1/vms/{name}/destroy

Force stop a running VM (immediate termination).

Response:

{
  "vm": {
    "name": "agent-01",
    "state": "stopped"
  },
  "message": "VM destroyed"
}

Status Codes:

  • 200 - Success (idempotent)

Example:

curl -X POST http://localhost:8122/api/v1/vms/agent-01/destroy

POST /api/v1/vms/{name}/restart

Restart a running VM.

Request Body:

{
  "mode": "graceful",
  "timeout_seconds": 60
}

Field Descriptions:

  • mode (string, default: "graceful") - Restart mode: "graceful" (ACPI shutdown) or "hard" (force destroy)
  • timeout_seconds (u64, default: 60) - Timeout for graceful shutdown before forcing

Response: 202 Accepted

{
  "operation": {
    "id": "op-12345678-1234-1234-1234-123456789abc",
    "type": "vm_restart",
    "status": "pending",
    "target": "agent-01",
    "created_at": "2024-01-30T12:00:00Z",
    "progress_percent": 0
  },
  "vm": null
}

Status Codes:

  • 202 - Accepted
  • 404 - VM not found
  • 409 - VM not running

Example:

# Graceful restart with default timeout
curl -X POST http://localhost:8122/api/v1/vms/agent-01/restart \
  -H "Content-Type: application/json" \
  -d '{"mode": "graceful", "timeout_seconds": 60}'

# Hard restart (immediate)
curl -X POST http://localhost:8122/api/v1/vms/agent-01/restart \
  -H "Content-Type: application/json" \
  -d '{"mode": "hard"}'

DELETE /api/v1/vms/{name}

Delete a VM definition from libvirt.

Query Parameters:

  • delete_disk (bool, default: false) - Also delete VM disk image
  • force (bool, default: false) - Force delete even if running

Response:

{
  "deleted": true,
  "name": "agent-01",
  "disk_deleted": true
}

Status Codes:

  • 200 - Success
  • 404 - VM not found
  • 409 - VM is running and force=false

Error Codes:

  • VM_NOT_FOUND - VM doesn't exist
  • VM_RUNNING - VM is running and force not set

Example:

# Delete VM (keep disk)
curl -X DELETE http://localhost:8122/api/v1/vms/agent-01

# Delete VM and disk
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-01?delete_disk=true"

# Force delete running VM
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-01?force=true&delete_disk=true"

POST /api/v1/vms/{name}/deploy-agent

Deploy agent binary to a running VM.

Response: 202 Accepted

{
  "operation": {
    "id": "op-12345678-1234-1234-1234-123456789abc",
    "type": "vm_create",
    "status": "pending",
    "target": "agent-01",
    "created_at": "2024-01-30T12:00:00Z",
    "progress_percent": 0
  },
  "vm": null
}

Status Codes:

  • 202 - Accepted
  • 404 - VM not found
  • 409 - VM not running

Example:

curl -X POST http://localhost:8122/api/v1/vms/agent-01/deploy-agent

Operations

Long-running operations (VM create, restart, deploy) return operation IDs that can be polled for status.

GET /api/v1/operations/{id}

Get operation status.

Response:

{
  "id": "op-12345678-1234-1234-1234-123456789abc",
  "type": "vm_create",
  "status": "completed",
  "target": "agent-03",
  "created_at": "2024-01-30T12:00:00Z",
  "completed_at": "2024-01-30T12:05:00Z",
  "progress_percent": 100,
  "result": {
    "vm": {
      "name": "agent-03",
      "state": "running"
    }
  }
}

Field Descriptions:

  • type: "vm_create", "vm_delete", "vm_restart"
  • status: "pending", "running", "completed", "failed"
  • progress_percent: 0-100
  • result: Operation-specific result data (only on completion)

Failed Operation Response:

{
  "id": "op-12345678-1234-1234-1234-123456789abc",
  "type": "vm_create",
  "status": "failed",
  "error": "Provisioning script failed with exit code 1",
  "target": "agent-03",
  "created_at": "2024-01-30T12:00:00Z",
  "completed_at": "2024-01-30T12:02:00Z",
  "progress_percent": 20
}

Status Codes:

  • 200 - Success
  • 404 - Operation not found

Example:

curl http://localhost:8122/api/v1/operations/op-12345678-1234-1234-1234-123456789abc

Events

VM lifecycle and agent events are tracked and available for querying.

POST /api/v1/events

Receive event from vm-event-bridge (internal use).

Request Body:

{
  "event_type": "vm.started",
  "vm_name": "agent-01",
  "timestamp": "2024-01-30T12:00:00Z",
  "details": {
    "reason": "manual"
  },
  "agent_id": "agent-01",
  "trace_id": null
}

Response:

{
  "received": true
}

GET /api/v1/events

List recent events across all VMs and agents.

Response:

{
  "events": [
    {
      "event_type": "vm.started",
      "vm_name": "agent-01",
      "timestamp": "2024-01-30T12:00:00Z",
      "details": {
        "reason": "manual"
      },
      "agent_id": "agent-01",
      "trace_id": null
    }
  ],
  "total_count": 42,
  "last_event_id": 42
}

Event Types:

VM Lifecycle:

  • vm.started, vm.stopped, vm.crashed, vm.shutdown, vm.rebooted
  • vm.suspended, vm.resumed, vm.defined, vm.undefined, vm.pmsuspended

Agent Events:

  • agent.connected, agent.disconnected, agent.registered, agent.heartbeat
  • agent.command.started, agent.command.completed
  • agent.pty.created, agent.pty.closed

Session Reconciliation:

  • session.query_sent, session.report_received
  • session.reconcile_started, session.reconcile_complete
  • session.killed, session.preserved, session.reconcile_failed

Example:

curl http://localhost:8122/api/v1/events

Tasks

Task orchestration endpoints for submitting and managing Claude Code tasks.

POST /api/v1/tasks

Submit a new task from a manifest.

Request Body:

{
  "manifest_yaml": "name: example-task\nrepository:\n  url: https://github.com/user/repo\nprompt: 'Fix the bug in main.rs'"
}

OR

{
  "manifest": {
    "name": "example-task",
    "repository": {
      "url": "https://github.com/user/repo"
    },
    "prompt": "Fix the bug in main.rs"
  }
}

Response: 202 Accepted

{
  "task_id": "task-12345678-1234-1234-1234-123456789abc",
  "accepted": true,
  "error": null
}

Status Codes:

  • 202 - Accepted
  • 400 - Invalid manifest
  • 503 - Orchestrator not available

Example:

curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": {
      "name": "fix-bug",
      "repository": {
        "url": "https://github.com/user/repo"
      },
      "prompt": "Fix the authentication bug"
    }
  }'

GET /api/v1/tasks

List all tasks with optional filtering.

Query Parameters:

  • state (string, optional) - Comma-separated states: pending, staging, provisioning, ready, running, completing, completed, failed, failed_preserved, cancelled
  • limit (usize, default: 50) - Max results
  • offset (usize, default: 0) - Pagination offset

Response:

{
  "tasks": [
    {
      "id": "task-12345678-1234-1234-1234-123456789abc",
      "name": "fix-bug",
      "state": "running",
      "state_message": "Claude Code executing",
      "created_at": "2024-01-30T12:00:00Z",
      "started_at": "2024-01-30T12:01:00Z",
      "state_changed_at": "2024-01-30T12:01:30Z",
      "vm_name": "agent-task-abc123",
      "vm_ip": "192.168.122.220",
      "exit_code": null,
      "error": null,
      "progress": {
        "output_bytes": 4096,
        "tool_calls": 5,
        "current_tool": "bash",
        "last_activity_at": "2024-01-30T12:05:00Z"
      }
    }
  ],
  "total_count": 1
}

Example:

# List all tasks
curl http://localhost:8122/api/v1/tasks

# List only running tasks
curl "http://localhost:8122/api/v1/tasks?state=running"

# List completed and failed tasks
curl "http://localhost:8122/api/v1/tasks?state=completed,failed"

GET /api/v1/tasks/{id}

Get task status.

Response:

{
  "id": "task-12345678-1234-1234-1234-123456789abc",
  "name": "fix-bug",
  "state": "completed",
  "state_message": "Task completed successfully",
  "created_at": "2024-01-30T12:00:00Z",
  "started_at": "2024-01-30T12:01:00Z",
  "state_changed_at": "2024-01-30T12:10:00Z",
  "vm_name": "agent-task-abc123",
  "vm_ip": "192.168.122.220",
  "exit_code": 0,
  "error": null,
  "progress": {
    "output_bytes": 102400,
    "tool_calls": 23,
    "current_tool": null,
    "last_activity_at": "2024-01-30T12:10:00Z"
  }
}

Status Codes:

  • 200 - Success
  • 404 - Task not found

Example:

curl http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc

DELETE /api/v1/tasks/{id}

Cancel a running task.

Request Body:

{
  "reason": "User cancelled via dashboard"
}

Response:

{
  "success": true,
  "error": null
}

Status Codes:

  • 200 - Success
  • 400 - Cannot cancel (e.g., already completed)
  • 404 - Task not found

Example:

curl -X DELETE http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc \
  -H "Content-Type: application/json" \
  -d '{"reason": "User requested cancellation"}'

GET /api/v1/tasks/{id}/logs

Stream task logs via Server-Sent Events (SSE).

Response: SSE stream

Event Types:

  • stdout - Standard output from Claude Code
  • stderr - Standard error from Claude Code
  • event - Structured event (JSON)
  • completed - Task finished (data: exit code)
  • error - Task error (data: error message)

Status Codes:

  • 200 - Success (streaming)
  • 404 - Task not found

Example:

curl -N http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/logs

SSE Output:

event: stdout
data: Analyzing codebase...

event: stdout
data: Running tests...

event: completed
data: 0

GET /agents/{instance_id}/v1/tasks/{task_id}/artifacts

List JSON artifacts persisted for an A2A task, including stdout/stderr chunks captured from messages:send dispatch. This route reads the executor TaskStore; it is separate from the legacy filesystem artifact route under /api/v1/tasks/{id}/artifacts.

Response:

{
  "task_id": "task-123",
  "artifacts": [
    {
      "artifact_id": "task-123-stdout-0001",
      "task_id": "task-123",
      "created_at": "2026-05-21T00:00:00Z",
      "artifact": {
        "kind": "output_chunk",
        "stream": "stdout",
        "data": "hello\n",
        "seq": 1
      }
    }
  ]
}

Status Codes:

  • 200 - Success
  • 404 - Task not found for that instance

GET /agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id}

Return one persisted A2A task artifact JSON blob.

Status Codes:

  • 200 - Success
  • 404 - Task or artifact not found

GET /api/v1/tasks/{id}/artifacts

List artifacts produced by a task.

Response:

{
  "artifacts": [
    {
      "name": "summary.md",
      "path": "summary.md",
      "size_bytes": 2048,
      "content_type": "text/markdown",
      "checksum": ""
    }
  ]
}

Status Codes:

  • 200 - Success
  • 404 - Task not found

Example:

curl http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/artifacts

GET /api/v1/tasks/{id}/artifacts/{name}

Download a specific artifact.

Response: File download with appropriate Content-Type and Content-Disposition headers.

Status Codes:

  • 200 - Success
  • 404 - Task or artifact not found

Example:

curl -O http://localhost:8122/api/v1/tasks/task-12345678-1234-1234-1234-123456789abc/artifacts/summary.md

gRPC API

Address: localhost:8120

The gRPC API is used for bidirectional communication between agents and the management server. See proto/agent.proto for complete protocol definitions.

Service: AgentService

Connect (Bidirectional Stream)

Establishes a persistent connection for agent-management communication.

Agent → Management Messages:

  • AgentRegistration - Initial registration with system info
  • Heartbeat - Periodic status updates (every 30s)
  • OutputChunk - stdout/stderr/log streams
  • CommandResult - Command execution results
  • Metrics - Resource usage snapshots
  • SessionReport - Active sessions for reconciliation
  • SessionReconcileAck - Reconciliation confirmation

Management → Agent Messages:

  • RegistrationAck - Accept registration
  • CommandRequest - Execute command
  • ConfigUpdate - Update configuration
  • ShutdownSignal - Graceful shutdown request
  • Ping - Keepalive
  • StdinChunk - Input for running command
  • PtyControl - PTY resize/signal
  • SessionQuery - Request session report
  • SessionReconcile - Session cleanup instructions

Authentication Headers:

x-agent-id: agent-01
x-agent-secret: <plaintext-secret-from-vm>

Example using grpcurl:

# Note: Connect is a bidirectional stream, grpcurl example shown for reference
grpcurl -plaintext \
  -H "x-agent-id: agent-01" \
  -H "x-agent-secret: secret-from-vm" \
  -d @ \
  localhost:8120 agentic.sandbox.v1.AgentService/Connect

Exec (Server Streaming)

Execute a one-shot command and stream output.

Request:

{
  "agent_id": "agent-01",
  "command": "ls",
  "args": ["-la", "/tmp"],
  "working_dir": "/home/agent",
  "env": {"DEBUG": "1"},
  "timeout_seconds": 60
}

Response Stream:

{"stream": "STREAM_STDOUT", "data": "dG90YWwgNAo=", "exit_code": 0, "complete": false}
{"stream": "STREAM_STDOUT", "data": "ZHJ3eHJ3eHJ3eCA=", "exit_code": 0, "complete": false}
{"stream": "STREAM_STDOUT", "data": "", "exit_code": 0, "complete": true}

Stream Types:

  • STREAM_STDOUT (1) - Standard output
  • STREAM_STDERR (2) - Standard error

Example using grpcurl:

grpcurl -plaintext \
  -d '{
    "agent_id": "agent-01",
    "command": "echo",
    "args": ["Hello, World!"],
    "timeout_seconds": 10
  }' \
  localhost:8120 agentic.sandbox.v1.AgentService/Exec

Protocol Messages

AgentRegistration

message AgentRegistration {
  string agent_id = 1;          // VM name (e.g., "agent-01")
  string ip_address = 2;        // Agent's IP
  string hostname = 3;          // Hostname
  string profile = 4;           // Profile used (basic, agentic-dev)
  map<string, string> labels = 5;
  SystemInfo system = 6;
}

message SystemInfo {
  string os = 1;                // e.g., "Ubuntu 24.04"
  string kernel = 2;            // e.g., "6.8.0-generic"
  int32 cpu_cores = 3;
  int64 memory_bytes = 4;
  int64 disk_bytes = 5;
}

CommandRequest

message CommandRequest {
  string command_id = 1;        // Unique ID for correlation
  string command = 2;           // Command to execute
  repeated string args = 3;     // Arguments
  string working_dir = 4;       // Working directory
  map<string, string> env = 5;  // Environment variables
  int32 timeout_seconds = 6;    // Execution timeout (0 = no timeout)
  bool capture_output = 7;      // Stream stdout/stderr back
  string run_as = 8;            // User to run as (default: agent)

  // PTY terminal options
  bool allocate_pty = 9;        // Spawn in pseudo-terminal
  uint32 pty_cols = 10;         // Terminal width (default: 80)
  uint32 pty_rows = 11;         // Terminal height (default: 24)
  string pty_term = 12;         // TERM env var (default: xterm-256color)
}

Heartbeat

message Heartbeat {
  string agent_id = 1;
  int64 timestamp_ms = 2;
  AgentStatus status = 3;       // STARTING, READY, BUSY, ERROR, SHUTTING_DOWN, STALE, DISCONNECTED
  float cpu_percent = 4;
  int64 memory_used_bytes = 5;
  int64 uptime_seconds = 6;
}

SessionReport & SessionReconcile

Used for post-restart session cleanup.

message SessionReport {
  string agent_id = 1;
  repeated ActiveSession sessions = 2;
  int64 timestamp_ms = 3;
}

message ActiveSession {
  string command_id = 1;        // UUID assigned by server
  string session_name = 2;      // e.g., "main", "claude"
  SessionType session_type = 3; // INTERACTIVE, HEADLESS, BACKGROUND
  string command = 4;           // Original command
  int64 started_at_ms = 5;
  int32 pid = 6;
  bool is_pty = 7;
}

message SessionReconcile {
  repeated string keep_session_ids = 1;    // Sessions to keep
  repeated string kill_session_ids = 2;    // Sessions to terminate
  bool kill_unrecognized = 3;              // Kill all not in keep list
  int32 grace_period_seconds = 4;          // Grace period before SIGKILL
}

WebSocket API

Address: ws://localhost:8121

Real-time streaming of agent output, metrics, and events to dashboard clients.

Connection

Connect to ws://localhost:8121 using any WebSocket client. No authentication required.

Example (JavaScript):

const ws = new WebSocket('ws://localhost:8121');

ws.onopen = () => {
  console.log('Connected to WebSocket');
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log('Received:', message);
};

ws.onclose = () => {
  console.log('Disconnected');
};

Message Types

Messages are JSON with a type field indicating the message type.

Agent Output

Stdout, stderr, and log streams from agents.

{
  "type": "output",
  "agent_id": "agent-01",
  "stream_id": "cmd-12345",
  "stream_type": "stdout",
  "data": "SGVsbG8sIFdvcmxkIQo=",
  "timestamp": 1706572800000
}

Stream Types: "stdout", "stderr", "log"

Agent Metrics

Periodic resource usage updates.

{
  "type": "metrics",
  "agent_id": "agent-01",
  "cpu_percent": 2.3,
  "memory_used_bytes": 536870912,
  "memory_total_bytes": 8589934592,
  "disk_used_bytes": 2147483648,
  "disk_total_bytes": 53687091200,
  "load_avg": [0.15, 0.20, 0.18],
  "timestamp": 1706572800000
}

Agent Status

Agent connection state changes.

{
  "type": "agent_status",
  "agent_id": "agent-01",
  "status": "Ready",
  "timestamp": 1706572800000
}

Status Values: "Starting", "Ready", "Busy", "Error", "ShuttingDown", "Stale", "Disconnected"


Code Examples

Python Client

import requests
import json
from typing import Optional

class AgenticClient:
    def __init__(self, base_url: str = "http://localhost:8122"):
        self.base_url = base_url
        self.session = requests.Session()

    def list_agents(self):
        """List all connected agents."""
        resp = self.session.get(f"{self.base_url}/api/v1/agents")
        resp.raise_for_status()
        return resp.json()["agents"]

    def list_vms(self, state: str = "all"):
        """List VMs with optional state filter."""
        resp = self.session.get(
            f"{self.base_url}/api/v1/vms",
            params={"state": state}
        )
        resp.raise_for_status()
        return resp.json()["vms"]

    def create_vm(
        self,
        name: str,
        profile: str = "agentic-dev",
        vcpus: int = 4,
        memory_mb: int = 8192,
        disk_gb: int = 50,
        start: bool = True
    ):
        """Create a new VM."""
        resp = self.session.post(
            f"{self.base_url}/api/v1/vms",
            json={
                "name": name,
                "profile": profile,
                "vcpus": vcpus,
                "memory_mb": memory_mb,
                "disk_gb": disk_gb,
                "agentshare": True,
                "start": start
            }
        )
        resp.raise_for_status()
        return resp.json()["operation"]["id"]

    def get_operation(self, op_id: str):
        """Poll operation status."""
        resp = self.session.get(f"{self.base_url}/api/v1/operations/{op_id}")
        resp.raise_for_status()
        return resp.json()

    def wait_for_operation(self, op_id: str, timeout: int = 300):
        """Poll until operation completes."""
        import time
        start = time.time()
        while time.time() - start < timeout:
            op = self.get_operation(op_id)
            if op["status"] == "completed":
                return op
            elif op["status"] == "failed":
                raise Exception(f"Operation failed: {op.get('error')}")
            time.sleep(2)
        raise TimeoutError("Operation timed out")

    def start_vm(self, name: str):
        """Start a VM."""
        resp = self.session.post(f"{self.base_url}/api/v1/vms/{name}/start")
        resp.raise_for_status()
        return resp.json()

    def stop_vm(self, name: str):
        """Stop a VM gracefully."""
        resp = self.session.post(f"{self.base_url}/api/v1/vms/{name}/stop")
        resp.raise_for_status()
        return resp.json()

    def delete_vm(self, name: str, delete_disk: bool = False, force: bool = False):
        """Delete a VM."""
        resp = self.session.delete(
            f"{self.base_url}/api/v1/vms/{name}",
            params={"delete_disk": delete_disk, "force": force}
        )
        resp.raise_for_status()
        return resp.json()

# Usage
client = AgenticClient()

# List agents
agents = client.list_agents()
print(f"Connected agents: {len(agents)}")

# Create VM and wait for completion
op_id = client.create_vm("agent-05")
print(f"Provisioning started: {op_id}")
result = client.wait_for_operation(op_id)
print(f"VM created: {result['result']}")

# Start/stop VM
client.stop_vm("agent-05")
client.start_vm("agent-05")

# Delete VM
client.delete_vm("agent-05", delete_disk=True, force=True)

JavaScript/Node.js Client

const axios = require('axios');

class AgenticClient {
  constructor(baseUrl = 'http://localhost:8122') {
    this.baseUrl = baseUrl;
    this.client = axios.create({ baseURL: baseUrl });
  }

  async listAgents() {
    const resp = await this.client.get('/api/v1/agents');
    return resp.data.agents;
  }

  async listVMs(state = 'all') {
    const resp = await this.client.get('/api/v1/vms', {
      params: { state }
    });
    return resp.data.vms;
  }

  async createVM(options) {
    const {
      name,
      profile = 'agentic-dev',
      vcpus = 4,
      memoryMb = 8192,
      diskGb = 50,
      start = true
    } = options;

    const resp = await this.client.post('/api/v1/vms', {
      name,
      profile,
      vcpus,
      memory_mb: memoryMb,
      disk_gb: diskGb,
      agentshare: true,
      start
    });
    return resp.data.operation.id;
  }

  async getOperation(opId) {
    const resp = await this.client.get(`/api/v1/operations/${opId}`);
    return resp.data;
  }

  async waitForOperation(opId, timeout = 300000) {
    const start = Date.now();
    while (Date.now() - start < timeout) {
      const op = await this.getOperation(opId);
      if (op.status === 'completed') {
        return op;
      } else if (op.status === 'failed') {
        throw new Error(`Operation failed: ${op.error}`);
      }
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
    throw new Error('Operation timed out');
  }

  async startVM(name) {
    const resp = await this.client.post(`/api/v1/vms/${name}/start`);
    return resp.data;
  }

  async stopVM(name) {
    const resp = await this.client.post(`/api/v1/vms/${name}/stop`);
    return resp.data;
  }

  async deleteVM(name, options = {}) {
    const { deleteDisk = false, force = false } = options;
    const resp = await this.client.delete(`/api/v1/vms/${name}`, {
      params: { delete_disk: deleteDisk, force }
    });
    return resp.data;
  }
}

// Usage
(async () => {
  const client = new AgenticClient();

  // List agents
  const agents = await client.listAgents();
  console.log(`Connected agents: ${agents.length}`);

  // Create VM
  const opId = await client.createVM({ name: 'agent-06' });
  console.log(`Provisioning started: ${opId}`);
  const result = await client.waitForOperation(opId);
  console.log(`VM created:`, result.result);
})();

curl Examples

# Health check
curl http://localhost:8122/healthz

# List agents
curl http://localhost:8122/api/v1/agents | jq

# List running VMs
curl "http://localhost:8122/api/v1/vms?state=running" | jq

# Get VM details
curl http://localhost:8122/api/v1/vms/agent-01 | jq

# Create VM
curl -X POST http://localhost:8122/api/v1/vms \
  -H "Content-Type: application/json" \
  -d '{"name":"agent-07"}' | jq

# Poll operation status
curl http://localhost:8122/api/v1/operations/op-12345 | jq

# Start VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/start | jq

# Stop VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/stop | jq

# Restart VM
curl -X POST http://localhost:8122/api/v1/vms/agent-07/restart \
  -H "Content-Type: application/json" \
  -d '{"mode":"graceful","timeout_seconds":60}' | jq

# Delete VM
curl -X DELETE "http://localhost:8122/api/v1/vms/agent-07?delete_disk=true&force=true" | jq

# List events
curl http://localhost:8122/api/v1/events | jq

# Submit task
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": {
      "name": "analyze-repo",
      "repository": {"url": "https://github.com/user/repo"},
      "prompt": "Analyze code quality"
    }
  }' | jq

# List tasks
curl http://localhost:8122/api/v1/tasks | jq

# Stream task logs
curl -N http://localhost:8122/api/v1/tasks/task-12345/logs

Error Codes

HTTP Status Codes

Code Meaning
200 OK - Request successful
202 Accepted - Async operation started
400 Bad Request - Invalid input
404 Not Found - Resource doesn't exist
409 Conflict - Resource state conflict
500 Internal Server Error - Server error
503 Service Unavailable - Service not ready

Application Error Codes

Code Description
VM_NOT_FOUND VM doesn't exist in libvirt
VM_RUNNING VM is running (when stopped required)
VM_STOPPED VM is stopped (when running required)
VM_NOT_RUNNING VM is not running
VM_ALREADY_EXISTS VM name already in use
INVALID_VM_NAME VM name doesn't match pattern
PROVISIONING_ERROR VM provisioning failed
LIBVIRT_ERROR libvirt operation failed
OPERATION_NOT_FOUND Operation ID not found

Endpoints not yet integrated above

The following routes are wired up in management/src/http/server.rs but were absent from the canonical reference. They are documented here in summary form so callers can discover them; the reference sections above will absorb these on the next documentation pass.

Agent lifecycle (extended)

POST /api/v1/agents/{id}/reprovision

Triggers a reprovision of the named agent VM via reprovision-vm.sh.

Response: 202 Accepted with {"operation_id": "...", "status": "queued"}

POST /api/v1/agents/{id}/rotate-secret

Rotates the per-agent shared secret used for the gRPC handshake. Old and new secrets are both accepted during the rotation grace window.

Query params:

  • grace_seconds (optional, default 300) — how long the previous secret remains valid after rotation.

Response: 202 Accepted with {"operation_id": "...", "deadline_ms": 1234567890}

AIWG bridge

GET /api/v1/aiwg/status

Returns current AIWG bridge connection state.

Response:

{ "connected": true, "session_count": 3, "last_event_secs": 12 }

POST /api/v1/aiwg/reconnect

Forces a reconnect of the AIWG bridge.

Response: 200 OK with {"ok": true}

Sessions (agent-scoped)

POST /api/v1/agents/{id}/sessions

Creates a new interactive PTY session on the agent. The response preserves the legacy websocket fields for older clients and also includes the current v2 PTY and orchestrator attach metadata for #321-style TUI orchestration.

Request body:

{
  "command": "bash",
  "session_name": "codex-tui"
}

Both fields are optional. When omitted, the server launches bash with a generated terminal-* session name.

Response:

{
  "session_id": "<stable-session-id>",
  "instance_id": "<routable-a2a-instance-id>",
  "command_id": "<agent-command-correlation-id>",
  "session_name": "codex-tui",
  "ws_endpoint": "ws://{host}:8121/",
  "join_message": {
    "type": "join_session",
    "session_id": "<stable-session-id>",
    "role": "controller"
  },
  "pty_ws_url": "wss://{host}/agents/<instance_id>/sessions/<session_id>/attach",
  "pty_ws_subprotocol": "pty-ws.v1",
  "orchestrator_observer_url": "/ws/sessions/<session_id>/orchestrate?role=observer",
  "orchestrator_controller_url": "/ws/sessions/<session_id>/orchestrate?role=controller",
  "default_role": "observer",
  "controller_policy": "controller input is policy-gated"
}

For new orchestration clients, use default_role: observer first. Controller attachment is intended only for policy-approved bounded input. The legacy ws_endpoint / join_message fields remain for compatibility with older path-agnostic websocket clients.

DELETE /api/v1/agents/{id}/sessions/{session}

Kills a session.

Query params:

  • signal (optional, default TERM) — one of TERM | KILL | INT | HUP.

Response: 200 OK with {"killed": true}

Container images (curated catalog)

GET /api/v1/container-images

Returns the curated agent-image catalog used to populate the dashboard's Create Instance image picker (#179). The list mirrors the Dockerfiles under images/container/ and is updated when new images land in CI.

Response:

{
  "images": [
    { "ref": "agentic/claude:latest",  "label": "Claude",  "description": "Anthropic Claude Code agent",  "default": true },
    { "ref": "agentic/codex:latest",   "label": "Codex",   "description": "OpenAI Codex agent" },
    { "ref": "agentic/opencode:latest","label": "OpenCode","description": "OpenCode agent" },
    { "ref": "agentic/automation-control:latest", "label": "Automation Control", "description": "Orchestrator-ready control image with Codex, Aider, dev tools, and credential-free probes" }
  ]
}

AIWG executor contract

POST /api/v1/sessions/{id}/dispatch

AIWG aiwg serve calls this route to dispatch a mission to this sandbox. See AIWG Executor Contract for the full integration (registration, capabilities, event vocabulary, persistence, lifecycle).

Auth: Authorization: Bearer <token> — token issued at executor registration. Constant-time comparison.

Request body:

{
  "mission_id":  "<UUID>",
  "objective":   "<command/prompt>",
  "completion":  "<optional completion criteria>",
  "long_running": false,
  "executor_filter": { "agent_id": "agent-01" },
  "metadata":    { }
}

Response: 202 Accepted

{
  "mission_id":      "<echo>",
  "executor_id":     "<sandbox instance_id>",
  "status":          "assigned",
  "estimated_start": "<RFC3339>"
}

Failure: 401 (bad token), 404 (agent not found), 503 (no agents available / executor not registered), 500 (dispatcher error — emits mission.failed with reason).

Storage downloads

These complement the upload/list endpoints already documented under Storage:

Endpoint Method Description
/api/v1/storage/global/_download GET Stream a file from the read-only global share
/api/v1/storage/inbox/{agent_id}/_download GET Stream a file from a per-agent inbox
/api/v1/storage/outbox/{task_id}/_download GET Stream a file from a per-task outbox

All three accept ?path=<relative-path> and respond with the raw file bytes (Content-Type inferred from extension).


Rate Limits

Currently no rate limiting is enforced. For production deployments, consider implementing:

  • Per-IP rate limits on HTTP endpoints
  • Connection limits on WebSocket
  • gRPC flow control for agent streams

Versioning

API version is included in the path: /api/v1/...

Current version: v1

Breaking changes will increment the version number. Legacy endpoints are maintained for backwards compatibility where possible.