
OpenHive Container Runtime Contracts

This document defines the runtime contract for the four production container roles used by OpenHive:

  • Gateway containers
  • Dashboard containers
  • Agent containers
  • Sandbox containers

These contracts are the baseline for issue #84 and the Kubernetes productization work that followed. They are intentionally explicit about startup behavior, health checks, writable paths, and isolation expectations so later work such as ContainerAgentPool and reconciliation can build on a stable operational model.

Contract Status

OpenHive currently has two containerization layers:

  • the current deployable baseline in deploy/k8s/
  • the evolving container orchestration path built around ContainerAgentPool

This document describes the contract that operators should rely on now, while also identifying which responsibilities remain future work.

For clarity:

  • the supported preview_local onboarding path is still backend + dashboard from source
  • the Kubernetes productized control-plane path now adds a standalone dashboard container next to the full Gateway runtime
  • the matching local operator entrypoint for sandbox workflows is make run-sandbox / uv run uvicorn hive.container.sandbox_entrypoint:app --port 8091 with HIVE_SANDBOX_URL=http://127.0.0.1:8091
  • local image-parity checks can instead use make run-sandbox-container, which starts the profiled Docker Compose sandbox service built from Dockerfile.sandbox on the same local port
  • sandbox-backed dev-task workflows now have a real operator-facing surface through the authenticated Gateway and dashboard dev-task review APIs, but this remains an optional local operator workflow rather than a required preview_local onboarding step
  • the gateway relay and short-lived task-token contract exist for model-backed isolated runtimes; the isolated agent runtime now uses a shared task relay client that future sandbox relay backends should reuse
  • the canonical current sandbox coding path is the default HIVE_SANDBOX_CODING_BACKEND=codex_cli mode, which runs the local codex exec CLI as a governed subprocess with Codex’s workspace-write sandbox mode so task-local edits can be proposed without full unrestricted access
  • the sandbox container also exposes an explicit opt-in proof mode, HIVE_SANDBOX_CODING_BACKEND=relay_helper, which swaps that default local CLI execution for a local relay helper process that consumes gateway-issued task tokens
  • the sandbox container also exposes HIVE_SANDBOX_CODING_BACKEND=deterministic_proof for Docker-local control loop verification only; it does not prove provider-backed coding quality, but it does prove archive seeding, runtime attempts, checkpoints, artifacts, and in-place resume without depending on a live LLM provider
  • the Docker Compose sandbox profile bind-mounts source hive, packages, and scripts directories so local verification can exercise the current checkout without rebuilding the sandbox image for every Python-only control-loop change
  • the default codex_cli adapter now scrubs inherited secret env before launch, but it can also opt in to a narrow provider-env allowlist sourced from the sandbox runtime env when operators need real codex_cli execution
  • subprocess env construction is centralized for governed runtime paths: inherited provider keys, database credentials, gateway/internal secrets, and app credentials are scrubbed unless a documented direct-env allowlist names the key
  • sandbox command status includes bounded stdout/stderr previews with sizes, omitted counts, state, exit code, command id, and retrieval hints; full logs remain available only through explicit log retrieval
  • sandbox-backed Work Runs publish durable progress checkpoints through the sandbox API and persist runtime attempts separately from the logical task id; this lets the operator resume a stale or failed Work Run in place instead of treating timeout extension as the long-term scaling mechanism
  • checkpoint runtime events expose a typed checkpoint_sequence_no, durable marker, and checkpoint_payload so dashboards can show friendly progress and recovery context without relying on untyped lifecycle-event extras
  • in-place Work Run resume is a new sandbox attempt under the same logical task: it keeps the task id, current workspace, artifacts/log visibility, and latest checkpoint context, while requeue remains the fallback that creates a new logical task when the original workspace is unavailable or should be forked
  • the governed process-session lifecycle is list, poll, log, and cancel over existing sandbox command ids, not an arbitrary terminal or shell creation API
  • when an operator also supplies sandbox runtime provider metadata such as a provider id and base URL, the sandbox entrypoint can materialize a minimal runtime ~/.codex/config.toml under the writable sandbox home so the default codex_cli path can target an OpenAI-compatible provider without baking that config into the image
  • the Kubernetes sandbox pod now sets HIVE_SANDBOX_CODEX_SANDBOX_MODE=danger-full-access for the inner Codex CLI because the pod itself is already the outer governed workspace boundary and container runtimes such as Docker Desktop block Codex's inner bubblewrap namespace sandbox
  • the same pod also mounts a dedicated writable home at /home/codex instead of reusing /tmp, because real non-ephemeral Codex CLI runs may stall when their managed home directory lives directly under a temporary root
  • when a remote sandbox is seeded from workspace_archive_b64, approval first tries to apply the patch back to the original requested workspace path; if that path is not visible inside the sandbox pod, operators may configure an explicit host-side apply relay with HIVE_SANDBOX_WORKSPACE_APPLY_RELAY_URL and HIVE_SANDBOX_WORKSPACE_APPLY_RELAY_TOKEN
  • dev-task workspace state now has a source-tested WorkspaceManifest shape that records task id, repo path class, artifact manifest path, target paths, source/requested workspace path presence, apply-back mode, and relay configuration without storing raw archives, patches, or secrets in run state
  • that env-auth path is still not the same as a relay-backed sandbox runtime; it is a direct provider-env bridge for the local CLI, not a token flow
  • that means vendor-secret residency claims only apply to explicitly relay-backed sandbox or agent runtimes, not to every current preview_local dev-task execution path or the default codex_cli sandbox path
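
The scrub-then-allowlist behavior described for governed subprocess env construction can be sketched as follows. This is an illustrative sketch only: the key prefixes and the function name are assumptions, not the actual OpenHive implementation.

```python
# Hypothetical illustration of scrub-then-allowlist child-env construction.
# The prefix list below is an assumption; the real scrub rules may differ.
SENSITIVE_PREFIXES = ("OPENAI_", "ANTHROPIC_", "QWEN_", "HIVE_", "DATABASE_", "DASHBOARD_")

def build_child_env(parent_env: dict, allowlist: frozenset = frozenset()) -> dict:
    """Drop inherited provider/app secrets unless a documented allowlist names the key."""
    child = {}
    for key, value in parent_env.items():
        if key.startswith(SENSITIVE_PREFIXES) and key not in allowlist:
            continue  # scrubbed: secret-bearing key not named by the allowlist
        child[key] = value
    return child
```

With this shape, `build_child_env({"PATH": "/usr/bin", "OPENAI_API_KEY": "sk-test"})` keeps only `PATH`, while naming `OPENAI_API_KEY` in the allowlist forwards it, mirroring the HIVE_SANDBOX_CODEX_AUTH_MODE=env opt-in.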

Common Contract Rules

All OpenHive runtime containers share these rules:

  • Health endpoint: Each role must expose a lightweight HTTP probe endpoint. Gateway, Agent, and Sandbox use GET /healthz; Dashboard uses GET /dashboard-healthz.
  • Probe behavior: Health endpoints must return quickly and without external side effects.
  • Logging: Containers write structured logs to stdout/stderr.
  • Shutdown: Containers must tolerate SIGTERM and allow the orchestrator to stop them without manual cleanup steps.
  • Secrets: Secrets come from environment or mounted secret sources. They must never be baked into images or persisted in logs.
  • Filesystem policy: Any path that must be writable at runtime must be explicitly mounted or documented. Read-only roots are preferred where practical.
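
A probe client that enforces these rules from the caller side might look like the sketch below. The loopback base URLs and the timeout value are assumptions for a local setup; only the ports and endpoint paths come from this contract.

```python
import json
import urllib.request

# Role -> (base URL, probe path). Base URLs assume a local port-forwarded setup.
PROBES = {
    "gateway": ("http://127.0.0.1:8080", "/healthz"),
    "dashboard": ("http://127.0.0.1:3000", "/dashboard-healthz"),
    "agent": ("http://127.0.0.1:8090", "/healthz"),
    "sandbox": ("http://127.0.0.1:8091", "/healthz"),
}

def probe(role: str, timeout: float = 2.0) -> dict:
    """Hit a role's probe endpoint; a tight timeout enforces 'return quickly'."""
    base, path = PROBES[role]
    with urllib.request.urlopen(base + path, timeout=timeout) as resp:
        return json.loads(resp.read())
```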

Role Contracts

1. Gateway Container

Primary responsibility

  • Run the platform control plane
  • Serve the platform API and session/auth endpoints consumed by the dashboard
  • Own long-lived coordination such as auth, scheduling, routing, and credential proxying

Current entrypoints

  • Baseline K8s deployment: hive.container.gateway_entrypoint:app
  • Full application runtime: hive.main:app

Startup contract

  • The baseline deployment may use the minimal gateway entrypoint so the image is probeable before the full runtime stack is configured.
  • A full production gateway must only be considered healthy after FastAPI startup wiring completes, including DB setup, scheduler startup, and route mounting.
  • The Kubernetes full-runtime overlay at deploy/k8s/overlays/full-gateway-runtime enables HIVE_STRICT_STARTUP=true, which makes startup fail fast when required secrets are still on placeholder defaults or when DB initialization and migrations cannot complete.
  • The same overlay also sets HIVE_STARTUP_MIGRATION_MODE=check and HIVE_METADATA_CREATE_ON_STARTUP=false, so the Kubernetes path expects the schema to already be at the current Alembic head instead of mutating it implicitly during control-plane startup.
  • The same overlay adds a startupProbe so Kubernetes waits for the real control-plane startup path before liveness checks restart the pod.
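
A startupProbe of the shape described above might look like the following fragment. The period and threshold numbers here are illustrative examples, not the committed values in deploy/k8s/overlays/full-gateway-runtime.

```yaml
# Illustrative probe wiring for the full-runtime gateway pod.
# Numbers are examples only, not the committed overlay values.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # allow ample time for DB setup and the migration check
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
```

Because liveness checks only begin once the startupProbe succeeds, a slow control-plane bring-up does not trigger restart loops.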

Health contract

  • Port: 8080
  • Endpoint: GET /healthz
  • Baseline response identifies role gateway
  • Full runtime response also reflects DB and active-agent health via hive.main

Filesystem and volume contract

  • No persistent workspace volume is required for the baseline gateway container.
  • The full runtime may read project workspace and config paths, but gateway-owned state must remain reconstructable from DB plus configured workspace mounts.
  • The current full-runtime overlay mounts /data/hive as a writable emptyDir. That is enough for control-plane bring-up and route reachability, but operators should treat it as non-durable until a persistent storage story is wired in.

Isolation contract

  • Namespace: openhive
  • Gateway is the intended long-term role that holds real vendor secrets in process memory.
  • In the current source-based preview_local deployment, Keeper and Scout often still run in the same process as the gateway via LocalAgentPool, so this boundary is only fully enforced today for relay-backed sandbox model access and encrypted-at-rest channel credentials.
  • Gateway may talk to DB, Agent runtime, Sandbox runtime, and approved IM / LLM upstreams.

Required environment

  • Full runtime requires the normal server settings used by hive.main, especially database connectivity, dashboard session secret, and platform credentials.
  • The current full-runtime overlay treats these as the minimum required env surface:
    • DATABASE_URL
    • DASHBOARD_SESSION_SECRET
    • HIVE_INTERNAL_SECRET
    • HIVE_WORKSPACE
    • HIVE_STRICT_STARTUP=true
    • HIVE_STARTUP_MIGRATION_MODE=check
    • HIVE_METADATA_CREATE_ON_STARTUP=false
  • Pool selection stays in the composition root and currently uses these modes:
    • HIVE_POOL_BACKEND=local keeps the in-process LocalAgentPool path used by source-based preview flows.
    • HIVE_POOL_BACKEND=container selects ContainerAgentPool and requires HIVE_CONTAINER_RUNTIME_BACKEND to choose the isolated-runtime backend.
    • The first supported Kubernetes deployment path uses HIVE_CONTAINER_RUNTIME_BACKEND=kubernetes. Local source-based experiments can still use local_subprocess to start one isolated HTTP runtime per agent on the same host.
  • Baseline health-only deployment does not require the full control-plane env surface.
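
The pool-selection modes listed above can be sketched as a composition-root decision. The mode names and env vars come from this document; the function itself and its return values are hypothetical.

```python
# Hypothetical composition-root pool selection mirroring the documented modes.
VALID_RUNTIME_BACKENDS = {"kubernetes", "local_subprocess"}

def select_pool_backend(env: dict) -> str:
    pool = env.get("HIVE_POOL_BACKEND", "local")
    if pool == "local":
        # In-process LocalAgentPool path used by source-based preview flows.
        return "LocalAgentPool"
    if pool == "container":
        runtime = env.get("HIVE_CONTAINER_RUNTIME_BACKEND")
        if runtime not in VALID_RUNTIME_BACKENDS:
            raise ValueError(
                "HIVE_POOL_BACKEND=container requires HIVE_CONTAINER_RUNTIME_BACKEND"
            )
        return f"ContainerAgentPool[{runtime}]"
    raise ValueError(f"unknown pool backend: {pool}")
```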

2. Dashboard Container

Primary responsibility

  • Serve the operator-facing Next.js dashboard
  • Keep browser traffic same-origin while proxying dashboard-originated API calls to Gateway
  • Preserve cookie-based auth behavior for self-hosted deployments

Current entrypoint

  • Standalone Next.js runtime generated from Dockerfile.web

Startup contract

  • The production image is built from Next.js standalone output and starts the generated server.js runtime.
  • The runtime must listen on port 3000 and tolerate the standard PORT and HOSTNAME environment variables used by the standalone server.
  • Dashboard API traffic must not depend on local-dev-only assumptions such as a hardcoded http://localhost:8080 rewrite target.
  • Same-origin browser requests to /api/* and /healthz are proxied by the Next.js proxy.ts layer to the configured in-cluster Gateway origin.

Health contract

  • Port: 3000
  • Endpoint: GET /dashboard-healthz
  • Response identifies role dashboard

Filesystem and volume contract

  • No writable volume is required for the current standalone dashboard runtime.
  • Static assets and the generated server bundle ship inside the image.

Isolation contract

  • Namespace: openhive
  • Dashboard may reach the in-cluster Gateway service.
  • Dashboard does not require direct DB access.
  • Browser-visible API traffic should remain same-origin to the dashboard host; the in-cluster hop to Gateway happens server-side through the dashboard proxy.

Required environment

  • HIVE_GATEWAY_INTERNAL_URL for runtime proxying to Gateway
  • Optional PORT / HOSTNAME overrides supported by the standalone Next.js server
  • NEXT_PUBLIC_API_URL remains optional for special non-proxied deployments, but the supported Kubernetes path uses the same-origin proxy instead

3. Agent Container

Primary responsibility

  • Host exactly one agent runtime instance for the Kubernetes-backed ContainerAgentPool path
  • Expose a probeable runtime boundary that can be started, stopped, and replaced independently

Current entrypoint

  • hive.container.agent_entrypoint:app

Startup contract

  • The container must start without mutating bundled defaults inside the image.
  • Config bootstrap must copy default files into the writable config volume without overwriting operator edits.
  • The agent process must read its writable workspace from HIVE_WORKSPACE.
  • When HIVE_AGENT_CONFIG_JSON is supplied, the entrypoint must bootstrap one real HiveAgent runtime inside the container and expose it through the existing /run and /flush-memory HTTP contract.
  • The first Kubernetes runtime backend now creates one Pod plus one Service per managed agent identity, with deterministic runtime naming and Service-DNS base URL resolution. Exact agent_id and controller ownership values are carried in Pod annotations rather than lossy label-safe rewrites.
  • The isolated runtime must route model calls back through HIVE_GATEWAY_URL using the gateway relay flow. It must not require a long-lived vendor API key in the agent container environment.
  • That vendor-secret boundary is narrower than a blanket "no secrets at all" claim: the current local separated-process runtime may still receive DATABASE_URL when DB-backed runtime features are enabled, so operators should interpret the current proof as provider/app credential isolation rather than total operational-secret elimination.

Health contract

  • Port: 8090
  • Endpoint: GET /healthz
  • Response includes:
    • status
    • role=agent
    • workspace
    • runtime_ready
    • agent_id
    • project_id
    • controller_id
    • deployment_backend
    • readiness_reason
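
The response field list above can be sketched as a payload builder. This is an illustrative shape only; how the real agent entrypoint sources each field is an implementation detail this sketch glosses over.

```python
# Illustrative agent /healthz payload builder; field names come from the
# contract above, the `state` dict shape is an assumption.
def agent_health(state: dict) -> dict:
    return {
        "status": "ok" if state.get("runtime_ready") else "starting",
        "role": "agent",
        "workspace": state.get("workspace"),
        "runtime_ready": bool(state.get("runtime_ready")),
        "agent_id": state.get("agent_id"),
        "project_id": state.get("project_id"),
        "controller_id": state.get("controller_id"),
        "deployment_backend": state.get("deployment_backend"),
        "readiness_reason": state.get("readiness_reason"),
    }
```

A readiness_reason string alongside runtime_ready=False lets operators distinguish ordinary startup delay from a hard bootstrap failure.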

Filesystem and volume contract

  • Writable config/workspace mount is required.
  • Current baseline mount path:
    • /data/config
  • Current baseline env:
    • HIVE_WORKSPACE=/data/config
  • The init container writes defaults from /app/defaults/agent into the writable volume.
  • The first Kubernetes-backed lifecycle slice uses an emptyDir workspace per managed agent Pod. That keeps the runtime contract explicit without claiming durable per-agent storage yet.
  • Agent runtime data must live in the mounted workspace, not in the image filesystem.

Isolation contract

  • Namespace: hive-agents
  • Agent containers should not require arbitrary external egress.
  • Agent containers may reach:
    • Gateway API
    • Sandbox API
  • Agent containers must not rely on direct access to another agent container.
  • State-changing agent-runtime HTTP endpoints require the gateway/controller shared secret (X-Internal-Secret or Bearer auth). GET /healthz remains the probe endpoint and does not execute runtime work.

Required environment

  • HIVE_WORKSPACE
  • HIVE_AGENT_CONFIG_JSON for the real isolated-runtime mode
  • HIVE_GATEWAY_URL so the agent runtime can reach the gateway relay
  • HIVE_INTERNAL_SECRET for local relay-token issuance until a narrower per-agent token bootstrap path lands, and for authenticating gateway/controller calls into /run, /resume-run, and /flush-memory
  • Any per-agent runtime metadata supplied by the future pool implementation
  • Long-lived vendor API secrets must not be injected into agent containers; the supported isolated mode uses the gateway relay instead

4. Sandbox Container

Primary responsibility

  • Execute sandbox-local development/runtime tasks
  • Persist task-local logs and artifacts in a bounded writable area
  • Remain isolated from agent runtime and arbitrary network destinations

Current entrypoint

  • hive.container.sandbox_entrypoint:app

Startup contract

  • Sandbox API startup requires DB connectivity because task metadata and events are persisted.
  • The sandbox entrypoint is no longer a placeholder-only probe target: it is the current runtime for the governed dev-task lane used by the project-scoped Gateway facade.
  • The container root filesystem may remain read-only if writable task storage is mounted separately.

Health contract

  • Port: 8091
  • Endpoint: GET /healthz
  • Response identifies role sandbox
  • Dev-task status responses include an optional runtime block with backend_run_id, execution_class, artifact_root, log_root, and heartbeat metadata so operators can map OpenHive tasks back to sandbox execution state

Filesystem and volume contract

  • Sandbox task-local writable root:
    • /sandbox/commands
    • /sandbox/tasks
  • The current runtime also benefits from a writable /tmp mount for subprocess and tool behavior.
  • Task storage under /sandbox/tasks/<task_id>/ is split into:
    • repo/
    • artifacts/
    • logs/
    • scratch/
  • Task-local writable storage must be ephemeral or policy-controlled; artifacts that need to survive task completion must be persisted through the sandbox API contract.

Isolation contract

  • Namespace: hive-sandbox
  • Sandbox containers must not require agent-to-agent communication.
  • Sandbox containers may reach Gateway for control-plane interaction.
  • Sandbox containers must not talk directly to agent runtime.
  • External egress is deny-by-default and should only be opened through explicit allowlists.
  • The reusable /commands API is intentionally narrower than unrestricted shell access: it only accepts governed argv-based commands, rejects shell entrypoints and env overrides, and requires explicit allowlisted registries for networked package-install flows.
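
The governed /commands gate can be sketched as a request validator. The rejected shell names and error wording here are assumptions; only the governed-argv, no-shell, no-env-override rules come from the contract.

```python
# Hypothetical validator for the governed argv-only /commands surface.
SHELL_ENTRYPOINTS = {"sh", "bash", "zsh", "dash", "fish"}  # assumed reject list

def validate_command(request: dict) -> list:
    if "env" in request:
        raise ValueError("env overrides are not accepted")
    argv = request.get("argv")
    if not isinstance(argv, list) or not argv:
        raise ValueError("argv-based commands only")
    # Reject shell entrypoints whether given bare or as an absolute path.
    if argv[0].rsplit("/", 1)[-1] in SHELL_ENTRYPOINTS:
        raise ValueError("shell entrypoints are rejected")
    return argv
```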

Required environment

  • DATABASE_URL for the current sandbox API runtime
  • optional HIVE_SANDBOX_CODING_BACKEND; omit it or set it to codex_cli for the default governed local CLI path, or set it to relay_helper for the explicit gateway-relay-backed helper proof path
  • optional HIVE_SANDBOX_CODEX_AUTH_MODE; leave it unset or set it to scrubbed for the default secret-scrubbed codex_cli child env, or set it to env to allow the governed codex subprocess to receive explicitly allowlisted provider env vars
  • optional HIVE_SANDBOX_CODEX_ENV_ALLOWLIST; comma-separated provider env keys such as OPENAI_API_KEY or QWEN_API_KEY that may be forwarded only when HIVE_SANDBOX_CODEX_AUTH_MODE=env
  • optional HIVE_SANDBOX_CODEX_MODEL; defaults to qwen3-max but may be overridden when the configured Codex provider only supports a different model
  • optional HIVE_SANDBOX_WORKSPACE_APPLY_RELAY_URL and HIVE_SANDBOX_WORKSPACE_APPLY_RELAY_TOKEN for archive-seeded tasks whose original workspace lives outside the sandbox filesystem. This is an operator-local apply-back channel, not a model-token relay; the sandbox sends the approved patch to that URL only after PM approval and only when the original requested workspace path is not locally reachable.
  • Any future sandbox execution settings required by backend-specific task runners
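
Backend selection from HIVE_SANDBOX_CODING_BACKEND can be sketched as below. The mode names and the codex_cli default come from this document; the function itself is illustrative.

```python
# Hypothetical resolver for the sandbox coding backend; mode names are the
# documented ones, with codex_cli as the documented default.
VALID_BACKENDS = {"codex_cli", "relay_helper", "deterministic_proof"}

def resolve_coding_backend(env: dict) -> str:
    backend = env.get("HIVE_SANDBOX_CODING_BACKEND", "codex_cli")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported sandbox coding backend: {backend}")
    return backend
```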

Probe and Readiness Semantics

Until dedicated readiness endpoints are introduced, OpenHive uses the following rule:

  • Gateway, Agent, and Sandbox use GET /healthz for both liveness and readiness probes
  • Dashboard uses GET /dashboard-healthz for both liveness and readiness probes

That means:

  • handlers must stay lightweight
  • baseline entrypoints must not depend on slow external calls inside health probes
  • probe responses may still include lightweight ownership and readiness metadata when it helps distinguish startup delays from hard bootstrap failures

Kubernetes Baseline Mapping

The manifests under deploy/k8s/base/ and the full-runtime overlays map these contracts as follows:

Role        Namespace      Port   Probe                 Writable paths
Gateway     openhive       8080   /healthz              none required in baseline; /data/hive in full runtime
Dashboard   openhive       3000   /dashboard-healthz    none required
Agent       hive-agents    8090   /healthz              /data/config
Sandbox     hive-sandbox   8091   /healthz              /sandbox/commands, /sandbox/tasks, /tmp

Non-Goals for This Contract

This document does not claim that the following are already implemented:

  • per-agent production relay APIs
  • Kubernetes-native autoscaling behavior
  • final HA topology for the dashboard or Gateway control plane

Those concerns should build on this contract rather than redefining the runtime surface.