The Gateway service is configured with `restart: always` in `docker-compose.yml`. Docker will automatically restart it after crashes, OOM kills, or host reboots.
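The compose stanza behind this behaviour looks roughly like the following (service name and image tag are illustrative; only the `restart` key matters here):

```yaml
services:
  gateway:
    image: openhive/gateway:latest   # illustrative tag
    restart: always                  # restart after crashes, OOM kills, host reboots
```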
```bash
# Start all services
docker compose up -d

# Check service status
docker compose ps

# View Gateway logs (last 100 lines, follow)
docker compose logs -f --tail=100 gateway

# Manual restart
docker compose restart gateway
```

The preview-era containerization baseline lives under `deploy/k8s/`.
Use `docs/deploy-k8s.md` for namespace layout, NetworkPolicy checks, and init-container bootstrap verification. Use `docs/container-runtime-contracts.md` for the explicit startup, health, volume, and isolation contract of the Gateway, Agent, and Sandbox runtime roles.
For the current Kubernetes productization slice:
- the supported DB mode is operator-managed external PostgreSQL
- OpenHive owns the in-cluster migration Job contract, not PostgreSQL lifecycle
- operators own PostgreSQL backups, restore, retention, and major-version upgrades
- the recommended operator-facing deployment is the preview installer driven by `deploy/k8s/preview-installer/values.env.example` and `make k8s-preview-install env_file=/path/to/env`
- the full platform overlay adds the standalone dashboard, same-origin API proxying, and a combined ingress example for dashboard plus API traffic
```bash
# Stop all services (data volumes preserved)
docker compose stop

# Stop and remove containers (data volumes preserved)
docker compose down
```

The backup script (`scripts/backup.sh`) handles:
| What | Where |
|---|---|
| PostgreSQL full dump | `/var/backups/hive/db/hive-TIMESTAMP.sql.gz` |
| `projects/` directory | `/var/backups/hive/projects/projects-TIMESTAMP.tar.gz` |

Retention: last 7 days. Older files are deleted automatically.
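The retention sweep can be reproduced in isolation. The sketch below uses a scratch directory standing in for `/var/backups/hive`; the real logic lives inside `scripts/backup.sh` and may differ in detail:

```bash
# Demo of a 7-day retention sweep; `touch -d` (GNU coreutils) backdates
# the mtime of the "stale" file so find can see it as old.
dir=$(mktemp -d)
touch "$dir/fresh.sql.gz"
touch -d '10 days ago' "$dir/stale.sql.gz"
find "$dir" -type f -mtime +7 -delete   # drop files older than 7 days
remaining=$(ls "$dir")
echo "$remaining"                       # only fresh.sql.gz survives
rm -rf "$dir"
```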
```bash
# 1. Make the script executable (done once)
chmod +x /opt/openhive/scripts/backup.sh

# 2. Open the crontab editor
crontab -e

# 3. Add this line (runs at 02:15 local time daily)
15 2 * * * /opt/openhive/scripts/backup.sh >> /var/log/hive-backup.log 2>&1
```

The script reads credentials from environment variables. Set them in `/etc/environment` or a cron-specific env file:
```bash
DB_HOST=localhost
DB_PORT=5432
DB_NAME=hive
DB_USER=hive
DB_PASSWORD=<your-password>
HIVE_PROJECTS_DIR=/opt/openhive/.runtime/projects
HIVE_BACKUP_DIR=/var/backups/hive
```

```bash
# Run immediately (uses defaults from env)
./scripts/backup.sh

# Override backup directory
./scripts/backup.sh /mnt/external/backups
```

Restore database:
```bash
# Stop the Gateway first to prevent writes during restore
docker compose stop gateway

# Decompress and restore
gunzip -c /var/backups/hive/db/hive-TIMESTAMP.sql.gz | \
  PGPASSWORD=$DB_PASSWORD psql \
    --host=localhost --port=5432 \
    --username=hive hive

# Restart Gateway
docker compose start gateway
```

Restore projects directory:
```bash
# Decompress the archive into the correct location
tar --extract --gzip \
  --file=/var/backups/hive/projects/projects-TIMESTAMP.tar.gz \
  --directory=/opt/openhive/.runtime/
```

The Gateway exposes a health endpoint at `GET /healthz`:
```json
{
  "status": "ok",
  "db": "healthy",
  "agents": { "active": 2 }
}
```

| Field | Values |
|---|---|
| `status` | `"ok"` or `"degraded"` |
| `db` | `"healthy"` or `"unreachable"` |
| `agents.active` | Number of currently active agent instances |
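A degraded status can be caught in a script without extra tooling. The sketch below inlines a sample payload where live use would pipe from `curl`, and the plain-`sed` extraction assumes the compact JSON shape shown:

```bash
# Sample /healthz payload (inlined; live use: curl -s http://localhost:8080/healthz)
payload='{"status":"degraded","db":"unreachable","agents":{"active":0}}'
status=$(printf '%s' "$payload" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
if [ "$status" != "ok" ]; then
  echo "ALERT: gateway status is $status"
fi
```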
```bash
# Quick Gateway check
curl http://localhost:8080/healthz | jq .

# Dashboard container probe when running the standalone web server
curl http://localhost:3000/dashboard-healthz | jq .
```

Agent runtime pods and the sandbox API also expose `GET /healthz` for probe use:
```bash
# Agent runtime probe
curl http://localhost:8090/healthz | jq .

# Sandbox probe
curl http://localhost:8091/healthz | jq .
```

Agent runtime readiness now distinguishes startup from bootstrap failures:
```json
{
  "status": "error",
  "role": "agent",
  "runtime_ready": false,
  "agent_id": "keeper:proj_a",
  "project_id": "proj_a",
  "controller_id": "gateway",
  "deployment_backend": "kubernetes",
  "readiness_reason": "RuntimeError: relay unavailable"
}
```

Readiness guidance:
| Field | Meaning |
|---|---|
| `status=ready` | Runtime is serving work and the readiness probe should pass |
| `status=starting` | Runtime has not finished bootstrap yet |
| `status=error` | Bootstrap failed; inspect `readiness_reason` before restarting blindly |
| `agent_id`, `project_id`, `controller_id` | Pod-to-agent ownership mapping for operator triage |
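This guidance can be wired into a small triage helper. The payload below is a captured sample standing in for a live probe query, and the field extraction is a plain-`sed` sketch, not project tooling:

```bash
payload='{"status":"error","runtime_ready":false,"readiness_reason":"RuntimeError: relay unavailable"}'
status=$(printf '%s' "$payload" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
case "$status" in
  ready)    echo "probe pass" ;;
  starting) echo "bootstrap in progress; wait" ;;
  error)    # surface the reason instead of restarting blindly
            reason=$(printf '%s' "$payload" | sed -n 's/.*"readiness_reason":"\([^"]*\)".*/\1/p')
            echo "bootstrap failed: $reason" ;;
esac
```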
For Kubernetes-backed preview deployments, the quickest operator loop is:
- inspect the failing pod: `kubectl get pods -A -o wide`
- query the pod probe payload with `kubectl exec ... -- wget -qO- http://localhost:8090/healthz`
- map the pod back to OpenHive ownership through annotations such as `openhive.io/agent-id`, `openhive.io/project-id`, `openhive.io/agent-role`, and `openhive.io/controller-id`
- for Keeper dev-task investigations, fetch the task through `/dev-tasks/{task_id}` and inspect the nested `runtime` block for `backend_run_id`, `execution_class`, `artifact_root`, and `log_root`
Every HTTP request and Feishu WebSocket event is assigned a `trace_id`. All log lines within a request chain carry the same `trace_id` field.
```bash
# Correlate all log lines for a single request
docker compose logs gateway | grep '"trace_id": "abc123def456"'
```

HTTP responses include the trace ID in the `X-Trace-Id` header for easy correlation from client logs.
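For scripting, the `trace_id` can be pulled out of a log line directly. A sample structlog line stands in here for real `docker compose logs` output:

```bash
# Extract trace_id from one JSON log line (sample line inlined)
line='{"event":"request.complete","trace_id":"abc123def456","log_level":"info"}'
trace=$(printf '%s' "$line" | sed -n 's/.*"trace_id":"\([a-z0-9]*\)".*/\1/p')
echo "$trace"   # abc123def456
```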
Structlog outputs JSON-formatted lines to stdout. Docker captures them.
```bash
# Stream all logs
docker compose logs -f gateway

# Filter by log level
docker compose logs gateway 2>&1 | grep '"log_level": "error"'

# Last 1000 lines
docker compose logs --tail=1000 gateway
```

For production, consider shipping logs to a centralised store (Loki, Datadog, etc.) by configuring the Docker logging driver in `docker-compose.yml`.
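As a concrete starting point, per-service log rotation with the default `json-file` driver looks like this (sizes are illustrative; a Loki or Datadog driver would replace the `driver` and `options` keys):

```yaml
services:
  gateway:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"   # rotate each log file at 10 MB
        max-file: "5"     # keep at most 5 rotated files
```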
Monitor the backup directory and Docker volumes:
```bash
# Backup sizes
du -sh /var/backups/hive/db/* /var/backups/hive/projects/*

# Docker volume (PostgreSQL data)
docker system df -v | grep pgdata

# Projects directory
du -sh .runtime/projects/
```
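These checks can be folded into a simple threshold alert suitable for cron. The 10 GiB limit and the directory path are illustrative, not project defaults:

```bash
# Warn when the backup directory grows past a threshold
limit_kb=$((10 * 1024 * 1024))                          # 10 GiB in KiB
used_kb=$(du -sk /var/backups/hive 2>/dev/null | cut -f1)
if [ "${used_kb:-0}" -gt "$limit_kb" ]; then
  echo "WARN: backups exceed 10 GiB"
else
  echo "backup usage OK (${used_kb:-0} KiB)"
fi
```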