ARK is a single-machine AI operations platform. One box runs everything: LLM inference, API proxy, chat interface, image generation, experiment orchestration, 3D visualization, monitoring, and secure multi-tenant access.
No Kubernetes. No microservices mesh. No cloud dependency. One machine, one docker-compose, full sovereignty.
| Component | Minimum | Recommended (Ark Spec) |
|---|---|---|
| CPU | 8-core x86_64 | AMD Ryzen 9 9950X (16C/32T) |
| RAM | 32 GB DDR4 | 96 GB DDR5 |
| GPU | RTX 3060 12GB | RTX 4090/5090 (24-32GB VRAM) |
| Storage | 500 GB NVMe | 2 TB NVMe |
| Network | 100 Mbps | 1 Gbps symmetric |
| OS | Ubuntu 22.04+ | Ubuntu 24.04 LTS |
INTERNET
│
▼
┌──────────────────┐
│ Nginx (443) │ ← Let's Encrypt TLS
│ Reverse Proxy │
└────────┬─────────┘
│
┌────────────┼────────────────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌──▼──────┐
│Open │ │LiteLLM │ │Grafana │ │ Other │
│WebUI │ │ :4000 │ │ :3001 │ │Services │
│ :3000 │ │ │ │ │ │ │
└────┬────┘ └────┬────┘ └─────────┘ └─────────┘
│ │
└──────┬─────┘
▼
┌─────────────┐ ┌─────────────┐
│ Ollama │────▶│ RTX 5090 │
│ :11434 │ │ 32GB VRAM │
└──────────────┘ └─────────────┘
| Port | Service | Binding | Access |
|---|---|---|---|
| 22 | SSH | 0.0.0.0 | Key auth only |
| 80 | Nginx HTTP | 0.0.0.0 | Redirect to 443 |
| 443 | Nginx HTTPS | 0.0.0.0 | TLS termination |
| 3000 | Open WebUI | 127.0.0.1 | Via Nginx |
| 3001 | Grafana | 127.0.0.1 | Via Nginx |
| 4000 | LiteLLM | 0.0.0.0 | API key required |
| 5678 | n8n | 127.0.0.1 | Via Nginx |
| 8000 | ChromaDB | 127.0.0.1 | Via Nginx |
| 8188 | ComfyUI | 127.0.0.1 | Via Nginx |
| 9090 | Prometheus | 127.0.0.1 | Via Nginx + basic auth |
| 9100 | node_exporter | 0.0.0.0 | Metrics only |
| 9835 | nvidia_gpu_exporter | 0.0.0.0 | Metrics only |
| 11434 | Ollama | 0.0.0.0 | Via WireGuard + LiteLLM |
| 51820 | WireGuard | 0.0.0.0 | UDP, VPN mesh |
chat.domain.io → 127.0.0.1:3000 (Open WebUI)
llm.domain.io → 127.0.0.1:4000 (LiteLLM API)
art.domain.io → 127.0.0.1:8188 (ComfyUI)
grafana.domain.io → 127.0.0.1:3001 (Grafana)
n8n.domain.io → 127.0.0.1:5678 (n8n)
prometheus.domain.io → 127.0.0.1:9090 (Prometheus)
chromadb.domain.io → 127.0.0.1:8000 (ChromaDB)
- Ollama (systemd) — Loads/unloads LLM models on demand. VRAM management is critical: only one large model at a time on consumer GPUs.
- ComfyUI (Docker + GPU passthrough) — Image generation via Flux, SD, etc. Holds ~750MB VRAM when idle.
- LiteLLM (Docker) — OpenAI-compatible API that routes to Ollama. Supports model aliasing (
gpt-4→ local model), API key auth, request logging to Prometheus.
- Open WebUI (Docker) — Full-featured chat UI with model selection, document upload, RAG via ChromaDB. Auto-login via proxy auth headers.
- ComfyUI — Drag-and-drop image generation workflows.
- ChromaDB (Docker) — Vector database for RAG embeddings.
- NVMe at
/opt/ark/data/— All persistent data, models, experiments, recordings.
- Prometheus (Docker) — Scrapes node_exporter (CPU/RAM/disk), nvidia_gpu_exporter (GPU temp/VRAM/power), Ollama, LiteLLM.
- Grafana (Docker) — Dashboards for GPU health, system metrics, API usage.
- n8n (Docker) — Workflow automation. Connects to LiteLLM, webhooks, email, etc.
- Nginx (systemd) — Reverse proxy with Let's Encrypt TLS. All subdomains terminate here.
- WireGuard (systemd) — VPN mesh connecting remote nodes (Battlegrounds server, mobile devices).
- iptables — Default DROP policy. Explicit allows for each service.
- API key auth — LiteLLM master key. Virtual keys with budgets (requires DB).
- Geo-blocking — CN/RU IP ranges blocked on API ports.
- Proxy auth — Open WebUI auto-login via trusted headers from Nginx.
User → chat.domain.io → Nginx → Open WebUI → Ollama → GPU → Response
Client → llm.domain.io → Nginx → LiteLLM (key check) → Ollama → GPU → Response
MiroFish (remote) → Sophon Receiver (8099) → /opt/ark/data/experiments/
│
▼
Blender (GPU EEVEE)
│
▼
/opt/ark/data/recordings/
│
▼
axl.domain.io/recordings/
node_exporter (9100) ─┐
nvidia_gpu (9835) ───┤
ollama (11434) ───┼──→ Prometheus (9090) ──→ Grafana (3001)
litellm (4000) ───┘
The GPU is the most constrained resource. VRAM allocation:
| Model | VRAM | Priority |
|---|---|---|
| qwen3.5:35b | ~23 GB | High (primary thinker) |
| devstral-small-2 | ~15 GB | Medium (coder) |
| Dolphin-Mistral-24B | ~13 GB | Low (uncensored) |
| nomic-embed-text | ~0.3 GB | Always loaded |
| ComfyUI (idle) | ~0.75 GB | Background |
Rule: Only one large model loaded at a time. Ollama auto-unloads after timeout. ComfyUI can be freed with curl -X POST http://localhost:8188/free.
If Ollama hangs after VRAM pressure:
curl -X POST http://localhost:8188/free \
-H "Content-Type: application/json" \
-d '{"unload_models":true,"free_memory":true}'
sudo systemctl restart ollama/opt/ark/data/ # Primary data volume (NVMe)
├── docker-compose/ # docker-compose.yml
├── docker-data/ # Docker data root
├── ollama/ # Model weights (~50GB+)
├── open-webui/ # Chat history, user data
├── litellm/ # config.yaml
├── prometheus/ # prometheus.yml + TSDB (30d retention)
├── grafana/ # Dashboard data
├── chromadb/ # Vector store
├── comfyui/ # Models, outputs, workflows
├── n8n/ # Workflow data
├── experiments/ # Sophon experiment data
├── recordings/ # Rendered videos
└── ark-dashboard/ # Landing page app
/opt/ # Applications
├── ark/ # THIS PROJECT
├── teams/ # Agent team profiles (FORGE, LENS, PRISM)
├── sophon-blender/ # Blender scene pipeline
├── sophon-3d/ # Three.js renderer (deprecated)
├── sophon-receiver/ # Experiment data receiver
└── sophon-prism/ # Future interactive cockpit
Default policy: INPUT DROP
Allowed:
- SSH (22) from anywhere
- HTTP/HTTPS (80/443) from anywhere
- WireGuard (51820/UDP) from anywhere
- Docker bridge (172.16.0.0/12) to internal ports
- node_exporter (9100) from Docker only
- nvidia_gpu_exporter (9835) from Docker only
Blocked:
- CN/RU IP ranges on port 4000 (LiteLLM)
- Known scanner IPs (full DROP)
- Open WebUI: Proxy auth via Nginx headers (X-Trusted-Email, X-Trusted-Name)
- LiteLLM: Master API key (
sk-ark-*) - Grafana: Proxy auth via Nginx (X-WEBAUTH-USER)
- Prometheus: Basic auth via Nginx
- SSH: Key-only, no password
- Nginx with Let's Encrypt (certbot auto-renewal)
- All subdomains:
*.yourdomain.com - HSTS enabled
- Install Ubuntu 24.04 LTS
- Partition NVMe, mount at
/opt/ark/data - Install NVIDIA driver (570+) and CUDA toolkit
- Install Docker with NVIDIA Container Toolkit
- Install Ollama, pull models
- Deploy docker-compose (all services)
- Install Nginx, configure subdomains, run certbot
- Install WireGuard, configure peers
- Install Prometheus exporters (node, gpu)
- Import Grafana dashboards
- Configure iptables firewall
- Set up LiteLLM API keys
- Test all endpoints
- Set up monitoring alerts
| Extension | Purpose | Files |
|---|---|---|
| Sophon | Experiment orchestration + 3D replay | /opt/sophon-* |
| Blender Pipeline | Cinema-quality data visualization | /opt/sophon-blender/ |
| MiroFish | Remote agent swarm coordinator | External (DigitalOcean) |
| PRISM | Interactive Three.js cockpit | /opt/sophon-prism/ (future) |