Configuration for the NVIDIA DGX Spark workstation spark-1822 — a single-box, self-hosted LLM setup:
- vLLM and llama.cpp for inference (HF safetensors and GGUF respectively, both GPU-accelerated on the GB10).
- Open WebUI + Ollama for the chat UI.
- Traefik as the HTTPS reverse proxy in front of everything (docker-label-driven, mints its own internal CA).
- Cloudflare Tunnel for outbound-only public ingress (no inbound ports on the host).
- Tailscale sidecar for tailnet-only ingress (peer-to-peer over WireGuard; no public DNS, no public ports).
- Netdata for real-time observability.
- mDNS helper publishing `<sub>.spark-1822.local` aliases on the LAN.
- Trivy + Dependabot keep the supply chain honest in CI.
Three ingress paths into the same backends:

    LAN client     ──(mDNS *.spark-1822.local)──> traefik :80/:443 ──> backend
    public client  ──(DNS, Cloudflare edge)─────> cloudflared      ──> traefik :80 ──> backend
    tailnet client ──(MagicDNS, WireGuard)──────> tailscale :443   ──> traefik :80 ──> backend

Backends (vllm, llama-cpp, open-webui, ollama, netdata) all sit on a single shared Docker network named traefik — defined by the traefik/ stack, joined as external: true by everyone else. The active proxy and both tunnel/sidecar connectors each attach to the same network and dial container names directly.
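One way to confirm that every backend actually joined the shared network is to compare the containers attached to `traefik` against the expected list. The sketch below assumes the container names match the backend names listed above; adjust `EXPECTED_BACKENDS` if your compose files name them differently. Both functions are illustrative helpers, not part of this repo.

```shell
# Assumption: container names equal the backend names from this README.
EXPECTED_BACKENDS="vllm llama-cpp open-webui ollama netdata"

# Reads attached container names on stdin (one per line) and prints
# every expected backend that is absent from that list.
missing_from_network() {
  attached=$(cat)
  for name in $EXPECTED_BACKENDS; do
    printf '%s\n' "$attached" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Prints the names of all containers attached to the given Docker network.
attached_containers() {
  docker network inspect "$1" --format '{{range .Containers}}{{println .Name}}{{end}}'
}
```

Running `attached_containers traefik | missing_from_network` on the host should print nothing once all five backends are up; any name it prints has not joined the network.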
TLS comes from three different roots: Traefik mints its own internal CA (clients install traefik-root.crt once); Cloudflare provides publicly-trusted certs at its edge for the tunnel hostnames; Tailscale auto-provisions publicly-trusted MagicDNS certs for the tailnet hostname.
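To check the LAN path before installing the root certificate system-wide, curl can be pointed at Traefik's root CA for a single request. This is a minimal sketch, assuming it runs from the repo root (so `traefik/certs/traefik-root.crt` resolves) and that the alias is already published over mDNS. `lan_url` and `check_lan_tls` are hypothetical helper names.

```shell
HOST_DOMAIN="spark-1822.local"
ROOT_CA="traefik/certs/traefik-root.crt"   # path from this repo; adjust if you run elsewhere

# Builds the LAN URL for an alias, e.g. vllm -> https://vllm.spark-1822.local
lan_url() {
  printf 'https://%s.%s' "$1" "$HOST_DOMAIN"
}

# Performs one HTTPS request verified against Traefik's root CA.
# Prints the HTTP status code; curl exits non-zero if the TLS handshake
# or certificate verification fails.
check_lan_tls() {
  curl --silent --show-error --cacert "$ROOT_CA" \
       --output /dev/null --write-out '%{http_code}\n' \
       "$(lan_url "$1")"
}
```

For example, `check_lan_tls vllm` printing any status code means the certificate chain verified against the internal root; the status itself depends on what the backend serves at `/`.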

    .
    ├── traefik/       # HTTPS reverse proxy (docker-label-driven)
    ├── cloudflare/    # Cloudflare Tunnel connector (public ingress)
    ├── tailscale/     # Tailscale sidecar (tailnet ingress)
    ├── vllm/          # vLLM inference server (HF safetensors)
    ├── llama-cpp/     # llama.cpp inference server (GGUF)
    ├── open-webui/    # Open WebUI + Ollama (chat UI)
    ├── netdata/       # Real-time observability
    ├── mdns/          # Host-side mDNS aliases helper
    ├── .github/       # CI: Trivy workflow + Dependabot config
    ├── sparky.svg     # Project logo (AI self-portrait)
    ├── CHANGELOG.md
    ├── LICENSE
    └── README.md

Each stack has its own README.md — start there for deploy / configure / upgrade details.
| Stack | Role | URL on LAN |
|---|---|---|
| `traefik/` | HTTPS reverse proxy, docker-label-driven, mints its own internal CA | publishes :80/:443 |
| `cloudflare/` | Cloudflare Tunnel connector: outbound-only public ingress | configurable per-hostname in the CF dashboard |
| `tailscale/` | Tailscale sidecar: tailnet-only ingress over WireGuard; optional Serve overlay fronts traefik | `https://spark-1822.<tailnet>.ts.net` |
| `vllm/` | vLLM inference server (HF safetensors), tool-calling enabled (qwen3_xml) | https://vllm.spark-1822.local |
| `llama-cpp/` | llama.cpp GPU-accelerated inference server (GGUF). Router mode (default) serves every GGUF in the HF cache on demand; classic single-model mode also supported. OpenAI-compatible API + web UI | https://llama.spark-1822.local |
| `open-webui/` | Open WebUI + Ollama (GPU on Ollama only) | https://open-webui.spark-1822.local, https://ollama.spark-1822.local |
| `netdata/` | Real-time host + container telemetry | https://netdata.spark-1822.local |
| `mdns/` | Host systemd template publishing `<sub>.spark-1822.local` mDNS aliases | host-level |

| Component | Value |
|---|---|
| Hardware | NVIDIA DGX Spark |
| Hostname | spark-1822.local |
| OS | Ubuntu (kernel 6.17.0-nvidia), aarch64 |
| GPU | NVIDIA GB10 (compute capability 12.1, 124 GiB VRAM) |
| Docker | 29.x + Compose v2 |
| GPU runtime | nvidia-container-toolkit 1.19 (CDI mode) |

On a fresh host, in order:

- Install the mDNS helper (host-side; publishes `<sub>.spark-1822.local` aliases):

      cd /opt/mdns && make install

- Bring up the reverse proxy; this also creates the shared `traefik` Docker network everything else joins:

      cd /opt/traefik
      cp .env.example .env   # then set TRAEFIK_TAG
      make ca-cert           # one-time: mint Traefik's internal root CA
      make wildcard-cert     # mint the wildcard leaf signed by that root
      docker compose up -d

  Install `traefik/certs/traefik-root.crt` on each client that should trust the host's LAN URLs (per-OS install table in `traefik/README.md`).

- Publish an mDNS alias for each subdomain you'll expose:

      cd /opt/mdns
      for a in traefik vllm llama ollama open-webui netdata; do make add ALIAS=$a; done

- Bring up the services; each one attaches to the `traefik` network and Traefik auto-routes via the `traefik.*` labels in its compose:

      cd /opt/open-webui && cp .env.example .env && docker compose up -d
      cd /opt/netdata    && cp .env.example .env && docker compose up -d
      cd /opt/vllm       && make up ENV=<variant>   # see vllm/envs/
      cd /opt/llama-cpp  && make up ENV=<variant>   # see llama-cpp/envs/

- (Optional) Public ingress via Cloudflare Tunnel, only if you want internet-reachable URLs:

      cd /opt/cloudflare
      cp .env.example .env   # paste CLOUDFLARE_TUNNEL_TOKEN from the CF dashboard
      docker compose up -d

  Then configure Public Hostnames in the Cloudflare dashboard so they forward to `http://traefik:80` with the matching internal Host header (recipe in `cloudflare/README.md`).

- (Optional) Tailnet ingress via Tailscale, only if you want this host reachable from your tailnet:

      cd /opt/tailscale
      cp .env.example .env   # paste TS_AUTHKEY from the Tailscale admin console
      docker compose up -d

  The node registers as `spark-1822.<tailnet>.ts.net` with a real publicly-trusted MagicDNS cert; Tailscale Serve wires `:80`/`:443` on the tailnet to Traefik. For per-backend tailnet URLs (`https://vllm.<tailnet>.ts.net`, `https://traefik.<tailnet>.ts.net`, …), create one Tailscale VIP Service per backend and apply via `make -C /opt/tailscale services-apply`; see `tailscale/README.md` for the full walk-through.

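The Cloudflare step relies on Traefik routing by Host header over plain HTTP on `:80`. A rough way to rehearse that before touching the dashboard is to send an equivalent request from a throwaway container on the `traefik` network. The sketch assumes the internal Host is the LAN name (`<alias>.spark-1822.local`) and uses the public `curlimages/curl` image; neither choice is prescribed by this repo, and both helper names are hypothetical.

```shell
HOST_DOMAIN="spark-1822.local"

# Internal Host header Traefik is assumed to match for an alias.
internal_host() {
  printf '%s.%s' "$1" "$HOST_DOMAIN"
}

# Sends one plain-HTTP request to traefik:80 from inside the shared network,
# setting the Host header the way a tunnel public hostname would.
# Prints the HTTP status code returned through Traefik.
probe_via_traefik() {
  docker run --rm --network traefik curlimages/curl \
    --silent --show-error --output /dev/null --write-out '%{http_code}\n' \
    --header "Host: $(internal_host "$1")" http://traefik:80/
}
```

Calling `probe_via_traefik vllm` on the host shows whether Traefik has a router for that Host. If it prints 404 for an alias that works on the LAN, the Host value configured for the tunnel probably does not match the router rule.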
/opt on the host is a checkout of this repo — every stack lives in place at /opt/<name>/. Edit locally, commit, push; then pull on the host:

    ssh spark-1822.local 'sudo git -C /opt pull --ff-only'

After the pull, apply the change in the relevant stack (each stack's README has details):

- Inference stacks (`vllm/`, `llama-cpp/`): `cd /opt/<stack> && make up ENV=<variant>` to (re)start with a variant.
- Traefik: routing changes via Docker labels or `dynamic/*.yml` files are hot-reloaded; run `docker compose restart traefik` only if `traefik.yml` itself changed.
- Other stacks: `cd /opt/<name> && docker compose up -d`.

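The apply rules above can be condensed into a small dispatcher. This is a sketch, not a script shipped in the repo: `apply_cmd` is a hypothetical helper that only prints the command to run on the host, so the output can be reviewed before it is handed to `ssh`.

```shell
# Prints the apply command for a stack after a pull.
# usage: apply_cmd STACK [VARIANT]
apply_cmd() {
  stack=$1
  variant=${2:-}
  case "$stack" in
    vllm|llama-cpp)
      # Inference stacks (re)start through make with an env variant.
      [ -n "$variant" ] || { echo "apply_cmd: $stack needs a variant" >&2; return 1; }
      printf 'cd /opt/%s && make up ENV=%s\n' "$stack" "$variant"
      ;;
    traefik)
      # Label and dynamic/*.yml changes hot-reload on their own;
      # this restart is only needed after editing traefik.yml.
      printf 'cd /opt/traefik && docker compose restart traefik\n'
      ;;
    *)
      # Remaining compose-based stacks.
      printf 'cd /opt/%s && docker compose up -d\n' "$stack"
      ;;
  esac
}
```

A typical call would be `ssh spark-1822.local "$(apply_cmd netdata)"`. Printing the command instead of executing it keeps the helper free of side effects and easy to check.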
Host-local files outside git stay put across pulls — each stack's .env (secrets), inference envs/*.env variants, TLS material (*.crt/*.key), and *.bak backups are all gitignored.
- Image tags float by default, pin in production. The committed `.env.example` files use floating tags (`latest`, or a stable major-line like `v2` for Traefik, or the multi-arch floating tag `server-cuda` for `ggml-org/llama.cpp`) so a fresh `cp .env.example .env` bootstraps to a working state without anyone having to look up the current release. For production deployments, override in your host-local `.env` with a specific, reproducible pin: an immutable tag (`v2.11.X`, `v0.20.2`) when the registry publishes one, or a content-digest pin (`server-cuda@sha256:…`) when only floating tags exist. Each per-service `.env.example` shows the pin format inline.
- Inference config split by scope. `<stack>/.env` carries host-wide values (image pin, HF cache path, HF token, default knobs); `<stack>/envs/<name>.env` carries just the model selection plus per-variant overrides. `make up ENV=<name>` chains both via `docker compose --env-file .env --env-file envs/<name>.env up -d`. Both files are gitignored; the templates live next to them as `.env.example`.
- Loopback ports on inference stacks. `vllm/` and `llama-cpp/` additionally bind their API to `127.0.0.1` on the host for direct curl / benchmarking; LAN traffic still flows through the proxy.
- Permissions. `/opt/<stack>/` is `root:root`. The `.env` files are `root:docker 640` so the `docker`-group user reads them and runs compose without sudo. Editing configs requires `sudo`.
- Supply chain. Every third-party Docker image is referenced by tag in `.env.example` (floating for first-bootstrap convenience; pin in your host-local `.env` for production). Every GitHub Action is pinned by commit SHA. Trivy scans run on push, on PRs, and on a weekly cron; Dependabot keeps the SHA pins fresh with a weekly grouped PR.

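To see what an inference variant resolves to before starting it, the same two `--env-file` flags can be passed to `docker compose config`, which renders the merged configuration without creating containers. This is a sketch under the assumption that it runs inside a stack directory such as `/opt/vllm`; `compose_env_args` and `preview_variant` are illustrative names, and anything the stack's Makefile does beyond the documented `docker compose` call is not reproduced here.

```shell
# Prints the env-file flags in the documented order: host-wide .env first,
# then the variant file, so per-variant values override the defaults.
compose_env_args() {
  printf '%s %s %s envs/%s.env' --env-file .env --env-file "$1"
}

# Renders the interpolated compose config for a variant without starting anything.
preview_variant() {
  variant=$1
  [ -f "envs/$variant.env" ] || { echo "no such variant: envs/$variant.env" >&2; return 1; }
  # Word splitting of the flag string is intentional here.
  # shellcheck disable=SC2046
  docker compose $(compose_env_args "$variant") config
}
```

`preview_variant <name>` is useful for checking that a pinned image tag from `.env` and the model selection from the variant file both land where expected. Variant names containing spaces are not handled by this sketch.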
- `CHANGELOG.md`: Keep a Changelog format, SemVer versioning.
- `.github/workflows/trivy.yml`: image CVE scans (HIGH+CRITICAL, fixed-only), IaC config scan, filesystem secret scan. Doc: `.github/workflows/trivy.md`.
- `.github/dependabot.yml`: weekly grouped PR to bump pinned GitHub Action SHAs.
- `LICENSE`: MIT.
- `sparky.svg`: project mascot. Drawn by the AI that helped build this repo, as a self-portrait.