DGX OpenClaw Stack

A one-command, production-grade local AI agent stack — OpenClaw + vLLM + bge-m3 multilingual embeddings + SearxNG private web search + hybrid (BM25 + vector) memory retrieval, wired together in a single docker compose file.

Calibrated for the NVIDIA GB10 "Grace-Blackwell" Superchip (NVIDIA DGX Spark, ASUS Ascent GB10) running Gemma 4 26B-A4B MoE NVFP4 and Gemma 4 31B IT NVFP4 dense side by side on separate ports (8004 / 8005, separate OpenClaw provider entries) — pick either model in the UI without restarting. Portable to other hardware — swap the LLM for whatever fits your GPU, or point OpenClaw at a cloud LLM API and keep everything else.

The default profile's tuning decisions — NVFP4 quantization, GPU memory split between LLM and embedding, FP8 KV cache, concurrency bands, context-window budgeting — are calibrated to the GB10 Superchip's specific hardware profile: 128 GB of unified LPDDR5X, 273 GB/s bandwidth, and native FP4 tensor-core acceleration (sm_120/sm_121). On a DGX Spark or ASUS Ascent GB10 you get those numbers out of the box. On other hardware everything except the LLM service is reusable as-is.



Who this is for

| You are… | What you get | Time to working stack |
|---|---|---|
| A GB10 owner (DGX Spark, ASUS Ascent GB10) | The calibrated reference profile. Boot the stack and run Gemma 4 26B-A4B MoE NVFP4 (~25 tok/s decode single-stream, ~112 tok/s aggregate at 4-parallel) with multilingual embeddings, hybrid memory, private web search, bilingual TTS — on your hardware, no cloud. The dense 31B is preserved as a profile-gated alternative for parity testing. | ~30 min, mostly model download |
| An x86_64 + NVIDIA GPU operator (RTX 4090, A6000, etc.) | Same wiring; swap vllm-llm for a model your VRAM holds (Gemma 4 12B BF16, Qwen 2.5, Llama 3.3). All non-LLM services transfer unchanged. | ~30 min + tuning |
| A cloud-LLM user (OpenAI, Anthropic, OpenRouter, Bedrock, remote vLLM) | Park the local LLM service, point three env vars at your hosted endpoint. You still get the local agent stack: bge-m3 embeddings, SearxNG private search, hybrid memory, dreaming, heartbeat, TTS. | ~10 min (no GPU) |
| A contributor or curious reader | A worked example of a deterministic, opinionated AI agent stack. Every wiring decision has a why in the comments; the patcher is small enough to read in one sitting. | n/a — start with docs/ARCHITECTURE.md |

If none of those rows describe you, this repo probably isn't your fit — it's optimized for self-hosting on real hardware (or a real cloud LLM), not for trying out a chatbot on a laptop.

What you get

A fully local agent platform (or local-plus-cloud-LLM hybrid — your choice), with:

| Component | What it does |
|---|---|
| Gemma 4 26B-A4B MoE (NVFP4) | 25.2B-total / 3.8B-active Google Gemma 4 mixture-of-experts model (128 experts, top-8 routing), quantized by NVIDIA to NVFP4 with the nvfp4_experts_only recipe. Native tool calling, 256K context, multimodal (text + image), ~25 tok/s decode single-stream on GB10 with the Marlin SM121 backend + CUDA graphs (~4× faster than dense; ~112 tok/s aggregate at 4-parallel). |
| Gemma 4 31B IT (NVFP4) — concurrent dense | 31.3B dense Google Gemma 4 quantized to NVFP4. Runs side by side with the MoE on port 8005 (provider id vllm-dense), single-user / 256K context / ~6.9 tok/s decode profile. Pick either model in the OpenClaw UI without restarting; the dense backend exists for parity testing, multimodal-heavy workloads where dense quality matters, or as a fallback. |
| bge-m3 embeddings | BAAI/bge-m3 multilingual dense embeddings via vLLM. 100+ languages, 1024-dim, 8K context, EN↔HU cosine ≈ 0.88. |
| SearxNG meta-search | Self-hosted, privacy-respecting web search backend wired into OpenClaw's native webSearch provider. Strict engine whitelist (DuckDuckGo, Brave, Mojeek, Qwant, Startpage, Wikipedia family, Reddit, GitHub, arXiv) — queries never reach Google / Bing / Yandex / Yahoo / Baidu. |
| OpenClaw gateway | The open-source agent runtime: Chrome extension UI, CLI, persistent memory, heartbeat, multi-agent world-building. |
| Bilingual TTS surface | OpenAI-compatible /v1/audio/speech router fronting Kokoro 82M (English, Apache 2.0, ~500 MB–1 GB VRAM, ships by default) and an opt-in F5-TTS Hungarian backend (CC-BY-NC model weights — see below). Wired into OpenClaw via the sanctioned messages.tts.providers.openai baseUrl override. Diacritic-based autodetect re-routes Hungarian-text requests to the HU backend transparently when both are active. |
| Whisper STT (EN + HU) | OpenAI-compatible /v1/audio/transcriptions via Systran/faster-whisper-large-v3 on a self-built CUDA 13 image (~150 LOC FastAPI wrapper around faster-whisper — the upstream speaches image rejects Blackwell tensor-core compute types on sm_120, so we self-build to match the vllm-llm / openclaw-tts-en wheel pattern). ~3 GB VRAM, autodetects language (FLEURS Hungarian WER 14.1%). Wired into OpenClaw's tools.media.audio pipeline — voice-note uploads in the Control UI chat, Discord voice channels, the VoiceCall CLI, and Talk / Voicewake nodes all transcribe through this service. MIT wrapper + MIT Whisper weights. |
| Browser automation (opt-in) | OpenClaw's built-in browser tool attaches to a self-hosted Playwright Chromium cluster over the Chrome DevTools Protocol — one warm Chromium per onboarded credential. 1× manual OAuth onboarding per service via a noVNC bridge (./bootstrap-browser-login.sh github-user1); afterwards the agent reaches authenticated content with no per-call re-auth until the upstream session expires (~14 d GitHub, ~30 d Notion, etc.). Activate via --profile browser. Apache 2.0. Limitation: passkey-only auth flows don't work over noVNC per the W3C origin-bound spec — use password+TOTP or API tokens for those. Details in docs/reference/browser-automation.md. |
| Idempotent config patcher | A small Node script that makes your OpenClaw config deterministic — runs on every up, never clobbers onboarding choices it shouldn't. Wires hybrid (BM25 + vector) retrieval with MMR re-rank on top of memorySearch, flips the bundled SearxNG plugin on, points the openai TTS provider at the bundled router, upserts the STT entry into tools.media.audio.models[], and writes one browser.profiles.<name>.cdpUrl per registered Chromium profile. |

Everything lives in one Docker Compose file. No separate vLLM service definitions, no reverse-proxied DNS trickery, no host.docker.internal workarounds — containers reach each other by their compose service name on the default bridge network.
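
To make the wiring concrete, here is an illustrative sketch of the in-compose endpoints; the entries marked "assumed" are placeholder variable names, and .env.example / docker-compose.yml remain the authoritative reference:

LLM_BASE_URL=http://vllm-llm:8004/v1              # Gemma 4 MoE, OpenAI-compatible
# dense alternative lives at http://vllm-llm-dense:8005/v1 (provider id vllm-dense)
EMBED_BASE_URL=http://vllm-embedding:8005/v1      # bge-m3 embeddings
SEARXNG_URL=http://searxng:8080                   # private meta-search (variable name assumed)
TTS_BASE_URL=http://openclaw-tts-router:8090/v1   # OpenAI-compat TTS seam (variable name assumed)
STT_BASE_URL=http://openclaw-stt-whisper:8093/v1  # faster-whisper STT (variable name assumed)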

Hardware targets

The reference profile (docker compose up -d with no edits) is designed and tested on:

  • NVIDIA DGX Spark (GB10 Superchip, 128 GB unified LPDDR5X, 273 GB/s)
  • ASUS Ascent GB10 (same GB10 Superchip, same memory architecture)

Works unchanged on any future workstation built around the GB10 Superchip — the stack doesn't depend on DGX- or ASUS-specific firmware, only on the Blackwell datacenter compute capabilities (sm_120/sm_121) and the GB10's 128 GB unified memory budget.

The reference profile won't boot as-is on non-GB10 hardware: vllm/vllm-openai:gemma4-cu130 and the NVFP4 model both need Blackwell FP4 kernels. Two supported alternatives, sketched briefly in docs/CUSTOMIZATION.md:

  • Other NVIDIA GPU: switch to a stock vLLM image and a model that fits your VRAM (smaller Gemma 4 NVFP4 if you have a Blackwell desktop; Gemma 4 12B BF16 / Qwen 2.5 / Llama 3.3 elsewhere). The memory-split and concurrency constants in .env.example will need re-tuning for your card.
  • Cloud LLM: park the local vLLM services behind profiles: ["never"] and set OPENAI_BASE_URL / LLM_BASE_URL / EMBED_BASE_URL in .env to your hosted endpoints (cloud OpenAI-compatible API, remote vLLM on another box, etc.). bge-m3 stays local by default but can also be remoted. Everything downstream — gateway, SearxNG, hybrid retrieval, dreaming, heartbeat — is unchanged. A sketch of this override follows below.
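
For orientation, the parked-LLM setup might look roughly like the following sketch; the values are illustrative, and the real knobs are documented in .env.example and docs/CUSTOMIZATION.md:

# .env: point OpenClaw at a hosted OpenAI-compatible endpoint
OPENAI_BASE_URL=https://api.openai.com/v1     # or OpenRouter, a Bedrock proxy, remote vLLM, ...
LLM_BASE_URL=https://api.openai.com/v1
EMBED_BASE_URL=http://vllm-embedding:8005/v1  # bge-m3 stays local by default
# (plus whatever API-key variable your provider requires)

# docker-compose.override.yml: park the GPU-heavy local LLM services
services:
  vllm-llm:
    profiles: ["never"]
  vllm-llm-dense:
    profiles: ["never"]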

Performance (measured on GB10, single-shot generation)

| Scenario | MoE 26B-A4B (default) | Dense 31B (alternative) |
|---|---|---|
| Decode throughput, 1 concurrent user | ~24.9 tok/s sustained (NVFP4 + Marlin MoE backend on SM121 + CUDA graphs, measured 2026-05-08) | ~6.9 tok/s sustained (NVFP4) |
| Aggregate throughput, 4 concurrent users | ~112 tok/s (~28 tok/s per user — continuous batching + CUDA graphs amortize kernel-launch overhead) | n/a (single-stream profile) |
| Stable context window, 1 concurrent user | 256K reachable (prefill-bound past ~100K — a 100K prompt + 200 generated tokens ≈ 70 s wall time) | ~220K tokens before vLLM preemption |
| Stable context window, 4 concurrent users (paged) | ~4.3× concurrency at 256K (paging runtime; fully simultaneous 256K triggers preemption) | ~110K tokens each, continuous batching |
| Model footprint | ~16.5 GB (NVFP4 weights, vision tower included) | ~17 GB (NVFP4 weights, vision tower included) |
| Vision prefill per image | ~280 vision tokens for a ≈512×512 region, sub-second encode | same |
| First-boot cold start (after model download) | ~3–4 min from up to gateway-ready | same |
| KV cache | FP8 (halves cache footprint vs. the default BF16 KV cache) | FP8 (same) |

Numbers come from a DGX Spark with 128 GB unified LPDDR5X. Single-prompt streaming with a warm KV cache; throughput drops with longer contexts and more concurrent users. Re-tune the LLM_GPU_MEM_UTIL / LLM_MAX_NUM_SEQS constants in .env for other hardware.
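
If you move off the GB10 profile, these are the first knobs to revisit; an illustrative sketch (values are examples, not recommendations for any particular card):

# .env: GPU budgeting and concurrency (GB10-calibrated defaults live in .env.example)
LLM_GPU_MEM_UTIL=0.70      # fraction of GPU memory the LLM's vLLM process may claim
EMBED_GPU_MEM_UTIL=0.10    # leave headroom for bge-m3 sharing the same GPU
LLM_MAX_NUM_SEQS=4         # concurrency band; lower it if you see preemption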

Quickstart

Four shell commands plus one in-browser onboarding step — that's the minimal path to a working default-profile install. First-boot is two-phase by design (the gateway waits for explicit OpenClaw onboarding before applying the wiring); skip the heads-up below at your peril.

git clone https://github.com/chestercs/dgx-openclaw-stack.git
cd dgx-openclaw-stack

./bootstrap.sh                              # interactive, non-destructive, idempotent
docker compose up -d                        # 10 default services; gateway will crash-loop
                                            #   with "Missing config" until you onboard

# Phase 2 — open the OpenClaw Chrome extension OR run `openclaw setup` in a
# shell on the host, pair with `ws://<your-host>:18789` using the gateway
# token printed by bootstrap. Onboarding writes openclaw.json.

docker compose up -d --force-recreate openclaw-config-init openclaw-gateway openclaw-cli

That's it — the patcher applies all wiring steps and the gateway goes healthy. Two-phase fresh-install onboarding (gateway crash-loop → onboarding → patcher applies wiring) is the OpenClaw security model, not a bug; details in SETUP.md. If anything goes sideways, the symptoms map directly onto entries in docs/TROUBLESHOOTING.md.

This brings up the 10 default services (LLM MoE + dense + embedding + gateway + cli + config-init + searxng + tts-en + tts-router + stt-whisper). Several capabilities are opt-in profiles that don't start with the default up; example enable commands follow this list:

  • --profile hu — Hungarian TTS (F5-TTS, CC-BY-NC weights — see Hungarian TTS opt-in).
  • --profile browser — Playwright Chromium for login-gated sites; per-credential 1× OAuth via the noVNC helper.
  • --profile python — Python code-execution sandbox (MCP).
  • Image generation lives in a separate compose file and proxies to your existing ComfyUI install (the repo ships no model weights).
  • Discord integration is a separate operator-side flow (Developer Portal app → bot token → openclaw channels add). The patcher handles every Discord-related field automatically once you've created the channel; see docs/discord-bot-setup.md and docs/reference/discord-config.md.
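
Enabling any of them is a profile flag (or a COMPOSE_PROFILES entry in .env); for example, with the service names as shipped in the compose files:

docker compose --profile hu up -d --build openclaw-tts-f5hun     # Hungarian TTS (CC-BY-NC)
docker compose --profile browser up -d                           # Playwright Chromium + noVNC
./bootstrap-browser-login.sh github-user1                        # one manual OAuth per credential
docker compose --profile python up -d                            # Python MCP exec sandbox
docker compose -f openclaw-image-comfyui/docker-compose.yml --profile image-gen up -d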

A more honest "what reproduces from a fresh clone vs what's manual" breakdown is in § Reproducibility from a fresh clone below.

Architecture at a glance

    ┌──────────────────────────────────────────────────────────────────────┐
    │  DGX Spark / ASUS GB10                                               │
    │                                                                       │
    │  Default profile (10 services, all on the compose bridge network)    │
    │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐       │
    │  │ vllm-llm        │  │ vllm-llm-dense  │  │ vllm-embedding  │       │
    │  │ :8004 (MoE 26B) │  │ :8005 (dense 31)│  │ :8005 (bge-m3)  │       │
    │  └────────▲────────┘  └────────▲────────┘  └────────▲────────┘       │
    │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐       │
    │  │ searxng         │  │ openclaw-tts-en │  │ openclaw-tts-   │       │
    │  │ :8080 privacy   │  │ Kokoro 82M EN   │  │ router :8090    │       │
    │  │ meta-search     │  │ Apache 2.0      │  │ OAI-compat seam │       │
    │  └────────▲────────┘  └────────▲────────┘  └────────▲────────┘       │
    │  ┌─────────────────┐                                                 │
    │  │ openclaw-stt-   │                                                 │
    │  │ whisper :8093   │                                                 │
    │  │ faster-whisper  │                                                 │
    │  └────────▲────────┘                                                 │
    │           │ compose DNS (service names)                              │
    │  ┌────────┴──────────────────────────────────────────────────────┐   │
    │  │ openclaw-gateway          :18789 (only published port)        │◀── Chrome ext.
    │  │   ├ openclaw-config-init  (one-shot patcher, runs every up)   │◀── CLI
    │  │   └ openclaw-cli          (always-up, shares gateway netns)   │   │
    │  └───────────────────────────────────────────────────────────────┘   │
    │                                                                       │
    │  Opt-in profiles (parked unless explicitly enabled)                  │
    │  ─ --profile hu       → openclaw-tts-f5hun (F5-TTS HU, CC-BY-NC)     │
    │  ─ --profile browser  → openclaw-browser   (Chromium + noVNC)        │
    │  ─ --profile python   → openclaw-python-sandbox (MCP exec)           │
    │                                                                       │
    │  Separate compose (proxies to operator-side ComfyUI on host)         │
    │  ─ openclaw-image-comfyui/docker-compose.yml --profile image-gen     │
    └──────────────────────────────────────────────────────────────────────┘

All inter-container traffic is on the compose default bridge network; only port 18789 is published to the host. Put a reverse proxy (Nginx Proxy Manager, Caddy, Traefik, or a Cloudflared tunnel) in front for public access over wss://.
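
As one possible front, a minimal Caddy site block is enough to terminate TLS and pass the gateway's WebSocket traffic through (the hostname is a placeholder; any of the proxies above works just as well):

# Caddyfile: Caddy obtains certificates automatically and proxies WebSocket upgrades by default
openclaw.example.com {
    reverse_proxy 127.0.0.1:18789
}

gateway.trustedProxies is pre-populated; add your LAN CIDR via OPENCLAW_LAN_CIDR if the proxy sits outside it, so forwarded client addresses are trusted.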

Deep dive: docs/ARCHITECTURE.md.

Features

  • One compose file for everything. LLM, embedding, web search, and agent stack in one docker compose up -d.
  • NVFP4-native. Based on the official vllm/vllm-openai:gemma4-cu130 image; the local vllm-llm/ build only layers the Gemma 4 chat template and a one-line tool-call parser fix on top (see Why this stack).
  • True tool calling. The shipped tool_chat_template_gemma4.jinja plus --tool-call-parser gemma4 --enable-auto-tool-choice produces OpenAI-format tool_calls, and OpenClaw uses them directly; a sketch of the full launch flags follows this list.
  • Multimodal. Gemma 4's vision tower is included. Drop an image into the chat; the model reads it at ~280 tokens per image by default.
  • Multilingual RAG built in. bge-m3 gives you high-quality cross-lingual embeddings for memorySearch out of the box.
  • Hybrid retrieval + MMR. memorySearch runs BM25 (SQLite FTS5) alongside vector similarity and re-ranks the candidate set with MMR for diversity — exact-keyword / ID matches stop falling through the cracks of pure cosine search.
  • Privacy-respecting web search. Self-hosted SearxNG wired into OpenClaw's native webSearch tool. No commercial search API, no query leak to Google / Bing / Yandex. Strict engine whitelist (DuckDuckGo, Brave, Mojeek, Qwant, Startpage + Wikipedia / Reddit / GitHub / arXiv).
  • Bilingual self-hosted TTS. Kokoro 82M (English, Apache 2.0) ships by default and runs alongside the LLM on the same GB10 GPU. Optional Hungarian (F5-TTS, opt-in via --profile hu) for fully local cross-language voice; details in Hungarian TTS opt-in.
  • Bilingual self-hosted STT. Systran/faster-whisper-large-v3 on a self-built CUDA 13 image (~150 LOC FastAPI wrapper — Blackwell compat ate the upstream speaches-ai image), autodetecting English and Hungarian (FLEURS HU WER 14.1%). ~3 GB VRAM at float16, wired into OpenClaw's tools.media.audio pipeline — voice-note uploads, Discord voice, VoiceCall CLI, and Talk/Voicewake nodes all transcribe through it. Details in docs/reference/stt-stack.md.
  • Optional FLUX-Krea-dev image generation. The openclaw-image-comfyui MCP bridge (opt-in via --profile image-gen, separate compose file) drives the operator's existing ComfyUI install through flux-krea-2k (SFW, 1280×720 default) and flux-krea-2k-adult (same pipeline + flux-uncensored-v2 LoRA) workflow templates. The bridge ships no model weights; the recommended ~35 GB download is documented in docs/reference/image-comfyui-bridge.md. For 4K output, render at 2K native and upscale externally with ESRGAN — diffusion-based upscalers (SUPIR, UltimateSDUpscale tile pass) produce visible tile-seam artifacts on FLUX latents and were dropped.
  • Discord-ready out of the box. Once you create a bot in Discord's Developer Portal and run openclaw channels add --channel discord, the patcher writes 11 production-tested Discord overrides automatically (progressive streaming, slash-command authz for issue #19310, tool-surface widening for the Discord-routed agent, cron + browser cheatsheets in the workspace). Every override is env-gated and individually disable-able — see docs/reference/discord-config.md for the at-a-glance table.
  • Long context, honest numbers. 256K model max; realistic stable bands (per user count) are documented in the compose file.
  • Idempotent configuration. The patcher re-applies a known-good state on every up. Safe to run repeatedly.
  • Reverse-proxy ready. gateway.trustedProxies is pre-populated; add your LAN CIDR via OPENCLAW_LAN_CIDR if needed.
  • Non-destructive bootstrap. bootstrap.sh never overwrites an existing .env value or host directory.
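
To make the tool-calling bullet (and the FP8 KV-cache row in the performance table) concrete, the LLM service's launch flags look roughly like this sketch. It uses standard vLLM OpenAI-server flags; the model id and template path are placeholders, and the shipped docker-compose.yml / vllm-llm/ Dockerfile are authoritative:

# illustrative vllm-llm command line (the real one lives in docker-compose.yml)
vllm serve ${LLM_MODEL} \
  --chat-template /templates/tool_chat_template_gemma4.jinja \
  --tool-call-parser gemma4 --enable-auto-tool-choice \
  --kv-cache-dtype fp8 \
  --max-model-len 262144 \
  --gpu-memory-utilization ${LLM_GPU_MEM_UTIL} \
  --max-num-seqs ${LLM_MAX_NUM_SEQS}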

Repository layout

dgx-openclaw-stack/
├─ docker-compose.yml           # default + opt-in profiles (hu, browser, python)
├─ patch-config.mjs             # idempotent OpenClaw config patcher (27+ steps,
│                               #   header docblock indexes every one)
├─ bootstrap.sh                 # non-destructive interactive first-time setup
├─ bootstrap-browser-login.sh   # 1x OAuth onboarding helper (noVNC bridge)
├─ rotate-secrets.sh            # rotate gateway / service tokens in place
├─ .env.example                 # documented env template (every tunable lives here)
├─ templates/
│  ├─ tool_chat_template_gemma4.jinja        # Gemma 4 tool-call chat template
│  ├─ discord-text-agent/AGENTS.md.example   # discord-friend agent template
│  └─ userscripts/                            # web chat UI userscripts (opt-in)
├─ searxng/
│  └─ settings/settings.yml     # SearxNG override: JSON API + strict engine whitelist
├─ vllm-llm/                    # custom vLLM image (gemma4 tool-call parser patch
│                               #   for colon namespaces — see Dockerfile)
├─ openclaw-base-ext/           # local extension of the openclaw image (adds ffmpeg)
├─ openclaw-tts-en/             # English TTS service (Kokoro 82M, Apache 2.0)
├─ openclaw-tts-router/         # OpenAI-compat TTS router (passthrough + ffmpeg)
├─ openclaw-tts-f5hun/          # OPT-IN Hungarian TTS (--profile hu, CC-BY-NC weights)
├─ openclaw-stt-whisper/        # Self-built CUDA 13 STT image (faster-whisper)
├─ openclaw-browser/            # OPT-IN browser automation (--profile browser)
├─ openclaw-python-sandbox/     # OPT-IN Python MCP exec sandbox (--profile python)
├─ openclaw-image-comfyui/      # OPT-IN image-gen MCP bridge — SEPARATE compose file
│                               #   (proxies to operator's existing ComfyUI install)
├─ docs/
│  ├─ ARCHITECTURE.md           # service-by-service design rationale
│  ├─ CUSTOMIZATION.md          # model swaps, remote backends, hardware retuning
│  ├─ TROUBLESHOOTING.md        # common failure modes and fixes
│  ├─ discord-bot-setup.md      # zero-to-bot Discord Developer Portal walkthrough
│  └─ reference/                # deeper reference docs (15+ files — see reference/README.md)
├─ README.md                    # you are here — pitch + quickstart
├─ SETUP.md                     # end-user first-boot walkthrough
├─ CHANGELOG.md                 # versioned release notes
├─ CLAUDE.md                    # contributor / coding-agent guide
├─ CONTRIBUTING.md              # how to file issues + send PRs
└─ LICENSE                      # MIT (model weights retain upstream licenses)

Reproducibility from a fresh clone

Honest scope: a git clone + ./bootstrap.sh + docker compose up -d followed by the onboarding handshake brings up the 10 default services and the full agent baseline (Gemma 4 MoE + dense, embedding, gateway, web search, EN TTS, STT, hybrid memory, all 27+ patcher overrides). The advanced surfaces (Discord, image-gen, Hungarian TTS, browser automation) need explicit operator steps — each is documented but none is a one-command install. The table below answers the question "will my clone end up where the maintainer's deploy is?".

| Layer | Reproduces from compose up alone? | What's needed beyond bootstrap |
|---|---|---|
| 10 default services (LLM MoE + dense, embedding, gateway, cli, config-init, searxng, tts-en, tts-router, stt-whisper) | ✅ after onboarding | The gateway is expected to crash-loop until you complete the Chrome-extension wizard or openclaw onboard — then re-run the patcher trio. SETUP.md §5–6b walks through this. |
| Gemma 4 NVFP4 weights | ❌ | HF account, accept the Gemma 4 license, put your hf_… token in .env. bootstrap.sh prompts for it. |
| Hungarian TTS (F5-TTS) | ❌ | --profile hu + accept the CC-BY-NC model license at build time. See Hungarian TTS opt-in. |
| Browser automation | ❌ | --profile browser, then per-credential noVNC OAuth via ./bootstrap-browser-login.sh <profile>. Each login is 1× manual (passkeys don't work over noVNC — W3C origin-bound). |
| Python sandbox | ❌ | --profile python; secrets generated by bootstrap.sh. |
| Discord integration | ❌ | Discord Developer Portal (create app + bot token), openclaw channels add --channel discord, copy templates/discord-text-agent/AGENTS.md.example to workspace-discord/AGENTS.md. Walkthrough: docs/discord-bot-setup.md. The 11 patcher overrides (steps 20-30) auto-apply once the channel exists; the operator-tunable env knobs are catalogued in docs/reference/discord-config.md. |
| Image generation | ❌ | Separate compose at openclaw-image-comfyui/docker-compose.yml (--profile image-gen), plus your own ComfyUI install on host.docker.internal:13036 (or a LAN IP), plus model weights of your choice (FLUX Krea / SDXL fine-tunes). The repo ships no weights. See docs/reference/image-comfyui-bridge.md. |
| Memory contents | ❌ (by design) | Your accumulated notes under workspace/memory/*.md are operator data, not code. Back them up with tar czf openclaw-$(date +%F).tar.gz -C $OPENCLAW_CONFIG_DIR . |

What the repo does guarantee: bit-stable wiring of every service it ships, deterministic patcher state on every up, a pinned OPENCLAW_IMAGE_REF digest in .env.example, and idempotent secret generation in bootstrap.sh. The externals (HF model licenses, Discord, browser OAuth, image-gen weights) are kept outside the repo precisely because they're decisions the operator must make — not oversights.
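
A hedged example of the operator-side backup the table alludes to (the tar invocation comes from the table above; the restore path is an assumption, so adjust it to wherever your config actually lives):

# back up OpenClaw config + memory (operator data is deliberately not in git)
tar czf openclaw-$(date +%F).tar.gz -C "$OPENCLAW_CONFIG_DIR" .
# restore onto a fresh clone, after bootstrap + onboarding and with the stack stopped
tar xzf openclaw-<date>.tar.gz -C "$OPENCLAW_CONFIG_DIR"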

Hungarian TTS opt-in (CC-BY-NC)

The English TTS surface (Kokoro 82M, Apache 2.0) ships in the default profile and is safe for any usage. Hungarian TTS is opt-in because the only production-grade open-weights Hungarian TTS at the time of writing — the sarpba/F5-TTS_V1_hun_v2 fine-tune of F5-TTS — is distributed under CC-BY-NC-4.0 (Creative Commons, non-commercial only).

The wrapper code in openclaw-tts-f5hun/ is MIT (matches the rest of this repo). The model weights are pulled from HuggingFace at build time — by building the image you accept the upstream model license. This repo ships no model weights of any kind.

To activate Hungarian TTS, the easiest path is to re-run bootstrap.sh — it now prompts to opt in, generates F5HUN_API_TOKEN, sets F5HUN_URL to the in-compose service, and adds COMPOSE_PROFILES=hu to .env. Then:

docker compose --profile hu up -d --build openclaw-tts-f5hun

Or by hand: uncomment the three lines in the "Optional: Hungarian TTS" block in .env.example, fill in F5HUN_API_TOKEN (openssl rand -base64 64), and either set COMPOSE_PROFILES=hu in .env or pass --profile hu on the docker compose command line.
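
Illustratively, the hand-edited block ends up looking something like this (variable names as above; the F5HUN_URL value is left as a placeholder because .env.example documents the real in-compose address):

# .env: Hungarian TTS opt-in
COMPOSE_PROFILES=hu
F5HUN_API_TOKEN=<output of: openssl rand -base64 64>
F5HUN_URL=http://openclaw-tts-f5hun:<port>    # in-compose service; see .env.example for the port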

Once active, the router exposes default_hu / hu_diana voice ids, and the diacritic-based autodetect silently re-routes Hungarian-text requests (detected by áéíóöőúüű) to the HU backend even when OpenClaw asks for an English default voice like coral. The autodetect is a no-op when the HU profile is not active.
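
The autodetect is simple enough to sketch. The following is an illustration of the idea only (a plain diacritic character-class check), not the shipped router code:

# sketch of diacritic-based voice routing (illustrative, not the router's actual implementation)
import re

HU_DIACRITICS = re.compile(r"[áéíóöőúüűÁÉÍÓÖŐÚÜŰ]")

def pick_backend(text: str, hu_active: bool) -> str:
    """Route to the Hungarian backend only when the HU profile is active and the
    text contains Hungarian diacritics; otherwise keep the English default."""
    if hu_active and HU_DIACRITICS.search(text):
        return "hu"   # e.g. forward to the openclaw-tts-f5hun backend
    return "en"       # Kokoro 82M default

# pick_backend("Jó reggelt kívánok!", hu_active=True)  -> "hu"
# pick_backend("Good morning!", hu_active=True)        -> "en"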

For commercial Hungarian deployments, override F5_CHECKPOINT / F5_VOCAB on the openclaw-tts-f5hun service to point at a checkpoint with a fitting license. Details + voice catalog in openclaw-tts-f5hun/README.md.

Why this stack

Running a useful local (or hybrid) agent on top of OpenClaw + vLLM is trickier than the surface picture suggests:

  • The OpenClaw onboarding wizard doesn't register NVFP4 models against a self-hosted vLLM provider, leaves memorySearch disabled, ships an empty gateway.trustedProxies, and writes a placeholder API key — all of which silently break things later.
  • Gemma 4 tool calling requires a specific chat template that isn't in the official vLLM image, plus a one-line fix to the upstream gemma4 tool-call parser so colon namespaces like discord:add_reaction aren't rejected by the regex — both ship as part of the local vllm-llm/ image build.
  • The bundled OpenClaw searxng plugin ships default-disabled: webSearch looks wired up but doesn't actually fire until you flip it on.
  • Hybrid (BM25 + vector) retrieval and MMR re-rank are native OpenClaw features but aren't on by default.
  • Discord slash commands silently fail in guilds because of an upstream dual-permission check; the auto-ack reaction has a known cycle bug; the coding tool profile (default for non-main agents) excludes browser, tts, and canvas so a Discord-routed agent can't reach for half the tools the main agent uses. The patcher fixes all three by default — and every override is env-gated so you can disable any of them.
  • On GB10 specifically, unified-memory GPU budgeting between two concurrent vLLM processes needs care (LLM_GPU_MEM_UTIL vs EMBED_GPU_MEM_UTIL).

This repo captures a known-good wiring for all of the above in a single deterministic docker compose up. The patch-config.mjs patcher re-applies its 27+ steps on every restart so the wiring survives onboarding-wizard reruns, image upgrades, and manual edits — every step is logged with a [patch-config] line and gated by user-managed protection (your hand-edits to openclaw.json are preserved).

License

MIT. Model weights retain their upstream licenses:

  • Gemma 4: Apache 2.0
  • bge-m3: MIT
  • Kokoro 82M (English TTS, default): Apache 2.0
  • F5-TTS Hungarian (sarpba/F5-TTS_V1_hun_v2, opt-in): CC-BY-NC-4.0 — non-commercial use only. The HU service block is parked behind a Docker Compose profile and does not start by default; building it triggers the weight download and constitutes acceptance of the upstream model license. See openclaw-tts-f5hun/README.md for details on swapping the checkpoint for one with a commercial license.

Contributing

Pull requests welcome. See docs/CUSTOMIZATION.md for the extension points that matter (model swap, quantization swap, custom agents). For issues that aren't about this stack itself, please file them upstream at vllm-project/vllm or openclaw/openclaw.