A one-command, production-grade local AI agent stack — OpenClaw + vLLM + bge-m3 multilingual embeddings + SearxNG private web search + hybrid (BM25 + vector) memory retrieval, wired together in a single
docker compose file. Calibrated for the NVIDIA GB10 "Grace-Blackwell" Superchip (NVIDIA DGX Spark, ASUS Ascent GB10) running Gemma 4 26B-A4B MoE NVFP4 and Gemma 4 31B IT NVFP4 dense side by side on separate ports (8004 / 8005, separate OpenClaw provider entries) — pick either model in the UI without restarting. Portable to other hardware — swap the LLM for whatever fits your GPU, or point OpenClaw at a cloud LLM API and keep everything else.
The default profile's tuning decisions — NVFP4 quantization, GPU memory split between LLM and embedding, FP8 KV cache, concurrency bands, context-window budgeting — are calibrated to the GB10 Superchip's specific hardware profile: 128 GB of unified LPDDR5X, 273 GB/s bandwidth, and native FP4 tensor-core acceleration (sm_120/sm_121). On a DGX Spark or ASUS Ascent GB10 you get those numbers out of the box. On other hardware everything except the LLM service is reusable as-is.
| You are… | What you get | Time to working stack |
|---|---|---|
| A GB10 owner (DGX Spark, ASUS Ascent GB10) | The calibrated reference profile. Boot the stack and run Gemma 4 26B-A4B MoE NVFP4 (~25 tok/s decode single-stream, ~112 tok/s aggregate at 4-parallel) with multilingual embeddings, hybrid memory, private web search, bilingual TTS — on your hardware, no cloud. The dense 31B is preserved as a profile-gated alternative for parity testing. | ~30 min, mostly model download |
| An x86_64 + NVIDIA GPU operator (RTX 4090, A6000, etc.) | Same wiring; swap vllm-llm for a model your VRAM holds (Gemma 4 12B BF16, Qwen 2.5, Llama 3.3). All non-LLM services transfer unchanged. | ~30 min + tuning |
| A cloud-LLM user (OpenAI, Anthropic, OpenRouter, Bedrock, remote vLLM) | Park the local LLM service, point three env vars at your hosted endpoint. You still get the local agent stack: bge-m3 embeddings, SearxNG private search, hybrid memory, dreaming, heartbeat, TTS. | ~10 min (no GPU) |
| A contributor or curious reader | A worked example of a deterministic, opinionated AI agent stack. Every wiring decision has a why in the comments; the patcher is small enough to read in one sitting. | n/a — start with docs/ARCHITECTURE.md |
If none of those rows describe you, this repo probably isn't your fit — it's optimized for self-hosting on real hardware (or a real cloud LLM), not for trying out a chatbot on a laptop.
A fully local agent platform (or local-plus-cloud-LLM hybrid — your choice), with:
| Component | What it does |
|---|---|
| Gemma 4 26B-A4B MoE (NVFP4) | 25.2B-total / 3.8B-active Google Gemma 4 mixture-of-experts model (128 experts, top-8 routing), quantized by NVIDIA to NVFP4 with the nvfp4_experts_only recipe. Native tool calling, 256K context, multimodal (text + image), ~25 tok/s decode single-stream on GB10 with Marlin SM121 backend + CUDA graphs (~4× faster than dense; ~112 tok/s aggregate at 4-parallel). |
| Gemma 4 31B IT (NVFP4) — concurrent dense | 31.3B dense Google Gemma 4 quantized to NVFP4. Runs side-by-side with the MoE on port 8005 (provider id vllm-dense), single-user / 256K context / ~6.9 tok/s decode profile. Pick either model in the OpenClaw UI without restarting; the dense backend exists for parity testing, multimodal-heavy workloads where dense quality matters, or as a fallback. |
| bge-m3 embeddings | BAAI/bge-m3 multilingual dense embeddings via vLLM. 100+ languages, 1024-dim, 8K context, EN↔HU cosine ≈ 0.88. |
| SearxNG meta-search | Self-hosted, privacy-respecting web search backend wired into OpenClaw's native webSearch provider. Strict engine whitelist (DuckDuckGo, Brave, Mojeek, Qwant, Startpage, Wikipedia family, Reddit, GitHub, arXiv) — queries never reach Google / Bing / Yandex / Yahoo / Baidu. |
| OpenClaw gateway | The open-source agent runtime: Chrome extension UI, CLI, persistent memory, heartbeat, multi-agent world-building. |
| Bilingual TTS surface | OpenAI-compatible /v1/audio/speech router fronting Kokoro 82M (English, Apache 2.0, ~500 MB-1 GB VRAM, ships by default) and an opt-in F5-TTS Hungarian backend (CC-BY-NC model weights — see below). Wired into OpenClaw via the sanctioned messages.tts.providers.openai baseUrl override. Diacritic-based autodetect re-routes Hungarian-text requests to the HU backend transparently when both are active. |
| Whisper STT (EN + HU) | OpenAI-compatible /v1/audio/transcriptions via Systran/faster-whisper-large-v3 on a self-built CUDA 13 image (~150 LOC FastAPI wrapper around faster-whisper — the upstream speaches image rejects Blackwell tensor-core compute types on sm_120, so we self-build to match the vllm-llm / openclaw-tts-en wheel pattern). ~3 GB VRAM, autodetects language (FLEURS Hungarian WER 14.1%). Wired into OpenClaw's tools.media.audio pipeline — voice-note uploads in the Control UI chat, Discord voice channels, the VoiceCall CLI, and Talk / Voicewake nodes all transcribe through this service. MIT wrapper + MIT Whisper weights. |
| Browser automation (opt-in) | OpenClaw's built-in browser tool attaches to a self-hosted Playwright Chromium cluster over Chrome DevTools Protocol — one warm Chromium per onboarded credential. 1× manual OAuth onboarding per service via a noVNC bridge (./bootstrap-browser-login.sh github-user1); afterwards the agent reaches authenticated content with no per-call re-auth until the upstream session expires (~14d GitHub, ~30d Notion, etc.). Activate via --profile browser. Apache 2.0. Limitation: passkey-only auth flows don't work over noVNC (passkeys are origin-bound per the W3C WebAuthn spec) — use password+TOTP or API tokens for those. Details in docs/reference/browser-automation.md. |
| Idempotent config patcher | A small Node script that makes your OpenClaw config deterministic — runs on every up, never clobbers onboarding choices it shouldn't. Wires hybrid (BM25 + vector) retrieval with MMR re-rank on top of memorySearch, flips the bundled SearxNG plugin on, points the openai TTS provider at the bundled router, upserts the STT entry into tools.media.audio.models[], and writes one browser.profiles.<name>.cdpUrl per registered Chromium profile. |
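A quick way to exercise the tool-calling path end-to-end, as a hedged sketch: the service name and port come from the architecture diagram below, the model id is a placeholder (query `/v1/models` for the real one), and the request shape is the standard OpenAI chat-completions schema.

```bash
# Run from inside the compose network — only the gateway port is published to
# the host. Assumes the cli container ships curl; the model id is hypothetical.
docker compose exec openclaw-cli curl -s http://vllm-llm:8004/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id-from-/v1/models>",
    "messages": [{"role": "user", "content": "What is the weather in Budapest?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
# With the shipped template and --tool-call-parser gemma4 active, the response
# carries an OpenAI-format tool_calls array instead of free-text JSON.
```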
Everything lives in one Docker Compose file. No separate vLLM service definitions, no reverse-proxied DNS trickery, no host.docker.internal workarounds — containers reach each other by their compose service name on the default bridge network.
The reference profile (docker compose up -d with no edits) is designed and tested on:
- NVIDIA DGX Spark (GB10 Superchip, 128 GB unified LPDDR5X, 273 GB/s)
- ASUS Ascent GB10 (same GB10 Superchip, same memory architecture)
Works unchanged on any future workstation built around the GB10 Superchip — the stack doesn't depend on DGX- or ASUS-specific firmware, only on the Blackwell datacenter compute capabilities (sm_120/sm_121) and the GB10's 128 GB unified memory budget.
The reference profile won't boot as-is on non-GB10 hardware — `vllm/vllm-openai:gemma4-cu130` and the NVFP4 model both need Blackwell FP4 kernels. Two supported alternatives, sketched briefly in docs/CUSTOMIZATION.md:
- Other NVIDIA GPU: switch to a stock vLLM image and a model that fits your VRAM (smaller Gemma 4 NVFP4 if you have a Blackwell desktop; Gemma 4 12B BF16 / Qwen 2.5 / Llama 3.3 elsewhere). The memory-split and concurrency constants in `.env.example` will need re-tuning for your card.
- Cloud LLM: park the local vLLM services behind `profiles: ["never"]` and set `OPENAI_BASE_URL` / `LLM_BASE_URL` / `EMBED_BASE_URL` in `.env` to your hosted endpoints (a cloud OpenAI-compatible API, remote vLLM on another box, etc.). bge-m3 stays local by default but can also be remoted. Everything downstream — gateway, SearxNG, hybrid retrieval, dreaming, heartbeat — is unchanged.
| Scenario | MoE 26B-A4B (default) | Dense 31B (alternative) |
|---|---|---|
| Decode throughput, 1 concurrent user | ~24.9 tok/s sustained (NVFP4 + Marlin MoE backend on SM121 + CUDA graphs, measured 2026-05-08) | ~6.9 tok/s sustained (NVFP4) |
| Aggregate throughput, 4 concurrent users | ~112 tok/s (~28 tok/s per user — continuous-batching + CUDA graphs amortize kernel-launch overhead) | n/a (single-stream profile) |
| Stable context window, 1 concurrent user | 256K reachable (prefill-bound past ~100K — 100K prompt + 200 gen ≈ 70s wall) | ~220K tokens before vLLM preemption |
| Stable context window, 4 concurrent users (paged) | ~4.3× concurrency at 256K via the paging runtime (four simultaneous full-256K contexts trigger preemption) | ~110K tokens each, continuous batching |
| Model footprint | ~16.5 GB (NVFP4 weights, vision tower included) | ~17 GB (NVFP4 weights, vision tower included) |
| Vision prefill per image | ~280 vision tokens for a ≈ 512×512 region, sub-second encode (both) | — |
| First-boot cold start (after model download) | ~3–4 min from up to gateway-ready (both) | — |
| KV cache | FP8 (halves cache footprint vs default BF16 KV cache) | FP8 (same) |
Numbers come from a DGX Spark with 128 GB unified LPDDR5X. Single-prompt streaming with a warm KV cache; throughput drops with longer contexts and more concurrent users. Re-tune the LLM_GPU_MEM_UTIL / LLM_MAX_NUM_SEQS constants in .env for other hardware.
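For a rough sanity check against the single-stream row (hedged — this is not the harness behind the table, and `ignore_eos` is a vLLM-specific extra parameter), time a fixed-length completion and divide tokens by wall-clock seconds:

```bash
# Warm the KV cache with one run, then time the second.
# 512 tokens / elapsed seconds ≈ decode tok/s (expect ~25 on GB10, MoE).
time docker compose exec -T openclaw-cli curl -s \
  http://vllm-llm:8004/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "prompt": "Say hello.", "max_tokens": 512, "ignore_eos": true}' \
  > /dev/null
```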
Four shell commands plus one in-browser onboarding step — that's the minimal path to a working default-profile install. First-boot is two-phase by design (the gateway waits for explicit OpenClaw onboarding before applying the wiring); skip the heads-up below at your peril.
```bash
git clone https://github.com/chestercs/dgx-openclaw-stack.git
cd dgx-openclaw-stack
./bootstrap.sh        # interactive, non-destructive, idempotent
docker compose up -d  # 10 default services; gateway will crash-loop
                      # with "Missing config" until you onboard

# Phase 2 — open the OpenClaw Chrome extension OR run `openclaw setup` in a
# shell on the host, pair with ws://<your-host>:18789 using the gateway
# token printed by bootstrap. Onboarding writes openclaw.json.
docker compose up -d --force-recreate openclaw-config-init openclaw-gateway openclaw-cli
```

That's it — the patcher applies all wiring steps and the gateway goes healthy. Two-phase fresh-install onboarding (gateway crash-loop → onboarding → patcher applies wiring) is the OpenClaw security model, not a bug; details in SETUP.md. If anything goes sideways, the symptoms map directly onto entries in docs/TROUBLESHOOTING.md.
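To confirm the wiring landed after phase 2:

```bash
docker compose ps                          # every default service "running (healthy)"?
docker compose logs openclaw-config-init   # one [patch-config] line per applied step
```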
This brings up the 10 default services (LLM MoE + dense + embedding + gateway + cli + config-init + searxng + tts-en + tts-router + stt-whisper). Several capabilities are opt-in profiles that don't start with the default up (activation sketch after this list):
- `--profile hu` — Hungarian TTS (F5-TTS, CC-BY-NC weights — see Hungarian TTS opt-in).
- `--profile browser` — Playwright Chromium for login-gated sites; per-credential 1× OAuth via the noVNC helper.
- `--profile python` — Python code-execution sandbox (MCP).
- Image generation lives in a separate compose file and proxies to your existing ComfyUI install (the repo ships no model weights).
- Discord integration is a separate operator-side flow (Developer Portal app → bot token → `openclaw channels add`). The patcher handles every Discord-related field automatically once you've created the channel; see docs/discord-bot-setup.md and docs/reference/discord-config.md.
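Opt-in profiles compose additively — pass them per invocation, or persist them in `.env` (the form `bootstrap.sh` writes):

```bash
docker compose --profile hu --profile browser up -d
# equivalently, persisted:
COMPOSE_PROFILES=hu,browser docker compose up -d
```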
A more honest "what reproduces from a fresh clone vs what's manual" breakdown is in § Reproducibility from a fresh clone below.
┌──────────────────────────────────────────────────────────────────────┐
│ DGX Spark / ASUS GB10 │
│ │
│ Default profile (10 services, all on the compose bridge network) │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ vllm-llm │ │ vllm-llm-dense │ │ vllm-embedding │ │
│ │ :8004 (MoE 26B) │ │ :8005 (dense 31)│ │ :8005 (bge-m3) │ │
│ └────────▲────────┘ └────────▲────────┘ └────────▲────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ searxng │ │ openclaw-tts-en │ │ openclaw-tts- │ │
│ │ :8080 privacy │ │ Kokoro 82M EN │ │ router :8090 │ │
│ │ meta-search │ │ Apache 2.0 │ │ OAI-compat seam │ │
│ └────────▲────────┘ └────────▲────────┘ └────────▲────────┘ │
│ ┌─────────────────┐ │
│ │ openclaw-stt- │ │
│ │ whisper :8093 │ │
│ │ faster-whisper │ │
│ └────────▲────────┘ │
│ │ compose DNS (service names) │
│ ┌────────┴──────────────────────────────────────────────────────┐ │
│ │ openclaw-gateway :18789 (only published port) │◀── Chrome ext.
│ │ ├ openclaw-config-init (one-shot patcher, runs every up) │◀── CLI
│ │ └ openclaw-cli (always-up, shares gateway netns) │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ Opt-in profiles (parked unless explicitly enabled) │
│ ─ --profile hu → openclaw-tts-f5hun (F5-TTS HU, CC-BY-NC) │
│ ─ --profile browser → openclaw-browser (Chromium + noVNC) │
│ ─ --profile python → openclaw-python-sandbox (MCP exec) │
│ │
│ Separate compose (proxies to operator-side ComfyUI on host) │
│ ─ openclaw-image-comfyui/docker-compose.yml --profile image-gen │
└──────────────────────────────────────────────────────────────────────┘
All inter-container traffic is on the compose default bridge network; only port 18789 is published to the host. Put a reverse proxy (Nginx Proxy Manager, Caddy, Traefik, or a Cloudflared tunnel) in front for public access over wss://.
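For a quick throwaway exposure (hedged example — fine for testing, not a substitute for a named tunnel or one of the proxies above):

```bash
cloudflared tunnel --url http://localhost:18789   # prints a temporary public URL
```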
Deep dive: docs/ARCHITECTURE.md.
- One compose file for everything. LLM, embedding, web search, and agent stack in one `docker compose up -d`.
- NVFP4-native. Ships with the official `vllm/vllm-openai:gemma4-cu130` image; no custom build required.
- True tool calling. The shipped `tool_chat_template_gemma4.jinja` plus `--tool-call-parser gemma4 --enable-auto-tool-choice` produces OpenAI-format `tool_calls`. OpenClaw uses them directly.
- Multimodal. Gemma 4's vision tower is included. Drop an image into the chat; the model reads it at ~280 tokens per image by default.
- Multilingual RAG built in. bge-m3 gives you high-quality cross-lingual embeddings for `memorySearch` out of the box.
- Hybrid retrieval + MMR. `memorySearch` runs BM25 (SQLite FTS5) alongside vector similarity and re-ranks the candidate set with MMR for diversity (selection rule sketched after this list) — exact-keyword / ID matches stop falling through the cracks of pure cosine search.
- Privacy-respecting web search. Self-hosted SearxNG wired into OpenClaw's native `webSearch` tool. No commercial search API, no query leak to Google / Bing / Yandex. Strict engine whitelist (DuckDuckGo, Brave, Mojeek, Qwant, Startpage + Wikipedia / Reddit / GitHub / arXiv).
- Bilingual self-hosted TTS. Kokoro 82M (English, Apache 2.0) ships by default and runs alongside the LLM on the same GB10 GPU. Optional Hungarian (F5-TTS, opt-in via `--profile hu`) for fully local cross-language voice; details in Hungarian TTS opt-in.
- Bilingual self-hosted STT. `Systran/faster-whisper-large-v3` on a self-built CUDA 13 image (~150 LOC FastAPI wrapper — Blackwell compat ate the upstream speaches-ai image), autodetecting English and Hungarian (FLEURS HU WER 14.1%). ~3 GB VRAM at float16, wired into OpenClaw's `tools.media.audio` pipeline — voice-note uploads, Discord voice, VoiceCall CLI, and Talk/Voicewake nodes all transcribe through it. Details in docs/reference/stt-stack.md.
- Optional FLUX-Krea-dev image generation. The `openclaw-image-comfyui` MCP bridge (opt-in via `--profile image-gen`, separate compose file) drives the operator's existing ComfyUI install through `flux-krea-2k` (SFW, 1280×720 default) and `flux-krea-2k-adult` (same pipeline + flux-uncensored-v2 LoRA) workflow templates. The bridge ships no model weights; the recommended ~35 GB download is documented in docs/reference/image-comfyui-bridge.md. For 4K output, render at 2K native and upscale externally with ESRGAN — diffusion-based upscalers (SUPIR, UltimateSDUpscale tile pass) produce visible tile-seam artifacts on FLUX latents and were dropped.
- Discord-ready out of the box. Once you create a bot in Discord's Developer Portal and run `openclaw channels add --channel discord`, the patcher writes 11 production-tested Discord overrides automatically (progressive streaming, slash-command authz for issue #19310, tool-surface widening for the Discord-routed agent, cron + browser cheatsheets in the workspace). Every override is env-gated and individually disable-able — see docs/reference/discord-config.md for the at-a-glance table.
- Long context, honest numbers. 256K model max; realistic stable bands (per user count) are documented in the compose file.
- Idempotent configuration. The patcher re-applies a known-good state on every `up`. Safe to run repeatedly.
- Reverse-proxy ready. `gateway.trustedProxies` is pre-populated; add your LAN CIDR via `OPENCLAW_LAN_CIDR` if needed.
- Non-destructive bootstrap. `bootstrap.sh` never overwrites an existing `.env` value or host directory.
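For reference, the selection rule behind that MMR re-rank — standard Maximal Marginal Relevance, where q is the query, C the merged BM25 + vector candidate set, S the results picked so far, and λ trades relevance against diversity:

```latex
\mathrm{MMR} \;=\; \arg\max_{d \in C \setminus S}
  \Big[\, \lambda \,\mathrm{sim}(d, q) \;-\; (1 - \lambda) \max_{s \in S} \mathrm{sim}(d, s) \,\Big]
```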
dgx-openclaw-stack/
├─ docker-compose.yml # default + opt-in profiles (hu, browser, python)
├─ patch-config.mjs # idempotent OpenClaw config patcher (27+ steps,
│ # header docblock indexes every one)
├─ bootstrap.sh # non-destructive interactive first-time setup
├─ bootstrap-browser-login.sh # 1x OAuth onboarding helper (noVNC bridge)
├─ rotate-secrets.sh # rotate gateway / service tokens in place
├─ .env.example # documented env template (every tunable lives here)
├─ templates/
│ ├─ tool_chat_template_gemma4.jinja # Gemma 4 tool-call chat template
│ ├─ discord-text-agent/AGENTS.md.example # discord-friend agent template
│ └─ userscripts/ # web chat UI userscripts (opt-in)
├─ searxng/
│ └─ settings/settings.yml # SearxNG override: JSON API + strict engine whitelist
├─ vllm-llm/ # custom vLLM image (gemma4 tool-call parser patch
│ # for colon namespaces — see Dockerfile)
├─ openclaw-base-ext/ # local extension of the openclaw image (adds ffmpeg)
├─ openclaw-tts-en/ # English TTS service (Kokoro 82M, Apache 2.0)
├─ openclaw-tts-router/ # OpenAI-compat TTS router (passthrough + ffmpeg)
├─ openclaw-tts-f5hun/ # OPT-IN Hungarian TTS (--profile hu, CC-BY-NC weights)
├─ openclaw-stt-whisper/ # Self-built CUDA 13 STT image (faster-whisper)
├─ openclaw-browser/ # OPT-IN browser automation (--profile browser)
├─ openclaw-python-sandbox/ # OPT-IN Python MCP exec sandbox (--profile python)
├─ openclaw-image-comfyui/ # OPT-IN image-gen MCP bridge — SEPARATE compose file
│ # (proxies to operator's existing ComfyUI install)
├─ docs/
│ ├─ ARCHITECTURE.md # service-by-service design rationale
│ ├─ CUSTOMIZATION.md # model swaps, remote backends, hardware retuning
│ ├─ TROUBLESHOOTING.md # common failure modes and fixes
│ ├─ discord-bot-setup.md # zero-to-bot Discord Developer Portal walkthrough
│ └─ reference/ # deeper reference docs (15+ files — see reference/README.md)
├─ README.md # you are here — pitch + quickstart
├─ SETUP.md # end-user first-boot walkthrough
├─ CHANGELOG.md # versioned release notes
├─ CLAUDE.md # contributor / coding-agent guide
├─ CONTRIBUTING.md # how to file issues + send PRs
└─ LICENSE # MIT (model weights retain upstream licenses)
Honest scope: a git clone + ./bootstrap.sh + docker compose up -d followed
by the onboarding handshake brings up the 10 default services and the full
agent baseline (Gemma 4 MoE + dense, embedding, gateway, web search, EN TTS,
STT, hybrid memory, all 27+ patcher overrides). The advanced surfaces (Discord,
image-gen, Hungarian TTS, browser automation) need explicit operator steps —
each is documented but none is a one-command install. The table below is the
honest answer to "will my clone end up where the maintainer's deploy is?".
| Layer | Reproduces from compose up alone? | What's needed beyond bootstrap |
|---|---|---|
| 10 default services (LLM MoE + dense, embedding, gateway, cli, config-init, searxng, tts-en, tts-router, stt-whisper) | ✅ after onboarding | Gateway is expected to crash-loop until you complete the Chrome-extension wizard or openclaw onboard — then re-run the patcher trio. SETUP.md §5–6b walks through this. |
| Gemma 4 NVFP4 weights | ❌ | HF account, accept the Gemma 4 license, put your hf_… token in .env. bootstrap.sh prompts for it. |
| Hungarian TTS (F5-TTS) | ❌ | --profile hu + accept the CC-BY-NC model license at build time. See Hungarian TTS opt-in. |
| Browser automation | ❌ | --profile browser, then per-credential noVNC OAuth via ./bootstrap-browser-login.sh <profile>. Each login is 1× manual (W3C origin-bound — passkeys don't work). |
| Python sandbox | ❌ | --profile python, secrets generated by bootstrap.sh. |
| Discord integration | ❌ | Discord Developer Portal (create app + bot token), openclaw channels add --channel discord, copy templates/discord-text-agent/AGENTS.md.example to workspace-discord/AGENTS.md. Walkthrough: docs/discord-bot-setup.md. The 11 patcher overrides (steps 20-30) auto-apply once the channel exists; the operator-tunable env knobs are catalogued in docs/reference/discord-config.md. |
| Image generation | ❌ | Separate compose at openclaw-image-comfyui/docker-compose.yml (--profile image-gen), plus your own ComfyUI install on host.docker.internal:13036 (or LAN IP), plus model weights of your choice (FLUX Krea / SDXL fine-tunes). The repo ships no weights. See docs/reference/image-comfyui-bridge.md. |
| Memory contents | ❌ (by design) | User's accumulated notes under workspace/memory/*.md are operator data, not code. Back up with tar czf openclaw-$(date +%F).tar.gz -C $OPENCLAW_CONFIG_DIR . |
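The matching restore for that backup command, for completeness (stop the stack first so the gateway doesn't race the restore):

```bash
docker compose down
tar xzf openclaw-<date>.tar.gz -C "$OPENCLAW_CONFIG_DIR"
docker compose up -d
```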
What the repo does guarantee: bit-stable wiring of every service it ships,
deterministic patcher state on every up, pinned OPENCLAW_IMAGE_REF digest in
.env.example, and idempotent secret generation in bootstrap.sh. The
externals (HF model licenses, Discord, browser OAuth, image-gen weights) are
externalised precisely because they're decisions the operator must make — not
oversights.
The English TTS surface (Kokoro 82M, Apache 2.0) ships in the default profile
and is safe for any usage. Hungarian TTS is opt-in because the only
production-grade open-weights Hungarian TTS at the time of writing — the
sarpba/F5-TTS_V1_hun_v2 fine-tune of F5-TTS — is distributed under
CC-BY-NC-4.0 (Creative Commons, non-commercial only).
The wrapper code in openclaw-tts-f5hun/ is MIT (matches the rest of this
repo). The model weights are pulled from HuggingFace at build time — by
building the image you accept the upstream model license. This repo ships
no model weights of any kind.
To activate Hungarian TTS, the easiest path is to re-run bootstrap.sh —
it now prompts to opt in, generates F5HUN_API_TOKEN, sets F5HUN_URL to the
in-compose service, and adds COMPOSE_PROFILES=hu to .env. Then:
```bash
docker compose --profile hu up -d --build openclaw-tts-f5hun
```

Or by hand: uncomment the three lines in the "Optional: Hungarian TTS" block
in `.env.example`, fill in `F5HUN_API_TOKEN` (`openssl rand -base64 64`), and
either set `COMPOSE_PROFILES=hu` in `.env` or pass `--profile hu` on the
docker compose command line.
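Sketched concretely — variable names from the bootstrap description above; the in-compose URL and port are whatever `.env.example` documents:

```bash
# .env — hedged manual HU opt-in
F5HUN_API_TOKEN=<paste output of: openssl rand -base64 64>
F5HUN_URL=http://openclaw-tts-f5hun:<port>   # in-compose service; port per .env.example
COMPOSE_PROFILES=hu
```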
Once active, the router exposes default_hu / hu_diana voice ids, and the
diacritic-based autodetect silently re-routes Hungarian-text requests
(detected by áéíóöőúüű) to the HU backend even when OpenClaw asks for an
English default voice like coral. The autodetect is a no-op when the HU
profile is not active.
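The autodetect is observable directly against the router (hedged sketch — router name and port from the architecture diagram, request shape per the OpenAI audio/speech schema):

```bash
# "Jó" carries a diacritic from the áéíóöőúüű set, so with --profile hu active
# this renders on the HU backend despite the English voice id.
docker compose exec -T openclaw-cli curl -s \
  http://openclaw-tts-router:8090/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "coral", "input": "Jó reggelt, hogy vagy?"}' \
  > greeting.mp3
```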
For commercial Hungarian deployments, override F5_CHECKPOINT / F5_VOCAB
on the openclaw-tts-f5hun service to point at a checkpoint with a fitting
license. Details + voice catalog in
openclaw-tts-f5hun/README.md.
Running a useful local (or hybrid) agent on top of OpenClaw + vLLM is trickier than the surface picture suggests:
- The OpenClaw onboarding wizard doesn't register NVFP4 models against a self-hosted vLLM provider, leaves `memorySearch` disabled, ships an empty `gateway.trustedProxies`, and writes a placeholder API key — all of which silently break things later.
- Gemma 4 tool calling requires a specific chat template that isn't in the official vLLM image, plus a one-line fix to the upstream `gemma4` tool-call parser so colon namespaces like `discord:add_reaction` aren't rejected by the regex — both ship as part of the local `vllm-llm/` image build.
- The bundled OpenClaw `searxng` plugin ships default-disabled — `webSearch` looks wired up but doesn't actually fire until you flip it on.
- Hybrid (BM25 + vector) retrieval and MMR re-rank are native OpenClaw features but aren't on by default.
- Discord slash commands silently fail in guilds because of an upstream dual-permission check; the auto-ack reaction has a known cycle bug; the `coding` tool profile (default for non-main agents) excludes `browser`, `tts`, and `canvas`, so a Discord-routed agent can't reach for half the tools the main agent uses. The patcher fixes all three by default — and every override is env-gated so you can disable any of them.
- On GB10 specifically, unified-memory GPU budgeting between two concurrent vLLM processes needs care (`LLM_GPU_MEM_UTIL` vs `EMBED_GPU_MEM_UTIL`).
This repo captures a known-good wiring for all of the above in a single deterministic docker compose up. The patch-config.mjs patcher re-applies its 27+ steps on every restart so the wiring survives onboarding-wizard reruns, image upgrades, and manual edits — every step is logged with a [patch-config] line and gated by user-managed protection (your hand-edits to openclaw.json are preserved).
MIT. Model weights retain their upstream licenses:
- Gemma 4: Apache 2.0
- bge-m3: MIT
- Kokoro 82M (English TTS, default): Apache 2.0
- F5-TTS Hungarian (`sarpba/F5-TTS_V1_hun_v2`, opt-in): CC-BY-NC-4.0 — non-commercial use only. The HU service block is parked behind a Docker Compose profile and does not start by default; building it triggers the weight download and constitutes acceptance of the upstream model license. See `openclaw-tts-f5hun/README.md` for details on swapping the checkpoint for one with a commercial license.
Pull requests welcome. See docs/CUSTOMIZATION.md for the extension points that matter (model swap, quantization swap, custom agents). For issues that aren't about this stack itself, please file them upstream at vllm-project/vllm or openclaw/openclaw.