title	Quickstart
description	From zero to first voice turn in 15 minutes.

Quickstart

Get Dotty talking in 15 minutes. This is the single opinionated happy path -- see SETUP.md for build-from-source and alternative configurations.

What you need

Item	Notes
M5Stack CoreS3 + StackChan servo kit	The robot. See hardware.md for details.
Linux or macOS host with Docker	Runs all four server-side containers. Any distro works. No GPU required for the default stack — see Server hardware below.
2.4 GHz WiFi	The ESP32-S3 does not support 5 GHz.

Server hardware

The default stack is CPU-only — no GPU is required. The voice pipeline ships with FunASR SenseVoiceSmall for ASR and Piper (LocalPiper) for TTS, both of which run comfortably on a modern multi-core x86-64 or Apple Silicon CPU.

Scenario	Needs a GPU?	Notes
Default (FunASR ASR + LocalPiper TTS + a cloud LLM via OpenRouter/OpenAI-compatible key)	No	Any 64-bit Linux/macOS host with Docker and ~4 GB free RAM. This is the Quickstart happy path.
`WhisperLocal` ASR instead of FunASR	Yes	`faster-whisper` float16 needs CUDA. This is the only reason the Quickstart compose file carries a `runtime: nvidia` block.
Self-hosting the LLM locally (Ollama / llama-swap instead of a cloud key)	Recommended	VRAM scales with the model — roughly ~5 GB for an 8B model, ~18 GB for a 30B. See run-fully-local.md and llama-swap-concurrent-models.md. CPU-only inference works but is slow.

You do not have to touch the GPU config manually. make setup (step 4) auto-detects the NVIDIA Docker runtime: if it's present, setup selects WhisperLocal on the GPU; if it's absent, setup selects FunASR on the CPU and strips the runtime: nvidia / NVIDIA_* blocks out of the rendered docker-compose.yml. The # --- BEGIN/END CUDA BLOCK --- markers in docker-compose.yml.template exist for exactly this.

If you render the compose file by hand instead of running make setup, delete the two marked sections (# --- BEGIN CUDA BLOCK --- … # --- END CUDA BLOCK --- and # --- BEGIN CUDA ENV --- … # --- END CUDA ENV ---) on a host without nvidia-container-toolkit. Leaving them in is what produces the could not select device driver "nvidia" error from Docker.

1. Flash the firmware

Download the latest release from GitHub Releases (look for a tag starting with fw-v). Grab all six binaries: bootloader.bin, partition-table.bin, ota_data_initial.bin, stack-chan.bin, generated_assets.bin, and human_face_detect.espdl.

Install esptool and flash over USB-C:

pip install esptool

python -m esptool --chip esp32s3 -b 460800 \
  --before default_reset --after hard_reset \
  write_flash --flash_mode dio --flash_size 16MB --flash_freq 80m \
  0x0      bootloader.bin \
  0x8000   partition-table.bin \
  0xd000   ota_data_initial.bin \
  0x20000  stack-chan.bin \
  0xa60000 generated_assets.bin \
  0xe70000 human_face_detect.espdl

Flashing the bootloader (0x0) and partition table (0x8000) is required — skip them and the device keeps whatever partition layout the previous firmware left behind. That layout won't match these images, and the robot boot-loops on No bootable app partitions.

Verify checksums against SHA256SUMS.txt in the release if desired.

2. Clone the repo

git clone --recursive https://github.com/BrettKinny/dotty-stackchan.git
cd dotty-stackchan

3. Configure

cp .env.example .env

Edit .env and set OPENROUTER_API_KEY=<YOUR_API_KEY> (or any OpenAI-compatible key). You can skip this if you're running fully local — either via Ollama (single binary, simple) or via llama-swap (Docker, supports multiple resident models). See cookbook/run-fully-local.md and cookbook/llama-swap-concurrent-models.md.

The shipped .config.yaml selects PiVoiceLLM as the default LLM provider, which runs the dotty-pi container (the pi coding agent) on the same Docker host. One alternate provider — OpenAICompat (any OpenAI-compatible cloud or local endpoint) — is available via selected_module.LLM in data/.config.yaml.

4. Run setup

make setup

The interactive wizard prompts for your server IP, robot name, timezone, and LLM provider. It downloads the ASR and TTS models (~100 MB), substitutes placeholders in config files, and starts the Docker containers.

Verify everything is healthy:

make doctor

All checks should pass (green). If any fail, see troubleshooting.md.

5. Bring up the containers

All four server-side services run as Docker containers on the same host. docker compose up -d (from docker-compose.yml.template after make setup substitutes your placeholders) starts the main xiaozhi-esp32-server container. The brain container and the perception/dashboard container are brought up separately:

dotty-pi (the voice-tool brain): see dotty-pi/README.md for build and run instructions.
dotty-behaviour (perception bus + admin dashboard): see dotty-behaviour/README.md for build and run instructions. The scripts/deploy-behaviour.sh helper deploys it.
bridge.py (admin dashboard service, :8081): runs as a container on the same host (bridge/Dockerfile + bridge/docker-compose.yml, deployed via scripts/deploy-bridge-unraid.sh).

No separate host, no systemd bridge unit, no SSH to a second machine.

6. Connect the robot

Power on the robot (USB-C or battery).
On the device screen, navigate to Settings > Advanced Options.
Enter the OTA URL: http://<XIAOZHI_HOST>:8003/xiaozhi/ota/
The robot connects via WebSocket and shows a face.

7. First voice turn

Tap the screen to enter voice mode and say "Hello Dotty!"

You should see:

LED colour	State
Green	Listening -- you are speaking
Orange	Thinking -- waiting for LLM response
Blue	Talking -- playing the response

The face expression changes to match the response emoji. First-turn latency is roughly 5 seconds, dominated by the LLM round-trip.

Next steps

Change the persona -- give Dotty a different personality.
Swap the voice -- try a different TTS voice.
Run fully local -- Ollama compose profile, zero cloud dependencies.
Run two local models concurrently -- keep a small voice model and a big "think" model both resident via llama-swap's matrix DSL.
Disable Kid Mode -- for unrestricted use.
Architecture overview -- full data flow.
Kid Mode -- on by default, what it enforces.

Placeholders

This repo uses placeholders in place of real IPs, usernames, and filesystem paths. Substitute these everywhere before deploying:

Placeholder	Meaning
`<XIAOZHI_HOST>`	LAN IP of the server running all Docker containers. The robot reaches this on WiFi, so it must be a LAN IP, not a Tailscale/VPN IP.
`<XIAOZHI_USER>`	SSH user for the server (whatever your distro defaults to: `root`, `ubuntu`, `dietpi`, etc.).
`<XIAOZHI_HOSTNAME>`	Hostname or Tailscale name of the server (optional, IP works for everything).
`<XIAOZHI_PATH>`	Path on the server where you clone/install this repo (e.g. `/opt/xiaozhi-server/` or `/srv/xiaozhi-server/`).
`<YOUR_NAME>`	Your name / org, used in the persona prompt in `.config.yaml`.
`<ROBOT_NAME>`	Name the robot introduces itself as, referenced in the persona prompt in `.config.yaml`. Any string — pick whatever you want. The default example uses the hardware name ("StackChan").

Port numbers (8000, 8003, 8081, 8090) are product-generic and should not be changed unless you also reconfigure the respective services.

Files you will definitely need to edit before first run:

.config.yaml — replace <XIAOZHI_HOST> and customise the prompt: block.
docker-compose.yml — set TZ to your timezone.

Deployment layout

All four containers run on the single Docker host (<XIAOZHI_HOST>):

Container	Purpose	Port
`xiaozhi-esp32-server`	Voice pipeline: ASR, TTS, WebSocket to StackChan	8000 (WS), 8003 (OTA/HTTP)
`dotty-pi`	pi coding agent — the voice-tool brain	internal (via `docker exec`)
`dotty-behaviour`	Perception bus + ambient consumers + calendar	8090
`bridge.py`	Admin dashboard	8081

Container volume mounts for xiaozhi-esp32-server:

Host path	Container path	Purpose
`data/.config.yaml`	`/opt/xiaozhi-esp32-server/data/.config.yaml`	Config override (read-only mount)
`models/SenseVoiceSmall/`	`/opt/xiaozhi-esp32-server/models/SenseVoiceSmall/`	ASR weights
`models/piper/`	`/opt/xiaozhi-esp32-server/models/piper/`	Piper TTS voice models (`.onnx` + `.json`)
`tmp/`	`/opt/xiaozhi-esp32-server/tmp/`	Scratch
`custom-providers/pi_voice/`	`/opt/xiaozhi-esp32-server/core/providers/llm/pi_voice/`	PiVoiceLLM provider (directory mount)
`custom-providers/openai_compat/`	`/opt/xiaozhi-esp32-server/core/providers/llm/openai_compat/`	OpenAICompat alternate provider
`custom-providers/edge_stream/edge_stream.py`	`/opt/xiaozhi-esp32-server/core/providers/tts/edge_stream.py`	Streaming EdgeTTS provider (file mount)
`custom-providers/piper_local/piper_local.py`	`/opt/xiaozhi-esp32-server/core/providers/tts/piper_local.py`	Local Piper TTS provider (file mount)
`custom-providers/asr/fun_local.py`	`/opt/xiaozhi-esp32-server/core/providers/asr/fun_local.py`	Patched FunASR — adds `language` config key so SenseVoiceSmall can be pinned to English

The full file inventory lives in architecture.md.

Endpoints

What	URL	Who calls it
OTA (enter into StackChan settings)	`http://<XIAOZHI_HOST>:8003/xiaozhi/ota/`	The robot on boot
WebSocket	`ws://<XIAOZHI_HOST>:8000/xiaozhi/v1/`	The robot after OTA handshake
Perception / ambient events	`http://<XIAOZHI_HOST>:8090`	xiaozhi-server → dotty-behaviour
Admin dashboard	`http://<XIAOZHI_HOST>:8081/ui`	Humans (LAN-only HTMX UI)
Bridge health	`http://<XIAOZHI_HOST>:8081/health`	Humans, monitoring

Reboot survival

All containers use restart: unless-stopped. Ensure dockerd starts at boot on your distro. Use docker compose restart or docker restart <container> for transient restarts rather than docker compose down (which marks the container stopped and prevents auto-restart on reboot).

Common operations

# Tail xiaozhi-server logs (voice pipeline)
ssh <XIAOZHI_USER>@<XIAOZHI_HOST> 'docker logs -f xiaozhi-esp32-server'

# Tail dotty-behaviour logs (perception + dashboard)
ssh <XIAOZHI_USER>@<XIAOZHI_HOST> 'docker logs -f dotty-behaviour'

# Tail dotty-pi logs (brain container)
ssh <XIAOZHI_USER>@<XIAOZHI_HOST> 'docker logs -f dotty-pi'

# Restart voice pipeline after config change
ssh <XIAOZHI_USER>@<XIAOZHI_HOST> 'cd <XIAOZHI_PATH> && docker compose restart'

# Admin dashboard
open http://<XIAOZHI_HOST>:8081/ui

# Bridge health
curl http://<XIAOZHI_HOST>:8081/health

Changing voice

The default TTS is LocalPiper (offline, runs inside the container). To change the Piper voice, edit TTS.LocalPiper.voice and the corresponding model_path / config_path in data/.config.yaml. To switch to cloud EdgeTTS instead, set selected_module.TTS: EdgeTTS and edit TTS.EdgeTTS.voice (any Microsoft Edge Neural voice ID works, e.g. en-US-AvaNeural). Restart the container after changes.

Changing persona (the robot's personality)

Edit personas/dotty_voice.md (loaded by the pi agent on the PiVoiceLLM path) and restart the relevant container. The prompt: key in data/.config.yaml is also injected as a secondary system message. Full instructions: cookbook/change-persona.md.

Changing VAD sensitivity

VAD.SileroVAD.min_silence_duration_ms in data/.config.yaml. Default: 700 ms. Lower = cuts off quicker. Higher = waits longer for slow speakers.

Changing the LLM model

For the PiVoiceLLM path (default): see dotty-pi/README.md for the model selection rules — in particular, the llama-swap matrix DSL constraint that prevents the voice-model set from being evicted. For the OpenAICompat path: edit LLM.OpenAICompat.model (or repoint url / api_key) in data/.config.yaml and docker compose restart. Note: there is no live in-flight model-swap on either path — smart-mode model-swap is v2 scope and not wired (the instant hot-swap once provided by the removed Tier1Slim provider is gone).

Troubleshooting

make doctor          # health checks
make logs            # tail server logs
curl http://<XIAOZHI_HOST>:8081/health   # test the bridge/dashboard

See troubleshooting.md for common issues.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quickstart

What you need

Server hardware

1. Flash the firmware

2. Clone the repo

3. Configure

4. Run setup

5. Bring up the containers

6. Connect the robot

7. First voice turn

Next steps

Placeholders

Deployment layout

Endpoints

Reboot survival

Common operations

Changing voice

Changing persona (the robot's personality)

Changing VAD sensitivity

Changing the LLM model

Troubleshooting

FilesExpand file tree

quickstart.md

Latest commit

History

quickstart.md

File metadata and controls

Quickstart

What you need

Server hardware

1. Flash the firmware

2. Clone the repo

3. Configure

4. Run setup

5. Bring up the containers

6. Connect the robot

7. First voice turn

Next steps

Placeholders

Deployment layout

Endpoints

Reboot survival

Common operations

Changing voice

Changing persona (the robot's personality)

Changing VAD sensitivity

Changing the LLM model

Troubleshooting