From 55e87650e9567623456a3319e811d27a73c9d215 Mon Sep 17 00:00:00 2001
From: Brett Kinny
Date: Sat, 6 Jun 2026 20:58:30 +1000
Subject: [PATCH 1/4] docs: simplify ROADMAP + slim/correct docs to match reality
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Scale back the public-facing scope and cut docs for things that are
aspirational, redundant, or contradicted by the 'won't build' list.

ROADMAP (simplified to two sections, no over-promising):
- collapsed to 'What works today' + 'Maybe someday' + a won't-build line
- dropped stale/done items (whisper ASR swap and privacy LEDs both
  shipped; kid-ASR accuracy mitigations landed) after verifying against
  the code
- added 'Improve Security Mode'

Deleted (verified aspirational/redundant/spent):
- docs/signed-releases.md — GPG signing was commented-out scaffold only
- docs/advanced/variant-port-guide.md — firmware is CoreS3-only
- docs/hardware-support.md — redundant with hardware.md (the 'matrix' is
  a won't-build item)
- docs/speaker-id-investigation.md — superseded by latent-capabilities.md
- docs/vision-alignment-review-2026-05-29.md — spent audit; findings
  fixed by #129
- KEYS.txt + the firmware-release GPG-signing scaffold — descoped with
  signed-releases

Trimmed to match reality:
- observability.md — only kid_mode + content_filter_hits actually record
  today; the other 8 metrics are schema-only. Marked live-vs-unwired;
  cut phantom alerts.
- ota-verification.md — noted firmware-update OTA isn't set up (WS
  discovery + clock only)
- reproducible-builds.md — dropped dead signed-releases reference

Corrected:
- modes.md — LED contract described active-fork StateManager
  (6=face,11=listening); the pinned submodule (35f701a) ships privacy
  LEDs (6=mic green, 11=camera red) and has no StateManager. Added a
  precise two-firmware caveat.
- README — ASR bullet now notes WhisperLocal/SenseVoiceOnnx; added
  Privacy LEDs feature

Link hygiene: remapped 6 hardware-support links to hardware.md, removed
deleted docs from mkdocs nav, dropped the signed-releases cross-ref from
sbom.md. No dangling markdown links remain in lychee-scanned files.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .github/workflows/firmware-release.yml     |  31 ---
 KEYS.txt                                   |  13 -
 README.md                                  |   3 +-
 ROADMAP.md                                 |  99 ++-----
 docs/about.md                              |   2 +-
 docs/advanced/variant-port-guide.md        | 201 --------------
 docs/faq.md                                |   6 +-
 docs/hardware-support.md                   |  79 ------
 docs/modes.md                              |  17 ++
 docs/observability.md                      |  57 ++--
 docs/ota-verification.md                   |  15 +-
 docs/quickstart.md                         |   2 +-
 docs/reproducible-builds.md                |   4 +-
 docs/sbom.md                               |   2 -
 docs/signed-releases.md                    | 153 -----------
 docs/speaker-id-investigation.md           | 101 --------
 docs/vision-alignment-review-2026-05-29.md | 288 ---------------------
 mkdocs.yml                                 |   7 +-
 18 files changed, 77 insertions(+), 1003 deletions(-)
 delete mode 100644 KEYS.txt
 delete mode 100644 docs/advanced/variant-port-guide.md
 delete mode 100644 docs/hardware-support.md
 delete mode 100644 docs/signed-releases.md
 delete mode 100644 docs/speaker-id-investigation.md
 delete mode 100644 docs/vision-alignment-review-2026-05-29.md

diff --git a/.github/workflows/firmware-release.yml b/.github/workflows/firmware-release.yml
index 231fc53..409d098 100644
--- a/.github/workflows/firmware-release.yml
+++ b/.github/workflows/firmware-release.yml
@@ -78,29 +78,6 @@ jobs:
echo "--- checksums ---" cat SHA256SUMS.txt - # ─── GPG signing (TODO: enable when GPG_PRIVATE_KEY secret is set) ─── - # See docs/signed-releases.md for the full setup. Required repo secrets: - # GPG_PRIVATE_KEY, GPG_PASSPHRASE, GPG_KEY_ID - # Add the *.asc files to the Release `files:` block below once enabled.
- # - # - name: Import GPG signing key - # if: ${{ secrets.GPG_PRIVATE_KEY != '' }} - # run: | - # echo "${{ secrets.GPG_PRIVATE_KEY }}" | gpg --batch --import - # echo "default-key ${GPG_KEY_ID}" >> ~/.gnupg/gpg.conf - # env: - # GPG_KEY_ID: ${{ secrets.GPG_KEY_ID }} - # - # - name: Sign release artifacts - # if: ${{ secrets.GPG_PRIVATE_KEY != '' }} - # working-directory: firmware/firmware/build - # run: | - # for f in bootloader.bin partition-table.bin ota_data_initial.bin stack-chan.bin generated_assets.bin human_face_detect.espdl SHA256SUMS.txt; do - # gpg --batch --yes --pinentry-mode loopback \ - # --passphrase "${{ secrets.GPG_PASSPHRASE }}" \ - # --detach-sign --armor "$f" - # done - - name: Create GitHub Release uses: softprops/action-gh-release@v2 with: @@ -114,14 +91,6 @@ jobs: firmware/firmware/build/generated_assets.bin firmware/firmware/build/human_face_detect.espdl firmware/firmware/build/SHA256SUMS.txt - # Add once signing is enabled: - # firmware/firmware/build/bootloader.bin.asc - # firmware/firmware/build/partition-table.bin.asc - # firmware/firmware/build/ota_data_initial.bin.asc - # firmware/firmware/build/stack-chan.bin.asc - # firmware/firmware/build/generated_assets.bin.asc - # firmware/firmware/build/human_face_detect.espdl.asc - # firmware/firmware/build/SHA256SUMS.txt.asc notify-failure: name: Notify on Failure diff --git a/KEYS.txt b/KEYS.txt deleted file mode 100644 index 07d687e..0000000 --- a/KEYS.txt +++ /dev/null @@ -1,13 +0,0 @@ -# Maintainer GPG public key fingerprints -# -# This file is the single source of truth for the GPG keys that sign Dotty -# release artifacts. Until a real key is generated, the fingerprint below is -# a placeholder — release artifacts during this period are NOT signed. -# -# Brett Kinny -# -# To verify a release artifact: -# gpg --keyserver keys.openpgp.org --recv-keys -# gpg --verify .asc -# -# See docs/signed-releases.md for the full signing + verification workflow. 
diff --git a/README.md b/README.md index 1065d15..e591ee8 100644 --- a/README.md +++ b/README.md @@ -23,13 +23,14 @@ So Dotty is the version that passes: every component runs on hardware I own, eve ## Features - **Kid Mode (on by default)** — age-appropriate responses via persona + per-turn prompt steering. An output content filter is planned, not yet shipped ([#138](https://github.com/BrettKinny/dotty-stackchan/issues/138)); Kid Mode is not a substitute for supervision. Toggle off for general-purpose use. See [`docs/kid-mode.md`](./docs/kid-mode.md). -- **Local ASR** — FunASR SenseVoiceSmall runs on your hardware, no cloud transcription. +- **Local ASR** — FunASR SenseVoiceSmall by default, no cloud transcription. WhisperLocal auto-selects on GPU hosts (better kid-speech accuracy); SenseVoiceOnnx is a lighter low-RAM option. - **Local or cloud TTS** — Piper (offline) or EdgeTTS (cloud). Swap with a config change. - **Streaming responses** — the bridge streams LLM output to the voice pipeline for lower perceived latency. - **Emoji expressions** — every response starts with an emoji that the firmware maps to a face animation (smile, laugh, sad, surprise, thinking, angry, love, sleepy, neutral). - **Voice tools** — the pi agent can search its memory, escalate hard questions to a bigger model, take a photo, and play songs, all mid-conversation. - **States, toggles & LEDs** — a six-state mutex (`idle / talk / story_time / security / sleep / dance`) plus two orthogonal toggles (`kid_mode`, `smart_mode`), all owned by the firmware StateManager and surfaced on the 12-pixel LED ring. Shipped on the active firmware fork (commit `d78118b`, 2026-04-27); the `firmware/firmware/` submodule pin in this repo lags, so flash from the active fork to get it. See "States, Toggles & LEDs" below and [`docs/modes.md`](./docs/modes.md). - **Vision (camera)** — the robot's built-in camera can capture images for multimodal LLM queries. 
+- **Privacy LEDs** — hardware-bound mic (green) and camera (red) indicators on the LED ring. They light from the codec/camera enable signals via RAII guards, so a misbehaving server or model can't capture with the lights off. - **Calendar context** — optional calendar integration feeds upcoming events into the conversation context. - **Hackable** — every seam is swappable: LLM, TTS, ASR, agent framework. Fork it, rip out what you don't want, wire in your own. diff --git a/ROADMAP.md b/ROADMAP.md index e09d30c..276c912 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,92 +1,27 @@ # Roadmap -> This is a living document. See [CONTRIBUTING.md](CONTRIBUTING.md) to get involved. +Personal hobby project. Might ship, might not — no timeline, no guarantees. If you want something built, feel free to contribute or ask nicely. -## Shipping now (v0.1) +## What works today -v0.1 is the first tagged release — early-feedback alpha. Everything in this list runs end-to-end on the maintainer's hardware. v1.0 is gated on real-world feedback from external users; see [Known issues](#known-issues-as-of-v01) below. +The core voice loop runs end-to-end on a single Docker host: -- **Kid Mode** -- opt-in child-safety guardrails: topic blocklist, self-harm redirect, content filter, age-appropriate vocabulary (on by default, disable with `DOTTY_KID_MODE=false`) -- **Local ASR** -- FunASR SenseVoiceSmall, English-pinned, runs on your Docker host -- **Local TTS** -- Piper voice synthesis, no cloud dependency -- **Streaming LLM responses** -- NDJSON token-level streaming with first-token latency ~1.2s -- **Emoji-driven expressions** -- LLM output prefixed with emoji; firmware maps to face animations -- **Persona system** -- swappable persona files (`personas/*.md`), customizable via `make setup` -- **MCP tool integration** -- 11 firmware-advertised tools (head servos, LEDs, camera, reminders, volume, brightness, screen theme) -- **Photo-based vision** -- "What do you see?" 
triggers camera capture + vision model description -- **Calendar context injection** -- Google Calendar events surfaced to the LLM for contextual reminders -- **Length-aware brevity** -- default 1-2 short sentences, up to 6 for open-ended asks (story, explanation, list); cap enforced in code via `MAX_SENTENCES` -- **ASR noise filtering** -- rejects punctuation-only / sub-threshold utterances -- **Single-host deployment** -- all four server services (xiaozhi-server, dotty-pi, dotty-behaviour, bridge.py) run as Docker containers on one machine -- **`make setup` wizard** -- interactive first-run: name your robot, fetch models, validate config -- **MkDocs Material docs site** -- architecture, protocols, quickstart, troubleshooting, FAQ -- **Kid Mode channel routing** -- voice channels are kid-safe by default; the kid-mode sandwich (English-pin, emoji prefix, topic blocklist, jailbreak resistance) only applies when the inbound `channel` is in `VOICE_CHANNELS`, so messaging-platform channels (Telegram, etc.) skip it automatically -- **`/xiaozhi/admin/*` endpoints** -- live-session control surface on xiaozhi-server: `set-state`, `set-toggle`, `set-face-identified`, `set-head-angles`, `inject-text`, `abort`, `take-photo`, `play-asset`, `songs`, `say`, `devices`. See [`architecture.md`](https://brettkinny.github.io/dotty-stackchan/latest/architecture/#admin-surface-two-services-two-prefixes) -- **Smart-mode** -- smart-mode flips ambient/behaviour only. The inner-loop **model-swap is v2 scope and not wired** on the live `PiVoiceLLM` path. (The instant in-process model-swap once provided by the now-removed `Tier1Slim` provider — added in `b73f583`, removed in the 2026-05-29 alignment pass — is gone.) -- **llama-swap voice/coding matrix** -- `qwen3.5:4b` (voice inner loop) + `qwen3.6:27b-think` (think_hard target) co-resident under the `voice` matrix set; `qwen3.6:27b` for `pi` CLI runs alone under `coding`. 
See [`cookbook/llama-swap-concurrent-models.md`](https://brettkinny.github.io/dotty-stackchan/latest/cookbook/llama-swap-concurrent-models/) -- **Perception event bus** -- firmware `face_detected` / `face_lost` / `sound_event` / `state_changed` frames relay through xiaozhi's `EventTextMessageHandler` to `dotty-behaviour`'s `/api/perception/event`, fanned out to 11 consumer classes (config-gated; includes face_greeter, sound_turner, face_lost_aborter, wake_word_turner, face_identified_refresher, purr_player) -- **Fully-local backend support** -- `compose.local.override.yml` for Ollama (single binary, simple) plus llama-swap recipe for concurrent multi-model serving. Both shipped; choose based on whether you need multiple models resident at once -- **Voice catalog + install helper** -- `docs/voice-catalog.md` (12 Piper + 6 EdgeTTS) + `make voice-install` -- shipped -- **Versioned docs via `mike`** -- `/latest/`, `/v0.1/`, `/dev/` URL structure shipped -- **Observability hooks** -- Prometheus `/metrics` + Grafana dashboard at `monitoring/grafana-dashboard.json` -- shipped -- **Head-pet hold-to-listen wake** -- firmware fires `WakeWordInvoke("head_pet_hold")` after 2 s touch; works in the dark. Also emits `head_pet_started`/`_ended` perception events for purr consumer +- Voice I/O through xiaozhi-esp32-server (ASR + TTS) +- Brain via the `dotty-pi` container (pi coding agent with voice tools) +- Perception events (face detection, sound events) through `dotty-behaviour` +- Admin dashboard via `bridge.py` +- Kid Mode, on by default — prompt-level steering for age-appropriate responses, **not** an output content filter, and no substitute for supervision +- Local ASR (SenseVoiceSmall) and local TTS (Piper) — no cloud dependency +- Streaming responses, emoji expressions, swappable personas -## Known issues (as of v0.1) +Rough edges: face emoji rendering misses 4 of 9 emotions visually; the sound-direction localizer has a left-bias on CoreS3 hardware. 
-The 30+ planning docs accumulated during the v0.1 prep sprint surfaced these. None are blockers for trying Dotty out, but you should know about them: +## Maybe someday -- **Face emoji rendering** — only 5 of 9 enforced emotions render distinctly on the LCD. Sad clamps to a one-eye wink (rotation `-400` clamps to 0 on left eye), Surprise is byte-identical to Neutral (weight `120` clamps to `100`), Loving is a copy-paste of Happy, Laughing is an alias of Happy by design. Fix is queued (~25-40 LoC firmware patch). -- **Sound-direction localizer always reads left.** I2S channel 1 on the M5Stack CoreS3 is the AEC speaker-loopback reference, not the right mic. Energy detection works; direction does not. Sound-driven head-turn behaves accordingly. -- **Kid-voice ASR accuracy** — SenseVoiceSmall mangles short kid utterances ("macarena" → "maarna"). Post-ASR corrections + phrase boost help but have hit their ceiling. whisper.cpp / faster-whisper swap planned (Phase 1 CPU-only ships immediately, Phase 2 GPU once dual RTX 3060s arrive). -- **Privacy-indicator LEDs not yet hardwired.** The camera streams DMA buffers permanently after init; mic + camera enable are software-controlled with no hardware-guaranteed indicator. **Hard prereq for face recognition / continuous vision; do not ship those features without it.** -- **Smart Mode regression** (fixed in v0.1 itself) — between `434988d` and the v0.1 fix, every voice "smart mode" trigger silently fell back to the default model. If you're forking from before the v0.1 tag, pull the fix. +No promises — just a few ideas I might get around to, roughly in order of interest: -## In progress +- Better wake word ("Hey Dotty" instead of the current "Hi, ESP") +- Story Mode improvements (longer narratives, character voices) +- Improve Security Mode -Actively being worked on or partially complete. 
**Big push 2026-04-25 evening:** ~26 commits scaffolding much of what was previously "Planned" — see [CHANGELOG.md](CHANGELOG.md) `[Unreleased]` for the full inventory. Most items below have code on `main` but are not yet deployed live or fully wired. - -- **Phase 4 firmware StateManager bench checks** -- the on-device six-state mutex (`idle / talk / story_time / security / sleep / dance`) and 12-pixel LED contract shipped to the active firmware fork (commit `d78118b`, 2026-04-27) and end-to-end-verified autonomously. Visual / interactive bench checks on the live device pending in [#38](https://github.com/BrettKinny/dotty-stackchan/issues/38) (Phase 4 foundation), [#39](https://github.com/BrettKinny/dotty-stackchan/issues/39) (Phase 5 sleep behaviour), [#40](https://github.com/BrettKinny/dotty-stackchan/issues/40) (Phase 6 security behaviour). The `firmware/firmware/` submodule pin in this repo lags the active fork; bump (or build from the active fork) to flash a Phase 4+ build. -- **CI pipeline** -- YAML lint, compose validation, config parse check, firmware dry-build, docs link check -- **Firmware release workflow** -- GitHub Actions building `.bin` artifacts on tag push -- **Quickstart improvements** -- linear "flash, clone, configure, talk" path assuming published firmware releases -- **First-audio latency reduction** -- two-tier path lands inner-loop turns under 1 s warm; further improvements queued (escalation parallelism, llama.cpp MTP PR #22673 for ~1.5-2× on think_hard) -- **ASR accuracy for children's speech** -- post-ASR corrections live; Whisper Phase 1 scaffold landed at v0.1; A/B verification pending -- **Face detection + tracking** -- shipped firmware-side; smoother+faster tuning queued (EMA 0.5, speed 500, deadband, MSR thr 0.40). Flash + bench-test pending -- **Layer 4 identity (description-based)** -- shipped + deployed. VLM (Gemini 2.0 Flash) returns a free-form description plus a roster name match against `household.yaml`'s `appearance:` field. 
No biometrics, no persistent identifiers. The earlier dlib biometric scaffold (`bridge/face_db.py` + `face_recognizer.py` + on-device `FaceRecognizer` + `ParentalGate` + 4 MCP tools) was removed — description-based covers the use case and biometrics conflicted with the no-storage identity posture -- **Layer 6 proactive greetings** -- `bridge/proactive_greeter.py` + lifespan wiring shipped. Cooldown + time-of-day windowing + kid-safe sandwich + calendar-aware prompt + template fallback. Depends on Layer 4 for named greetings; works today with `face_detected` (unknown identity) for generic -- **Layer 1 privacy-indicator LEDs** -- firmware scaffold drives mic/camera state via RAII peripheral guards. Camera `VIDIOC_STREAMOFF` wiring deferred (closes the always-streaming hole; queued) -- **Wake word "Hey Dotty"** -- interim shipped: firmware default switched Chinese → English "Hi, ESP". Custom "Hey Dotty" microWakeWord roadmap documented (`docs/wake-word.md`); needs sample collection + Colab training (~2 weeks calendar) -- **Purr-on-head-pet** -- server consumer shipped (`_perception_purr_player`); fires on `head_pet_started`. Asset path `bridge/assets/purr.opus` is a drop-in (asset itself not committed) -- **Dancing mode** -- shipped at v0.1; karaoke + LLM-initiated dance + Phase 2 vocal singing remain -- **Reproducible + signed firmware builds** -- SBOM + signed-releases scaffolds shipped. Maintainer GPG key + IDF Dockerfile SHA256 pin pending - -## Planned - -Designed but not yet started. Roughly in priority order. 
- -- **Improve Security Mode** -- expand beyond the current LED-flash + alert posture: configurable triggers, escalation rules, and richer notification surfaces -- **Improve Story Mode** -- longer-form narrative pacing, character voices, save/resume, and child-led branching -- **Easily configurable model profiles** -- first-class config surface for swapping the local / kid / smart models (and adding new ones) without hand-editing daemon `config.toml` files -- **Improve Kid Mode -- configurable age band** -- per-child age setting that tunes vocabulary, topic blocklist strictness, and response length; today Kid Mode is one-size-fits-all -- **Improve Dance Mode -- user song library** -- let users drop their own audio files into a song folder and have Dotty discover, list, and dance to them (current dance set is built-in only) -- **Speech bubble sync** -- tie on-screen text bubble visibility to actual audio playback state (deferred at v0.1 — Brett says timing looks fine in practice) -- **Singing mode** -- vocal synthesis or pitch-shifted TTS over backing tracks (Phase 2 of dance work) -- **Runtime OTA provisioning** -- captive-portal WiFi + OTA URL setup on first boot (no rebuild to retarget) -- **Layer 2.5 stereo mic + camera person tracking** -- sound-source localization + camera fusion for 360° awareness in idle mode -- **Phase 3 continuous vision classifier** -- EfficientDet/YOLOX at 1Hz on the Docker host GPU once dual RTX 3060s land -- **Sleep-mode "dream" memory compaction** -- while Dotty is in `sleep` state (idle, overnight), a background pass feeds the day's memory writes (perception events, conversation turns, declared facts, scene snapshots) to the smart model for compaction + summarisation. Two outputs: rewrite/prune the raw memory store (drop duplicates and low-signal perception spam, keep durable facts and notable events), and emit a separate human-readable daily summary that next-day turns can pull as "yesterday's context". 
Sleep-state-gated so the heavy LLM call never runs during interactive states. Pairs with the per-person memory and ambient scene memory work -- **Variant board port guide** -- walkthrough for adding support for other ESP32-S3 boards - -## Community wishlist - -Ideas we would welcome help with. None are blockers. - -- **ESP Web Tools web flasher** -- one-click browser flash via `esptool.js` on GitHub Pages -- **Voice catalog + install helper** -- curated Piper/EdgeTTS voices with a download script -- **Versioned docs via `mike`** -- `/latest/` + `/v1.0/` so older firmware users see matching docs -- **Observability hooks** -- Prometheus metrics on the bridge (latency, token counts, error rates) + starter Grafana dashboard -- **Variant board port guide** -- walkthrough for adding support for other ESP32-S3 boards -- **Face/emoji asset catalog** -- document the expression-id-to-emoji mapping; show how to add a new face -- **Firmware/server compatibility matrix** -- pin which server versions work with which firmware versions -- **`make audit` network verifier** -- user-runnable tool to confirm "local except LLM" claim against their own install -- **Reproducible + signed firmware builds** -- toolchain-pinned `.bin` with GPG-signed release artifacts +I'm not planning to build variant-board ports, firmware compatibility matrices, signed/reproducible builds, observability dashboards, or OTA provisioning flows myself. If any of those matter to you, PRs are welcome. diff --git a/docs/about.md b/docs/about.md index 73f7baa..e4ff53e 100644 --- a/docs/about.md +++ b/docs/about.md @@ -61,7 +61,7 @@ All audio processing (VAD, ASR) happens on your LAN. The LLM call is the only co - [SETUP.md](SETUP.md) — deployment guide. - [architecture.md](./architecture.md) — how the components fit together (diagrams + ops surfaces). -- [hardware-support.md](./hardware-support.md) — what hardware you need. +- [hardware.md](./hardware.md) — what hardware you need. 
- [faq.md](./faq.md) — common questions. Last verified: 2026-05-17. diff --git a/docs/advanced/variant-port-guide.md b/docs/advanced/variant-port-guide.md deleted file mode 100644 index 6b84cdd..0000000 --- a/docs/advanced/variant-port-guide.md +++ /dev/null @@ -1,201 +0,0 @@ ---- -title: Variant Port Guide -description: How to run Dotty's voice stack on an ESP32-S3 board other than the M5Stack CoreS3. ---- - -# Variant port guide - -Dotty's server stack (xiaozhi-server, bridge, dotty-pi, dotty-behaviour) is protocol-agnostic — it doesn't care which ESP32-S3 board is on the other end of the WebSocket. All the interesting porting work is in the firmware. - -This guide explains how to bring up the voice pipeline on a different ESP32-S3 board, and what hardware adaptation is needed to get the robot-body features (servos, LEDs, display) working. - -## TL;DR - -| Goal | Firmware path | Effort | -|---|---|---| -| Voice only (ASR / TTS / LLM) | `78/xiaozhi-esp32` with your board's config | Low — add board config + flash | -| Full robot body (servos, LEDs, avatar) | Port `m5stack/StackChan` to your board | Medium–high — display + servo + LED adaptation | - ---- - -## Server side: nothing to change - -xiaozhi-server, the bridge, dotty-pi, and dotty-behaviour all run on the Docker host — not on the device. They communicate over the Xiaozhi WebSocket protocol, which is board-agnostic. - -The only server-side value that varies per board is the OTA firmware filename, which you set in the device's `sdkconfig` before flashing. 
- ---- - -## Firmware path decision - -Two codebases speak the Xiaozhi protocol: - -| Firmware | Board support | Robot body | Use when | -|---|---|---|---| -| [`78/xiaozhi-esp32`](https://github.com/78/xiaozhi-esp32) | 70+ ESP32-S3 targets | No — generic voice assistant | You want voice quickly on a custom board, no servo/avatar | -| [`m5stack/StackChan`](https://github.com/m5stack/StackChan) | CoreS3 out of the box | Yes — servos, avatar, LEDs, MCP tools | You have a StackChan-like body and want full robot integration | - -Both firmwares are vendored in this repo under `firmware/` (a git submodule pointing to `BrettKinny/StackChan`). The StackChan firmware pulls in `78/xiaozhi-esp32` v2.2.4 at build time via `fetch_repos.py`. - ---- - -## Option A — voice pipeline on a new ESP32-S3 board - -Use `78/xiaozhi-esp32` directly. You get ASR / TTS / LLM but no servo or avatar control. - -### 1. Check if your board already has a config - -After running `fetch_repos.py`, the upstream firmware is cloned into `firmware/firmware/xiaozhi-esp32/`. Board configs live under `boards/`: - -```bash -ls firmware/firmware/xiaozhi-esp32/boards/ -``` - -If your board is listed (search by chipset, e.g. `esp32s3_*`), you can build directly: - -```bash -idf.py set-target esp32s3 -idf.py -D SDKCONFIG_DEFAULTS="boards//sdkconfig.defaults" build -``` - -### 2. Create a new board definition - -If your board isn't in the list, add one. Each board directory needs at minimum an `sdkconfig.defaults` that sets: - -- Flash size and PSRAM type (`CONFIG_ESPTOOLPY_FLASHSIZE`, `CONFIG_SPIRAM_*`) -- Audio codec I2S pins and clock rates -- Microphone channel configuration -- Display interface pins (if using the avatar renderer) - -Use a similar existing board as your starting point. `boards/m5stack_core_s3/sdkconfig.defaults` is the closest reference for any M5Stack product. 
- -``` -firmware/firmware/xiaozhi-esp32/boards/ - m5stack_core_s3/ - sdkconfig.defaults ← reference config - your_board_name/ - sdkconfig.defaults ← create this -``` - -### 3. Build and flash - -The abridged build + flash commands are below; the project's root `CLAUDE.md` has the full version with gotchas (CMake GLOB cache, `%lld` printf quirks, patch regeneration, `/dev/ttyACM0` reattach behaviour). - -```bash -cd firmware/firmware - -# Fetch upstream + apply patches, then build -docker run --rm -v "$PWD:/project" -w /project \ - espressif/idf:v5.5.4 bash -lc \ - 'git config --global --add safe.directory "*" && python fetch_repos.py && idf.py build' - -# Flash (adjust the port if needed) -docker run --rm -v "$PWD:/project" -w /project \ - --device=/dev/ttyACM0 espressif/idf:v5.5.4 \ - bash -lc 'idf.py -p /dev/ttyACM0 -b 921600 flash' -``` - -### 4. Verify the WebSocket connection - -Once flashed, the device should connect and negotiate the handshake. Check the server logs: - -```bash -docker logs xiaozhi-esp32-server | grep -E '(hello|tools/list|connected)' -``` - -A `tools/list` response confirms the device is advertising its MCP tools and the voice pipeline is ready. - ---- - -## Option B — porting m5stack/StackChan to a new board - -If you have servo hardware and want the full robot-body MCP tools, adapt the StackChan firmware. It targets the CoreS3 explicitly in several places. - -### Adaptation checklist - -**Display (avatar renderer)** - -The M5Stack Avatar library assumes an ILI9342C display at 320×240 over SPI. If your display uses a different controller: - -1. Update the `DisplayDevice` typedef and initialization in the display driver. -2. Adjust resolution constants if your panel differs from 320×240. -3. Test face animations independently before wiring the audio pipeline. - -**Audio codec (ASR input)** - -The CoreS3 uses the ES7210 codec for mic input via I2S. If your board uses a different codec: - -1. 
Find and update the codec init sequence in the board-specific audio driver. -2. Update I2S clock, sample rate, and codec register writes. -3. The Xiaozhi protocol expects 16 kHz mono input — resample in firmware if your codec runs at a different rate. - -**Servos** - -The StackChan kit uses feedback servos on a dedicated UART bus (yaw: continuous-rotation, model not specified by M5Stack; pitch: SCS0009 with a recommended 5°–85° travel window). If your board uses a different servo controller or different pins: - -1. Update pin definitions in the servo driver. -2. Update the physical angle limits (min/max) for your mechanism. -3. The spring-physics motion system (`motion.h`) is board-agnostic above the servo layer and does not need changing. - -**RGB LEDs** - -The kit has 12 NeoPixel-compatible LEDs. If your board has a different count or layout: - -1. Update `LED_COUNT` and the layout mapping in the LED driver. -2. LED color patterns are defined in `bridge.py` server-side — changing them is a config change, not a firmware change. - -**MCP tool registration** - -Each hardware peripheral exposed to the LLM is registered via `McpServer::AddTool`. If your board lacks a peripheral (e.g. no NFC), the tool still registers but returns an error when called. Guard missing hardware with a build-time config check: - -```cpp -#if CONFIG_YOUR_BOARD_HAS_NFC - McpServer::AddTool("self.nfc.read_tag", /* ... */); -#endif -``` - -### Patch workflow - -This repo carries changes to the upstream `78/xiaozhi-esp32` as a patch: - -``` -firmware/firmware/patches/xiaozhi-esp32.patch -``` - -After editing the local `xiaozhi-esp32/` working tree, regenerate: - -```bash -git -C firmware/firmware/xiaozhi-esp32 diff HEAD > firmware/firmware/patches/xiaozhi-esp32.patch -``` - -Verify the patch applies cleanly to a fresh v2.2.4 checkout before committing. - -Changes to `m5stack/StackChan`-specific code go directly into the submodule (tracked on the `dotty` branch of `BrettKinny/StackChan`). 
- ---- - -## Testing your port - -Once the device connects, run through: - -1. **WebSocket handshake** — `tools/list` in the server logs should list all advertised MCP tools. -2. **Voice round-trip** — speak a simple phrase and confirm ASR → LLM → TTS returns audio to the device. -3. **MCP tool call** — exercise an MCP tool by speaking an instruction - ("Turn your head to the right") and confirming the firmware acts on it. - The bridge no longer offers a text-injection endpoint for this: the old - `POST /api/message` route was retired in the #36 cutover (the `PiVoiceLLM` - voice path doesn't use it), and `bridge.py` is now dashboard-only at - `:8081` (`/ui`, `/admin/*`, `/health`, `/metrics`). -4. **LED feedback** — confirm the three-state pattern (listening / thinking / speaking) works on your LED hardware. - ---- - -## See also - -- [hardware-support.md](../hardware-support.md) — verified / build-only / out-of-scope tier matrix. -- [hardware.md](../hardware.md) — CoreS3 specs and MCP tool catalog. -- [protocols.md](../protocols.md) — Xiaozhi WebSocket protocol reference. -- [`78/xiaozhi-esp32 boards/`](https://github.com/78/xiaozhi-esp32/tree/main/boards) — upstream board definitions. -- [`m5stack/StackChan`](https://github.com/m5stack/StackChan) — the firmware we vendor and build from. - -Last verified: 2026-05-17. diff --git a/docs/faq.md b/docs/faq.md index f371aa6..479c0d2 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -11,7 +11,7 @@ description: Frequently asked questions about hardware, setup, and configuration The verified setup is an **M5Stack CoreS3** mounted in the **M5Stack StackChan servo kit** (2x feedback servos — yaw continuous-rotation + SCS0009 pitch — for pan/tilt, 12 RGB LEDs, 3D-printed chassis). You also need a Docker-capable host on your LAN (a spare PC or any Linux box with Docker) to run the voice and brain containers. -See [hardware-support.md](./hardware-support.md) for the full spec table and support tiers. 
+See [hardware.md](./hardware.md) for the full hardware spec. --- @@ -95,14 +95,14 @@ It depends on what you mean by "StackChan variant": - **Original Stack-chan by Shinya Ishikawa** (`meganetaaan/stack-chan`): That's a different firmware (TypeScript on Moddable SDK). The server-side infrastructure doesn't care what firmware the device runs as long as it speaks the Xiaozhi WebSocket protocol. But the original Stack-chan firmware doesn't speak Xiaozhi — so no, not without porting the firmware. - **Other ESP32-S3 boards running `78/xiaozhi-esp32`**: The server side will work (same WebSocket protocol), but you won't get StackChan-specific features (servos, avatar, LEDs) without board-specific firmware work. -See [hardware-support.md](./hardware-support.md) for the full support matrix. +See [hardware.md](./hardware.md) for hardware details. --- ## See also - [about.md](./about.md) — what the project is and who it's for. -- [hardware-support.md](./hardware-support.md) — hardware requirements and support tiers. +- [hardware.md](./hardware.md) — hardware requirements. - [troubleshooting.md](./troubleshooting.md) — symptom-based debugging guide. - [SETUP.md](SETUP.md) — deployment guide. diff --git a/docs/hardware-support.md b/docs/hardware-support.md deleted file mode 100644 index 2c21619..0000000 --- a/docs/hardware-support.md +++ /dev/null @@ -1,79 +0,0 @@ ---- -title: Hardware Support -description: Verified and untested hardware configurations for the voice stack. ---- - -# Hardware support - -## TL;DR - -- **One verified configuration**: M5Stack CoreS3 + StackChan servo kit. -- Other ESP32-S3 boards supported by the vendored xiaozhi-esp32 firmware will likely build and boot, but robot-body features (servos, avatar, LEDs) need board-specific adaptation. -- Non-S3 ESP32 boards and the older M5Stack Core2 are out of scope. - ---- - -## Support tiers - -### Verified - -The only hardware this stack has been tested end-to-end on. 
- -| Component | Detail | -|---|---| -| **Main board** | M5Stack CoreS3 | -| **SoC** | ESP32-S3, dual-core Xtensa LX7 @ 240 MHz | -| **Memory** | 8 MB PSRAM (Quad), 16 MB flash | -| **Display** | 2.0" IPS 320x240, capacitive touch (ILI9342C) | -| **Camera** | GC0308, 0.3 MP | -| **Microphone** | MSM261S4030H0R (dual-mic, via ES7210 codec) | -| **Speaker** | AW88298 amplifier, 16-bit I2S, 1 W | -| **Wi-Fi** | 2.4 GHz only (no 5 GHz) | -| **Body kit** | M5Stack StackChan servo kit | -| **Servos** | 2x feedback servos — yaw (X axis): 360° continuous rotation, model not specified by M5Stack; pitch (Y axis): **SCS0009**, 90° travel, M5Stack-recommended operating range 5°–85° | -| **Additional** | 12x WS2812C RGB LEDs, 3-zone touch panel (Si12T), NFC (ST25R3916), IR tx/rx (IRM56384), 550 mAh supplementary battery, PY32L020 IO expander, INA226 battery monitor, in-box handheld ESP-NOW remote controller (see [hardware.md](./hardware.md#what-the-stackchan-kit-adds-on-top)) | -| **Assembled dimensions** | 54.0 × 70.5 × 61.5 mm, 187.2 g | -| **Firmware** | Built from [`m5stack/StackChan`](https://github.com/m5stack/StackChan) (Arduino C++) | - -This is the configuration described throughout the rest of the docs. The servo kit provides the head-pan and head-tilt movement that makes StackChan look like a robot rather than a screen on a desk. - -**Servo note.** The StackChan pitch servo is documented as an SCS0009; the yaw servo's model isn't specified by M5Stack but is a feedback servo with continuous rotation. There is currently no firmware-side velocity or acceleration cap, which means head movements can be abrupt. This is a known limitation documented in [hardware.md](./hardware.md#safety-relevant-hardware-facts). - -For the full BOM and 3D-printed chassis STLs, see the upstream repo: [m5stack/StackChan](https://github.com/m5stack/StackChan). 
- -### Build-only (untested) - -The vendored [`78/xiaozhi-esp32`](https://github.com/78/xiaozhi-esp32) firmware (the upstream protocol reference, not the firmware we flash) supports 70+ ESP32-S3 target boards. Any ESP32-S3 board in that list should: - -- **Build** successfully from source. -- **Boot** and connect to xiaozhi-esp32-server over WebSocket. -- **Run ASR/TTS** through the voice pipeline (audio in, audio out). - -What will likely **not** work without board-specific adaptation: - -- Servo control (the StackChan firmware's servo code targets the kit's specific servo bus and feedback protocol). -- Avatar display (the M5Stack Avatar library assumes a 320x240 ILI9342C display and the CoreS3's touch controller). -- LED patterns (hardcoded to the kit's 12-LED layout). -- MCP tools that touch kit-specific peripherals (head yaw/pitch, LED color, NFC, IR). - -If you want to run this stack on a different ESP32-S3 board, you are signing up for firmware-level porting work. The server-side infrastructure (xiaozhi-esp32-server, bridge, dotty-pi, dotty-behaviour) doesn't care what board is on the other end of the WebSocket. - -### Out of scope - -These are explicitly not supported and are unlikely to work without significant effort: - -| Hardware | Why | -|---|---| -| **M5Stack Core2** | Older StackChan hardware. Different SoC (ESP32, not ESP32-S3), different display controller, different audio codec. The `m5stack/StackChan` firmware targets CoreS3 only. You would need to port the firmware or use the original `meganetaaan/stack-chan` Moddable JS firmware, which is a completely different codebase. | -| **ESP32 (non-S3)** | Insufficient PSRAM for the voice pipeline. The S3's 8 MB PSRAM is load-bearing for audio buffering. | -| **Non-ESP32 boards** | The firmware is Arduino C++ targeting the ESP-IDF toolchain. ARM, RISC-V, x86, etc. boards are a different universe. 
| - ---- - -## See also - -- [hardware.md](./hardware.md) — full CoreS3 specs, firmware lineage, on-device MCP tool catalog. -- [references.md](./references.md#hardware) — upstream hardware links. -- [m5stack/StackChan](https://github.com/m5stack/StackChan) — hardware BOM, chassis STLs, firmware source. - -Last verified: 2026-05-18. diff --git a/docs/modes.md b/docs/modes.md index 9545a0e..e570347 100644 --- a/docs/modes.md +++ b/docs/modes.md @@ -78,6 +78,23 @@ The two toggles are orthogonal — they compose freely. `kid_mode = on` AND `sma ## LED contract (12-pixel ring) +!!! warning "Two firmwares, two right-ring layouts" + The contract below describes the **active-fork Phase 4 StateManager** + (`BrettKinny/StackChan @ dotty`). The firmware **submodule pinned in this + repo** (`35f701a`) does **not** include StateManager — it ships the + **privacy-LED** layout instead, which claims two of the same right-ring + pixels for a different purpose: + + - **pixel 6** = microphone indicator (green when the mic is open; pulsing + when audio is streaming to the server) + - **pixel 11** = camera indicator (red when the camera is capturing) + + These are bound to the codec/camera hardware via RAII guards (see the + firmware's `main/stackchan/privacy/PRIVACY_LEDS.md`). So if you flash from + the submodule, **pixels 6 and 11 mean mic/camera — not face-state and + listening.** The face-state pip, toggle pips, and listening pip described + below arrive once StateManager lands in the submodule pin. 
+ ``` LEFT RING (global 0–5) RIGHT RING (global 6–11) ┌───────────────────┐ ┌────────────────────────────┐ diff --git a/docs/observability.md b/docs/observability.md index 0158c5e..829242a 100644 --- a/docs/observability.md +++ b/docs/observability.md @@ -5,15 +5,17 @@ description: Prometheus metrics and a starter Grafana dashboard for the bridge d # Observability -The `bridge.py` dashboard service exposes a Prometheus exposition endpoint at `/metrics` -covering first-audio latency, request rate / errors per endpoint, -perception events, calendar health, and Kid Mode state. -A starter Grafana dashboard lives at +The `bridge.py` dashboard service exposes a Prometheus exposition endpoint at `/metrics`, +and a starter Grafana dashboard lives at [`monitoring/grafana-dashboard.json`](https://github.com/BrettKinny/dotty-stackchan/blob/main/monitoring/grafana-dashboard.json). -These metrics are the **measurement prerequisite** for the -[first-audio latency reduction](ROADMAP.md) follow-up work. Numbers -come first; you can't tune what you can't see. +!!! note "Status: minimal" + This is a hobby project and observability isn't a focus. Today only two + metrics are actually recorded: **Kid Mode state** and **content-filter + hits**. The other metrics below are *defined* in `bridge/metrics.py` but + not yet wired into the request path, so most Grafana panels read 0. The + endpoint and dashboard are scaffolding to build on if you want them, not a + maintained monitoring setup. !!! warning "LAN-only — never expose `/metrics` to the internet" The bridge listener should live on your home LAN (or behind a @@ -71,32 +73,27 @@ failure rate, and a Kid Mode single-stat toggle. ## What each metric means +**Recorded today:** + | Metric | Type | What it tells you | | --- | --- | --- | -| `dotty_first_audio_latency_seconds` | Histogram | Bridge-side seconds from request received to first content chunk emitted. Tightly correlated with perceived robot responsiveness. 
| -| `dotty_request_duration_seconds{endpoint}` | Histogram | End-to-end duration per endpoint (`message`, `message_stream`, `vision_explain`, `calendar_today`, `perception_event`). | -| `dotty_request_errors_total{endpoint,kind}` | Counter | Errors partitioned by endpoint and `kind` (`timeout`, `binary_missing`, `exception`). | -| `dotty_llm_tokens_total{kind,model}` | Counter | LLM token volume; reserved for future per-call accounting. | -| `dotty_active_acp_sessions` | Gauge | Legacy metric from the retired ZeroClaw path — retained in the schema but always 0. | -| `dotty_calendar_fetch_failures_total{kind}` | Counter | Google Calendar fetch errors partitioned by `kind` (`timeout`, `parse`, `other`, `orchestrator`). The cache backs off automatically; sustained failures mean look at the bridge log. A spike of `timeout` reads as a network/quota issue; `parse` usually means the upstream `gws` CLI changed shape. | -| `dotty_smart_mode_invocations_total` | Counter | Smart-Mode requests (the `metadata.smart_mode` flag opted into the larger LLM). | | `dotty_kid_mode_active` | Gauge | `1` if Kid Mode guardrails are active, `0` otherwise. Flipped live by the portal admin endpoint. | -| `dotty_perception_events_total{type}` | Counter | Ambient-perception events ingested, partitioned by `face_detected` / `face_lost` / `sound_event`. | - -## Suggested alerts - -Start small — these are the four signals worth paging on for a -home-deployed robot: - -- **First-audio latency P95 > 3 s for 10 minutes.** - `histogram_quantile(0.95, sum by (le) (rate(dotty_first_audio_latency_seconds_bucket[5m]))) > 3` -- **Sustained error rate.** - `sum by (endpoint, kind) (rate(dotty_request_errors_total[5m])) > 0.05` -- **Calendar fetch flatlined failing.** - `sum(rate(dotty_calendar_fetch_failures_total[15m])) > 0.005` for 30 m. -- **Bridge target down.** - `up{job="dotty-bridge"} == 0` for 5 m. Catches the case where - Docker hasn't restarted the bridge container. 
+| `dotty_content_filter_hits_total` | Counter | Times the content filter blocked or rewrote model output. | + +**Defined in `bridge/metrics.py` but not yet wired into the request path** — +they exist so the endpoint schema is stable, but currently read 0: +`dotty_first_audio_latency_seconds`, `dotty_request_duration_seconds`, +`dotty_request_errors_total`, `dotty_llm_tokens_total`, +`dotty_calendar_fetch_failures_total`, `dotty_smart_mode_invocations_total`, +`dotty_perception_events_total`, and `dotty_active_acp_sessions` (a legacy +ZeroClaw metric, always 0). + +## Suggested alert + +Until the latency/error metrics are wired, only one signal is meaningful: + +- **Bridge target down.** `up{job="dotty-bridge"} == 0` for 5 m — catches the + case where Docker hasn't restarted the bridge container. ## Adding new metrics diff --git a/docs/ota-verification.md b/docs/ota-verification.md index a83384c..59f4f01 100644 --- a/docs/ota-verification.md +++ b/docs/ota-verification.md @@ -186,15 +186,12 @@ sequenceDiagram The firmware retries the OTA check up to 10 times with exponential backoff (starting at 10 s, doubling each retry) before giving up. -## How firmware updates are triggered - -1. **On boot** — the firmware always calls `CheckVersion()` during startup. If the response contains a `firmware` section with a newer version (or `force: 1`), the device downloads and flashes the binary, then reboots. - -2. **Via MCP tool** — the device exposes a user-only MCP tool that triggers a reboot. After reboot, the normal OTA check runs again. There is no "check for update now" command that skips the reboot. - -3. **Server-side** — place a firmware binary at a URL accessible to the device. Configure the xiaozhi-server to include the `firmware` section in its OTA response with the new version and URL. The next device boot (or reboot) will pick it up. - -The firmware update uses ESP-IDF's `esp_https_ota` API with A/B partitioning (`ota_0` / `ota_1`). 
If the new firmware fails to boot, the bootloader rolls back to the previous partition automatically (`CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y`). +> **Firmware updates over OTA are not set up in this project.** This deployment +> doesn't host firmware binaries, so the OTA response carries no `firmware` +> section — the endpoint is used purely for WebSocket discovery + clock sync. +> Flashing is done over USB-C (see the "Firmware iteration" section in +> `CLAUDE.md`). The schema above documents the protocol the firmware *can* +> parse, not a flow that's wired up here. ## Manual testing diff --git a/docs/quickstart.md b/docs/quickstart.md index 0cbce38..790b47a 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -13,7 +13,7 @@ alternative configurations. | Item | Notes | |------|-------| -| **M5Stack CoreS3 + StackChan servo kit** | The robot. See [hardware-support.md](hardware-support.md) for details. | +| **M5Stack CoreS3 + StackChan servo kit** | The robot. See [hardware.md](hardware.md) for details. | | **Linux or macOS host with Docker** | Runs all four server-side containers. Any distro works. **No GPU required** for the default stack — see [Server hardware](#server-hardware) below. | | **2.4 GHz WiFi** | The ESP32-S3 does not support 5 GHz. | diff --git a/docs/reproducible-builds.md b/docs/reproducible-builds.md index 821ade9..96cd3fe 100644 --- a/docs/reproducible-builds.md +++ b/docs/reproducible-builds.md @@ -77,8 +77,8 @@ The `firmware-release.yml` workflow fires on `fw-v*` tags and: the flat binaries, then generates `SHA256SUMS.txt` over all six 5. Attaches binaries + checksums to the GitHub Release -GPG signing of release artifacts is scaffolded (see `docs/signed-releases.md`) -and enabled once `GPG_PRIVATE_KEY` / `GPG_PASSPHRASE` repo secrets are set. +Release artifacts are not GPG-signed; verification relies on the published +`SHA256SUMS.txt` checksums above. 
## Known non-determinism risks diff --git a/docs/sbom.md b/docs/sbom.md index 38ba41d..c7d62f5 100644 --- a/docs/sbom.md +++ b/docs/sbom.md @@ -140,8 +140,6 @@ File an issue and check `LICENSE` + `COMPATIBILITY.md` before merging. the SBOM snapshots against. - [`SECURITY.md`](SECURITY.md) — threat model + how to report security issues that an SBOM scan might surface. -- [`docs/signed-releases.md`](signed-releases.md) — companion scaffold for - GPG-signing release artifacts. ## Follow-ups diff --git a/docs/signed-releases.md b/docs/signed-releases.md deleted file mode 100644 index 3a3abfa..0000000 --- a/docs/signed-releases.md +++ /dev/null @@ -1,153 +0,0 @@ ---- -title: Signed Releases -description: GPG-signing release artifacts for supply-chain trust — how to sign locally, how to verify, how to wire it into the firmware-release GitHub Actions workflow, and where the maintainer key fingerprint is published. ---- - -# Signed releases - -## Why GPG-sign release artifacts - -When the StackChan firmware (or a server-side `.tar.gz`) is downloaded from -GitHub Releases, the only thing standing between the user and a compromised -binary is whatever trust they place in the source. GPG signatures collapse -that trust into a single, verifiable cryptographic check: - -- **Tamper detection.** A signature mismatch immediately surfaces a binary - that has been altered after the maintainer signed it. -- **Provenance.** A valid signature against the published maintainer - fingerprint proves the artifact came from someone who controls that key. -- **Recoverable trust.** If the GitHub release infrastructure ever served the - wrong file (account compromise, mirror hijack), users with the maintainer - fingerprint locally cached can still detect it. - -This is **LEVEL-2 polish** — the goal is to make signing *possible* and -*documented* today, even if every release does not get signed. As soon as the -maintainer key exists, signatures become opt-in for users who care. 
- -## Maintainer key fingerprint - -The maintainer's GPG public-key fingerprint is published in [`KEYS.txt`](KEYS.txt) -at the repo root. This file is the single source of truth — the README links -to it rather than embedding the fingerprint, so the fingerprint never goes -stale across docs. - -> **Placeholder:** the current `KEYS.txt` ships with ``. -> The maintainer fills in the real fingerprint on first use. Until then, no -> Dotty release artifact has a verifiable signature. - -## Signing a release locally - -The minimal path. Requires a GPG keypair on the signing host. - -```bash -# Detached, ASCII-armoured signature alongside the binary. -gpg --detach-sign --armor stack-chan.bin -# → produces stack-chan.bin.asc -``` - -Sign the SHA256SUMS.txt as well — that lets a verifier check every artifact -in one go: - -```bash -gpg --detach-sign --armor SHA256SUMS.txt -# → produces SHA256SUMS.txt.asc -``` - -Both `.bin.asc` and `SHA256SUMS.txt.asc` get attached to the GitHub Release -alongside the binaries. - -## Verifying a release (user side) - -```bash -# 1. Fetch the maintainer's public key. -gpg --keyserver keys.openpgp.org --recv-keys - -# 2. Verify the signature against the artifact. -gpg --verify stack-chan.bin.asc stack-chan.bin - -# Expected output: -# gpg: Good signature from " " -# Primary key fingerprint: -``` - -If `gpg --verify` reports `BAD signature`, **do not flash the firmware** — -treat the artifact as compromised and report it to the maintainer. - -If `gpg --verify` reports `Can't check signature: No public key`, the local -keyring does not have the maintainer key yet — re-run the `--recv-keys` step. - -## GitHub Actions integration - -The signing step belongs inside `.github/workflows/firmware-release.yml`, -between "Generate SHA256 checksums" and "Create GitHub Release." 
- -```yaml -- name: Import GPG signing key - if: ${{ secrets.GPG_PRIVATE_KEY != '' }} - run: | - echo "${{ secrets.GPG_PRIVATE_KEY }}" | gpg --batch --import - echo "default-key ${GPG_KEY_ID}" >> ~/.gnupg/gpg.conf - env: - GPG_KEY_ID: ${{ secrets.GPG_KEY_ID }} - -- name: Sign artifacts - if: ${{ secrets.GPG_PRIVATE_KEY != '' }} - working-directory: firmware/firmware/build - run: | - for f in bootloader.bin partition-table.bin ota_data_initial.bin stack-chan.bin generated_assets.bin human_face_detect.espdl SHA256SUMS.txt; do - gpg --batch --yes --pinentry-mode loopback \ - --passphrase "${{ secrets.GPG_PASSPHRASE }}" \ - --detach-sign --armor "$f" - done -``` - -Add the `.asc` files to the `files:` block of the `softprops/action-gh-release` -step so they ship with the release. The `if:` guards mean the workflow keeps -working when secrets are not yet configured — the build proceeds, just -unsigned. - -### Required repository secrets - -| Secret | Source | -|--------------------|------------------------------------------------------------------------| -| `GPG_PRIVATE_KEY` | `gpg --armor --export-secret-keys ` on the signing host | -| `GPG_PASSPHRASE` | The passphrase used when generating the key | -| `GPG_KEY_ID` | The short or long key ID (`gpg --list-secret-keys --keyid-format LONG`) | - -Set them in **Settings → Secrets and variables → Actions** on the GitHub repo. - -## Publishing the public key - -Three places publish the same fingerprint, redundantly so a tampered copy in -one place is contradicted by the others: - -1. **`KEYS.txt`** at the repo root — primary source of truth. -2. **README.md** — one-line "Verifying releases" pointer at `KEYS.txt` - (fingerprint not duplicated; that goes stale). -3. **A public keyserver** — `keys.openpgp.org` is the recommended choice - (verified-uploads only, GDPR-compliant identity stripping). - -```bash -# Maintainer: publish your key once. 
-gpg --keyserver keys.openpgp.org --send-keys -``` - -## Cross-references - -- [`COMPATIBILITY.md`](COMPATIBILITY.md#release-process) — when a release - is cut, signing becomes part of the cutting process. -- [`docs/sbom.md`](sbom.md) — sister scaffold; signed SBOMs let a verifier - cross-check the signed binary against an audited dependency tree. -- [`SECURITY.md`](SECURITY.md) — threat model the signing scaffold - defends against. - -## Follow-ups - -- Generate the maintainer keypair and replace `` in - `KEYS.txt`. -- Configure repo secrets and uncomment the signing step in - `firmware-release.yml`. -- Sign at least one tagged release end-to-end to validate the user-side - verification flow. -- Consider [`sigstore`](https://www.sigstore.dev/) / `cosign` keyless signing - as a complementary path — no maintainer key to lose, OIDC-rooted trust. diff --git a/docs/speaker-id-investigation.md b/docs/speaker-id-investigation.md deleted file mode 100644 index ac268ab..0000000 --- a/docs/speaker-id-investigation.md +++ /dev/null @@ -1,101 +0,0 @@ ---- -title: Speaker-ID Investigation Log (Phase 1) -description: Timeboxed probe (2026-04-25) into whether xiaozhi-esp32-server exposes a speaker-ID hint the bridge could consume as Signal E in the SpeakerResolver. ---- - -# Speaker-ID — investigation log (Phase 1, 2026-04-25) - -This is the timeboxed half-day probe that came out of the SpeakerResolver -work. The question: **does xiaozhi-esp32-server already expose a -speaker-ID hint our bridge can consume as Signal E in the resolver, -without us having to ship Layer 4 face-recognition firmware?** - -## Answer - -**Available upstream — not wired in our deployment today.** Adding it -is a separate, well-bounded task (~1–2 weeks). Until then the resolver -runs on Signals A/B/C/D (self-ID, calendar, time-of-day, perception) -which is enough to deliver the family-companion feature. 
- -## What's there upstream - -`docs/latent-capabilities.md` flags **Voiceprint speaker ID** as a -*Voice-pipeline – unused* feature (line 45): - -> Distinguish family members; apply per-user persona/context — Medium -> priority — cross-refs child-safety (different guardrails for kids vs -> adults). - -The underlying support comes from xiaozhi-esp32-server's optional -voiceprint module; SenseVoice's emotion + AED outputs are already -similar latent capabilities the deployment doesn't expose. - -## What our patches expose today - -`custom-providers/xiaozhi-patches/websocket_server.py` (the patched -fork) wires: - -- `self._asr` (ASR module instance, line 60) -- `self._llm` (LLM provider — currently `PiVoiceLLM`) -- `self._memory` (initialised but disabled — `Memory: nomem` in - `.config.yaml`) - -There is **no voiceprint hook**. Searching for `speaker`, `voiceprint`, -`voice_id`, `speaker_id` across `xiaozhi-patches/` returns zero -matches (other than the unrelated "small speaker" persona text in the -config). - -`custom-providers/pi_voice/pi_voice.py` builds a metadata dict for the -pi RPC call. There is no slot for a speaker hint coming from xiaozhi. - -## What it would take to wire it - -To turn "voiceprint exists upstream" into "Signal E in the resolver": - -1. **Server-side enable** — add a `Voiceprint:` block to `.config.yaml` - pointing at a voiceprint provider (the xiaozhi-server fork ships - one; needs config + likely a model download). -2. **Enrollment ritual** — capture a few seconds of speech per - household member, run them through the voiceprint module, persist - the resulting embeddings in the server's voiceprint store. This - needs a portal flow (admin-only) and parental-PIN gating to match - the rest of our identity-data posture. -3. **WS metadata surface** — patch `websocket_server.py` to attach the - recognised speaker id (and confidence) to the LLM call metadata so - the bridge can read it. 
This is one extra field on the dict already - passed to the `LLMProvider`. -4. **Provider passthrough** — `custom-providers/pi_voice/pi_voice.py` - forwards xiaozhi metadata into the pi RPC request. - Add a passthrough for `speaker_id` / `speaker_confidence`. -5. **Resolver Signal E** — extend `bridge/speaker.py:_signal_perception` - (or a new `_signal_voiceprint`) to read `payload.metadata` and - produce a vote with the same shape as the existing perception - signal. Weight likely 0.6–0.8 — between `SIG_PERCEPTION` (face-rec) - and `SIG_STICKY` once we trust enrollments. - -## Why we're not doing it now - -- The resolver already works without it. Self-ID (Signal A) is the - canonical correction handle, and calendar + time-of-day cover the - routine 80% of weekday traffic. -- Voiceprint enrollment overlaps with face-rec enrollment ergonomically - — once Layer 4 ships, we'd want a single "enroll a family member" - ritual that captures both modalities together. Doing voiceprint first - means redoing the enrollment UX twice. -- The 1–2 week time would be better spent on Phase 2 (memory - persistence) which compounds the value of the identity work we just - shipped. - -## Outcome - -Resolver ships with Signals A/B/C/D + a clean extension point for -Signal E. The `_signal_perception` method in `bridge/speaker.py:418` -already pattern-matches on `name == "face_recognized"`; adding a -`name == "voiceprint_match"` branch when the time comes is a five-line -change. No bridge-side schema change required. 
- -## See also - -- `docs/latent-capabilities.md:45` — original capability flag -- `bridge/speaker.py` — the resolver this would plug into -- `tasks.md` — Layer 4 face-rec roadmap (parallel identity track) diff --git a/docs/vision-alignment-review-2026-05-29.md b/docs/vision-alignment-review-2026-05-29.md deleted file mode 100644 index 492cd69..0000000 --- a/docs/vision-alignment-review-2026-05-29.md +++ /dev/null @@ -1,288 +0,0 @@ ---- -title: Vision & Alignment Review (2026-05-29) -description: Whole-repo vision and alignment snapshot reconciling README, CLAUDE.md, docs, ROADMAP, CHANGELOG, all four service trees, firmware, scripts, and tests after the post-#36 / #115 drift. ---- - -# Dotty — Whole-Repo Vision & Alignment Review - -*Snapshot: 2026-05-29 · Branch `main` · Reconciles `README.md`, `CLAUDE.md`, `docs/*`, `ROADMAP.md`, `CHANGELOG.md`, all four service trees, firmware, scripts, and tests.* - ---- - -## 0. Decisions Locked (2026-05-29, Brett) - -The six vision ambiguities from §3 were resolved by the maintainer. These now **override** the open-question framing below — §3 is kept for the rationale, but the answers are authoritative: - -- **A. story_time + security backing paths → both still PENDING (Phase 7/8).** `SecurityCycle` exists in `dotty-behaviour` but is treated as *scaffolding, not a live path*. Docs must mark both story_time and security as unimplemented; the dashboard's security ring reads scaffolding (L10) and should say so. -- **B. Canonical consumer count → 11 classes, runtime config-gated.** State "11 consumer classes" with the running set noted as env-gated; enumerate by class name everywhere a count appears. -- **C. Kid-safety gap → content filter only.** The `build_turn_suffix(kid_mode)` sandwich *ships* on the live PiVoiceLLM path; the gap (#22) is specifically the absent blocked-words **content filter** (exists in no live code). Restate all kid-safety docs accordingly. -- **D. 
Cloud egress → vision/VLM path, swappable.** The only LAN egress today is the vision/VLM call via `dotty-behaviour` (default `openrouter.ai`, repointable to a local VLM). The text LLM is fully local; smart-mode LLM model-swap is v2/unwired. State the self-host invariant this way. -- **E. Tier1Slim → REMOVE ENTIRELY.** Delete the provider, its tests, and its docs. The vision's "two alternate fallback providers" framing collapses to: PiVoiceLLM (default) + OpenAICompat (alternate). Affects M4, M5, L5, and `docs/tier1slim.md`. -- **F. Dashboard self-update/restart/reboot actions → REMOVE.** Drop the `update_bridge`/`restart_bridge`/`reboot_all` handlers and their UI buttons (M6, M7); rely on `restart: unless-stopped` + `deploy-bridge-unraid.sh`. - ---- - -## 1. Canonical Vision (Ground Truth) - -### 1.1 One-sentence vision - -Dotty is a fully self-hosted, kid-safe-by-default voice assistant for the M5Stack **StackChan** desktop robot, where every seam (ASR, TTS, LLM, agent, firmware) is swappable and nothing leaves the LAN except an explicitly-routed, replaceable LLM/VLM call. - -### 1.2 Canonical architecture - -**Two hosts:** the **robot** (StackChan, ESP32-S3, LAN WiFi only) and a **single Docker host** (``) running all four server-side containers. - -| Service | Role / owns | Authoritative port | -|---|---|---| -| **xiaozhi-esp32-server** | Voice I/O pipeline: VAD → ASR (FunASR SenseVoiceSmall / WhisperLocal) → LLM provider → TTS (LocalPiper default; EdgeTTS alternate). Emotion dispatch, OTA, the `/xiaozhi/admin/*` live-session control surface, the perception event relay. | **8000** (WS), **8003** (OTA/HTTP) | -| **dotty-pi** | The brain. A `pi` coding agent + `dotty-pi-ext` extension; owns the agent loop and tool dispatch. Reached only via `docker exec -i dotty-pi pi --mode rpc` (JSONL over stdio). | (no host port; stdio exec) | -| **dotty-behaviour** | Perception event bus, ambient consumers, vision/audio **data** endpoints, proactive greeter, calendar context. 
Successor to the bridge's perception role. **Serves dashboard data; does not host the dashboard UI.** | **8090** | -| **bridge.py** | Admin **dashboard only** (FastAPI, `/ui`). Localhost-only `/admin/*` mutation routes. | **8081** | - -**Voice path (default `PiVoiceLLM`, selected via `selected_module.LLM` in `data/.config.yaml`):** -robot → (Opus/WS) → xiaozhi-server (VAD/ASR → text) → `PiVoiceLLM` → `docker exec` into **dotty-pi** → pi outer loop on `qwen3.5:4b` (`--thinking off`), `think_hard` escalates to `qwen3.6:27b-think` → TTS-bound text streams back → xiaozhi-server strips the leading emoji into an emotion frame, runs TTS → audio + face to robot. **PiVoiceLLM enforces the kid-safety sandwich** via `_wrap_with_sandwich()` → `build_turn_suffix(kid_mode)` (`pi_voice.py:94-132`). - -**Voice tools live in `dotty-pi-ext` (`src/index.ts:21-27`), and there are SEVEN registered:** `memory_lookup`, `remember`, `recall_person`, `remember_person`, `think_hard`, `take_photo`, `play_song`. (`recall_person`/`remember_person` were added in #53; the historical "five voice tools" framing predates them. There is no `set_led` tool — LED control is firmware-owned by design.) - -**Perception path:** firmware emits JSON `event` frames (`face_detected`, `face_lost`, `sound_event`, `state_changed`, `head_pet_started`/`head_pet_ended`, `chat_status`, `dance_started`/`dance_ended`). xiaozhi-server's `EventTextMessageHandler` (`custom-providers/xiaozhi-patches/textMessageHandlerRegistry.py`) POSTs each to **dotty-behaviour** `/api/perception/event`. dotty-behaviour runs **11 consumer classes** (`dotty-behaviour/consumers/`): `face_greeter`, `sound_turner`, `face_lost_aborter`, `wake_word_turner`, `face_identified_refresher`, `purr_player`, `scene_synthesis`, `idle_photographer`, `sleep_dreamer`, `dance_reflector`, `security_cycle` (count is env-gated at runtime; the *class* count is the canonical figure). 
The #115 series rewired the **dashboard** to pull perception/vision/audio cards from dotty-behaviour. - -**WS lifecycle invariant:** xiaozhi only opens the WS during a conversation. Idle perception producers must call `OpenAudioChannel()` first (done in firmware `Application::SendEvent`) or events silently drop. - -### 1.3 Known-good invariants (treat as spec) - -1. **Emoji-prefix requirement** — every LLM response starts with one of 9 mapped emojis (😊😆😢😮🤔😠😐😍😴). On the live PiVoiceLLM path there is **no code fallback**; the persona prompt + xiaozhi `prompt:` block are load-bearing. (The old `_ensure_emoji_prefix` fallback was ZeroClaw-only and is gone.) -2. **Six-state mutex, firmware-owned** — exactly one of `idle / talk / story_time / security / sleep / dance` (`state_manager.{h,cpp}`). -3. **Two orthogonal toggles** — `kid_mode`, `smart_mode`; dashboard/admin-only, sticky across turns/reboots. -4. **12-pixel LED contract** — left ring 0–5 state arc; right ring 6/8/9/11 indicators, 7/10 reserved; 5 Hz re-assert. `docs/modes.md` is authoritative. -5. **Self-host invariant** — only explicitly-routed cloud LLM/VLM/audio-caption calls (OpenRouter when smart_mode/vision active) and EdgeTTS-if-selected leave the LAN. -6. **`/admin/*` mutation endpoints are `127.0.0.1`-only** on bridge.py (`_admin_require_localhost`, `bridge.py:864-896`). -7. **The brain seam is a custom xiaozhi LLM provider** (`custom-providers/pi_voice/`), so the brain is swappable without touching xiaozhi. -8. **`:8080` is the llama-swap endpoint, never the dashboard** — the dashboard is `:8081` everywhere. - -### 1.4 Retired as of #36 (2026-05-19) - -ZeroClaw (Rust brain) + its FastAPI bridge on the RPi; the ACP protocol; the `ZeroClawLLM` provider (`custom-providers/zeroclaw/` removed); `docs/multi-daemon-split.md`, `docs/advanced/multi-host.md`; the `_ensure_emoji_prefix` bridge fallback; the `zeroclaw-bridge` systemd unit; the RPi host. 
References to these are **legitimate only** in `CHANGELOG.md` history and `docs/cutover-behaviour.md`. Anywhere else they are stale and should be flagged. - ---- - -## 2. Executive Summary - -**Overall alignment health: GOOD with a long tail of post-#36 doc/code rot.** The production architecture (four containers, PiVoiceLLM voice path, dotty-behaviour perception) is coherent and the code largely matches the vision. The defects are concentrated in two patterns: (1) **retired-ZeroClaw references** that survived the #36 cutover in live code, scripts, tests, and docs; (2) **the `:8080` vs `:8081` dashboard-port slip** propagated corpus-wide. There are **no critical (production-breaking) defects on the live voice or perception path** — the highest-impact issues are in operator tooling (`make doctor`/`status`/`audit`), cross-container state-file wiring, and an authoritative doc (`modes.md`) being stale. - -**Issue counts (post-verification, adjusted severities; duplicates merged):** - -| Severity | Count | -|---|---| -| Critical | 0 | -| High | 3 | -| Medium | 14 | -| Low | 22 | - -All findings below are **confirmed** against the working tree. - ---- - -## 3. Vision Ambiguities to Resolve (Brett decides) - -These are genuine product/architecture decisions the audit cannot make. Each is framed as a concrete choice. - -**A. Story_time & security backing-path liveness.** -`docs/modes.md:176-179` routes `story_time` through `bridge → direct OpenRouter` and `security` through a "bridge ambient task" — but #36 retired bridge.py's voice/perception roles, and `bridge/security_watch.py` is dead code (never started). **Decide:** (a) these moved to dotty-behaviour (then `modes.md` is stale and `security_cycle.py` is the live path), or (b) they are genuinely unimplemented (Phase 7/8 pending, #26). The security panel's empty-cycles behaviour and the `modes.md` source-of-truth table both hinge on this. - -**B. 
Canonical ambient-consumer count.** -Code registers **11 consumer classes** (`dotty-behaviour/consumers/`); docs variously say "9" (architecture.md, protocols.md, dotty-behaviour/README.md) or "six" (modes.md, ROADMAP.md), and no enumeration matches reality. **Decide:** adopt **11** as canonical (or state "up to N, config-gated" with the full class list), then propagate. The vision §1.2 above now uses 11. - -**C. Does "kid-safety not on the live path" still hold?** -The vision's stub note says PiVoiceLLM has "no equivalent enforcement layer," but `pi_voice.py:94-132` *does* apply the `build_turn_suffix(kid_mode)` sandwich. **Decide:** is the #22 gap the *content filter* (blocked-words regex, genuinely absent) and not the sandwich (present)? If so, restate the gap precisely so the stub status stops contradicting shipped code. - -**D. Does any OpenRouter egress occur on the live PiVoiceLLM path today?** -Invariant §1.3.5 names OpenRouter as cloud egress for smart_mode/vision, but smart-mode model-swap is unimplemented on the default path (v2 scope). **Decide:** confirm whether OpenRouter is reached at all on PiVoiceLLM today (e.g. only via vision-explain?), or only on the retired ZeroClaw/Tier1Slim paths — so the self-host invariant is stated accurately. - -**E. Tier1Slim escalation — revive or retire?** -Tier1Slim still POSTs to the dead `/api/voice/escalate` (`:8080`). **Decide:** mark it permanently chitchat-only (then fix docstring + tests to say so), or re-point escalation at a live endpoint. - -**F. In-app dashboard self-update — keep or drop?** -The `/ui` Update/Restart/Reboot-All actions assume the RPi/systemd model. **Decide:** drop them on the container deploy, or reimplement via `docker`/compose. - ---- - -## 4. Findings by Severity - -### HIGH - -#### H1. 
`make doctor`/`status`/`audit`/`setup` treat the retired ZeroClaw RPi bridge (`:8080`) as a live service — and probe the wrong host -- **Files:** `Makefile:122-124, 340, 352-362, 406-428, 527-534`; `scripts/dotty_doctor.py:4-5, 94-97, 174-183` -- **What's wrong:** `setup` prompts for `ZEROCLAW_HOST`/`ZEROCLAW_USER` (default `dietpi`); `doctor`/`status` extract a host via `grep -oP 'url: http://\K[0-9.]+'` and curl `http://$ZEROCLAW_HOST:8080/health` labelled "Bridge health"; `audit` SSHes `dietpi@$ZEROCLAW_HOST`. Post-#36 the only `url: http://` block in the rendered config is the **llama-swap** Tier1Slim endpoint — so the regex extracts the llama-swap IP and the "Bridge health" check actually probes llama-swap. No Makefile reference points at `:8081` or `:8090`. -- **Why it matters:** This is live operator tooling. `make doctor`/`status` report a misleading PASS/FAIL against an unrelated service; `make setup` forces users to enter a nonexistent host; `make audit` SSHes a powered-off RPi. The exact `:8080` conflation the vision warns about, now actively mis-probing. -- **Fix (code):** Drop the ZeroClaw prompts/extraction. Health-check `http://:8081/health` (bridge) and add `http://:8090/health` (dotty-behaviour). Apply the identical fix to `scripts/dotty_doctor.py` (the finding's "mirror the Makefile fix" wording is inaccurate — both files share the bug; neither is fixed yet). - -#### H2. Kid/smart-mode state-file contract is broken across the xiaozhi and bridge containers -- **Files:** `receiveAudioHandle.py:27-32`; `bridge/dashboard.py:1434`; `docker-compose.yml.template:30, 61`; `compose.all-in-one.yml`; `bridge/docker-compose.yml:51-52, 62-63` -- **What's wrong:** Inside the xiaozhi container, `receiveAudioHandle.py` *reads* kid/smart-mode from `/root/zeroclaw-bridge/state/{kid-mode,smart-mode}` (env-overridable via `DOTTY_KID_MODE_STATE`/`DOTTY_SMART_MODE_STATE`). 
The bridge container *writes* under `DOTTY_BRIDGE_DIR` (default `/root/zeroclaw-bridge`), and `bridge/docker-compose.yml` correctly redirects it to `/var/lib/dotty-bridge/state/…`. But the **xiaozhi compose sets neither env var nor a shared volume** — so the reader falls through to a dead RPi path that doesn't exist in the container, and reads its env/default (`kid_mode` → `true`, `smart_mode` → `false`) instead of the dashboard's persisted state. -- **Why it matters:** Producer and consumer point at disjoint, non-shared locations. The firmware kid/smart pips desync from the dashboard on every WS reconnect/restart — violating the "toggles sticky across reboots" invariant in practice. (Mitigating: the kid-mode fall-through default is `true`, i.e. toward safety; `_write_smart_mode_state` is dead on the xiaozhi side, so no orphaned-dir write actually occurs.) -- **Fix (code):** Pick one shared state location (named volume mounted into **both** containers), set `DOTTY_KID_MODE_STATE`/`DOTTY_SMART_MODE_STATE` in the xiaozhi compose and `DOTTY_BRIDGE_DIR` in the bridge compose to point at it, and stop defaulting to `/root/zeroclaw-bridge`. Add a `make doctor` check that the write path and read path resolve to the same mount. - -#### H3. Brain tool-count contract disagrees everywhere — code registers 7, all docs say 5, and one README invents a tool -- **Files:** `dotty-pi-ext/src/index.ts:21-27`; `dotty-pi/README.md:20-22`; `dotty-pi-ext/README.md:8, 14-27, 78-96`; `dotty-pi/docker-compose.yml:8`; `dotty-pi-ext/package.json:4, 20-22`; `CLAUDE.md` (architecture diagram + dotty-pi-ext bullet) -- **What's wrong:** `index.ts` registers **7** tools (`memory_lookup`, `recall_person`, `remember`, `remember_person`, `think_hard`, `play_song`, `take_photo`). Every doc says "five voice tools." 
Worse, `dotty-pi/README.md:20-22` lists a **non-existent `set_led`** as shipped (which the sibling `dotty-pi-ext/README.md:48-56` explicitly documents as *not* a tool, and which would violate the firmware-owned-LED invariant) and **omits the real `remember`**. `package.json` even has `test:recall`/`test:rememberperson` scripts (lines 20-22) for tools its own description omits. The "five" framing was inherited into this vision doc's §2 too. -- **Why it matters:** A maintainer reading the deploy doc would believe a tool exists that doesn't (and contradicts a stated invariant), and would miss two real tools and the live `remember`. -- **Fix (doc):** Remove `set_led` from `dotty-pi/README.md`, restore `remember`. Update `dotty-pi-ext/README.md` table + "5 of 5" status, `docker-compose.yml:8`, `package.json:4` description, the `(planned)` layout block (`README.md:78-96` still shows `set_led.ts` and stale `lib/` names), and `CLAUDE.md` to the canonical 7 (or "the original five + the #53 person pair"). Refresh the layout tree to the real `src/` files. - ---- - -### MEDIUM - -#### M1. Bridge dashboard documented at `:8080` across multiple active docs -- **Files:** `SETUP.md:23, 244, 245`; `docs/quickstart.md:210, 243`; `docs/observability.md:31, 48`; `COMPATIBILITY.md:62`; `docs/advanced/variant-port-guide.md:185`; `compose.all-in-one.yml:9`; `CLAUDE.md:29`; `CHANGELOG.md:13` -- **What's wrong:** Authoritative dashboard port is **8081** (`bridge.py:1` docstring + `:1040` PORT default, `bridge/Dockerfile:45/47`, `bridge/docker-compose.yml:43`, `deploy-bridge-unraid.sh`). These docs/configs still say `:8080` for the dashboard URL, health check, `/metrics` scrape, and (in `variant-port-guide.md:185`) POST to the *retired* `/api/message` endpoint. `quickstart.md:210` even contradicts its own line 209 (which correctly says 8081). `CLAUDE.md:29` prose says `:8080` while its own Ports table (`:63`) says 8081. 
`bridge/docker-compose.yml:24-25` documents the exact 8080→8081 correction that these were supposed to follow. -- **Why it matters:** Copy-pasted health/metrics/dashboard commands hit connection-refused; `variant-port-guide.md` doubly points at a dead endpoint. -- **Fix (doc):** Change every **dashboard** `:8080` → `:8081`. Leave `:8080` intact where it is the llama-swap endpoint (`llm-backends.md`, `tier1slim.md` `TIER1SLIM_LOCAL_URL`, `llama-swap-concurrent-models.md`, `CHANGELOG.md:39`). Correct `CHANGELOG.md:13`, `CLAUDE.md:29`, `compose.all-in-one.yml:9`. - -#### M2. `docs/modes.md` (the authoritative behaviour doc) routes voice/perception through retired bridge.py -- **Files:** `docs/modes.md:48, 176-177, 179, 193, 202`; `docs/architecture.md:206-216`; `bridge.py` -- **What's wrong:** `modes.md` "Sources of truth" cites bridge.py methods that no longer exist (`_apply_model_swap`, `_apply_tier1slim_runtime`, `_update_perception_state`, `_capture_room_view`, "all `_perception_*` consumers" — all 0 occurrences in the current 1042-line `bridge.py`), and routes `story_time`/`security`/`dance` through bridge.py. `architecture.md:206-216` labels dotty-behaviour consumers with the same stale `_perception_*` method names and a wrong "3 additional consumers" list. -- **Why it matters:** `modes.md` is the doc the whole system treats as canonical for states/toggles/LED/transitions — and it is the one most stale about where perception lives. -- **Fix (doc):** Re-attribute consumers/admin mutations to dotty-behaviour by **class name** (`FaceGreeter`, `SoundTurner`, …) and `perception/state.py`; resolve the story_time/security backing path per Ambiguity A. - -#### M3. 
Retired `_ensure_emoji_prefix`/sandwich machinery documented as the live enforcement mechanism (multi-doc) -- **Files:** `docs/emoji-mapping.md:32-46`; `docs/troubleshooting.md:33-34`; `docs/kid-mode.md:299-306`; `bridge.py` -- **What's wrong:** `emoji-mapping.md` presents `bridge.py::_ensure_emoji_prefix()` as the live fallback; `troubleshooting.md` tells users to verify `bridge.py::_build_sandwich_prompt` and "tail the bridge logs" to fix Chinese responses; `kid-mode.md` lists `_wrap_voice`, `_content_filter`, `_BLOCKED_WORDS_RE`, `ACPClient.prompt()`, `on_chunk` as current bridge.py code. All these symbols are **0 occurrences** in the current `bridge.py` — they were the ZeroClaw path. `protocols.md:157` and `voice-pipeline.md:120` already say the fallback is gone, so the corpus is self-inconsistent. -- **Why it matters:** Contradicts the load-bearing no-code-fallback invariant (§1.3.1) and misdirects troubleshooting at a service not in the voice path. `troubleshooting.md` and `emoji-mapping.md` are the worst because they imply a safety net that doesn't run. -- **Fix (doc):** Rewrite all three to point at the persona prompt + `.config.yaml prompt:` + `custom-providers/textUtils.py` (`build_turn_suffix`, `EMOJI_MAP`, `get_emotion`). State explicitly the PiVoiceLLM path has no code emoji fallback. Reconcile the `personas/dotty_voice.md` vs `personas/default.md` naming while editing. - -#### M4. Tier1Slim is live-selectable but its escalation/memory calls and docstring describe the retired ZeroClaw bridge at `:8080` -- **Files:** `custom-providers/tier1_slim/tier1_slim.py:9-18, 41, 356-396` -- **What's wrong:** Module docstring: "POSTs each tool call to bridge.py's `/api/voice/escalate` … which dispatches to ZeroClaw memory"; `BRIDGE_URL` defaults to `http://localhost:8080`; `_dispatch_tool`/`_post_remember`/`_post_memory_log` POST to dead endpoints. Endpoints were retired in #36; dashboard is `:8081`. 
-- **Why it matters:** Live, user-selectable code presenting a dead path as current, with a doubly-wrong default. Misleads anyone treating Tier1Slim as functional. -- **Fix (code/doc, pending Ambiguity E):** Add a docstring banner that escalation is non-functional post-#36 (chitchat-only rollback); correct `:8080` → `:8081` if kept; remove "dispatches to ZeroClaw memory." - -#### M5. `tests/test_tier1_slim.py` asserts the ZeroClaw escalation handshake as a live contract -- **Files:** `tests/test_tier1_slim.py:1-13, 279-310, 408-459` -- **What's wrong:** Docstring claims "tier1_slim.py is the live voice path"; `DispatchToolTests` asserts `/api/voice/escalate` is hit; `ResponseToolPathTests.test_escalation_chains_to_streaming_final` asserts escalation succeeds. PiVoiceLLM is the live path; Tier1Slim escalation is dead. -- **Why it matters:** A green test gives false confidence that a retired handshake works. -- **Fix (test):** Fix the "live voice path" docstring (it's PiVoiceLLM). `xfail`/mark `DispatchToolTests` + `ResponseToolPathTests` as covering a retired rollback path, or restrict to the chitchat fast-path. - -#### M6. Dashboard self-restart/update/reboot actions invoke the retired `zeroclaw-bridge` systemd unit -- **Files:** `bridge/dashboard.py:1741, 1766, 1893` -- **What's wrong:** `update_bridge`, `restart_bridge`, `reboot_all` all spawn `systemctl restart zeroclaw-bridge`. The container has no systemd and no such unit (`CMD ["python","bridge.py"]` under tini). Because the commands are `sh -c`-wrapped, `Popen` succeeds and the handlers report success while the inner call silently fails. -- **Why it matters:** Restart/Update/Reboot-All buttons no-op while reporting success. Admin-only, behind auth, not on the production path. -- **Fix (code, pending Ambiguity F):** Exit the process and rely on `restart: unless-stopped`, or remove these self-mutation actions, or gate behind a config flag. - -#### M7. 
Dashboard self-update git-pulls into the retired RPi install dir `/root/zeroclaw-bridge` -- **Files:** `bridge/dashboard.py:1434, 1705-1715` -- **What's wrong:** `BRIDGE_INSTALL_DIR` defaults to `/root/zeroclaw-bridge`; `_pull_and_install_bridge` copies `bridge.py` + `bridge/` over it and expects a service restart. Under the container deploy the real source is the immutable `/app` image layer; copying into `/root/zeroclaw-bridge` does nothing and the image is never rebuilt. Live deploy is `deploy-bridge-unraid.sh` (build + `compose up -d`). -- **Why it matters:** Exposed-but-dead `/ui/actions/update-bridge` reports success while doing nothing. -- **Fix (code):** Remove/disable the in-app self-update on the container deploy; keep a status-only "update available" chip if wanted (drop the install+restart half). - -#### M8. `dotty-pi-ext` README + `turn_logger.ts` describe the #36 cutover as still pending -- **Files:** `dotty-pi-ext/README.md:8-12`; `dotty-pi-ext/src/lib/turn_logger.ts:5-7` -- **What's wrong:** README: "Bridge.py is still the source of truth in production until the … cutover (#36)…"; `turn_logger.ts`: "Pre-cutover this code is dormant … Post-cutover, when xiaozhi flips to PiVoiceLLM, this is the last write path…". #36 executed 2026-05-19; PiVoiceLLM is the live default and this turn-logger is the *active* write path. -- **Why it matters:** Labels the live write path "dormant," misleading a maintainer of the PiVoiceLLM path. -- **Fix (doc):** Update both to post-#36 reality. - -#### M9. `dotty-behaviour` README + app description claim it hosts the admin dashboard -- **Files:** `dotty-behaviour/README.md:3-6, 83`; `dotty-behaviour/main.py:347-351` -- **What's wrong:** README opener and the FastAPI description say dotty-behaviour "hosts the admin dashboard." It mounts only health/perception/vision/audio/voice/calendar/scene_synthesis routers — no dashboard router/templates/static. 
Its own in-code comments correctly say "the bridge dashboard" consumes its data. README.md:83 also lists a future "Dashboard (ported from bridge/dashboard.py)" slice that contradicts the #115 decision (dashboard stays in bridge.py, pulls data from dotty-behaviour). -- **Why it matters:** Two services claim the same role; bridge.py-is-dashboard-only is a vision invariant. -- **Fix (doc):** Reword to "serves perception/vision/audio data endpoints consumed by the bridge dashboard"; drop/mark-abandoned the Dashboard slice row. - -#### M10. `dotty-pi/README.md` lists `set_led` and omits `remember` -*(Merged into H3.)* - -#### M11. `docs/brain.md` states smart-mode model-swap as live on the default path -- **Files:** `docs/brain.md:13` -- **What's wrong:** Unqualified TL;DR bullet "Smart-mode flips the inner-loop model to a cloud model," under a section naming PiVoiceLLM the default. `modes.md:155, 181` correctly caveat this as v2 scope for PiVoiceLLM (instantaneous only with `DOTTY_VOICE_PROVIDER=tier1slim`). brain.md is internally inconsistent (its own model matrix attributes the swap to Tier1Slim rows). -- **Why it matters:** Propagates a wrong "live" claim for a v2/unimplemented feature. -- **Fix (doc):** Add the modes.md v2-scope caveat to the bullet. - -#### M12. `docs/voice-mode-entry.md` attributes face-greeter to bridge.py and references a removed Discord daemon -- **Files:** `docs/voice-mode-entry.md:24, 26, 42`; `bridge.py` -- **What's wrong:** "the relay forwards that event to the bridge, where `_perception_face_greeter` (in `bridge.py`)…" (0 occurrences in bridge.py); "see the env block in `bridge.py` for the full set of knobs" (knobs `FACE_GREET_TEXT`/`FACE_GREET_MIN_INTERVAL_SEC` now live in `dotty-behaviour/config.py:277-281`); "the Discord daemon and the portal 'Greet' button" (the Discord *daemon* was removed — `bridge/templates/discord.html` is deleted per git status; the surviving `community/discord/provision.py` is an unrelated provisioning bot). 
-- **Why it matters:** Misdirects anyone configuring the face-greeting path to the wrong service and file. -- **Fix (doc):** Repoint to `dotty-behaviour/consumers/face_greeter.py` + `config.py` knobs; remove the Discord-daemon reference. - -#### M13. `mkdocs.yml` nav omits `modes.md` (the authoritative doc) and 7 others -- **Files:** `mkdocs.yml:39-80`; orphans: `docs/modes.md`, `docs/quickstart.md`, `docs/tier1slim.md`, `docs/wake-word.md`, `docs/proactive-greetings.md`, `docs/cutover-behaviour.md`, `docs/speaker-id-investigation.md`, `docs/cookbook/llama-swap-concurrent-models.md` -- **What's wrong:** Eight docs are present but absent from the nav (and there's no `not_in_nav`/exclude mechanism — MkDocs Material warns per orphan). `modes.md` and `quickstart.md` are cross-linked from in-nav `docs/README.md` yet unreachable via site nav. -- **Why it matters:** The doc the vision designates authoritative isn't in the published site. -- **Fix (doc):** Add `modes.md`, `tier1slim.md`, `wake-word.md`, `quickstart.md`, `proactive-greetings.md`, and the llama-swap cookbook page to nav. `cutover-behaviour.md` and `speaker-id-investigation.md` may stay out as historical/investigation notes (document why). - -#### M14. `bridge.py` `/admin/*` localhost-only gate has no test coverage -- **Files:** `bridge.py:864-896`; `tests/test_dashboard_csrf.py`; `tests/test_bridge_routes.py` -- **What's wrong:** `_admin_require_localhost` (403 for non-loopback) is a router-level dependency on every `/admin/*` route — invariant §1.3.6 — but no test asserts a spoofed non-loopback `client.host` gets 403, nor that loopback passes. -- **Why it matters:** A security invariant guarded only by code, with zero regression test. -- **Fix (test):** Add a test POSTing to an `/admin/*` route with a spoofed non-loopback `request.client.host` (assert 403) plus a loopback case (assert pass). 
- ---- - -### LOW - -| # | Title | Files | Fix | -|---|---|---|---| -| L1 | `pi_voice` README "What's not yet done" lists the sandwich as unimplemented, but it ships (`_wrap_with_sandwich`) and the provider is the live default | `custom-providers/pi_voice/README.md:21-28, 148-164` | Delete the stale "not yet done"/"open questions" sections; keep only the genuinely-open memory write-back / persona items | -| L2 | `pi_voice` README diagram cites `qwen3.6:27b`/`--thinking minimal`/`--extensions`; actual flags are `qwen3.5:4b`/`--thinking off`, no `--extensions` | `README.md:40-43`; `pi_client.py:49-60` | Fix the diagram to `qwen3.5:4b`/`--thinking off`, drop `--extensions` | -| L3 | `textUtils.py` header lists removed `zeroclaw.py` as an active importer | `custom-providers/textUtils.py:10-21` | Drop the zeroclaw bullet; list `pi_voice`, `tier1_slim`, `openai_compat` | -| L4 | `openai_compat._ensure_emoji_prefix` is dead code (never called; live logic is inline) | `custom-providers/openai_compat/openai_compat.py:51-58` | Remove or wire it in | -| L5 | `kid_mode` is a process-start snapshot in tier1_slim/openai_compat with no doc warning; `set_runtime` doesn't re-derive the suffix | `tier1_slim.py:38-39, 193`; `openai_compat.py:27-28`; `pi_voice.py:111-114` | Mirror the pi_voice comment into both; note `set_runtime` doesn't hot-swap kid_mode | -| L6 | Perception relay docstring names the retired `zeroclaw-bridge` as relay target | `xiaozhi-patches/textMessageHandlerRegistry.py:1-8` | Reword to dotty-behaviour `/api/perception/event` | -| L7 | `ota_handler` comment cites `vision_explain` pointing at the ZeroClaw host | `xiaozhi-patches/ota_handler.py:322-327` | Reword to dotty-behaviour `:8090` | -| L8 | `CLAUDE.md:75` misdescribes the xiaozhi-admin route set (`kid-mode`/`smart-mode`/`tts` routes don't exist) and patch-file inventory | `CLAUDE.md:75`; `http_server.py:680-735` | Match the implemented routes (per `docs/architecture.md:184-194`); list `ota_handler.py` + 
`textMessageHandlerRegistry.py` | -| L9 | `SimpleHttpServer._get_websocket_url` is dead code (and logic-drifted from the live `OTAHandler` twin) | `http_server.py:33-40` | Remove | -| L10 | `bridge/security_watch.py` consumer loop is dead (ported to dotty-behaviour); dashboard reads its permanently-empty ring | `bridge/security_watch.py:108`; `bridge/dashboard.py:2478`; `dotty-behaviour/consumers/security_cycle.py:16` | Route cycles through a dotty-behaviour getter, or drop the dead read | -| L11 | `bridge.py:319` cites non-existent `bridge/MEMORY-INDEX.md` / `brain-db-fts-only.md` | `bridge.py:319` | Drop the dangling pointer | -| L12 | Consumer count wrong in `dotty-behaviour` README/main.py (says 9, code has 11; README slice-table drops `sound_turner`, `face_identified_refresher`) | `dotty-behaviour/README.md:4, 80`; `main.py:349`; `consumers/__init__.py:31` | Adopt 11; fix the slice table *(see Ambiguity B)* | -| L13 | `architecture.md` consumer table uses stale bridge.py `_perception_*` names + wrong "3 additional" list | `docs/architecture.md:206-216` | Rename to dotty-behaviour classes; list real extras *(merge with M2)* | -| L14 | `household/registry.py` docstring default `~/.zeroclaw/household.yaml` + "running ZeroClaw host"; real default is `STATE_DIR/household.yaml` | `dotty-behaviour/household/registry.py:3, 5, 19`; `config.py:298-300` | Fix docstring to the dotty-behaviour state-dir path | -| L15 | `perception.py` ingress docstring omits `head_pet_ended` (firmware emits it) | `dotty-behaviour/routes/perception.py:53-56`; `firmware/.../head_pet.cpp:78` | Add `head_pet_ended` | -| L16 | Tool count 5-vs-7 in dotty-pi-ext README/compose/package.json | `dotty-pi-ext/README.md:8, 14-27`; `docker-compose.yml:8`; `package.json:4` | *(merge with H3)* | -| L17 | dotty-pi-ext README "(planned) layout" shows `set_led.ts`, omits person-tool/lib files | `dotty-pi-ext/README.md:78-96` | Refresh tree or drop "(planned)" *(merge with H3)* | -| L18 | `docs/kid-mode.md` 
"Where the Code Lives" table lists retired ZeroClaw symbols | `docs/kid-mode.md:299-306` | Point at `build_turn_suffix`/textUtils (Tier1Slim path); note PiVoiceLLM layers 1+2 *(relates to M3)* | -| L19 | `compose.all-in-one.yml:9` labels dashboard "port 8080" | `compose.all-in-one.yml:9` | `8080` → `8081` *(merge with M1)* | -| L20 | `deploy-bridge-unraid.sh` header references `deploy-bridge.sh`/`install-bridge.sh` which no longer exist | `scripts/deploy-bridge-unraid.sh:9-14` | Update/drop the comment | -| L21 | `compose.local.override.yml` attaches ollama to undefined `dotty` network — offline-stack `up` fails at parse | `compose.local.override.yml:29-30`; `compose.all-in-one.yml` | Add top-level `networks: { dotty: {} }` or drop the block | -| L22 | `style.md` teaches `../protocols.md` for a sibling doc (broken link) | `docs/style.md:72` | Change to `./protocols.md` | -| L23 | `textUtils` emoji-parse machinery (`get_emotion`/`EMOJI_MAP`/`check_emoji`) has no test | `custom-providers/textUtils.py:64-176` | Add `test_textutils.py` pinning `EMOJI_MAP` vs the 9-emoji `ALLOWED_EMOJIS` set + basic parsing | -| L24 | `test_security_watch.py` tests the superseded bridge security loop | `tests/test_security_watch.py:163-204` | Drop `StateGatedConsumerTests`/`RunCaptureCycleTests` (live coverage is `dotty-behaviour/tests/test_consumer_security_cycle.py`); keep `get_recent_cycles` tests | -| L25 | `test_dashboard_csrf.py` probes `/api/perception/event` (removed from bridge.py) with a stale comment | `tests/test_dashboard_csrf.py:161-170`; `bridge.py:504` | Repoint at a live `/api/*`/`/metrics` route; fix comment | -| L26 | `ZEROCLAW_BIN` env stub in CSRF test bootstrap (ACP spawn path removed) | `tests/test_dashboard_csrf.py:34` | Remove the `setdefault` | -| L27 | `.env.example` carries a stale `zeroclaw-bridge` section: `ZEROCLAW_BIN`, `PORT=8080`, ACP `ZEROCLAW_*`, `~/.zeroclaw/` paths | `.env.example:16-37, 98, 122, 169, 193-194` | Remove ZeroClaw/ACP vars; 
`PORT=8080`→`8081`; repoint `~/.zeroclaw/` to dotty-behaviour/dotty-bridge state dirs | - ---- - -## 5. Cross-Cutting Inconsistencies - -Three patterns recur across components and account for most findings: - -**(a) Retired-ZeroClaw references in live surfaces.** The RPi bridge / `zeroclaw-bridge` systemd unit / `/root/zeroclaw-bridge` path / `/api/voice/escalate` endpoint / ACP env vars survive in: `make doctor/status/audit/setup` (H1), `scripts/dotty_doctor.py` (H1), the cross-container state files (H2), dashboard self-update/restart actions (M6, M7), Tier1Slim (M4) + its tests (M5), `.env.example` (L27), and ~10 docstrings/comments (L3, L6, L7, L14) and docs (M2, M3, M12, L18). The **`dotty-deploy-bridge` skill is itself stale** — it still pushes to `dietpi@:/root/zeroclaw-bridge/` and restarts `zeroclaw-bridge`; the live deploy is `scripts/deploy-bridge-unraid.sh`. **Retire or rewrite that skill.** - -**(b) The `:8080` dashboard-port slip.** `:8080` is correct *only* for llama-swap; it is wrong for the dashboard (canonically `:8081`) in `CHANGELOG.md:13`, `CLAUDE.md:29` prose, `compose.all-in-one.yml:9`, `SETUP.md`, `quickstart.md`, `observability.md`, `COMPATIBILITY.md`, `variant-port-guide.md`, and `.env.example` (M1, L19, L27). A single sweep should fix all dashboard mentions while preserving llama-swap `:8080`. - -**(c) Stale perception ownership + consumer counts.** Post-#36 perception lives in `dotty-behaviour/consumers/` (11 classes), but `modes.md`, `architecture.md`, `protocols.md`, `ROADMAP.md`, and `dotty-behaviour/README.md` variously say "six"/"9", use bridge.py `_perception_*` method names, drop real consumers, or (M9) claim dotty-behaviour hosts the dashboard. Pick the canonical count once (Ambiguity B) and propagate by class name. - ---- - -## 6. Recommended Remediation Order - -**Phase 0 — Brett's decisions (unblocks the rest).** Resolve Ambiguities A–F (§3). 
A, B, C, E directly gate the wording of M2, L12/L13, M4/M5, and the vision's own stub/invariant text. - -**Phase 1 — Code fixes that affect real behaviour (highest impact first):** -1. **H2** — wire a shared kid/smart state volume into both compose files. *(Restores toggle persistence — a stated invariant.)* -2. **H1** — fix `make doctor/status/audit/setup` + `dotty_doctor.py` to probe `:8081`/`:8090`, drop ZeroClaw host. *(Operators run these constantly.)* -3. **M6 + M7** — fix/remove the dashboard self-restart/update actions (pending Ambiguity F). -4. **L21** — add the `dotty` network so the offline-stack `up` parses. - -**Phase 2 — Code/test hygiene (no behaviour change, prevents regressions):** -5. **M14** — add the localhost-gate test (security invariant guard). -6. **M5, L24, L25, L26** — fix/mark the stale tests so green ≠ false confidence. -7. **M4** — Tier1Slim docstring/default-URL banner (pending Ambiguity E). -8. **L4, L9, L10, L11** — remove dead code / dangling pointers. - -**Phase 3 — Documentation sweep (largest count, lowest risk; batch the cross-cutting patterns):** -9. **Port sweep (M1, L19, L27, the doc half of the cross-cutting `:8080`):** one pass, dashboard `:8080`→`:8081`, leave llama-swap intact. -10. **ZeroClaw-reference sweep (M3, M8, M12, L3, L6, L7, L14, L18 + the `dotty-deploy-bridge` skill):** repoint to PiVoiceLLM/dotty-behaviour/textUtils; retire the skill. -11. **Authoritative-doc fixes (M2, L13):** update `modes.md`/`architecture.md` perception ownership + backing paths (depends on Ambiguity A). -12. **Brain-tool + consumer-count reconciliation (H3, L12, L16, L17, M9, M11):** propagate the canonical 7 tools and 11 consumers (depends on Ambiguity B); fix `dotty-behaviour` role wording. -13. **Smaller doc fixes (L1, L2, L5, L8, L15, L20, L22, M13):** README/diagram/nav/style cleanups. -14. **L23** — add the `textUtils` emoji-map test last (cheap spec guard). 
-
-**Doc-only vs code:** Of the 39 merged findings, **~26 are doc/comment-only** (Phase 3 + L8/L15/L22 etc.), **~9 are code/config** (H1, H2, M4, M6, M7, L4, L9, L21, plus the H3 doc-but-touches-manifests), and **~4 are test** (M5, M14, L24–L26). The doc fixes are low-risk and can be batched by the two cross-cutting sweeps; prioritize the Phase 1 code fixes because they affect live behaviour and operator trust.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index ac43f28..6ecd2e4 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -41,9 +41,7 @@ nav:
 - Getting Started: SETUP.md
 - Quickstart: quickstart.md
 - Architecture: architecture.md
- - Hardware:
- - Hardware: hardware.md
- - Hardware Support: hardware-support.md
+ - Hardware: hardware.md
 - Voice Pipeline: voice-pipeline.md
 - Voice Mode Entry: voice-mode-entry.md
 - Voice Catalog: voice-catalog.md
@@ -63,8 +61,6 @@ nav:
 - Add Emoji: cookbook/add-emoji.md
 - Disable Kid Mode: cookbook/disable-kid-mode.md
 - Concurrent Models (llama-swap): cookbook/llama-swap-concurrent-models.md
- - Advanced:
- - Variant port guide: advanced/variant-port-guide.md
 - Reference:
 - Emoji Mapping: emoji-mapping.md
 - Interaction Map: interaction-map.md
@@ -79,7 +75,6 @@ nav:
 - Security: SECURITY.md
 - Compatibility: COMPATIBILITY.md
 - SBOM: sbom.md
- - Signed Releases: signed-releases.md
 - Reproducible Builds: reproducible-builds.md
 - Versioning: versioning.md
 - Changelog: CHANGELOG.md
From b97cccbe392cee95d10055e9ac6cb4b24f0b7ce9 Mon Sep 17 00:00:00 2001
From: Brett Kinny
Date: Sat, 6 Jun 2026 20:59:22 +1000
Subject: [PATCH 2/4] =?UTF-8?q?docs(readme):=20finalize=20=E2=80=94=20fix?=
 =?UTF-8?q?=20stale=20kid-ASR/whisper=20claim=20+=20reconcile=20LED=20layo?=
 =?UTF-8?q?ut=20with=20shipped=20privacy=20LEDs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e591ee8..b8f1f8f 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
 
 > ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome. 🍺☕ If you do try a fresh end-to-end deploy, please get in touch — I'll buy you a beer or a coffee.
 >
-> **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR accuracy on SenseVoice has a kid-speech gap that whisper.cpp will close in a follow-up.
+> **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR on the SenseVoice CPU default still garbles some short utterances (WhisperLocal, auto-selected on GPU hosts, handles high-pitched kid speech better).
 
 Dotty is a fully self-hosted voice stack for the M5Stack StackChan desktop robot. Open-source firmware on the robot, [xiaozhi-esp32-server](https://github.com/xinnan-tech/xiaozhi-esp32-server) for voice I/O, and a local **pi** coding agent as the brain. ASR, TTS, and session state all run on your own hardware. The LLM is pluggable — the shipped default runs a small fast model for plain conversation and escalates hard questions to a more capable model, with [llama-swap](./docs/cookbook/llama-swap-concurrent-models.md) as the recommended local backend. Swap in [Ollama](./docs/cookbook/run-fully-local.md) for the simpler single-binary option, or point at OpenRouter / any OpenAI-compatible API if you'd rather use the cloud.
 
@@ -53,6 +53,8 @@ The 12-pixel LED ring shows the current state at a glance. **Left ring 0-5 is th
 
 On the right ring, **indices 8-9 are toggle pips** for kid_mode (salmon pink) and smart_mode (orange), and **index 11 (bottom) lights red while you have the turn** (LISTENING). The `idle → talk` transition fires on `face_detected` from the firmware; VLM identity recognition runs in parallel and feeds the LLM context.
 
+> Heads up: that right-ring layout is the active-fork StateManager. On the firmware **submodule pinned in this repo**, pixels 6 and 11 instead drive the **privacy LEDs** — 6 = mic (green), 11 = camera (red) — and the StateManager pips arrive once the submodule catches up to the active fork.
+
 Full state taxonomy, colour palette, transition diagram, and per-state backing architecture: [`docs/modes.md`](./docs/modes.md).
 
 ## Web dashboard (locally hosted)
From c88695334cd7f8cad09643408beb8e1cb6dd8691 Mon Sep 17 00:00:00 2001
From: Brett Kinny
Date: Sat, 6 Jun 2026 21:02:13 +1000
Subject: [PATCH 3/4] docs(readme): drop the beer/coffee offer line

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b8f1f8f..c47a0af 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 
 **Your self-hosted [StackChan](https://github.com/m5stack/StackChan) robot assistant — kid-minded by default, hackable by design, private by architecture.**
 
-> ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome. 🍺☕ If you do try a fresh end-to-end deploy, please get in touch — I'll buy you a beer or a coffee.
+> ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome.
 >
 > **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR on the SenseVoice CPU default still garbles some short utterances (WhisperLocal, auto-selected on GPU hosts, handles high-pitched kid speech better).
 
From d6a15c57746a4b419b3510c1cdee603aa91216b9 Mon Sep 17 00:00:00 2001
From: Brett Kinny
Date: Sat, 6 Jun 2026 21:06:21 +1000
Subject: [PATCH 4/4] =?UTF-8?q?docs(readme):=20add=20candid=20'Where=20it'?=
 =?UTF-8?q?s=20at'=20note=20=E2=80=94=20vibe-coded=20hard,=20now=20consoli?=
 =?UTF-8?q?dating?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index c47a0af..75aa1d1 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,8 @@
 > ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome.
 >
 > **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR on the SenseVoice CPU default still garbles some short utterances (WhisperLocal, auto-selected on GPU hosts, handles high-pitched kid speech better).
+>
+> **Where it's at:** the first stretch of this was vibe-coded pretty hard — I moved fast, chased ideas, and let the scope sprawl to prove out the whole end-to-end loop. That phase is over. I'm now deliberately pulling it back: deleting half-built features, trimming the docs, and hardening what's left so the core is solid rather than sprawling. Fewer things, done properly.
 
 Dotty is a fully self-hosted voice stack for the M5Stack StackChan desktop robot. Open-source firmware on the robot, [xiaozhi-esp32-server](https://github.com/xinnan-tech/xiaozhi-esp32-server) for voice I/O, and a local **pi** coding agent as the brain. ASR, TTS, and session state all run on your own hardware. The LLM is pluggable — the shipped default runs a small fast model for plain conversation and escalates hard questions to a more capable model, with [llama-swap](./docs/cookbook/llama-swap-concurrent-models.md) as the recommended local backend. Swap in [Ollama](./docs/cookbook/run-fully-local.md) for the simpler single-binary option, or point at OpenRouter / any OpenAI-compatible API if you'd rather use the cloud.
 