Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 0 additions & 31 deletions .github/workflows/firmware-release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -78,29 +78,6 @@ jobs:
echo "--- checksums ---"
cat SHA256SUMS.txt

# ─── GPG signing (TODO: enable when GPG_PRIVATE_KEY secret is set) ───
# See docs/signed-releases.md for the full setup. Required repo secrets:
# GPG_PRIVATE_KEY, GPG_PASSPHRASE, GPG_KEY_ID
# Add the *.asc files to the Release `files:` block below once enabled.
#
# - name: Import GPG signing key
# if: ${{ secrets.GPG_PRIVATE_KEY != '' }}
# run: |
# echo "${{ secrets.GPG_PRIVATE_KEY }}" | gpg --batch --import
# echo "default-key ${GPG_KEY_ID}" >> ~/.gnupg/gpg.conf
# env:
# GPG_KEY_ID: ${{ secrets.GPG_KEY_ID }}
#
# - name: Sign release artifacts
# if: ${{ secrets.GPG_PRIVATE_KEY != '' }}
# working-directory: firmware/firmware/build
# run: |
# for f in bootloader.bin partition-table.bin ota_data_initial.bin stack-chan.bin generated_assets.bin human_face_detect.espdl SHA256SUMS.txt; do
# gpg --batch --yes --pinentry-mode loopback \
# --passphrase "${{ secrets.GPG_PASSPHRASE }}" \
# --detach-sign --armor "$f"
# done

- name: Create GitHub Release
uses: softprops/action-gh-release@v2
with:
Expand All @@ -114,14 +91,6 @@ jobs:
firmware/firmware/build/generated_assets.bin
firmware/firmware/build/human_face_detect.espdl
firmware/firmware/build/SHA256SUMS.txt
# Add once signing is enabled:
# firmware/firmware/build/bootloader.bin.asc
# firmware/firmware/build/partition-table.bin.asc
# firmware/firmware/build/ota_data_initial.bin.asc
# firmware/firmware/build/stack-chan.bin.asc
# firmware/firmware/build/generated_assets.bin.asc
# firmware/firmware/build/human_face_detect.espdl.asc
# firmware/firmware/build/SHA256SUMS.txt.asc

notify-failure:
name: Notify on Failure
Expand Down
13 changes: 0 additions & 13 deletions KEYS.txt

This file was deleted.

11 changes: 8 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,11 @@

**Your self-hosted [StackChan](https://github.com/m5stack/StackChan) robot assistant — kid-minded by default, hackable by design, private by architecture.**

> ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome. 🍺☕ If you do try a fresh end-to-end deploy, please get in touch — I'll buy you a beer or a coffee.
> ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome.
>
> **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR accuracy on SenseVoice has a kid-speech gap that whisper.cpp will close in a follow-up.
> **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR on the SenseVoice CPU default still garbles some short utterances (WhisperLocal, auto-selected on GPU hosts, handles high-pitched kid speech better).
>
> **Where it's at:** the first stretch of this was vibe-coded pretty hard — I moved fast, chased ideas, and let the scope sprawl to prove out the whole end-to-end loop. That phase is over. I'm now deliberately pulling it back: deleting half-built features, trimming the docs, and hardening what's left so the core is solid rather than sprawling. Fewer things, done properly.

Dotty is a fully self-hosted voice stack for the M5Stack StackChan desktop robot. Open-source firmware on the robot, [xiaozhi-esp32-server](https://github.com/xinnan-tech/xiaozhi-esp32-server) for voice I/O, and a local **pi** coding agent as the brain. ASR, TTS, and session state all run on your own hardware. The LLM is pluggable — the shipped default runs a small fast model for plain conversation and escalates hard questions to a more capable model, with [llama-swap](./docs/cookbook/llama-swap-concurrent-models.md) as the recommended local backend. Swap in [Ollama](./docs/cookbook/run-fully-local.md) for the simpler single-binary option, or point at OpenRouter / any OpenAI-compatible API if you'd rather use the cloud.

Expand All @@ -23,13 +25,14 @@ So Dotty is the version that passes: every component runs on hardware I own, eve
## Features

- **Kid Mode (on by default)** — age-appropriate responses via persona + per-turn prompt steering. An output content filter is planned, not yet shipped ([#138](https://github.com/BrettKinny/dotty-stackchan/issues/138)); Kid Mode is not a substitute for supervision. Toggle off for general-purpose use. See [`docs/kid-mode.md`](./docs/kid-mode.md).
- **Local ASR** — FunASR SenseVoiceSmall runs on your hardware, no cloud transcription.
- **Local ASR** — FunASR SenseVoiceSmall by default, no cloud transcription. WhisperLocal auto-selects on GPU hosts (better kid-speech accuracy); SenseVoiceOnnx is a lighter low-RAM option.
- **Local or cloud TTS** — Piper (offline) or EdgeTTS (cloud). Swap with a config change.
- **Streaming responses** — the bridge streams LLM output to the voice pipeline for lower perceived latency.
- **Emoji expressions** — every response starts with an emoji that the firmware maps to a face animation (smile, laugh, sad, surprise, thinking, angry, love, sleepy, neutral).
- **Voice tools** — the pi agent can search its memory, escalate hard questions to a bigger model, take a photo, and play songs, all mid-conversation.
- **States, toggles & LEDs** — a six-state mutex (`idle / talk / story_time / security / sleep / dance`) plus two orthogonal toggles (`kid_mode`, `smart_mode`), all owned by the firmware StateManager and surfaced on the 12-pixel LED ring. Shipped on the active firmware fork (commit `d78118b`, 2026-04-27); the `firmware/firmware/` submodule pin in this repo lags, so flash from the active fork to get it. See "States, Toggles & LEDs" below and [`docs/modes.md`](./docs/modes.md).
- **Vision (camera)** — the robot's built-in camera can capture images for multimodal LLM queries.
- **Privacy LEDs** — hardware-bound mic (green) and camera (red) indicators on the LED ring. They light from the codec/camera enable signals via RAII guards, so a misbehaving server or model can't capture with the lights off.
- **Calendar context** — optional calendar integration feeds upcoming events into the conversation context.
- **Hackable** — every seam is swappable: LLM, TTS, ASR, agent framework. Fork it, rip out what you don't want, wire in your own.

Expand All @@ -52,6 +55,8 @@ The 12-pixel LED ring shows the current state at a glance. **Left ring 0-5 is th

On the right ring, **indices 8-9 are toggle pips** for kid_mode (salmon pink) and smart_mode (orange), and **index 11 (bottom) lights red while you have the turn** (LISTENING). The `idle → talk` transition fires on `face_detected` from the firmware; VLM identity recognition runs in parallel and feeds the LLM context.

> Heads up: that right-ring layout is the active-fork StateManager. On the firmware **submodule pinned in this repo**, pixels 6 and 11 instead drive the **privacy LEDs** — 6 = mic (green), 11 = camera (red) — and the StateManager pips arrive once the submodule catches up to the active fork.

Full state taxonomy, colour palette, transition diagram, and per-state backing architecture: [`docs/modes.md`](./docs/modes.md).

## Web dashboard (locally hosted)
Expand Down
Loading
Loading