walkclaude

License: Apache 2.0 Python 3.12+ Claude Code

A voice-driven Claude Code orchestration engine. Put on headphones, walk around, and run multi-project agent workflows by voice — local-first, with optional cloud STT/TTS.

walkclaude is the open-source distribution of a voice-orchestration stack built around three Python modules:

  • voice_gateway/ — local aiortc WebRTC gateway. Browser microphone in, browser/Cartesia TTS out. STT via local Whisper (faster-whisper) by default, Deepgram or OpenAI Realtime as cloud upgrades. Local LLM via Ollama for the zero-API "walk-mode" path.
  • hermes_comms/ — multi-channel communication bus with intent classification, decision packets, approval flows, evidence log, file custody, redaction, and a streaming HTTP server. Brokers between voice intents and downstream tool calls.
  • hermes_parallel/ — multi-project parallel agent runner. Tracks runs across projects, maintains a per-project registry, and exposes a CLI for inspecting in-flight work.

Plus minimal stub modules (crm_map/, hermes_inbox/) that satisfy the import surface so the code boots out-of-the-box without the private CRM/inbox components it was extracted from.
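The stubs are plain no-op classes with just enough surface to satisfy the imports. A sketch of the shape (class and method names here are illustrative, not the repo's actual symbols):

```python
# Illustrative no-op stub in the spirit of crm_map -- names are made up
# for this sketch; check the repo's stub modules for the real surface.
class CRMClient:
    """Stand-in for the private CRM backend; every call is a no-op."""

    def lookup(self, contact_id: str) -> dict:
        return {}  # no CRM attached: nothing to return

    def request_approval(self, packet: dict) -> bool:
        return False  # fail closed: no real approval flow available
```

Swapping in a real implementation means replacing these no-ops with calls into your own CRM or inbox system while keeping the same method surface.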

What this gives you

A working bench for the "walk around with headphones and orchestrate Claude Code" workflow:

  1. Open the gateway in a browser tab on your laptop.
  2. Pair your phone (Tailscale or your local network).
  3. Talk. STT transcribes locally via Whisper or via Deepgram/OpenAI Realtime if you supply keys. The local LLM (Ollama) routes intents. Approved actions hit the comms bus. The parallel runner tracks per-project state. The agent gives back a TTS reply.
  4. Walk.
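The loop in steps 1-4 can be sketched in a few lines. The function names below are illustrative stand-ins, not the actual walkclaude API:

```python
# Minimal sketch of the voice loop: transcribe -> classify -> gate -> act.
# Every function here is a placeholder for the real pipeline stage.
def transcribe(audio: bytes) -> str:
    # local Whisper by default; Deepgram/OpenAI Realtime with keys
    return "start a run on project alpha"

def classify_intent(text: str) -> dict:
    # local Ollama LLM routes the utterance to an intent
    return {"intent": "start_run", "project": "alpha"}

def requires_approval(intent: dict) -> bool:
    # hermes_comms gates side-effecting intents behind approval flows
    return intent["intent"] in {"start_run", "send_message"}

def ask_user(intent: dict) -> bool:
    return True  # in practice, a spoken yes/no confirmation over TTS

def handle_utterance(audio: bytes) -> str:
    text = transcribe(audio)
    intent = classify_intent(text)
    if requires_approval(intent) and not ask_user(intent):
        return "Cancelled."
    return f"Started run on {intent['project']}."

print(handle_utterance(b""))  # -> Started run on alpha.
```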

You get the working substrate. You don't get a polished consumer app — that's a different project.

Scope and honest limits

What's in:

  • The full voice-gateway code (~5,000 lines of Python): WebRTC signalling, audio buffering, latency ledger, local + cloud STT/LLM/TTS adapters, browser client HTML, OpenAI Realtime adapter, task-loopback test harness.
  • The full hermes_comms code (~3,000 lines): adapters, approvals, context store, decision packets, evidence log, file custody, intent classifier, outbox, policy, redaction, server.
  • The full hermes_parallel code (~1,300 lines): registry, runner, policy, models, CLI.
  • Public-worthy docs from the parent harness: voice-gateway runbooks, voice-operations runbook, secure-remote-voice-tunnel design, local-zero-api walk-mode design, voice-first multi-project deepresearch.

What's NOT in:

  • The private commercial crm_map and hermes_inbox modules. Stubbed in this repo with no-op implementations that satisfy the imports — replace with your own implementations if you need real CRM approval flows or external-channel inbox bridges (WhatsApp / Telegram / SMS / email).
  • Two demo simulation files (week_simulation.py, real_day_simulation.py) that depended deeply on the private modules and a parent-harness project layout. Removed cleanly.
  • CI / test infrastructure. The original tests required a real audio pipeline + private modules to run end-to-end. Tests are not included; PRs welcome.
  • Polished UX. Browser client is a single-page HTML demo; production UX is your problem.

This is a v0.1.0 foundational drop — the working code, the working architecture, the docs. Expect rough edges. The system has been running internally since 2026-04 against a real workload; the open-source extraction was made on 2026-05-11.

Install

Requires Python 3.12+, ffmpeg (for some audio paths), and at least one of: a local Whisper install for STT, a Deepgram API key, or an OpenAI API key with Realtime access.
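A quick way to verify the baseline prerequisites before installing (an illustrative helper, not part of the repo):

```python
# Check the two hard prerequisites: Python 3.12+ and ffmpeg on PATH.
import shutil
import sys

checks = {
    "python>=3.12": sys.version_info >= (3, 12),
    "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```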

git clone https://github.com/waitdeadai/walkclaude
cd walkclaude

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements-voice.txt
pip install -r requirements-comms.txt

cp env.example .env
# Edit .env to add your keys (or leave them blank for local-only walk-mode).

Run

Voice gateway (local-only zero-API mode)

If you have Ollama and a local Whisper model installed:

python -m voice_gateway.gateway --zero-api --port 8080

Open http://localhost:8080 in a browser, allow microphone, and speak. The local LLM responds; the local TTS speaks back.

Voice gateway (cloud STT/TTS path)

python -m voice_gateway.gateway --port 8080

Requires DEEPGRAM_API_KEY, CARTESIA_API_KEY, and OPENAI_API_KEY in your environment.

Comms server

python -m hermes_comms.server --store .walkclaude/hermes-comms --port 8081

The HTTP API exposes intent classification, approval flows, decision packets, file custody, and evidence retrieval.
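A sketch of calling the server from Python. The endpoint path and payload fields below are assumptions for illustration only; consult the hermes_comms server code for the documented routes:

```python
# Hypothetical intent-classification request against the local comms server.
# The /intents/classify path and the payload shape are illustrative guesses.
import json
import urllib.request

payload = {"utterance": "approve the pending run on project alpha"}
req = urllib.request.Request(
    "http://localhost:8081/intents/classify",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # uncomment with the server up
#     print(json.load(resp))
print(req.get_method(), req.full_url)
```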

Parallel runner CLI

python -m hermes_parallel.cli list
python -m hermes_parallel.cli start <project-slug> --objective "<...>" --lane-count 2
python -m hermes_parallel.cli status <run-id>

State stored under .walkclaude/hermes-parallel/.

Walk-mode end-to-end

See docs/local-zero-api-walk-mode.md for the local-only headless setup walkthrough, and docs/voice-first-multi-project-agent-orchestration-deepresearch.md for the design rationale.

For remote access (talking to your home machine from outside your LAN), see docs/secure-remote-voice-tunnel.md. The current recommendation is Tailscale; LiveKit/SIP cloud-bridge integration is documented separately.

Architecture diagram

┌──────────────┐           ┌──────────────┐          ┌─────────────────┐
│   Browser    │  WebRTC   │              │   HTTP   │                 │
│ (mic + TTS)  ├──────────►│ voice_gateway├─────────►│  hermes_comms   │
│              │◄──────────┤   (aiortc)   │◄─────────┤  (intents +     │
└──────────────┘           │              │          │   approvals)    │
                           │  STT: local  │          └────────┬────────┘
                           │  Whisper or  │                   │
                           │  Deepgram    │                   ▼
                           │              │          ┌─────────────────┐
                           │  LLM: local  │          │ hermes_parallel │
                           │  Ollama or   │          │ (multi-project  │
                           │  OpenAI      │          │  run tracker)   │
                           │              │          └─────────────────┘
                           │  TTS: local  │
                           │  Kokoro or   │
                           │  Cartesia    │
                           └──────────────┘

Sister projects

walkclaude is the voice-orchestration sibling to the LLM Dark Patterns Hooks suite (10 single-purpose Stop hooks for Claude Code) and the minmaxing governance harness. The three projects compose: minmaxing for spec/verify discipline, dark-patterns hooks for closeout-language enforcement, walkclaude for voice-driven multi-project orchestration.

Contributing

PRs welcome on:

  • Real CI / test infrastructure (the existing internal tests need too much setup to run portably).
  • Replacing the crm_map / hermes_inbox stubs with adapter interfaces other CRMs/inbox systems can implement.
  • Browser client polish.
  • New STT / TTS / LLM provider adapters.
  • Docs corrections.

The bar for new features: the change should make walkclaude more useful for somebody walking around with headphones running Claude Code. Anything that doesn't pass that bar belongs in a fork.

License

Apache-2.0.
