[Feature]: optional TTS replies for voice/audio prompts #63
Closed
Labels
enhancement (New feature or request)
Description
Problem
The bot already supports voice/audio input via STT, which is great for mobile use.
A natural follow-up for that workflow is optional TTS output: when a user sends a Telegram voice/audio message and has toggled TTS on with /tts, the bot could return the normal text reply plus an audio rendering of that same final assistant response.
This would improve hands-free/mobile usability without changing the normal text-first workflow (idea from OpenClaw).
Proposal
- Keep normal text prompts exactly as they are today: text reply only
- For Telegram voice/audio input:
  - transcribe with the existing STT flow
  - send the normal final text response
  - optionally send a TTS audio file of that exact final assistant text
- Make it opt-in via a /tts toggle, disabled by default
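The proposal above can be sketched as a small decision step. This is only an illustration in Python, not the bot's actual code: `Origin`, `TtsToggle`, and `plan_replies` are hypothetical names, and keeping the /tts state per chat ID is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    TEXT = "text"
    AUDIO = "audio"  # Telegram voice or audio message, transcribed via STT


@dataclass
class TtsToggle:
    """Hypothetical /tts state; a chat with no entry counts as disabled."""

    enabled_chats: set[int] = field(default_factory=set)

    def toggle(self, chat_id: int) -> bool:
        """Flip the state for a chat and return the new value."""
        if chat_id in self.enabled_chats:
            self.enabled_chats.remove(chat_id)
            return False
        self.enabled_chats.add(chat_id)
        return True

    def is_enabled(self, chat_id: int) -> bool:
        return chat_id in self.enabled_chats


def plan_replies(origin: Origin, tts_enabled: bool) -> list[str]:
    """Return the ordered reply kinds for one final assistant response."""
    replies = ["text"]  # the text reply is always sent first
    if origin is Origin.AUDIO and tts_enabled:
        replies.append("audio")  # audio rendering of that same text
    return replies
```

Text-origin prompts never produce an audio reply here, whatever the toggle state, which matches the "no change for text prompts" requirement.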
Proposed config
Something like:
- use the /tts toggle instead of TTS_ENABLED=false
- TTS_API_URL= (falls back to STT_API_URL if unset)
- TTS_API_KEY= (falls back to STT_API_KEY if unset)
- TTS_MODEL=gpt-4o-mini-tts
- TTS_VOICE=alloy
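A minimal sketch of how the fallback could be resolved, written in Python for illustration. `TtsConfig` and `load_tts_config` are hypothetical names, and treating `gpt-4o-mini-tts` and `alloy` as defaults (rather than just sample values) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TtsConfig:
    api_url: Optional[str]
    api_key: Optional[str]
    model: str
    voice: str


def load_tts_config(env: Mapping[str, str] = os.environ) -> TtsConfig:
    def first_set(*names: str) -> Optional[str]:
        # Treat empty strings as unset so "TTS_API_URL=" falls through
        # to the STT variable.
        for name in names:
            value = env.get(name, "").strip()
            if value:
                return value
        return None

    return TtsConfig(
        api_url=first_set("TTS_API_URL", "STT_API_URL"),
        api_key=first_set("TTS_API_KEY", "STT_API_KEY"),
        # Assumed defaults, taken from the sample values in this issue.
        model=first_set("TTS_MODEL") or "gpt-4o-mini-tts",
        voice=first_set("TTS_VOICE") or "alloy",
    )
```

With only the STT variables set, `load_tts_config()` would reuse the STT URL and key, so an existing STT setup would need no extra configuration.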
Scope / guardrails
To keep this small and low-risk:
- no change for text-origin prompts
- no streaming spoken output
- just one final audio file after the normal text reply
Why this seems aligned
This stays within the current single-chat / predictable interaction model in CONCEPT.md:
- it does not add parallelism or group-specific behavior
- it only extends the existing voice-input path
- it remains optional and disabled by default
Implementation notes
I already prototyped this locally in my fork and it was pretty contained:
- small TTS client modeled after the existing STT client
- lightweight tracking so only audio-origin prompts trigger TTS
- hook into the final assistant completion path after the normal text reply
- docs + tests included
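The pieces listed above might fit together roughly as follows. This is a hedged Python sketch, not the fork's code: `AudioOriginTracker`, `build_speech_request`, and `on_final_response` are hypothetical names, and the `/audio/speech` path with a `model`/`voice`/`input` JSON body assumes the configured endpoint is OpenAI-compatible.

```python
import json
import urllib.request
from typing import Callable


class AudioOriginTracker:
    """Remembers which pending prompts came from voice/audio messages."""

    def __init__(self) -> None:
        self._audio_prompts: set[str] = set()

    def mark_audio(self, prompt_id: str) -> None:
        self._audio_prompts.add(prompt_id)

    def consume(self, prompt_id: str) -> bool:
        """Return True once for an audio-origin prompt, then forget it."""
        if prompt_id in self._audio_prompts:
            self._audio_prompts.discard(prompt_id)
            return True
        return False


def build_speech_request(
    api_url: str, api_key: str, model: str, voice: str, text: str
) -> urllib.request.Request:
    # Assumption: an OpenAI-compatible speech endpoint under the base URL.
    body = json.dumps({"model": model, "voice": voice, "input": text})
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def on_final_response(
    tracker: AudioOriginTracker,
    prompt_id: str,
    tts_enabled: bool,
    text: str,
    send_text: Callable[[str], None],
    send_audio: Callable[[bytes], None],
    synthesize: Callable[[str], bytes],
) -> None:
    send_text(text)  # the normal text reply always goes out first
    # consume() runs even when TTS is off, so stale entries do not pile up.
    if tracker.consume(prompt_id) and tts_enabled:
        send_audio(synthesize(text))
```

Passing `send_text`, `send_audio`, and `synthesize` in as callables keeps the hook testable without Telegram or a live TTS endpoint; in a real bot `synthesize` would send the request built by `build_speech_request` and return the audio bytes.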
Done criteria (optional)
- When sending a voice memo/file with TTS enabled via /tts, the bot responds with the text output and then an audio file of that text.
- When sending a text input, the bot always replies with text as before, regardless of configuration.