Welcome to Rhasspy 3! This is a developer preview, so many of the manual steps here will be replaced with something more user-friendly in the future.
To get started, just clone the repo. Rhasspy's core does not currently have any dependencies outside the Python standard library.
```sh
git clone https://github.com/rhasspy/rhasspy3
cd rhasspy3
```

Installed programs and downloaded models are stored in the `config` directory, which is empty by default:

- `rhasspy3/config/`
    - `configuration.yaml` - overrides `rhasspy3/configuration.yaml`
    - `programs/` - installed programs
        - `<domain>/<name>/`
    - `data/` - downloaded models
        - `<domain>/<name>/`
Programs in Rhasspy are divided into domains.
Rhasspy loads two configuration files:
- `rhasspy3/configuration.yaml` (base)
- `config/configuration.yaml` (user)

The file in `config` will override the base configuration. You can see what the final configuration looks like with:

```sh
script/run bin/config_print.py
```

Programs that were not designed for Rhasspy can be used with adapters.
For example, add the following to your configuration.yaml (in the config directory):
```yaml
programs:
  mic:
    arecord:
      command: |
        arecord -q -r 16000 -c 1 -f S16_LE -t raw -
      adapter: |
        mic_adapter_raw.py --rate 16000 --width 2 --channels 1
pipelines:
  default:
    mic:
      name: arecord
```

Now you can run a microphone test:

```sh
script/run bin/mic_test_energy.py
```

When speaking, you should see the bar change with volume. If not, check the available devices with `arecord -L` and add `-D <device_name>` to the `arecord` command in `configuration.yaml` (prefer devices whose names start with `plughw:`).
Press CTRL+C to quit.
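The energy bar is driven by something like a root-mean-square (RMS) calculation over the raw S16_LE samples coming from the microphone. A rough sketch of the idea (not the actual code in `mic_test_energy.py`):

```python
import math
import struct

def rms_energy(chunk: bytes) -> float:
    """RMS energy of little-endian 16-bit mono PCM audio."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

# Silence has zero energy; a full-scale square wave is near 32767.
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 32767, -32767, 32767, -32767)
```

Louder speech produces larger sample magnitudes and therefore a higher RMS value, which is what moves the bar.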
Pipelines will be discussed later. For now, know that the pipeline named `default` will be run if you don't specify one. The mic test script can do this:

```sh
script/run bin/mic_test_energy.py --pipeline my-pipeline
```

You can also override the mic program:

```sh
script/run bin/mic_test_energy.py --mic-program other-program-from-config
```

Let's install our first program, Silero VAD.
Start by copying it from `programs/` to `config/programs/`, then run the setup script:

```sh
mkdir -p config/programs/vad/
cp -R programs/vad/silero config/programs/vad/
config/programs/vad/silero/script/setup
```

Once the setup script completes, add the following to your `configuration.yaml`:
```yaml
programs:
  mic: ...
  vad:
    silero:
      command: |
        script/speech_prob "share/silero_vad.onnx"
      adapter: |
        vad_adapter_raw.py --rate 16000 --width 2 --channels 1 --samples-per-chunk 512
pipelines:
  default:
    mic: ...
    vad:
      name: silero
```

This calls a command inside `config/programs/vad/silero` and uses an adapter. Notice that the command's working directory will always be `config/programs/<domain>/<name>`.
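The `--samples-per-chunk 512` option means the adapter feeds the VAD fixed-size frames: at 16 kHz, 16-bit mono, 512 samples is 1024 bytes (32 ms) per chunk. A simplified sketch of that chunking, not the adapter's actual code:

```python
def chunk_audio(raw: bytes, samples_per_chunk: int = 512, width: int = 2):
    """Yield fixed-size PCM chunks; a trailing partial chunk is held back."""
    chunk_bytes = samples_per_chunk * width
    for start in range(0, len(raw) - chunk_bytes + 1, chunk_bytes):
        yield raw[start : start + chunk_bytes]

# 2.5 chunks worth of audio -> two full 1024-byte chunks
audio = bytes(2560)
chunks = list(chunk_audio(audio))
```

In the real adapter the leftover bytes would be buffered and prepended to the next read, so no audio is dropped.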
You can test out the voice activity detection (VAD) by recording an audio sample:

```sh
script/run bin/mic_record_sample.py sample.wav
```

Say something for a few seconds and then wait for the program to finish. Afterwards, listen to `sample.wav` and verify that it sounds correct. You may need to adjust microphone settings with `alsamixer`.
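If you ever want to capture raw S16_LE output yourself (e.g. piped from `arecord`) and wrap it in a WAV container, Python's standard `wave` module is enough. A small sketch using the sample rate and format from the mic configuration above:

```python
import wave

def write_wav(path: str, raw_audio: bytes,
              rate: int = 16000, width: int = 2, channels: int = 1) -> None:
    """Wrap raw S16_LE PCM audio in a WAV container."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(width)
        wav_file.setframerate(rate)
        wav_file.writeframes(raw_audio)

# One second of silence at 16 kHz, 16-bit mono
write_wav("sample.wav", bytes(16000 * 2))
```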
Now for the fun part! We'll install faster-whisper, an optimized version of OpenAI's Whisper model.
```sh
mkdir -p config/programs/asr/
cp -R programs/asr/faster-whisper config/programs/asr/
config/programs/asr/faster-whisper/script/setup
```

Before using faster-whisper, we need to download a model:

```sh
config/programs/asr/faster-whisper/script/download.py tiny-int8
```

Notice that the model was downloaded to `config/data/asr/faster-whisper`:
```sh
find config/data/asr/faster-whisper/
config/data/asr/faster-whisper/
config/data/asr/faster-whisper/tiny-int8
config/data/asr/faster-whisper/tiny-int8/vocabulary.txt
config/data/asr/faster-whisper/tiny-int8/model.bin
config/data/asr/faster-whisper/tiny-int8/config.json
```

The tiny-int8 model is the smallest and fastest, but may not give the best transcriptions. Run `download.py` without any arguments to see the available models, or follow the instructions to make your own!
Add the following to configuration.yaml:
```yaml
programs:
  mic: ...
  vad: ...
  asr:
    faster-whisper:
      command: |
        script/wav2text "${data_dir}/tiny-int8" "{wav_file}"
      adapter: |
        asr_adapter_wav2text.py
pipelines:
  default:
    mic: ...
    vad: ...
    asr:
      name: faster-whisper
```

You can now transcribe a voice command:

```sh
script/run bin/asr_transcribe.py
```

(say something)
You should see a transcription of what you said as part of an event.
Speech to text systems can take a while to load their models, so a lot of time is wasted if we start from scratch each time.
Some speech to text and text to speech programs have included servers. These usually use Unix domain sockets to communicate with a small client program.
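The client/server exchange over a Unix domain socket can be sketched with the standard library. This toy echo example only illustrates the transport (the socket path is hypothetical), not Rhasspy's actual event protocol:

```python
import os
import socket
import threading

SOCK_PATH = "/tmp/rhasspy_demo.socket"  # hypothetical path for this sketch

# Clean up any stale socket file, then bind and listen.
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(1)

def serve_once() -> None:
    """Accept one connection and echo the request back."""
    conn, _ = server.accept()
    conn.sendall(conn.recv(1024))
    conn.close()

thread = threading.Thread(target=serve_once)
thread.start()

# The "small client program" side: connect, send, read the reply.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(SOCK_PATH)
client.sendall(b"transcribe me")
reply = client.recv(1024)
client.close()
thread.join()
server.close()
os.unlink(SOCK_PATH)
```

The real server keeps the model loaded in memory between connections, which is where the speed-up comes from.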
Add the following to your configuration.yaml:
```yaml
programs:
  mic: ...
  vad: ...
  asr:
    faster-whisper: ...
    faster-whisper.client:
      command: |
        client_unix_socket.py var/run/faster-whisper.socket
servers:
  asr:
    faster-whisper:
      command: |
        script/server --language "en" "${data_dir}/tiny-int8"
pipelines:
  default:
    mic: ...
    vad: ...
    asr:
      name: faster-whisper.client
```

Start the server in a separate terminal:

```sh
script/run bin/server_run.py asr faster-whisper
```

When it prints "Ready", transcribe yourself speaking again:

```sh
script/run bin/asr_transcribe.py
```

(say something)
You should receive your transcription a bit faster than before.
Rhasspy includes a small HTTP server that allows you to access programs and pipelines over a web API. To get started, run the setup script:

```sh
script/setup_http_server
```

Run the HTTP server in a separate terminal:

```sh
script/http_server --debug
```

Now you can transcribe a WAV file over HTTP:

```sh
curl -X POST -H 'Content-Type: audio/wav' --data-binary @etc/what_time_is_it.wav 'localhost:13331/asr/transcribe'
```

You can run one or more program servers along with the HTTP server:

```sh
script/http_server --debug --server asr faster-whisper
```

NOTE: You will need to restart the HTTP server when you change `configuration.yaml`.
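The same request can be made from Python's standard library. A sketch that assumes the HTTP server is running on the default port (the helper names are ours, not part of Rhasspy):

```python
import urllib.request

def build_request(wav_bytes: bytes,
                  url: str = "http://localhost:13331/asr/transcribe") -> urllib.request.Request:
    """Build a POST request carrying WAV audio for the transcribe endpoint."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )

def transcribe(wav_bytes: bytes) -> str:
    """Send the audio and return the transcript text."""
    with urllib.request.urlopen(build_request(wav_bytes)) as response:
        return response.read().decode("utf-8")
```

For example, `transcribe(open("etc/what_time_is_it.wav", "rb").read())` should return the same text as the `curl` call above.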
Next, we'll install Porcupine:

```sh
mkdir -p config/programs/wake/
cp -R programs/wake/porcupine1 config/programs/wake/
config/programs/wake/porcupine1/script/setup
```

Check the available wake word models with:

```sh
config/programs/wake/porcupine1/script/list_models
alexa_linux.ppn
americano_linux.ppn
blueberry_linux.ppn
bumblebee_linux.ppn
computer_linux.ppn
grapefruit_linux.ppn
grasshopper_linux.ppn
hey google_linux.ppn
hey siri_linux.ppn
jarvis_linux.ppn
ok google_linux.ppn
pico clock_linux.ppn
picovoice_linux.ppn
porcupine_linux.ppn
smart mirror_linux.ppn
snowboy_linux.ppn
terminator_linux.ppn
view glass_linux.ppn
```

NOTE: These will be slightly different on a Raspberry Pi (`_raspberry-pi.ppn` instead of `_linux.ppn`).
Add to configuration.yaml:
```yaml
programs:
  mic: ...
  vad: ...
  asr: ...
  wake:
    porcupine1:
      command: |
        .venv/bin/python3 bin/porcupine_stream.py --model "${model}"
      template_args:
        model: "porcupine_linux.ppn"
servers:
  asr: ...
pipelines:
  default:
    mic: ...
    vad: ...
    asr: ...
    wake:
      name: porcupine1
```

Notice that we include `template_args` in the programs section. This lets us change specific settings in pipelines, which will be demonstrated in a moment.
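The `${model}` placeholder behaves like shell-style template substitution, with a pipeline's `template_args` overriding the program's defaults. A sketch of the idea using Python's `string.Template` (not Rhasspy's actual implementation):

```python
from string import Template

def render_command(command, program_args, pipeline_args=None):
    """Fill ${...} placeholders; pipeline template_args override program defaults."""
    args = dict(program_args)
    args.update(pipeline_args or {})
    return Template(command).safe_substitute(args)

command = '.venv/bin/python3 bin/porcupine_stream.py --model "${model}"'
default = render_command(command, {"model": "porcupine_linux.ppn"})
override = render_command(
    command, {"model": "porcupine_linux.ppn"}, {"model": "grasshopper_linux.ppn"}
)
```

This is why the same `porcupine1` program can listen for a different wake word in each pipeline without being reinstalled.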
Test wake word detection:

```sh
script/run bin/wake_detect.py --debug
```

(say "porcupine")
Now change the model in `configuration.yaml`:

```yaml
programs:
  mic: ...
  vad: ...
  asr: ...
  wake: ...
servers:
  asr: ...
pipelines:
  default:
    mic: ...
    vad: ...
    asr: ...
    wake:
      name: porcupine1
      template_args:
        model: "grasshopper_linux.ppn"
```

Test wake word detection again:

```sh
script/run bin/wake_detect.py --debug
```

(say "grasshopper")
For non-English models, first download the extra data files:
```sh
config/programs/wake/porcupine1/script/download.py
```

Next, adjust your `configuration.yaml`. For example, this uses the German keyword "ananas":

```yaml
programs:
  wake:
    porcupine1:
      command: |
        .venv/bin/python3 bin/porcupine_stream.py --model "${model}" --lang_model "${lang_model}"
      template_args:
        model: "${data_dir}/resources/keyword_files_de/linux/ananas_linux.ppn"
        lang_model: "${data_dir}/lib/common/porcupine_params_de.pv"
```
Inspect the files in config/data/wake/porcupine1 for supported languages and keywords. At this time, English, German (de), French (fr), and Spanish (es) are available with keywords for linux, raspberry-pi, and many other platforms.
Going back to "grasshopper", we can test it over HTTP (restart the HTTP server first):

```sh
curl -X POST 'localhost:13331/pipeline/run?stop_after=wake'
```

(say "grasshopper")

Test a full voice command:

```sh
curl -X POST 'localhost:13331/pipeline/run?stop_after=asr'
```

(say "grasshopper", pause, speak a voice command, wait)
There are two types of intent handlers in Rhasspy: ones that handle transcripts directly (text), and others that handle structured intents (name + entities). For this example, we will handle text directly from asr.
In configuration.yaml:
```yaml
programs:
  mic: ...
  vad: ...
  asr: ...
  wake: ...
  handle:
    date_time:
      command: |
        bin/date_time.py
      adapter: |
        handle_adapter_text.py
servers:
  asr: ...
pipelines:
  default:
    mic: ...
    vad: ...
    asr: ...
    wake: ...
    handle:
      name: date_time
```

Install the date time demo script:

```sh
mkdir -p config/programs/handle/
cp -R programs/handle/date_time config/programs/handle/
```

This script just looks for the words "date" and "time" in the text, and responds appropriately.
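A text handler of this kind can be as simple as a keyword check over the transcript. A sketch of the idea (the bundled `date_time.py` may differ in its exact wording and structure):

```python
from datetime import datetime

def handle(text: str) -> str:
    """Respond to 'time' and 'date' keywords in a transcript."""
    text = text.lower()
    now = datetime.now()
    if "time" in text:
        return now.strftime("It is %I:%M %p")
    if "date" in text:
        return now.strftime("Today is %A, %B %d")
    return "Sorry, I didn't understand"
```

The handle adapter's job is just to feed the transcript text to a command like this on stdin and wrap the response in a handled event.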
You can test it on some text:
```sh
echo 'What time is it?' | script/run bin/handle_text.py --debug
```

Now let's test it with a full voice command:

```sh
script/run bin/pipeline_run.py --debug --stop-after handle
```

(say "grasshopper", pause, "what time is it?")

It also works over HTTP (restart the HTTP server first):

```sh
curl -X POST 'localhost:13331/pipeline/run?stop_after=handle'
```

(say "grasshopper", pause, "what's the date?")
The final stages of our pipeline will be text to speech (tts) and audio output (snd).
Install Piper:

```sh
mkdir -p config/programs/tts/
cp -R programs/tts/piper config/programs/tts/
config/programs/tts/piper/script/setup.py
```

and download an English voice:

```sh
config/programs/tts/piper/script/download.py english
```

Call `download.py` without any arguments to see the available voices.
Add to configuration.yaml:
```yaml
programs:
  mic: ...
  vad: ...
  asr: ...
  wake: ...
  handle: ...
  tts:
    piper:
      command: |
        bin/piper --model "${model}" --output_file -
      adapter: |
        tts_adapter_text2wav.py
      template_args:
        model: "${data_dir}/en-us-blizzard_lessac-medium.onnx"
  snd:
    aplay:
      command: |
        aplay -q -r 22050 -f S16_LE -c 1 -t raw
      adapter: |
        snd_adapter_raw.py --rate 22050 --width 2 --channels 1
servers:
  asr: ...
pipelines:
  default:
    mic: ...
    vad: ...
    asr: ...
    wake: ...
    handle: ...
    tts:
      name: piper
    snd:
      name: aplay
```

We can test the text to speech and audio output programs:

```sh
script/run bin/tts_speak.py 'Welcome to the world of speech synthesis.'
```

The `bin/tts_synthesize.py` script can be used if you just want to output a WAV file:

```sh
script/run bin/tts_synthesize.py 'Welcome to the world of speech synthesis.' > welcome.wav
```

This also works over HTTP (restart the HTTP server first):
```sh
curl -X POST \
    --data 'Welcome to the world of speech synthesis.' \
    --output welcome.wav \
    'localhost:13331/tts/synthesize'
```

Or to speak over HTTP:

```sh
curl -X POST --data 'Welcome to the world of speech synthesis.' 'localhost:13331/tts/speak'
```

Like speech to text, text to speech models can take a while to load. Let's add a server for Piper to `configuration.yaml`:
```yaml
programs:
  mic: ...
  vad: ...
  asr: ...
  wake: ...
  handle: ...
  tts:
    piper.client:
      command: |
        client_unix_socket.py var/run/piper.socket
  snd: ...
servers:
  asr: ...
  tts:
    piper:
      command: |
        script/server "${model}"
      template_args:
        model: "${data_dir}/en-us-blizzard_lessac-medium.onnx"
pipelines:
  default:
    mic: ...
    vad: ...
    asr: ...
    wake: ...
    handle: ...
    tts:
      name: piper.client
    snd: ...
```

Now we can run both servers with the HTTP server:

```sh
script/http_server --debug --server asr faster-whisper --server tts piper
```

Text to speech requests should be faster now.
As a final example, let's run a complete pipeline from wake word detection to text to speech response:
```sh
script/run bin/pipeline_run.py --debug
```

(say "grasshopper", pause, "what time is it?", wait)
Rhasspy should speak the current time.
This also works over HTTP:
```sh
curl -X POST 'localhost:13331/pipeline/run'
```

(say "grasshopper", pause, "what is the date?", wait)
Rhasspy should speak the current date.
Next steps:

- Connect Rhasspy to Home Assistant
- Run one or more satellites