Streamline PiperEngine.synthesize to allow use of medium- and high-quality models #244
InspectorCaracal wants to merge 1 commit into KoljaB:master
That's a great simplification of the code, thank you for that! I wasn't aware that Piper does not always synthesize at 16000 Hz. This leaves one issue: every engine reports its exact sample rate by implementing get_stream_info so that the output stream is initialized properly. In the PiperEngine this is currently still hardcoded to 16000:

    def get_stream_info(self):
        """
        Returns PyAudio stream configuration for Piper.

        Returns:
            tuple: (format, channels, rate)
        """
        return pyaudio.paInt16, 1, 16000

By removing the WAV file write, we lose the opportunity to read the sample rate directly from the WAV file. I'd love to hear your opinion: how would you suggest we handle this? We could keep writing the WAV file, which, as you correctly stated, introduces I/O overhead and is therefore ugly. We could also add a parameter to the PiperEngine constructor that lets the user set the sample rate. That's not perfect either, since it requires active interaction. Maybe there's a third option that I can't think of right now. What do you think? Thanks again for the PR.
Ahh, interesting! I didn't notice that since I only tested that the final audio wasn't being played at the wrong speed. Since Piper requires a configuration file in JSON format and has a documented fallback location for when it isn't specified, the best way is probably to read that file and integrate the needed fields into the PiperVoice class, then reference them from there. Should be straightforward, I'll try it out in a bit.
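For illustration, the config-file idea could look roughly like the sketch below. It is not the code from this PR: the helper name read_piper_sample_rate is hypothetical, and it assumes that the voice's JSON config stores the rate under audio.sample_rate and that the fallback config location is the model path with ".json" appended. If the file can't be read, it falls back to the 16000 that get_stream_info currently hardcodes.

```python
import json
from pathlib import Path

# Value currently hardcoded in PiperEngine.get_stream_info.
DEFAULT_SAMPLE_RATE = 16000


def read_piper_sample_rate(model_path, config_path=None):
    """Hypothetical helper: return the sample rate declared in a Piper voice config.

    Assumptions (not confirmed by this PR):
      * the config is JSON with an "audio" object holding "sample_rate";
      * without an explicit config path, the config sits at "<model_path>.json".
    """
    path = Path(config_path) if config_path else Path(str(model_path) + ".json")
    try:
        with path.open(encoding="utf-8") as fh:
            config = json.load(fh)
    except (OSError, json.JSONDecodeError):
        # Missing or unreadable config: keep the old behaviour.
        return DEFAULT_SAMPLE_RATE
    return int(config.get("audio", {}).get("sample_rate", DEFAULT_SAMPLE_RATE))
```

get_stream_info could then return the stored value instead of the literal 16000, so the user doesn't have to pass the rate by hand.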
The current implementation requires writing a temporary .wav file and forces the sample rate to 16000 by validating that file. However:

- subprocess.run can return the raw audio data that the player needs, so the file write isn't necessary.

This PR modifies the synthesize method so that the file-output parameter passed to Piper is replaced with raw output that can be added directly to the queue, and removes the WAV-file validation, since reading a WAV file is no longer necessary. The change is primarily intended to allow using all sizes of Piper voices, but should also reduce I/O overhead.
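As a rough sketch of the approach described above (not the actual diff), the raw-output call could look like this. The helper names build_piper_command and run_raw_synthesis are hypothetical, and the sketch assumes the Piper CLI accepts --model, --config and --output-raw and reads the text to speak from stdin.

```python
import subprocess
import sys


def build_piper_command(piper_path, model_path, config_path=None):
    """Hypothetical helper: build a Piper command that writes raw PCM to stdout.

    Assumes the CLI flags --model, --config and --output-raw.
    """
    cmd = [piper_path, "--model", model_path, "--output-raw"]
    if config_path:
        cmd += ["--config", config_path]
    return cmd


def run_raw_synthesis(cmd, text):
    """Send text on stdin and return whatever bytes the process writes to stdout."""
    result = subprocess.run(
        cmd,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in process that echoes stdin, so the sketch runs without Piper installed.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(len(run_raw_synthesis(echo, "hello")))
```

The returned bytes are headerless PCM, so nothing in them carries the sample rate. That is why the rate has to come from somewhere else, as discussed in the comments.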