Skip to content

bykemalh/S2ST

Repository files navigation

Speech-to-Speech Translator

This project uses Google Cloud Speech-to-Text API to transcribe speech to text, DeepL API to translate the transcribed text, and ElevenLabs API to convert the translated text back to speech. This creates a seamless speech-to-speech translation system.

Prerequisites

Before running this project, ensure you have the following dependencies installed:

  • Python 3.7 or later
  • Google Cloud SDK (gcloud)
  • Pyaudio
  • Requests
  • Pygame
  • DeepL API key
  • ElevenLabs API key

Installation

  1. Clone the repository:

    git clone https://github.com/bykemalh/S2ST.git
    cd S2ST
  2. Set up a virtual environment:

    python3 -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
  3. Install the required Python packages:

    pip install google-cloud-speech pyaudio deepl requests pygame
  4. Install Google Cloud SDK: Follow the installation instructions for your operating system here.

  5. Authenticate with Google Cloud:

    gcloud auth login
    gcloud auth application-default login
  6. Enable the Google Cloud Speech-to-Text API:

    gcloud services enable speech.googleapis.com
  7. Set up API keys: Replace the placeholder values in the script with your actual DeepL and ElevenLabs API keys.

    auth_key = "your-deepl-auth-key"
    xi_api_key = "your-elevenlabs-api-key"

Running the Application

To run the application, simply execute the main.py script:

python S2ST_NewAdvanced.py

How It Works

  1. Audio Input:

    • The application opens a microphone stream using the pyaudio library and captures audio in real-time.
  2. Speech-to-Text:

    • The captured audio is sent to the Google Cloud Speech-to-Text API, which returns the transcribed text.
  3. Translation:

    • The transcribed text is translated to English using the DeepL API.
  4. Text-to-Speech:

    • The translated text is sent to the ElevenLabs API, which converts it to speech and plays it back.

Dependencies

Ensure you have the following libraries installed:

  • google-cloud-speech
  • pyaudio
  • deepl
  • requests
  • pygame

You can install these dependencies using the following command:

pip install google-cloud-speech pyaudio deepl requests pygame

Configuration

Modify the following variables in the script to match your settings:

  • auth_key: Your DeepL API key.
  • xi_api_key: Your ElevenLabs API key.
  • voice_id: The voice ID to be used with ElevenLabs API.
  • RATE: The audio sample rate (default is 16000).
  • CHUNK: The audio chunk size (default is 1600).

Logging

Logging is set up in the script to capture errors during the text-to-speech conversion process. You can enable more detailed logging by uncommenting the logging configuration line.

# logging.basicConfig(level=logging.DEBUG)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

If you wish to contribute to this project, please fork the repository and create a pull request.

Acknowledgments


Developed By

This algorithm was developed by Kemal Hafızoğlu.

About

Speech to Speech Translation Python

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages