Speech-to-Speech Translator

This project chains three services: the Google Cloud Speech-to-Text API transcribes speech to text, the DeepL API translates the transcribed text, and the ElevenLabs API converts the translated text back to speech. Together they form a speech-to-speech translation pipeline.

Prerequisites

Before running this project, make sure you have the following software installed and API keys available:

Python 3.7 or later
Google Cloud SDK (gcloud)
PyAudio
Requests
Pygame
DeepL API key
ElevenLabs API key

Installation

Clone the repository:

git clone https://github.com/bykemalh/S2ST.git
cd S2ST

Set up a virtual environment:

python3 -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`

Install the required Python packages:

pip install google-cloud-speech pyaudio deepl requests pygame

Install Google Cloud SDK: Follow the official Google Cloud installation instructions for your operating system.

Authenticate with Google Cloud:

gcloud auth login
gcloud auth application-default login

Enable the Google Cloud Speech-to-Text API:

gcloud services enable speech.googleapis.com

Set up API keys: Replace the placeholder values in the script with your actual DeepL and ElevenLabs API keys.
```
auth_key = "your-deepl-auth-key"
xi_api_key = "your-elevenlabs-api-key"
```
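
Hardcoding keys in the script makes it easy to commit them by accident. As an alternative, here is a minimal sketch that fills the same two variables from environment variables. The helper `load_api_key` and the environment variable names `DEEPL_AUTH_KEY` and `ELEVENLABS_API_KEY` are assumptions for illustration and are not part of the project.

```python
import os


def load_api_key(env_name, environ=None):
    """Return the API key stored in env_name, or raise a clear error."""
    # environ can be overridden with a plain dict, which helps when testing
    environ = os.environ if environ is None else environ
    value = environ.get(env_name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_name} environment variable first")
    return value


# Hypothetical usage in the script (variable names in the environment are assumptions):
# auth_key = load_api_key("DEEPL_AUTH_KEY")
# xi_api_key = load_api_key("ELEVENLABS_API_KEY")
```

With this approach the script fails early with a readable message when a key is missing, instead of failing later inside an API call.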

Running the Application

To run the application, execute the script:

python S2ST_NewAdvanced.py

How It Works

Audio Input:
- The application opens a microphone stream using the pyaudio library and captures audio in real-time.
Speech-to-Text:
- The captured audio is sent to the Google Cloud Speech-to-Text API, which returns the transcribed text.
Translation:
- The transcribed text is translated to English using the DeepL API.
Text-to-Speech:
- The translated text is sent to the ElevenLabs API, which converts it to speech and plays it back.
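
The four steps above can be sketched as a small pipeline. This is an illustration of the data flow only, not the project's actual code: the function `translate_speech` and its parameters are hypothetical, and each stage is passed in as a callable so the flow can be tried without a microphone, network access, or API keys.

```python
def translate_speech(audio_chunks, transcribe, translate, synthesize):
    """Run one utterance through the three API stages described above."""
    # Speech-to-text (Google Cloud Speech-to-Text in this project)
    text = transcribe(audio_chunks)
    if not text.strip():
        return None  # nothing was recognized, so there is nothing to play
    # Translation (DeepL in this project)
    translated = translate(text)
    # Text-to-speech (ElevenLabs in this project); returns audio to play back
    return synthesize(translated)


# Stand-in stages for a dry run; real ones would call the three APIs.
audio = translate_speech(
    [b"\x00\x00"],
    transcribe=lambda chunks: "merhaba dünya",
    translate=lambda text: "hello world",
    synthesize=lambda text: text.encode("utf-8"),
)
print(audio)  # b'hello world'
```

Keeping the stages as separate callables also makes it straightforward to swap one service for another or to test each stage in isolation.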

Dependencies

Ensure you have the following libraries installed:

google-cloud-speech
pyaudio
deepl
requests
pygame

You can install these dependencies using the following command:

pip install google-cloud-speech pyaudio deepl requests pygame

Configuration

Modify the following variables in the script to match your settings:

auth_key: Your DeepL API key.
xi_api_key: Your ElevenLabs API key.
voice_id: The voice ID to be used with ElevenLabs API.
RATE: The audio sample rate (default is 16000).
CHUNK: The audio chunk size (default is 1600).
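
RATE and CHUNK together determine how much audio each buffer holds. The short calculation below works this out for the default values. It assumes 16-bit mono PCM audio (2 bytes per sample), which is an assumption here and not something the list above states.

```python
RATE = 16000          # samples per second (default from the list above)
CHUNK = 1600          # samples per buffer (default from the list above)
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM

chunk_ms = CHUNK * 1000 // RATE       # duration of one buffer in milliseconds
chunk_bytes = CHUNK * BYTES_PER_SAMPLE  # size of one buffer in bytes

print(chunk_ms)     # 100
print(chunk_bytes)  # 3200
```

With the defaults, each chunk carries 100 ms of audio. A smaller CHUNK lowers latency at the cost of more requests per second, while a larger one does the opposite.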

Logging

Logging is set up in the script to capture errors during the text-to-speech conversion process. You can enable more detailed logging by uncommenting the logging configuration line.

# logging.basicConfig(level=logging.DEBUG)
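
As an illustration of the kind of error logging described above, the sketch below wraps a text-to-speech call and logs any failure with its traceback. The logger name `s2st` and the helper `safe_text_to_speech` are hypothetical and may differ from what the script actually does.

```python
import logging

# The configuration line mentioned above, uncommented
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("s2st")  # hypothetical logger name


def safe_text_to_speech(text, synthesize):
    """Call synthesize(text); on failure, log the error and return None."""
    try:
        return synthesize(text)
    except Exception:
        # logger.exception records the message plus the full traceback
        logger.exception("Text-to-speech failed for %r", text)
        return None
```

Returning None on failure lets the caller skip playback for that utterance and keep listening, rather than stopping the whole program.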

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

If you wish to contribute to this project, please fork the repository and create a pull request.

Acknowledgments

Developed By

This algorithm was developed by Kemal Hafızoğlu.
