This project implements a near real-time audio transcription system with a GPU-backed transcription server and a client for continuous audio streaming and transcription.
- Server (`server-transcribe.py`): runs on a CUDA GPU-enabled machine, providing transcription services via HTTP.
- Client (`send-streaming-voice.py`): captures audio and sends it to the server for near real-time transcription.
## Server

### Requirements

- Python 3.10+
- FastAPI
- uvicorn
- NVIDIA GPU with CUDA support
- NeMo ASR model (Canary-1B)
### Installation

Install the required packages:

```
pip install fastapi uvicorn "nemo_toolkit[all]" python-dotenv
```

Ensure the necessary CUDA libraries are installed for your GPU.
### Running the Server

```
python server-transcribe.py
```
## Client

### Requirements

- Python 3.7+
- pyaudio
- requests
- python-dotenv
### Installation

Install the required packages:

```
pip install pyaudio requests python-dotenv
```
### Configuration

Create a `.env` file in the same directory as `send-streaming-voice.py`:

```
TRANSCRIBE_ENDPOINT=http://your-server-ip:8726/transcribe
```

Replace `your-server-ip` with the IP address or hostname of your GPU server.
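At startup the client reads this value with python-dotenv. As an illustration of what that lookup amounts to, here is a stdlib-only sketch (the parser below is a simplified stand-in, not the real python-dotenv):

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, mimicking what python-dotenv does."""
    values = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks and comments; split on the first "=" only.
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass
    return values


# Real environment variables take precedence over the .env file,
# matching python-dotenv's default non-override behavior.
env = load_env_file()
endpoint = os.environ.get("TRANSCRIBE_ENDPOINT") or env.get("TRANSCRIBE_ENDPOINT")
```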
### Running the Client

```
python send-streaming-voice.py
```
To compare a transcription against the source text:

```
python text-diff.py test/data/text-source.txt test/data/text-transcribed(-silero).txt
```
## Usage

- Start the server on your GPU-enabled machine.
- Run the client on the machine where you want to capture audio.
- Speak into the microphone connected to the client machine.
- The client continuously sends audio chunks to the server.
- The server processes each chunk and returns a transcription.
- The client prints each transcription as it is received.
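Each chunk travels as a complete WAV file built in memory from raw PCM samples. A sketch of that packaging step, assuming a 16 kHz mono 16-bit capture format (match whatever the client actually records):

```python
import io
import wave


def pcm_to_wav(pcm_bytes, rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()


# A 5-second chunk of 16 kHz mono 16-bit audio is
# 16000 frames/s * 2 bytes/frame * 5 s = 160000 bytes of raw PCM.
chunk = pcm_to_wav(b"\x00\x00" * 16000 * 5)
```

The resulting bytes can then be sent as the POST body (or a multipart file field), depending on what `server-transcribe.py` expects.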
## Architecture

```
[Client Machine]                     [GPU Server]
+------------------+               +------------------+
|                  |               |                  |
|   Microphone     |               |   NVIDIA GPU     |
|       |          |               |        |         |
|       v          |               |        v         |
| send-streaming-  |   HTTP POST   |     server-      |
|   voice.py       | ------------> |  transcribe.py   |
|       |          |  Audio Data   |        |         |
|       |          |               |        |         |
|       |          |     HTTP      |        |         |
|       |          |   Response    |        |         |
|       v          | <------------ |        |         |
|    Display       | Transcription |        |         |
|  Transcription   |               |        |         |
+------------------+               +------------------+
```
## How It Works

### Server (`server-transcribe.py`)

- Uses FastAPI to create an HTTP server.
- Loads a pre-trained NeMo ASR model (Canary-1B) for transcription.
- Receives audio chunks via POST requests.
- Processes audio on the GPU for fast transcription.
- Returns the transcribed text as a JSON response.
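The endpoint's contract can be sketched as a pure function, with the NeMo Canary-1B call left as a placeholder (the route name, response keys, and model API shown in the comments are assumptions, not the actual server code):

```python
import io
import wave


def transcribe_wav(wav_bytes, model=None):
    """Validate an incoming WAV chunk and return a JSON-serializable result."""
    # Parse the WAV header; malformed uploads raise here instead of
    # reaching the model.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if model is not None:
        # Real server: run the NeMo Canary-1B model on the audio.
        text = model.transcribe(wav_bytes)
    else:
        text = ""  # placeholder when no model is loaded
    return {"transcription": text, "duration_seconds": duration}


# FastAPI wiring would look roughly like (sketch only):
#   app = FastAPI()
#   @app.post("/transcribe")
#   async def transcribe(file: UploadFile):
#       return transcribe_wav(await file.read(), model=asr_model)
```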
### Client (`send-streaming-voice.py`)

- Uses PyAudio to capture audio from the microphone.
- Streams audio in chunks (5 seconds by default).
- Sends each audio chunk to the server as a WAV file in a POST request.
- Receives and displays the transcription for each chunk.
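The client's core loop can be sketched with the capture and network calls injected as plain callables, so the chunk-by-chunk logic is visible without PyAudio or a live server (the names here are illustrative, not the actual `send-streaming-voice.py` internals):

```python
def stream_chunks(read_chunk, send, max_chunks=None):
    """Core client loop: read a PCM chunk, ship it, print the transcription.

    `read_chunk()` stands in for PyAudio capture and `send(chunk)` for the
    requests.post(...) call; both are injected so the loop is testable.
    """
    results = []
    sent = 0
    while max_chunks is None or sent < max_chunks:
        chunk = read_chunk()
        if not chunk:  # empty chunk signals end of stream
            break
        reply = send(chunk)  # real client: POST to TRANSCRIBE_ENDPOINT
        text = reply.get("transcription", "")
        results.append(text)
        print(text)
        sent += 1
    return results
```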
## Notes

- The server's firewall must allow incoming connections on the specified port.
- Audio is captured in 5-second chunks by default; adjust this in the `AudioStreamer` class if needed.
- The transcription model is NVIDIA's Canary-1B. You may need to adjust paths or model loading for your specific setup.
## Troubleshooting

- For audio capture issues, ensure your microphone is properly configured and recognized by your system.
- For server connection issues, verify that `TRANSCRIBE_ENDPOINT` in the client's `.env` file is correct and that the server is running and reachable.
- GPU-related issues on the server side may require checking your CUDA installation and its compatibility with the NeMo toolkit.
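For the connection case, a quick first check is whether a plain TCP connection to the server's host and port succeeds at all (a hypothetical helper, not part of the actual client):

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    parts = urlparse(endpoint)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False
```

If this returns False for your configured endpoint, the problem is network-level (server not running, wrong address, or firewall) rather than anything in the transcription pipeline.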
## License

MIT License