Live Translator

Real-time speech-to-text translation application for Linux. Captures system audio, transcribes it using Whisper, and translates to your target language using local LLMs (Ollama) or cloud APIs.

Features

  • Real-time audio capture from system audio (PipeWire/PulseAudio)
  • Speech-to-text using Faster-Whisper (local, offline, CPU/CUDA)
  • Translation via Ollama (local), OpenAI, or Anthropic
  • Speaker diarization - detects and labels different speakers (local, no API key needed)
  • Auto language detection - skips translation when audio is in expected languages
  • GTK4 overlay window with scrollable translation history
  • Customizable appearance (colors, fonts, opacity)
  • Custom translation prompts for specialized terminology
  • Performance tuning - adjust speed/accuracy tradeoff
  • Per-session logging - separate log file for each session
  • AI Assistant with question detection and contextual help
  • Settings saved to config file for persistence
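
The auto language detection feature above amounts to a simple gate: if the language Whisper detects is already one the listener understands (the "expected languages" list in Settings), translation is skipped. A minimal sketch of that logic, assuming a simple set-membership check (the function name is illustrative, not the app's actual API):

```python
def should_translate(detected_language: str, expected_languages: set[str]) -> bool:
    """Skip translation when the detected language is already in the
    expected-languages list (the listener understands it as-is)."""
    return detected_language not in expected_languages

# Example: a Russian-speaking user lists "ru" as expected.
expected = {"ru"}
print(should_translate("en", expected))  # True  -> translate English audio
print(should_translate("ru", expected))  # False -> pass Russian through untouched
```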

Requirements

  • Linux with PipeWire or PulseAudio
  • Python 3.10+
  • GTK4
  • Ollama (for local translation)

Installation

1. Install system dependencies

# Ubuntu/Debian
sudo apt-get install -y \
    libgirepository-2.0-dev \
    gcc \
    libcairo2-dev \
    pkg-config \
    python3-dev \
    gir1.2-gtk-4.0 \
    libpulse-dev \
    portaudio19-dev

# For PipeWire (usually pre-installed)
# Ensure wpctl and pw-record are available

2. Create virtual environment and install Python dependencies

cd live_translator
python3 -m venv venv
source venv/bin/activate

# Install core dependencies
pip install faster-whisper ollama PyGObject pyaudio

# Optional: for CUDA support (GPU acceleration)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Optional: for speaker diarization
pip install pyannote.audio

To activate the virtual environment in future sessions:

source venv/bin/activate

Note: The start.sh script automatically uses the venv, so manual activation is only needed for development.

3. Install Ollama and a translation model

# Install Ollama (https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model for translation
ollama pull mistral:7b
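
Once a model is pulled, a translation request through the `ollama` Python client boils down to a chat call with a translation instruction. A hedged sketch of what such a payload might look like (the prompt wording is illustrative, not the app's actual template):

```python
def build_translation_messages(text: str, target_language: str) -> list[dict]:
    """Build a chat-style payload asking the model to translate `text`.
    Keeping the instruction in the system message separates it from content."""
    return [
        {"role": "system",
         "content": (f"You are a translator. Translate the user's text to "
                     f"{target_language}. Output only the translation.")},
        {"role": "user", "content": text},
    ]

messages = build_translation_messages("Good morning", "Russian")
# With Ollama running locally, this payload would be sent as:
#   import ollama
#   reply = ollama.chat(model="mistral:7b", messages=messages)
#   print(reply["message"]["content"])
```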

Usage

Basic usage

./start.sh

Command-line options

./start.sh [options]

Options:
  -w, --whisper-model    Whisper model: tiny, base, small, medium, large-v2, large-v3
                         (default: base)
  -m, --ollama-model     Ollama model for translation (default: mistral:7b)
  -s, --source-language  Source language code (default: en)
  -t, --target-language  Target language name (default: Russian)
  -d, --device           Device for Whisper: cpu, cuda (default: cpu)
  --compute-type         Compute type: int8, float16, float32 (default: int8)
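
The `-w`, `-d`, and `--compute-type` flags map directly onto faster-whisper's `WhisperModel` constructor arguments. A sketch of that mapping (the argument names match faster-whisper's API; the wrapper function itself is illustrative):

```python
def whisper_model_kwargs(whisper_model: str = "base",
                         device: str = "cpu",
                         compute_type: str = "int8") -> dict:
    """Translate the CLI flags into WhisperModel constructor arguments.
    With faster-whisper installed, these would be consumed as:
        from faster_whisper import WhisperModel
        model = WhisperModel(**whisper_model_kwargs("base", "cpu", "int8"))
    """
    return {
        "model_size_or_path": whisper_model,  # tiny/base/small/medium/large-v3
        "device": device,                     # "cpu" or "cuda"
        "compute_type": compute_type,         # int8 / float16 / float32
    }

kwargs = whisper_model_kwargs()  # defaults mirror the start.sh defaults
```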

Examples

# Default (English to Russian, base model)
./start.sh

# Use smaller/faster model
./start.sh -w tiny

# Use larger model for better accuracy
./start.sh -w small

# Translate to German
./start.sh -t German

# Use GPU (requires CUDA and cuDNN)
./start.sh -d cuda

Settings

Click the gear icon in the window header to open settings:

Appearance

  • Window opacity
  • Background color
  • Original/translated text colors
  • Font sizes

Transcription

  • Whisper model size
  • Device (CPU/CUDA)
  • Compute type
  • Source language
  • Speaker diarization (local, no API key needed)
  • Performance tuning (beam size, silence duration, etc.)
  • Auto language detection with expected languages list

Translation

  • Provider (Ollama, OpenAI, Anthropic)
  • Model name
  • Target language
  • Custom prompt template
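
A custom prompt template typically carries placeholders that the app fills in per segment. A minimal sketch of how such a template could work, assuming `{target_language}` and `{text}` placeholders (the placeholder names and default wording are assumptions, not necessarily the app's actual ones):

```python
# Hypothetical default template; a user could override it for specialized
# terminology, e.g. "keep product names untranslated".
DEFAULT_TEMPLATE = (
    "Translate the following text to {target_language}. "
    "Preserve technical terms as-is.\n\n{text}"
)

def render_prompt(template: str, text: str, target_language: str) -> str:
    """Fill the user-supplied template with the current segment and target."""
    return template.format(text=text, target_language=target_language)

prompt = render_prompt(DEFAULT_TEMPLATE, "Hello", "German")
```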

Logging

  • Enable/disable translation logging
  • Custom log directory
  • Log original and translated text separately

AI Assistant

  • Enable/disable AI assistant
  • Auto-detect and answer questions from transcription
  • Context-based summaries and learning help

Settings are saved to ~/.config/live-translator/settings.json

Recommended Models

Transcription (Whisper)

For optimal performance with different VRAM constraints:

  • tiny (39M) - Fastest, lowest quality, ~1GB
  • base (74M) - Good balance, ~2GB - RECOMMENDED for CPU
  • small (244M) - Better accuracy, ~2.5GB
  • medium (769M) - High accuracy, ~4GB - RECOMMENDED for CUDA
  • large-v3 (1.5B) - Best accuracy, ~8GB - For CUDA with 8GB+ VRAM
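
The memory figures above suggest a simple selection rule: pick the largest model whose footprint fits your memory budget. An illustrative helper using the approximate numbers from this list:

```python
# Approximate memory footprints (GB) taken from the list above.
WHISPER_FOOTPRINT_GB = {
    "tiny": 1.0, "base": 2.0, "small": 2.5, "medium": 4.0, "large-v3": 8.0,
}

def pick_whisper_model(budget_gb: float) -> str:
    """Return the largest model that fits within budget_gb ('tiny' as floor)."""
    for name in ("large-v3", "medium", "small", "base", "tiny"):
        if WHISPER_FOOTPRINT_GB[name] <= budget_gb:
            return name
    return "tiny"

print(pick_whisper_model(4.0))  # medium
print(pick_whisper_model(2.0))  # base
```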

Translation (Ollama)

For CUDA with 8GB VRAM:

| Model             | Size  | Speed | Accuracy | Notes                       |
|-------------------|-------|-------|----------|-----------------------------|
| mistral:7b        | 4.4GB | ⚡⚡⚡   | ⭐⭐⭐      | ✅ BEST - fast and reliable |
| neural-chat:7b    | 4.1GB | ⚡⚡⚡   | ⭐⭐⭐      | ✅ Optimized for chat       |
| llama2:7b         | 3.8GB | ⚡⚡⚡   | ⭐⭐⭐      | ✅ Good alternative         |
| llama3:8b         | 4.7GB | ⚡⚡    | ⭐⭐⭐⭐     | ✅ Better accuracy          |
| openhermes-2.5:7b | 4.0GB | ⚡⚡⚡   | ⭐⭐⭐      | ✅ Excellent                |

For CPU (slower, but works):

  • mistral:7b (4.4GB) - Best for CPU
  • llama2:7b (3.8GB) - Good alternative, smaller

Download recommended models:

# Best for translation - fastest and reliable
ollama pull mistral:7b

# Alternatives with good quality
ollama pull neural-chat:7b
ollama pull llama3:8b
ollama pull llama2:7b

Models NOT recommended:

  • ❌ codellama:* - Specialized for code, not translation
  • ❌ Models > 8GB - Will not fit in 8GB VRAM
  • ❌ deepseek-r1:7b - Slow; oriented toward reasoning, not translation

Project Structure

live_translator/
├── main.py              # Main application entry point
├── overlay.py           # GTK4 overlay window
├── audio_capture.py     # System audio capture (PipeWire/PulseAudio)
├── mic_capture.py       # Microphone audio capture
├── transcriber.py       # Speech-to-text with Faster-Whisper
├── translator.py        # Translation via Ollama/OpenAI/Anthropic
├── tts_engine.py        # Text-to-speech synthesis
├── virtual_output.py    # Virtual audio output for TTS
├── reverse_mode.py      # Reverse translation mode (speech-to-speech)
├── qa_assistant.py      # AI assistant for Q&A
├── logger.py            # Session logging
├── settings.py          # Settings management
├── settings_dialog.py   # Settings UI dialog
└── start.sh             # Launch script

Troubleshooting

No audio capture

  • Ensure PipeWire or PulseAudio is running
  • Check that pw-record or parec is available
  • Verify audio is playing from a source

Whisper model download

First run will download the Whisper model (~150MB for base). This may take a few minutes.

CUDA memory errors

If you see "CUDA out of memory" errors:

  1. Reduce model size:

    ./start.sh -w base -d cuda
  2. Switch to CPU:

    ./start.sh -d cpu
  3. Use smaller translation model:

    • In Settings → Translation, select a smaller model (4-5GB)
    • Recommended: neural-chat:7b or mistral:7b
  4. Disable speaker diarization:

    • In Settings → Transcription, turn off "Enable Speaker Detection"
    • Diarization requires extra VRAM
  5. Reduce compute precision:

    • In Settings → Transcription
    • Change "Compute Type" to int8 (uses less memory)

CUDA not detected

Ensure CUDA and cuDNN are properly installed:

# Check CUDA installation
nvidia-smi

# Install the CUDA toolkit (Ubuntu/Debian). The package is nvidia-cuda-toolkit
# in the stock repositories; NVIDIA's own repository names it cuda-toolkit.
sudo apt-get install nvidia-cuda-toolkit

Translation not working

  • Ensure Ollama is running: ollama serve
  • Check the model is pulled: ollama list
  • Try a different model in Settings → Translation
  • Check "Expected Languages" setting matches your content
  • If auto language detection is on, segments whose detected language is already in the expected-languages list are skipped rather than translated

Performance optimization

  • For faster transcription:

    • Settings → Transcription → Beam Size: reduce to 1-2
    • Settings → Transcription → Min Silence Duration: reduce to 200ms
    • Whisper model: use base instead of small
  • For better accuracy:

    • Increase Beam Size (3-5)
    • Use larger Whisper model (small or medium)
    • Use better translation model (llama3:8b instead of mistral:7b)

License

MIT License
