Real-time speech-to-text translation application for Linux. Captures system audio, transcribes it using Whisper, and translates to your target language using local LLMs (Ollama) or cloud APIs.
- Real-time audio capture from system audio (PipeWire/PulseAudio)
- Speech-to-text using Faster-Whisper (local, offline, CPU/CUDA)
- Translation via Ollama (local), OpenAI, or Anthropic
- Speaker diarization - detects and labels different speakers (local, no API key needed)
- Auto language detection - skips translation when audio is in expected languages
- GTK4 overlay window with scrollable translation history
- Customizable appearance (colors, fonts, opacity)
- Custom translation prompts for specialized terminology
- Performance tuning - adjust speed/accuracy tradeoff
- Per-session logging - separate log file for each session
- AI Assistant with question detection and contextual help
- Settings saved to config file for persistence
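The custom translation prompt mentioned above is, in effect, a template with placeholders. As a sketch only (the placeholder names `{target_language}` and `{text}` are illustrative, not necessarily the application's actual ones), such a prompt might look like:

```python
# Hypothetical prompt template; placeholder names are illustrative,
# not necessarily the ones the application uses.
PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}. "
    "Keep technical terms such as 'PipeWire' and 'CUDA' untranslated. "
    "Reply with the translation only.\n\n{text}"
)

prompt = PROMPT_TEMPLATE.format(target_language="Russian", text="Hello, world")
```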
- Linux with PipeWire or PulseAudio
- Python 3.10+
- GTK4
- Ollama (for local translation)
# Ubuntu/Debian
sudo apt-get install -y \
libgirepository-2.0-dev \
gcc \
libcairo2-dev \
pkg-config \
python3-dev \
gir1.2-gtk-4.0 \
libpulse-dev \
portaudio19-dev
# For PipeWire (usually pre-installed)
# Ensure wpctl and pw-record are available

cd live_translator
python3 -m venv venv
source venv/bin/activate
# Install core dependencies
pip install faster-whisper ollama PyGObject pyaudio
# Optional: for CUDA support (GPU acceleration)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Optional: for speaker diarization
pip install pyannote.audio

To activate the virtual environment in future sessions:
source venv/bin/activate

Note: The start.sh script automatically uses the venv, so manual activation is only needed for development.
# Install Ollama (https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model for translation
ollama pull mistral:7b

To launch with defaults:

./start.sh

Usage:

./start.sh [options]
Options:
-w, --whisper-model Whisper model: tiny, base, small, medium, large-v2, large-v3
(default: base)
-m, --ollama-model Ollama model for translation (default: mistral:7b)
-s, --source-language Source language code (default: en)
-t, --target-language Target language name (default: Russian)
-d, --device Device for Whisper: cpu, cuda (default: cpu)
--compute-type Compute type: int8, float16, float32 (default: int8)
# Default (English to Russian, base model)
./start.sh
# Use smaller/faster model
./start.sh -w tiny
# Use larger model for better accuracy
./start.sh -w small
# Translate to German
./start.sh -t German
# Use GPU (requires CUDA and cuDNN)
./start.sh -d cuda

Click the gear icon in the window header to open settings:
- Window opacity
- Background color
- Original/translated text colors
- Font sizes
- Whisper model size
- Device (CPU/CUDA)
- Compute type
- Source language
- Speaker diarization (local, no API key needed)
- Performance tuning (beam size, silence duration, etc.)
- Auto language detection with expected languages list
- Provider (Ollama, OpenAI, Anthropic)
- Model name
- Target language
- Custom prompt template
- Enable/disable translation logging
- Custom log directory
- Log original and translated text separately
- Enable/disable AI assistant
- Auto-detect and answer questions from transcription
- Context-based summaries and learning help
Settings are saved to ~/.config/live-translator/settings.json
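A minimal sketch of reading that file, assuming it is plain JSON; the key names and default values below are hypothetical, not the application's actual schema:

```python
import json
from pathlib import Path

SETTINGS_PATH = Path.home() / ".config" / "live-translator" / "settings.json"

def load_settings(path=SETTINGS_PATH):
    """Return saved settings merged over defaults; use defaults if the file is missing."""
    defaults = {"target_language": "Russian", "whisper_model": "base"}  # hypothetical keys
    try:
        saved = json.loads(Path(path).read_text())
    except FileNotFoundError:
        return defaults
    return {**defaults, **saved}
```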
For optimal performance with different VRAM constraints:
- tiny (39M) - Fastest, lowest quality, ~1GB
- base (74M) - Good balance, ~2GB - RECOMMENDED for CPU
- small (244M) - Better accuracy, ~2.5GB
- medium (769M) - High accuracy, ~4GB - RECOMMENDED for CUDA
- large-v3 (1.5B) - Best accuracy, ~8GB - For CUDA with 8GB+ VRAM
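The approximate memory figures above can be turned into a small helper for picking the largest model that fits a given VRAM budget. This is just a sketch over the numbers listed, not part of the application:

```python
# Approximate memory needs (GB) taken from the model list above.
WHISPER_VRAM_GB = {"tiny": 1, "base": 2, "small": 2.5, "medium": 4, "large-v3": 8}

def largest_fitting_model(vram_gb):
    """Pick the most accurate Whisper model that fits the given memory budget."""
    fitting = [(need, name) for name, need in WHISPER_VRAM_GB.items() if need <= vram_gb]
    if not fitting:
        raise ValueError("No Whisper model fits in %.1f GB" % vram_gb)
    return max(fitting)[1]
```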
For CUDA with 8GB VRAM:
| Model | Size | Speed | Accuracy | Notes |
|---|---|---|---|---|
| mistral:7b | 4.4GB | ⚡⚡⚡ | ⭐⭐⭐ | ✅ BEST - Fast & reliable |
| neural-chat:7b | 4.1GB | ⚡⚡⚡ | ⭐⭐⭐ | ✅ Optimized for chat |
| llama2:7b | 3.8GB | ⚡⚡⚡ | ⭐⭐⭐ | ✅ Good alternative |
| llama3:8b | 4.7GB | ⚡⚡ | ⭐⭐⭐⭐ | ✅ Better accuracy |
| openhermes-2.5:7b | 4.0GB | ⚡⚡⚡ | ⭐⭐⭐ | ✅ EXCELLENT |
For CPU (slower, but works):
- mistral:7b (4.4GB) - Best for CPU
- llama2:7b (3.8GB) - Good alternative, smaller
Download recommended models:
# Best for translation - fastest and reliable
ollama pull mistral:7b
# Alternatives with good quality
ollama pull neural-chat:7b
ollama pull llama3:8b
ollama pull llama2:7b

Models NOT recommended:
- ❌ codellama:* - Specialized for code, not translation
- ❌ Models > 8GB - Will not fit in 8GB VRAM
- ❌ deepseek-r1:7b - Slow, oriented to reasoning not translation
live_translator/
├── main.py # Main application entry point
├── overlay.py # GTK4 overlay window
├── audio_capture.py # System audio capture (PipeWire/PulseAudio)
├── mic_capture.py # Microphone audio capture
├── transcriber.py # Speech-to-text with Faster-Whisper
├── translator.py # Translation via Ollama/OpenAI/Anthropic
├── tts_engine.py # Text-to-speech synthesis
├── virtual_output.py # Virtual audio output for TTS
├── reverse_mode.py # Reverse translation mode (speech-to-speech)
├── qa_assistant.py # AI assistant for Q&A
├── logger.py # Session logging
├── settings.py # Settings management
├── settings_dialog.py # Settings UI dialog
└── start.sh # Launch script
- Ensure PipeWire or PulseAudio is running
- Check that pw-record or parec is available
- Verify audio is playing from a source
First run will download the Whisper model (~150MB for base). This may take a few minutes.
If you see "CUDA out of memory" errors:
- Reduce model size:
  ./start.sh -w base -d cuda
- Switch to CPU:
  ./start.sh -d cpu
- Use a smaller translation model:
  - In Settings → Translation, select a smaller model (4-5GB)
  - Recommended: neural-chat:7b or mistral:7b
- Disable speaker diarization:
  - In Settings → Transcription, turn off "Enable Speaker Detection"
  - Diarization requires extra VRAM
- Reduce compute precision:
  - In Settings → Transcription, change "Compute Type" to int8 (uses less memory)
Ensure CUDA and cuDNN are properly installed:
# Check CUDA installation
nvidia-smi
# Install CUDA (Ubuntu/Debian)
sudo apt-get install cuda-toolkit

If translation is not working:

- Ensure Ollama is running: ollama serve
- Check that the model is pulled: ollama list
- Try a different model in Settings → Translation
- Check "Expected Languages" setting matches your content
- If auto-detect is on, it may skip translation if language matches target
For faster transcription:

- Settings → Transcription → Beam Size: reduce to 1-2
- Settings → Transcription → Min Silence Duration: reduce to 200ms
- Whisper model: use base instead of small

For better accuracy:

- Increase Beam Size (3-5)
- Use a larger Whisper model (small or medium)
- Use a better translation model (llama3:8b instead of mistral:7b)
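The two tuning directions above could be captured as presets. The parameter names mirror the settings labels but are hypothetical as code, and only values stated in this document are filled in:

```python
# Hypothetical presets mirroring the Settings → Transcription labels above.
TUNING_PRESETS = {
    "fast": {"beam_size": 1, "min_silence_ms": 200, "whisper_model": "base"},
    "accurate": {"beam_size": 5, "whisper_model": "medium"},
}
```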
MIT License