Comprehensive benchmarking and performance analysis tool for Palabra AI translation pipelines.
The benchmark module provides detailed metrics and analysis for translation quality, latency, and performance. It processes audio files through the full translation pipeline while collecting extensive telemetry data.
Key Features:
- ⚡ End-to-end latency measurements
- 📊 Detailed performance metrics
- 🎯 Translation quality analysis
- 💾 Complete trace data export
- 📝 Comprehensive JSON reports
- 🔊 Debug audio output (input + output comparison)
```bash
# One-time setup
cp .env.example .env  # Add your PALABRA_CLIENT_ID and PALABRA_CLIENT_SECRET
make build

# Run benchmark
make bench -- examples/speech/en.mp3 en es --out ./results
```

Or run directly with uv:

```bash
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es --out ./results
```

Note: when using `make bench`, you must use `--` to separate make options from benchmark arguments.
```bash
# With language arguments
<command> <audio_file> <source_lang> <target_lang> [options]

# With config file
<command> <audio_file> --config <config.json> [options]
```

| Argument | Required | Description |
|---|---|---|
| `audio_file` | Yes | Path to input audio file (any format supported by FFmpeg) |
| `source_lang` | Conditional | Source language code (required without `--config`) |
| `target_lang` | Conditional | Target language code (required without `--config`) |
| `--config <file>` | No | Load full configuration from JSON file |
| `--out <dir>` | No | Output directory for results (default: console only) |
Use standard language codes: en, es, fr, de, ru, ja, zh, etc.
See main README for complete language list.
```bash
# English to Spanish
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es --out ./bench_results

# Arabic to English
uv run python -m palabra_ai.benchmark examples/speech/ar.mp3 ar en --out ./bench_results

# With custom config
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 \
  --config examples/benchmark_config.json \
  --out ./bench_results

# Simple benchmark with Docker
make bench -- examples/speech/es.mp3 es en --out ./results

# With config file
make bench -- examples/speech/nbc.wav --config examples/benchmark_config.json --out ./results

# Skip file output, print report to console
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es
```

When `--out` is specified, the benchmark creates a timestamped set of files:
| File | Description |
|---|---|
| `<timestamp>_bench_report.json` | Complete analysis report with metrics |
| `<timestamp>_bench_trace.json` | Full pipeline trace data |
| `<timestamp>_bench_config.json` | Configuration used for benchmark |
| `<timestamp>_bench_sysinfo.json` | System information and environment |
| `<timestamp>_bench.log` | Detailed debug logs |
| `<timestamp>_bench_in_<lang>.wav` | Input audio (preprocessed) |
| `<timestamp>_bench_out_<lang>.wav` | Output audio (translated) |
| `<timestamp>_bench_runresult_debug.json` | Runtime result data |
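Because output filenames are timestamp-prefixed, lexicographic order matches chronological order. A small helper for locating the most recent report (the helper name and directory are illustrative, not part of the benchmark API):

```python
from pathlib import Path
from typing import Optional

def latest_report(results_dir: str) -> Optional[Path]:
    """Return the newest *_bench_report.json in results_dir, or None.

    Relies on the <timestamp>_ prefix so that sorted() order is chronological.
    """
    reports = sorted(Path(results_dir).glob("*_bench_report.json"))
    return reports[-1] if reports else None
```

For example, `latest_report("./bench_results")` returns the report from the most recent run.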
The JSON report includes:
- Latency metrics: End-to-end, ASR, translation, TTS timings
- Sentence analysis: Per-sentence breakdown with timestamps
- Quality metrics: Translation validation, confidence scores
- Performance data: Processing speed, queue levels, tempo adjustments
- System info: Hardware, OS, Python version, library versions
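Based on the fields used in the analysis snippet later in this README, a report has at least the following shape (values illustrative; the full report contains more fields than shown here):

```json
{
  "average_latency_ms": 1143.6,
  "sentences": [
    {"index": 0, "latency_ms": 1100.0},
    {"index": 1, "latency_ms": 1350.2}
  ]
}
```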
Create a JSON config file with your pipeline settings:

```json
{
  "pipeline": {
    "transcription": {
      "source_language": "en",
      "segment_confirmation_silence_threshold": 0.7,
      "sentence_splitter": {
        "enabled": true
      }
    },
    "translations": [
      {
        "target_language": "es",
        "speech_generation": {
          "voice_cloning": false,
          "voice_id": "default_low"
        }
      }
    ],
    "translation_queue_configs": {
      "global": {
        "desired_queue_level_ms": 5000,
        "max_queue_level_ms": 20000,
        "auto_tempo": true
      }
    }
  }
}
```

Then run:

```bash
uv run python -m palabra_ai.benchmark audio.wav --config config.json --out ./results
```

Set credentials via environment variables:
```bash
export PALABRA_CLIENT_ID=your_client_id
export PALABRA_CLIENT_SECRET=your_client_secret
```

Or use a `.env` file:

```
PALABRA_CLIENT_ID=your_client_id
PALABRA_CLIENT_SECRET=your_client_secret
```

During a run, the benchmark automatically:

- Forces 100ms chunk duration for optimal performance
- Enables all message types for complete data collection
- Captures full trace data and debug logs
- Shows a progress bar with real-time status
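Before launching a run, you can verify that credentials are present. This is a pre-flight sketch using only `os.environ`; the helper name is illustrative, and the benchmark itself reads these variables directly:

```python
import os

def credentials_present(env=os.environ) -> bool:
    """True if both Palabra credentials are set in the environment."""
    return bool(env.get("PALABRA_CLIENT_ID") and env.get("PALABRA_CLIENT_SECRET"))

if not credentials_present():
    print("Set PALABRA_CLIENT_ID and PALABRA_CLIENT_SECRET before running.")
```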
The benchmark shows real-time progress:

```
Processing en→es: 45%|████████████▌ | [00:23<00:28]
```
If a run fails, the benchmark:

- Saves partial results
- Captures the full traceback
- Exports debug audio for analysis
- Writes detailed error logs
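Since a failed run writes a `*_bench_error.txt` file (see the troubleshooting list below), a batch script can detect failures by checking for it. A minimal sketch; the helper name is illustrative:

```python
from pathlib import Path

def run_failed(out_dir: str) -> bool:
    """A benchmark run is considered failed if any *_bench_error.txt exists."""
    return any(Path(out_dir).glob("*_bench_error.txt"))
```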
```bash
# Copy environment template
cp .env.example .env

# Edit .env and add credentials
# PALABRA_CLIENT_ID=...
# PALABRA_CLIENT_SECRET=...

# Build image
make build
```

Update code and run (no rebuild needed):

```bash
git pull

# Run benchmark
make bench -- --config examples/benchmark_config.json --out ./results examples/speech/en.mp3

# Results saved to host filesystem
ls -la results/
```

✅ No rebuild on code changes - Project directory is mounted as a volume
✅ Fast startup - .venv is cached in Docker volume
✅ File sharing - Config files and results shared between host and container
✅ Clean environment - Isolated Python environment with all dependencies
Rebuild the Docker image:

```bash
make rebuild
```

The container runs as your user, so output files should have correct permissions.
Make sure paths are relative to the project root (the working directory inside the container is /workspace).
Check the debug files in the output directory:

- `*_bench_error.txt` - Error details and traceback
- `*_bench.log` - Full debug logs
- `*_bench_runresult_debug.json` - Runtime state
Ensure FFmpeg is installed and the audio file is valid:

```bash
ffprobe your_audio.wav
```

Use the `--out` flag to enable file output:

```bash
uv run python -m palabra_ai.benchmark audio.wav en es --out ./results
```

Parse the JSON report programmatically:
```python
import json
from pathlib import Path

report = json.loads(Path("results/20241010_150000_bench_report.json").read_text())

# Extract metrics
avg_latency = report["average_latency_ms"]
sentences = report["sentences"]

for s in sentences:
    print(f"Sentence {s['index']}: {s['latency_ms']}ms")
```

Run multiple benchmarks:

```bash
#!/bin/bash
for audio in examples/speech/*.wav; do
  uv run python -m palabra_ai.benchmark "$audio" en es --out "./results/$(basename $audio .wav)"
done
```

The trace file (`*_bench_trace.json`) contains complete pipeline data:
- All messages exchanged
- Timing information
- Audio buffers
- Configuration state
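The trace schema is not documented in this README, so a first step when exploring it is simply to inspect the top-level structure. A sketch (the helper name and path are illustrative):

```python
import json
from pathlib import Path

def trace_summary(trace):
    """Describe the top-level shape of a loaded trace object."""
    if isinstance(trace, dict):
        return sorted(trace.keys())
    if isinstance(trace, list):
        return f"list of {len(trace)} entries"
    return type(trace).__name__

# Example usage (path illustrative):
# trace = json.loads(Path("results/20241010_150000_bench_trace.json").read_text())
# print(trace_summary(trace))
```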
- Main README - SDK documentation
- Installation Guide - Setup instructions
- Examples - Sample code and configs