
Palabra AI Benchmarking Guide

Comprehensive benchmarking and performance analysis tool for Palabra AI translation pipelines.

Overview

The benchmark module provides detailed metrics and analysis for translation quality, latency, and performance. It processes audio files through the full translation pipeline while collecting extensive telemetry data.

Key Features:

  • ⚡ End-to-end latency measurements
  • 📊 Detailed performance metrics
  • 🎯 Translation quality analysis
  • 💾 Complete trace data export
  • 📝 Comprehensive JSON reports
  • 🔊 Debug audio output (input + output comparison)

Quick Start

Three Ways to Run Benchmarks

1. Using Docker (Recommended)

# One-time setup
cp .env.example .env  # Add your PALABRA_CLIENT_ID and PALABRA_CLIENT_SECRET
make build

# Run benchmark
make bench -- examples/speech/en.mp3 en es --out ./results

2. Using uv (Direct)

uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es --out ./results

3. Using Make (Local)

make bench -- examples/speech/en.mp3 en es --out ./results

Note: When using make bench, you must use -- to separate make options from benchmark arguments.

Usage

Basic Syntax

# With language arguments
<command> <audio_file> <source_lang> <target_lang> [options]

# With config file
<command> <audio_file> --config <config.json> [options]

Arguments

Argument         Required     Description
audio_file       Yes          Path to input audio file (any format supported by FFmpeg)
source_lang      Conditional  Source language code (required without --config)
target_lang      Conditional  Target language code (required without --config)
--config <file>  No           Load full configuration from JSON file
--out <dir>      No           Output directory for results (default: console only)

Language Codes

Use standard language codes: en, es, fr, de, ru, ja, zh, etc.

See the main README for the complete language list.

Examples

Simple Language Pair

# English to Spanish
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es --out ./bench_results

# Arabic to English
uv run python -m palabra_ai.benchmark examples/speech/ar.mp3 ar en --out ./bench_results

Using Configuration File

# With custom config
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 \
  --config examples/benchmark_config.json \
  --out ./bench_results

Docker Examples

# Simple benchmark with Docker
make bench -- examples/speech/es.mp3 es en --out ./results

# With config file
make bench -- examples/speech/nbc.wav --config examples/benchmark_config.json --out ./results

Console-Only Output

# Skip file output, print report to console
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es

Output Files

When --out is specified, the benchmark creates a timestamped set of files:

Generated Files

File                                    Description
<timestamp>_bench_report.json           Complete analysis report with metrics
<timestamp>_bench_trace.json            Full pipeline trace data
<timestamp>_bench_config.json           Configuration used for the benchmark
<timestamp>_bench_sysinfo.json          System information and environment
<timestamp>_bench.log                   Detailed debug logs
<timestamp>_bench_in_<lang>.wav         Input audio (preprocessed)
<timestamp>_bench_out_<lang>.wav        Output audio (translated)
<timestamp>_bench_runresult_debug.json  Runtime result data
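
Because every file from a run shares one timestamp prefix, a run's artifacts can be collected programmatically. A minimal sketch in Python (assuming only the naming scheme from the table above):

from pathlib import Path

out = Path("./results")

# Report names sort chronologically because the prefix is a timestamp.
latest_report = sorted(out.glob("*_bench_report.json"))[-1]

# Strip the suffix to recover the shared "<timestamp>_bench" prefix.
prefix = latest_report.name.removesuffix("_report.json")

# List everything produced by that run.
for path in sorted(out.glob(prefix + "*")):
    print(path.name)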

Report Contents

The JSON report includes:

  • Latency metrics: End-to-end, ASR, translation, TTS timings
  • Sentence analysis: Per-sentence breakdown with timestamps
  • Quality metrics: Translation validation, confidence scores
  • Performance data: Processing speed, queue levels, tempo adjustments
  • System info: Hardware, OS, Python version, library versions

Configuration

Using Config Files

Create a JSON config file with your pipeline settings:

{
  "pipeline": {
    "transcription": {
      "source_language": "en",
      "segment_confirmation_silence_threshold": 0.7,
      "sentence_splitter": {
        "enabled": true
      }
    },
    "translations": [
      {
        "target_language": "es",
        "speech_generation": {
          "voice_cloning": false,
          "voice_id": "default_low"
        }
      }
    ],
    "translation_queue_configs": {
      "global": {
        "desired_queue_level_ms": 5000,
        "max_queue_level_ms": 20000,
        "auto_tempo": true
      }
    }
  }
}

Then run:

uv run python -m palabra_ai.benchmark audio.wav --config config.json --out ./results
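
Since the config is plain JSON, it can also be generated in code, which is handy for sweeping over settings. A sketch that writes the example config above using only the standard library:

import json
from pathlib import Path

# Mirrors the example configuration shown above.
config = {
    "pipeline": {
        "transcription": {
            "source_language": "en",
            "segment_confirmation_silence_threshold": 0.7,
            "sentence_splitter": {"enabled": True},
        },
        "translations": [
            {
                "target_language": "es",
                "speech_generation": {
                    "voice_cloning": False,
                    "voice_id": "default_low",
                },
            }
        ],
        "translation_queue_configs": {
            "global": {
                "desired_queue_level_ms": 5000,
                "max_queue_level_ms": 20000,
                "auto_tempo": True,
            }
        },
    }
}

Path("config.json").write_text(json.dumps(config, indent=2))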

Environment Variables

Set credentials via environment variables:

export PALABRA_CLIENT_ID=your_client_id
export PALABRA_CLIENT_SECRET=your_client_secret

Or use .env file:

PALABRA_CLIENT_ID=your_client_id
PALABRA_CLIENT_SECRET=your_client_secret
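
Before a long run, it can be worth failing fast on missing credentials. A hypothetical pre-flight check (the benchmark itself reads these variables; this only verifies they are set):

import os
import sys

# Fail fast if either credential is missing from the environment.
missing = [
    name
    for name in ("PALABRA_CLIENT_ID", "PALABRA_CLIENT_SECRET")
    if not os.environ.get(name)
]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")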

Benchmark Features

Automatic Settings

  • Forces a 100ms chunk duration for optimal performance
  • Enables all message types for complete data collection
  • Captures full trace data and debug logs
  • Generates a progress bar with real-time status

Progress Tracking

The benchmark shows real-time progress:

Processing en→es: 45%|████████████▌             | [00:23<00:28]

Error Handling

  • Saves partial results on errors
  • Captures the full traceback
  • Exports debug audio for analysis
  • Writes detailed error logs

Docker Workflow

Setup (One Time)

# Copy environment template
cp .env.example .env

# Edit .env and add credentials
# PALABRA_CLIENT_ID=...
# PALABRA_CLIENT_SECRET=...

# Build image
make build

Run Benchmarks

# Update code (no rebuild needed)
git pull

# Run benchmark
make bench -- examples/speech/en.mp3 --config examples/benchmark_config.json --out ./results

# Results saved to host filesystem
ls -la results/

Docker Features

✅ No rebuild on code changes - Project directory is mounted as volume
✅ Fast startup - .venv is cached in Docker volume
✅ File sharing - Config files and results shared between host and container
✅ Clean environment - Isolated Python environment with all dependencies

Troubleshooting

Dependencies Changed?

Rebuild the Docker image:

make rebuild

Permission Issues?

The container runs as your user, so output files should have the correct permissions.

Can't Find Audio Files?

Make sure paths are relative to the project root (the working directory inside the container is /workspace).

Benchmark Fails?

Check the debug files in the output directory:

  • *_bench_error.txt - Error details and traceback
  • *_bench.log - Full debug logs
  • *_bench_runresult_debug.json - Runtime state
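
To skim a failed run without opening each file by hand, a small sketch (assuming the file names above):

from pathlib import Path

out = Path("./results")

# Print every error report found in the output directory.
for err in sorted(out.glob("*_bench_error.txt")):
    print(f"=== {err.name} ===")
    print(err.read_text())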

Audio Processing Issues?

Ensure FFmpeg is installed and the audio file is valid:

ffprobe your_audio.wav

No Output Files?

Use the --out flag to enable file output:

uv run python -m palabra_ai.benchmark audio.wav en es --out ./results

Advanced Usage

Analyzing Results

Parse the JSON report programmatically:

import json
from pathlib import Path

report = json.loads(Path("results/20241010_150000_bench_report.json").read_text())

# Extract metrics
avg_latency = report["average_latency_ms"]
sentences = report["sentences"]

for s in sentences:
    print(f"Sentence {s['index']}: {s['latency_ms']}ms")

Batch Processing

Run multiple benchmarks:

#!/bin/bash
for audio in examples/speech/*.wav; do
    uv run python -m palabra_ai.benchmark "$audio" en es --out "./results/$(basename $audio .wav)"
done

Custom Analysis

The trace file (*_bench_trace.json) contains complete pipeline data:

  • All messages exchanged
  • Timing information
  • Audio buffers
  • Configuration state
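
The exact trace schema is not documented here, so a first pass is usually exploratory. A sketch that prints the top-level shape of the file before committing to any analysis:

import json
from pathlib import Path

trace = json.loads(Path("results/20241010_150000_bench_trace.json").read_text())

# Inspect the top-level structure without assuming a schema.
if isinstance(trace, dict):
    for key, value in trace.items():
        size = len(value) if isinstance(value, (list, dict)) else "-"
        print(f"{key}: {type(value).__name__} ({size})")
elif isinstance(trace, list):
    print(f"list of {len(trace)} entries")
    if trace and isinstance(trace[0], dict):
        print("first entry keys:", sorted(trace[0]))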

See Also