Offline speaker diarization using ONNX models (pyannote-segmentation-3.0).
| Model | Description |
|---|---|
| model.onnx | Base segmentation model from onnx-community/pyannote-segmentation-3.0 |
| model_with_embedding.onnx | Extended version with speaker embeddings as an additional output (generated via speech_embedding_export.py) |

Outputs detected speakers with timestamps and confidence scores:

    uv run python speech_diarizer.py

Automatically downloads the model and sample audio (mlk.wav), then prints segments like:

    SPEAKER_01 0.37s - 2.84s (conf=0.951)
    SPEAKER_02 2.84s - 5.21s (conf=0.876)

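If the printed segments need to be consumed by another script, they can be parsed back into structured values. The sketch below is only an illustration: it assumes every segment line follows exactly the layout of the sample output above, and the names `parse_segments` and `speaking_time` are hypothetical helpers, not part of speech_diarizer.py.

```python
import re
from collections import defaultdict

# Assumed line layout, copied from the sample output:
#   SPEAKER_01 0.37s - 2.84s (conf=0.951)
SEGMENT_RE = re.compile(
    r"^(?P<speaker>\S+)\s+"
    r"(?P<start>\d+(?:\.\d+)?)s\s+-\s+"
    r"(?P<end>\d+(?:\.\d+)?)s\s+"
    r"\(conf=(?P<conf>\d+(?:\.\d+)?)\)$"
)


def parse_segments(text):
    """Return a list of (speaker, start, end, conf) tuples from printed output."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not a segment
        segments.append(
            (
                match.group("speaker"),
                float(match.group("start")),
                float(match.group("end")),
                float(match.group("conf")),
            )
        )
    return segments


def speaking_time(segments):
    """Sum the duration in seconds of all segments per speaker label."""
    totals = defaultdict(float)
    for speaker, start, end, _conf in segments:
        totals[speaker] += end - start
    return dict(totals)


sample = """SPEAKER_01 0.37s - 2.84s (conf=0.951)
SPEAKER_02 2.84s - 5.21s (conf=0.876)"""

segments = parse_segments(sample)
totals = speaking_time(segments)
```

Lines that do not match the pattern are skipped rather than raising, so log messages mixed into the output would not break the parse.
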
Extract per-segment speaker embeddings alongside timestamps:

    uv run python speech_embedding.py

Output includes embedding dimensions for each segment, useful for downstream clustering or verification.

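As a rough illustration of the verification use case, two segment embeddings can be compared with cosine similarity. This is a minimal sketch under assumptions: embeddings are treated as plain lists of floats, the three-dimensional vectors are made-up toy values, and the `0.7` threshold is an arbitrary placeholder that would need tuning on real data. None of these names come from speech_embedding.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.7):
    """Hypothetical decision rule: similar enough means same speaker."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings for three segments (illustrative values only).
seg_a = [0.9, 0.1, 0.0]
seg_b = [0.8, 0.2, 0.1]
seg_c = [0.0, 0.1, 0.9]

match_ab = same_speaker(seg_a, seg_b)
match_ac = same_speaker(seg_a, seg_c)
```

With these toy values, `seg_a` and `seg_b` point in nearly the same direction and pass the threshold, while `seg_a` and `seg_c` do not. For clustering, the same similarity function could serve as the pairwise metric.
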
Re-exports the base ONNX model to include the LeakyRelu activation (speaker embeddings) as a graph output:

    uv run python speech_embedding_export.py
    # Produces: model_with_embedding.onnx

Dependencies are managed via uv:

    pip install uv  # if not already installed
    uv sync         # installs dependencies from pyproject.toml