Open
34 commits
a8b1d1c
feat(transcribe): add speaker diarization support
basnijholt Jan 10, 2026
be3ad09
chore: let pyannote-audio manage torch dependency
basnijholt Jan 10, 2026
07d722b
fix: use 'token' instead of deprecated 'use_auth_token' for pyannote
basnijholt Jan 10, 2026
ecea290
docs: add all required model licenses and token permission info
basnijholt Jan 10, 2026
441c6dc
fix: pre-load audio with torchaudio to avoid torchcodec/FFmpeg issues
basnijholt Jan 10, 2026
0465090
fix: handle new DiarizeOutput API from pyannote-audio
basnijholt Jan 10, 2026
17fd7bf
fix: show all required model URLs on gated repo access error
basnijholt Jan 10, 2026
ffde208
Merge remote-tracking branch 'origin/main' into feat/speaker-diarization
basnijholt Feb 5, 2026
a7c6a64
Merge ffde2087db7f016f9c7e08ee10a3c22b53891944 into 7e4948c9861e7ffc4…
basnijholt Feb 5, 2026
a0ca401
Update auto-generated docs
github-actions[bot] Feb 5, 2026
4698867
feat(transcribe): add speaker diarization with wav2vec2 alignment
basnijholt Feb 5, 2026
f22bd24
Merge 46988673ce58e72f7c74c210c4b4a50ac995dbc3 into 7e4948c9861e7ffc4…
basnijholt Feb 5, 2026
f279564
Update auto-generated docs
github-actions[bot] Feb 5, 2026
7c47c95
fix(diarization): use first-party imports at top level and simplify p…
basnijholt Feb 5, 2026
48312c5
refactor(transcribe): extract diarization logic into helper function
basnijholt Feb 5, 2026
9a24e47
Merge 48312c5b010478eb76770fa2176ec9b913e83336 into 7e4948c9861e7ffc4…
basnijholt Feb 5, 2026
806000b
Update auto-generated docs
github-actions[bot] Feb 5, 2026
72ab43d
feat(diarization): add Literal type for diarize_format and alignment …
basnijholt Feb 5, 2026
fff4f74
Improve diarization alignment and robustness
basnijholt Feb 5, 2026
8d1d141
Merge fff4f74e95754271780f2e336fbc5bf2576b0a96 into 7e4948c9861e7ffc4…
basnijholt Feb 5, 2026
76acace
Update auto-generated docs
github-actions[bot] Feb 5, 2026
33449b9
Clamp diarization bounds to window
basnijholt Feb 5, 2026
e4bd223
Merge branch 'feat/speaker-diarization' of github.com:basnijholt/agen…
basnijholt Feb 5, 2026
275e545
feat(dev): add --force flag to `dev clean` command (#414)
basnijholt Feb 5, 2026
a7b886e
Add beam search backtracking and wildcard emissions to CTC alignment
basnijholt Feb 5, 2026
675f14d
Match WhisperX CTC alignment behavior and remove dead code
basnijholt Feb 5, 2026
03045e5
Fix alignment edge cases and simplify speaker assignment
basnijholt Feb 6, 2026
a1271e2
Clean up alignment code and add missing test coverage
basnijholt Feb 6, 2026
c49f5e4
Fix padding duration bug and improve test coverage
basnijholt Feb 6, 2026
7f4324d
Add diarization extra to install-extras help text
basnijholt Feb 6, 2026
b35c23c
Add diarization extra to CI test matrix
basnijholt Feb 6, 2026
9e32789
Merge b35c23c60e4a872bc5576b25d20288cd099562c1 into 275e545cdcc05d5b8…
basnijholt Feb 6, 2026
f943744
Update auto-generated docs
github-actions[bot] Feb 6, 2026
9194ddb
Add missing 'it' language to --align-language help text
basnijholt Feb 6, 2026
1 change: 1 addition & 0 deletions .github/scripts/sync_extras.py
@@ -37,6 +37,7 @@
"rag": ("RAG proxy (ChromaDB, embeddings)", ["chromadb", "pydantic_ai"]),
"memory": ("Long-term memory proxy", ["chromadb", "yaml", "pydantic_ai"]),
"vad": ("Voice Activity Detection (Silero VAD via ONNX)", ["onnxruntime"]),
+"diarization": ("Speaker diarization (pyannote.audio)", ["pyannote.audio"]),
"whisper": ("Local Whisper ASR (faster-whisper)", ["faster_whisper"]),
"whisper-mlx": ("MLX Whisper for Apple Silicon", ["mlx_whisper"]),
"tts": ("Local Piper TTS", ["piper"]),
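The registry in this hunk maps each extra to a description and a list of importable module names. A minimal sketch of how such a mapping could be used to check whether an extra is installed follows; the `EXTRAS` subset and the helper names `module_available` and `extra_installed` are hypothetical and not taken from `sync_extras.py`.

```python
import importlib.util

# Hypothetical subset mirroring the shape shown in the hunk:
# extra name -> (description, [importable module names]).
EXTRAS = {
    "vad": ("Voice Activity Detection (Silero VAD via ONNX)", ["onnxruntime"]),
    "diarization": ("Speaker diarization (pyannote.audio)", ["pyannote.audio"]),
}


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # "pyannote.audio" raises ModuleNotFoundError (an ImportError)
        # when the parent package "pyannote" is not installed.
        return False


def extra_installed(extra: str) -> bool:
    """Treat an extra as installed only if all of its modules resolve."""
    _description, modules = EXTRAS[extra]
    return all(module_available(m) for m in modules)
```

Catching `ImportError` matters for dotted names such as `pyannote.audio`, because the lookup fails with an exception instead of returning `None` when the parent package is missing.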
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -33,7 +33,7 @@ jobs:
run: uv run --all-extras pytest -vvv
- name: Run pytest (non-macOS - exclude mlx-whisper)
if: matrix.os != 'macos-latest'
-        run: uv run --extra audio --extra llm --extra rag --extra memory --extra vad --extra faster-whisper --extra piper --extra kokoro --extra server --extra speed --extra test pytest -vvv
+        run: uv run --extra audio --extra diarization --extra llm --extra rag --extra memory --extra vad --extra faster-whisper --extra piper --extra kokoro --extra server --extra speed --extra test pytest -vvv
- name: Upload coverage reports to Codecov
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13'
uses: codecov/codecov-action@v5
49 changes: 44 additions & 5 deletions README.md
@@ -421,6 +421,7 @@ agent-cli install-extras rag memory vad
Available extras:

• audio - Audio recording/playback
+• diarization - Speaker diarization (pyannote.audio)
• faster-whisper - Whisper ASR via CTranslate2
• kokoro - Kokoro neural TTS (GPU)
• llm - LLM framework (pydantic-ai)
@@ -444,9 +445,9 @@ agent-cli install-extras rag memory vad


╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
-│ extras [EXTRAS]... Extras to install: audio, faster-whisper, kokoro, llm, │
-│ memory, mlx-whisper, piper, rag, server, speed, vad, │
-│ whisper-transformers, wyoming │
+│ extras [EXTRAS]... Extras to install: audio, diarization, faster-whisper, │
+│ kokoro, llm, memory, mlx-whisper, piper, rag, server, │
+│ speed, vad, whisper-transformers, wyoming │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --list -l Show available extras with descriptions (what each one enables) │
@@ -730,7 +731,7 @@ the `[defaults]` section of your configuration file.
│ --llm --no-llm Clean up transcript with LLM: fix errors, │
│ add punctuation, remove filler words. Uses │
│ --extra-instructions if set (via CLI or │
-│ config file). │
+│ config file). Not compatible with --diarize. │
│ [default: no-llm] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Recovery ───────────────────────────────────────────────────────────────────────╮
@@ -852,6 +853,44 @@ the `[defaults]` section of your configuration file.
│ provide context for │
│ LLM cleanup. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Diarization ──────────────────────────────────────────────────────────────────────────╮
+│ --diarize --no-diarize Enable speaker diarization │
+│ (requires pyannote-audio). │
+│ Install with: pip install │
+│ agent-cli[diarization] │
+│ [default: no-diarize] │
+│ --diarize-format [inline|json] Output format for diarization │
+│ ('inline' for [Speaker N]: │
+│ text, 'json' for structured │
+│ output). │
+│ [default: inline] │
+│ --hf-token TEXT HuggingFace token for pyannote │
+│ models. Required for │
+│ diarization. Token must have │
+│ 'Read access to contents of all │
+│ public gated repos you can │
+│ access' permission. Accept │
+│ licenses at: │
+│ https://hf.co/pyannote/speaker… │
+│ https://hf.co/pyannote/segment… │
+│ https://hf.co/pyannote/wespeak… │
+│ [env var: HF_TOKEN] │
+│ --min-speakers INTEGER Minimum number of speakers │
+│ (optional hint for │
+│ diarization). │
+│ --max-speakers INTEGER Maximum number of speakers │
+│ (optional hint for │
+│ diarization). │
+│ --align-words --no-align-words Use wav2vec2 forced alignment │
+│ for word-level speaker │
+│ assignment (more accurate but │
+│ slower). │
+│ [default: no-align-words] │
+│ --align-language TEXT Language code for word │
+│ alignment model (e.g., 'en', │
+│ 'fr', 'de', 'es', 'it'). │
+│ [default: en] │
+╰────────────────────────────────────────────────────────────────────────────────────────╯

```
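The `--diarize-format` help text describes two output shapes: `inline` (`[Speaker N]: text`) and `json` (structured output). The sketch below illustrates the difference under stated assumptions: the `Segment` dataclass, its field names, and `format_segments` are hypothetical, and the JSON layout the PR actually emits may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    """Hypothetical diarized segment; the field names are assumptions."""

    speaker: int  # 1-based speaker index
    start: float  # seconds
    end: float  # seconds
    text: str


def format_segments(segments: list[Segment], fmt: str = "inline") -> str:
    """Render segments as '[Speaker N]: text' lines or as a JSON array."""
    if fmt == "inline":
        return "\n".join(f"[Speaker {s.speaker}]: {s.text}" for s in segments)
    if fmt == "json":
        return json.dumps([asdict(s) for s in segments], indent=2)
    raise ValueError(f"unknown diarize format: {fmt!r}")


segments = [
    Segment(speaker=1, start=0.0, end=1.5, text="Hello there."),
    Segment(speaker=2, start=1.5, end=3.0, text="Hi, how are you?"),
]
print(format_segments(segments, "inline"))
```

The inline form is convenient for reading or pasting, while the JSON form keeps per-segment timestamps for downstream tooling.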

@@ -1050,7 +1089,7 @@ uv tool install "agent-cli[vad]" -p 3.13
╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
│ --llm --no-llm Clean up transcript with LLM: fix errors, add punctuation, │
│ remove filler words. Uses --extra-instructions if set (via CLI │
-│ or config file). │
+│ or config file). Not compatible with --diarize. │
│ [default: no-llm] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
6 changes: 6 additions & 0 deletions agent_cli/_extras.json
@@ -5,6 +5,12 @@
"sounddevice"
]
],
+"diarization": [
+"Speaker diarization (pyannote.audio)",
+[
+"pyannote.audio"
+]
+],
"faster-whisper": [
"Whisper ASR via CTranslate2",
[