feat(vad): add POST /v1/vad/confidence — per-packet ONNX confidence endpoint by chazmaniandinkle · Pull Request #128 · myrgic/mod3

chazmaniandinkle · 2026-06-01T23:30:30Z

Why

/v1/vad accepts a WAV multipart upload and runs the full Silero torch path — right for utterance VAD, overkill for 512-sample frames arriving at 20ms intervals from a Discord SocketReader thread.

The vendored pipecat SileroVADAnalyzer (vendor/pipecat_vad/) uses ONNX, has no torch dependency, returns per-frame confidence in <5ms, and was already in main via voice_confidence() in vad.py — but had no HTTP surface.

What

POST /v1/vad/confidence[?sample_rate=16000]
Body:     raw int16 little-endian PCM (512 samples @ 16kHz or 256 @ 8kHz)
Response: {"confidence": float, "available": bool, "latency_ms": float}

- available=false (confidence=0.0) when onnxruntime is not installed (graceful degradation)
- Empty body → {confidence: 0.0, available: <bool>, latency_ms: 0.0} (used as a health probe by callers)
- Listed in GET /capabilities as vad_confidence
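
As a rough client-side illustration of the contract above, the sketch below packs one frame as raw int16 little-endian PCM and builds (without sending) the POST request. The `pack_frame` and `build_confidence_request` helpers, the `base_url` default, and the `Content-Type` header are assumptions made for illustration; none of them come from this PR.

```python
import struct
import urllib.request

# Frame sizes taken from the contract: samples per frame, keyed by sample rate.
FRAME_SAMPLES = {16000: 512, 8000: 256}


def pack_frame(samples):
    """Pack a sequence of ints into raw int16 little-endian PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def build_confidence_request(pcm, sample_rate=16000, base_url="http://localhost:8000"):
    """Build (but do not send) a POST to /v1/vad/confidence.

    base_url is a placeholder; point it at wherever mod3 is listening.
    An empty body is allowed because callers use it as a health probe.
    """
    expected = FRAME_SAMPLES[sample_rate] * 2  # 2 bytes per int16 sample
    if pcm and len(pcm) != expected:
        raise ValueError(f"expected {expected} bytes at {sample_rate} Hz, got {len(pcm)}")
    url = f"{base_url}/v1/vad/confidence?sample_rate={sample_rate}"
    return urllib.request.Request(
        url,
        data=pcm,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},  # assumed header
    )


silence = pack_frame([0] * 512)          # one 512-sample frame of silence
req = build_confidence_request(silence)  # pass to urllib.request.urlopen(req) to send
```

A 512-sample frame is 1024 bytes on the wire and a 256-sample frame is 512 bytes, which is why the helper checks the byte length rather than the sample count.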

Callers

The Discord voice adapter (~/.hermes/hermes-agent/plugins/platforms/discord/adapter.py) is updated in parallel:

- _check_mod3_vad_available() now probes POST /v1/vad/confidence with empty body instead of GET /health → modalities.vad (which was always false because the torch model isn't loaded at startup)
- _check_vad_bargein() now POSTs raw PCM bytes to /v1/vad/confidence instead of WAV-wrapping for /v1/vad
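
To make the two caller behaviours concrete, here is a minimal sketch of how a caller might interpret the JSON response for the availability probe and for a barge-in decision. It is not the adapter's actual code: the function names and the 0.5 threshold are hypothetical, and only the three response fields come from the contract in this PR.

```python
import json


def parse_confidence_response(body):
    """Decode the JSON response body into (confidence, available, latency_ms)."""
    data = json.loads(body)
    return float(data["confidence"]), bool(data["available"]), float(data["latency_ms"])


def vad_available(probe_body):
    """Interpret the empty-body probe: only the `available` flag matters."""
    _, available, _ = parse_confidence_response(probe_body)
    return available


def should_barge_in(body, threshold=0.5):
    """Treat a frame as speech only when VAD is available and confident enough.

    The 0.5 threshold is an illustrative assumption, not a value from the adapter.
    """
    confidence, available, _ = parse_confidence_response(body)
    return available and confidence >= threshold
```

Checking `available` before `confidence` matters because an unavailable backend reports confidence=0.0, which is indistinguishable from real silence on the confidence value alone.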

Tests

12 tests in tests/test_vad_confidence_endpoint.py:

- Happy path: confidence value, available=true, latency_ms present, sample_rate param forwarded
- Unavailable path: available=false, confidence=0.0, empty body
- Capabilities manifest: vad_confidence key present

…ndpoint

Adds a lightweight per-packet VAD endpoint for barge-in loops (Discord VC, etc.) that do not need full utterance-level speech detection.

Why:
- /v1/vad accepts a WAV multipart upload and runs the full Silero torch path. That's the right API for utterance VAD but overkill for 512-sample frames arriving at 20ms intervals from a Discord SocketReader thread.
- The vendored pipecat SileroVADAnalyzer (vendor/pipecat_vad/) uses ONNX, has no torch dependency, returns per-frame confidence in <5ms, and was already present in main but had no HTTP surface.

Contract:
POST /v1/vad/confidence[?sample_rate=16000]
Body: raw int16 little-endian PCM (exactly 512 samples @ 16kHz or 256 @ 8kHz)
Response: {confidence: float, available: bool, latency_ms: float}

available=false (and confidence=0.0) when onnxruntime is not installed.
Empty body returns {confidence: 0.0, available: <bool>, latency_ms: 0.0}.

Also:
- Imports is_pipecat_vad_available + voice_confidence from vad.py (already in main)
- Listed in GET /capabilities endpoints manifest as 'vad_confidence'
- 12 tests covering happy path, unavailable path, empty body, sample_rate param forwarding, capabilities manifest

Updates: fix mod3/http_api.py file header comment and endpoints dict.

The model __call__ returns shape (batch, frames). With batch_size=1 and a single frame, out[0] has shape (1,), a 1D array. float() on a 1D array raises 'only 0-dimensional arrays can be converted to Python scalars', so voice_confidence() was silently returning 0.0 on every frame. VAD was effectively disabled despite reporting available=true.

Fix: float(np.squeeze(new_confidence).item()), which works for any output shape that holds a single value.

Root cause found via direct venv test after the HTTP endpoint showed 0.0 confidence on sine waves and noise. The error was swallowed by the bare except in voice_confidence().
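
The conversion in the fix can be checked in isolation. The sketch below is not the code in vad.py; `to_scalar` is a hypothetical helper name, and the arrays stand in for the model output. Whether plain float() on a shape-(1,) array raises or only warns depends on the installed NumPy version, so the sketch only demonstrates the squeeze-then-item path.

```python
import numpy as np


def to_scalar(model_output):
    """Collapse a single-element array of any shape into a Python float."""
    # squeeze() drops all length-1 axes, leaving a 0-d array;
    # item() then extracts the one value as a native Python number.
    return float(np.squeeze(model_output).item())


# Shapes a single-frame, batch-of-one output could plausibly take.
for shape in [(), (1,), (1, 1)]:
    out = np.full(shape, 0.25, dtype=np.float32)
    print(shape, to_scalar(out))
```

Note that item() raises ValueError when the array holds more than one element, so a batched output would need to be converted row by row.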

chazmaniandinkle added 2 commits June 1, 2026 19:30

chazmaniandinkle wants to merge 2 commits into main from feat/vad-confidence-endpoint
