(speechmatics + inference): add VAD by tinalenguyen · Pull Request #5750 · livekit/agents

tinalenguyen · 2026-05-16T05:30:36Z

When set to EXTERNAL mode, the Speechmatics STT needs finalize() to be called to flush the partial transcripts and mark end of speech. We pass a VAD, similar to Mistral's plugin, or initialize one so it works right out of the box.

We mirror the plugin in the inference code; we also accept/initialize a VAD. Inference is already set to handle the VAD event.
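
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `VADEventType`, `FakeSocket`, and `forward_vad_events` are hypothetical stand-ins, and only the `{"type": "session.finalize"}` payload is taken from the diff discussed in this thread.

```python
import asyncio
import json
from enum import Enum


class VADEventType(Enum):
    # Hypothetical stand-in for the VAD event kinds.
    START_OF_SPEECH = "start_of_speech"
    END_OF_SPEECH = "end_of_speech"


class FakeSocket:
    """Hypothetical stand-in for a websocket; records what was sent."""

    def __init__(self) -> None:
        self.closed = False
        self.sent: list[str] = []

    async def send_str(self, data: str) -> None:
        self.sent.append(data)


async def forward_vad_events(events: asyncio.Queue, ws: FakeSocket) -> None:
    """Send a finalize message each time the VAD reports end of speech."""
    while True:
        event = await events.get()
        if event is None:  # sentinel used by this sketch: no more events
            return
        if event is VADEventType.END_OF_SPEECH:
            if ws.closed:
                return
            # Ask the server to flush partial transcripts for this utterance.
            await ws.send_str(json.dumps({"type": "session.finalize"}))


async def demo_finalize() -> list[str]:
    events: asyncio.Queue = asyncio.Queue()
    ws = FakeSocket()
    for ev in (VADEventType.START_OF_SPEECH, VADEventType.END_OF_SPEECH, None):
        events.put_nowait(ev)
    await forward_vad_events(events, ws)
    return ws.sent
```

In this sketch only the end-of-speech event produces a message, so one utterance results in exactly one finalize request.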

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-05-16T22:00:14Z

+            if is_given(parsed_language) and not is_given(language):
+                language = parsed_language
+
+        is_speechmatics, vad = _resolve_vad_for_model(model, vad if is_given(vad) else None)


🟡 _resolve_vad_for_model conflates vad=None (opt-out) with vad=NOT_GIVEN (auto-load), making it impossible to opt out of VAD for Speechmatics models

When a user explicitly passes vad=None to the inference STT(model="speechmatics/enhanced", vad=None), the intent (mirroring the direct Speechmatics plugin's documented contract at livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/stt.py:258-260) is to opt out of auto-loaded VAD. However, at line 474, is_given(None) returns True (since None is not a NotGiven instance), so vad_instance is passed as None to _resolve_vad_for_model. Inside that function (line 220), the condition is_speechmatics and vad_instance is None triggers Silero auto-loading regardless, making opt-out impossible. The Speechmatics plugin correctly distinguishes these cases using not is_given(vad) at stt.py:277.

Prompt for agents

The _resolve_vad_for_model function needs to distinguish between vad=NOT_GIVEN (auto-load Silero for Speechmatics) and vad=None (explicit opt-out). Currently the calling code at line 474 converts both to None before calling the function. One approach: pass a sentinel or boolean flag to _resolve_vad_for_model indicating whether the user explicitly provided a vad value. For example, add an auto_load: bool parameter that is True only when vad was NOT_GIVEN. In _resolve_vad_for_model, only auto-load Silero when is_speechmatics and auto_load is True. When auto_load is False and vad_instance is None, skip the auto-load. This mirrors the direct Speechmatics plugin's logic at speechmatics/stt.py:277 which uses not is_given(vad) to decide auto-loading.
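
A minimal sketch of the suggested `auto_load` distinction, assuming a sentinel type for "not given". Everything here (`NotGiven`, `NOT_GIVEN`, `is_given`, `resolve_vad`, `default_factory`) is a self-contained stand-in written for illustration, not the library's code, and it only shows the reviewer's proposal, not the behavior that was merged.

```python
from __future__ import annotations

from typing import Any, Callable


class NotGiven:
    """Hypothetical stand-in for a NOT_GIVEN sentinel type."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def is_given(value: Any) -> bool:
    # None counts as "given": it is an explicit value, not the sentinel.
    return not isinstance(value, NotGiven)


def resolve_vad(
    model: str,
    vad: Any = NOT_GIVEN,
    *,
    default_factory: Callable[[], Any] = object,
) -> Any:
    """Return the VAD to use, auto-loading only when `vad` was omitted."""
    is_speechmatics = model.startswith("speechmatics/")
    auto_load = not is_given(vad)  # True only for NOT_GIVEN
    vad_instance = vad if is_given(vad) else None

    if not is_speechmatics:
        return None  # this sketch ignores any VAD for other models
    if vad_instance is None and auto_load:
        return default_factory()  # stand-in for loading a default VAD
    return vad_instance  # explicit instance, or None for an opt-out
```

With this shape, `resolve_vad("speechmatics/enhanced")` builds a default, while `resolve_vad("speechmatics/enhanced", vad=None)` returns `None`. Whether opting out should be allowed at all for these models is a separate question.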

as of right now, speechmatics inference STT models need a VAD to run here. it is possible they will support server-side endpointing in the future. this differs from the speechmatics stt plugin, which already exposes finalize() so users can flush end-of-speech on their own

longcw · 2026-05-17T03:03:28Z

+            and not is_given(min_endpointing_delay)
+            and not _user_provided_turn_handling
+        ):
+            endpointing["min_delay"] = 0.0


min_endpointing_delay is deprecated; should we check whether the user specified min_delay in turn_handling instead? also, should we move this to a separate pr, since it's not related to the speechmatics plugin?

makes sense, i also thought that the stt capability would play well with indicating STTs that need VAD. i removed those changes here for now though

russellmartin-livekit · 2026-05-17T05:54:13Z

+def _resolve_vad_for_model(
+    model: NotGivenOr[STTModels | str],
+    vad_instance: vad.VAD | None,
+) -> vad.VAD | None:
+    is_speechmatics = (
+        is_given(model) and isinstance(model, str) and model.startswith("speechmatics/")
+    )
+    if vad_instance is not None and not is_speechmatics:
+        logger.warning(
+            "`vad` will be ignored: model %r handles endpointing server-side.",
+            model,
+        )
+        return None
+    if is_speechmatics and vad_instance is None:
+        try:
+            from livekit.plugins.silero import VAD as SileroVAD
+        except ImportError as e:
+            raise ImportError(
+                "livekit-plugins-silero is required: model "
+                f"{model!r} does not handle endpointing server-side."
+            ) from e
+        vad_instance = SileroVAD.load()
+    return vad_instance
+
+


In the case where AgentSession has VAD wouldn't this mean we have 2 VAD instances?

yes, to use just 1 would require the user to store it and pass the same instance

maybe it would be helpful for the user to have separate settings for stt and session level vad, but as of right now the session vad can't be connected to stt

longcw

lgtm! one nit:

longcw · 2026-05-17T10:50:37Z

+                vad_task.cancel()
+                try:
+                    await vad_task
+                except asyncio.CancelledError:
+                    pass


nit: use utils.cancel_and_wait
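
For context, a helper of this kind typically folds the cancel/await/except pattern from the diff into one call. The sketch below is an assumption about the general shape, written with plain asyncio; it is not the library's implementation, and the real helper may differ in details such as how it treats cancellation of the caller.

```python
import asyncio


async def cancel_and_wait(*tasks: asyncio.Task) -> None:
    """Cancel the given tasks and wait until they have finished."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each task's CancelledError
    # in the result list instead of raising it here.
    await asyncio.gather(*tasks, return_exceptions=True)


async def demo_cancel() -> bool:
    # Stand-in for a long-running VAD task.
    vad_task = asyncio.create_task(asyncio.sleep(3600))
    await asyncio.sleep(0)  # give the task a chance to start
    await cancel_and_wait(vad_task)
    return vad_task.cancelled()
```

The call site then shrinks to a single `await cancel_and_wait(vad_task)`.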

longcw · 2026-05-17T10:53:35Z

+                if ws.closed:
+                    return
+                try:
+                    await ws.send_str(json.dumps({"type": "session.finalize"}))


one question: what will happen if VAD fires EOS on noise? will the STT return an empty or a random transcript?

i believe the STT will return an empty one ""

first draft

5b882e2

chenghao-mou requested a review from a team May 16, 2026 05:30

addr comment

621b84e

fix

2ce48d6

add separate field for interruption silence allowance, maintain backw…

06fd834

…ard compat

devin-ai-integration Bot reviewed May 16, 2026

longcw reviewed May 17, 2026

separate

f1ea135

tinalenguyen changed the title ~~(speechmatics): add VAD and server_endpointing capability~~ (speechmatics + inference): add VAD May 17, 2026

russellmartin-livekit reviewed May 17, 2026

longcw approved these changes May 17, 2026

addr comment

03621a6

tinalenguyen merged commit a3df48f into main May 18, 2026
24 checks passed

tinalenguyen deleted the tina/speechmatics-vad branch May 18, 2026 03:31
