Expose LiveKit VAD events to STT providers #4565
base: main
- Add `STT.on_vad_event()` default no-op hook (documented)
- Forward VAD events from `AgentActivity` to the active STT (best-effort)
- RTZR STT can optionally consume VAD events for endpointing via `use_vad_event`
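The three bullets above describe a simple hook pattern. The sketch below illustrates it with hypothetical stand-in classes (`VADEvent`, `STT`, `VADAwareSTT` and `forward_vad_event` are illustrative, not the LiveKit API): a documented no-op default on the base class, an opt-in override in a provider, and best-effort forwarding that logs and swallows hook failures.

```python
import logging
from typing import Optional

logger = logging.getLogger("vad-hook-sketch")


class VADEvent:
    """Hypothetical stand-in for livekit.agents.vad.VADEvent."""

    def __init__(self, event_type: str) -> None:
        self.type = event_type


class STT:
    """Hypothetical stand-in for the base STT class."""

    def on_vad_event(self, ev: VADEvent) -> None:
        """Receive a VAD event from the agent pipeline.

        The default is a no-op, so providers that ignore VAD are unaffected.
        """
        return None


class VADAwareSTT(STT):
    """Hypothetical provider that opts in, similar in spirit to RTZR's use_vad_event."""

    def __init__(self, use_vad_event: bool = False) -> None:
        self.use_vad_event = use_vad_event
        self.seen: list[str] = []

    def on_vad_event(self, ev: VADEvent) -> None:
        if self.use_vad_event:
            self.seen.append(ev.type)  # a real provider would drive endpointing here


def forward_vad_event(stt: Optional[STT], ev: VADEvent) -> None:
    """Best-effort forwarding: a failing hook must not break the pipeline."""
    if stt is None:
        return
    try:
        stt.on_vad_event(ev)
    except Exception:
        logger.exception("STT.on_vad_event failed; ignoring")
```

Swallowing exceptions at the call site keeps a buggy provider from taking down the voice pipeline, which matches the "best-effort" wording above.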
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Adds VAD event propagation and hooks across agents and STT, integrates STT.on_vad_event calls in agent activity, enhances RTZR plugin typing/token handling, and substantially refactors the RTZR STT implementation to manage stream lifecycle, VAD-driven flows, WebSocket communication, and idle timeouts.
Changes
Sequence Diagram(s)
sequenceDiagram
participant VAD as VAD Engine
participant AA as Agent Activity
participant STT as STT (Base)
participant RTZR as RTZR STT
participant Stream as SpeechStream
VAD->>AA: on_start_of_speech()
AA->>STT: on_vad_event(START_OF_SPEECH)
STT->>RTZR: on_vad_event()
RTZR->>Stream: handle start -> _emit_audio() / START_OF_SPEECH
VAD->>AA: on_vad_inference_done()
AA->>STT: on_vad_event(SPEECH_FRAME)
STT->>RTZR: on_vad_event()
RTZR->>Stream: _emit_audio()
VAD->>AA: on_end_of_speech()
AA->>STT: on_vad_event(END_OF_SPEECH)
STT->>RTZR: on_vad_event()
RTZR->>Stream: _flush_pending_frames() -> END_OF_SPEECH
Stream->>Stream: _idle_watchdog (25s)
Stream->>RTZR: _handle_idle_timeout() -> close WS / cleanup
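The last two steps of the diagram describe an idle watchdog: if the stream sees no activity for 25 s, the WebSocket is closed and resources are cleaned up. A minimal asyncio sketch of that idea follows; `IdleWatchdog`, `touch()` and `on_idle` are hypothetical names, and the plugin's actual `_idle_watchdog` / `_handle_idle_timeout()` may be structured differently.

```python
import asyncio
import time
from typing import Awaitable, Callable


class IdleWatchdog:
    """Invoke `on_idle` once no activity has been recorded for `timeout` seconds."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_idle = on_idle
        self._last_activity = time.monotonic()

    def touch(self) -> None:
        """Record activity, e.g. audio sent or a VAD event received."""
        self._last_activity = time.monotonic()

    async def run(self) -> None:
        while True:
            idle_for = time.monotonic() - self._last_activity
            remaining = self._timeout - idle_for
            if remaining <= 0:
                await self._on_idle()  # e.g. close the WebSocket and clean up
                return
            # Sleep only until the earliest moment the timeout could fire.
            await asyncio.sleep(remaining)
```

In a stream, `touch()` would be called whenever audio is sent or a VAD event arrives, and the diagram's 25 s value would be passed as `timeout`. Using `time.monotonic()` keeps the watchdog immune to wall-clock adjustments.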
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
🧹 Recent nitpick comments
📜 Recent review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧬 Code graph analysis (2)
livekit-agents/livekit/agents/stt/stt.py (4)
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (14)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31f25c4bd5
    async def _process_vad_events(self) -> None:
        try:
            while True:
                ev = await self._vad_event_queue.get()
                if ev.type == agents_vad.VADEventType.START_OF_SPEECH:
Add a termination path for VAD event processing
The VAD task runs an infinite while True loop awaiting self._vad_event_queue.get() and never exits on its own. Because _run awaits asyncio.gather(send_task, vad_task), once the input channel closes and send_task finishes, the stream will still hang waiting on this VAD task, leaving _run (and the stream) stuck until it is externally cancelled. This means consumers that rely on natural end-of-stream (e.g., after end_input) will hang unless they manually cancel; consider closing the queue or adding a sentinel/exit condition tied to input shutdown.
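One way to act on this suggestion is a sentinel: when the sender finishes, it enqueues a marker that tells the VAD consumer to return, so `asyncio.gather()` completes without external cancellation. The sketch below uses hypothetical stand-ins (`send_task`, `vad_task`, `run`, `_END`) rather than the plugin's code, and replaces the WebSocket send with a list append.

```python
from __future__ import annotations

import asyncio

_END = object()  # sentinel: tells the consumer that input has ended


async def send_task(
    chunks: list[bytes], sent: list[bytes], vad_queue: asyncio.Queue[object]
) -> None:
    try:
        for chunk in chunks:
            sent.append(chunk)  # stand-in for `await ws.send_bytes(chunk)`
            await asyncio.sleep(0)  # yield, as a real network send would
    finally:
        # Input exhausted (or the sender failed): unblock the VAD consumer.
        vad_queue.put_nowait(_END)


async def vad_task(vad_queue: asyncio.Queue[object], handled: list[object]) -> None:
    while True:
        ev = await vad_queue.get()
        if ev is _END:
            return  # exit naturally instead of waiting forever
        handled.append(ev)  # stand-in for START/END_OF_SPEECH handling


async def run(chunks: list[bytes], vad_events: list[object]) -> list[object]:
    vad_queue: asyncio.Queue[object] = asyncio.Queue()
    for ev in vad_events:
        vad_queue.put_nowait(ev)
    sent: list[bytes] = []
    handled: list[object] = []
    # Both tasks now finish on their own once input ends, so gather() returns.
    await asyncio.gather(send_task(chunks, sent, vad_queue), vad_task(vad_queue, handled))
    return handled
```

Because the queue is FIFO, events enqueued before the sentinel are still handled; events arriving after it are not. Cancelling `vad_task` once `send_task` returns is an alternative, at the cost of dropping any events still queued.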
    for frame in ev.frames:
        payload = frame.data.tobytes()
        async with self._send_lock:
            await ws.send_bytes(payload)
Resample VAD frames before sending to RTZR
In _handle_vad_start the code forwards ev.frames directly to the websocket. These frames come from the VAD pipeline (raw rtc.AudioFrames forwarded in audio_recognition.push_audio) and bypass the STT resampling path that normally converts input to self._rtzr_stt._params.sample_rate. If the room audio sample rate differs from the RTZR configuration (common 48 kHz input vs 8 kHz RTZR), the service receives audio at the wrong rate and transcripts/endpointing become unreliable. Consider resampling ev.frames or using the already-resampled audio from the STT input queue.
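The fix is to convert VAD frames to the configured RTZR rate before sending them. The sketch below is a dependency-free linear-interpolation resampler for mono 16-bit PCM, shown only to illustrate the rate conversion; it applies no anti-aliasing filter, so downsampling 48 kHz to 8 kHz this way will alias. In the plugin, LiveKit's `rtc.AudioResampler` (or reusing the already-resampled STT input, as the comment suggests) would be the more appropriate choice. `resample_linear` is a hypothetical helper.

```python
from array import array


def resample_linear(pcm: bytes, in_rate: int, out_rate: int) -> bytes:
    """Resample mono int16 PCM with linear interpolation (illustrative only).

    No low-pass filtering is applied, so downsampling will alias.
    """
    if in_rate == out_rate or not pcm:
        return pcm
    src = array("h")
    src.frombytes(pcm)  # native-endian int16 samples
    n_out = len(src) * out_rate // in_rate
    step = in_rate / out_rate  # source samples advanced per output sample
    out = array("h")
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = src[j + 1] if j + 1 < len(src) else src[j]  # clamp at the last sample
        out.append(round(src[j] + (nxt - src[j]) * frac))
    return out.tobytes()
```

Assuming mono frames, a call site could look like `resample_linear(frame.data.tobytes(), frame.sample_rate, self._rtzr_stt._params.sample_rate)`; a stateful resampler would additionally avoid discontinuities at frame boundaries.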
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/stt/stt.py`:
- Around line 238-249: The type annotation on the method on_vad_event currently
uses a quoted forward reference ("VADEvent"); remove the unnecessary quotes so
the signature becomes def on_vad_event(self, ev: VADEvent) -> None: since the
module uses from __future__ import annotations. Update the annotation only (no
behavior changes), keep the docstring and return, and ensure the VADEvent symbol
is still available/imported where referenced.
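With `from __future__ import annotations` (PEP 563), annotations are stored as strings and not evaluated at runtime, so quoting a forward reference is redundant and Ruff flags it as UP037. A minimal sketch of the corrected signature follows; importing `VADEvent` under `TYPE_CHECKING` is an assumption for illustration, since the actual import style in `stt.py` isn't shown here.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by type checkers; never executed at runtime.
    from livekit.agents.vad import VADEvent


class STT:
    def on_vad_event(self, ev: VADEvent) -> None:  # no quotes needed
        """Optional hook for VAD events; the default implementation is a no-op."""
        return None
```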
In `@livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/rtzrapi.py`:
- Around line 157-163: The token refresh check in the method that returns the
access token uses an inverted threshold and only refreshes after a token has
been expired for over an hour; update the condition that checks
token["expire_at"] to refresh proactively (use token["expire_at"] < time.time()
+ 3600) so the code calls self._refresh_token() within the last hour of
validity; ensure you still handle token is None and raise RTZRAPIError("Failed
to obtain RTZR access token") if refresh fails (references: variable token, key
"expire_at", method _refresh_token, and RTZRAPIError).
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (1)
197-469: Consider bounding `_pending_speech_frames` to avoid unbounded growth.
If VAD start events are delayed or dropped, buffered audio could grow without limit. A small cap with backpressure logging would prevent memory spikes.
♻️ Optional mitigation
-_pending_speech_frames: deque[bytes] = deque()
+_MAX_PENDING_FRAMES = 200
+_pending_speech_frames: deque[bytes] = deque(maxlen=_MAX_PENDING_FRAMES)
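Note that `deque(maxlen=...)` on its own drops the oldest frame silently. The sketch below pairs the suggested cap with a drop counter and rate-limited warnings; `PendingFrames` and its methods are hypothetical, not the plugin's code.

```python
import logging
from collections import deque

logger = logging.getLogger("rtzr-pending-sketch")

_MAX_PENDING_FRAMES = 200  # value taken from the suggested diff


class PendingFrames:
    """Bounded pre-speech buffer: keeps only the newest frames and counts drops."""

    def __init__(self, maxlen: int = _MAX_PENDING_FRAMES) -> None:
        self._frames: deque[bytes] = deque(maxlen=maxlen)
        self.dropped = 0

    def push(self, frame: bytes) -> None:
        if len(self._frames) == self._frames.maxlen:
            # deque(maxlen=...) evicts the oldest item on append; make that visible.
            self.dropped += 1
            if self.dropped == 1 or self.dropped % 100 == 0:
                logger.warning("pending speech buffer full; dropped %d frames", self.dropped)
        self._frames.append(frame)

    def drain(self) -> list[bytes]:
        """Return buffered frames oldest-first and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames
```

Keeping the newest frames (rather than rejecting new ones) seems a reasonable fit for a pre-speech buffer, since the audio closest to speech onset is usually the part worth flushing.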
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CONTRIBUTING.md
livekit-agents/livekit/agents/stt/stt.py
livekit-agents/livekit/agents/voice/agent_activity.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/__init__.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/py.typed
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/rtzrapi.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/agent_activity.py
livekit-agents/livekit/agents/stt/stt.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/__init__.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/rtzrapi.py
🧠 Learnings (1)
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class
Applied to files:
CONTRIBUTING.md
🧬 Code graph analysis (3)
- livekit-agents/livekit/agents/voice/agent_activity.py (4)
  - livekit-agents/livekit/agents/voice/agent_session.py (1): stt (1256-1257)
  - livekit-agents/livekit/agents/voice/agent.py (1): stt (508-518)
  - livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (1): on_vad_event (145-188)
  - livekit-agents/livekit/agents/stt/stt.py (1): on_vad_event (238-249)
- livekit-agents/livekit/agents/stt/stt.py (5)
  - livekit-agents/livekit/agents/voice/agent_activity.py (1): vad (2792-2793)
  - livekit-agents/livekit/agents/voice/agent_session.py (1): vad (1268-1269)
  - livekit-agents/livekit/agents/voice/agent.py (1): vad (560-570)
  - livekit-agents/livekit/agents/vad.py (1): VADEvent (26-68)
  - livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (1): on_vad_event (145-188)
- livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (3)
  - livekit-agents/livekit/agents/stt/stt.py (4): on_vad_event (238-249), stream (224-232), STT (101-264), SpeechEvent (72-76)
  - livekit-agents/livekit/agents/vad.py (2): VADEvent (26-68), VADEventType (19-22)
  - livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/rtzrapi.py (1): build_config (238-265)
🪛 GitHub Check: ruff
livekit-agents/livekit/agents/stt/stt.py
[failure] 238-238: Ruff (UP037)
livekit-agents/livekit/agents/stt/stt.py:238:32: UP037 Remove quotes from type annotation
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
- GitHub Check: unit-tests
🔇 Additional comments (7)
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/py.typed (1)
1-1: No action needed. The `py.typed` marker file is correctly placed and will be automatically included in the package distribution by hatchling when it parses `packages = ["livekit"]` in the build configuration. This follows the established pattern used across 50+ existing plugins in the codebase, all of which use the same configuration without requiring explicit `py.typed` entries in `pyproject.toml`.
Likely an incorrect or invalid review comment.
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/__init__.py (1)
18-20: Nice clarity on `__init__` return type.
Clean, explicit typing without behavioral impact.
CONTRIBUTING.md (1)
84-85: Good update to the mypy plugin list.
Keeps local type-checking aligned with the RTZR plugin.
livekit-agents/livekit/agents/voice/agent_activity.py (1)
1226-1231: Solid defensive VAD hook propagation.
The try/except guard keeps the pipeline resilient while enabling observability.
Also applies to: 1249-1254, 1272-1277
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/rtzrapi.py (1)
45-72: Typing + validation hardening looks good.
The structured token parsing, keyword validation, and context manager typing are clean improvements.
Also applies to: 142-151, 176-184, 248-249
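Relatedly, the actionable comment on `rtzrapi.py` (lines 157-163) asks for a proactive refresh: renew the token once `expire_at < time.time() + 3600`, i.e. within its last hour of validity, instead of an hour after it has expired. The corrected check can be sketched as follows. `TokenCache`, `Token`, the `access_token` key, the `fetch` callable and the injected `clock` are hypothetical; only the `expire_at` key, `_refresh_token()`, the one-hour margin and the `RTZRAPIError` message come from that comment.

```python
import time
from typing import Callable, Optional, TypedDict


class RTZRAPIError(Exception):
    """Stand-in for the plugin's error type."""


class Token(TypedDict):
    access_token: str  # hypothetical key
    expire_at: float  # Unix timestamp in seconds (assumed)


REFRESH_MARGIN_S = 3600  # refresh during the last hour of validity


class TokenCache:
    def __init__(
        self,
        fetch: Callable[[], Optional[Token]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._clock = clock  # injectable for testing
        self._token: Optional[Token] = None

    def _refresh_token(self) -> None:
        self._token = self._fetch()

    def get_access_token(self) -> str:
        token = self._token
        # Proactive: refresh *before* expiry, not an hour after it.
        if token is None or token["expire_at"] < self._clock() + REFRESH_MARGIN_S:
            self._refresh_token()
            token = self._token
        if token is None:
            raise RTZRAPIError("Failed to obtain RTZR access token")
        return token["access_token"]
```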
livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/stt.py (2)
64-195: VAD hook integration and stream registration look solid.
Nice alignment with the new STT hook and multi-stream coordination.
471-569: Recv loop changes are clear and consistent.
The transcript event emission and error handling read well.
Summary
Expose LiveKit VAD events to STT providers via a new `STT.on_vad_event()` hook, and forward VAD events from `AgentActivity`. The RTZR STT plugin can optionally use these events (`use_vad_event`) to align the streaming WebSocket lifecycle and endpointing with LiveKit VAD. This PR also includes type-safety fixes needed for `mypy -p livekit.plugins.rtzr`.
Motivation
LiveKit already computes speaking state and speech boundaries via VAD, but STT providers currently can’t observe it. This makes it harder to:
We also want the RTZR plugin to be clean under mypy strict checks.
Changes
- `livekit.agents.stt.STT`: add `on_vad_event(ev: VADEvent)` hook (default no-op; documented)
- `livekit.agents.voice.AgentActivity`: forward VAD events to `self.stt.on_vad_event(...)` (exceptions swallowed to avoid destabilizing the pipeline)
- `livekit.plugins.rtzr.STT`: optional VAD-driven behavior controlled by `use_vad_event`
- `rtzrapi.py`
- `__init__`
Backwards compatibility
`use_vad_event`.
How to test
- `uv run ruff check --output-format=github .`
- `uv run ruff format .`
- `uv run pytest`
- `uv run mypy --install-types --non-interactive -p livekit.plugins.rtzr`
Notes
`CHANGELOG.md` or package manifest updates are included, per project guidelines.
Summary by CodeRabbit
New Features
`use_vad_event` configuration option.
Improvements
Documentation