Add support for GPT-Realtime-2.0 by ShayneP · Pull Request #5690 · livekit/agents

ShayneP · 2026-05-08T17:15:40Z

Summary

Update realtime agent output handling for Realtime 2.0 responses and fix transcript synchronization races around overlapping segment lifecycle events.

Realtime 2.0 can produce multiple message items for a single response. The Agents output stack exposes playback through a shared AudioOutput segment contract, so this PR forwards realtime message outputs sequentially through the sink. That keeps playback-start/playback-finished events attributable to the correct message and avoids adding or truncating the wrong assistant item during interruption.
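As a rough illustration of the sequential forwarding described above, the sketch below plays each message to completion before starting the next, so every playback event maps to exactly one message. `FakeAudioSink` and `forward_sequentially` are hypothetical names made up for this example. They are not the livekit-agents `AudioOutput` API or the code in this PR.

```python
import asyncio


class FakeAudioSink:
    """Hypothetical stand-in for an audio sink; not the real AudioOutput class."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def play(self, message_id: str, frames: list[str]) -> None:
        self.events.append(f"playback_started:{message_id}")
        for _ in frames:
            await asyncio.sleep(0)  # simulate pushing one audio frame
        self.events.append(f"playback_finished:{message_id}")


async def forward_sequentially(
    sink: FakeAudioSink, messages: dict[str, list[str]]
) -> list[str]:
    # Forward one message at a time: the next message only starts after the
    # previous one has finished, so events never interleave across messages.
    for message_id, frames in messages.items():
        await sink.play(message_id, frames)
    return sink.events
```

Because the loop awaits each message before moving on, an interruption during the second message can only affect the second message's item in this model.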

This PR also hardens TranscriptSynchronizer segment handling when audio/text flush timing is not perfectly aligned.

Changes

Support multiple realtime message items in one generation.
Forward each realtime message’s audio/text output to completion before starting the next message’s sink forwarding.
Wait for per-message audio playout before registering the next message’s playback listener.
Preserve existing single-message realtime behavior for current providers.
Fix transcript synchronizer stale-duration handling by resetting per-segment pushed audio duration during flush.
Keep pending/finalizing transcript segment impls alive until their text/audio inputs are complete.
Emit interrupted or already-ready playback-finished events synchronously to preserve existing state transition ordering.
Delay non-interrupted synced playback-finished events only when needed to include complete synchronized transcript text.
Apply pause/resume to active, pending, and finalizing transcript segments.
Add focused regression tests for transcript segment advancement, delayed text completion, and pause/resume behavior.
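
To make the stale-duration item in the list above concrete, here is a minimal sketch of resetting a per-segment audio duration counter on flush. `SegmentDurationTracker` and its attributes are assumptions invented for illustration. The real `TranscriptSynchronizer` internals are not shown in this PR description and may differ.

```python
class SegmentDurationTracker:
    """Hypothetical helper; names are illustrative, not TranscriptSynchronizer's."""

    def __init__(self) -> None:
        self.pushed_duration = 0.0
        self.finished_segments: list[float] = []

    def push_audio(self, duration: float) -> None:
        # Accumulate audio duration for the current segment only.
        self.pushed_duration += duration

    def flush(self) -> None:
        # Record the finished segment, then reset the counter so the next
        # segment does not inherit a stale duration from this one.
        self.finished_segments.append(self.pushed_duration)
        self.pushed_duration = 0.0
```

Without the reset in `flush`, the second segment in this model would report the combined duration of both segments.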

Compatibility

This is intended to be public-API compatible. It does not change the AudioOutput or TextOutput interfaces or event payload shapes.

Existing realtime models emitted a single message item per response, so the sequential forwarding path preserves prior behavior while correctly handling the new Realtime 2.0 multi-message response shape.

The transcript synchronizer changes are internal lifecycle fixes. The main observable timing change is that a non-interrupted synced playback_finished event may wait for text/audio inputs to complete when playback finishes before transcript input drains. Interrupted and already-ready playback events remain synchronous.
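
The timing rule described above can be sketched as follows: emit immediately when the segment was interrupted or its text is already complete, otherwise defer until the text input finishes. `on_playback_finished`, `text_done`, and `emit` are hypothetical names chosen for this example. This is a simplified model under those assumptions, not the synchronizer's actual implementation.

```python
import asyncio
from typing import Callable, Optional


def on_playback_finished(
    interrupted: bool,
    text_done: asyncio.Event,
    emit: Callable[[], None],
) -> Optional[asyncio.Task]:
    # Interrupted or already-ready segments emit synchronously, which keeps
    # the existing ordering of state transitions.
    if interrupted or text_done.is_set():
        emit()
        return None

    async def _wait_then_emit() -> None:
        # Only the non-interrupted, not-yet-ready case is delayed.
        await text_done.wait()
        emit()

    return asyncio.create_task(_wait_then_emit())


async def demo() -> list[str]:
    log: list[str] = []
    text_done = asyncio.Event()
    on_playback_finished(True, text_done, lambda: log.append("interrupted"))
    task = on_playback_finished(False, text_done, lambda: log.append("delayed"))
    assert task is not None
    log.append("text_done_set")
    text_done.set()
    await task
    return log
```

In `demo`, the interrupted event is logged before anything else, while the non-interrupted event is logged only after the text input is marked complete.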

Testing

make check
make unit-tests
uv run pytest tests/test_agent_session.py tests/test_transcript_synchronizer.py -q
git diff --check

make unit-tests result: 635 passed, 2 skipped

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

tinalenguyen

Tested it and it works well, though I think the start/stop speaking_at times recorded in the chat messages aren't accurate 🤔. All chat messages in the same generation would have the same started_speaking_at timestamp.

longcw · 2026-05-15T09:19:37Z

+                    msg_tasks.clear()
+
+                    if audio_output is not None and audio_out is not None:
+                        await audio_output.wait_for_playout()


Should we call perform_audio_forwarding once for multiple segments instead of calling it multiple times in one response, for example by merging the streams from generation_ev.message_stream? That way we wouldn't need to change the output and synchronizer.

~~btw, I didn't see multiple messages in a single response during my testing, is there a way to trigger that or it's just random?~~

Update: I saw multiple segments when asking for them explicitly, e.g. "tell me a story with two parts".
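
One way to picture the merging idea raised in this comment is an async generator that chains the per-message streams into a single stream, which a single forwarding call could then consume. `chain_streams` and the fake streams below are hypothetical. The shape of `generation_ev.message_stream` is assumed here, not taken from the library.

```python
import asyncio
from typing import AsyncIterator


async def chain_streams(
    streams: AsyncIterator[AsyncIterator[str]],
) -> AsyncIterator[str]:
    # Drain each inner stream fully before moving to the next one, yielding
    # every chunk through one merged stream.
    async for stream in streams:
        async for chunk in stream:
            yield chunk


async def _fake_message(chunks: list[str]) -> AsyncIterator[str]:
    # Hypothetical per-message stream of text or audio chunks.
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def _fake_message_stream() -> AsyncIterator[AsyncIterator[str]]:
    # Hypothetical stream of messages within a single response.
    yield _fake_message(["a1", "a2"])
    yield _fake_message(["b1"])


async def collect_merged() -> list[str]:
    return [chunk async for chunk in chain_streams(_fake_message_stream())]
```

A merged stream like this loses the boundary between messages, so per-message playback attribution would need to be tracked separately. That trade-off is the open question in this thread.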

ShayneP added 4 commits May 8, 2026 10:06

Update for Realtime 2.0

3b6d189

Fix transcripts

fe7a0eb

Codex review

e1f5cd6

Commit makefile

91ac698

ShayneP requested review from theomonnom and tinalenguyen May 8, 2026 17:15

chenghao-mou requested a review from a team May 8, 2026 17:15

devin-ai-integration Bot reviewed May 8, 2026

tinalenguyen reviewed May 12, 2026

tinalenguyen mentioned this pull request May 13, 2026

Support gpt-realtime-2 #5684

Open

longcw reviewed May 15, 2026

longcw mentioned this pull request May 18, 2026

feat(realtime): support multi-message generation per response #5763

Open

ShayneP wants to merge 4 commits into main from ShayneP/realtime-2.0


3 participants
