fix(transcription): warn only when rotateSegment backlog grows #1514

Draft
julien-lottie wants to merge 3 commits into livekit:main from lottiehq-oss:claude/sleepy-moore-88784b

Conversation

@julien-lottie
Contributor

Summary

TranscriptionSynchronizer.rotateSegment currently logs

rotateSegment called while previous segment is still being rotated

at warn level whenever a rotation is requested while another is in flight. In practice this fires at every turn boundary — onPlaybackFinished, output attach/detach, new-utterance events, etc. naturally overlap with the prior segment's close-and-recreate Task. The rotation is safely serialized (rotateSegmentTaskImpl awaits oldTask.result before recreating the SegmentSynchronizerImpl), so a single overlap is expected and no transcript data is lost.

The result is that production logs get flooded with a benign warning.

This PR tracks the depth of the queue behind the in-flight rotation and only warns when more than one rotation is stacked — which is when a backlog is actually growing and an operator should care.

Behavior

  • 0 queued (no overlap): silent (unchanged)
  • 1 queued (the common case, single overlap at a turn boundary): silent (was a warn)
  • 2+ queued (real backlog): warn with the actual depth

Implementation

  • queuedRotations: number = 0 field on TranscriptionSynchronizer
  • rotateSegment() increments when stacking onto an in-flight task, warns when > 1
  • rotateSegmentTaskImpl() decrements in a finally block (floored at 0 to keep the initial constructor-scheduled task safe)
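
The counting scheme in the bullets above can be sketched as follows. This is a minimal, self-contained model and not the PR's code: `RotationQueue`, `pending`, `tail`, and the `warnings` array are hypothetical names, and a plain promise chain stands in for the `Task` utility. Only `queuedRotations`, the "warn when > 1" rule, and the floored decrement in `finally` come from the description.

```typescript
// Hypothetical model of the backlog counter; not the actual TranscriptionSynchronizer.
class RotationQueue {
  // Rotations stacked behind the in-flight one (name taken from the PR description).
  queuedRotations = 0;
  // Collected instead of calling a real logger, so the behavior is easy to inspect.
  readonly warnings: string[] = [];

  private pending = 0; // rotations scheduled but not yet finished (assumption)
  private tail: Promise<void> = Promise.resolve();
  private readonly rotate: () => Promise<void>;

  constructor(rotate: () => Promise<void>) {
    this.rotate = rotate;
  }

  rotateSegment(): Promise<void> {
    if (this.pending > 0) {
      // Stacking onto an in-flight rotation: count it, warn only on a real backlog.
      this.queuedRotations += 1;
      if (this.queuedRotations > 1) {
        this.warnings.push(`rotateSegment backlog: ${this.queuedRotations} queued`);
      }
    }
    this.pending += 1;
    const previous = this.tail;
    this.tail = (async () => {
      try {
        // Serialize: wait for the prior rotation, ignoring its failure if any.
        await previous.catch(() => undefined);
        await this.rotate();
      } finally {
        this.pending -= 1;
        // Floored at 0 because the first rotation was never counted as queued.
        this.queuedRotations = Math.max(0, this.queuedRotations - 1);
      }
    })();
    return this.tail;
  }
}

const queue = new RotationQueue(async () => {});
void queue.rotateSegment(); // nothing in flight: silent
void queue.rotateSegment(); // one queued: silent
void queue.rotateSegment(); // two queued: one warning recorded
console.log(queue.warnings);
```

Recording warnings in an array is only a convenience for the sketch; the real class would go through its logger.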

Test plan

  • pnpm build:agents — clean
  • pnpm -w test agents/src/voice/transcription/synchronizer.test.ts — 21/21 pass
  • pnpm -w format:check clean on touched files

🤖 Generated with Claude Code

The "rotateSegment called while previous segment is still being rotated"
log fires at every turn boundary because playback-finished, output
attach/detach, and new-utterance events naturally overlap with the prior
segment's close-and-recreate task. The rotation is safely serialized —
the new Task awaits oldTask.result before recreating the
SegmentSynchronizerImpl — so a single overlap is the expected case and
no transcript data is lost.

Track the number of rotations queued behind the in-flight one and only
warn when more than one is stacked. That's when the backlog is actually
growing and an operator should pay attention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@changeset-bot

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 32951d1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch


@CLAassistant

CLAassistant commented May 15, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


@toubatbrian
Contributor

@julien-lottie could you confirm if this change is working with real e2e agent runs?

Production data on Lottie's eliza-agent (dash0) shows every observed
`rotateSegment backlog` warn fires within ~0.2–1.2s of session startup,
before the first agent utterance — never at a mid-conversation turn
boundary. The trigger is a race between the constructor-scheduled
initial rotation task and the room's CONN_CONNECTED event, which
stacks two extra rotateSegment calls onto the chain before the initial
task drains. The chain settles long before any TTS frame is produced,
so the caller-perceived latency is zero.

Track when the initial task has resolved at least once via
`Task.addDoneCallback`, and gate the warn behind that flag. The
counter (`queuedRotations`) keeps incrementing during startup so the
serialisation invariants are preserved; only the noisy log line is
suppressed.

Real mid-conversation backlogs (which is what the warn was designed to
surface) still trip the warn once the synchronizer leaves the startup
window. Add two regression tests covering both paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julien-lottie
Contributor Author

julien-lottie commented May 16, 2026

@toubatbrian - yes, and it shows a significant reduction in warnings. However, we still see a systematic warning on startup; see below. I'm testing a patch that fixes this in our environment and will move the PR out of draft once I've confirmed that it removes the remaining warnings.


Added startup suppression for the backlog warn. Production data on a downstream deployment shows every observed rotateSegment backlog warn fires at session startup (within ~0.2–1.2s of "Resolved voice runtime configuration", before any agent utterance), not at turn boundaries.

The race is between the constructor-scheduled initial rotation task and the room's connection_state_changed event handling, which stacks two extra rotateSegment calls onto the chain before the initial task drains. Chain settles before TTS produces any frame, so caller-perceived latency is zero (verified by comparing cfg→first-speech latency on calls with vs. without the warn — warned calls were actually faster on a small sample).

Mechanism

  • Task.addDoneCallback on the initial task flips initialRotationDone.
  • rotateSegment keeps counting queuedRotations so the serialisation invariants are preserved; only the logger.warn is gated.
  • Real mid-conversation backlogs still trip the warn once the synchronizer leaves the startup window.
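
The gating described in this list can be illustrated with a small sketch. The field names `queuedRotations` and `initialRotationDone` and the condition `queuedRotations > 1 && initialRotationDone` follow the PR discussion; the `RotationState` interface and the `recordStackedRotation` helper are hypothetical and exist only for this example.

```typescript
// Hypothetical state holder; field names follow the PR text.
interface RotationState {
  queuedRotations: number;
  initialRotationDone: boolean;
}

// Records one stacked rotation and returns a warning message, or undefined
// when the warn is suppressed.
function recordStackedRotation(state: RotationState): string | undefined {
  state.queuedRotations += 1; // always counted, even during startup
  if (state.queuedRotations > 1 && state.initialRotationDone) {
    return `rotateSegment backlog: ${state.queuedRotations} queued`;
  }
  return undefined;
}

// During startup the counter still grows, but no message is produced.
const startup: RotationState = { queuedRotations: 0, initialRotationDone: false };
const startupWarnings = [recordStackedRotation(startup), recordStackedRotation(startup)];

// Once the initial rotation has completed, the second stacked call warns.
const steady: RotationState = { queuedRotations: 0, initialRotationDone: true };
const steadyWarnings = [recordStackedRotation(steady), recordStackedRotation(steady)];

console.log(startupWarnings, steadyWarnings);
```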

Two regression tests added; 23/23 synchronizer tests pass.

@julien-lottie julien-lottie marked this pull request as draft May 16, 2026 07:42
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.


Comment on lines +565 to +569
const initialTask = Task.from((controller) => this.rotateSegmentTaskImpl(controller.signal));
initialTask.addDoneCallback(() => {
  this.initialRotationDone = true;
});
this.rotateSegmentTask = initialTask;
Contributor


🟡 initialRotationDone is set after barrier() resolves due to microtask ordering, making the flag unreliable

The initialRotationDone flag is set via addDoneCallback (synchronizer.ts:566-568), which fires in the Task constructor's .finally() chain on resultFuture.await (utils.ts:505-518). However, barrier() (synchronizer.ts:636-641) resolves via a direct await on the same resultFuture.await promise. Because the constructor's .then() handler was registered before the barrier's .then(), both fire in registration order, but the .finally() (which invokes done callbacks) is scheduled one microtask tick after the barrier continuation. This means initialRotationDone is still false when code executes synchronously after await barrier() returns.

In the test at line 339, await synchronizer.barrier() returns, then three synchronous rotateSegment() calls are made (lines 343-345). When the third call checks this.queuedRotations > 1 && this.initialRotationDone (synchronizer.ts:619), initialRotationDone is false, so no warning is logged. The test then expects backlogWarns.length >= 1 (line 350), which should fail.

In production, this means the suppression window is slightly wider than intended (more warnings suppressed), which is benign but doesn't match the documented intent.

Prompt for agents
The initialRotationDone flag is set via Task.addDoneCallback, which fires asynchronously (in a .finally() two microtask ticks after the result promise resolves). But barrier() returns as soon as the result promise resolves, so any code running synchronously after await barrier() sees initialRotationDone as false.

To fix, stop relying on addDoneCallback for setting initialRotationDone. Instead, set initialRotationDone = true at the END of rotateSegmentTaskImpl itself (inside the try block, after the impl close/recreate), but only for the initial invocation (when oldTask is undefined). This makes the flag synchronous with the task's own execution, so it is guaranteed to be true before the Task resolves.

Alternatively, in the test, add an extra await (e.g. await new Promise(resolve => queueMicrotask(resolve))) after barrier() to allow the done callback microtask to drain before proceeding. However, fixing the source is cleaner.

Relevant code:
- synchronizer.ts constructor (lines 565-569): initialTask.addDoneCallback
- synchronizer.ts rotateSegmentTaskImpl (lines 643-659): the try/finally block
- utils.ts Task class (lines 505-518): done callback firing in .finally()
- synchronizer.test.ts (lines 327-353): the test that likely fails

Contributor Author


Good catch — adopted the suggested fix in 32951d1.

Empirically the existing test passes because the constructor's .then(noop, noop) continuation is registered before barrier()'s await this.rotateSegmentTask.result, so when the future resolves the order is:

  1. Constructor's .then(noop, noop) runs → resolves the intermediate promise → schedules .finally(invokeCallbacks) as a fresh microtask
  2. barrier()'s await resumes → barrier() completes → schedules test's continuation as a fresh microtask
  3. invokeCallbacks runs (sets the flag)
  4. Test continuation runs (sees flag = true)

So in this exact configuration the test isn't actually broken. But you're right that it's fragile — any change to the chain shape (e.g. dropping the .then(noop, noop) step in Task.runTask, or another await in barrier()) would flip the ordering silently.
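
The ordering argument above can be reproduced outside the codebase with plain promises. The sketch below is an assumption-laden model, not the `Task` implementation: `observeOrder`, `extraHops`, and the string labels are hypothetical, and the "callback chain" only imitates a `.then(noop, noop).finally(...)` shape registered before the barrier-style `await`. Adding one more `.then` hop to the callback chain is one way to see how sensitive the result is to the chain's shape.

```typescript
// Models a result promise with (a) a callback chain registered first and
// (b) a barrier-style awaiter registered second, then records who runs first.
async function observeOrder(extraHops: number): Promise<string[]> {
  const order: string[] = [];
  let resolveResult: () => void = () => {};
  const result = new Promise<void>((resolve) => {
    resolveResult = resolve;
  });

  // (a) Callback chain, registered first, as a task constructor might do.
  let chain: Promise<void> = result.then(() => {}, () => {});
  for (let i = 0; i < extraHops; i++) {
    chain = chain.then(() => {}); // each hop delays the callback by one microtask
  }
  void chain.finally(() => {
    order.push("done callback");
  });

  // (b) Barrier-style awaiter plus a caller that continues after it.
  const barrier = async (): Promise<void> => {
    await result;
  };
  const caller = (async () => {
    await barrier();
    order.push("caller continuation");
  })();

  resolveResult();
  await caller;
  // Let any remaining microtasks drain before reading the order.
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
  return order;
}

observeOrder(0).then((order) => console.log("no extra hop:", order));
observeOrder(1).then((order) => console.log("one extra hop:", order));
```

Under standard promise job ordering, the run with no extra hop should record the done callback first, and the run with one extra hop should record the caller continuation first, which is the kind of silent flip the comment warns about.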

The fix moves the assignment into rotateSegmentTaskImpl's finally block. By the time runTask's .then(value => this.resultFuture.resolve(value)) fires, the task body's finally has already run, so any continuation on task.result deterministically observes initialRotationDone === true. No reliance on addDoneCallback microtask timing.
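
A reduced sketch of why the `finally`-based assignment is deterministic: the flag is written by the async function's own body, so it is set before the returned promise settles, regardless of how many continuations are attached. `StartupGate` and `flagSeenAfterAwait` are hypothetical names for this example, and the `await Promise.resolve()` line is only a placeholder for the real close-and-recreate work.

```typescript
// Hypothetical reduction of the synchronizer to the one flag under discussion.
class StartupGate {
  initialRotationDone = false;

  // Stand-in for rotateSegmentTaskImpl: the function body itself sets the
  // flag, so it is set before the returned promise settles.
  async rotateSegmentTaskImpl(): Promise<void> {
    try {
      await Promise.resolve(); // placeholder for closing and recreating the segment
    } finally {
      this.initialRotationDone = true;
    }
  }
}

async function flagSeenAfterAwait(): Promise<boolean> {
  const gate = new StartupGate();
  await gate.rotateSegmentTaskImpl();
  // No extra microtask hop is needed here: the finally block has already run.
  return gate.initialRotationDone;
}

flagSeenAfterAwait().then((seen) => console.log("flag visible after await:", seen));
```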

Existing tests still pass (23/23).

Devin Review flagged the previous `Task.addDoneCallback` approach as
microtask-fragile: the callback fires from a `.finally()` chained off
the result promise, which is queued *after* any direct `await result`
continuation. The current test passes because of a favourable
ordering, but a small refactor of the surrounding code could quietly
flip it.

Set the flag synchronously in `rotateSegmentTaskImpl`'s `finally`
block instead. By the time `runTask` resolves the future, the finally
has already executed, so any continuation (including `barrier()` and
the test's `await synchronizer.barrier()`) observes
`initialRotationDone = true` deterministically. No reliance on
microtask scheduling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants