fix(transcription): warn only when rotateSegment backlog grows#1514
fix(transcription): warn only when rotateSegment backlog grows#1514julien-lottie wants to merge 3 commits into
Conversation
The "rotateSegment called while previous segment is still being rotated" log fires at every turn boundary because playback-finished, output attach/detach, and new-utterance events naturally overlap with the prior segment's close-and-recreate task. The rotation is safely serialized — the new Task awaits oldTask.result before recreating the SegmentSynchronizerImpl — so a single overlap is the expected case and no transcript data is lost. Track the number of rotations queued behind the in-flight one and only warn when more than one is stacked. That's when the backlog is actually growing and an operator should pay attention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 32951d1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 31 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
@julien-lottie could you confirm if this change is working with real e2e agent runs? |
Production data on Lottie's eliza-agent (dash0) shows every observed `rotateSegment backlog` warn fires within ~0.2–1.2s of session startup, before the first agent utterance — never at a mid-conversation turn boundary. The trigger is a race between the constructor-scheduled initial rotation task and the room's CONN_CONNECTED event, which stacks two extra rotateSegment calls onto the chain before the initial task drains. The chain settles long before any TTS frame is produced, so the caller-perceived latency is zero. Track when the initial task has resolved at least once via `Task.addDoneCallback`, and gate the warn behind that flag. The counter (`queuedRotations`) keeps incrementing during startup so the serialisation invariants are preserved; only the noisy log line is suppressed. Real mid-conversation backlogs (which is what the warn was designed to surface) still trip the warn once the synchronizer leaves the startup window. Add two regression tests covering both paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@toubatbrian - yes and it shows a significant reduction in warnings. However we keep having a a systematic warning on startup. See below. I'm testing a patch that fixes that in our environment and will convert back from draft when confirmed this one removes the final warnings Added startup-suppression for the backlog warn. Production data on a downstream deployment shows every observed The race is between the constructor-scheduled initial rotation task and the room's Mechanism
Two regression tests added; 23/23 synchronizer tests pass. |
| const initialTask = Task.from((controller) => this.rotateSegmentTaskImpl(controller.signal)); | ||
| initialTask.addDoneCallback(() => { | ||
| this.initialRotationDone = true; | ||
| }); | ||
| this.rotateSegmentTask = initialTask; |
There was a problem hiding this comment.
🟡 initialRotationDone is set after barrier() resolves due to microtask ordering, making the flag unreliable
The initialRotationDone flag is set via addDoneCallback (synchronizer.ts:566-568), which fires in the Task constructor's .finally() chain on resultFuture.await (utils.ts:505-518). However, barrier() (synchronizer.ts:636-641) resolves via a direct await on the same resultFuture.await promise. Because the constructor's .then() handler was registered before the barrier's .then(), both fire in registration order, but the .finally() (which invokes done callbacks) is scheduled one microtask tick after the barrier continuation. This means initialRotationDone is still false when code executes synchronously after await barrier() returns.
In the test at line 339, await synchronizer.barrier() returns, then three synchronous rotateSegment() calls are made (lines 343-345). When the third call checks this.queuedRotations > 1 && this.initialRotationDone (synchronizer.ts:619), initialRotationDone is false, so no warning is logged. The test then expects backlogWarns.length >= 1 (line 350), which should fail.
In production, this means the suppression window is slightly wider than intended (more warnings suppressed), which is benign but doesn't match the documented intent.
Prompt for agents
The initialRotationDone flag is set via Task.addDoneCallback, which fires asynchronously (in a .finally() two microtask ticks after the result promise resolves). But barrier() returns as soon as the result promise resolves, so any code running synchronously after await barrier() sees initialRotationDone as false.
To fix, stop relying on addDoneCallback for setting initialRotationDone. Instead, set initialRotationDone = true at the END of rotateSegmentTaskImpl itself (inside the try block, after the impl close/recreate), but only for the initial invocation (when oldTask is undefined). This makes the flag synchronous with the task's own execution, so it is guaranteed to be true before the Task resolves.
Alternatively, in the test, add an extra await (e.g. await new Promise(resolve => queueMicrotask(resolve))) after barrier() to allow the done callback microtask to drain before proceeding. However, fixing the source is cleaner.
Relevant code:
- synchronizer.ts constructor (lines 565-569): initialTask.addDoneCallback
- synchronizer.ts rotateSegmentTaskImpl (lines 643-659): the try/finally block
- utils.ts Task class (lines 505-518): done callback firing in .finally()
- synchronizer.test.ts (lines 327-353): the test that likely fails
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Good catch — adopted the suggested fix in 32951d1.
Empirically the existing test passes because the constructor's .then(noop, noop) continuation is registered before barrier()'s await this.rotateSegmentTask.result, so when the future resolves the order is:
- Constructor's
.then(noop, noop)runs → resolves the intermediate promise → schedules.finally(invokeCallbacks)as a fresh microtask barrier()'s await resumes →barrier()completes → schedules test's continuation as a fresh microtaskinvokeCallbacksruns (sets the flag)- Test continuation runs (sees flag = true)
So in this exact configuration the test isn't actually broken. But you're right that it's fragile — any change to the chain shape (e.g. dropping the .then(noop, noop) step in Task.runTask, or another await in barrier()) would flip the ordering silently.
The fix moves the assignment into rotateSegmentTaskImpl's finally block. By the time runTask's .then(value => this.resultFuture.resolve(value)) fires, the task body's finally has already run, so any continuation on task.result deterministically observes initialRotationDone === true. No reliance on addDoneCallback microtask timing.
Existing tests still pass (23/23).
Devin Review flagged the previous `Task.addDoneCallback` approach as microtask-fragile: the callback fires from a `.finally()` chained off the result promise, which is queued *after* any direct `await result` continuation. The current test passes because of a favourable ordering, but a small refactor of the surrounding code could quietly flip it. Set the flag synchronously in `rotateSegmentTaskImpl`'s `finally` block instead. By the time `runTask` resolves the future, the finally has already executed, so any continuation (including `barrier()` and the test's `await synchronizer.barrier()`) observes `initialRotationDone = true` deterministically. No reliance on microtask scheduling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
TranscriptionSynchronizer.rotateSegmentcurrently logsat
warnlevel whenever a rotation is requested while another is in flight. In practice this fires at every turn boundary —onPlaybackFinished, output attach/detach, new-utterance events, etc. naturally overlap with the prior segment's close-and-recreateTask. The rotation is safely serialized (rotateSegmentTaskImplawaitsoldTask.resultbefore recreating theSegmentSynchronizerImpl), so a single overlap is expected and no transcript data is lost.The result is that production logs get flooded with a benign warning.
This PR tracks the depth of the queue behind the in-flight rotation and only warns when more than one rotation is stacked — which is when a backlog is actually growing and an operator should care.
Behavior
warn)warnwith the actual depthImplementation
queuedRotations: number = 0field onTranscriptionSynchronizerrotateSegment()increments when stacking onto an in-flight task, warns when> 1rotateSegmentTaskImpl()decrements in afinallyblock (floored at 0 to keep the initial constructor-scheduled task safe)Test plan
pnpm build:agents— cleanpnpm -w test agents/src/voice/transcription/synchronizer.test.ts— 21/21 passpnpm -w format:checkclean on touched files🤖 Generated with Claude Code