Summary
When a realtime StoryRun receives spec.cancelRequested=true, Bobrapet currently routes it into handleGracefulCancel and keeps it alive until the resolved timeout expires. In the current implementation that timeout is derived from transport lifecycle.drainTimeoutSeconds, so room-end stop is delayed by the transport cutover drain window.
In the live cluster this produced a fixed ~2 minute wait after the call was already over.
Expected behavior
Realtime room-end stop should not inherit transport cutover drain timeout.
Once the realtime topology is gone, Bobrapet should finish the StoryRun quickly instead of waiting for drainTimeoutSeconds and then force-deleting StepRuns.
Observed behavior
Live run:
livekit-voice/livekit-voice-assistant-rm-6jwy7uutjydd-241a4c029032ae1d
spec.cancelRequested: true
storyrun.bubustack.io/graceful-cancel-observed-at: 2026-04-22T18:13:32.023539756Z
status.finishedAt: 2026-04-22T18:15:32Z
status.duration: 2m13s
Controller log for the same run:
Graceful cancel timeout expired; deleting remaining StepRuns
timeout:"2m0s"
startedAt:"2026-04-22T18:13:32.023539756Z"
The live example is currently configured with:
examples/realtime/livekit-voice/story.yaml:56-61
drainTimeoutSeconds: 120
Why this is wrong
The current StoryRun stop path couples room-end cancellation to transport lifecycle drain:
internal/controller/runs/storyrun_controller.go:278-280
cancelRequested short-circuits normal DAG reconciliation and goes straight into handleGracefulCancel
internal/controller/runs/storyrun_controller.go:1672-1795
handleGracefulCancel resolves its timeout from story transport drainTimeoutSeconds
api/transport/v1alpha1/transport_settings_types.go:433-440
DrainTimeoutSeconds is documented as drain-before-cutover transport lifecycle behavior, not room-end StoryRun stop behavior
So the controller currently uses a transport upgrade/cutover knob as the end-call stop delay.
Impact
- Realtime StoryRuns stay alive long after the call is already over
- Users see “stop” take minutes even when there is nothing left to process
- The controller only cleans up after the timeout loop expires
- This blocks the fast realtime termination path from being useful in production
Acceptance criteria
- cancelRequested=true for realtime runs must not automatically wait on transport drainTimeoutSeconds
- Room-end stop should use a dedicated shutdown contract, or finish as soon as realtime topology termination is observed
- Bobrapet should not keep a canceled realtime StoryRun alive for the full transport cutover window when the room is already gone
- Add regression coverage for a realtime StoryRun where stop is requested after room termination and verify the StoryRun reaches terminal state without waiting for the transport cutover drain timeout