Realtime StoryRun stop path waits on transport drainTimeoutSeconds instead of topology termination #87

## Summary

When a realtime StoryRun receives `spec.cancelRequested=true`, Bobrapet currently routes it into `handleGracefulCancel` and keeps it alive until the resolved timeout expires. In the current implementation that timeout is derived from transport `lifecycle.drainTimeoutSeconds`, so room-end stop is delayed by the transport cutover drain window.

In the live cluster this produced a fixed ~2 minute wait after the call was already over, matching the `drainTimeoutSeconds: 120` configured on the example story.

## Expected behavior

Realtime room-end stop should not inherit transport cutover drain timeout.

Once the realtime topology is gone, Bobrapet should finish the StoryRun quickly instead of waiting for `drainTimeoutSeconds` and then force-deleting StepRuns.

## Observed behavior

Live run:
- `livekit-voice/livekit-voice-assistant-rm-6jwy7uutjydd-241a4c029032ae1d`
- `spec.cancelRequested: true`
- `storyrun.bubustack.io/graceful-cancel-observed-at: 2026-04-22T18:13:32.023539756Z`
- `status.finishedAt: 2026-04-22T18:15:32Z`
- `status.duration: 2m13s`

Controller log for the same run:
- `Graceful cancel timeout expired; deleting remaining StepRuns`
- `timeout:"2m0s"`
- `startedAt:"2026-04-22T18:13:32.023539756Z"`

The live example is currently configured with:
- `examples/realtime/livekit-voice/story.yaml:56-61`
- `drainTimeoutSeconds: 120`

## Why this is wrong

The current StoryRun stop path couples room-end cancellation to transport lifecycle drain:
- `internal/controller/runs/storyrun_controller.go:278-280`
  - `cancelRequested` short-circuits normal DAG reconciliation and goes straight into `handleGracefulCancel`
- `internal/controller/runs/storyrun_controller.go:1672-1795`
  - `handleGracefulCancel` resolves its timeout from story transport `drainTimeoutSeconds`
- `api/transport/v1alpha1/transport_settings_types.go:433-440`
  - `DrainTimeoutSeconds` is documented as drain-before-cutover transport lifecycle behavior, not room-end StoryRun stop behavior

So the controller currently uses a transport upgrade/cutover knob as the end-call stop delay.

## Impact

- Realtime StoryRuns stay alive long after the call is already over
- Users see “stop” take minutes even when there is nothing left to process
- The controller only cleans up after the timeout loop expires
- This blocks the fast realtime termination path from being useful in production

## Acceptance criteria

- `cancelRequested=true` for realtime runs must not automatically wait on transport `drainTimeoutSeconds`
- Room-end stop should use a dedicated shutdown contract, or finish as soon as realtime topology termination is observed
- Bobrapet should not keep a canceled realtime StoryRun alive for the full transport cutover window when the room is already gone
- Add regression coverage for a realtime StoryRun where stop is requested after room termination and verify the StoryRun reaches terminal state without waiting for the transport cutover drain timeout

