feat: add stop button to dashboard for cancelling requests#1389
feat: add stop button to dashboard for cancelling requests#1389AlexCheema wants to merge 11 commits intomainfrom
Conversation
closing the http request to the api now - sends a cancellation from the api - writes that canellation in the master - worker plans off the cancellation - runner observes that cancellation after every generation step (+1 communication per token) - cancellation happens synchronously to prevent gpu locks
When a chat, image generation, or image edit request is streaming, the send button is replaced with a stop button. Clicking it aborts the HTTP connection, which triggers the backend cancellation chain. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix broken monkeypatch of `mx.all_gather` by patching `mx.distributed.all_gather` correctly - Apply formatter to runner.py for CI compliance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2f36078 to
905b395
Compare
closing the http request to the api now - sends a cancellation from the api - writes that canellation in the master - worker plans off the cancellation - runner observes that cancellation after every generation step (+1 communication per token) - cancellation happens synchronously to prevent gpu locks
905b395 to
79661ea
Compare
closing the http request to the api now - sends a cancellation from the api - writes that canellation in the master - worker plans off the cancellation - runner observes that cancellation after every generation step (+1 communication per token) - cancellation happens synchronously to prevent gpu locks
79661ea to
333119e
Compare
Resolve conflicts by keeping runner_id on CancelTask and using it directly in worker dispatch and plan generation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keep mx.distributed.all_gather monkeypatch which matches how runner.py actually calls all_gather. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
0f4ecaa to
33060b3
Compare
…stop-button # Conflicts: # src/exo/master/placement.py # src/exo/worker/engines/mlx/generator/generate.py # src/exo/worker/main.py # src/exo/worker/runner/runner.py # src/exo/worker/runner/runner_supervisor.py
- Resolve 5 merge conflicts from origin/main merge - Fix `await runner.shutdown()` → `runner.shutdown()` (now sync) - Apply nix fmt formatting - Exclude start_distributed_test.py from pytest collection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
33060b3 to
4ba0210
Compare
1d67441 to
30e4a74
Compare
Code Review: PR #1389 -- Stop Button / Request CancellationNice feature -- this is a solid end-to-end implementation of cancellation from frontend through to the MLX runner. The design of using Bugs1. In case TaskFinished():
generated_events.append(
TaskDeleted(
task_id=self.command_task_mapping[
command.finished_command_id # <-- KeyError if missing
]
)
)
self.command_task_mapping.pop(
command.finished_command_id, None
)In the normal cancellation flow ( case TaskFinished():
if (task_id := self.command_task_mapping.pop(
command.finished_command_id, None
)) is not None:
generated_events.append(TaskDeleted(task_id=task_id))2. case CancelTask(
cancelled_task_id=cancelled_task_id, runner_id=runner_id
):
await self.runners[runner_id].cancel_task(cancelled_task_id)If the runner was already shut down and removed from 3. async def _start_runner_task(self, task: Task):
if (instance := self.state.instances.get(task.instance_id)) is not None:
await self.runners[
instance.shard_assignments.node_to_runner[self.node_id]
].start_task(task)If the instance is gone, the task is silently dropped with no logging and no status update. This could leave tasks stuck in Design Concerns4. Image generation cancellation is only half-implemented The backend API layer sends 5. Using 6. check_for_cancel_every = min(
math.ceil(toks / (time.perf_counter() - t)), 100
)This computes tokens-per-second during warmup and caps at 100. The intent seems to be "check for cancel roughly once per second." But:
7. Non-streaming Changing Minor / Nits8. Duplicated The async def _send_task_cancelled(self, command_id: CommandId) -> None:
command = TaskCancelled(cancelled_command_id=command_id)
with anyio.CancelScope(shield=True):
await self.command_sender.send(
ForwarderCommand(origin=self.node_id, command=command)
)9. case CancelTask(
cancelled_task_id=cancelled_task_id, runner_id=runner_id
):
await self.runners[runner_id].cancel_task(cancelled_task_id)The destructured 10. If the chunk stream is cancelled (aborted) before any chunks arrive, 11. MISSED_THINGS.md changes are unrelated The bulk of MISSED_THINGS.md changes (marking items as completed) seem orthogonal to the stop-button feature. Consider splitting these into a separate commit for cleaner history. 12. Adding 13. The removal of TestingThe test updates in
These would be valuable additions, especially given the distributed nature of the system where races are common. SummaryThe core design is sound -- using HTTP abort to trigger anyio cancellation, propagating through
|
Code Review -- PR #1389: feat: add stop button to dashboard for cancelling requestsCI: All checks passing (typecheck, aarch64-darwin, x86_64-linux, aarch64-linux) OverviewThis PR implements end-to-end request cancellation: a "Stop" button in the Svelte dashboard that aborts the HTTP connection, which propagates through FastAPI's cancellation to the event-sourcing system, ultimately reaching the MLX runner's hot generation loop. The architecture is:
The PR also bundles unrelated cleanup: MISSED_THINGS.md checkbox updates, Critical Issues1. In case TaskFinished():
generated_events.append(
TaskDeleted(
task_id=self.command_task_mapping[
command.finished_command_id # <-- KeyError if absent
]
)
)
self.command_task_mapping.pop(
command.finished_command_id, None
)While the normal cancellation flow (TaskCancelled then TaskFinished) won't trigger this because case TaskFinished():
if (task_id := self.command_task_mapping.pop(
command.finished_command_id, None
)) is not None:
generated_events.append(TaskDeleted(task_id=task_id))2. In case CancelTask(cancelled_task_id=cancelled_task_id, runner_id=runner_id):
await self.runners[runner_id].cancel_task(cancelled_task_id)If the runner was already popped from Significant Issues3. Non-streaming The
The OpenAI Python client and most HTTP clients handle this fine, but some lower-level integrations or proxies may behave differently. This is a semantic change to the API contract for non-streaming mode and deserves explicit documentation or a comment explaining the tradeoff. 4. Image generation cancellation signal is sent but never consumed by the runner The API sends 5. In assert model is not NoneIf the user clicks Stop before the first chunk arrives, Minor Issues6. In async def _start_runner_task(self, task: Task):
if (instance := self.state.instances.get(task.instance_id)) is not None:
await self.runners[
instance.shard_assignments.node_to_runner[self.node_id]
].start_task(task)No logging and no status update. The task will remain in 7. Duplicated The same cancellation pattern appears in 8. The PR correctly adds 9.
10. The warmup-time calibration computes tokens/second, caps at 100, then synchronizes across ranks with 11. Unrelated changes bundled into this PR The following are unrelated to the stop button feature and should ideally be in separate commits:
What's Good
Missing TestsThere are no new tests for:
VerdictThe PR implements a well-designed cancellation mechanism with good distributed coordination. The critical issues (#1 and #2) are genuine Review only -- not a merge approval. |
|
Closing as completed! |
Motivation
When using the dashboard chat, there's no way to stop an ongoing request. This adds a stop button (similar to Claude's chat interface) that aborts the HTTP connection, triggering the backend cancellation chain introduced in #1276.
Closes part of #61 (dashboard UX for cancellation).
Changes
app.svelte.ts: AddedAbortControllertracking toAppStore. Each streaming method (sendMessage,regenerateChatCompletion,generateImage,editImage) now creates anAbortControllerand passes itssignaltofetch(). A newstopGeneration()method aborts the controller.handleStreamingError()detectsAbortErrorand preserves partial content instead of showing an error.ChatForm.svelte: When loading, the send button is replaced with a stop button (square icon + "STOP" label). Clicking it callsstopGeneration(), which aborts the HTTP connection. The button has a red hover state to indicate destructive action.Why It Works
AbortController.abort()causes the browser to close the HTTP connection. On the backend (from #1276), the FastAPI server catches theCancelledError, sends aTaskCancelledcommand to the master, which propagates to the worker and runner to stop inference. On the frontend, theAbortErroris caught and handled gracefully — any content already streamed is preserved in the conversation.Test Plan
Manual Testing
npm run build)Automated Testing