
feat: add stop button to dashboard for cancelling requests#1389

Closed
AlexCheema wants to merge 11 commits into main from alexcheema/dashboard-stop-button

Conversation

@AlexCheema
Contributor

Motivation

When using the dashboard chat, there's no way to stop an ongoing request. This adds a stop button (similar to Claude's chat interface) that aborts the HTTP connection, triggering the backend cancellation chain introduced in #1276.

Closes part of #61 (dashboard UX for cancellation).

Changes

  • app.svelte.ts: Added AbortController tracking to AppStore. Each streaming method (sendMessage, regenerateChatCompletion, generateImage, editImage) now creates an AbortController and passes its signal to fetch(). A new stopGeneration() method aborts the controller. handleStreamingError() detects AbortError and preserves partial content instead of showing an error.
  • ChatForm.svelte: When loading, the send button is replaced with a stop button (square icon + "STOP" label). Clicking it calls stopGeneration(), which aborts the HTTP connection. The button has a red hover state to indicate destructive action.

Why It Works

AbortController.abort() causes the browser to close the HTTP connection. On the backend (from #1276), the FastAPI server catches the CancelledError, sends a TaskCancelled command to the master, which propagates to the worker and runner to stop inference. On the frontend, the AbortError is caught and handled gracefully — any content already streamed is preserved in the conversation.
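As a rough illustration of the except/finally ordering this relies on, here is a minimal, self-contained asyncio sketch. It is not exo's actual code (exo uses anyio and its own command types); `send_command`, `token_stream`, and the `log` list are stand-ins, but the shape is the same: cancellation raised at the generator's await point is caught, the cancel notification is sent under a shield so teardown cannot interrupt it, and a finished marker always follows in `finally`.

```python
import asyncio

async def send_command(log: list[str], name: str) -> None:
    await asyncio.sleep(0)          # stand-in for the real channel send
    log.append(name)

async def token_stream(log: list[str]):
    i = 0
    try:
        while True:
            await asyncio.sleep(0)  # stand-in for awaiting the next token
            yield f"token-{i}"
            i += 1
    except asyncio.CancelledError:
        # shield keeps the notification alive while this frame is torn down
        await asyncio.shield(send_command(log, "TaskCancelled"))
        raise
    finally:
        log.append("TaskFinished")

async def main() -> list[str]:
    log: list[str] = []

    async def consume() -> None:
        async for tok in token_stream(log):
            log.append(tok)

    task = asyncio.create_task(consume())
    await asyncio.sleep(0.01)
    task.cancel()                    # simulates the browser aborting fetch()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log

log = asyncio.run(main())
print(log[-2:])  # ['TaskCancelled', 'TaskFinished']
```

Partial tokens remain in `log` ahead of the two markers, mirroring how the dashboard keeps already-streamed content after a stop.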

Test Plan

Manual Testing

  • Send a chat message, click STOP while streaming — partial response is preserved
  • Send another message after stopping — works normally
  • Dashboard builds without errors (npm run build)

Automated Testing

  • No new automated tests — this is a UI-only change wiring into the existing cancellation mechanism

Evanev7 and others added 5 commits February 5, 2026 12:39
closing the http request to the api now
- sends a cancellation from the api
- writes that cancellation in the master
- worker plans off the cancellation
- runner observes that cancellation after every generation step (+1
communication per token)
- cancellation happens synchronously to prevent gpu locks
When a chat, image generation, or image edit request is streaming,
the send button is replaced with a stop button. Clicking it aborts the
HTTP connection, which triggers the backend cancellation chain.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix broken monkeypatch of `mx.all_gather` by patching `mx.distributed.all_gather` correctly
- Apply formatter to runner.py for CI compliance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Evanev7 force-pushed the runner-cancellation branch from 2f36078 to 905b395 on February 5, 2026 14:30
@Evanev7 force-pushed the runner-cancellation branch from 905b395 to 79661ea on February 5, 2026 14:33
@Evanev7 force-pushed the runner-cancellation branch from 79661ea to 333119e on February 5, 2026 14:42
AlexCheema and others added 2 commits February 5, 2026 06:45
Resolve conflicts by keeping runner_id on CancelTask and using it
directly in worker dispatch and plan generation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keep mx.distributed.all_gather monkeypatch which matches how runner.py
actually calls all_gather.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Evanev7 force-pushed the runner-cancellation branch 15 times, most recently from 0f4ecaa to 33060b3 on February 12, 2026 00:49
AlexCheema and others added 2 commits February 13, 2026 10:12
…stop-button

# Conflicts:
#	src/exo/master/placement.py
#	src/exo/worker/engines/mlx/generator/generate.py
#	src/exo/worker/main.py
#	src/exo/worker/runner/runner.py
#	src/exo/worker/runner/runner_supervisor.py
- Resolve 5 merge conflicts from origin/main merge
- Fix `await runner.shutdown()` → `runner.shutdown()` (now sync)
- Apply nix fmt formatting
- Exclude start_distributed_test.py from pytest collection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Evanev7 force-pushed the runner-cancellation branch from 33060b3 to 4ba0210 on February 16, 2026 12:21
@Evanev7 force-pushed the runner-cancellation branch 2 times, most recently from 1d67441 to 30e4a74 on February 16, 2026 13:21
Base automatically changed from runner-cancellation to main February 16, 2026 13:30
@AlexCheema
Contributor Author

Code Review: PR #1389 -- Stop Button / Request Cancellation

Nice feature -- this is a solid end-to-end implementation of cancellation from frontend through to the MLX runner. The design of using AbortController on the frontend to trigger anyio cancellation on the backend, which then propagates through the event-sourcing system, is clean. A few things worth addressing:


Bugs

1. TaskFinished handler will KeyError on cancelled tasks that are re-cancelled or raced

In master/main.py, the TaskCancelled handler does not remove the entry from self.command_task_mapping, but TaskFinished uses bracket access:

case TaskFinished():
    generated_events.append(
        TaskDeleted(
            task_id=self.command_task_mapping[
                command.finished_command_id  # <-- KeyError if missing
            ]
        )
    )
    self.command_task_mapping.pop(
        command.finished_command_id, None
    )

In the normal cancellation flow (_token_chunk_stream), TaskCancelled is sent in the except block and TaskFinished in the finally block, so both commands arrive and the mapping still exists when TaskFinished runs. This works today. But if TaskFinished ever arrives without a mapping entry (e.g., duplicate TaskFinished, or the command was never mapped), it will crash. The pop(..., None) on the next line suggests defensive intent -- the bracket access above should match. Consider using .get() with a guard:

case TaskFinished():
    if (task_id := self.command_task_mapping.pop(
        command.finished_command_id, None
    )) is not None:
        generated_events.append(TaskDeleted(task_id=task_id))

2. CancelTask handler in worker/main.py can KeyError if runner is gone

case CancelTask(
    cancelled_task_id=cancelled_task_id, runner_id=runner_id
):
    await self.runners[runner_id].cancel_task(cancelled_task_id)

If the runner was already shut down and removed from self.runners (e.g., via the Shutdown case which calls self.runners.pop(runner_id)), this will raise KeyError. This is a realistic race since cancellation and shutdown can be triggered concurrently. Should use .get() with a guard, or at least a try/except KeyError.
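A guarded version could look like the following sketch. `FakeRunner` and `handle_cancel` are stand-ins for the real `RunnerSupervisor` and the worker's `match` arm; the point is only that a missing runner becomes a logged no-op instead of a `KeyError`.

```python
import asyncio
import logging

logger = logging.getLogger("worker")

class FakeRunner:
    """Stand-in for RunnerSupervisor; records cancel requests."""
    def __init__(self) -> None:
        self.cancelled: list[str] = []

    async def cancel_task(self, task_id: str) -> None:
        self.cancelled.append(task_id)

async def handle_cancel(
    runners: dict[str, FakeRunner], runner_id: str, task_id: str
) -> bool:
    if (runner := runners.get(runner_id)) is None:
        # Runner already shut down and removed: the cancel is moot.
        logger.warning("CancelTask for missing runner %s (task %s)", runner_id, task_id)
        return False
    await runner.cancel_task(task_id)
    return True

runners = {"r1": FakeRunner()}
assert asyncio.run(handle_cancel(runners, "r1", "t1")) is True
assert asyncio.run(handle_cancel(runners, "gone", "t2")) is False  # no KeyError
```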

3. _start_runner_task silently drops tasks when instance is gone

async def _start_runner_task(self, task: Task):
    if (instance := self.state.instances.get(task.instance_id)) is not None:
        await self.runners[
            instance.shard_assignments.node_to_runner[self.node_id]
        ].start_task(task)

If the instance is gone, the task is silently dropped with no logging and no status update. This could leave tasks stuck in Running state forever. At minimum, a logger.warning would help with debugging.
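A sketch of the suggested fix, with the "instance is gone" branch made explicit and logged rather than silently falling through. `Task` and `Worker` here are minimal stand-ins, not exo's real types:

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("worker")

@dataclass
class Task:
    task_id: str
    instance_id: str

@dataclass
class Worker:
    instances: dict = field(default_factory=dict)
    dropped: list = field(default_factory=list)

    def start_runner_task(self, task: Task) -> None:
        instance = self.instances.get(task.instance_id)
        if instance is None:
            # Make the drop visible instead of leaving the task Running forever.
            logger.warning(
                "dropping task %s: instance %s no longer exists",
                task.task_id, task.instance_id,
            )
            self.dropped.append(task.task_id)
            return
        ...  # dispatch to the runner as before

w = Worker()
w.start_runner_task(Task("t1", "missing"))
assert w.dropped == ["t1"]
```

Emitting a task-failed status update instead of (or in addition to) the log line would be even better, but requires touching the event schema.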


Design Concerns

4. Image generation cancellation is only half-implemented

The backend API layer sends TaskCancelled for image generation and image edits (in _generate_image_stream and _collect_image_chunks), but the runner's image generation loop (ImageGeneration and ImageEdits cases in runner.py) does not check cancel_receiver. Only text generation checks for cancellation. This means the cancellation signal propagates through the event system, the task status becomes Cancelled, but the runner keeps computing until it finishes the image. Worth at least a comment/TODO, or checking the cancel channel in the image generation loops too.

5. CANCEL_CURRENT_TASK sentinel is a fragile pattern

Using TaskId("CANCEL_CURRENT_TASK") as a sentinel value is brittle -- it relies on a magic string that could theoretically collide (TaskId is just a string wrapper). The cancelled_tasks.discard(TaskId("CANCEL_CURRENT_TASK")) at the top of each task loop iteration also means that if a CANCEL_CURRENT_TASK arrives between tasks, it gets discarded before the next task even starts. This seems intentional (it's for shutdown), but the interaction between this sentinel and the regular task_id-based cancellation is subtle and undocumented. A dedicated boolean flag or an enum value would be clearer.

6. check_for_cancel_every calibration during warmup

check_for_cancel_every = min(
    math.ceil(toks / (time.perf_counter() - t)), 100
)

This computes tokens-per-second during warmup and caps at 100. The intent seems to be "check for cancel roughly once per second." But:

  • If warmup generates very few tokens (e.g., 1 token) very fast, the division could yield a very large number, but min(..., 100) handles that.
  • If warmup is slow (e.g., 1 tok/sec), you'd check every token, which adds overhead from mx_any (a distributed all-reduce) on every single token. That could meaningfully slow down generation on multi-node setups.
  • The all_gather + max to synchronize check_for_cancel_every across nodes is smart, but adds complexity. A simple constant (e.g., 32 or 64 tokens) might be good enough and more predictable.

7. Non-streaming collect_chat_response now uses StreamingResponse

Changing collect_chat_response from returning a ChatCompletionResponse to yielding a single JSON string and wrapping it in StreamingResponse is a pragmatic workaround for FastAPI cancellation detection. However, this changes the HTTP response behavior: it will now use chunked transfer encoding even for non-streaming requests. Some OpenAI API clients may not handle this correctly -- they might expect a simple Content-Length-based response for non-streaming mode. Worth verifying with common clients (e.g., the official openai Python library).


Minor / Nits

8. Duplicated TaskCancelled sending pattern

The TaskCancelled + shield pattern is copy-pasted in four places (_token_chunk_stream, _generate_image_stream, and twice in _collect_image_chunks). Consider extracting to a helper:

async def _send_task_cancelled(self, command_id: CommandId) -> None:
    command = TaskCancelled(cancelled_command_id=command_id)
    with anyio.CancelScope(shield=True):
        await self.command_sender.send(
            ForwarderCommand(origin=self.node_id, command=command)
        )

9. cancelled_task_id in CancelTask is only a pass-through in worker/main.py

case CancelTask(
    cancelled_task_id=cancelled_task_id, runner_id=runner_id
):
    await self.runners[runner_id].cancel_task(cancelled_task_id)

The destructured cancelled_task_id is used only as a pass-through to cancel_task. This is fine; just noting that the pattern-match destructuring of both fields is correct.

10. assert model is not None in collect_chat_response after cancellation

If the chunk stream is cancelled (aborted) before any chunks arrive, model will still be None and the assert model is not None at line 237 will fire. When the user clicks "Stop" very quickly (before the first token), this path could be hit. The raise in the except block of _token_chunk_stream should propagate cancellation before reaching this point, but it's worth confirming the interaction.

11. MISSED_THINGS.md changes are unrelated

The bulk of MISSED_THINGS.md changes (marking items as completed) seem orthogonal to the stop-button feature. Consider splitting these into a separate commit for cleaner history.

12. pyproject.toml change to ignore tests

Adding --ignore=tests/start_distributed_test.py to pytest addopts is also unrelated to the stop button feature and should ideally be in its own commit with context on why this test needs to be excluded.

13. broadcast_from_zero removal

The removal of broadcast_from_zero from utils_mlx.py is fine since it appears unused, but it's not directly related to cancellation. Just flagging as unrelated cleanup bundled into this PR.


Testing

The test updates in test_event_ordering.py and test_placement.py correctly adapt to the new function signatures. However, there are no new tests for:

  • The cancellation flow itself (e.g., sending a TaskCancelled command and verifying the state transitions).
  • The _cancel_tasks planning function.
  • The cancel_task method on RunnerSupervisor.
  • Edge cases like cancelling an already-completed task, cancelling when the runner is gone, or rapid cancel-after-cancel.

These would be valuable additions, especially given the distributed nature of the system where races are common.
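As one concrete example of the kind of test suggested above, the following checks TaskFinished idempotency against a toy reimplementation of the master's command-to-task mapping (the real handler lives in master/main.py and its types differ): a duplicate TaskFinished should neither crash nor emit a second TaskDeleted.

```python
def handle_task_finished(mapping: dict[str, str], command_id: str) -> list[str]:
    """Toy version of the TaskFinished arm with the .pop() guard applied."""
    events: list[str] = []
    if (task_id := mapping.pop(command_id, None)) is not None:
        events.append(f"TaskDeleted({task_id})")
    return events

def test_duplicate_task_finished_is_harmless() -> None:
    mapping = {"cmd-1": "task-1"}
    assert handle_task_finished(mapping, "cmd-1") == ["TaskDeleted(task-1)"]
    # Second delivery of the same TaskFinished: no KeyError, no extra event.
    assert handle_task_finished(mapping, "cmd-1") == []

test_duplicate_task_finished_is_harmless()
```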


Summary

The core design is sound -- using HTTP abort to trigger anyio cancellation, propagating through TaskCancelled command to TaskStatusUpdated(Cancelled) event, and checking cancel_receiver in the hot loop with distributed coordination via mx_any. The main actionable items are:

  1. Fix the potential KeyError in TaskFinished handler (use .get() or .pop() with default)
  2. Fix the potential KeyError in CancelTask handler when runner is gone
  3. Add logging in _start_runner_task when instance is missing
  4. Consider adding cancellation checks to image generation loops (or document why not)
  5. Add tests for the cancellation paths

@AlexCheema
Contributor Author

Code Review -- PR #1389: feat: add stop button to dashboard for cancelling requests

CI: All checks passing (typecheck, aarch64-darwin, x86_64-linux, aarch64-linux)

Overview

This PR implements end-to-end request cancellation: a "Stop" button in the Svelte dashboard that aborts the HTTP connection, which propagates through FastAPI's cancellation to the event-sourcing system, ultimately reaching the MLX runner's hot generation loop. The architecture is:

  1. Dashboard: AbortController on fetch, button swaps from SEND to STOP during loading
  2. API layer: anyio cancellation triggers TaskCancelled command in a shielded scope, TaskFinished follows in finally
  3. Master: TaskCancelled command emits TaskStatusUpdated(Cancelled) event; TaskFinished emits TaskDeleted
  4. Worker plan: New _cancel_tasks planner generates CancelTask tasks for cancelled state
  5. Runner supervisor: New cancel_task() sends task ID over a dedicated MpChannel to the runner process
  6. Runner process: Periodically checks cancel_receiver every N tokens (calibrated during warmup), uses mx_any for distributed coordination so all ranks break together

The PR also bundles unrelated cleanup: MISSED_THINGS.md checkbox updates, mx_barrier/broadcast_from_zero refactoring, pyproject.toml test ignore, and splitting model into inference_model/image_model.

Critical Issues

1. TaskFinished handler will KeyError when the mapping is already gone

In src/exo/master/main.py, the TaskFinished case uses bracket access before the defensive pop:

case TaskFinished():
    generated_events.append(
        TaskDeleted(
            task_id=self.command_task_mapping[
                command.finished_command_id  # <-- KeyError if absent
            ]
        )
    )
    self.command_task_mapping.pop(
        command.finished_command_id, None
    )

While the normal cancellation flow (TaskCancelled then TaskFinished) won't trigger this because TaskCancelled no longer removes the mapping entry, there are still edge cases: duplicate TaskFinished from retries, or the mapping never being created (e.g., command processed before mapping was stored). The pop(..., None) on the next line strongly suggests the author intended defensive access. This should use .pop() with a guard:

case TaskFinished():
    if (task_id := self.command_task_mapping.pop(
        command.finished_command_id, None
    )) is not None:
        generated_events.append(TaskDeleted(task_id=task_id))

2. CancelTask handler in worker can KeyError on missing runner

In src/exo/worker/main.py, the CancelTask handler does:

case CancelTask(cancelled_task_id=cancelled_task_id, runner_id=runner_id):
    await self.runners[runner_id].cancel_task(cancelled_task_id)

If the runner was already popped from self.runners by a concurrent Shutdown task (the Shutdown case calls self.runners.pop(runner_id) at line 239), this will KeyError. The planner's _cancel_tasks function iterates runners at planning time, but the runner can be removed between planning and execution. This needs a .get() guard or try/except KeyError.

Significant Issues

3. Non-streaming collect_chat_response now uses StreamingResponse with chunked encoding

The collect_chat_response function was changed from returning ChatCompletionResponse to an AsyncGenerator[str] wrapped in StreamingResponse(media_type="application/json"). This was done to enable FastAPI cancellation detection (which only works with streaming responses). However, this changes the wire format:

  • Before: Standard HTTP response with Content-Length header
  • After: Chunked transfer encoding (Transfer-Encoding: chunked)

The OpenAI Python client and most HTTP clients handle this fine, but some lower-level integrations or proxies may behave differently. This is a semantic change to the API contract for non-streaming mode and deserves explicit documentation or a comment explaining the tradeoff.

4. Image generation cancellation signal is sent but never consumed by the runner

The API sends TaskCancelled for image generation (_generate_image_stream, _collect_image_chunks), and the master sets TaskStatus.Cancelled, and the worker's planner creates CancelTask tasks, and the supervisor sends the task ID to the runner's cancel_receiver. However, in runner.py, only the TextGeneration case checks cancel_receiver.collect(). The ImageGeneration and ImageEdits loops never check it. The signal reaches the runner process but is ignored. The full chain works on the master/API side (the HTTP connection is severed, the task is cleaned up), but the GPU continues computing until the image generation completes. For long image generation tasks, this wastes significant GPU time. At minimum, a comment should document this limitation.

5. assert model is not None crash on early cancellation in collect_chat_response

In src/exo/master/adapters/chat_completions.py line 237:

assert model is not None

If the user clicks Stop before the first chunk arrives, model is still None. When _token_chunk_stream is cancelled, it raises in the except anyio.get_cancelled_exc_class() block, which should propagate before collect_chat_response reaches the assert. However, if the cancellation occurs between the async for exhausting (e.g., the generator finishes normally but with zero chunks -- a legitimate edge case with some error paths), this assert fires as an unhandled crash. The function should handle the zero-chunks case gracefully.
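A hedged sketch of handling the zero-chunks edge case: instead of a bare `assert model is not None`, the empty stream becomes a well-defined error. `Chunk` and `collect_response` are stand-in names, not the real adapter code:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    model: str
    text: str

class EmptyStreamError(RuntimeError):
    """Raised when the stream ends before producing any chunks."""

def collect_response(chunks: list[Chunk]) -> str:
    model: str | None = None
    parts: list[str] = []
    for chunk in chunks:
        model = chunk.model
        parts.append(chunk.text)
    if model is None:
        # Stream ended before the first chunk (e.g. an instant Stop click):
        # fail deliberately rather than via an assert that may be stripped
        # under python -O.
        raise EmptyStreamError("stream produced no chunks")
    return f"{model}: {''.join(parts)}"

assert collect_response([Chunk("m", "hi")]) == "m: hi"
try:
    collect_response([])
except EmptyStreamError:
    pass
else:
    raise AssertionError("expected EmptyStreamError")
```

Mapping this error to an HTTP 499-style or cancelled-request response at the API layer would make the behavior explicit to clients as well.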

Minor Issues

6. _start_runner_task silently drops tasks

In src/exo/worker/main.py, the new _start_runner_task silently discards tasks when the instance is gone:

async def _start_runner_task(self, task: Task):
    if (instance := self.state.instances.get(task.instance_id)) is not None:
        await self.runners[
            instance.shard_assignments.node_to_runner[self.node_id]
        ].start_task(task)

No logging and no status update. The task will remain in Running state indefinitely. At minimum, add a logger.warning.

7. Duplicated TaskCancelled + shield pattern in 4 places

The same cancellation pattern appears in _token_chunk_stream, _generate_image_stream, and twice in _collect_image_chunks. It should be extracted to a helper method like _send_task_cancelled(command_id).

8. Shutdown case and runner.shutdown() timing

The PR correctly adds runner.shutdown() in a finally block after Shutdown task handling. This properly cleans up the cancel_sender, event channels, and process. However, runner.shutdown() sends CANCEL_CURRENT_TASK via _cancel_sender, then closes channels, then joins/terminates/kills the process -- all inside the worker's plan_step. This can block for up to ~3 seconds on top of the 3-second fail_after timeout. Acceptable but worth noting.

9. CANCEL_CURRENT_TASK sentinel magic string

TaskId("CANCEL_CURRENT_TASK") is a magic string sentinel used for shutdown-time cancellation. It's used in runner_supervisor.shutdown() and checked in the runner's hot loop. This is fragile -- TaskId is Id which is a str wrapper, so there's no type-level protection against collision. A dedicated enum or a separate boolean channel would be more robust.

10. check_for_cancel_every calibration could be a simple constant

The warmup-time calibration computes tokens/second, caps at 100, then synchronizes across ranks with all_gather + max. This adds complexity for marginal benefit. A simple constant like 32 or 64 would check roughly every ~0.5-1s on most hardware and avoids the distributed coordination during warmup. The current approach can also yield check_for_cancel_every = 1 on very slow hardware, adding an mx_any (distributed all-reduce) on every single token.

11. Unrelated changes bundled into this PR

The following are unrelated to the stop button feature and should ideally be in separate commits:

  • MISSED_THINGS.md checkbox updates
  • pyproject.toml --ignore=tests/start_distributed_test.py
  • broadcast_from_zero removal and mx_barrier refactoring in utils_mlx.py
  • Splitting model into inference_model/image_model in runner.py

What's Good

  • Architecture is sound: Using HTTP abort -> anyio cancellation -> event sourcing -> runner process channel is the right approach for this system. No polling loops on the API side.
  • Distributed coordination via mx_any: The use of all_sum to coordinate cancellation across distributed ranks ensures all ranks break together, avoiding deadlocks in collective operations.
  • Dashboard UX: Clean swap between SEND and STOP buttons. The AbortError suppression prevents confusing error messages. Proper type="button" on stop to avoid form submission.
  • Shielded TaskCancelled sending: Using anyio.CancelScope(shield=True) ensures the cancellation command is sent even when the scope is being cancelled.
  • cancel_task timeout in supervisor: The move_on_after(0.5) with _check_runner fallback handles the case where the cancel channel is blocked.
  • Test updates: Existing tests are correctly adapted to new function signatures.

Missing Tests

There are no new tests for:

  • The _cancel_tasks planner function
  • The cancel_task method on RunnerSupervisor
  • The TaskCancelled -> TaskStatusUpdated(Cancelled) -> TaskDeleted flow in the master
  • Edge cases: cancelling already-completed tasks, double-cancellation, cancel + shutdown race

Verdict

The PR implements a well-designed cancellation mechanism with good distributed coordination. The critical issues (#1 and #2) are genuine KeyError risks that should be fixed before merge. Issue #3 (non-streaming response format change) and #4 (image cancellation not reaching the runner) are design decisions that deserve explicit acknowledgment. The unrelated cleanup should ideally be split out for cleaner history.

Review only -- not a merge approval.

@rltakashige
Collaborator

Closing as completed!
