Fix graceful process shutdown in macOS app#1372

Merged
AlexCheema merged 7 commits into main from alexcheema/fix-graceful-process-shutdown
Feb 17, 2026

@AlexCheema
Contributor

Motivation

Fixes #1370

When the macOS app stops exo, GPU/system memory isn't released. This happens because:

  1. The macOS app calls process.terminate() (SIGTERM), but the Python process only registers a graceful shutdown handler for SIGINT, not SIGTERM. Python installs no handler for SIGTERM by default, so the signal's default action kills the process immediately and bypasses the cleanup cascade (runner subprocess MLX cleanup via mx.clear_cache(), channel closing, etc.).
  2. The app doesn't wait for the process to actually finish cleanup — it immediately nils out the process reference.

Changes

src/exo/main.py: Register SIGTERM handler alongside SIGINT so the graceful shutdown cascade (Node.shutdown() → cancel task group → worker/runner cleanup → mx.clear_cache() + gc.collect()) runs regardless of which signal is received.
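As a rough illustration of the main.py change, the sketch below routes SIGINT and SIGTERM to one shared callback using only the standard library. The names install_shutdown_handlers and send_to_self are hypothetical and do not come from the exo codebase, and the actual PR may wire the handlers differently (for example through its async runtime).

```python
import os
import signal
from typing import Callable, Dict, Iterable


def install_shutdown_handlers(
    on_shutdown: Callable[[int], None],
    signals: Iterable[int] = (signal.SIGINT, signal.SIGTERM),
) -> Dict[int, object]:
    """Route every listed signal to the same callback; return the old handlers."""
    previous = {}
    for sig in signals:
        # signal.signal() may only be called from the main thread.
        previous[sig] = signal.signal(
            sig, lambda signum, _frame: on_shutdown(signum)
        )
    return previous


def send_to_self(sig: int) -> None:
    """Demo helper (hypothetical): deliver a signal to the current process."""
    os.kill(os.getpid(), sig)
```

With both signals registered, a SIGTERM from the app reaches the same shutdown path as Ctrl+C instead of killing the interpreter outright.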

app/EXO/EXO/ExoProcessController.swift: Replace immediate process.terminate() with escalating shutdown per @Evanev7's suggestion:

  1. Send SIGINT via process.interrupt() — triggers the registered Python handler for graceful cleanup
  2. Wait up to 5 seconds for the process to exit
  3. If still running, escalate to SIGTERM via process.terminate()
  4. Wait up to 3 seconds
  5. If still running, force kill via SIGKILL

The escalation runs in a detached Task so the UI updates immediately (status → stopped) without blocking.
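The escalation order can be sketched in Python with subprocess. This is an analogue for illustration only: the real change lives in Swift in ExoProcessController.swift, and escalating_stop and spawn_stubborn_child are hypothetical names. The defaults of 5 and 3 seconds mirror the timeouts listed above.

```python
import signal
import subprocess
import sys


def escalating_stop(proc, interrupt_timeout=5.0, terminate_timeout=3.0):
    """Stop proc with SIGINT, then SIGTERM, then SIGKILL.

    Returns the name of the last signal that was sent.
    """
    steps = (
        (signal.SIGINT, interrupt_timeout),
        (signal.SIGTERM, terminate_timeout),
    )
    for sig, timeout in steps:
        proc.send_signal(sig)
        try:
            proc.wait(timeout=timeout)
            return sig.name
        except subprocess.TimeoutExpired:
            continue  # still running, escalate to the next signal
    proc.kill()  # SIGKILL cannot be caught or ignored
    proc.wait()
    return signal.SIGKILL.name


def spawn_stubborn_child():
    """Demo helper (hypothetical): a child that ignores SIGINT and SIGTERM."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until both signals are being ignored
    return proc
```

Calling escalating_stop(spawn_stubborn_child(), 0.3, 0.3) exercises the full ladder against a process that refuses the first two signals. A caller that must stay responsive would run the function off its main thread, much as the app runs it in a detached Task.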

Why It Works

The root cause is that SIGTERM wasn't triggering the graceful shutdown path. By registering a SIGTERM handler in Python and sending SIGINT first from the macOS app, the process gets a chance to run the full cleanup cascade: cancelling the task group, shutting down runners (which call del model; mx.clear_cache(); gc.collect()), closing channels, and flushing logs. The escalation to SIGTERM and SIGKILL ensures the process always terminates even if graceful shutdown hangs.
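The cascade can be pictured with a small asyncio sketch: a signal sets an event, the node cancels its runner task, and the runner's finally block is where memory would be released. This is an assumption-laden stand-in, not exo's code. The names runner, node, and demo are hypothetical, the log entries replace the real cleanup calls, and exo's own task group may be built on a different async library.

```python
import asyncio
import os
import signal

SHUTDOWN_SIGNALS = (signal.SIGINT, signal.SIGTERM)


async def runner(log):
    try:
        await asyncio.sleep(3600)  # stand-in for serving inference
    finally:
        # Per the PR text, the real runner would do:
        # del model; mx.clear_cache(); gc.collect()
        log.append("runner cleanup")


async def node(log, on_ready=None):
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    for sig in SHUTDOWN_SIGNALS:
        loop.add_signal_handler(sig, stop.set)  # Unix only, main thread only
    task = asyncio.create_task(runner(log))
    await asyncio.sleep(0)  # let the runner start before it can be cancelled
    try:
        if on_ready is not None:
            on_ready()
        await stop.wait()
        log.append("shutdown requested")
    finally:
        task.cancel()  # cancellation runs the runner's finally block
        await asyncio.gather(task, return_exceptions=True)
        log.append("channels closed")
        for sig in SHUTDOWN_SIGNALS:
            loop.remove_signal_handler(sig)


def demo():
    """Run the node and send SIGTERM to this process once it is ready."""
    log = []
    asyncio.run(node(log, on_ready=lambda: os.kill(os.getpid(), signal.SIGTERM)))
    return log
```

The point of the sketch is the ordering: cleanup code only runs if the signal is turned into a cancellation rather than an abrupt process exit.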

Test Plan

Manual Testing

  • Start exo via macOS app, load a model, run inference
  • Stop via the toggle switch, verify memory is released without requiring a system restart
  • Test rapid stop/start (restart) to ensure no race conditions

Automated Testing

  • uv run basedpyright — 0 errors
  • uv run ruff check — passes
  • nix fmt — no changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@AlexCheema AlexCheema requested a review from Evanev7 February 4, 2026 13:32
@AlexCheema
Contributor Author

Hey @Evanev7 can you check this? Does it make sense to also catch SIGTERM like this?

Member

@Evanev7 Evanev7 left a comment

yeah catching sigterm is fine

AlexCheema and others added 6 commits February 5, 2026 05:41
Resolve merge conflict in main.py by keeping signal handlers at the top
of the run method (before task startup). Exclude
tests/start_distributed_test.py from pytest collection as it's a
standalone script, not a test module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexCheema AlexCheema merged commit db79c35 into main Feb 17, 2026
6 checks passed
@AlexCheema AlexCheema deleted the alexcheema/fix-graceful-process-shutdown branch February 17, 2026 17:03

Development

Successfully merging this pull request may close these issues.

[BUG] Memory not released after stopping exo instance

2 participants