
Fix #1784: route mp:abort-process/exit-process via SJLJ on macOS arm64#1785

Merged
Bike merged 1 commit into clasp-developers:main from dg1sbg:fix/macos-arm64-interrupt-sjlj-1784 on Jun 6, 2026

dg1sbg (Contributor) commented Jun 1, 2026

Problem (#1784)

On macOS arm64 (native image), the interrupt regression suite aborts the whole process at its first test, CANCELLATION-INTERRUPT (mp:process-cancel), with startRunStop.cc:135 ... unhandled unknown exception followed by Abort signal.

Root cause

A deep C++ AbortProcess throw that originates inside clasp's interrupt-dispatch machinery (check-pending-interrupts → handle_queued_interrupt → funcall(signal-interrupt) → … → mp:abort-process → throw AbortProcess()) cannot be unwound by macOS-arm64 libunwind across the JIT'd native cleavir frames in that path. The __unwind_info-driven EH unwind derails before reaching catch (AbortProcess&) in Process_O::runInner, so __cxa_throw falls through to std::terminate.
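For contrast, the C++-exception path that the EH unwinder is expected to follow can be sketched in portable C++. This is a simplified model, not clasp's code: AbortProcess and ExitProcess stand in for the exception types the PR names, and abort_process_model, dispatch_interrupt_model and run_inner_model are hypothetical stand-ins for mp:abort-process, the interrupt-dispatch frames and Process_O::runInner. Compiled normally, every frame here carries unwind info, so the throw reaches the catch; the failure reported above arises only when the unwinder has to walk JIT'd cleavir frames, which this sketch does not reproduce.

```cpp
#include <cassert>

// Hypothetical stand-ins for the exception types named in the PR.
struct AbortProcess {};
struct ExitProcess {};

enum class Outcome { Completed, Aborted, Exited };

// Analogue of mp:abort-process on the C++-exception path.
[[noreturn]] void abort_process_model() { throw AbortProcess(); }

// Stand-in for the frames in between (check-pending-interrupts,
// handle_queued_interrupt, JIT'd cleavir code, ...). The PR reports that on
// macOS arm64 the unwinder derails while walking JIT'd frames in this
// position; ordinary compiled frames like this one unwind fine.
void dispatch_interrupt_model(void (*handler)()) { handler(); }

// Simplified analogue of the catch in Process_O::runInner.
Outcome run_inner_model(void (*body)()) {
  try {
    body();
    return Outcome::Completed;
  } catch (AbortProcess&) {
    return Outcome::Aborted;
  } catch (ExitProcess&) {
    return Outcome::Exited;
  }
}
```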

Evidence (all deterministic):

  • A signal-free repro — a process that self-interrupts (mp:interrupt-process (mp:current-process) #'mp:abort-process) and drains via core:check-pending-interrupts — terminates identically, with no _sigtramp on the stack. So this is not the "throw across the signal trampoline" hazard; it's the C++ EH unwinder failing on the JIT frames.
  • The same dispatch path raising a Lisp error under handler-case is caught fine, because clasp's Lisp unwinder (_longjmp, no unwind tables) crosses those frames where the C++ EH unwinder does not.
  • A personality (__gxx_personality_v0) walk shows phase-1 reaching the bytecode VM frames but never runInner's catch; findSectionsImpl stops being consulted at the offending JIT frame.

This is distinct from #1782/#1783, which fix shallow C++ throws across native frames by registering compact-unwind; the deep interrupt-dispatch path still derails, and this fix sidesteps it rather than relying on the EH unwinder at all.

Fix

Route mp:abort-process and mp:exit-process through clasp's own SJLJ non-local exit (core::sjlj_throw to the process object as the catch tag, with the matching catch established via call_with_catch in Process_O::runInner) instead of throw AbortProcess() / throw ExitProcess(). _longjmp uses no unwind tables, so it is immune to the derail — and it is exactly how cl:throw/handler-case already unwind across these frames.

  • Gated to _TARGET_OS_DARWIN. Non-Darwin keeps the C++ throw path byte-for-byte unchanged.
  • A completed_normally flag set as the last statement of the catch lambda distinguishes a normal return from a non-local exit (whose _Aborted/_AbortCondition or _ReturnValuesList state is set before unwinding).
  • One file, three functions. The throw sites (mpPackage.cc) and the only catcher (runInner) are all in this file.
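The routing described above can be modeled in portable C++. This is a minimal sketch under stated assumptions, not clasp's code: CatchFrame, g_catch_stack, sjlj_throw_model, call_with_catch_model, ProcessModel and run_inner_model are hypothetical stand-ins for core::sjlj_throw, call_with_catch and Process_O::runInner, whose real signatures the PR does not show. Catch frames are matched by tag identity and entered via setjmp/std::longjmp, with the process object as the tag (as in this PR), and a completed_normally flag set as the last statement of the thunk separates a normal return from a non-local exit. Because std::longjmp skips C++ destructors, every frame it crosses here is trivially destructible; a real runtime must run its own unwind-protect cleanups instead.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative model only: every name here is hypothetical, not clasp's API.
// std::longjmp skips C++ destructors, so all frames it crosses below are
// kept trivially destructible.

struct CatchFrame {
  const void* tag;   // matched by identity, like a Lisp catch tag
  std::jmp_buf buf;  // landing site for sjlj_throw_model
};

static std::vector<CatchFrame*> g_catch_stack;  // innermost catch at the back

// Analogue of a SJLJ throw: jump to the innermost catch whose tag matches.
// No unwind tables are consulted; control returns to the saved context.
[[noreturn]] void sjlj_throw_model(const void* tag) {
  for (std::size_t i = g_catch_stack.size(); i-- > 0;) {
    if (g_catch_stack[i]->tag == tag) std::longjmp(g_catch_stack[i]->buf, 1);
  }
  assert(false && "no catch established for this tag");
  std::abort();
}

// Analogue of call_with_catch: run fn with a catch for tag in effect.
// Returns normally whether fn returned or a throw to tag landed here.
template <class F>
void call_with_catch_model(const void* tag, F& fn) {
  CatchFrame frame;
  frame.tag = tag;
  const std::size_t depth = g_catch_stack.size();
  g_catch_stack.push_back(&frame);
  if (setjmp(frame.buf) == 0) {
    fn();
  }
  g_catch_stack.resize(depth);  // drop this frame and any skipped inner ones
}

enum class Outcome { Completed, Aborted, Exited };

struct ProcessModel {
  bool aborted = false;  // analogue of _Aborted, recorded before unwinding
  bool exited = false;   // analogue of _ReturnValuesList being set
};

static ProcessModel* g_current_process = nullptr;

// Analogues of mp:abort-process / mp:exit-process on the SJLJ path; as in
// this PR, the process object itself is the catch tag.
[[noreturn]] void abort_process_model() {
  g_current_process->aborted = true;
  sjlj_throw_model(g_current_process);
}

[[noreturn]] void exit_process_model() {
  g_current_process->exited = true;
  sjlj_throw_model(g_current_process);
}

// Analogue of Process_O::runInner establishing the catch.
Outcome run_inner_model(ProcessModel& p, void (*body)()) {
  g_current_process = &p;
  bool completed_normally = false;
  auto thunk = [&] {
    body();
    completed_normally = true;  // last statement: only reached on normal return
  };
  call_with_catch_model(&p, thunk);
  g_current_process = nullptr;
  if (completed_normally) return Outcome::Completed;
  return p.aborted ? Outcome::Aborted : Outcome::Exited;
}
```

A normal return yields Outcome::Completed; calling abort_process_model or exit_process_model from any depth, including from inside an inner catch for a different tag, lands back in run_inner_model with the corresponding state already recorded.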

Verification (macOS arm64, native boehmprecise image)

  • TEST_SUITES=mp,interrupt → 52 successes, 0 failures. CANCELLATION-INTERRUPT now passes, as do SLEEP/LOCK/INPUT-INTERRUPTIBLE and UNWIND-PROTECT.INTERRUPT.1/2, and all PROCESS-ABORT-*/PROCESS-EXIT/ATOMIC-*.
  • Broader sweep (mp,interrupt,unwind,conditions,control01,ehkiller,clos,update-instance-abort) → 135 successes, 0 failures, no regressions.

Companion to #1783 (#1782); together they take the macOS-arm64 native CI green through the interrupt suite.

🤖 Generated with Claude Code

…LJ on macOS arm64

On macOS arm64 a C++ exception thrown from inside clasp's interrupt-dispatch
machinery (e.g. an unhandled mp:process-cancel) cannot be unwound by libunwind
across the JIT'd native cleavir frames in that path: the __unwind_info-driven
EH unwind derails before reaching the catch(AbortProcess&) in Process_O::runInner,
so __cxa_throw reaches std::terminate and aborts the whole process. A signal-free
self-interrupt that funcalls mp:abort-process reproduces it identically, and a
Lisp error/handler-case through the same dispatch path is caught fine — so the
fault is the C++ EH unwinder, not signals or _sigtramp.

clasp's own SJLJ non-local exit is _longjmp-based (core::sjlj_throw, unwind.cc)
and uses no unwind tables, so it crosses those JIT frames cleanly — it is how
cl:throw/handler-case already unwind here. Route mp:abort-process and
mp:exit-process through sjlj_throw (to the process object as the catch tag,
caught via call_with_catch established in runInner and distinguished from a
normal return by a completed_normally flag) instead of throw AbortProcess/
ExitProcess, gated to _TARGET_OS_DARWIN. Other platforms keep the C++ throw
path byte-for-byte unchanged.

This fixes the interrupt regression suite on macOS arm64 (CANCELLATION-INTERRUPT
plus SLEEP/LOCK/INPUT-INTERRUPTIBLE and UNWIND-PROTECT.INTERRUPT.1/2), which
previously aborted the whole process at the first cancellation test. Verified:
mp + interrupt suites 52/0, broader sweep (unwind/conditions/control01/ehkiller/
clos/update-instance-abort) 135/0, no regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dg1sbg force-pushed the fix/macos-arm64-interrupt-sjlj-1784 branch from f2c0596 to 3188217 on June 2, 2026 14:01

Bike (Member) commented Jun 2, 2026

This PR's more of a problem than the others.
While the explanation of the bug seems plausible, the correct fix is for the underlying problem - making it possible to throw through JITted frames on Macs. As I recall they recently fixed up LLVM ORC with the mac unwinding format, so we can probably do that. I guess that's #1783?
That said, it does make sense to implement abort-process through Clasp unwinding rather than C++ exceptions. It's faster and less prone to std::terminate. Neither of those reasons is OS-specific so there's no reason to gate on _TARGET_OS_DARWIN, and doing so is an unnecessary complication.
Doing this by throwing and catching the process object is no good. There's nothing preventing the programmer from doing (catch mp:*current-process* ...) and thereby redirecting abort-process, and they might even do that accidentally. It would be better to do block/return-from, but that would require an extra slot somewhere for the block. catch but with some less accessible object would be fine - say an internal mp symbol.
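The redirection hazard in the last point can be illustrated with a small model of how a throw picks its catch. This sketch only models the lookup, not the jump: CatchEntry, throw_target, Process and mp_internal::abort_process_tag are hypothetical names, and the private tag only plays the role of the "internal mp symbol" suggested above. A throw lands in the innermost catch whose tag is identical (EQ), so any catch that user code can establish for the same object takes precedence over the one in runInner.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the dynamic catch stack: each entry records the tag
// it was established with and who established it (innermost at the back).
struct CatchEntry {
  const void* tag;
  std::string owner;
};

// A throw to `tag` lands in the innermost catch with an identical tag
// (EQ in Lisp terms), so search from the back toward the front.
std::string throw_target(const std::vector<CatchEntry>& stack, const void* tag) {
  for (std::size_t i = stack.size(); i-- > 0;) {
    if (stack[i].tag == tag) return stack[i].owner;
  }
  return "<no catch>";
}

// Stand-in for the process object, which user code can reach as
// mp:*current-process*.
struct Process {};

// Hypothetical internal tag: only runtime code can name it, so user code
// cannot establish a catch for it.
namespace mp_internal {
const int abort_process_tag = 0;
}
```

With the process object as the tag, a user (catch mp:*current-process* ...) established inside the process is the innermost match and captures the abort; with a tag only the runtime can name, the search can only land in runInner's frame.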

Bike merged commit 3188217 into clasp-developers:main on Jun 6, 2026
3 of 6 checks passed