Fix #1784: route mp:abort-process/exit-process via SJLJ on macOS arm64 #1785
…LJ on macOS arm64

On macOS arm64 a C++ exception thrown from inside clasp's interrupt-dispatch machinery (e.g. an unhandled mp:process-cancel) cannot be unwound by libunwind across the JIT'd native cleavir frames in that path: the __unwind_info-driven EH unwind derails before reaching the catch(AbortProcess&) in Process_O::runInner, so __cxa_throw reaches std::terminate and aborts the whole process. A signal-free self-interrupt that funcalls mp:abort-process reproduces it identically, and a Lisp error/handler-case through the same dispatch path is caught fine, so the fault is the C++ EH unwinder, not signals or _sigtramp.

clasp's own SJLJ non-local exit is _longjmp-based (core::sjlj_throw, unwind.cc) and uses no unwind tables, so it crosses those JIT frames cleanly; it is how cl:throw/handler-case already unwind here. Route mp:abort-process and mp:exit-process through sjlj_throw (to the process object as the catch tag, caught via call_with_catch established in runInner and distinguished from a normal return by a completed_normally flag) instead of throw AbortProcess/ExitProcess, gated to _TARGET_OS_DARWIN. Other platforms keep the C++ throw path byte-for-byte unchanged.

This fixes the interrupt regression suite on macOS arm64 (CANCELLATION-INTERRUPT plus SLEEP/LOCK/INPUT-INTERRUPTIBLE and UNWIND-PROTECT.INTERRUPT.1/2), which previously aborted the whole process at the first cancellation test.

Verified: mp + interrupt suites 52/0, broader sweep (unwind/conditions/control01/ehkiller/clos/update-instance-abort) 135/0, no regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f2c0596 to 3188217
This PR's more of a problem than the others.
Problem (#1784)
On macOS arm64 (native image), the `interrupt` regression suite aborts the whole process at its first test, `CANCELLATION-INTERRUPT` (`mp:process-cancel`), with `startRunStop.cc:135 ... unhandled unknown exception` → `Abort signal`.

Root cause
A deep C++ `AbortProcess` throw that originates inside clasp's interrupt-dispatch machinery (`check-pending-interrupts` → `handle_queued_interrupt` → `funcall(signal-interrupt)` → … → `mp:abort-process` → `throw AbortProcess()`) cannot be unwound by macOS-arm64 libunwind across the JIT'd native cleavir frames in that path. The `__unwind_info`-driven EH unwind derails before reaching `catch (AbortProcess&)` in `Process_O::runInner`, so `__cxa_throw` falls through to `std::terminate`.

Evidence (all deterministic):

- A signal-free self-interrupt that queues `(mp:interrupt-process (mp:current-process) #'mp:abort-process)` and drains via `core:check-pending-interrupts` terminates identically, with no `_sigtramp` on the stack. So this is not the "throw across the signal trampoline" hazard; it's the C++ EH unwinder failing on the JIT frames.
- A Lisp `error` under `handler-case` through the same dispatch path is caught fine, because clasp's Lisp unwinder (`_longjmp`, no unwind tables) crosses those frames where the C++ EH unwinder does not.
- A personality-routine (`__gxx_personality_v0`) walk shows phase-1 reaching the bytecode VM frames but never `runInner`'s catch; `findSectionsImpl` stops being consulted at the offending JIT frame.

This is distinct from #1782/#1783 (which fixes shallow C++ throws across native frames by registering compact-unwind); the deep interrupt-dispatch path still derails, and this fix sidesteps it rather than relying on the EH unwinder at all.
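One way to observe how far an unwinder can walk is `_Unwind_Backtrace` from `<unwind.h>`, which steps frames using the same kind of unwind information the C++ EH path depends on. The following is a minimal sketch under that assumption, not the instrumentation used to gather the evidence above; `WalkLog`, `record_frame`, and `walk_stack` are hypothetical names introduced only for illustration.

```cpp
#include <unwind.h>
#include <cstdint>

// Hypothetical record of one unwinder walk (not part of clasp).
struct WalkLog {
    int frames = 0;                  // number of times the callback was invoked
    std::uintptr_t outermost_ip = 0; // last non-zero instruction pointer seen
};

// Called by the unwinder once for each frame it can step to.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context* ctx, void* arg) {
    auto* log = static_cast<WalkLog*>(arg);
    const auto ip = static_cast<std::uintptr_t>(_Unwind_GetIP(ctx));
    ++log->frames;
    if (ip != 0) {
        log->outermost_ip = ip;      // some unwinders report a final frame with IP 0
    }
    return _URC_NO_REASON;           // ask the unwinder to continue to the next frame
}

// Walks the current call stack and reports how far the unwinder got.
WalkLog walk_stack() {
    WalkLog log;
    _Unwind_Backtrace(record_frame, &log);
    return log;
}
```

If a walk started below a frame of interest ends with `outermost_ip` inside JIT'd code rather than in the expected caller, that would suggest the unwinder could not find usable unwind information for that frame.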
Fix
- Route `mp:abort-process` and `mp:exit-process` through clasp's own SJLJ non-local exit (`core::sjlj_throw` to the process object as the catch tag, with the matching catch established via `call_with_catch` in `Process_O::runInner`) instead of `throw AbortProcess()` / `throw ExitProcess()`. `_longjmp` uses no unwind tables, so it is immune to the derail, and it is exactly how `cl:throw`/`handler-case` already unwind across these frames.
- Gated to `_TARGET_OS_DARWIN`. Non-Darwin keeps the C++ throw path byte-for-byte unchanged.
- A `completed_normally` flag set as the last statement of the catch lambda distinguishes a normal return from a non-local exit (whose `_Aborted`/`_AbortCondition` or `_ReturnValuesList` state is set before unwinding).
- The throw sites (`mpPackage.cc`) and the only catcher (`runInner`) are all in this file.

Verification (macOS arm64, native boehmprecise image)
- `TEST_SUITES=mp,interrupt` → 52 successes, 0 failures. `CANCELLATION-INTERRUPT` now passes, as do `SLEEP/LOCK/INPUT-INTERRUPTIBLE` and `UNWIND-PROTECT.INTERRUPT.1/2`, and all `PROCESS-ABORT-*` / `PROCESS-EXIT` / `ATOMIC-*`.
- Broader sweep (`mp,interrupt,unwind,conditions,control01,ehkiller,clos,update-instance-abort`) → 135 successes, 0 failures, no regressions.

Companion to #1783 (#1782); together they take the macOS-arm64 native CI green through the `interrupt` suite.

🤖 Generated with Claude Code
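The routing described under Fix can be sketched with standard `setjmp`/`longjmp`. This is a minimal illustration, not clasp's code: `Process`, `run_inner`, `abort_process`, `exit_process`, and `body` are hypothetical stand-ins for `Process_O::runInner`, `mp:abort-process`, `mp:exit-process`, and the process function, whereas clasp itself goes through `core::sjlj_throw`/`call_with_catch` and `_longjmp`. The sketch keeps only trivially destructible locals between the `setjmp` and the `longjmp`, because `longjmp` does not run C++ destructors.

```cpp
#include <csetjmp>

// All names below are hypothetical; they only mirror the roles in the PR.
enum class Outcome { CompletedNormally, Aborted, Exited };
enum class Action { Return, Abort, Exit };

struct Process {
    std::jmp_buf catch_point;  // plays the role of the catch set up in runInner
    bool aborted = false;      // set before unwinding, like _Aborted
    int exit_value = 0;        // stands in for _ReturnValuesList
};

// Analogue of mp:abort-process: record state first, then jump to the catch.
[[noreturn]] void abort_process(Process& p) {
    p.aborted = true;
    std::longjmp(p.catch_point, 1);
}

// Analogue of mp:exit-process: record the return value, then jump.
[[noreturn]] void exit_process(Process& p, int value) {
    p.exit_value = value;
    std::longjmp(p.catch_point, 1);
}

// A nested body, so the jump has several frames to cross.
void body(Process& p, Action a, int depth = 3) {
    if (depth > 0) {
        body(p, a, depth - 1);
        return;
    }
    if (a == Action::Abort) abort_process(p);
    if (a == Action::Exit) exit_process(p, 42);
}

// Analogue of runInner: establish the catch, run the body, classify the exit.
Outcome run_inner(Process& p, Action a) {
    volatile bool completed_normally = false;
    if (setjmp(p.catch_point) == 0) {
        body(p, a);
        completed_normally = true;  // last statement: reached only on a normal return
    }
    if (completed_normally) return Outcome::CompletedNormally;
    return p.aborted ? Outcome::Aborted : Outcome::Exited;
}
```

With these definitions, `run_inner(p, Action::Abort)` yields `Outcome::Aborted` and `run_inner(p, Action::Return)` yields `Outcome::CompletedNormally`. Recording the abort or exit state before jumping is what lets the catcher tell the two non-local exits apart without any exception object.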