Fix Apple Silicon SIGBUS writing native-module literals (JIT W^X)#1781
Closed
dg1sbg wants to merge 1 commit into
On Apple Silicon the LLVM ORC JITLink code/data slab is mapped MAP_JIT and is write-protected (execute mode) per-thread by default; a thread must call pthread_jit_write_protect_np(false) before it may write into that memory. Clasp stores Lisp object pointers into each native module's literals vector, which lives in this JIT memory. Several of those stores were not bracketed by a switch to write mode, so on Apple Silicon they fault with SIGBUS (EXC_BAD_ACCESS code=2, KERN_PROTECTION_FAILURE).

This manifests as a crash in loadltv::attr_clasp_module_native while loading freshly compiled native FASLs during the "Compiling Clasp native image" bootstrap step, and at core::core__literals_vset more generally. (On x86-64 and Linux the page is genuinely RWX, so the bug is latent there.)

The helpers JITDataReadWriteMaybeExecute()/JITDataReadExecute() already exist for exactly this purpose. Bracket the three native-module literal write sites with them:

- src/core/compiler.cc core__literals_vset: the store into the literals vector.
- src/core/loadltv.cc op_setf_literals (lits[i] = scf): replacing a bytecode function with its native simple-fun.
- src/core/loadltv.cc attr_clasp_module_native (lits[i] = value loop): filling a freshly loaded native module's literals; the get_ltv read stays in execute mode and only the store switches to write mode.

Verified on Apple M5 / macOS 15 / LLVM 22.1.5: before, (require :asdf) against the bytecode image SIGBUS'd at compiler.cc:213; with the first wrap the fault moved to loadltv.cc, and with all three, (require :asdf) loads cleanly and the native image (base.nfasl) builds and boots, which previously aborted at the native-image compile step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
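The per-thread toggle described above can be sketched roughly as follows. This is an illustration, not Clasp's code: jit_data_read_write(), jit_data_read_execute(), JitMode, g_mode and literals_vset() are hypothetical stand-ins, and the real JITDataReadWriteMaybeExecute()/JITDataReadExecute() helpers may differ in signature and behavior. The vector here is ordinary heap memory rather than a MAP_JIT mapping, so the sketch only shows the shape of the bracket; on platforms other than Apple Silicon the switches reduce to bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>  // pthread_jit_write_protect_np (macOS 11+)
#endif

// Hypothetical model of the per-thread protection state.
enum class JitMode { ReadExecute, ReadWrite };
static thread_local JitMode g_mode = JitMode::ReadExecute;

// Stand-in for a "switch this thread to write mode" helper (assumed name).
inline void jit_data_read_write() {
#if defined(__APPLE__) && defined(__aarch64__)
  if (pthread_jit_write_protect_supported_np())
    pthread_jit_write_protect_np(0);  // 0: MAP_JIT pages writable for this thread
#endif
  g_mode = JitMode::ReadWrite;
}

// Stand-in for a "switch this thread back to execute mode" helper (assumed name).
inline void jit_data_read_execute() {
#if defined(__APPLE__) && defined(__aarch64__)
  if (pthread_jit_write_protect_supported_np())
    pthread_jit_write_protect_np(1);  // 1: MAP_JIT pages executable, not writable
#endif
  g_mode = JitMode::ReadExecute;
}

// Illustrative store into a literals vector, bracketed by the mode switches.
inline void literals_vset(std::vector<std::uintptr_t>& literals,
                          std::size_t index, std::uintptr_t value) {
  assert(index < literals.size());
  jit_data_read_write();    // open the write window
  literals[index] = value;  // the store that would otherwise fault
  jit_data_read_execute();  // close it again before returning
}
```

Leaving the function in execute mode matters because the protection state is per-thread: a thread that stays in write mode would fault the next time it tried to run JIT-compiled code.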
f23f2df to
8036211
Member
Everything here appears to duplicate parts of #1768. I do appreciate all these changes but they would be much easier to merge if better organized.
Problem

On Apple Silicon the LLVM ORC JITLink code/data slab is mapped MAP_JIT and is write-protected (execute mode) per-thread by default; a thread must call pthread_jit_write_protect_np(false) before it may write into that memory. Clasp stores Lisp object pointers into each native module's literals vector, which lives in this JIT memory. Several of those stores were not bracketed by a switch to write mode, so on Apple Silicon they fault with SIGBUS (EXC_BAD_ACCESS code=2, KERN_PROTECTION_FAILURE).

This manifests as a crash in loadltv::attr_clasp_module_native while loading freshly compiled native FASLs during the "Compiling Clasp native image" bootstrap step, and at core::core__literals_vset more generally. On x86-64 and Linux the page is genuinely RWX, so the bug is latent there.

Fix

The helpers JITDataReadWriteMaybeExecute()/JITDataReadExecute() already exist for exactly this purpose. This PR wraps the three native-module literal write sites with them:

- src/core/compiler.cc core__literals_vset: the store into the literals vector.
- src/core/loadltv.cc op_setf_literals (lits[i] = scf): replacing a bytecode function with its native simple-fun.
- src/core/loadltv.cc attr_clasp_module_native (lits[i] = value loop): filling a freshly loaded native module's literals; the get_ltv read stays in execute mode and only the store switches to write mode.

Verification

On Apple M5 / macOS 15 / LLVM 22.1.5: before the fix, (require :asdf) against the bytecode image SIGBUS'd at compiler.cc:213; with the first wrap the fault moved to loadltv.cc; with all three, (require :asdf) loads cleanly and the native image (base.nfasl) builds and boots. The build previously aborted at the native-image compile step.

🤖 Generated with Claude Code
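The narrow write window described under Fix for the attr_clasp_module_native loop, where the read stays in execute mode and only the store switches to write mode, can be sketched as below. All names (fill_literals, g_trace, and the two recorder functions) are hypothetical, and the recorders only log the order of mode switches instead of changing any page protection. One plausible motivation, stated here as an assumption, is that a thread in write mode cannot execute code from MAP_JIT pages, so anything the read might run should happen before the switch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recorders standing in for the real mode-switch helpers;
// they only log the order of events so the pattern can be inspected.
static std::vector<std::string> g_trace;
inline void jit_data_read_write()   { g_trace.push_back("write"); }
inline void jit_data_read_execute() { g_trace.push_back("execute"); }

// Fill `literals` from `read_value`, opening the write window only
// around each individual store.
template <typename T, typename ReadFn>
void fill_literals(std::vector<T>& literals, ReadFn read_value) {
  for (std::size_t i = 0; i < literals.size(); ++i) {
    T value = read_value(i);  // performed while still in execute mode
    g_trace.push_back("read");
    jit_data_read_write();    // switch to write mode for the store only
    literals[i] = value;
    jit_data_read_execute();  // back to execute mode before the next read
  }
}
```

The trade-off is one pair of mode switches per element instead of one pair around the whole loop, in exchange for never running the read path with write protection disabled.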