Make foreign-alloc memory permanent until explicit free#1792
Open
dg1sbg wants to merge 1 commit into
Open
Conversation
ForeignData_O for %foreign-alloc was created with DeleteOnDtor, so the malloc'd block was freed by the wrapper's GC finalizer. CFFI code (and cffi-libffi in particular) stores the raw address in foreign memory the GC does not scan and drops the wrapper, so the next GC freed still-referenced blocks -- a use-after-free that crashed struct-by-value calls after a few hundred iterations. Allocate with None instead: foreign-alloc memory now persists until %foreign-free, matching the malloc semantics every other CFFI backend provides. with-foreign-object's allocator keeps DeleteOnDtor as it always frees explicitly. Add src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct return / nested-struct by-value calls plus a 300k-call stress loop that SEGVs on the pre-fix backend.
Contributor
Author
|
CI note: the Locally (macOS arm64, boehmprecise, base = current main + the W^X fixes from #1781/#1786/#1788): full regression suite 1963 passed / 4 failed, all four known-preexisting (SBCL-CROSS-COMPILE-4, INCLUDE-LEVEL-2B/3, TYPES-CLASSES-10), zero new failures; plus the new cffi-fsbv stress test passes (300k mixed struct-by-value calls, no crash). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
CFFI struct-by-value calls (via cffi-libffi) crash with memory corruption after a few hundred calls on Clasp — SEGV inside
ffi_callreading a freedffi_type*.Root cause
%allocate-foreign-data(fli.cc) creates theForeignData_Owrapper withDeleteOnDtor, so the malloc'd block isfree()d by the wrapper's GC finalizer. That ties the lifetime of the foreign memory to the GC-reachability of the Lisp wrapper.CFFI code routinely stores the raw address inside other foreign (GC-unscanned) memory and drops the wrapper. cffi-libffi's
make-libffi-cifis the concrete case: it caches the cif but drops the wrappers for thearg_typesarray and the per-structffi_typedescriptors, whose raw pointers live inside the cif's malloc'd block. The next GC collects those wrappers, the finalizers free the still-referenced blocks, and the following call is a use-after-free.Every other CFFI backend's
foreign-allocis plain malloc — permanent untilforeign-free. Clasp's violated that contract.Proven by pinning every
%foreign-allocwrapper (200k calls, zero crashes) and by forcing a GC each iteration (crash moves to iteration 0).Fix
Allocate with
core::Noneinstead ofcore::DeleteOnDtorinPERCENTallocate_foreign_dataandallocate_foreign_data, soforeign-allocmemory persists until an explicit%foreign-free— matching every other CFFI backend.with-foreign-object's allocator keepsDeleteOnDtorsince it always frees explicitly on scope exit.Blast radius: the only other in-tree
%foreign-alloccaller (defcallback-native.lisp) already frees explicitly.Test
src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct-return / nested-struct by-value calls plus a 300k-call stress loop. Verified on macOS arm64 (boehm): SEGVs with the fix reverted, passes with it applied.Note: full struct-by-value support also needs a one-line fix in CFFI's
cffi-clasp.lisp(:defaultlibrary marker not mapped to:rtld-default); that goes upstream to cffi separately.