Make foreign-alloc memory permanent until explicit free #1792

Open
dg1sbg wants to merge 1 commit into clasp-developers:main from dg1sbg:bugfix/cffi-fsbv

dg1sbg commented Jun 10, 2026

Problem

CFFI struct-by-value calls (via cffi-libffi) crash with memory corruption after a few hundred calls on Clasp — SEGV inside ffi_call reading a freed ffi_type*.

Root cause

%allocate-foreign-data (fli.cc) creates the ForeignData_O wrapper with DeleteOnDtor, so the malloc'd block is free()d by the wrapper's GC finalizer. That ties the lifetime of the foreign memory to the GC-reachability of the Lisp wrapper.

CFFI code routinely stores the raw address inside other foreign (GC-unscanned) memory and drops the wrapper. cffi-libffi's make-libffi-cif is the concrete case: it caches the cif but drops the wrappers for the arg_types array and the per-struct ffi_type descriptors, whose raw pointers live inside the cif's malloc'd block. The next GC collects those wrappers, the finalizers free the still-referenced blocks, and the following call is a use-after-free.
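
The lifetime problem described above can be modelled in a few lines of C++. This is only an illustrative sketch, not Clasp's code: `OwningWrapper`, `FakeCif`, `tracked_malloc`, `tracked_free`, and `is_live` are hypothetical names, a scope exit stands in for the GC collecting a dropped wrapper, and a side table of live addresses is used so the sketch can report the dangling address without ever dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_set>

// Hypothetical bookkeeping: addresses of blocks that have been malloc'd and
// not yet freed. Addresses are kept as integers so nothing here ever reads
// through a freed pointer.
static std::unordered_set<std::uintptr_t> live_blocks;

static std::uintptr_t tracked_malloc(std::size_t n) {
  std::uintptr_t a = reinterpret_cast<std::uintptr_t>(std::malloc(n));
  live_blocks.insert(a);
  return a;
}

static void tracked_free(std::uintptr_t a) {
  live_blocks.erase(a);
  std::free(reinterpret_cast<void*>(a));
}

static bool is_live(std::uintptr_t a) { return live_blocks.count(a) != 0; }

// Stand-in for a wrapper created with delete-on-destruction semantics:
// its destructor plays the role of the GC finalizer and frees the block.
struct OwningWrapper {
  std::uintptr_t addr;
  explicit OwningWrapper(std::size_t n) : addr(tracked_malloc(n)) {}
  ~OwningWrapper() { tracked_free(addr); }
  OwningWrapper(const OwningWrapper&) = delete;
  OwningWrapper& operator=(const OwningWrapper&) = delete;
};

// Stand-in for the cached cif: memory the collector does not scan, holding
// only the raw address of the arg_types array.
struct FakeCif {
  std::uintptr_t arg_types;
};

// Returns whether the address cached in the cif still refers to a live block
// after the wrapper that owned it has been dropped.
bool cached_pointer_survives_wrapper_drop() {
  FakeCif cif{0};
  {
    OwningWrapper arg_types(4 * sizeof(void*));
    cif.arg_types = arg_types.addr;  // raw address copied out of the wrapper
  }  // wrapper destroyed here: models the GC finalizing the dropped wrapper
  return is_live(cif.arg_types);
}
```

In this model `cached_pointer_survives_wrapper_drop()` returns `false`: the cif still holds the address, but the block behind it is gone, which is the state a later `ffi_call` would trip over.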

Every other CFFI backend implements foreign-alloc as plain malloc: the memory is permanent until foreign-free. Clasp's implementation violated that contract.

Confirmed two ways: pinning every %foreign-alloc wrapper removes the crash (200k calls, zero crashes), and forcing a GC on each iteration moves the crash to iteration 0.

Fix

Allocate with core::None instead of core::DeleteOnDtor in PERCENTallocate_foreign_data and allocate_foreign_data, so foreign-alloc memory persists until an explicit %foreign-free — matching every other CFFI backend. with-foreign-object's allocator keeps DeleteOnDtor since it always frees explicitly on scope exit.
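
As a rough illustration of the two ownership modes, the sketch below contrasts a wrapper that frees its block on destruction with one that leaves the block alone until an explicit free. Everything in the `sketch` namespace (`Ownership`, `ForeignBlock`, `alloc_and_drop_wrapper`, `foreign_free`, `is_live`) is a hypothetical stand-in and not Clasp's actual `ForeignData_O` API; the enumerator names merely mirror the `None` / `DeleteOnDtor` values mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_set>

namespace sketch {

// Hypothetical ownership flag, named after the two values in the description.
enum class Ownership { None, DeleteOnDtor };

// Side table of live block addresses, used only to observe lifetimes.
inline std::unordered_set<std::uintptr_t>& live() {
  static std::unordered_set<std::uintptr_t> blocks;
  return blocks;
}

inline bool is_live(std::uintptr_t a) { return live().count(a) != 0; }

// Frees a block only if it is still recorded as live (guards double frees).
inline void release(std::uintptr_t a) {
  if (live().erase(a) != 0) std::free(reinterpret_cast<void*>(a));
}

// Stand-in for the Lisp-visible wrapper around a malloc'd block.
class ForeignBlock {
public:
  ForeignBlock(std::size_t n, Ownership kind)
      : addr_(reinterpret_cast<std::uintptr_t>(std::malloc(n))), kind_(kind) {
    live().insert(addr_);
  }
  // Models the finalizer: only DeleteOnDtor wrappers free their block.
  ~ForeignBlock() {
    if (kind_ == Ownership::DeleteOnDtor) release(addr_);
  }
  ForeignBlock(const ForeignBlock&) = delete;
  ForeignBlock& operator=(const ForeignBlock&) = delete;
  std::uintptr_t address() const { return addr_; }

private:
  std::uintptr_t addr_;
  Ownership kind_;
};

// Allocates a block, keeps only its raw address, and lets the wrapper die.
inline std::uintptr_t alloc_and_drop_wrapper(std::size_t n, Ownership kind) {
  ForeignBlock block(n, kind);
  return block.address();
}

// Explicit free, the only way a None-owned block goes away in this model.
inline void foreign_free(std::uintptr_t a) { release(a); }

}  // namespace sketch
```

With `Ownership::None` the address returned by `alloc_and_drop_wrapper` stays live until `foreign_free` is called on it; with `Ownership::DeleteOnDtor` it is already dead on return. The trade-off is the usual malloc one: a `None` block that is never explicitly freed leaks.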

Blast radius: the only other in-tree %foreign-alloc caller (defcallback-native.lisp) already frees explicitly.

Test

src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct-return / nested-struct by-value calls plus a 300k-call stress loop. Verified on macOS arm64 (boehm): SEGVs with the fix reverted, passes with it applied.

Note: full struct-by-value support also needs a one-line fix in CFFI's cffi-clasp.lisp (:default library marker not mapped to :rtld-default); that goes upstream to cffi separately.

ForeignData_O for %foreign-alloc was created with DeleteOnDtor, so the
malloc'd block was freed by the wrapper's GC finalizer. CFFI code (and
cffi-libffi in particular) stores the raw address in foreign memory the GC
does not scan and drops the wrapper, so the next GC freed still-referenced
blocks -- a use-after-free that crashed struct-by-value calls after a few
hundred iterations. Allocate with None instead: foreign-alloc memory now
persists until %foreign-free, matching the malloc semantics every other CFFI
backend provides. with-foreign-object's allocator keeps DeleteOnDtor as it
always frees explicitly.

Add src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct
return / nested-struct by-value calls plus a 300k-call stress loop that SEGVs
on the pre-fix backend.

dg1sbg commented Jun 10, 2026

CI note: the clasp/macos-latest/native and cando/ubuntu-latest/native failures are pre-existing on main — the latest main run at this PR's base commit (https://github.com/clasp-developers/clasp/actions/runs/27157557544) fails the exact same steps ("Run regression tests" / "Run Cando regression tests"). Both Ubuntu clasp jobs (bytecode + native) pass on this PR.

Locally (macOS arm64, boehmprecise, base = current main + the W^X fixes from #1781/#1786/#1788): full regression suite 1963 passed / 4 failed, all four known-preexisting (SBCL-CROSS-COMPILE-4, INCLUDE-LEVEL-2B/3, TYPES-CLASSES-10), zero new failures; plus the new cffi-fsbv stress test passes (300k mixed struct-by-value calls, no crash).
