feat: size-routing fix for phase-crossing allocations #8
Closed
Route allocations smaller than MIN_ARENA_BYTES (default 4096) to the System allocator even during active phases. This prevents library-internal allocations (tracing-subscriber Registry, hashbrown HashMap, crossbeam Injector blocks) from landing in arena memory that gets recycled on begin_phase().

Root cause: libraries like tracing-subscriber intentionally pool heap capacity across logical lifetimes (ExtensionsInner::clear() retains the HashMap backing). When this pooled memory is arena-allocated, the next phase recycles it while the library still holds a reference.

Key changes:
- MIN_ARENA_BYTES=4096 default (configurable via the ZK_ALLOC_MIN_BYTES env var)
- SLAB_SIZE now configurable via the ZK_ALLOC_SLAB_GB env var
- POISON_RESET diagnostic mode (ZK_ALLOC_POISON_RESET=1) for debugging
- MADV_DONTNEED support in the syscall module
- 6 test files covering reproduction, stress, and boundary cases
- findings.tsv: full 15-finding audit trail from exp5

Verified: 100 iterations on Plonky3 prove, 30 proofs on leanMultisig, no measurable performance regression.

Replaces flush_rayon as the primary soundness fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
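A minimal sketch of the routing decision this commit describes. The state and helper names here (ARENA_ACTIVE, arena_alloc, in_arena) are stand-ins for zk-alloc's internals, not the crate's actual code; only the decision rule (System below the threshold or outside a phase, arena otherwise) comes from the commit above.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-ins for the allocator's internal state; the real values come
// from begin_phase()/end_phase() and the ZK_ALLOC_MIN_BYTES env var.
static ARENA_ACTIVE: AtomicBool = AtomicBool::new(false);
const MIN_ARENA_BYTES: usize = 4096;

struct ZkAllocator;

unsafe impl GlobalAlloc for ZkAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Size routing: small allocations go to System even mid-phase, so
        // library-internal pooled memory (tracing Registry, hashbrown tables,
        // crossbeam Injector blocks) never lands in recyclable arena memory.
        if !ARENA_ACTIVE.load(Ordering::Relaxed) || layout.size() < MIN_ARENA_BYTES {
            return System.alloc(layout);
        }
        arena_alloc(layout) // bump-pointer fast path (not shown)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Arena memory is reclaimed wholesale at begin_phase(); only System
        // allocations are freed individually.
        if !in_arena(ptr) {
            System.dealloc(ptr, layout);
        }
    }
}

// Placeholders so the sketch is self-contained; the real implementations live in zk-alloc.
unsafe fn arena_alloc(_layout: Layout) -> *mut u8 {
    std::ptr::null_mut()
}
fn in_arena(_ptr: *mut u8) -> bool {
    false
}
```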
…ned Vec

A Vec >= MIN_ARENA_BYTES allocated in phase N and grown via push() in phase N+1 (after arena reset) gets its preserved bytes silently replaced by phase N+1's first allocation. The realloc impl calls copy_nonoverlapping(old_ptr, new_ptr, layout.size()), but old_ptr now references recycled memory.

Both tests (8 KB and 16 KB Vecs) confirm that v[0..SIZE] reads OVERWRITE bytes instead of the original FILL after a single push() across the boundary.

The size-routing fix (MIN_ARENA_BYTES=4096) does NOT address this: any allocation above the threshold inherits the bug. This is a NEW class of soundness issue, distinct from F1 (rayon Injector) and F4 (tracing Registry), which are library-internal pooled allocations. This one is user-visible.

Tests assert that the bug REPRODUCES (no fix deployed yet); flip the assertions when a fix lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
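A condensed sketch of that reproduction, assuming the crate's begin_phase/end_phase free functions; SIZE and the fill bytes are illustrative, not the test's exact values.

```rust
// Illustrative only: FILL/OVERWRITE values and SIZE are placeholders, and
// begin_phase/end_phase are assumed to be the crate's free functions.
const SIZE: usize = 8 * 1024; // >= MIN_ARENA_BYTES, so the buffer is arena-backed

fn vec_growth_across_phase() {
    begin_phase(); // phase N
    let mut v = vec![0xAAu8; SIZE]; // FILL pattern, allocated in the arena
    end_phase();

    begin_phase(); // phase N+1: the slab is recycled, v's old bytes are up for reuse
    let _filler = vec![0x55u8; SIZE]; // the new phase's first allocation lands over v
    v.push(0); // realloc copies layout.size() bytes from old_ptr, which is recycled memory
    end_phase();

    // With the bug present, the "preserved" prefix now holds the filler bytes.
    assert_eq!(v[0], 0x55, "bug reproduces: prefix silently replaced across the boundary");
}
```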
There is no RAII guard around begin_phase/end_phase. A panic that propagates out before end_phase() is reached leaves ARENA_ACTIVE=true. Subsequent allocations made by user code that believes the phase is over land in the arena and are silently recycled on the next begin_phase().

Test confirms: the post-panic Vec lands at slab+0; the next begin_phase + 1 MB allocation overlaps that range; post_panic[0] reads 0x33 (the new phase's fill) instead of the original 0xCC.

This is a public-API hazard, not a library-internal bug. Recommended fix: introduce a PhaseGuard struct with Drop-based end_phase, or document that phases are unsafe across panics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
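The hazard pattern, in sketch form; run_step() and its return type are placeholders for whatever user code executes inside the phase.

```rust
// run_step() is a placeholder for the caller's phase body.
fn run_step() -> Vec<u8> {
    vec![0u8; 8192]
}

fn one_phase() -> Vec<u8> {
    begin_phase();
    let out = run_step(); // if this panics, the end_phase() below never executes
    end_phase();
    out
}

// If the panic unwinds out of one_phase() and is caught higher up, ARENA_ACTIVE is
// still true: the caller's "post-phase" allocations land in the arena and are
// silently recycled by the next begin_phase().
```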
crossbeam-deque's Worker::push doubles its Buffer at fill capacity and never shrinks. Deep rayon::join recursion (depth 512+) drives buffer growth past 4 KB, so the buffer lands in the arena. The phase boundary recycles the slab while the worker still references the old buffer pointer. If the worker subsequently allocates heap data in phase N+1 that overlaps the buffer's offset, push/pop reads and writes corrupt memory.

Findings:
- With ZK_ALLOC_MIN_BYTES=0 + rayon-flush off: SIGSEGV (deterministic)
- With ZK_ALLOC_MIN_BYTES=4096 (default): 0/20 cycles corrupted
- With ZK_ALLOC_MIN_BYTES=4096 + rayon-flush off: 0/20 cycles corrupted

Size-routing alone is sufficient; rayon-flush adds nothing for this case. Worker buffers stay below the 4 KB threshold for typical workloads; the size-routing fix correctly diverts the small intermediate growth steps.

Three tests:
- worker_deque_growth_during_phase (canary in main slab; does not fire)
- deep_recursion_phase_cycle_program_integrity (program survives 50 cycles)
- worker_buffer_growth_with_per_worker_canary (per-worker canary; fires under no-fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
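A sketch of the recursion shape used to drive Buffer growth; the depth and the trivial right-hand closure are illustrative, not the tests' exact bodies.

```rust
// Each level of nested_join pushes its right-hand closure onto the current rayon
// worker's deque before descending, so the deque's Buffer doubles as depth grows.
fn nested_join(depth: usize) {
    if depth == 0 {
        return;
    }
    rayon::join(|| nested_join(depth - 1), || std::hint::black_box(depth));
}

fn main() {
    // Depth in the hundreds pushes the Buffer past 4 KB, which is the case the
    // findings above measure with and without size-routing.
    nested_join(512);
}
```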
GENERATION is a global atomic. begin_phase() from any thread bumps it, forcing every other thread's next allocation through the cold-reset path and invalidating arena data they still hold. The slabs are per-thread, but the generation counter is global, so cross-thread phase calls reset the wrong slab from the wrong actor.

Tests:
- cross_thread_begin_phase_invalidates_data: deterministic. T1 holds an arena Vec; T2 calls begin_phase via a barrier-controlled handshake; T1's next alloc lands at slab+0 over v's bytes. Asserts v_corrupted=true.
- two_threads_running_lifecycle_concurrently_corrupt_each_other: stress observation only; the race window for inter-alloc cross-thread interference is too tight to assert reliably.
- concurrent_phase_stress_no_crash: 4 worker threads spam begin/end+alloc while main does 5000 begin/end cycles; verifies the allocator atomics survive.

Same hazard class as F17 (no RAII guard): public-API misuse silently corrupts arena data. Recommended fix: a scoped begin_phase that returns a PhaseGuard, taking a marker that prevents concurrent / nested misuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drives many crossbeam-deque buffer growths during a phase to retire objects to crossbeam-epoch's Bag<Garbage>, then crosses the boundary and drives more retires. F6 hypothesized that Bag nodes (~1.5 KB) are caught by size-routing; this confirms it empirically.

Empirical observation:
- ZK_ALLOC_MIN_BYTES=4096 (default): 50/50 cycles clean
- ZK_ALLOC_MIN_BYTES=0 + rayon-flush off: 50/50 cycles clean (Bag-cycling test alone; the par_iter test hangs the same as F11, not a Bag issue)

Bag nodes do not appear to be a load-bearing vector for the bug class: they're small enough to bypass the arena under default routing and small enough not to overlap dangerously even without it (in this workload).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ThreadPoolBuilder::build() allocates the Registry, ThreadInfo arrays, initial deques, and Sleep state. Built during an active phase, those land in the arena; held across phase boundaries, the Registry pointers reference recycled memory and .install() walks corrupted scheduler state.

Findings:
- ZK_ALLOC_MIN_BYTES=4096 (default): 10 cycles clean (allocations bypass the arena under size-routing).
- ZK_ALLOC_MIN_BYTES=0 + rayon-flush off: deterministic SIGSEGV on the first cycle's .install().
- Pool built BEFORE any phase: clean across 50 phase cycles, even without the fix. Confirms the hazard is specific to mid-phase construction.

Three tests: build during phase + use across boundary, build+use+drop in a single phase, build before phase + use across many.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
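A sketch of the safe construction order these findings point to, assuming the crate's begin_phase/end_phase free functions; the thread count and loop count are illustrative.

```rust
fn run() {
    // Build the pool BEFORE any phase: its Registry, ThreadInfo arrays, deques, and
    // Sleep state are System-allocated and never subject to phase recycling.
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(8)
        .build()
        .expect("failed to build rayon pool");

    for _ in 0..50 {
        begin_phase();
        pool.install(|| {
            // parallel work; scheduler state was allocated pre-phase
        });
        end_phase();
    }
}
```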
Arc::new allocates ArcInner<T> = (strong_count, weak_count, T). For T large enough that the allocation is >= MIN_ARENA_BYTES (default 4096), the ArcInner lands in the arena. Holding the Arc across a phase boundary puts the refcount fields in the path of recycled writes: phase 2's first allocation overwrites them with arbitrary bytes.

Test confirms strong_count reads 0x5555555555555555 after a 1 MB filler overwrites the slab. clone() increments to 0x...5556. The Arc continues operating but the refcount can never reach zero, a silent memory leak. Worst cases: refcount underflow (premature drop), use-after-free, or, if ArcInner aligns differently, SIGSEGV on the AtomicUsize CAS.

Same fundamental bug class as F16 (data retained across phase corrupted by reset) but applies to refcounted allocations specifically. Size-routing covers small Arcs (e.g., Arc<u64>) but not large payloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
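A sketch of the reproduction shape; the payload size and filler are illustrative, and begin_phase/end_phase are assumed to be the crate's free functions.

```rust
use std::sync::Arc;

// Illustrative: an 8 KB payload makes the ArcInner allocation exceed
// MIN_ARENA_BYTES, so the refcount words themselves live in the arena.
fn arc_across_phase() {
    begin_phase();
    let shared: Arc<[u8; 8192]> = Arc::new([0u8; 8192]);
    end_phase();

    begin_phase(); // recycles the slab under the ArcInner
    let _filler = vec![0x55u8; 1 << 20]; // overwrites strong/weak counts with filler bytes
    let _clone = Arc::clone(&shared); // increments a garbage strong count
    end_phase();

    // Arc::strong_count(&shared) now reports filler bytes + 1 rather than 2,
    // so the payload can never be dropped (or, worse, drops prematurely).
}
```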
HashMap (hashbrown) allocates one contiguous block for the ctrl bytes + bucket array. With with_capacity(4096) on a HashMap<u64, u64>, the allocation is ~64 KB, above MIN_ARENA_BYTES, so it lands in the arena. Held across a phase boundary, phase 2's first allocation overwrites the entire structure.

Detection: HashMap::get on corrupted ctrl bytes (all 0x55 from the filler) infinite-loops in linear probing; every slot looks "full" but no hash matches. The test uses a worker thread with mpsc::recv_timeout to detect the hang as the corruption signal. Confirmed: the 2s timeout fires deterministically.

Same bug class as F22 (Arc): any large allocation retained across a phase is corrupted. Worse failure mode here: a silent hang rather than a memory leak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
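A sketch of that hang-as-signal detection; the probed key and the timeout value are illustrative.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Returns true if the lookup hangs, which is the corruption signal described above.
fn lookup_hangs(map: HashMap<u64, u64>) -> bool {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // On corrupted ctrl bytes this probe loops forever; on a healthy map it
        // returns immediately and unblocks the receiver.
        let _ = map.get(&42);
        let _ = tx.send(());
    });
    rx.recv_timeout(Duration::from_secs(2)).is_err()
}
```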
A trait object is (data_ptr, vtable_ptr). The vtable lives in .rodata, so dispatch survives any heap corruption. The data lives in the Box's heap allocation; held across a phase boundary, the bytes are overwritten by phase 2's first allocation.

Tests:
- box_dyn_trait_data_corrupted_silent: confirms the sentinel field reads 0x5555555555555555 (filler bytes) and payload_sum returns garbage. Vtable dispatch worked; the data was wrong.
- box_dyn_vtable_dispatch_survives_data_corruption: the trait method only increments a global atomic; it runs correctly across corrupted self.

The failure mode is silent wrong values: no panic, no SIGSEGV, no hang. The worst form of the bug class for correctness checks. If the trait impl had chased a pointer in self (e.g., self.inner_box.method()), it would SIGSEGV on dereferencing 0x5555_5555_5555_5555.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add PhaseGuard struct with Drop-based end_phase, plus a phase(|| { ... })
scoped helper. Both compose with the existing free-function API; nothing
breaks. Use in place of paired begin_phase/end_phase calls when the phase
body can panic.
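A minimal sketch of the shape described above, built on the existing begin_phase/end_phase free functions; the crate's actual PhaseGuard may differ in detail.

```rust
/// RAII wrapper: the phase ends when the guard is dropped, whether the scope
/// returns normally or unwinds from a panic.
pub struct PhaseGuard(());

impl PhaseGuard {
    pub fn new() -> Self {
        begin_phase();
        PhaseGuard(())
    }
}

impl Drop for PhaseGuard {
    fn drop(&mut self) {
        // Drop runs during unwinding too, so a panicking phase body can no longer
        // leave ARENA_ACTIVE set for the caller's "post-phase" allocations.
        end_phase();
    }
}

/// Scoped helper: `phase(|| { ... })` replaces paired begin_phase/end_phase calls.
pub fn phase<R>(f: impl FnOnce() -> R) -> R {
    let _guard = PhaseGuard::new();
    f()
}
```

Callers that can panic wrap the body in phase(|| { ... }) and get end_phase on both exit paths; the free functions stay available for existing call sites.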
Test (test_phase_guard.rs):
- phase_guard_runs_end_phase_on_panic: catch_unwind a panicking phase()
body; verify post-unwind allocations land in System (not arena).
Mirrors the F17 reproducer with the fix applied — passes deterministically.
- phase_guard_runs_end_phase_on_normal_return: sanity check.
- nested_phase_guards_compose: nested phase() works.
Does not address F19 (concurrent begin/end) — same begin_phase atomics,
no cross-thread protection. That needs a separate scoped/exclusive API.
Cargo fmt also folded test_box_dyn_phase.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document that Vec/Arc/HashMap/Box<dyn Trait> >= MIN_ARENA_BYTES held across begin_phase() are silently overwritten. Lists each container's specific failure mode and points to clone()-via-system as the workaround. This is documentation only; behavior unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These tests demonstrate that retaining arena allocations across phase boundaries silently corrupts user data — they're educational about the phase-scoping contract, not standard regression tests. They live on the exp5-soundness-tests branch alongside the soundness audit notes. Removed: test_realloc_phase, test_arc_phase, test_hashmap_phase, test_box_dyn_phase. Tests that exercise the allocator's correctness under valid usage (test_panic_phase, test_concurrent_phase, test_worker_deque_growth, test_crossbeam_epoch, test_threadpool_resize, test_phase_guard) stay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an allocation's original ptr is not in the arena region (it came from System — either pre-phase, sub-threshold, or otherwise routed), realloc now goes through System.realloc instead of self.alloc + copy. This prevents the silent System→arena migration on growth: a Vec allocated outside a phase (System) that grows inside a phase would otherwise have its new buffer placed in the arena, where it becomes subject to phase recycling. System.realloc may also grow in place (mremap on Linux), avoiding the explicit copy that self.alloc + copy_nonoverlapping forces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
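A sketch of that dispatch, continuing the hypothetical ZkAllocator stand-in from the earlier sketch; in_arena() is a placeholder for the real bounds check and the alloc/dealloc routing is elided.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

struct ZkAllocator; // same hypothetical stand-in as the earlier sketch

fn in_arena(_ptr: *mut u8) -> bool {
    false // placeholder for the real "ptr within the slab region" check
}

unsafe impl GlobalAlloc for ZkAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout) // size routing elided; see the earlier sketch
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if !in_arena(ptr) {
            System.dealloc(ptr, layout); // arena memory is reclaimed at begin_phase()
        }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        if !in_arena(ptr) {
            // Sticky System: a buffer that started in System stays in System when it
            // grows, even mid-phase, so it never migrates into recyclable arena memory.
            // System.realloc may also grow in place, avoiding the copy below.
            return System.realloc(ptr, layout, new_size);
        }
        // Arena-origin buffers keep the alloc + copy path (subject to the phase contract).
        let new_layout = Layout::from_size_align_unchecked(new_size, layout.align());
        let new_ptr = self.alloc(new_layout);
        if !new_ptr.is_null() {
            std::ptr::copy_nonoverlapping(ptr, new_ptr, layout.size().min(new_size));
        }
        new_ptr
    }
}
```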
Document explicitly that ZkAllocator dispatches between arena and System, with size-routing and sticky-System realloc keeping the boundaries clean. Spell out the phase-scoping contract (don't retain across begin_phase), the available env vars (SLAB_GB / MIN_BYTES / POISON_RESET), and update the example to use phase() / PhaseGuard for panic safety. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single-loop example with one using phase() (panic-safe), then add three new sections: the two-allocator model (arena vs System and what goes where), the phase-scoping contract (don't retain across begin_phase, construct long-lived state pre-phase, prefer phase() RAII), and a table of the three env vars (SLAB_GB / MIN_BYTES / POISON_RESET). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On macOS aarch64 the test's two allocations land at different addresses (T1's first alloc goes to System ~5GB while phase-2 arena allocations sit ~500GB up), so v and w never alias and the cross-thread invalidation isn't observable from this test. The bug is real on macOS too — the allocator's GENERATION atomic is global — but this specific reproducer needs aliasing to detect it. Track aliasing via a separate AtomicBool and only assert corruption when v and w aliased. Otherwise pass with a "test inconclusive on this platform" note. Linux behavior unchanged (deterministic corruption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
macOS aarch64 has a smaller default thread stack than Linux, and debug builds inflate per-frame size. nested_join recursion at depths 1024 / 2048 / 512 overflowed the rayon worker stack on the macOS CI runner. Cap all DEPTH constants at 256. That still drives crossbeam-deque's Buffer to its 256-slot capacity (~4 KB header included), which crosses MIN_ARENA_BYTES=4096, so the test still exercises the size-routing boundary the original depths targeted. Verified on Linux in both release and debug builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 3 tests touch process-global state — ARENA_ACTIVE, the bump pointer, and the panic hook — and Cargo runs tests in parallel by default. With nested_phase_guards_compose racing phase_guard_runs_end_phase_on_normal_return, the latter could observe ARENA_ACTIVE=true at the wrong moment and route its `after` allocation into the arena, where the next phase's filler recycled it. Add a file-local `static PHASE_LOCK: Mutex<()>` and grab it at the top of each test. No external deps; serializes only within this test binary. Other test binaries keep their global state isolated by living in separate processes (per Cargo's test-binary boundary). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
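A sketch of the serialization, using one of the test names from above; the test body itself is elided.

```rust
use std::sync::Mutex;

// File-local lock shared by the three tests in this binary; other test binaries are
// separate processes, so they need no coordination.
static PHASE_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn phase_guard_runs_end_phase_on_normal_return() {
    // Taking the lock first serializes access to ARENA_ACTIVE, the bump pointer,
    // and the panic hook against the other two tests in this file.
    let _serial = PHASE_LOCK.lock().unwrap();
    // ... original test body unchanged ...
}
```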
Summary
- Replaces the flush_rayon fix (leanEthereum/leanMultisig@f5e2299), which addressed one class of phase-crossing allocations but was proven insufficient for a broader second class
- ZK_ALLOC_MIN_BYTES, ZK_ALLOC_SLAB_GB, ZK_ALLOC_POISON_RESET env var controls

Bug Class 1: Rayon Injector Blocks
Found by: Emile (leanEthereum/leanMultisig@f5e2299)
Fixed by: flush_rayon (256 no-op rayon::join calls drain crossbeam-deque Injector slots)

Rayon's crossbeam-deque allocates Injector blocks (~520B, 64 JobRef slots) during parallel work. Under #[global_allocator], these land in the arena during active phases. When begin_phase() recycles the slab, rayon still holds a pointer to the old block; the next job push writes 17 bytes over recycled memory. Silent corruption, not a crash.

Characterization:
Bug Class 2: Library Allocation Pooling
Found by: This investigation
Fixed by: Size-routing — allocations < 4096B bypass arena, go to System
flush_rayon does NOT fix Plonky3: crashes persist with rayon-flush ON. A second class of phase-crossing allocations exists that flush_rayon cannot reach.

Root cause: Libraries like tracing-subscriber intentionally pool heap capacity across logical lifetimes. ExtensionsInner::clear() retains the HashMap backing so future spans reuse it without reallocating. When the first span in a Registry slot is created during arena phase N, the HashMap backing lives in the arena. begin_phase(N+1) recycles it → use-after-free → panic/SIGSEGV.

Same pattern exists in: crossbeam-epoch Bags, sharded-slab Pages, hashbrown HashMap rehash buffers. All explicitly documented as retaining capacity.
Key insight: Both bug classes share a property — problematic allocations are small (<4KB). The allocations zk-alloc needs to accelerate are large (polynomials, matrices, Merkle trees — all >>4KB). Size-routing exploits this gap. It also subsumes flush_rayon (Injector blocks are ~520B, well under threshold).
Performance
No measurable regression:
Validation
Limitations
- The default threshold can be raised via ZK_ALLOC_MIN_BYTES for heavier tracing workloads.
- The hazard is inherent to #[global_allocator] arena patterns: any library that pools heap capacity across logical lifetimes is affected. Size-routing is the practical optimum; true robustness beyond the threshold requires scope-aware API changes.

Test plan
- cargo test --release: all tests pass
- cargo clippy: clean