
Devnet4 #181

Open
TomWambsgans wants to merge 160 commits into main from devnet4

@TomWambsgans

Collaborator

No description provided.

TomWambsgans and others added 8 commits April 11, 2026 22:38
* poseidon avx2 avx512 (#192)

Co-authored-by: Tom Wambsgans <TomWambsgans@users.noreply.github.com>

* fix avx512 (panicked on small instances) (#193)

* fix avx512 (panicked on small instances)

* add `test_aggregation`

---------

Co-authored-by: Tom Wambsgans <TomWambsgans@users.noreply.github.com>

* fmt

---------

Co-authored-by: Tom Wambsgans <TomWambsgans@users.noreply.github.com>
TomWambsgans force-pushed the main branch 2 times, most recently from c6f9fd4 to c09c85a (April 26, 2026 21:43)
TomWambsgans and others added 4 commits April 30, 2026 22:22
…ON dot product regression tests

Co-authored-by: mo-melvin77 <momelvinmome@gmail.com>
Co-authored-by: Thomas Coratger <thomas.coratger@gmail.com>
…vnet4` has been merged into `leanSig:main`)
#221)

The benchmark recursion already builds a NodeStats per node for the
live-tree display, but only the root's wall-clock is returned to
callers. Promote NodeStats / NodeReport / BenchmarkReport to pub and
add `run_aggregation_benchmark_report` returning the full per-node
breakdown (time, proof_kib, cycles, memory, poseidons, dots, n_xmss).

The existing `run_aggregation_benchmark` is preserved as a thin
wrapper returning the root node's `time_secs`, so the
`test_aggregation_throughput_per_num_xmss` test in the same file
continues to compile unchanged.

Matches the API already shipped on devnet5, letting downstream
consumers (e.g. lean-bench) collect identical per-node telemetry
across both branches.
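
The wrapper pattern described above can be sketched roughly as follows. This is only an illustration: the struct layouts, field types, the `children` tree shape, the `n_xmss_per_leaf` parameter, and the dummy numbers are all assumptions. Only the type names, the two function names, `time_secs`, and the list of per-node metrics come from the commit message.

```rust
// Hypothetical stand-ins. Field names follow the metrics listed in the
// commit message; the types and the tree layout are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NodeStats {
    pub time_secs: f64,
    pub proof_kib: f64,
    pub cycles: u64,
    pub memory: u64,
    pub poseidons: u64,
    pub dots: u64,
    pub n_xmss: usize,
}

#[derive(Debug, Clone)]
pub struct NodeReport {
    pub stats: NodeStats,
    pub children: Vec<NodeReport>,
}

#[derive(Debug, Clone)]
pub struct BenchmarkReport {
    pub root: NodeReport,
}

/// Placeholder for the real benchmark recursion: it builds a fixed
/// two-leaf tree with dummy timings instead of proving anything.
pub fn run_aggregation_benchmark_report(n_xmss_per_leaf: usize) -> BenchmarkReport {
    let leaf = |time_secs: f64| NodeReport {
        stats: NodeStats { time_secs, n_xmss: n_xmss_per_leaf, ..NodeStats::default() },
        children: Vec::new(),
    };
    let children = vec![leaf(0.5), leaf(0.75)];
    let root_stats = NodeStats {
        time_secs: 1.5, // dummy value, not a measurement
        n_xmss: children.iter().map(|c| c.stats.n_xmss).sum(),
        ..NodeStats::default()
    };
    BenchmarkReport { root: NodeReport { stats: root_stats, children } }
}

/// Thin wrapper: existing callers keep receiving only the root wall-clock.
pub fn run_aggregation_benchmark(n_xmss_per_leaf: usize) -> f64 {
    run_aggregation_benchmark_report(n_xmss_per_leaf).root.stats.time_secs
}
```

Keeping the old function as a one-line projection of the new report means existing tests do not need to change, while callers that want the per-node breakdown can walk `children` themselves.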
TomWambsgans force-pushed the main branch 2 times, most recently from eacd019 to 9b2f632 (May 25, 2026 00:11)
TomWambsgans force-pushed the main branch 2 times, most recently from c5a3050 to 9dc5d68 (May 28, 2026 12:02)
TomWambsgans and others added 7 commits May 29, 2026 04:55
The merge of main into devnet4 (8eec56c) changed MleOwned to hold an
ArenaVec, but combine_statement still returned a heap Vec, which was
then bridged with ArenaVec::from_slice. At n_vars=24 that is a
single-threaded memcpy of ~256 MiB of extension elements per proof
while all worker threads sit idle, and the data crosses the memory
hierarchy twice.

Build the weights directly in an ArenaVec so it is moved, never
copied. All writers (compute_eval_eq_packed*, split_at_mut_many) take
&mut [T] and work unchanged via deref. The ArenaVec is created inside
the same proving phase where it was previously copied into one, so
arena-phase semantics are unchanged.
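
A minimal sketch of the before/after shape of this change, under stated assumptions: the `ArenaVec` below is a toy stand-in backed by a plain `Vec` (the real type is arena-allocated), `zeroed` is a hypothetical constructor, `fill_weights` stands in for the real writers, and `u64` stands in for extension-field elements. Only `ArenaVec::from_slice` and the `&mut [T]` writer convention are taken from the commit message.

```rust
use std::ops::{Deref, DerefMut};

/// Toy stand-in for the project's `ArenaVec` (assumption: Vec-backed here).
pub struct ArenaVec<T> {
    buf: Vec<T>,
}

impl<T: Clone + Default> ArenaVec<T> {
    /// Hypothetical constructor: a buffer of `len` default-initialised items.
    pub fn zeroed(len: usize) -> Self {
        Self { buf: vec![T::default(); len] }
    }

    /// Copies every element of `src` into a fresh buffer.
    pub fn from_slice(src: &[T]) -> Self {
        Self { buf: src.to_vec() }
    }
}

impl<T> Deref for ArenaVec<T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        &self.buf
    }
}

impl<T> DerefMut for ArenaVec<T> {
    fn deref_mut(&mut self) -> &mut [T] {
        &mut self.buf
    }
}

/// Stand-in writer: like the writers named above, it only needs `&mut [T]`.
fn fill_weights(out: &mut [u64]) {
    for (i, w) in out.iter_mut().enumerate() {
        *w = (i as u64) * 3 + 1;
    }
}

/// Before: build on the heap, then bridge with a full-length copy.
pub fn combine_copying(n_vars: u32) -> ArenaVec<u64> {
    let mut heap = vec![0u64; 1usize << n_vars];
    fill_weights(&mut heap);
    ArenaVec::from_slice(&heap) // second pass over all 2^n_vars elements
}

/// After: allocate the final buffer first and write into it in place.
pub fn combine_in_place(n_vars: u32) -> ArenaVec<u64> {
    let mut weights: ArenaVec<u64> = ArenaVec::zeroed(1usize << n_vars);
    fill_weights(&mut weights); // &mut ArenaVec<u64> coerces to &mut [u64]
    weights // moved out, never copied
}
```

Both functions produce the same contents; the second simply skips the intermediate buffer, which is why the writers need no changes.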

On a Zen5 box (Ryzen 9700X) this turns a -5.3% XMSS-aggregation
regression vs pre-merge into a +2.2% improvement (215.9 -> 220.7
XMSS/s); run_initial_sumcheck_rounds self time drops from 7.94% back
to 0.00%. Proof size is unchanged.
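
As a quick consistency check on the quoted figures, the relative change can be recomputed from the two throughput numbers. The helper name below is hypothetical, and reading 215.9 XMSS/s as the pre-merge baseline is an interpretation of the sentence above rather than something the commit message states outright.

```rust
/// Relative change of `measured` against `baseline`, in percent.
pub fn percent_change(baseline: f64, measured: f64) -> f64 {
    (measured - baseline) / baseline * 100.0
}

/// The two throughput figures quoted in the commit message, in XMSS/s.
pub const BASELINE_XMSS_PER_SEC: f64 = 215.9;
pub const AFTER_FIX_XMSS_PER_SEC: f64 = 220.7;
```

Calling `percent_change(BASELINE_XMSS_PER_SEC, AFTER_FIX_XMSS_PER_SEC)` gives a value that rounds to the +2.2% quoted above.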

4 participants