ci: land lib-cmp comparison-bench workflows on main (fidelity + perf) #20
Merged
Land the two comparison-bench workflows on the default branch so their `workflow_dispatch` UI control appears and the `pull_request`->main hook fires (GitHub reads dispatch/PR workflows from the default branch). Both are advisory (never required status checks):
- lib-cmp-precision.yml: the Fidelity / LSB-epsilon precision shootout.
- lib-cmp-perf.yml: the peer-crate timing comparison (width × scale).
Trigger: `pull_request`->main + `workflow_dispatch`. They run against the selected ref's tree (release/0.5.0 carries the benches today; they arrive on main with the 0.5.0 release), so on main they stay dormant-but-valid until then.
CodSpeed: Merging this PR will not alter performance.
jackmoxley added a commit that referenced this pull request on May 28, 2026:
The previous `neg_twos_complement` used a two-pass shape:
1. NOT loop into out[N] (N writes).
2. `add_assign_fixed(out, [1, 0, …, 0])` (a full N-limb dependent
carry chain over a second stack array, even though limbs 1..N
add `0` after limb 0).
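The two-pass shape can be sketched as follows. This is an illustrative re-implementation, not the crate's code: little-endian `u64` limbs are assumed, and the signatures of `add_assign_fixed` and `neg_two_pass` here are hypothetical stand-ins for the helpers the commit names. The point it shows is that pass 2 threads a carry through all N limbs even though only limb 0 adds a non-zero value.

```rust
/// Hypothetical stand-in for `add_assign_fixed`: adds `b` into `a`
/// limb by limb (little-endian u64 limbs), threading the carry through
/// every limb, so limb i cannot finish before limb i-1.
fn add_assign_fixed<const N: usize>(a: &mut [u64; N], b: &[u64; N]) {
    let mut carry = false;
    for i in 0..N {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        a[i] = s2;
        carry = c1 | c2;
    }
}

/// Sketch of the previous two-pass negation: -a = !a + 1.
fn neg_two_pass<const N: usize>(a: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    // Pass 1: N independent NOT writes.
    for i in 0..N {
        out[i] = !a[i];
    }
    // Pass 2: add the constant [1, 0, ..., 0] from a second stack array;
    // the carry chain still runs across all N limbs.
    let mut one = [0u64; N];
    if N > 0 {
        one[0] = 1;
    }
    add_assign_fixed(&mut out, &one);
    out
}
```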
At wide N the dependent add chain across every limb dominates: each
`overflowing_add` reads the previous carry, which blocks vectorisation,
and the second stack array is pure overhead.
Replace with a limb-0 split:
- `out[0] = !a[0] + 1`, capture the carry `c0`.
- If `c0 == false` (the overwhelmingly common path), limbs 1..N
reduce to plain independent `!a[i]` writes — no cross-limb
dependency chain, the compiler can keep them register-resident
and vectorise the NOT loop.
- If `c0 == true` (`!a[0] == MAX`, i.e. `a[0] == 0`), fall back to a
dependent carry-prop chain through limbs 1..N (the correct, slow path).
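A minimal sketch of the limb-0 split, assuming little-endian `u64` limbs; the function name mirrors the bench seam but the body is an illustration, not the routed kernel. Note that the carry out of limb 0 is set only when `!a[0] == u64::MAX`, that is, when `a[0] == 0`, which is what gates the slow path.

```rust
/// Illustrative limb-0 split negation (hypothetical re-implementation).
fn neg_fused_split<const N: usize>(a: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    if N == 0 {
        return out;
    }
    // Limb 0: !a[0] + 1. The carry c0 fires only when a[0] == 0.
    let (lo, c0) = (!a[0]).overflowing_add(1);
    out[0] = lo;
    if !c0 {
        // Common path: no carry leaves limb 0, so the remaining limbs are
        // independent NOTs with no cross-limb dependency.
        for i in 1..N {
            out[i] = !a[i];
        }
    } else {
        // Slow path: propagate the carry limb by limb; it keeps rippling
        // only while the incoming limb is zero.
        let mut carry = true;
        for i in 1..N {
            let (s, c) = (!a[i]).overflowing_add(carry as u64);
            out[i] = s;
            carry = c;
        }
    }
    out
}
```

The branch is taken per call, not per limb, so the common path is a straight NOT loop the compiler is free to vectorise.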
Generic over `N`, single kernel — no per-tier copies, no LimbSize
axis, no Scratch-on-Int needed. Constitution rules 1-6 hold: one
generic algorithm, one named file, matcher unchanged, sizing local
to width.
A/B verdict (benches/micro/neg_kernel_ab.rs, 6 inputs covering
tiny / half_wide / mid / high / low / carry_chain):
D462 (N=24): fused_split ≈ two_pass (within ±10%, noisy)
D616 (N=32): fused_split beats two_pass by 1.25-1.83x
D924 (N=48): fused_split beats two_pass by 1.42-2.42x
D1232 (N=64): fused_split beats two_pass by 1.54-1.63x
Recovers ranks #23/#27/#28 (D616) and #31 (D1232) of the bbc §8.4
wide-neg cluster; D462 (#13/#17/#19/#20) is a wash at the kernel level
(any remaining gap lives in the call shape, not the kernel).
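The D-labels in the verdict appear to match floor((64N - 1) · log10 2), the largest decimal digit count that always fits in a signed value of N 64-bit limbs. This mapping is an inference from the four data points, not something the commit states; the helper name `max_signed_digits` is hypothetical.

```rust
/// Hypothetical helper: largest D such that every D-digit integer fits in
/// a signed integer of `n_limbs` 64-bit limbs, i.e.
/// floor((64 * N - 1) * log10(2)). Inferred mapping, not from the PR.
fn max_signed_digits(n_limbs: u32) -> u32 {
    assert!(n_limbs > 0, "need at least one limb");
    let value_bits = 64 * n_limbs - 1; // one bit reserved for the sign
    (value_bits as f64 * std::f64::consts::LOG10_2).floor() as u32
}
```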
Bench seam: `__bench_internals::neg_fused_split` (routed kernel),
`neg_two_pass` (previous shape, reference baseline), `neg_fused_open`
(single-pass dependent-chain candidate). All bit-identical, asserted
before timing.
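The "bit-identical, asserted before timing" step can be sketched as a small harness that runs every kernel on every input and panics on the first disagreement. Everything here is hypothetical: `Kernel`, `assert_bit_identical`, `neg_open` (a single-pass dependent-chain kernel in the spirit of `neg_fused_open`), `neg_sub_from_zero` (a borrow-chain reference) and the six-input set, whose contents are invented because the real tiny / half_wide / mid / high / low / carry_chain definitions are not shown in the commit.

```rust
type Kernel<const N: usize> = fn(&[u64; N]) -> [u64; N];

/// Single pass, dependent chain: the "+1" enters as the initial carry.
fn neg_open<const N: usize>(a: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    let mut carry = true;
    for i in 0..N {
        let (s, c) = (!a[i]).overflowing_add(carry as u64);
        out[i] = s;
        carry = c;
    }
    out
}

/// Reference: compute 0 - a with a borrow chain.
fn neg_sub_from_zero<const N: usize>(a: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    let mut borrow = false;
    for i in 0..N {
        let (d1, b1) = 0u64.overflowing_sub(a[i]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        out[i] = d2;
        borrow = b1 | b2;
    }
    out
}

/// Panics, naming the kernel and input, if any kernel disagrees with the first.
fn assert_bit_identical<const N: usize>(kernels: &[(&str, Kernel<N>)], inputs: &[[u64; N]]) {
    assert!(!kernels.is_empty(), "need a reference kernel");
    let (ref_name, reference) = kernels[0];
    for input in inputs {
        let expected = reference(input);
        for &(name, k) in &kernels[1..] {
            assert_eq!(k(input), expected, "{name} disagrees with {ref_name} on {input:?}");
        }
    }
}

/// Invented six-input set covering both the no-carry and the carry path.
fn sample_inputs<const N: usize>() -> Vec<[u64; N]> {
    assert!(N > 0, "sketch assumes at least one limb");
    let zero = [0u64; N];
    let mut tiny = [0u64; N];
    tiny[0] = 1;
    let mut half = [0u64; N];
    for limb in half.iter_mut().take(N / 2) {
        *limb = u64::MAX;
    }
    let ones = [u64::MAX; N];
    let mut chain = [0u64; N];
    chain[N - 1] = 1 << 63; // low limbs all zero: the carry ripples to the top
    let mut mixed = [0u64; N];
    for (i, limb) in mixed.iter_mut().enumerate() {
        *limb = 0x9E37_79B9_7F4A_7C15u64.wrapping_mul(i as u64 + 1);
    }
    vec![zero, tiny, half, ones, chain, mixed]
}
```

Running the check once, outside the timed region, keeps a mismatched candidate from silently producing a misleading speedup.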
Validation: 6 kernel unit tests + 785 lib tests pass.
`cargo check` (default) and
`cargo check --features wide,x-wide,xx-wide,macros --all-targets` are both clean.
Lands the two comparison-bench workflows on the default branch so they become UI-dispatchable (the `workflow_dispatch` 'Run workflow' control only appears for workflows on the default branch) and so the `pull_request`->main hook fires (GitHub reads dispatch/PR workflow definitions from the default branch). Both are advisory, never required status checks. Trigger: `pull_request`->main + `workflow_dispatch`. They run against the selected ref's tree (release/0.5.0 carries the lib_cmp benches today; they arrive on main with the 0.5.0 release), so on main they stay dormant-but-valid until then. No source changes: workflow files only.