ci(codeql): advanced SHA-pinned CodeQL workflow #13

Merged
jackmoxley merged 1 commit into main from ci/codeql-advanced
May 23, 2026
@jackmoxley
Contributor

Adds the advanced CodeQL workflow to main so code scanning can be switched from the default setup to the committed, SHA-pinned advanced workflow.

  • languages: rust, build-mode: none (the Rust extractor works from source — no cargo build, no feature-flag juggling across the wide tiers).
  • Actions SHA-pinned (checkout v4, codeql-action v4).
  • Triggers: push + PR on main, plus a weekly Monday 07:00 UTC drift cron.

Once merged, flip Settings -> Code security -> CodeQL analysis -> Switch to advanced; the codeql.yml/badge.svg badge then becomes available for the README.

Replaces the repo's default code-scanning setup with a committed,
SHA-pinned advanced workflow (languages: rust, build-mode: none).
Lands on main so the 'Switch to advanced' option in Settings -> Code
security takes effect and the codeql.yml badge becomes available.
@codspeed-hq
Contributor

codspeed-hq Bot commented May 23, 2026

Merging this PR will degrade performance by 11.23%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 2 improved benchmarks
❌ 11 regressed benchmarks
✅ 197 untouched benchmarks
⏩ 9 skipped benchmarks (see footnote 1)

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| floor | 346.9 ns | 288.6 ns | +20.21% |
| from_f64 | 460.8 ns | 548.3 ns | -15.96% |
| pow_8 | 924.2 ns | 836.7 ns | +10.46% |
| B_handrolled[bound_1e18] | 385.8 ns | 473.3 ns | -18.49% |
| B_handrolled[mid_1e15] | 384.2 ns | 471.7 ns | -18.55% |
| B_handrolled[small_1e9] | 384.2 ns | 471.7 ns | -18.55% |
| D_mg_magic[bound_1e18] | 361.4 ns | 419.7 ns | -13.9% |
| D_mg_magic[mid_1e15] | 361.4 ns | 419.7 ns | -13.9% |
| D_mg_magic[small_1e9] | 361.4 ns | 419.7 ns | -13.9% |
| D_mg_magic[wide_1e22] | 428.1 ns | 486.4 ns | -11.99% |
| E_production[bound_1e18] | 521.1 ns | 608.6 ns | -14.38% |
| E_production[mid_1e15] | 519.4 ns | 606.9 ns | -14.42% |
| E_production[small_1e9] | 519.4 ns | 606.9 ns | -14.42% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ci/codeql-advanced (3fe409b) with main (0710fcf)

Footnotes

  1. 9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jackmoxley jackmoxley merged commit a9c0e9b into main May 23, 2026
7 checks passed
@jackmoxley jackmoxley deleted the ci/codeql-advanced branch May 23, 2026 23:21
jackmoxley added a commit that referenced this pull request May 28, 2026
The previous `neg_twos_complement` did a two-pass shape:
  1. NOT loop into out[N] (N writes).
  2. `add_assign_fixed(out, [1, 0, …, 0])` (a full N-limb dependent
     carry chain over a second stack array, even though limbs 1..N
     add `0` after limb 0).

At wide N the dependent add chain across every limb dominates: each
overflowing_add reads the previous carry, blocking vectorisation, and
the second stack array is pure overhead.
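
For reference, the two-pass shape described above might look roughly like the sketch below. It is an illustration only, not the crate's code: it assumes `u64` limbs stored least-significant first, and the body of `add_assign_fixed` here is a hypothetical stand-in for the real helper of that name.

```rust
/// Hypothetical stand-in for the `add_assign_fixed` helper named above:
/// a full N-limb add with a dependent carry chain.
fn add_assign_fixed<const N: usize>(acc: &mut [u64; N], rhs: &[u64; N]) {
    let mut carry = false;
    for i in 0..N {
        // Each limb must wait for the carry produced by the previous limb.
        let (s1, c1) = acc[i].overflowing_add(rhs[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        acc[i] = s2;
        carry = c1 || c2;
    }
}

/// Two-pass negation sketch: bitwise NOT, then add the constant 1.
/// Assumes little-endian `u64` limbs (limb 0 is least significant).
fn neg_two_pass<const N: usize>(a: &[u64; N]) -> [u64; N] {
    // Pass 1: NOT loop into `out` (N writes).
    let mut out = [0u64; N];
    for i in 0..N {
        out[i] = !a[i];
    }
    // Pass 2: build [1, 0, ..., 0] in a second stack array and add it.
    let mut one = [0u64; N];
    if let Some(first) = one.first_mut() {
        *first = 1;
    }
    add_assign_fixed(&mut out, &one);
    out
}
```

The second array and the carry loop over limbs that only ever add `0` are the overhead the commit message refers to.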

Replace with a limb-0 split:
  - `out[0] = !a[0] + 1`, capture the carry `c0`.
  - If `c0 == false` (the overwhelmingly common path), limbs 1..N
    reduce to plain independent `!a[i]` writes — no cross-limb
    dependency chain, the compiler can keep them register-resident
    and vectorise the NOT loop.
  - If `c0 == true` (`a[0] == 0`, so `!a[0] == MAX`), fall back to a
    dependent carry-prop chain through limbs 1..N (the correct, slow path).
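
The limb-0 split above can be sketched as follows. This is a minimal illustration under stated assumptions, not the routed kernel itself: limbs are taken to be `u64` in least-significant-first order, and the function name is borrowed from the bench seam purely for readability.

```rust
/// Sketch of a limb-0 split two's-complement negation.
/// Assumes little-endian `u64` limbs (limb 0 is least significant).
fn neg_fused_split<const N: usize>(a: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    if N == 0 {
        return out;
    }
    // Limb 0: !a[0] + 1. The add overflows only when !a[0] == u64::MAX,
    // i.e. when a[0] == 0.
    let (lo, c0) = (!a[0]).overflowing_add(1);
    out[0] = lo;
    if !c0 {
        // Common path: no carry out of limb 0, so the remaining limbs
        // are independent NOT writes with no cross-limb dependency.
        for i in 1..N {
            out[i] = !a[i];
        }
    } else {
        // Slow path: propagate the carry through limbs 1..N.
        let mut carry = true;
        for i in 1..N {
            let (s, c) = (!a[i]).overflowing_add(carry as u64);
            out[i] = s;
            carry = c;
        }
    }
    out
}
```

For example, negating `[1, 0, 0, 0]` takes the common path and yields all-ones limbs, while negating `[0, 5, 0]` takes the slow path because limb 0 carries into limb 1.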

Generic over `N`, single kernel — no per-tier copies, no LimbSize
axis, no Scratch-on-Int needed. Constitution rules 1-6 hold: one
generic algorithm, one named file, matcher unchanged, sizing local
to width.

A/B verdict (benches/micro/neg_kernel_ab.rs, 6 inputs covering
tiny / half_wide / mid / high / low / carry_chain):

  D462  (N=24): fused_split ≈ two_pass  (within ±10%, noisy)
  D616  (N=32): fused_split beats two_pass by 1.25-1.83x
  D924  (N=48): fused_split beats two_pass by 1.42-2.42x
  D1232 (N=64): fused_split beats two_pass by 1.54-1.63x

Recovers ranks #23/#27/#28 (D616), #31 (D1232) of the bbc §8.4 wide-
neg cluster; D462 (#13/#17/#19/#20) is a wash at the kernel level
(any remaining gap lives in the call shape, not the kernel).

Bench seam: `__bench_internals::neg_fused_split` (routed kernel),
`neg_two_pass` (previous shape, reference baseline), `neg_fused_open`
(single-pass dependent-chain candidate). All bit-identical, asserted
before timing.

Validation: 6 kernel unit tests + 785 lib tests pass.
`cargo check` (default) + `cargo check --features
wide,x-wide,xx-wide,macros --all-targets` both clean.