Skip to content

feat: split keccak chip into xorin + keccakf#2338

Closed
shuklaayush wants to merge 10 commits intodevelop-v1.6.0from
develop-new-keccak
Closed

feat: split keccak chip into xorin + keccakf#2338
shuklaayush wants to merge 10 commits intodevelop-v1.6.0from
develop-new-keccak

Conversation

@shuklaayush
Copy link
Copy Markdown
Collaborator

@shuklaayush shuklaayush commented Dec 30, 2025

Closes INT-5017

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

shuklaayush added a commit that referenced this pull request Jan 2, 2026
- limit threads launched by mulh cuda kernel
- use `inline constexpr` for compile time constants instead of `static
const` in cuda kernels

these are unrelated changes picked from #2338
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Jan 5, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 83.29%

Comparing develop-new-keccak (6f0f247) with develop-v1.6.0 (6dc3800)1

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

❌ 9 regressed benchmarks
✅ 15 untouched benchmarks
⏩ 36 skipped benchmarks2

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime benchmark_execute_metered[quicksort] 9 ms 45.8 ms -80.27%
WallTime benchmark_execute_metered[sha256] 9.3 ms 45.2 ms -79.41%
WallTime benchmark_execute_metered[keccak256] 11.1 ms 49.9 ms -77.75%
WallTime benchmark_execute_metered[keccak256_iter] 71.7 ms 192.1 ms -62.7%
WallTime benchmark_execute_metered[fibonacci_recursive] 16.9 ms 53.7 ms -68.48%
WallTime benchmark_execute_metered[bubblesort] 11.3 ms 47.8 ms -76.45%
WallTime benchmark_execute_metered[fibonacci_iterative] 14.3 ms 50.1 ms -71.44%
WallTime benchmark_execute_metered[revm_transfer] 24.9 ms 61.3 ms -59.4%
WallTime benchmark_execute_metered[revm_snailtracer] 7.3 ms 43.9 ms -83.29%

Footnotes

  1. No successful run was found on develop-v1.6.0 (cb19091) during the generation of this report, so 95fdcd5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 36 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions

This comment has been minimized.

@shuklaayush shuklaayush changed the base branch from main to develop-v1.6.0 January 5, 2026 09:07
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@jonathanpwang jonathanpwang force-pushed the develop-v1.6.0 branch 2 times, most recently from 95b9cdf to cb19091 Compare January 12, 2026 18:26
jonathanpwang and others added 7 commits January 13, 2026 22:30
New Keccak with Xorin and Keccakf chip and opcode

- [ ] I have performed a self-review of my own code
- [ ] Add negative tests for xorin chip
- [ ] Add negative tests for keccakf chip
- [x] Add unit test to CI
- [x] Add new guest code for E2E test to CI (the keccak example is
updated, but I am thinking of adding another one)
- [ ] Check with Ayush if I implemented the SizedRecord trait correctly
- [ ] Rebase to include Zach's new Plonky3 update and update the keccakf
trace gen to not have to transpose any more before giving it into the
input
- [ ] Maybe add comments to justify the correctness of the
constrain_input_read and constraint_output_write function

To reviewer: I will still have to complete the above checklist. But you
can start reviewing if you would like to.

Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646,
INT-5018
Closes INT-5779

---------

Co-authored-by: Ayush Shukla <ayush@axiom.xyz>
Note that I removed the `cells_used` metric since with the introduction
of periphery chips (both here and the new SHA-2) it is very hard to make
it accurate. There are better ways to get that data, and we should add
tooling for it in a separate feature.
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@jonathanpwang jonathanpwang marked this pull request as ready for review January 20, 2026 19:02
Copilot AI review requested due to automatic review settings January 20, 2026 19:02
Copy link
Copy Markdown
Contributor

@jonathanpwang jonathanpwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to merge into develop branch. We will fix lint and add more comprehensive test suite after merging.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the keccak256 circuit implementation by splitting it into two separate operations: xorin (for XOR-ing input into state) and keccakf (for the keccak-f permutation). This architectural change separates concerns and provides better modularity.

Changes:

  • Split monolithic keccak256 opcode into XORIN and KECCAKF opcodes with separate transpiler handling
  • Refactored guest library to implement keccak256 using the new primitives with optimized state management
  • Created separate AIRs and chips for xorin and keccakf operations, plus a periphery chip for keccakf permutation
  • Updated CUDA kernels to support the new architecture

Reviewed changes

Copilot reviewed 53 out of 56 changed files in this pull request and generated no comments.

Show a summary per file
File Description
extensions/keccak256/transpiler/src/lib.rs Split opcode enum into KeccakfOpcode and XorinOpcode with updated transpiler logic
extensions/keccak256/guest/src/lib.rs Implemented new xorin/keccakf primitives and refactored keccak256 to use them
extensions/keccak256/circuit/src/lib.rs Reorganized module structure for new chip architecture
extensions/keccak256/circuit/src/xorin/* New xorin chip implementation (air, columns, trace, tests)
extensions/keccak256/circuit/src/keccakf_op/* New keccakf operation chip
extensions/keccak256/circuit/src/keccakf_perm/* New keccakf permutation periphery chip
extensions/keccak256/circuit/cuda/* Updated CUDA kernels for new architecture
guest-libs/keccak256/tests/lib.rs Updated tests to use builder pattern with feature flags

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown

group app.proof_time_ms app.cycles app.cells_used leaf.proof_time_ms leaf.cycles leaf.cells_used
verify_fibair 276 322,610 0 - - -
fibonacci 1,034 1,500,265 0 - - -
regex 2,342 4,137,877 0 - - -
ecrecover 762 124,173 0 - - -
pairing 1,475 1,745,757 0 - - -

Commit: 6f0f247

Benchmark Workflow

@jonathanpwang
Copy link
Copy Markdown
Contributor

Actually I'm going to close this PR because I will rebase develop-v1.6.0 onto develop-new-keccak for history purposes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants