feat: split keccak chip into xorin + keccakf#2338
feat: split keccak chip into xorin + keccakf#2338shuklaayush wants to merge 10 commits intodevelop-v1.6.0from
Conversation
This comment has been minimized.
This comment has been minimized.
659cea1 to
01be16e
Compare
This comment has been minimized.
This comment has been minimized.
01be16e to
29ca86b
Compare
This comment has been minimized.
This comment has been minimized.
- limit threads launched by mulh cuda kernel - use `inline constexpr` for compile time constants instead of `static const` in cuda kernels these are unrelated changes picked from #2338
29ca86b to
be4d2d8
Compare
This comment has been minimized.
This comment has been minimized.
be4d2d8 to
4c32ab4
Compare
This comment has been minimized.
This comment has been minimized.
CodSpeed Performance ReportMerging this PR will degrade performance by 83.29%Comparing
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | benchmark_execute_metered[quicksort] |
9 ms | 45.8 ms | -80.27% |
| ❌ | WallTime | benchmark_execute_metered[sha256] |
9.3 ms | 45.2 ms | -79.41% |
| ❌ | WallTime | benchmark_execute_metered[keccak256] |
11.1 ms | 49.9 ms | -77.75% |
| ❌ | WallTime | benchmark_execute_metered[keccak256_iter] |
71.7 ms | 192.1 ms | -62.7% |
| ❌ | WallTime | benchmark_execute_metered[fibonacci_recursive] |
16.9 ms | 53.7 ms | -68.48% |
| ❌ | WallTime | benchmark_execute_metered[bubblesort] |
11.3 ms | 47.8 ms | -76.45% |
| ❌ | WallTime | benchmark_execute_metered[fibonacci_iterative] |
14.3 ms | 50.1 ms | -71.44% |
| ❌ | WallTime | benchmark_execute_metered[revm_transfer] |
24.9 ms | 61.3 ms | -59.4% |
| ❌ | WallTime | benchmark_execute_metered[revm_snailtracer] |
7.3 ms | 43.9 ms | -83.29% |
Footnotes
-
No successful run was found on
develop-v1.6.0(cb19091) during the generation of this report, so 95fdcd5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩ -
36 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
4c32ab4 to
fa309ed
Compare
This comment has been minimized.
This comment has been minimized.
fa309ed to
0608cf3
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
95b9cdf to
cb19091
Compare
New Keccak with Xorin and Keccakf chip and opcode - [ ] I have performed a self-review of my own code - [ ] Add negative tests for xorin chip - [ ] Add negative tests for keccakf chip - [x] Add unit test to CI - [x] Add new guest code for E2E test to CI (the keccak example is updated, but I am thinking of adding another one) - [ ] Check with Ayush if I implemented the SizedRecord trait correctly - [ ] Rebase to include Zach's new Plonky3 update and update the keccakf trace gen to not have to transpose any more before giving it into the input - [ ] Maybe add comments to justify the correctness of the constrain_input_read and constraint_output_write function To reviewer: I will still have to complete the above checklist. But you can start reviewing if you would like to. Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646, INT-5018
Resolves INT-5719
Resolves INT-5722
Closes INT-5779 --------- Co-authored-by: Ayush Shukla <ayush@axiom.xyz>
7571c7e to
75550a1
Compare
Note that I removed the `cells_used` metric since with the introduction of periphery chips (both here and the new SHA-2) it is very hard to make it accurate. There are better ways to get that data, and we should add tooling for it in a separate feature.
This comment has been minimized.
This comment has been minimized.
closes INT-5935
This comment has been minimized.
This comment has been minimized.
jonathanpwang
left a comment
There was a problem hiding this comment.
LGTM to merge into develop branch. We will fix lint and add more comprehensive test suite after merging.
There was a problem hiding this comment.
Pull request overview
This PR refactors the keccak256 circuit implementation by splitting it into two separate operations: xorin (for XOR-ing input into state) and keccakf (for the keccak-f permutation). This architectural change separates concerns and provides better modularity.
Changes:
- Split monolithic keccak256 opcode into
XORINandKECCAKFopcodes with separate transpiler handling - Refactored guest library to implement keccak256 using the new primitives with optimized state management
- Created separate AIRs and chips for xorin and keccakf operations, plus a periphery chip for keccakf permutation
- Updated CUDA kernels to support the new architecture
Reviewed changes
Copilot reviewed 53 out of 56 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| extensions/keccak256/transpiler/src/lib.rs | Split opcode enum into KeccakfOpcode and XorinOpcode with updated transpiler logic |
| extensions/keccak256/guest/src/lib.rs | Implemented new xorin/keccakf primitives and refactored keccak256 to use them |
| extensions/keccak256/circuit/src/lib.rs | Reorganized module structure for new chip architecture |
| extensions/keccak256/circuit/src/xorin/* | New xorin chip implementation (air, columns, trace, tests) |
| extensions/keccak256/circuit/src/keccakf_op/* | New keccakf operation chip |
| extensions/keccak256/circuit/src/keccakf_perm/* | New keccakf permutation periphery chip |
| extensions/keccak256/circuit/cuda/* | Updated CUDA kernels for new architecture |
| guest-libs/keccak256/tests/lib.rs | Updated tests to use builder pattern with feature flags |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Actually I'm going to close this PR because I will rebase |
Closes INT-5017