Generate snarkjs-compatible Groth16 .zkey files from .ptau + .r1cs in seconds, not hours.
zk-setup is a Rust tool that replaces snarkjs groth16 setup for Circom circuits. It produces byte-identical output (sections 1-9) while running 20-60x faster on large circuits by leveraging multi-threaded curve arithmetic via Rayon.
Running snarkjs groth16 setup on large Circom circuits is painfully slow. A WebAuthn P-256 signature verification circuit with ~2 million constraints takes 60+ minutes in snarkjs (single-threaded JavaScript). During development, this makes iteration cycles unbearable.
zk-setup reimplements the trusted setup computation in Rust with:
- Multi-threaded constraint processing via Rayon (uses all CPU cores)
- Parallel point accumulation with per-thread buffers and merge
- Direct ptau binary parsing without intermediate representations
Result: the same 2M-constraint circuit completes in ~3 minutes on an 8-core machine.
# Clone and build
git clone https://github.com/example/zk-setup.git
cd zk-setup
cargo build --release
# Binary will be at target/release/zk-setup- Rust 1.70+ (uses edition 2021)
- A
.ptaufile from the Hermez trusted setup ceremony (or any snarkjs-compatible ptau) - A
.r1csfile compiled from a Circom circuit
# Basic usage
zk-setup --r1cs circuit.r1cs --ptau pot_final.ptau --out circuit_0000.zkey
# Limit thread count
zk-setup --r1cs circuit.r1cs --ptau pot_final.ptau --out circuit_0000.zkey -j 4The output _0000.zkey is a phase-1 zkey (before any phase-2 contributions). For production use, apply phase-2 contributions:
# Apply a phase 2 contribution (interactive)
snarkjs zkey contribute circuit_0000.zkey circuit_final.zkey --name="Contributor 1"
# Or apply a random beacon
snarkjs zkey beacon circuit_0000.zkey circuit_final.zkey 0102030405060708 10
# Export the verification key
snarkjs zkey export verificationkey circuit_final.zkey verification_key.jsonThe resulting circuit_final.zkey works with both snarkjs and rapidsnark for proof generation.
| Circuit | Constraints | snarkjs | zk-setup (8 cores) | Speedup |
|---|---|---|---|---|
| Small (multiplier) | ~1K | 2s | <1s | ~2x |
| Medium | ~100K | 5 min | 15s | ~20x |
| Large (WebAuthn P-256) | ~2M | 60+ min | ~3 min | ~20-60x |
Performance scales roughly linearly with the number of CPU cores.
.ptau file ──> Parse Lagrange-basis points ──┐
├──> Parallel MSM accumulation ──> Write .zkey
.r1cs file ──> Parse R1CS constraints ───────┘
- Parse R1CS -- Read the constraint system (A, B, C matrices in sparse form) using
ark-circom - Parse ptau -- Extract Lagrange-basis ceremony points from the
.ptaubinary file (sections 12-15) - Accumulate -- For each constraint, scatter-add weighted curve points into per-signal accumulators across parallel threads
- Merge -- Combine per-thread accumulators into final section points
- Write -- Serialize all 10 zkey sections in snarkjs-compatible binary format
Rather than computing one MSM per signal (which has poor cache locality on large circuits), zk-setup iterates over constraints and accumulates into per-signal buffers. This is the msm_fast path:
For each constraint c_i (parallelized across threads):
For each (signal, coef) in A[c_i]:
a_points[signal] += coef * tauG1_lagrange[i]
ic_or_c[signal] += coef * betaTauG1_lagrange[i]
For each (signal, coef) in B[c_i]:
b1_points[signal] += coef * tauG1_lagrange[i]
b2_points[signal] += coef * tauG2_lagrange[i]
ic_or_c[signal] += coef * alphaTauG1_lagrange[i]
For each (signal, coef) in C[c_i]:
ic_or_c[signal] += coef * tauG1_lagrange[i]
This processes all constraint non-zero entries in a single pass per thread, with O(total_nonzero_entries / num_threads) work per thread.
Many ptau files from ceremony repositories only contain monomial-basis points (sections 2-6) and tauG1 Lagrange points (section 12). The Lagrange points for tauG2, alphaTauG1, and betaTauG1 (sections 13-15) are often missing -- normally you'd need to run snarkjs powersoftau prepare phase2 first.
zk-setup handles this automatically: if sections 13-15 are missing, it computes Lagrange-basis points from monomial sections via a parallel radix-2 IFFT over curve points. No separate preparation step needed.
These are hard-won insights from making the output byte-identical to snarkjs. They're documented here because getting any of them wrong produces a zkey that silently generates invalid proofs.
The coefficient section (section 4) stores Fr field elements in R-squared Montgomery form: value * R^2 mod r, where R = 2^256 mod r.
This is not standard Montgomery form (value * R mod r), which is what most math libraries use internally. snarkjs explicitly multiplies by an extra factor of R before writing (see zkey_new.js line 330: nR2 = Fr.mul(n, R2r)).
In ark-ff, an Fr value has internal representation = standard_value * R. To get standard_value * R^2:
- Extract the internal bigint (
= standard_value * R) - Reinterpret it as a "standard" value and call
Fr::from_bigint, which multiplies by R - Write the resulting internal representation (
= standard_value * R^2)
Unlike the Fr coefficients in section 4, the Fq coordinates of G1 and G2 points use ordinary Montgomery form (value * R mod q). Both ptau files and zkey files store Fq this way. When reading, use Fq::new_unchecked to avoid double-Montgomery-encoding.
The proving key points (A, B1, B2, C/L, H queries) must be derived from the ptau ceremony's Lagrange-basis points -- they cannot be generated from random toxic waste. ark-groth16::generate_random_parameters generates its own random alpha, beta, delta, tau, which makes its output fundamentally incompatible with snarkjs/rapidsnark verification keys.
Despite both implementing Groth16, ark-groth16's generate_random_parameters cannot produce zkeys compatible with snarkjs or rapidsnark. The issue is structural: ark-groth16 generates its own ceremony parameters internally, while snarkjs expects proving keys derived from a specific ptau ceremony file. This tool bridges that gap.
The tool produces _0000.zkey files (zero phase-2 contributions). The gamma and delta parameters are set to the generator points (identity in the exponent). You must run snarkjs zkey contribute or snarkjs zkey beacon before using in production. For development and testing, the _0000.zkey works fine.
| Zkey Section | Content | Match with snarkjs |
|---|---|---|
| 1 | Header (prover type) | Byte-identical |
| 2 | Groth16 header (curve params + VK) | Byte-identical |
| 3 | IC / gamma_abc_g1 | Byte-identical |
| 4 | Coefficients (R^2 Montgomery) | Byte-identical |
| 5 | A query (G1) | Byte-identical |
| 6 | B1 query (G1) | Byte-identical |
| 7 | B2 query (G2) | Byte-identical |
| 8 | C/L query (G1) | Byte-identical |
| 9 | H query (G1) | Byte-identical |
| 10 | Contributions (csHash) | Differs* |
*Section 10 contains a circuit hash (csHash) that may differ because snarkjs computes the H-point hash from ptau section 2 (monomial basis), while zk-setup uses the computed H points. This does not affect proving or verification -- it only matters for ceremony chain verification.
src/
main.rs -- CLI entry point, orchestration, csHash computation
ptau_reader.rs -- Parse .ptau binary format, IFFT for unprepared files
zkey_writer.rs -- Serialize .zkey binary format, R^2 Montgomery encoding
msm_fast.rs -- Parallel constraint-driven point accumulation
| Module | Responsibility |
|---|---|
main.rs |
Argument parsing, R1CS loading, Lagrange point deserialization, H query extraction, csHash computation, orchestration |
ptau_reader.rs |
Read ptau binary sections, extract ceremony points, compute missing Lagrange sections via parallel IFFT over G1/G2 curve points |
zkey_writer.rs |
Write all 10 zkey sections in snarkjs order (1,2,4,3,9,8,5,6,7,10), handle R^2 Montgomery coefficient encoding |
msm_fast.rs |
Split constraints across Rayon threads, accumulate weighted curve points per signal, merge thread-local buffers, parallel affine serialization |
=== snarkjs groth16 setup ===
[INFO] snarkJS: Reading r1cs
[INFO] snarkJS: Reading tauG1
[INFO] snarkJS: Reading tauG2
[INFO] snarkJS: Reading alphatauG1
[INFO] snarkJS: Reading betatauG1
Time: 4.952s (1.22s user)
=== zk-setup (Rust) ===
[zk-setup] Using 16 threads
[zk-setup] Loaded in 0.0s -- 1 constraints, 4 vars, 2 public, domain=4
[zk-setup] Loaded ptau (power=10) in 0.0s
[zk-setup] Sections computed in 0.0s
[zk-setup] Total: 0.0s
Time: 0.170s
=== Byte-level section comparison ===
Section 1: MATCH (4 bytes)
Section 2: MATCH (660 bytes)
Section 3: MATCH (192 bytes)
Section 4: MATCH (224 bytes)
Section 5: MATCH (256 bytes)
Section 6: MATCH (256 bytes)
Section 7: MATCH (512 bytes)
Section 8: MATCH (64 bytes)
Section 9: MATCH (256 bytes)
Section 10: DIFF (68 bytes) -- csHash only, does not affect proving
=== snarkjs proof verification ===
Proof valid: true
Public signals: ['33', '3'] (c=33, a=3)
Tested on Apple M3 Max (16 cores, 48 GB RAM):
============================================
zk-setup Benchmark: UnifiedWebAuthn P-256
============================================
System: arm64 Apple M3 Max
Cores: 16
RAM: 48 GB
Circuit: UnifiedWebAuthnProof
- P-256 ECDSA signature verification
- EdDSA Baby Jubjub (KYC + Accreditation)
- Sparse Merkle Tree (sanctions check)
- 3 Poseidon nullifiers
R1CS: 464 MB
PTAU: 2.3 GB (power 21, Hermez ceremony)
=== zk-setup output ===
[zk-setup] Using 16 threads
[zk-setup] Loaded in 0.5s -- 1,989,865 constraints, 1,975,377 vars, 9 public, domain=2,097,152
[zk-setup] Loaded ptau (power=21) in 0.3s
[zk-setup] Constraints processed in 0.8s
[zk-setup] Parallel accumulation: 16 threads, 124,367 constraints/chunk
[zk-setup] Sections computed in 171.3s
[zk-setup] H query computed in 0.1s
[zk-setup] Written 1,261.1 MB in 0.6s
[zk-setup] Total: 174.8s (~2 min 55s)
Wall clock: 2:56 | CPU time: 1312s | CPU utilization: 751%
Output: 1.2 GB zkey
=== snarkjs equivalent ===
Killed after 14 minutes (still reading the ptau file)
Estimated: 60-90+ minutes for the full setup
=== VK export ===
Protocol: groth16
Curve: bn128
nPublic: 9
IC points: 10
=== On-chain verification ===
Passkey purchase tx: SUCCESS
ComplianceVerified event emitted
10 property tokens minted
MetaTxExecuted(success: true)
The zkey generated by zk-setup was used in a real passkey purchase transaction on an Anvil testnet. Here is the full proof trail from zkey generation to on-chain verification:
1. zk-setup generated UnifiedWebAuthnProof_final.zkey (1.2 GB, 174.8s)
2. Solidity verifier exported from the zkey via snarkjs
3. Contracts deployed to Anvil (UnifiedWebAuthnVerifier + ZKComplianceRegistry + RealEstateTokenV2)
4. User created WebAuthn passkey (P-256 key pair on device)
5. User initiated purchase → server generated unified ZK proof via rapidsnark (~108s)
6. Proof submitted on-chain via PasskeyRelayForwarder meta-transaction
7. On-chain Groth16 verification passed (ecmul + ecadd + ecpairing precompiles)
Transaction: 0xf0c3905c15e907a73a1e94caf5b52b919cb29cd9554f17f915debe371fc7209d
Status: 1 (success)
Gas used: 648,920
Block: 202
Events emitted (9 total):
NullifierUsed × 3 (KYC, Accreditation, Sanctions nullifiers consumed)
ComplianceVerified × 2 (ZKComplianceRegistry + RealEstateToken)
Transfer × 2 (10,000 TRY stablecoin payment + 10 KAPT token mint)
TokensPurchased × 1 (investor, amount=10, totalPaidTRY=10,000)
MetaTxExecuted × 1 (from=passkey_address, to=token_contract, success=true)
The proof verified the following claims in a single Groth16 proof (~527K constraints unified from 3 separate circuits):
- P-256 ECDSA signature is valid (WebAuthn passkey authentication)
- KYC credential is valid and not expired (EdDSA Baby Jubjub signature)
- Accreditation credential meets threshold (EdDSA signature)
- Address is not on sanctions list (Sparse Merkle Tree non-membership)
- All 3 nullifiers are fresh (prevents replay across compliance periods)
VK verification:
Protocol: groth16
Curve: bn128
nPublic: 9
IC points: 10
Alpha G1: on BN254 curve ✓
You can verify that the output matches snarkjs:
# Generate with snarkjs
snarkjs groth16 setup circuit.r1cs pot_final.ptau circuit_snarkjs_0000.zkey
# Generate with zk-setup
zk-setup --r1cs circuit.r1cs --ptau pot_final.ptau --out circuit_zksetup_0000.zkey
# Compare section-by-section (sections 1-9 should be byte-identical)
python3 -c "
import struct
def read_sections(path):
sections = {}
with open(path, 'rb') as f:
f.read(12)
while True:
try:
sid = struct.unpack('<I', f.read(4))[0]
size = struct.unpack('<Q', f.read(8))[0]
if size > 1e12: break
sections[sid] = f.read(size)
except: break
return sections
ref_s = read_sections('circuit_snarkjs_0000.zkey')
rust_s = read_sections('circuit_zksetup_0000.zkey')
for sid in sorted(set(list(ref_s.keys()) + list(rust_s.keys()))):
r, m = ref_s.get(sid, b''), rust_s.get(sid, b'')
print(f'Section {sid:2d}: {\"MATCH\" if r == m else \"DIFF\"} ({len(r)} bytes)')
"- snarkjs by iden3 -- the reference implementation whose binary format we match byte-for-byte
- arkworks ecosystem -- the Rust elliptic curve and FFT libraries powering the multi-threaded computation
- ark-circom -- R1CS parsing and the Montgomery encoding insights
- Hermez -- for hosting the Powers of Tau ceremony files
- Claude (Anthropic) and Opus 4 -- AI pair programming that helped discover the critical R-squared Montgomery encoding, diagnose the ark-groth16/snarkjs incompatibility, and iterate through multiple architectural approaches to arrive at a correct, byte-identical implementation
MIT