Benchmarking framework for measuring ZK proving performance of Cartesi machine state transitions. Supports multiple ZK implementations (RISC-0, Zisk) to enable cross-prover comparison.
ZK provers like RISC-0 have two distinct phases:
- Execution - Running the program to collect metrics
- Proving - Generating cryptographic proofs (99% of total time)
Full proving is extremely slow (impractical on a Mac, and slow even on a GPU). However, provers offer a dev mode that executes the program without generating real proofs. This is useful because the key metrics we need are available from execution alone:
| Metric | Description |
|---|---|
| Cycles | RISC-V cycles spent on execution |
| Segments | Proof chunks (RISC-0 splits execution into parallelizable segments) |
| Page Count | Memory pages touched (correlates strongly with cycle count) |
With cycles measured, we can estimate real-world proving time using hardware throughput data:
Proving Time = Total Cycles / Hardware Throughput (cycles/second)
For example, RISC-0 publishes throughput benchmarks:
- NVIDIA RTX 4090: ~85 kHz (85,000 cycles/second)
- NVIDIA RTX 3090 Ti: ~60 kHz
- CPU (x86): ~10 kHz
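With these numbers, converting measured cycles into a wall-clock estimate is simple arithmetic. A minimal sketch (the profile values mirror the figures above; adjust them for your hardware):

```python
# Throughput figures mirror the published numbers above (1 kHz = 1,000 cycles/s).
HARDWARE_THROUGHPUT_KHZ = {
    "rtx_4090": 85,
    "rtx_3090_ti": 60,
    "cpu_x86": 10,
}

def estimate_proving_time(total_cycles: int, throughput_khz: float) -> float:
    """Estimated proving time in seconds: cycles / (cycles per second)."""
    return total_cycles / (throughput_khz * 1_000)

# Example: ~193M cycles (the stress-int64 sample shown later) on an RTX 4090.
seconds = estimate_proving_time(192_937_984, HARDWARE_THROUGHPUT_KHZ["rtx_4090"])
print(f"~{seconds / 60:.1f} min")  # ~37.8 min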
This approach lets us benchmark quickly in dev mode while providing actionable time estimates for real hardware.
Cartesi machine execution consists of:
- Boot phase - Linux kernel boot (always similar)
- Payload phase - The actual benchmark workload
We never prove the entire execution end-to-end. Instead, we prove chunks (windows) of execution:
- Divergence happens at specific cycles - we only need to prove around that point
- Different chunks have different characteristics (memory usage, page count)
- We need to know the worst-case chunk to set reasonable limits
This is why benchmarks test various chunk sizes and use Monte Carlo sampling to understand the distribution.
RISC-0 segments are independently provable. With N GPUs, proving time divides by N. The segment size is configurable - smaller segments mean more parallelization but higher overhead.
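To see that trade-off concretely, here is a rough back-of-the-envelope model (the `per_segment_overhead` constant is hypothetical, not a measured RISC-0 figure):

```python
import math

def parallel_proving_time(total_cycles: int, segment_cycles: int,
                          throughput_hz: float, num_gpus: int,
                          per_segment_overhead: float = 0.5) -> float:
    """Rough wall-clock estimate when segments are proven in parallel."""
    segments = math.ceil(total_cycles / segment_cycles)
    # Each segment pays a fixed setup cost on top of its cycle work.
    time_per_segment = segment_cycles / throughput_hz + per_segment_overhead
    # Segments are distributed evenly across GPUs.
    return math.ceil(segments / num_gpus) * time_per_segment

# Halving the segment size doubles the segment count (more parallelism)
# but also doubles how often the fixed overhead is paid.
print(parallel_proving_time(200_000_000, 1_048_576, 85_000, num_gpus=4))
```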
Sweep mode tests proving across a range of chunk sizes:

```yaml
mode: sweep
step_sizes:
  min: 50000        # Minimum chunk size (cycles)
  max: 500000       # Maximum chunk size
  increment: 20000  # Step between measurements
```

It produces graphs showing how cycles, segments, and estimated time scale with chunk size.
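For intuition, the sweep above corresponds to the following measurement points (illustrative; whether the runner includes the `max` bound exactly may differ):

```python
min_step, max_step, increment = 50_000, 500_000, 20_000
step_sizes = list(range(min_step, max_step + 1, increment))
print(len(step_sizes), step_sizes[0], step_sizes[-1])  # 23 points: 50000 ... 490000
```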
Monte Carlo mode samples random windows across the entire program execution:

```yaml
mode: monte-carlo
monte_carlo:
  num_samples: 100     # Number of random samples
  window_size: 100000  # Fixed window size to sample
```

It produces histograms showing the distribution of metrics across different parts of execution. This is useful for understanding variance and worst-case scenarios.
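The sampling itself amounts to drawing fixed-size windows uniformly over the execution. A minimal sketch (illustrative only; the actual sampler lives in `benchmark.py` and may differ):

```python
import random

def sample_windows(total_cycles: int, window_size: int, num_samples: int,
                   seed: int | None = None) -> list[tuple[int, int]]:
    """Draw (start, end) cycle windows uniformly across the execution."""
    rng = random.Random(seed)
    starts = sorted(rng.randrange(total_cycles - window_size)
                    for _ in range(num_samples))
    return [(start, start + window_size) for start in starts]

windows = sample_windows(total_cycles=500_000_000, window_size=100_000, num_samples=100)
```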
- Python 3.10+
- Rust toolchain (for building provers)
- Machine emulator dependencies (see `deps/machine/README.md`)
```bash
# Clone with submodules
git clone --recursive https://github.com/cartesi/zk-benchmarks.git
cd zk-benchmarks

# Install Python dependencies
pip install -r requirements.txt

# Initialize submodules
git submodule update --init --recursive

# Build the machine emulator and prover
python benchmark.py --build-only
```

RISC-0 (built automatically):
```bash
# Dev mode is enabled by default (no GPU required)
# Set in config.yaml: RISC0_DEV_MODE: "1"
```

Zisk (requires pre-built ELF):
```bash
# Install Zisk toolchain
curl -L https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
~/.zisk/bin/ziskup

# Build Zisk ELF in machine-emulator (requires LLVM 20)
cd deps/machine/zisk
make LLVM20_DIR=/path/to/llvm@20 all
```

List the available benchmarks:

```bash
python benchmark.py --list
```

Run a benchmark:

```bash
# Run with default settings
python benchmark.py stress-int64

# Override step sizes
python benchmark.py stress-int64 --min-step 10000 --max-step 100000 --increment 5000

# Run Monte Carlo mode
python benchmark.py mc-heapsort --monte-carlo --num-samples 50 --window-size 50000
```

Select a specific prover:

```bash
# Run only RISC-0
python benchmark.py stress-int64 --prover risc0

# Run only Zisk
python benchmark.py stress-int64 --prover zisk
```

Run all benchmarks:

```bash
python benchmark.py
```

Global settings live in `config.yaml`:

```yaml
# Kernel and rootfs images
linux_image_url: https://github.com/cartesi/machine-linux-image/releases/...
rootfs_image_url: https://github.com/cartesi/machine-rootfs-image/releases/...

# Hardware profiles for time estimation
hardware_profiles:
  risc0_rtx_4090:
    name: "RISC0 NVIDIA RTX 4090"
    prover: risc0
    throughput_khz: 85
  risc0_cpu:
    name: "RISC0 CPU (x86)"
    prover: risc0
    throughput_khz: 10

# Prover-specific settings
provers:
  risc0:
    env:
      RISC0_DEV_MODE: "1"
  zisk:
    zisk_home: null  # Uses ~/.zisk by default
```

Each benchmark is defined by a YAML file in `benchmarks/`, e.g. `benchmarks/stress-int64.yaml`:

```yaml
name: stress-int64
description: "stress-ng int64 benchmark"
command: "stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400 --metrics"
mode: sweep

# Which provers to run
provers:
  - risc0
  - zisk

# Hardware profiles for time estimation (per prover)
hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

# Sweep mode settings
step_sizes:
  min: 50000
  max: 500000
  increment: 20000
```

Results are saved to `results/<benchmark>_<prover>_<timestamp>/`:

```
results/stress-int64_risc0_20260112_174343/
├── log.bin # Step log from Cartesi machine
├── receipt.bin # Proof receipt (dev mode)
├── results.json # Raw metrics for each step size
└── plots.png # Visualization graphs
```

`results.json` contains per-step metrics:

```json
[
  {
    "step": 50000,
    "total_cycles": 192937984,
    "number_of_segments": 184,
    "user_cycles": 174285398,
    "page_count": 122,
    "execution_time": "4.66s"
  }
]
```

Monte Carlo runs follow the same layout:

```
results/mc-heapsort_monte_carlo_20260112_162122/
├── log.bin
├── receipt.bin
├── mc-heapsort_100000.jsonl # One JSON object per sample
└── mc-heapsort_100000_histograms.png
```

Four graphs are generated:
- Execution Time - Time to run in dev mode (not meaningful for production estimates)
- Number of Segments - Segments increase with chunk size
- Total Cycles - RISC-V cycles required to prove
- Page Count - Memory pages touched (correlates with cycles)
Additionally, estimated proving time graphs show real-world projections for each hardware profile.
Histograms show the distribution of metrics across random samples:
- Page Count Distribution - How memory usage varies across execution
- Total Cycles Distribution - Cycle count variance
- Estimated Time Distribution - Real-world time estimates
Statistics include: min, max, mean, median, P90, P95, P99.
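These statistics are straightforward to recompute from the per-sample JSONL file. A sketch using NumPy, assuming each record carries the same fields as `results.json` (e.g. `total_cycles`):

```python
import json
import numpy as np

with open("mc-heapsort_100000.jsonl") as f:
    samples = [json.loads(line) for line in f]
cycles = np.array([s["total_cycles"] for s in samples])

summary = {
    "min": int(cycles.min()), "max": int(cycles.max()),
    "mean": float(cycles.mean()), "median": float(np.median(cycles)),
    **{f"p{q}": float(np.percentile(cycles, q)) for q in (90, 95, 99)},
}
print(summary)
```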
Benchmarks use stress-ng to simulate different workloads:
| Benchmark | Workload Type | Description |
|---|---|---|
| `stress-int64` | CPU-bound | 64-bit integer operations |
| `stress-fp` | CPU-bound | Floating point operations |
| `stress-loop` | CPU-bound | Tight loop iterations |
| `mc-heapsort` | Memory-bound | Heap sort with allocations |
| `mc-loop` | CPU-bound | Loop with Monte Carlo sampling |
| `mc-tree` | Memory-bound | Tree operations |
```
zk-benchmarks/
├── benchmark.py          # Main benchmark runner
├── config.yaml           # Global configuration
├── requirements.txt      # Python dependencies
├── provers/              # Prover adapter layer
│   ├── base.py           # Abstract ProverAdapter class
│   ├── risc0.py          # RISC-0 adapter
│   └── zisk.py           # Zisk adapter
├── benchmarks/           # Benchmark definitions
│   ├── stress-int64.yaml
│   ├── mc-heapsort.yaml
│   └── ...
├── deps/
│   └── machine/          # Machine emulator submodule
│       ├── risc0/        # RISC-0 prover integration
│       └── zisk/         # Zisk prover integration
├── images/               # Downloaded kernel/rootfs
└── results/              # Benchmark output
```
- Create a YAML file in `benchmarks/`:

  ```yaml
  name: my-benchmark
  description: "Description of what this tests"
  command: "command-to-run-inside-cartesi-machine"
  mode: sweep  # or monte-carlo

  provers:
    - risc0
    - zisk

  hardware_profiles:
    risc0:
      - risc0_rtx_4090
      - risc0_cpu

  step_sizes:  # for sweep mode
    min: 50000
    max: 500000
    increment: 20000
  ```

- Run:

  ```bash
  python benchmark.py my-benchmark
  ```
- Create an adapter in `provers/`:

  ```python
  from .base import ProverAdapter, ProverResult

  class NewProverAdapter(ProverAdapter):
      name = "new-prover"

      def is_built(self) -> bool:
          # Check if prover binaries exist
          pass

      def build(self) -> bool:
          # Build the prover
          pass

      def prove(self, start_hash, end_hash, log_path, step_size, output_dir) -> ProverResult:
          # Run the prover and return metrics
          pass
  ```

- Register it in `provers/__init__.py` (see the sketch after this list)
- Add its configuration to `config.yaml`
- Add hardware profiles if available
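As a sketch of the registration step, it might look like this (the existing adapter class names and registry shape are assumptions; match whatever `provers/__init__.py` actually does):

```python
# provers/__init__.py -- hypothetical registry; the real module may differ.
from .base import ProverAdapter
from .risc0 import Risc0Adapter    # assumed class name
from .zisk import ZiskAdapter      # assumed class name
from .new_prover import NewProverAdapter

ADAPTERS: dict[str, type[ProverAdapter]] = {
    adapter.name: adapter
    for adapter in (Risc0Adapter, ZiskAdapter, NewProverAdapter)
}
```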