Benchmarking framework for measuring ZK proving performance of Cartesi machine state transitions. Supports multiple ZK implementations (RISC-0, Zisk) to enable cross-prover comparison.
ZK provers like RISC-0 have two distinct phases:
- Execution - Running the program to collect metrics
- Proving - Generating cryptographic proofs (99% of total time)
Full proving is extremely slow (impractical on a Mac, and slow even on a GPU). However, provers offer a dev mode that executes the program without generating real proofs. This is useful because the key metrics we need are available from execution alone:
| Metric | Description |
|---|---|
| Cycles | RISC-V cycles spent on execution |
| Segments | Proof chunks (RISC-0 splits execution into parallelizable segments) |
| Page Count | Memory pages touched (correlates strongly with cycle count) |
With cycles measured, we can estimate real-world proving time using hardware throughput data:
Proving Time = Total Cycles / Hardware Throughput (cycles/second)
For example, RISC-0 publishes throughput benchmarks:
- NVIDIA RTX 4090: ~85 kHz (85,000 cycles/second)
- NVIDIA RTX 3090 Ti: ~60 kHz
- CPU (x86): ~10 kHz
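With these numbers, converting measured cycles into a wall-clock estimate is simple arithmetic. A minimal sketch (the profile values mirror the figures above; adjust them for your hardware):

```python
# Throughput figures mirror the published numbers above (1 kHz = 1,000 cycles/s).
HARDWARE_THROUGHPUT_KHZ = {
    "rtx_4090": 85,
    "rtx_3090_ti": 60,
    "cpu_x86": 10,
}

def estimate_proving_time(total_cycles: int, throughput_khz: float) -> float:
    """Estimated proving time in seconds: cycles / (cycles per second)."""
    return total_cycles / (throughput_khz * 1_000)

# Example: ~193M cycles (the stress-int64 sample shown later) on an RTX 4090.
seconds = estimate_proving_time(192_937_984, HARDWARE_THROUGHPUT_KHZ["rtx_4090"])
print(f"~{seconds / 60:.1f} min")  # ~37.8 min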
This approach lets us benchmark quickly in dev mode while providing actionable time estimates for real hardware.
Cartesi machine execution consists of:
- Boot phase - Linux kernel boot (always similar)
- Payload phase - The actual benchmark workload
We never prove the entire execution end-to-end. Instead, we prove chunks (windows) of execution:
- Divergence happens at specific cycles - we only need to prove around that point
- Different chunks have different characteristics (memory usage, page count)
- We need to know the worst-case chunk to set reasonable limits
This is why benchmarks test various chunk sizes and use Monte Carlo sampling to understand the distribution.
RISC-0 segments are independently provable. With N GPUs, proving time divides by N. The segment size is configurable - smaller segments mean more parallelization but higher overhead.
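To see that trade-off concretely, here is a rough back-of-the-envelope model (the `per_segment_overhead` constant is hypothetical, not a measured RISC-0 figure):

```python
import math

def parallel_proving_time(total_cycles: int, segment_cycles: int,
                          throughput_hz: float, num_gpus: int,
                          per_segment_overhead: float = 0.5) -> float:
    """Rough wall-clock estimate when segments are proven in parallel."""
    segments = math.ceil(total_cycles / segment_cycles)
    # Each segment pays a fixed setup cost on top of its cycle work.
    time_per_segment = segment_cycles / throughput_hz + per_segment_overhead
    # Segments are distributed evenly across GPUs.
    return math.ceil(segments / num_gpus) * time_per_segment

# Halving the segment size doubles the segment count (more parallelism)
# but also doubles how often the fixed overhead is paid.
print(parallel_proving_time(200_000_000, 1_048_576, 85_000, num_gpus=4))
```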
Sweep mode tests proving across a range of chunk sizes:

```yaml
mode: sweep
step_sizes:
  min: 50000        # Minimum chunk size (cycles)
  max: 500000       # Maximum chunk size
  increment: 20000  # Step between measurements
```

It produces graphs showing how cycles, segments, and estimated time scale with chunk size.
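For intuition, the sweep above corresponds to the following measurement points (illustrative; whether the runner includes the `max` bound exactly may differ):

```python
min_step, max_step, increment = 50_000, 500_000, 20_000
step_sizes = list(range(min_step, max_step + 1, increment))
print(len(step_sizes), step_sizes[0], step_sizes[-1])  # 23 points: 50000 ... 490000
```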
Monte Carlo mode samples random windows across the entire program execution:

```yaml
mode: monte-carlo
monte_carlo:
  num_samples: 100     # Number of random samples
  window_size: 100000  # Fixed window size to sample
```

It produces histograms showing the distribution of metrics across different parts of execution. This is useful for understanding variance and worst-case scenarios.
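The sampling itself amounts to drawing fixed-size windows uniformly over the execution. A minimal sketch (illustrative only; the actual sampler lives in `benchmark.py` and may differ):

```python
import random

def sample_windows(total_cycles: int, window_size: int, num_samples: int,
                   seed: int | None = None) -> list[tuple[int, int]]:
    """Draw (start, end) cycle windows uniformly across the execution."""
    rng = random.Random(seed)
    starts = sorted(rng.randrange(total_cycles - window_size)
                    for _ in range(num_samples))
    return [(start, start + window_size) for start in starts]

windows = sample_windows(total_cycles=500_000_000, window_size=100_000, num_samples=100)
```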
- Python 3.10+
- Rust toolchain (for building provers)
- Machine emulator dependencies (see `deps/machine/README.md`)
```bash
# Clone with submodules
git clone --recursive https://github.com/cartesi/zk-benchmarks.git
cd zk-benchmarks

# Install Python dependencies
pip install -r requirements.txt

# Initialize submodules
git submodule update --init --recursive

# Build the machine emulator and prover
python benchmark.py --build-only
```

RISC-0 (built automatically):
```bash
# Dev mode is enabled by default (no GPU required)
# Set in config.yaml: RISC0_DEV_MODE: "1"
```

Zisk (requires pre-built ELF):
```bash
# Install Zisk toolchain
curl -L https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
~/.zisk/bin/ziskup

# Build Zisk ELF in machine-emulator (requires LLVM 20)
cd deps/machine/zisk
make LLVM20_DIR=/path/to/llvm@20 all
```

List the available benchmarks:

```bash
python benchmark.py --list
```

Run a benchmark:

```bash
# Run with default settings
python benchmark.py stress-int64

# Override step sizes
python benchmark.py stress-int64 --min-step 10000 --max-step 100000 --increment 5000

# Run Monte Carlo mode
python benchmark.py mc-heapsort --monte-carlo --num-samples 50 --window-size 50000
```

Select a specific prover:

```bash
# Run only RISC-0
python benchmark.py stress-int64 --prover risc0

# Run only Zisk
python benchmark.py stress-int64 --prover zisk
```

Run all benchmarks:

```bash
python benchmark.py
```

Global settings live in `config.yaml`:

```yaml
# Kernel and rootfs images
linux_image_url: https://github.com/cartesi/machine-linux-image/releases/...
rootfs_image_url: https://github.com/cartesi/machine-rootfs-image/releases/...

# Hardware profiles for time estimation
hardware_profiles:
  risc0_rtx_4090:
    name: "RISC0 NVIDIA RTX 4090"
    prover: risc0
    throughput_khz: 85
  risc0_cpu:
    name: "RISC0 CPU (x86)"
    prover: risc0
    throughput_khz: 10

# Prover-specific settings
provers:
  risc0:
    env:
      RISC0_DEV_MODE: "1"
  zisk:
    zisk_home: null  # Uses ~/.zisk by default
```

Each benchmark is defined by a YAML file in `benchmarks/`, e.g. `benchmarks/stress-int64.yaml`:

```yaml
name: stress-int64
description: "stress-ng int64 benchmark"
command: "stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400 --metrics"
mode: sweep

# Which provers to run
provers:
  - risc0
  - zisk

# Hardware profiles for time estimation (per prover)
hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

# Sweep mode settings
step_sizes:
  min: 50000
  max: 500000
  increment: 20000
```

Results are saved to `results/<benchmark>_<prover>_<timestamp>/`:

```
results/stress-int64_risc0_20260112_174343/
├── log.bin # Step log from Cartesi machine
├── receipt.bin # Proof receipt (dev mode)
├── results.json # Raw metrics for each step size
└── plots.png # Visualization graphs
```

`results.json` contains per-step metrics:

```json
[
  {
    "step": 50000,
    "total_cycles": 192937984,
    "number_of_segments": 184,
    "user_cycles": 174285398,
    "page_count": 122,
    "execution_time": "4.66s"
  }
]
```

Monte Carlo runs follow the same layout:

```
results/mc-heapsort_monte_carlo_20260112_162122/
├── log.bin
├── receipt.bin
├── mc-heapsort_100000.jsonl # One JSON object per sample
└── mc-heapsort_100000_histograms.png
```

Four graphs are generated:
- Execution Time - Time to run in dev mode (not meaningful for production estimates)
- Number of Segments - Segments increase with chunk size
- Total Cycles - RISC-V cycles required to prove
- Page Count - Memory pages touched (correlates with cycles)
Additionally, estimated proving time graphs show real-world projections for each hardware profile.
Histograms show the distribution of metrics across random samples:
- Page Count Distribution - How memory usage varies across execution
- Total Cycles Distribution - Cycle count variance
- Estimated Time Distribution - Real-world time estimates
Statistics include: min, max, mean, median, P90, P95, P99.
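These statistics are straightforward to recompute from the per-sample JSONL file. A sketch using NumPy, assuming each record carries the same fields as `results.json` (e.g. `total_cycles`):

```python
import json
import numpy as np

with open("mc-heapsort_100000.jsonl") as f:
    samples = [json.loads(line) for line in f]
cycles = np.array([s["total_cycles"] for s in samples])

summary = {
    "min": int(cycles.min()), "max": int(cycles.max()),
    "mean": float(cycles.mean()), "median": float(np.median(cycles)),
    **{f"p{q}": float(np.percentile(cycles, q)) for q in (90, 95, 99)},
}
print(summary)
```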
Benchmarks use stress-ng to simulate different workloads:
| Benchmark | Workload Type | Description |
|---|---|---|
| `stress-int64` | CPU-bound | 64-bit integer operations |
| `stress-fp` | CPU-bound | Floating point operations |
| `stress-loop` | CPU-bound | Tight loop iterations |
| `mc-heapsort` | Memory-bound | Heap sort with allocations |
| `mc-loop` | CPU-bound | Loop with Monte Carlo sampling |
| `mc-tree` | Memory-bound | Tree operations |
```
zk-benchmarks/
├── benchmark.py          # Main benchmark runner
├── config.yaml           # Global configuration
├── requirements.txt      # Python dependencies
├── provers/              # Prover adapter layer
│   ├── base.py           # Abstract ProverAdapter class
│   ├── risc0.py          # RISC-0 adapter
│   └── zisk.py           # Zisk adapter
├── benchmarks/           # Benchmark definitions
│   ├── stress-int64.yaml
│   ├── mc-heapsort.yaml
│   └── ...
├── deps/
│   └── machine/          # Machine emulator submodule
│       ├── risc0/        # RISC-0 prover integration
│       └── zisk/         # Zisk prover integration
├── images/               # Downloaded kernel/rootfs
└── results/              # Benchmark output
```
- Create a YAML file in `benchmarks/`:

  ```yaml
  name: my-benchmark
  description: "Description of what this tests"
  command: "command-to-run-inside-cartesi-machine"
  mode: sweep  # or monte-carlo

  provers:
    - risc0
    - zisk

  hardware_profiles:
    risc0:
      - risc0_rtx_4090
      - risc0_cpu

  step_sizes:  # for sweep mode
    min: 50000
    max: 500000
    increment: 20000
  ```

- Run:

  ```bash
  python benchmark.py my-benchmark
  ```
- Create an adapter in `provers/`:

  ```python
  from .base import ProverAdapter, ProverResult

  class NewProverAdapter(ProverAdapter):
      name = "new-prover"

      def is_built(self) -> bool:
          # Check if prover binaries exist
          pass

      def build(self) -> bool:
          # Build the prover
          pass

      def prove(self, start_hash, end_hash, log_path, step_size, output_dir) -> ProverResult:
          # Run the prover and return metrics
          pass
  ```

- Register it in `provers/__init__.py` (see the sketch after this list)
- Add its configuration to `config.yaml`
- Add hardware profiles if available
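As a sketch of the registration step, it might look like this (the existing adapter class names and registry shape are assumptions; match whatever `provers/__init__.py` actually does):

```python
# provers/__init__.py -- hypothetical registry; the real module may differ.
from .base import ProverAdapter
from .risc0 import Risc0Adapter    # assumed class name
from .zisk import ZiskAdapter      # assumed class name
from .new_prover import NewProverAdapter

ADAPTERS: dict[str, type[ProverAdapter]] = {
    adapter.name: adapter
    for adapter in (Risc0Adapter, ZiskAdapter, NewProverAdapter)
}
```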