ZK Benchmarks for Cartesi Machine

Benchmarking framework for measuring ZK proving performance of Cartesi machine state transitions. Supports multiple ZK implementations (RISC-0, Zisk) to enable cross-prover comparison.

Overview

What We Measure

ZK provers like RISC-0 have two distinct phases:

  1. Execution - Running the program to collect metrics
  2. Proving - Generating cryptographic proofs (99% of total time)

Full proving is extremely slow (impossible on Mac, slow even on GPU). However, provers offer a dev mode that executes without generating real proofs. This is useful because the key metrics we need are available from execution alone:

Metric        Description
Cycles        RISC-V cycles spent on execution
Segments      Proof chunks (RISC-0 splits execution into parallelizable segments)
Page Count    Memory pages touched (correlates strongly with cycle count)

Time Estimation

With cycles measured, we can estimate real-world proving time using hardware throughput data:

Proving Time = Total Cycles / Hardware Throughput (cycles/second)

For example, RISC-0 publishes throughput benchmarks:

  • NVIDIA RTX 4090: ~85 kHz (85,000 cycles/second)
  • NVIDIA RTX 3090 Ti: ~60 kHz
  • CPU (x86): ~10 kHz

This approach lets us benchmark quickly in dev mode while providing actionable time estimates for real hardware.
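
As a quick sanity check on the arithmetic, here is a minimal sketch (the helper name is ours, not part of benchmark.py; the cycle count is taken from the sample results.json shown later):

# Sketch: convert measured cycles into an estimated proving time.
# Throughput is expressed in kHz, i.e. thousands of cycles per second.
def estimate_proving_time(total_cycles: int, throughput_khz: float) -> float:
    """Return the estimated proving time in seconds."""
    return total_cycles / (throughput_khz * 1_000)

# 192,937,984 cycles on an RTX 4090 (~85 kHz) works out to roughly 2,270 s (~0.6 h).
seconds = estimate_proving_time(192_937_984, 85)
print(f"{seconds:.0f} s (~{seconds / 3600:.1f} h)")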

Why Chunks Matter

Cartesi machine execution consists of:

  1. Boot phase - Linux kernel boot (always similar)
  2. Payload phase - The actual benchmark workload

We never prove the entire execution end-to-end. Instead, we prove chunks (windows) of execution:

  • Divergence happens at specific cycles - we only need to prove around that point
  • Different chunks have different characteristics (memory usage, page count)
  • We need to know the worst-case chunk to set reasonable limits

This is why benchmarks test various chunk sizes and use Monte Carlo sampling to understand the distribution.
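
As an illustration of the chunking idea, here is a minimal sketch assuming fixed-size windows (the helper is illustrative, not part of the framework):

# Sketch: given a divergence cycle, find the fixed-size chunk that must be proven.
def chunk_for_cycle(divergence_cycle: int, chunk_size: int) -> tuple[int, int]:
    """Return the (start, end) cycle window containing the divergence."""
    start = (divergence_cycle // chunk_size) * chunk_size
    return start, start + chunk_size

# A divergence at cycle 1,234,567 with 100,000-cycle chunks
# falls in the window [1,200,000, 1,300,000).
print(chunk_for_cycle(1_234_567, 100_000))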

Parallelization

RISC-0 segments are independently provable, so with N GPUs proving time divides by roughly N. Segment size is configurable: smaller segments allow more parallelism but add per-segment overhead.
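
A rough sketch of how parallelism enters the estimate (per-segment cost is assumed uniform, and the 12 s per-segment figure below is a placeholder, not a measurement):

# Sketch: wall-clock proving time when independent segments are spread across N GPUs.
import math

def parallel_proving_time(num_segments: int, seconds_per_segment: float, num_gpus: int) -> float:
    """Segments are independent, so they can be distributed round-robin across GPUs."""
    return math.ceil(num_segments / num_gpus) * seconds_per_segment

# 184 segments (from the sample results below) at 12 s each on 4 GPUs: 46 rounds * 12 s = 552 s.
print(parallel_proving_time(184, 12.0, 4))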

Benchmark Modes

Sweep Mode

Tests proving across a range of chunk sizes:

mode: sweep
step_sizes:
  min: 50000      # Minimum chunk size (cycles)
  max: 500000     # Maximum chunk size
  increment: 20000 # Step between measurements

Produces graphs showing how cycles, segments, and estimated time scale with chunk size.

Monte Carlo Mode

Samples random windows across the entire program execution:

mode: monte-carlo
monte_carlo:
  num_samples: 100    # Number of random samples
  window_size: 100000 # Fixed window size to sample

Produces histograms showing the distribution of metrics across different parts of execution. Useful for understanding variance and worst-case scenarios.
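
Conceptually, the sampling amounts to drawing random window start cycles over the full execution; a minimal sketch follows (not the framework's actual implementation, and the total cycle count is a placeholder):

# Sketch: draw random fixed-size windows over the full execution.
import random

def sample_windows(total_cycles: int, window_size: int, num_samples: int, seed: int = 0):
    """Yield (start, end) cycle windows chosen uniformly at random."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        start = rng.randrange(0, total_cycles - window_size)
        yield start, start + window_size

for window in sample_windows(total_cycles=200_000_000, window_size=100_000, num_samples=3):
    print(window)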

Installation

Prerequisites

  • Python 3.10+
  • Rust toolchain (for building provers)
  • Machine emulator dependencies (see deps/machine/README.md)

Setup

# Clone with submodules
git clone --recursive https://github.com/cartesi/zk-benchmarks.git
cd zk-benchmarks

# Install Python dependencies
pip install -r requirements.txt

# Initialize submodules
git submodule update --init --recursive

# Build the machine emulator and prover
python benchmark.py --build-only

Prover-Specific Setup

RISC-0 (built automatically):

# Dev mode is enabled by default (no GPU required)
# Set in config.yaml: RISC0_DEV_MODE: "1"

Zisk (requires pre-built ELF):

# Install Zisk toolchain
curl -L https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
~/.zisk/bin/ziskup

# Build Zisk ELF in machine-emulator (requires LLVM 20)
cd deps/machine/zisk
make LLVM20_DIR=/path/to/llvm@20 all

Usage

List Available Benchmarks

python benchmark.py --list

Run a Single Benchmark

# Run with default settings
python benchmark.py stress-int64

# Override step sizes
python benchmark.py stress-int64 --min-step 10000 --max-step 100000 --increment 5000

# Run Monte Carlo mode
python benchmark.py mc-heapsort --monte-carlo --num-samples 50 --window-size 50000

Run with Specific Prover

# Run only RISC-0
python benchmark.py stress-int64 --prover risc0

# Run only Zisk
python benchmark.py stress-int64 --prover zisk

Run All Benchmarks

python benchmark.py

Configuration

Global Config (config.yaml)

# Kernel and rootfs images
linux_image_url: https://github.com/cartesi/machine-linux-image/releases/...
rootfs_image_url: https://github.com/cartesi/machine-rootfs-image/releases/...

# Hardware profiles for time estimation
hardware_profiles:
  risc0_rtx_4090:
    name: "RISC0 NVIDIA RTX 4090"
    prover: risc0
    throughput_khz: 85
  risc0_cpu:
    name: "RISC0 CPU (x86)"
    prover: risc0
    throughput_khz: 10

# Prover-specific settings
provers:
  risc0:
    env:
      RISC0_DEV_MODE: "1"
  zisk:
    zisk_home: null  # Uses ~/.zisk by default
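
For reference, a sketch of reading these settings with PyYAML (key names match the snippet above; error handling omitted):

# Sketch: load the global config and list hardware profiles for a given prover.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

for profile_id, profile in config["hardware_profiles"].items():
    if profile["prover"] == "risc0":
        print(f'{profile_id}: {profile["name"]} at {profile["throughput_khz"]} kHz')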

Benchmark Config (benchmarks/*.yaml)

name: stress-int64
description: "stress-ng int64 benchmark"
command: "stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400 --metrics"
mode: sweep

# Which provers to run
provers:
  - risc0
  - zisk

# Hardware profiles for time estimation (per prover)
hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

# Sweep mode settings
step_sizes:
  min: 50000
  max: 500000
  increment: 20000
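
If the runner steps from min by increment, these settings cover chunk sizes 50,000, 70,000, ..., 490,000 cycles; the arithmetic is sketched below (whether the maximum itself is measured depends on the runner):

# Sketch: expand sweep settings into the list of chunk sizes to measure.
step_sizes = {"min": 50_000, "max": 500_000, "increment": 20_000}

steps = list(range(step_sizes["min"], step_sizes["max"] + 1, step_sizes["increment"]))
print(steps[:3], "...", steps[-1])  # [50000, 70000, 90000] ... 490000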

Output

Results are saved to results/<benchmark>_<prover>_<timestamp>/:

Sweep Mode Output

results/stress-int64_risc0_20260112_174343/
├── log.bin          # Step log from Cartesi machine
├── receipt.bin      # Proof receipt (dev mode)
├── results.json     # Raw metrics for each step size
└── plots.png        # Visualization graphs

results.json contains per-step metrics:

[
  {
    "step": 50000,
    "total_cycles": 192937984,
    "number_of_segments": 184,
    "user_cycles": 174285398,
    "page_count": 122,
    "execution_time": "4.66s"
  }
]
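
A sketch of post-processing results.json with the estimation formula from earlier (field names match the example above; 85 kHz is the RTX 4090 profile from config.yaml):

# Sketch: turn raw sweep metrics into estimated proving times on real hardware.
import json

THROUGHPUT_KHZ = 85  # RISC0 NVIDIA RTX 4090 profile

with open("results/stress-int64_risc0_20260112_174343/results.json") as f:
    results = json.load(f)

for entry in results:
    est_seconds = entry["total_cycles"] / (THROUGHPUT_KHZ * 1_000)
    print(f'step {entry["step"]}: {entry["number_of_segments"]} segments, '
          f'~{est_seconds / 60:.1f} min estimated')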

Monte Carlo Output

results/mc-heapsort_monte_carlo_20260112_162122/
├── log.bin
├── receipt.bin
├── mc-heapsort_100000.jsonl    # One JSON object per sample
└── mc-heapsort_100000_histograms.png

Understanding the Graphs

Sweep Mode Plots

Four graphs are generated:

  1. Execution Time - Time to run in dev mode (not meaningful for production estimates)
  2. Number of Segments - Segments increase with chunk size
  3. Total Cycles - RISC-V cycles required to prove
  4. Page Count - Memory pages touched (correlates with cycles)

Additionally, estimated proving time graphs show real-world projections for each hardware profile.

Monte Carlo Histograms

Histograms show the distribution of metrics across random samples:

  • Page Count Distribution - How memory usage varies across execution
  • Total Cycles Distribution - Cycle count variance
  • Estimated Time Distribution - Real-world time estimates

Statistics include: min, max, mean, median, P90, P95, P99.
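
A sketch of computing the same summary statistics directly from the .jsonl samples (the total_cycles field name is assumed to match the sweep metrics; adjust to the actual sample schema):

# Sketch: summarize total_cycles across Monte Carlo samples.
import json
import statistics

with open("results/mc-heapsort_monte_carlo_20260112_162122/mc-heapsort_100000.jsonl") as f:
    samples = [json.loads(line) for line in f if line.strip()]

cycles = sorted(s["total_cycles"] for s in samples)
p = statistics.quantiles(cycles, n=100)  # cut points; p[89] ~ P90, p[94] ~ P95, p[98] ~ P99
print("min", cycles[0], "median", statistics.median(cycles), "max", cycles[-1])
print("P90", p[89], "P95", p[94], "P99", p[98])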

Workload Types

Benchmarks use stress-ng to simulate different workloads:

Benchmark      Workload Type   Description
stress-int64   CPU-bound       64-bit integer operations
stress-fp      CPU-bound       Floating point operations
stress-loop    CPU-bound       Tight loop iterations
mc-heapsort    Memory-bound    Heap sort with allocations
mc-loop        CPU-bound       Loop with Monte Carlo sampling
mc-tree        Memory-bound    Tree operations

Project Structure

zk-benchmarks/
├── benchmark.py          # Main benchmark runner
├── config.yaml           # Global configuration
├── requirements.txt      # Python dependencies
├── provers/              # Prover adapter layer
│   ├── base.py           # Abstract ProverAdapter class
│   ├── risc0.py          # RISC-0 adapter
│   └── zisk.py           # Zisk adapter
├── benchmarks/           # Benchmark definitions
│   ├── stress-int64.yaml
│   ├── mc-heapsort.yaml
│   └── ...
├── deps/
│   └── machine/          # Machine emulator submodule
│       ├── risc0/        # RISC-0 prover integration
│       └── zisk/         # Zisk prover integration
├── images/               # Downloaded kernel/rootfs
└── results/              # Benchmark output

Adding New Benchmarks

  1. Create a YAML file in benchmarks/:
name: my-benchmark
description: "Description of what this tests"
command: "command-to-run-inside-cartesi-machine"
mode: sweep  # or monte-carlo

provers:
  - risc0
  - zisk

hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

step_sizes:  # for sweep mode
  min: 50000
  max: 500000
  increment: 20000
  2. Run: python benchmark.py my-benchmark

Adding New Provers

  1. Create adapter in provers/:
from .base import ProverAdapter, ProverResult

class NewProverAdapter(ProverAdapter):
    name = "new-prover"

    def is_built(self) -> bool:
        # Check if prover binaries exist
        pass

    def build(self) -> bool:
        # Build the prover
        pass

    def prove(self, start_hash, end_hash, log_path, step_size, output_dir) -> ProverResult:
        # Run the prover and return metrics
        pass
  2. Register in provers/__init__.py
  3. Add configuration to config.yaml
  4. Add hardware profiles if available
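
The registration step might look something like this, though the actual mechanism in provers/__init__.py may differ (class and dictionary names here are illustrative, not taken from the repository):

# provers/__init__.py (hypothetical registration sketch)
from .risc0 import Risc0Adapter      # illustrative class name
from .zisk import ZiskAdapter        # illustrative class name
from .new_prover import NewProverAdapter

# Map of prover name -> adapter class, so the runner can select a prover by name.
PROVER_ADAPTERS = {
    Risc0Adapter.name: Risc0Adapter,
    ZiskAdapter.name: ZiskAdapter,
    NewProverAdapter.name: NewProverAdapter,
}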
