Matheus-Sanchez/quantum-bench

Quantum Bench

Portable benchmark harness for quantum circuit simulation across CPU and NVIDIA GPU environments, designed for reproducible local validation on Windows/WSL2 and later migration to stronger Ubuntu workstations.

Short description

Quantum Bench standardizes how quantum circuit simulators are executed, measured, and compared across development and target machines, with emphasis on Qiskit Aer, Qulacs, PennyLane Lightning, and the practical behavior of NVIDIA-backed simulation paths.

The project is organized around three ideas:

  • keep the benchmark pipeline portable across Windows, WSL2, and Ubuntu;
  • separate quick validation runs from heavier hardware-focused campaigns;
  • for frontier-style campaigns on WSL2, prefer a Windows-hosted orchestrator that launches one case at a time inside WSL, so a single hard crash does not destroy the whole run.

What it does

  • Runs benchmark profiles from JSON config files
  • Captures host, package, CUDA, and GPU metadata
  • Measures wall time, CPU time, peak RSS, and GPU memory
  • Compares results against a small exact reference simulator for correctness checks
  • Writes CSV, JSON, and Markdown artifacts for later analysis
  • Generates plots from result directories

Current Benchmark Report

The repository now includes a GitHub-safe benchmark report for the current machine, based on the validated WSL2 + Qiskit Aer GPU path and the clean-start frontier methodology.

Headline frontier from the current machine:

| Slice | Stable max measured (qubits) | Highest tested (qubits) | Reading |
| --- | --- | --- | --- |
| CPU / double / ghz | 28 | 29 | 29q became unstable |
| GPU / double / ghz | 28 | 29 | dedicated 29q probe failed |
| GPU / single / ghz | 28 | 29 | 29q became unstable |
| CPU / double / random | 20 | 29 | stability dropped sharply after 20q |
| GPU / single / trotter | 20 | 29 | stability dropped sharply after 20q |

Working interpretation:

  • WSL2 is the canonical NVIDIA path for this machine.
  • The clean-start per-slice runs are the canonical frontier measurements.
  • The sustained batch campaign is still useful, but it acts more like an endurance test than a clean ceiling measurement.

Benchmark Figure Guide

These figures summarize the benchmark campaign that was run on this machine. This is not neural-network training; in this repository, "training run" refers to the long benchmark execution campaign that repeatedly simulated quantum circuits until stable limits, timing behavior, memory use, and failure boundaries were observed.

Unless a caption says otherwise, time values are measured in seconds. Memory plots use either MiB/MB or GiB as stated in the axis label. Qubit counts are logical simulated qubits, not physical hardware qubits.

Machine Frontier Overview

Current machine frontier overview

This chart shows the stable simulation frontier by benchmark slice. The X axis lists each slice as device / precision / circuit family, such as CPU/double ghz or GPU/single trotter. The Y axis is stable max qubits, meaning the largest qubit count that completed reliably for that slice. Higher bars mean the local machine could simulate larger statevectors for that combination.

Current machine hardware overview

This figure combines the public-safe machine specification with resource envelopes. The memory panel uses GiB on the X axis and compares system RAM, safe RAM budget, GPU VRAM, and safe VRAM budget. The capability panel uses benchmark target type on the X axis and recommended maximum qubits on the Y axis. It explains why the later benchmark frontiers cluster around the high-20-qubit range on this workstation.

cuQuantum Campaign Overview

cuQuantum rows by profile

This stacked bar chart shows how many non-warmup benchmark rows each cuQuantum campaign produced. The X axis is the campaign profile, such as speed-sweep, exact-frontier, observable-frontier, ideal-depth, and noisy-depth. The Y axis is row count. Green segments are successful rows; red segments are failed, pruned, or hardware-boundary rows. In frontier-style tests, the red portion is not simply noise: it marks where the machine or WSL/GPU stack stopped completing the requested workload.

cuQuantum success rate by campaign

This chart converts the same campaigns into non-warmup success percentage. The X axis is the campaign profile. The Y axis is success rate in percent, from 0 to 100. The speed-sweep campaign reached 100%, while frontier and depth campaigns intentionally pushed into failure zones, so low percentages there mean the test successfully mapped boundaries rather than only collecting easy successes.
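
The percentage described above can be reproduced from result rows with a few lines. This is a minimal sketch, not the harness's reporting code: the row keys `is_warmup` and `status`, and the status values, are hypothetical stand-ins for whatever the real results.csv uses.

```python
def success_rate_percent(rows):
    """Percentage of non-warmup rows that completed successfully."""
    # Warmup rows are excluded before counting, as in the chart.
    measured = [r for r in rows if not r["is_warmup"]]
    if not measured:
        return 0.0
    ok = sum(1 for r in measured if r["status"] == "ok")
    return 100.0 * ok / len(measured)


# Hypothetical rows: one warmup, two successes, one failure, one pruned case.
rows = [
    {"is_warmup": True, "status": "ok"},
    {"is_warmup": False, "status": "ok"},
    {"is_warmup": False, "status": "failed"},
    {"is_warmup": False, "status": "pruned"},
    {"is_warmup": False, "status": "ok"},
]
print(success_rate_percent(rows))  # 50.0
```

Failed and pruned rows both count against the rate here, which matches the reading that a low percentage in a frontier campaign reflects where the boundary sits rather than a broken run.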

cuQuantum speed sweep 28q simulation time

This chart compares the median pure simulation time at 28q. The X axis is the circuit family: ansatz, ghz, random, and trotter. The Y axis is median simulate_s, measured in seconds. Each grouped bar is a simulator variant: CPU statevector, GPU thrust, or GPU cuStateVec. This isolates simulator execution time from surrounding runner overhead and shows that GPU acceleration is workload-dependent rather than automatically faster for every circuit family.

cuQuantum noisy stable frontier

This chart shows the stable frontier for noisy sampled simulations. The X axis combines simulator variant and circuit family, such as cpu_statevector / ghz or gpu_thrust / trotter. The Y axis is stable max qubits. The main reading is that noisy sampled circuits are harder than ideal statevector speed sweeps: GHZ remained stable through 20q, while ansatz and trotter stabilized lower, around 16q, for the CPU and GPU thrust paths.

cuQuantum noisy TVD by depth

This line chart tracks noise-induced distribution drift for 12q noisy runs. The X axis is circuit depth, meaning the number of repeated circuit layers configured by the profile. The Y axis is median TVD, or total variation distance, which is unitless. 0 means the noisy sampled distribution matched the ideal/reference distribution; larger values mean the measured probability distribution moved farther away. Each line is a simulator variant plus circuit family pair.
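
For readers who want the metric spelled out, TVD between two discrete distributions is half the sum of absolute probability differences. The sketch below uses the standard definition; the function names and the sample counts are illustrative and are not taken from the repository's probability.py.

```python
def counts_to_probs(counts):
    """Normalize a bitstring -> shot-count mapping into probabilities."""
    total = sum(counts.values())
    return {bits: n / total for bits, n in counts.items()}


def total_variation_distance(p, q):
    """TVD = 0.5 * sum over outcomes of |p(x) - q(x)|; 0 means identical."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in outcomes)


# Ideal 2-qubit GHZ distribution versus made-up noisy counts (4096 shots).
ideal = {"00": 0.5, "11": 0.5}
noisy = counts_to_probs({"00": 1900, "11": 1996, "01": 100, "10": 100})

tvd = total_variation_distance(ideal, noisy)
print(f"TVD = {tvd:.4f}")
```

Outcomes missing from one distribution are treated as probability 0, so leakage into bitstrings the ideal circuit never produces raises the distance directly.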

Clean GHZ Frontier Scaling

CPU double GHZ time vs qubits

This plot shows CPU double-precision GHZ timing as qubit count increases. The X axis is qubits. The Y axis is wall-clock time in seconds. Because statevector simulation scales exponentially with qubits, the curve is expected to rise sharply near the practical frontier.
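
The exponential growth is easiest to see in the raw statevector size: n qubits need 2^n complex amplitudes, at 16 bytes each in double precision (complex128) and 8 bytes each in single precision (complex64). The sketch below is plain arithmetic under that assumption and counts only the statevector itself, not backend workspace, Python overhead, or driver allocations.

```python
# Bytes per complex amplitude for each precision.
BYTES_PER_AMPLITUDE = {"double": 16, "single": 8}


def statevector_gib(n_qubits, precision="double"):
    """Size of a full statevector in GiB for the given qubit count."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE[precision] / 2 ** 30


for n in (26, 28, 29, 30):
    print(f"{n}q  double={statevector_gib(n, 'double'):5.1f} GiB  "
          f"single={statevector_gib(n, 'single'):5.1f} GiB")
```

Each added qubit doubles the footprint: 28q in double precision is 4 GiB and 29q is 8 GiB, so one extra qubit near the frontier can consume most of the remaining memory budget.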

GPU double GHZ time vs qubits

This plot shows GPU double-precision GHZ timing. The X axis is qubits and the Y axis is wall-clock time in seconds. It should be read together with the memory plots below: a point can be fast enough at one qubit count but become unstable once memory pressure reaches the WSL/GPU envelope.

GPU double GHZ RAM vs qubits

This chart tracks host-side RAM pressure for the GPU double GHZ run. The X axis is qubits. The Y axis is peak RSS in MB, meaning the maximum resident memory used by the process on the host side. This is not VRAM; it is system memory used by Python, Qiskit Aer, orchestration, and supporting allocations.

GPU double GHZ VRAM vs qubits

This chart tracks GPU memory pressure for the same GPU double GHZ slice. The X axis is qubits. The Y axis is peak GPU memory in MB. This is the most direct view of how much VRAM was consumed while simulating larger statevectors on the RTX A2000 12GB.

GPU single GHZ time vs qubits

This plot shows GPU single-precision GHZ timing. The X axis is qubits and the Y axis is wall-clock time in seconds. Single precision uses less memory per amplitude than double precision, so it can sometimes improve the frontier, but the final stable limit still depends on backend behavior, WSL stability, and circuit structure.

Development And WSL Comparison Snapshots

Windows dev time vs qubits

This development snapshot shows wall-clock time scaling on the native Windows development path. The X axis is qubits and the Y axis is median wall time in seconds. It is useful as a smoke-test view of CPU-side execution and harness overhead, but it is not the canonical NVIDIA GPU path for this machine.

WSL Qiskit time vs qubits

This companion plot shows wall-clock time scaling on the WSL2 Qiskit path. The X axis is qubits and the Y axis is median wall time in seconds. It helps compare the Linux/WSL execution route against the native Windows development route.

WSL Qiskit GPU CPU speedup

This chart shows GPU/CPU speedup using median wall time. The X axis is qubits. The Y axis is a unitless speedup ratio computed as CPU median wall_s / GPU median wall_s. A value above 1.0 means the GPU path was faster; a value below 1.0 means the CPU path was faster for that matched circuit point. The horizontal 1.0 line is the break-even point.
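
As a rough illustration of that ratio, the sketch below groups rows by device and qubit count, takes the median wall_s of each group, and divides CPU by GPU. The `device` and `qubits` keys and the sample timings are assumptions made for the example; only `wall_s` is a column name taken from this README.

```python
from collections import defaultdict
from statistics import median


def median_wall_by_qubits(rows, device):
    """Median wall_s per qubit count for one device."""
    grouped = defaultdict(list)
    for r in rows:
        if r["device"] == device:
            grouped[r["qubits"]].append(r["wall_s"])
    return {q: median(times) for q, times in grouped.items()}


def gpu_cpu_speedup(rows):
    """CPU median wall_s / GPU median wall_s for qubit counts present on both."""
    cpu = median_wall_by_qubits(rows, "cpu")
    gpu = median_wall_by_qubits(rows, "gpu")
    return {q: cpu[q] / gpu[q] for q in sorted(cpu.keys() & gpu.keys())}


# Made-up timings: GPU loses at 20q (fixed overhead) and wins at 26q.
rows = [
    {"device": "cpu", "qubits": 20, "wall_s": 1.0},
    {"device": "cpu", "qubits": 20, "wall_s": 2.0},
    {"device": "cpu", "qubits": 20, "wall_s": 3.0},
    {"device": "gpu", "qubits": 20, "wall_s": 4.0},
    {"device": "cpu", "qubits": 26, "wall_s": 8.0},
    {"device": "gpu", "qubits": 26, "wall_s": 2.0},
]
print(gpu_cpu_speedup(rows))  # {20: 0.5, 26: 4.0}
```

Only qubit counts measured on both devices produce a point, which mirrors the "matched circuit point" wording above.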

Detailed hardware specification

| Component | Current machine |
| --- | --- |
| Host OS | Windows 11 Pro 10.0.26200 |
| Guest OS | WSL2 Ubuntu 24.04 on Linux 6.6.87.2-microsoft-standard-WSL2 |
| CPU | Intel Core i7-14700K |
| Physical cores / logical threads | 20 / 28 |
| System RAM | 27.4 GiB |
| Safe RAM budget used by the probe | 20.0 GiB |
| GPU | NVIDIA RTX A2000 12GB |
| VRAM | 12.0 GiB |
| Safe VRAM budget used by the probe | 9.1 GiB |
| Driver / CUDA | 581.42 / 13.0 |
| Python / Qiskit / Aer | 3.12.3 / 1.4.5 / 0.15.1 |

For the full write-up, methodology notes, and the structured public-safe summaries, see docs/reports/current-machine-frontier.md and the JSON summaries under docs/data/.

Why WSL2 Instead Of Native Windows GPU

For the current machine and tested stack, WSL2 was the only path that produced a real NVIDIA-backed run with Qiskit Aer GPU.

In the measured setup:

  • native Windows CPU runs worked well for Qiskit Aer, Qulacs, and PennyLane;
  • native Windows GPU runs failed across the tested backends;
  • WSL2 with Ubuntu and qiskit==1.4.5 plus qiskit-aer-gpu successfully executed GPU simulations on the RTX A2000 12GB.

Practical conclusion:

  • use Windows native for fast development and CPU-side validation;
  • use WSL2 or Ubuntu native when the actual goal is evaluating NVIDIA GPU behavior.

How To Reproduce The Current Results

Windows dev run

py -3.13 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit qiskit-aer qulacs pennylane pennylane-lightning
python -m quantum_bench env-report --output artifacts/env-report-win-venv.json
python -m quantum_bench capability-probe --output artifacts/capabilities-win-venv.json
python -m quantum_bench run --profile profiles/dev.json --capabilities artifacts/capabilities-win-venv.json
python -m quantum_bench plot --input-dir results/dev/<run-dir> --output-dir plots/dev-<run-dir>

WSL2 GPU run

python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml
python -m pip install qiskit==1.4.5 qiskit-aer-gpu
python -m quantum_bench env-report --output artifacts/env-report-wsl-gpu.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-gpu.json
python -m quantum_bench run --profile profiles/dev-wsl-gpu.json --capabilities artifacts/capabilities-wsl-gpu.json
python -m quantum_bench report --input-dir results/dev-wsl-gpu/<run-dir>
python -m quantum_bench plot --input-dir results/dev-wsl-gpu/<run-dir> --output-dir plots/dev-wsl-gpu-<run-dir>

WSL2 frontier run

Use this when the goal is to push the current machine close to its practical statevector frontier on the validated NVIDIA path.

python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit-aer-gpu
python -m quantum_bench env-report --output artifacts/env-report-wsl-frontier.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-frontier.json
python -m quantum_bench run --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json
python -m quantum_bench report --input-dir results/frontier-wsl-gpu/<run-dir>
python -m quantum_bench plot --input-dir results/frontier-wsl-gpu/<run-dir> --output-dir plots/frontier-wsl-gpu-<run-dir>

WSL2 clean-slice frontier run

Use this when you want the most reliable frontier data on the current machine. It runs each (device, precision, family) slice independently, reuses a cached WSL environment report, and writes a campaign-level summary in addition to the per-run artifacts.

python -m quantum_bench env-report --output artifacts/env-report-wsl-frontier.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-frontier.json
.\scripts\run_wsl_frontier_precision.ps1 -ResultsRoot results/frontier-wsl-precision-campaign -PlotsRoot plots/frontier-wsl-precision-campaign -Resume

For a single clean confirmation slice, use -SelectedSlice, for example:

.\scripts\run_wsl_frontier_precision.ps1 -SelectedSlice gpu-single-ghz -ResultsRoot results/frontier-wsl-precision-gpu-single-ghz-clean -PlotsRoot plots/frontier-wsl-precision-gpu-single-ghz-clean -Resume

If the unpinned qiskit-aer-gpu install used in the WSL2 frontier run above resolves to an incompatible qiskit version in your WSL environment, fall back to:

python -m pip install qiskit==1.4.5 qiskit-aer-gpu

Optional secondary extension for Qulacs after the main Qiskit Aer frontier run:

python -m pip install qulacs-gpu
python -m quantum_bench run --profile profiles/frontier-wsl-qulacs.json --capabilities artifacts/capabilities-wsl-frontier.json
python -m quantum_bench report --input-dir results/frontier-wsl-qulacs/<run-dir>
python -m quantum_bench plot --input-dir results/frontier-wsl-qulacs/<run-dir> --output-dir plots/frontier-wsl-qulacs-<run-dir>

Committed benchmark artifacts

What should stay public vs local

Safe to commit:

  • source code, profiles, scripts, and curated docs under docs/
  • public-safe JSON summaries under docs/data/
  • curated plots under docs/assets/

Keep local only:

  • raw timestamped runs under results/
  • raw generated plots under plots/
  • local machine snapshots under artifacts/
  • local environments and temporary directories such as .venv/, .venv-wsl/, and .campaign-temp/
  • any file that exposes usernames, full local paths, temporary directories, hostnames, or raw environment-variable dumps

The raw timestamped machine outputs that generated the committed summaries remain local under results/, plots/, and artifacts/. They are intentionally kept out of Git so the repository stays readable and commit-safe.

Current MVP scope

  • Libraries:
    • Qiskit Aer
    • Qulacs
    • PennyLane Lightning
  • Circuit families:
    • ghz
    • qft
    • random
    • ansatz
    • trotter
  • Commands:
    • run
    • run-wsl
    • plot
    • report
    • compare
    • env-report
    • capability-probe

Out of scope for this version:

  • qsim/Cirq
  • ProjectQ
  • real-provider calibrated noise campaigns
  • W, HHL, and SupermarQ

Project layout

quantum_bench/
  adapters/          backend-specific execution
  capability.py      RAM/VRAM estimation
  cli.py             command entrypoints
  config.py          profile expansion
  env_report.py      machine metadata
  plotting.py        plot generation
  probability.py     probability normalization and divergence metrics
  artifacts.py       probability sidecars, accumulation summaries, run plots
  reporting.py       JSON/Markdown analysis summaries
  recipes.py         canonical circuit recipes
  reference.py       small exact simulator for correctness checks
  runner.py          isolated case execution and CSV/JSON writing

docs/
  assets/            committed plots used in the public report
  data/              committed machine-summary JSON
  reports/           GitHub-safe benchmark reports

profiles/
  dev.json
  full.json
  dev-wsl-gpu.json
  frontier-wsl-gpu.json
  frontier-wsl-qulacs.json
  cuquantum-speed-sweep-wsl.json
  cuquantum-exact-frontier-wsl.json
  cuquantum-observable-frontier-wsl.json
  cuquantum-ideal-depth-sweep-wsl.json
  cuquantum-noisy-depth-sweep-wsl.json
  cuquantum-exact-frontier-appliance.json
  cuquantum-observable-frontier-appliance.json

scripts/
  run_wsl_frontier_precision.ps1

Profiles

The repository ships with the original development/frontier profiles plus canonical cuQuantum campaigns:

  • profiles/dev.json: Short development profile for Windows CPU and partial GPU-path validation.
  • profiles/full.json: Larger campaign driven by capability-probe, intended for the stronger target workstation.
  • profiles/dev-wsl-gpu.json: WSL2/Linux-oriented development profile focused on real NVIDIA GPU runs with Qiskit Aer.
  • profiles/frontier-wsl-gpu.json: WSL2/Linux frontier profile focused on the heaviest validated Qiskit Aer CPU/GPU statevector cases on the current class of machine.
  • profiles/frontier-wsl-qulacs.json: Shorter WSL2/Linux extension profile for Qulacs CPU/GPU after the main frontier run.
  • profiles/cuquantum-speed-sweep-wsl.json: WSL2/Linux comparison profile for cpu_statevector, gpu_thrust, and gpu_custatevec with real persistent_group execution.
  • profiles/cuquantum-exact-frontier-wsl.json: WSL2/Linux exact-frontier profile for cpu_statevector, gpu_thrust, and gpu_custatevec in double and single.
  • profiles/cuquantum-observable-frontier-wsl.json: WSL2/Linux observable-frontier profile comparing full statevector baselines against gpu_tensornetwork for marginal probabilities.
  • profiles/cuquantum-ideal-depth-sweep-wsl.json: Ideal numeric depth sweep for ansatz, random, and trotter, with exact references up to 12 qubits and marginal outputs above that.
  • profiles/cuquantum-noisy-depth-sweep-wsl.json: Synthetic probabilistic noise sweep using synthetic_canonical_v1, counts, and 4096 shots by default.
  • profiles/cuquantum-exact-frontier-appliance.json: Docker/appliance exact-frontier profile for appliance_cusvaer; missing Docker or NVIDIA Container Toolkit produces a clear diagnostic row.
  • profiles/cuquantum-observable-frontier-appliance.json: Docker/appliance observable-frontier profile for appliance_tensornetwork with the same diagnostic policy.

The frontier profiles also enable frontier_stop_on_failure, so once a qubit level fails for a given (library, backend, device, precision, family) slice, the remaining equal-or-higher cases in that slice are pruned instead of repeatedly hammering the same unstable boundary.
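
The pruning behavior can be pictured with a small loop. This is a simplified sketch of the idea, not the runner's implementation: `run_slice`, `run_case`, and the status strings are hypothetical names, and the sketch assumes a single case per qubit level within the slice.

```python
def run_slice(qubit_levels, run_case, stop_on_failure=True):
    """Run one slice in ascending qubit order, pruning after the first failure."""
    results = []
    failed_at = None
    for n in sorted(qubit_levels):
        if stop_on_failure and failed_at is not None:
            # A lower level already failed: record the case without executing it.
            results.append((n, "pruned"))
            continue
        ok = run_case(n)
        results.append((n, "ok" if ok else "failed"))
        if not ok and failed_at is None:
            failed_at = n
    return results


# Stand-in case runner: pretend anything above 28 qubits fails.
print(run_slice([26, 27, 28, 29, 30], lambda n: n <= 28))
# [(26, 'ok'), (27, 'ok'), (28, 'ok'), (29, 'failed'), (30, 'pruned')]
```

Recording pruned cases as rows, instead of silently skipping them, keeps the result table complete and makes the boundary visible in the success-rate charts.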

Installation

Windows development environment

Use this for CPU runs and basic pipeline validation.

py -3.13 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit qiskit-aer qulacs pennylane pennylane-lightning

Run directly from the repository root:

.venv\Scripts\python.exe -m quantum_bench --help

WSL2 / Ubuntu GPU environment

Use this for the NVIDIA path. On the current machine, this is the only route that successfully executed real GPU simulations.

python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml
python -m pip install qiskit==1.4.5 qiskit-aer-gpu

Notes:

  • Qiskit Aer GPU worked in WSL2 after pinning qiskit==1.4.5.
  • Native Windows GPU execution did not work for the tested setup.
  • qulacs-gpu was not installed in WSL2 during the current runs because it requires a local build toolchain.
  • For the frontier workflow, try pip install qiskit-aer-gpu first and keep the pinned qiskit==1.4.5 fallback if the resolved combination does not execute GPU cases correctly.

Commands

1. Environment report

python -m quantum_bench env-report --output artifacts/env-report.json

Generates hardware, OS, Python, package, and CUDA metadata.

2. Capability probe

python -m quantum_bench capability-probe --output artifacts/capabilities.json

Estimates safe RAM and VRAM envelopes for statevector experiments.
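
To give a feel for what such an envelope implies, the sketch below converts a memory budget into the largest qubit count whose raw statevector would fit. It is an illustrative upper bound only and is not the probe's actual algorithm: it assumes 16 bytes per amplitude in double precision, 8 in single, and ignores all backend overhead. The budgets are the ones listed in the hardware table.

```python
def max_qubits_for_budget(budget_gib, bytes_per_amplitude=16):
    """Largest n such that 2**n amplitudes fit in the given budget."""
    amplitudes = int(budget_gib * 2 ** 30) // bytes_per_amplitude
    if amplitudes < 1:
        return 0
    # bit_length() - 1 is floor(log2(amplitudes)) without float rounding.
    return amplitudes.bit_length() - 1


print(max_qubits_for_budget(20.0))                        # safe RAM, double: 30
print(max_qubits_for_budget(9.1))                         # safe VRAM, double: 29
print(max_qubits_for_budget(9.1, bytes_per_amplitude=8))  # safe VRAM, single: 30
```

The measured stable frontier of 28q sits below these raw bounds, which is consistent with real runs needing headroom beyond the statevector itself.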

3. Run a profile

Windows development run:

.venv\Scripts\python.exe -m quantum_bench run --profile profiles/dev.json --capabilities artifacts/capabilities-win-venv.json

WSL2 GPU run:

.venv-wsl/bin/python -m quantum_bench run --profile profiles/dev-wsl-gpu.json --capabilities artifacts/capabilities-wsl-gpu.json

Larger workstation-oriented run:

python -m quantum_bench run --profile profiles/full.json --capabilities artifacts/capabilities.json

Frontier WSL2 GPU run:

python -m quantum_bench run --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json

Recommended frontier run from Windows host with WSL execution isolation:

python -m quantum_bench run-wsl --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

This mode launches one benchmark case per wsl invocation, which is much more resilient near the hardware limit because a single WSL-side crash is recorded as a failed case instead of taking down the entire campaign.
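
The isolation idea can be sketched with the standard library alone: each case runs in its own child process, and a crash becomes a result row instead of an exception in the orchestrator. This is an assumption-laden illustration, not the run-wsl implementation. `run_isolated` is a hypothetical helper, and the child commands use the local Python interpreter as a stand-in for a `wsl ...` invocation.

```python
import subprocess
import sys


def run_isolated(argv, timeout_s=600):
    """Run one case in a child process and summarize the outcome as a row."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode}


cases = [
    [sys.executable, "-c", "print('case ok')"],
    # os._exit skips cleanup, standing in for a hard crash of the child.
    [sys.executable, "-c", "import os; os._exit(3)"],
    [sys.executable, "-c", "print('still runs')"],
]
rows = [run_isolated(argv) for argv in cases]
print([r["status"] for r in rows])  # ['ok', 'failed', 'ok']
```

The third case still executes after the second one dies, which is the property that matters near the hardware limit. The cost is per-case process startup, the same fixed overhead noted under Limitations.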

To regenerate the committed public plots from the curated JSON files:

python scripts/generate_public_report_assets.py

Phase 1 cuQuantum speed sweep:

python -m quantum_bench run-wsl --profile profiles/cuquantum-speed-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Phase 1 cuQuantum exact frontier:

python -m quantum_bench run-wsl --profile profiles/cuquantum-exact-frontier-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Full cuQuantum metric campaigns:

python -m quantum_bench run-wsl --profile profiles/cuquantum-observable-frontier-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-ideal-depth-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-noisy-depth-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Appliance profiles use executor=docker_wsl:

python -m quantum_bench run-wsl --profile profiles/cuquantum-exact-frontier-appliance.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-observable-frontier-appliance.json --repo-root . --wsl-python .venv-wsl/bin/python

4. Generate analysis report

python -m quantum_bench report --input-dir results/frontier-wsl-gpu/...

Typical outputs:

  • analysis-summary.json
  • analysis-report.md

5. Compare multiple result directories

python -m quantum_bench compare --input-dir results/cuquantum-speed-sweep-wsl/<run-a> --input-dir results/cuquantum-speed-sweep-wsl/<run-b> --output-dir results/cuquantum-compare

Typical outputs:

  • comparison-summary.json
  • comparison-report.md

6. Generate plots

python -m quantum_bench plot --input-dir results/dev/... --output-dir plots/dev

Typical outputs:

  • time_vs_qubits.png
  • ram_vs_qubits.png
  • vram_vs_qubits.png
  • gpu_cpu_speedup.png
  • fidelity_vs_qubits.png
  • summary.json

Result files

Each run creates a timestamped directory containing:

  • env-report.json
  • capability-report.json
  • manifest.json
  • results.csv
  • results.json
  • distribution-snapshots.jsonl
  • probability-matrix.json
  • error-accumulation-summary.json
  • artifact-status.json
  • run-level plots such as tvd_vs_depth.png, jsd_vs_depth.png, time_breakdown_stacked.png, memory_vs_qubits.png, exact_frontier_by_variant.png, and observable_frontier_by_variant.png
  • analysis-summary.json after running report
  • analysis-report.md after running report
  • comparison-summary.json after running compare
  • comparison-report.md after running compare

Important CSV columns:

  • wall_s: Total wall-clock time in seconds for the case.
  • cpu_s: CPU time consumed by the process.
  • peak_rss_mb: Peak resident memory in MB.
  • gpu_peak_mem_mb: Peak GPU memory observed in MB.
  • state_fidelity_ref: Fidelity against the small exact reference simulator when the qubit count is within the reference budget.
  • prob_l1_ref, tvd_ref, hellinger_ref, jsd_ref: Probability-distribution divergence against the ideal reference when available.
  • tvd_noisy_vs_ideal, hellinger_noisy_vs_ideal, jsd_noisy_vs_ideal: Synthetic-noise degradation metrics for probabilistic runs.
  • speedup_vs_cpu, speedup_vs_gpu_thrust: Pointwise speedups matched by family, qubits, depth, precision, noise profile, and output mode.
  • backend_init_s, transpile_s, simulate_s, extract_s: Timing breakdown used to identify the dominant bottleneck.
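
One way to use the timing breakdown columns is to pick the largest stage per row and report its share of the summed stages. The sketch below assumes rows arrive as dictionaries of strings, as `csv.DictReader` would produce; `dominant_stage` is a hypothetical helper and the sample values are invented.

```python
STAGES = ("backend_init_s", "transpile_s", "simulate_s", "extract_s")


def dominant_stage(row):
    """Return (stage, share of summed stage time) for the slowest stage."""
    # Skip stages that are missing or empty, e.g. in failed rows.
    timings = {s: float(row[s]) for s in STAGES if row.get(s) not in (None, "")}
    if not timings:
        return None
    total = sum(timings.values())
    stage = max(timings, key=timings.get)
    share = timings[stage] / total if total else 0.0
    return stage, share


# Invented small-circuit GPU row where initialization dominates.
row = {"backend_init_s": "2.4", "transpile_s": "0.1",
       "simulate_s": "0.4", "extract_s": "0.1"}
stage, share = dominant_stage(row)
print(f"{stage} takes {share:.0%} of the measured stages")
```

A row like this one would suggest that the case is measuring backend startup more than simulation, which is the situation the fixed-overhead caveat in Limitations describes.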

Current results on this machine

The canonical committed summary is docs/reports/current-machine-frontier.md. It replaces the earlier ad hoc README snapshots with a clean-start frontier report, a structured JSON summary, and curated plots that are safe to keep in Git.

Local raw artifacts still exist for the full investigation, but they remain intentionally gitignored because they are timestamped, bulky, and specific to this workstation.

Known caveats

  • The GPU study is not complete yet. It currently has strong evidence for Qiskit Aer GPU in WSL2, but not yet for Qulacs GPU or PennyLane lightning.gpu.
  • Some fidelity values are suspicious, especially around QFT. That likely indicates a harness-side issue such as qubit ordering or backend mapping, not necessarily a simulator failure.
  • The older development profiles are intentionally small and still useful for smoke testing, but the canonical frontier characterization now comes from the clean-start slice campaign documented under docs/.

Limitations

  • The current benchmark harness still has noticeable fixed overhead from process startup and framework initialization, especially in GPU-oriented runs.
  • The present GPU evidence is strongest for Qiskit Aer in WSL2; it is not yet a broad statement about every simulator.
  • qulacs-gpu is not part of the measured Linux GPU results yet, because the current environment did not include the required build toolchain.
  • PennyLane lightning.gpu is not part of the measured success path yet on this machine.
  • Some correctness metrics, especially around QFT, likely still contain backend-ordering or reference-comparison issues.
  • The repository now contains a much stronger machine-specific frontier characterization, but it is still a local benchmark campaign rather than a cross-lab publication package.

Recommended next steps

  • Fix the fidelity/reference mismatch for QFT and any other backend-ordering issues
  • Reduce fixed runner overhead for GPU-focused campaigns
  • Expand the WSL2 GPU profile to larger qubit counts
  • Add a working qulacs-gpu toolchain in WSL2
  • Re-run the full campaign on the stronger target workstation

Roadmap

  • v0.2: Fix correctness issues in the reference comparison layer, especially around QFT and possible qubit-ordering mismatches.
  • v0.3: Improve GPU methodology by reducing per-case startup overhead and expanding WSL2 GPU campaigns to larger qubit counts.
  • v0.4: Bring up qulacs-gpu and, if possible, PennyLane lightning.gpu on the Linux path.
  • v0.5: Run the full profile on the stronger target workstation and publish a cleaner benchmark report.
  • v1.0: Add a stable comparison set across CPU, GPU, correctness, and resource usage with reproducible published artifacts.

Notes

  • The runner imports heavy quantum frameworks lazily, per case.
  • Failures are recorded as result rows instead of aborting the whole campaign.
  • Raw generated directories such as results/, plots/, and artifacts/ are intentionally ignored by git. The committed benchmark record now lives under docs/.

About

CPU/GPU quantum circuit benchmark harness for Qiskit Aer, cuQuantum, Qulacs and PennyLane on Windows, WSL2 and Ubuntu.
