Portable benchmark harness for quantum circuit simulation across CPU and NVIDIA GPU environments, designed for reproducible local validation on Windows/WSL2 and later migration to stronger Ubuntu workstations.
Short description
Quantum Bench standardizes how quantum circuit simulators are executed, measured, and compared across development and target machines, with emphasis on Qiskit Aer, Qulacs, PennyLane Lightning, and the practical behavior of NVIDIA-backed simulation paths.
The project is organized around three ideas:
- keep the benchmark pipeline portable across Windows, WSL2, and Ubuntu;
- separate quick validation runs from heavier hardware-focused campaigns.
- for frontier-style campaigns on WSL2, prefer a Windows-hosted orchestrator that launches one case at a time inside WSL so a single hard crash does not destroy the whole run.
- Runs benchmark profiles from JSON config files
- Captures host, package, CUDA, and GPU metadata
- Measures wall time, CPU time, peak RSS, and GPU memory
- Compares results against a small exact reference simulator for correctness checks
- Writes CSV, JSON, and Markdown artifacts for later analysis
- Generates plots from result directories
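The wall time, CPU time, and peak RSS measurements listed above can be approximated with the Python standard library. The following is a minimal sketch, not the harness's own code: the `measure` helper is hypothetical, GPU memory is left out because it needs an NVIDIA query library, and the peak RSS value is the process-lifetime high-water mark rather than a per-call figure.

```python
import sys
import time

try:
    import resource  # POSIX only; not available on native Windows
except ImportError:
    resource = None


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, metrics). Hypothetical helper."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    metrics = {
        "wall_s": time.perf_counter() - wall_start,
        "cpu_s": time.process_time() - cpu_start,
        "peak_rss_mb": None,
    }
    if resource is not None:
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
        divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
        metrics["peak_rss_mb"] = peak / divisor
    return result, metrics


result, metrics = measure(sum, range(1_000_000))
print(result, metrics)
```

On native Windows the sketch reports `peak_rss_mb` as `None`; a library such as psutil would be needed there.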
The repository now includes a GitHub-safe benchmark report for the current machine, based on the validated WSL2 + Qiskit Aer GPU path and the clean-start frontier methodology.
- Extended cuQuantum campaign: docs/reports/cuquantum-benchmark-campaign.md
- cuQuantum profile spreadsheet: docs/data/cuquantum-profile-summary.csv
- cuQuantum frontier spreadsheet: docs/data/cuquantum-frontier-summary.csv
- cuQuantum error spreadsheet: docs/data/cuquantum-error-summary.csv
- Canonical report: docs/reports/current-machine-frontier.md
- Structured summary: docs/data/current-machine-frontier.json
- Hardware specification: docs/data/current-machine-hardware.json
- Public repo policy: docs/reports/public-repo-guidelines.md
Headline frontier from the current machine:
| Slice | Stable max measured | Highest tested | Reading |
|---|---|---|---|
| CPU / double / ghz | 28 | 29 | 29q became unstable |
| GPU / double / ghz | 28 | 29 | dedicated 29q probe failed |
| GPU / single / ghz | 28 | 29 | 29q became unstable |
| CPU / double / random | 20 | 29 | stability dropped sharply after 20q |
| GPU / single / trotter | 20 | 29 | stability dropped sharply after 20q |
Working interpretation:
- WSL2 is the canonical NVIDIA path for this machine.
- The clean-start per-slice runs are the canonical frontier measurements.
- The sustained batch campaign is still useful, but it acts more like an endurance test than a clean ceiling measurement.
These figures summarize the benchmark campaign that was run on this machine. This is not neural-network training; in this repository, "training run" refers to the long benchmark execution campaign that repeatedly simulated quantum circuits until stable limits, timing behavior, memory use, and failure boundaries were observed.
Unless a caption says otherwise, time values are measured in seconds. Memory plots use either MiB/MB or GiB as stated in the axis label. Qubit counts are logical simulated qubits, not physical hardware qubits.
This chart shows the stable simulation frontier by benchmark slice. The X axis lists each slice as device / precision / circuit family, such as CPU/double ghz or GPU/single trotter. The Y axis is stable max qubits, meaning the largest qubit count that completed reliably for that slice. Higher bars mean the local machine could simulate larger statevectors for that combination.
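One possible reading of "stable max qubits" is sketched below, assuming a level counts as stable only when it and every lower tested level completed. The report's actual rule may differ; `stable_max_qubits` and the sample outcomes are illustrative, not measured data.

```python
def stable_max_qubits(outcomes):
    """outcomes maps a tested qubit count to True if the case completed.

    Returns the largest qubit count reached before the first failure,
    or None if the lowest tested level already failed.
    """
    stable = None
    for qubits in sorted(outcomes):
        if not outcomes[qubits]:
            break
        stable = qubits
    return stable


# Made-up outcomes shaped like a GHZ slice: 29q is tested but does not complete.
sample = {26: True, 27: True, 28: True, 29: False}
print(stable_max_qubits(sample))  # 28
```

Under this rule a success above a failed level does not raise the stable maximum, which matches the idea of a boundary rather than a best case.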
This figure combines the public-safe machine specification with resource envelopes. The memory panel uses GiB on the X axis and compares system RAM, safe RAM budget, GPU VRAM, and safe VRAM budget. The capability panel uses benchmark target type on the X axis and recommended maximum qubits on the Y axis. It explains why the later benchmark frontiers cluster around the high-20-qubit range on this workstation.
This stacked bar chart shows how many non-warmup benchmark rows each cuQuantum campaign produced. The X axis is the campaign profile, such as speed-sweep, exact-frontier, observable-frontier, ideal-depth, and noisy-depth. The Y axis is row count. Green segments are successful rows; red segments are failed, pruned, or hardware-boundary rows. In frontier-style tests, the red portion is not simply noise: it marks where the machine or WSL/GPU stack stopped completing the requested workload.
This chart converts the same campaigns into non-warmup success percentage. The X axis is the campaign profile. The Y axis is success rate in percent, from 0 to 100. The speed-sweep campaign reached 100%, while frontier and depth campaigns intentionally pushed into failure zones, so low percentages there mean the test successfully mapped boundaries rather than only collecting easy successes.
This chart compares the median pure simulation time at 28q. The X axis is the circuit family: ansatz, ghz, random, and trotter. The Y axis is median simulate_s, measured in seconds. Each grouped bar is a simulator variant: CPU statevector, GPU thrust, or GPU cuStateVec. This isolates simulator execution time from surrounding runner overhead and shows that GPU acceleration is workload-dependent rather than automatically faster for every circuit family.
This chart shows the stable frontier for noisy sampled simulations. The X axis combines simulator variant and circuit family, such as cpu_statevector / ghz or gpu_thrust / trotter. The Y axis is stable max qubits. The main reading is that noisy sampled circuits are harder than ideal statevector speed sweeps: GHZ remained stable through 20q, while ansatz and trotter stabilized lower, around 16q, for the CPU and GPU thrust paths.
This line chart tracks noise-induced distribution drift for 12q noisy runs. The X axis is circuit depth, meaning the number of repeated circuit layers configured by the profile. The Y axis is median TVD, or total variation distance, which is unitless. 0 means the noisy sampled distribution matched the ideal/reference distribution; larger values mean the measured probability distribution moved farther away. Each line is a simulator variant plus circuit family pair.
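The TVD described above is half the sum of absolute differences between two probability distributions. A minimal sketch follows; the function names and the sample counts are illustrative and do not come from the repository.

```python
def normalize(counts):
    """Turn raw shot counts into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return {outcome: n / total for outcome, n in counts.items()}


def total_variation_distance(p, q):
    """TVD = 0.5 * sum over all outcomes of |p(x) - q(x)|."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in outcomes)


# Ideal 3-qubit GHZ distribution versus made-up noisy counts (4096 shots).
ideal = {"000": 0.5, "111": 0.5}
noisy = normalize({"000": 1900, "111": 1996, "001": 120, "110": 80})
print(total_variation_distance(ideal, noisy))  # ≈ 0.0488
```

Outcomes missing from one distribution are treated as probability 0, so leakage into unexpected bitstrings raises the distance.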
This plot shows CPU double-precision GHZ timing as qubit count increases. The X axis is qubits. The Y axis is wall-clock time in seconds. Because statevector simulation scales exponentially with qubits, the curve is expected to rise sharply near the practical frontier.
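The exponential scaling can be made concrete with a back-of-the-envelope estimate: a full statevector holds 2^n complex amplitudes, at 16 bytes each in double precision (complex128) and 8 bytes in single precision (complex64). The sketch below counts only the raw amplitude array. It is not the logic of `capability-probe`, and the helper names are hypothetical.

```python
BYTES_PER_AMPLITUDE = {"single": 8, "double": 16}  # complex64 / complex128
GIB = 2**30


def statevector_gib(qubits, precision="double"):
    """Raw size of a full statevector in GiB, without any overhead."""
    return (2**qubits) * BYTES_PER_AMPLITUDE[precision] / GIB


def max_qubits_for_budget(budget_gib, precision="double"):
    """Largest n whose raw statevector fits in the budget (upper bound only)."""
    per_amplitude = BYTES_PER_AMPLITUDE[precision]
    budget_bytes = budget_gib * GIB
    if per_amplitude > budget_bytes:
        raise ValueError("budget is too small for even a single amplitude")
    qubits = 0
    while (2 ** (qubits + 1)) * per_amplitude <= budget_bytes:
        qubits += 1
    return qubits


for n in (28, 29, 30):
    print(n, statevector_gib(n, "double"), statevector_gib(n, "single"))

# Raw upper bound for the 9.1 GiB safe VRAM budget listed in the hardware table.
print(max_qubits_for_budget(9.1, "double"))
```

Simulator workspace, framework allocations, and WSL behavior all sit on top of this raw figure, so a measured stable frontier can land below the bound.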
This plot shows GPU double-precision GHZ timing. The X axis is qubits and the Y axis is wall-clock time in seconds. It should be read together with the memory plots below: a point can be fast enough at one qubit count but become unstable once memory pressure reaches the WSL/GPU envelope.
This chart tracks host-side RAM pressure for the GPU double GHZ run. The X axis is qubits. The Y axis is peak RSS in MB, meaning the maximum resident memory used by the process on the host side. This is not VRAM; it is system memory used by Python, Qiskit Aer, orchestration, and supporting allocations.
This chart tracks GPU memory pressure for the same GPU double GHZ slice. The X axis is qubits. The Y axis is peak GPU memory in MB. This is the most direct view of how much VRAM was consumed while simulating larger statevectors on the RTX A2000 12GB.
This plot shows GPU single-precision GHZ timing. The X axis is qubits and the Y axis is wall-clock time in seconds. Single precision uses less memory per amplitude than double precision, so it can sometimes improve the frontier, but the final stable limit still depends on backend behavior, WSL stability, and circuit structure.
This development snapshot shows wall-clock time scaling on the native Windows development path. The X axis is qubits and the Y axis is median wall time in seconds. It is useful as a smoke-test view of CPU-side execution and harness overhead, but it is not the canonical NVIDIA GPU path for this machine.
This companion plot shows wall-clock time scaling on the WSL2 Qiskit path. The X axis is qubits and the Y axis is median wall time in seconds. It helps compare the Linux/WSL execution route against the native Windows development route.
This chart shows GPU/CPU speedup using median wall time. The X axis is qubits. The Y axis is a unitless speedup ratio computed as CPU median wall_s / GPU median wall_s. A value above 1.0 means the GPU path was faster; a value below 1.0 means the CPU path was faster for that matched circuit point. The horizontal 1.0 line is the break-even point.
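The ratio described above can be written in a few lines. This is a sketch with made-up timings; `gpu_cpu_speedup` is an illustrative name, not a function from the repository.

```python
from statistics import median


def gpu_cpu_speedup(cpu_wall_s, gpu_wall_s):
    """Speedup = CPU median wall_s / GPU median wall_s for one matched point."""
    gpu_median = median(gpu_wall_s)
    if gpu_median <= 0:
        raise ValueError("GPU median wall time must be positive")
    return median(cpu_wall_s) / gpu_median


# Made-up repeat timings (seconds) for one matched circuit point.
cpu_runs = [2.0, 2.2, 2.1]
gpu_runs = [1.0, 1.1, 1.05]
speedup = gpu_cpu_speedup(cpu_runs, gpu_runs)
print(speedup, "GPU faster" if speedup > 1.0 else "CPU faster or equal")
```

Using the median rather than the mean keeps a single slow repeat, for example a cold start, from dominating the ratio.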
| Component | Current machine |
|---|---|
| Host OS | Windows 11 Pro 10.0.26200 |
| Guest OS | WSL2 Ubuntu 24.04 on Linux 6.6.87.2-microsoft-standard-WSL2 |
| CPU | Intel Core i7-14700K |
| Physical cores / logical threads | 20 / 28 |
| System RAM | 27.4 GiB |
| Safe RAM budget used by the probe | 20.0 GiB |
| GPU | NVIDIA RTX A2000 12GB |
| VRAM | 12.0 GiB |
| Safe VRAM budget used by the probe | 9.1 GiB |
| Driver / CUDA | 581.42 / 13.0 |
| Python / Qiskit / Aer | 3.12.3 / 1.4.5 / 0.15.1 |
For the full write-up, methodology notes, and the structured public-safe summaries, see:
- docs/reports/current-machine-frontier.md
- docs/data/current-machine-frontier.json
- docs/data/current-machine-hardware.json
For the current machine and tested stack, WSL2 was the only path that produced a real NVIDIA-backed run with Qiskit Aer GPU.
In the measured setup:
- native Windows CPU runs worked well for Qiskit Aer, Qulacs, and PennyLane;
- native Windows GPU runs failed across the tested backends;
- WSL2 with Ubuntu and qiskit==1.4.5 plus qiskit-aer-gpu successfully executed GPU simulations on the RTX A2000 12GB.
Practical conclusion:
- use Windows native for fast development and CPU-side validation;
- use WSL2 or Ubuntu native when the actual goal is evaluating NVIDIA GPU behavior.
py -3.13 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit qiskit-aer qulacs pennylane pennylane-lightning
python -m quantum_bench env-report --output artifacts/env-report-win-venv.json
python -m quantum_bench capability-probe --output artifacts/capabilities-win-venv.json
python -m quantum_bench run --profile profiles/dev.json --capabilities artifacts/capabilities-win-venv.json
python -m quantum_bench plot --input-dir results/dev/<run-dir> --output-dir plots/dev-<run-dir>

python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml
python -m pip install qiskit==1.4.5 qiskit-aer-gpu
python -m quantum_bench env-report --output artifacts/env-report-wsl-gpu.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-gpu.json
python -m quantum_bench run --profile profiles/dev-wsl-gpu.json --capabilities artifacts/capabilities-wsl-gpu.json
python -m quantum_bench report --input-dir results/dev-wsl-gpu/<run-dir>
python -m quantum_bench plot --input-dir results/dev-wsl-gpu/<run-dir> --output-dir plots/dev-wsl-gpu-<run-dir>

Use this when the goal is to push the current machine close to its practical statevector frontier on the validated NVIDIA path.
python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit-aer-gpu
python -m quantum_bench env-report --output artifacts/env-report-wsl-frontier.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-frontier.json
python -m quantum_bench run --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json
python -m quantum_bench report --input-dir results/frontier-wsl-gpu/<run-dir>
python -m quantum_bench plot --input-dir results/frontier-wsl-gpu/<run-dir> --output-dir plots/frontier-wsl-gpu-<run-dir>

Use this when you want the most reliable frontier data on the current machine. It runs each (device, precision, family) slice independently, reuses a cached WSL environment report, and writes a campaign-level summary in addition to the per-run artifacts.
python -m quantum_bench env-report --output artifacts/env-report-wsl-frontier.json
python -m quantum_bench capability-probe --output artifacts/capabilities-wsl-frontier.json
.\scripts\run_wsl_frontier_precision.ps1 -ResultsRoot results/frontier-wsl-precision-campaign -PlotsRoot plots/frontier-wsl-precision-campaign -Resume

For a single clean confirmation slice, use -SelectedSlice, for example:
.\scripts\run_wsl_frontier_precision.ps1 -SelectedSlice gpu-single-ghz -ResultsRoot results/frontier-wsl-precision-gpu-single-ghz-clean -PlotsRoot plots/frontier-wsl-precision-gpu-single-ghz-clean -Resume

If the first qiskit-aer-gpu install resolves to an incompatible qiskit combination in your WSL environment, fall back to:
python -m pip install qiskit==1.4.5 qiskit-aer-gpu

Optional secondary extension for Qulacs after the main Qiskit Aer frontier run:
python -m pip install qulacs-gpu
python -m quantum_bench run --profile profiles/frontier-wsl-qulacs.json --capabilities artifacts/capabilities-wsl-frontier.json
python -m quantum_bench report --input-dir results/frontier-wsl-qulacs/<run-dir>
python -m quantum_bench plot --input-dir results/frontier-wsl-qulacs/<run-dir> --output-dir plots/frontier-wsl-qulacs-<run-dir>

- Current machine frontier report
- Current machine frontier JSON
- Current machine hardware JSON
- Public repository guidelines
- Curated benchmark figures under docs/assets/
Safe to commit:
- source code, profiles, scripts, and curated docs under docs/
- public-safe JSON summaries under docs/data/
- curated plots under docs/assets/

Keep local only:
- raw timestamped runs under results/
- raw generated plots under plots/
- local machine snapshots under artifacts/
- local environments and temporary directories such as .venv/, .venv-wsl/, and .campaign-temp/
- any file that exposes usernames, full local paths, temporary directories, hostnames, or raw environment-variable dumps
The raw timestamped machine outputs that generated those summaries remain local under results/, plots/, and artifacts/. They are intentionally kept out of Git so the repository stays readable and commit-safe.
- Libraries: Qiskit Aer, Qulacs, PennyLane Lightning
- Circuit families: ghz, qft, random, ansatz, trotter
- Commands: run, run-wsl, plot, report, compare, env-report, capability-probe

Out of scope for this version:
- qsim/Cirq
- ProjectQ
- real-provider calibrated noise campaigns
- W, HHL, and SupermarQ
quantum_bench/
adapters/ backend-specific execution
capability.py RAM/VRAM estimation
cli.py command entrypoints
config.py profile expansion
env_report.py machine metadata
plotting.py plot generation
probability.py probability normalization and divergence metrics
artifacts.py probability sidecars, accumulation summaries, run plots
reporting.py JSON/Markdown analysis summaries
recipes.py canonical circuit recipes
reference.py small exact simulator for correctness checks
runner.py isolated case execution and CSV/JSON writing
docs/
assets/ committed plots used in the public report
data/ committed machine-summary JSON
reports/ GitHub-safe benchmark reports
profiles/
dev.json
full.json
dev-wsl-gpu.json
frontier-wsl-gpu.json
frontier-wsl-qulacs.json
cuquantum-speed-sweep-wsl.json
cuquantum-exact-frontier-wsl.json
cuquantum-observable-frontier-wsl.json
cuquantum-ideal-depth-sweep-wsl.json
cuquantum-noisy-depth-sweep-wsl.json
cuquantum-exact-frontier-appliance.json
cuquantum-observable-frontier-appliance.json
scripts/
run_wsl_frontier_precision.ps1
The repository ships with the original development/frontier profiles plus canonical cuQuantum campaigns:
- profiles/dev.json: Short development profile for Windows CPU and partial GPU-path validation.
- profiles/full.json: Larger campaign driven by capability-probe, intended for the stronger target workstation.
- profiles/dev-wsl-gpu.json: WSL2/Linux-oriented development profile focused on real NVIDIA GPU runs with Qiskit Aer.
- profiles/frontier-wsl-gpu.json: WSL2/Linux frontier profile focused on the heaviest validated Qiskit Aer CPU/GPU statevector cases on the current class of machine.
- profiles/frontier-wsl-qulacs.json: Shorter WSL2/Linux extension profile for Qulacs CPU/GPU after the main frontier run.
- profiles/cuquantum-speed-sweep-wsl.json: WSL2/Linux comparison profile for cpu_statevector, gpu_thrust, and gpu_custatevec with real persistent_group execution.
- profiles/cuquantum-exact-frontier-wsl.json: WSL2/Linux exact-frontier profile for cpu_statevector, gpu_thrust, and gpu_custatevec in double and single.
- profiles/cuquantum-observable-frontier-wsl.json: WSL2/Linux observable-frontier profile comparing full statevector baselines against gpu_tensornetwork for marginal probabilities.
- profiles/cuquantum-ideal-depth-sweep-wsl.json: Ideal numeric depth sweep for ansatz, random, and trotter, with exact references up to 12 qubits and marginal outputs above that.
- profiles/cuquantum-noisy-depth-sweep-wsl.json: Synthetic probabilistic noise sweep using synthetic_canonical_v1, counts, and 4096 shots by default.
- profiles/cuquantum-exact-frontier-appliance.json: Docker/appliance exact-frontier profile for appliance_cusvaer; missing Docker or NVIDIA Container Toolkit produces a clear diagnostic row.
- profiles/cuquantum-observable-frontier-appliance.json: Docker/appliance observable-frontier profile for appliance_tensornetwork with the same diagnostic policy.
The frontier profiles also enable frontier_stop_on_failure, so once a qubit level fails for a given (library, backend, device, precision, family) slice, the remaining equal-or-higher cases in that slice are pruned instead of repeatedly hammering the same unstable boundary.
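The pruning behavior described above can be sketched as a loop over ascending qubit levels. This is an illustration under assumptions, not the runner's implementation: `run_slice`, the row fields, and the fake failing case are all hypothetical.

```python
def run_slice(qubit_levels, run_case, stop_on_failure=True):
    """Run one slice in ascending qubit order, pruning levels after a failure."""
    rows = []
    failed = False
    for qubits in sorted(qubit_levels):
        if failed and stop_on_failure:
            # Skip equal-or-higher levels instead of retrying the boundary.
            rows.append({"qubits": qubits, "status": "pruned"})
            continue
        try:
            run_case(qubits)
            rows.append({"qubits": qubits, "status": "ok"})
        except Exception as exc:
            failed = True
            rows.append({"qubits": qubits, "status": "failed", "error": str(exc)})
    return rows


def fake_case(qubits):
    """Stand-in for a simulation that runs out of memory at 29 qubits."""
    if qubits >= 29:
        raise MemoryError("simulated out-of-memory")


rows = run_slice([27, 28, 29, 30], fake_case)
print([(row["qubits"], row["status"]) for row in rows])
```

With `stop_on_failure=False` the same loop would attempt every level, which is closer to an endurance test than a boundary probe.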
Use this for CPU runs and basic pipeline validation.
py -3.13 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml qiskit qiskit-aer qulacs pennylane pennylane-lightning

Run directly from the repository root:
.venv\Scripts\python.exe -m quantum_bench --help

Use this for the NVIDIA path. On the current machine, this is the only route that successfully executed real GPU simulations.
python3 -m venv .venv-wsl
source .venv-wsl/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install psutil matplotlib pynvml
python -m pip install qiskit==1.4.5 qiskit-aer-gpu

Notes:
- Qiskit Aer GPU worked in WSL2 after pinning qiskit==1.4.5.
- Native Windows GPU execution did not work for the tested setup.
- qulacs-gpu was not installed in WSL2 during the current runs because it requires a local build toolchain.
- For the frontier workflow, try pip install qiskit-aer-gpu first and keep the pinned qiskit==1.4.5 fallback if the resolved combination does not execute GPU cases correctly.
python -m quantum_bench env-report --output artifacts/env-report.json
Generates hardware, OS, Python, package, and CUDA metadata.

python -m quantum_bench capability-probe --output artifacts/capabilities.json
Estimates safe RAM and VRAM envelopes for statevector experiments.
Windows development run:
.venv\Scripts\python.exe -m quantum_bench run --profile profiles/dev.json --capabilities artifacts/capabilities-win-venv.json

WSL2 GPU run:
.venv-wsl/bin/python -m quantum_bench run --profile profiles/dev-wsl-gpu.json --capabilities artifacts/capabilities-wsl-gpu.json

Larger workstation-oriented run:
python -m quantum_bench run --profile profiles/full.json --capabilities artifacts/capabilities.json

Frontier WSL2 GPU run:
python -m quantum_bench run --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json

Recommended frontier run from Windows host with WSL execution isolation:
python -m quantum_bench run-wsl --profile profiles/frontier-wsl-gpu.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

This mode launches one benchmark case per wsl invocation, which is much more resilient near the hardware limit because a single WSL-side crash is recorded as a failed case instead of taking down the entire campaign.
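The isolation idea behind one case per invocation can be sketched with `subprocess`: each case runs in its own child process, and a non-zero exit becomes a failed row while the orchestrator keeps going. The sketch uses the current Python interpreter as a stand-in for the real `wsl` command line, which is not reproduced here; `run_isolated` and the row fields are hypothetical.

```python
import subprocess
import sys


def run_isolated(command, timeout_s=60):
    """Run one case in a child process and turn its fate into a result row."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return {"status": "failed", "reason": "timeout"}
    if completed.returncode != 0:
        return {"status": "failed", "reason": f"exit code {completed.returncode}"}
    return {"status": "ok", "stdout": completed.stdout.strip()}


cases = [
    [sys.executable, "-c", "print('case ok')"],
    [sys.executable, "-c", "import os; os._exit(3)"],  # simulated hard crash
    [sys.executable, "-c", "print('still running')"],
]
rows = [run_isolated(case) for case in cases]
print(rows)
```

The third case still runs after the second one dies, which is the property that matters near the hardware limit. The cost is that every case pays process startup and framework import time again.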
To regenerate the committed public plots from the curated JSON files:
python scripts/generate_public_report_assets.py

Phase 1 cuQuantum speed sweep:
python -m quantum_bench run-wsl --profile profiles/cuquantum-speed-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Phase 1 cuQuantum exact frontier:
python -m quantum_bench run-wsl --profile profiles/cuquantum-exact-frontier-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Full cuQuantum metric campaigns:
python -m quantum_bench run-wsl --profile profiles/cuquantum-observable-frontier-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-ideal-depth-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-noisy-depth-sweep-wsl.json --capabilities artifacts/capabilities-wsl-frontier.json --repo-root . --wsl-python .venv-wsl/bin/python

Appliance profiles use executor=docker_wsl:
python -m quantum_bench run-wsl --profile profiles/cuquantum-exact-frontier-appliance.json --repo-root . --wsl-python .venv-wsl/bin/python
python -m quantum_bench run-wsl --profile profiles/cuquantum-observable-frontier-appliance.json --repo-root . --wsl-python .venv-wsl/bin/python

python -m quantum_bench report --input-dir results/frontier-wsl-gpu/...
Typical outputs:
- analysis-summary.json
- analysis-report.md

python -m quantum_bench compare --input-dir results/cuquantum-speed-sweep-wsl/<run-a> --input-dir results/cuquantum-speed-sweep-wsl/<run-b> --output-dir results/cuquantum-compare
Typical outputs:
- comparison-summary.json
- comparison-report.md

python -m quantum_bench plot --input-dir results/dev/... --output-dir plots/dev
Typical outputs:
- time_vs_qubits.png
- ram_vs_qubits.png
- vram_vs_qubits.png
- gpu_cpu_speedup.png
- fidelity_vs_qubits.png
- summary.json
Each run creates a timestamped directory containing:
- env-report.json
- capability-report.json
- manifest.json
- results.csv
- results.json
- distribution-snapshots.jsonl
- probability-matrix.json
- error-accumulation-summary.json
- artifact-status.json
- run-level plots such as tvd_vs_depth.png, jsd_vs_depth.png, time_breakdown_stacked.png, memory_vs_qubits.png, exact_frontier_by_variant.png, and observable_frontier_by_variant.png
- analysis-summary.json after running report
- analysis-report.md after running report
- comparison-summary.json after running compare
- comparison-report.md after running compare
Important CSV columns:
- wall_s: Total wall-clock time in seconds for the case.
- cpu_s: CPU time in seconds consumed by the process.
- peak_rss_mb: Peak resident memory in MB.
- gpu_peak_mem_mb: Peak GPU memory observed in MB.
- state_fidelity_ref: Fidelity against the small exact reference simulator when the qubit count is within the reference budget.
- prob_l1_ref, tvd_ref, hellinger_ref, jsd_ref: Probability-distribution divergence against the ideal reference when available.
- tvd_noisy_vs_ideal, hellinger_noisy_vs_ideal, jsd_noisy_vs_ideal: Synthetic-noise degradation metrics for probabilistic runs.
- speedup_vs_cpu, speedup_vs_gpu_thrust: Pointwise speedups matched by family, qubits, depth, precision, noise profile, and output mode.
- backend_init_s, transpile_s, simulate_s, extract_s: Timing breakdown used to identify the dominant bottleneck.
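As a sketch of how the timing-breakdown columns might be used, the snippet below reads CSV rows and reports which phase dominates each case. The column names come from the list above; the two sample rows are invented, and `dominant_phase` is a hypothetical helper.

```python
import csv
import io

TIMING_COLUMNS = ("backend_init_s", "transpile_s", "simulate_s", "extract_s")


def dominant_phase(row):
    """Return the timing column with the largest value, or None if all are empty."""
    timings = {
        column: float(row[column])
        for column in TIMING_COLUMNS
        if row.get(column) not in (None, "")
    }
    if not timings:
        return None
    return max(timings, key=timings.get)


# Invented rows in the shape of results.csv; real files carry more columns.
sample_csv = io.StringIO(
    "qubits,wall_s,backend_init_s,transpile_s,simulate_s,extract_s\n"
    "20,1.90,1.20,0.30,0.25,0.05\n"
    "28,9.80,1.25,0.40,7.90,0.10\n"
)
phases = {row["qubits"]: dominant_phase(row) for row in csv.DictReader(sample_csv)}
print(phases)
```

In this invented data the small case is dominated by backend initialization and the large case by simulation, the kind of shift that would show up as fixed runner overhead on small circuits.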
The canonical committed summary is docs/reports/current-machine-frontier.md. It replaces the earlier ad hoc README snapshots with a clean-start frontier report, a structured JSON summary, and curated plots that are safe to keep in Git.
Local raw artifacts still exist for the full investigation, but they remain intentionally gitignored because they are timestamped, bulky, and specific to this workstation.
- The GPU study is not complete yet. It currently has strong evidence for Qiskit Aer GPU in WSL2, but not yet for Qulacs GPU or PennyLane lightning.gpu.
- Some fidelity values are suspicious, especially around QFT. That likely indicates a harness-side issue such as qubit ordering or backend mapping, not necessarily a simulator failure.
- The older development profiles are intentionally small and still useful for smoke testing, but the canonical frontier characterization now comes from the clean-start slice campaign documented under docs/.
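To illustrate how a qubit-ordering mismatch alone can make fidelity look wrong, the sketch below writes the same two-qubit state in two index conventions. Qiskit indexes basis states little-endian (qubit 0 is the least significant bit); whether this is the actual cause of the suspicious QFT values here is not established, and both helper functions are illustrative.

```python
def reverse_qubit_order(state, num_qubits):
    """Return the state with the bit order of every basis index reversed."""
    reordered = [0j] * len(state)
    for index, amplitude in enumerate(state):
        bits = format(index, f"0{num_qubits}b")
        reordered[int(bits[::-1], 2)] = amplitude
    return reordered


def state_fidelity(a, b):
    """Fidelity |<a|b>|^2 between two normalized pure states."""
    overlap = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(overlap) ** 2


# Only qubit 0 is excited. Little-endian puts that at index 1, big-endian at index 2.
little_endian = [0j, 1 + 0j, 0j, 0j]
big_endian = [0j, 0j, 1 + 0j, 0j]

print(state_fidelity(little_endian, big_endian))                          # 0.0
print(state_fidelity(reverse_qubit_order(little_endian, 2), big_endian))  # 1.0
```

Symmetric states such as GHZ are unchanged by bit reversal, so a mismatch of this kind would stay hidden there and surface only on asymmetric circuits like QFT.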
- The current benchmark harness still has noticeable fixed overhead from process startup and framework initialization, especially in GPU-oriented runs.
- The present GPU evidence is strongest for Qiskit Aer in WSL2; it is not yet a broad statement about every simulator.
- qulacs-gpu is not part of the measured Linux GPU results yet, because the current environment did not include the required build toolchain.
- PennyLane lightning.gpu is not part of the measured success path yet on this machine.
- Some correctness metrics, especially around QFT, likely still contain backend-ordering or reference-comparison issues.
- The repository now contains a much stronger machine-specific frontier characterization, but it is still a local benchmark campaign rather than a cross-lab publication package.
- Fix the fidelity/reference mismatch for QFT and any other backend-ordering issues
- Reduce fixed runner overhead for GPU-focused campaigns
- Expand the WSL2 GPU profile to larger qubit counts
- Add a working qulacs-gpu toolchain in WSL2
- Re-run the full campaign on the stronger target workstation
- v0.2: Fix correctness issues in the reference comparison layer, especially around QFT and possible qubit-ordering mismatches.
- v0.3: Improve GPU methodology by reducing per-case startup overhead and expanding WSL2 GPU campaigns to larger qubit counts.
- v0.4: Bring up qulacs-gpu and, if possible, PennyLane lightning.gpu on the Linux path.
- v0.5: Run the full profile on the stronger target workstation and publish a cleaner benchmark report.
- v1.0: Add a stable comparison set across CPU, GPU, correctness, and resource usage with reproducible published artifacts.
- The runner imports heavy quantum frameworks lazily, per case.
- Failures are recorded as result rows instead of aborting the whole campaign.
- Raw generated directories such as results/, plots/, and artifacts/ are intentionally ignored by git. The committed benchmark record now lives under docs/.
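The first two notes above, lazy per-case imports and failures recorded as rows, can be sketched together. This is an assumption-laden illustration rather than the runner's code: `run_case` and the row fields are hypothetical, and the standard-library `math` module stands in for a heavy quantum framework.

```python
import importlib


def run_case(case_id, module_name, work):
    """Import module_name only when the case runs, and never raise to the caller."""
    row = {"case_id": case_id, "library": module_name}
    try:
        module = importlib.import_module(module_name)  # lazy, per case
        row["result"] = work(module)
        row["status"] = "ok"
    except Exception as exc:
        # A missing package or a runtime error becomes a failed row.
        row["status"] = "failed"
        row["error"] = f"{type(exc).__name__}: {exc}"
    return row


rows = [
    run_case("case-1", "math", lambda m: m.sqrt(16.0)),
    run_case("case-2", "not_an_installed_simulator", lambda m: m.run()),
    run_case("case-3", "math", lambda m: m.factorial(5)),
]
for row in rows:
    print(row)
```

An in-process `try`/`except` like this catches Python-level errors only; a hard crash of the interpreter or driver still needs process-level isolation.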














