This document explains the local and CI benchmark flow used in GraphCompose.
The short version is:
- `scripts/run-benchmarks.ps1` is the normal local entry point
- `CurrentSpeedBenchmark` has two profiles: `smoke` and `full`
- current-speed diffs are only valid between reports from the same profile
- repeated local runs should be compared via median aggregation, not by eyeballing one lucky run
If you are changing layout, pagination, render ordering, PDF session lifetime, or benchmark tooling, read this file together with README.md, architecture.md, and CONTRIBUTING.md.
- suite: one benchmark family such as `current-speed` or `comparative`
- profile: a current-speed mode. Today that means `smoke` or `full`
- run: one timestamped JSON/CSV result written as `run-<timestamp>.json`
- aggregate: a median report built from several repeated local runs
- compatible pair: two reports that can be diffed safely. For `current-speed`, compatibility means the same profile
The default local workflow is:
```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1
```

That wrapper is intentionally opinionated. It does more than just invoke one Java main class.
The script prints numbered sections so you can map console output to the pipeline:
- `01-build-classpath`: builds the test classpath once and writes `target/benchmark.classpath`.
- `02-current-speed`: runs `CurrentSpeedBenchmark` in the selected profile.
- `03-comparative`: runs the GraphCompose canonical vs iText 5 vs JasperReports comparison.
- `04-core-engine`: runs `GraphComposeBenchmark`.
- `05-full-cv`: runs `FullCvBenchmark`.
- `06-scalability`: runs the thread-scaling throughput benchmark.
- `07-stress`: runs the concurrent stability stress test.
- `08-endurance`: optional; runs only when `-IncludeEndurance` is provided.
- `09-diff-current-speed`: diffs the newest compatible current-speed reports.
- `10-diff-comparative`: diffs the two newest comparative reports.
Each step writes a dedicated log file under target/benchmark-runs/<timestamp>/logs/, and the wrapper mirrors that log back to the console after the step finishes.
CurrentSpeedBenchmark supports two intended usage modes:
- `smoke`: bounded latency-oriented checks for pull requests and quick local spot checks. Defaults: 30 warmup + 100 measurement iterations per scenario, no throughput pass. Smoke is now sized so the JIT reaches a steady C1/C2 state and the p95 calculation has enough samples to interpolate between order statistics rather than collapsing to the maximum observed sample.
- `full`: wider warmup and measurement windows (12 warmup + 40 measurement) plus throughput coverage for local investigation and scheduled runs.
Use the same profile when comparing results. A smoke report and a full report are different experiments, not two samples of the same one.
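As a sketch, the profile defaults above can be expressed in code. The class, method, and default profile shown here are illustrative assumptions, not real GraphCompose code; the system property name matches the direct-run example later in this file.

```java
public final class ProfileSketch {
    // Warmup/measurement iteration counts per profile, per the defaults
    // documented above. Illustrative only; not the actual benchmark source.
    static int[] iterations(String profile) {
        return "smoke".equals(profile)
                ? new int[]{30, 100}   // smoke: 30 warmup + 100 measurement
                : new int[]{12, 40};   // full: 12 warmup + 40 measurement
    }

    public static void main(String[] args) {
        // Assumes "full" is the default when no profile property is set.
        String profile = System.getProperty("graphcompose.benchmark.profile", "full");
        int[] it = iterations(profile);
        System.out.println(profile + ": " + it[0] + " warmup, " + it[1] + " measurement");
    }
}
```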
- Every scenario triggers `System.gc()` and a 50 ms sleep between warmup and measurement so the first measured iteration does not pay for warmup-era garbage. Variance dropped from 10–25 % to 2–5 % between runs on a developer laptop.
- Percentiles use linear interpolation between order statistics (`rank = (n - 1) * p`). Earlier versions returned `sorted[floor]`, which made p95 == max for small sample counts.
- A "stage breakdown" table prints alongside the latency table for every template scenario (`compose` / `layout` / `render` / `total` median ms). Use it when attributing regressions to engine layout vs PDFBox serialization; PDFBox typically takes 35–68 % of the end-to-end timing on these scenarios.
- The performance gate (`-Dgraphcompose.benchmark.enforceGate=true`) now uses thresholds calibrated at ~3× the observed average, leaving room for CI machine variance while still catching regressions of 50 % or more.
- `peakHeapMb` reports the heap delta over the post-warmup baseline rather than absolute used heap. The metric is closer to per-iteration allocation pressure than to total live data.
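The percentile rule above can be sketched as follows. This is a minimal illustration of `rank = (n - 1) * p` with linear interpolation, not the actual CurrentSpeedBenchmark implementation:

```java
import java.util.Arrays;

public final class PercentileSketch {
    // Linear interpolation between order statistics: rank = (n - 1) * p.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double rank = (sorted.length - 1) * p;
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        if (lo == hi) return sorted[lo];
        double frac = rank - lo;  // distance past the lower order statistic
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] latencies = {10, 12, 14, 20, 100};
        // With 5 samples, rank = 4 * 0.95 = 3.8, so p95 interpolates
        // between 20 and 100 instead of collapsing to the maximum.
        System.out.println(percentile(latencies, 0.95)); // prints 84.0
    }
}
```

Note how the old `sorted[floor]` behavior would have returned the maximum (100) here, which is exactly the p95 == max collapse described above.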
Examples:
```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile smoke
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile full
```

For current-speed reports, the wrapper now selects the newest pair that matches the profile of the latest run.
That means:
- if the newest run is `full`, the script looks for the newest previous `full` run
- if the newest run is `smoke`, the script looks for the newest previous `smoke` run
- if there is no second run with that profile yet, the diff step is skipped instead of failing the whole benchmark run
This mirrors the rule enforced by BenchmarkDiffTool: current-speed reports with different profiles must not be diffed.
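The pairing rule can be sketched as below. The `Report` record and method names are illustrative assumptions, not the actual BenchmarkDiffTool API; the real tool reads timestamps and profiles from `run-<timestamp>.json` files.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public final class PairSelectionSketch {
    // Illustrative stand-in for one current-speed report on disk.
    record Report(long timestamp, String profile) {}

    // Newest report plus the newest earlier report with the same profile.
    // Empty result means the diff step should be skipped, not failed.
    static Optional<Report[]> newestCompatiblePair(List<Report> reports) {
        Optional<Report> newest = reports.stream()
                .max(Comparator.comparingLong(Report::timestamp));
        if (newest.isEmpty()) return Optional.empty();
        Report latest = newest.get();
        return reports.stream()
                .filter(r -> r != latest && r.profile().equals(latest.profile()))
                .max(Comparator.comparingLong(Report::timestamp))
                .map(prev -> new Report[]{prev, latest});
    }

    public static void main(String[] args) {
        List<Report> history = List.of(
                new Report(1, "smoke"), new Report(2, "smoke"),
                new Report(3, "full"),  new Report(4, "full"));
        // Latest run is full, so the pair is the two newest full reports.
        newestCompatiblePair(history).ifPresent(pair ->
                System.out.println(pair[0].timestamp() + " -> " + pair[1].timestamp()));
    }
}
```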
Comparative reports do not have the same profile split, so the wrapper simply diffs the two newest comparative runs.
When you pass -Repeat N, the wrapper reruns:
- current-speed
- comparative
After that, it writes median aggregate reports and diffs median-vs-median on later runs. This is the preferred mode for local decision-making because it reduces noise from GC, background processes, JIT warmup differences, and filesystem activity.
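The median aggregation can be sketched like this; a minimal illustration of why the outlier-resistant median is preferred over a single run, not the wrapper's actual aggregation code:

```java
import java.util.Arrays;

public final class MedianAggregateSketch {
    // Median of repeated measurements; even counts average the two middle values.
    static double median(double[] runs) {
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Five repeated runs of one scenario; the 30.1 outlier (e.g. a GC or
        // background-process hiccup) does not move the median.
        double[] p95ms = {12.1, 11.8, 12.4, 30.1, 12.0};
        System.out.println(median(p95ms)); // prints 12.1
    }
}
```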
Example:
```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile full -Repeat 5
```

A typical local progression is a quick smoke pass, then a full run, then a repeated full run for decisions:

```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile smoke
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile full
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -CurrentSpeedProfile full -Repeat 5
```

When comparing two branches, run a clean compile on both worktrees before the benchmark wrapper. This prevents stale target/classes from making one branch look faster or slower than the code that is actually checked out.
```powershell
.\mvnw.cmd -B -ntp clean test-compile
```

Other useful wrapper flags:

```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -SkipDiff
powershell -ExecutionPolicy Bypass -File .\scripts\run-benchmarks.ps1 -OpenResults
```

The wrapper writes two groups of artifacts.
- target/benchmark-runs/<timestamp>/SUMMARY.md
- target/benchmark-runs/<timestamp>/logs/*.log
These are the best place to look when one numbered step fails.
- target/benchmarks/current-speed/
- target/benchmarks/comparative/
- target/benchmarks/diffs/
- target/benchmarks/aggregates/
Typical contents:
- `run-<timestamp>.json`
- suite-specific CSV exports
- `latest.json` convenience copies
- median aggregate reports under `aggregates/...`
The PowerShell wrapper is preferred, but direct runs are still useful when debugging one suite in isolation.
Build the classpath first:
```powershell
mvn --% -B -ntp -DskipTests test-compile dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=target/benchmark.classpath
$cp = (Get-Content 'target/benchmark.classpath' -Raw).Trim()
```

Then run the suite you care about:
```powershell
java -cp "target\test-classes;target\classes;$cp" com.demcha.compose.CurrentSpeedBenchmark
java -Dgraphcompose.benchmark.profile=smoke -cp "target\test-classes;target\classes;$cp" com.demcha.compose.CurrentSpeedBenchmark
java -cp "target\test-classes;target\classes;$cp" com.demcha.compose.ComparativeBenchmark
java -cp "target\test-classes;target\classes;$cp" com.demcha.compose.BenchmarkDiffTool current-speed
java -cp "target\test-classes;target\classes;$cp" com.demcha.compose.BenchmarkDiffTool comparative
```

Use the suite shortcut when possible. `BenchmarkDiffTool current-speed` already knows how to select the newest compatible pair for the current-speed suite.
The diff step is skipped whenever there are not yet two current-speed reports with the same profile as the latest run.
Example:
- the latest run is `full`
- the historical reports contain only one `full` run and several `smoke` runs
- result: the diff is skipped because there is no compatible pair yet
Today that warning comes from Lombok on newer JDKs. If the Maven section still ends with BUILD SUCCESS, treat it as noisy stderr, not as a benchmark failure.
Local benchmark numbers are sensitive to machine conditions:
- background CPU load
- OneDrive or antivirus activity
- thermal throttling
- JVM warmup differences
- GC timing
Do not call a one-off slowdown a code regression until repeated runs show the same direction.
Prefer rerunning the relevant suite on the current checkout. For local claims, median-based repeated runs are safer than one-off results.
When changing the benchmark pipeline:
- keep README.md aligned with the supported command line
- update this file when the wrapper flow, artifact layout, or diff rules change
- keep current-speed profile semantics explicit in user-facing docs
- preserve the rule that incompatible current-speed profiles must never be diffed