From 5637d46fac856921ff50a628c5b5097afb4af68c Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Sat, 25 Oct 2025 08:11:38 +0100 Subject: [PATCH 01/12] Add coverage tests and align documentation --- README.adoc | 10 +- pom.xml | 21 ++ src/main/adoc/architecture.adoc | 5 +- src/main/adoc/benchmark-lifecycle.adoc | 2 +- src/main/adoc/decision-log.adoc | 2 +- src/main/adoc/jlbh-cookbook.adoc | 7 +- src/main/adoc/project-requirements.adoc | 68 +++--- .../adoc/results-interpretation-guide.adoc | 2 +- .../jlbh/JLBHAdditionalCoverageTest.java | 185 ++++++++++++++++ .../chronicle/jlbh/JLBHOptionsTest.java | 83 ++++++++ .../jlbh/util/JLBHResultSerializerTest.java | 199 ++++++++++++++++++ 11 files changed, 537 insertions(+), 47 deletions(-) create mode 100644 src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java create mode 100644 src/test/java/net/openhft/chronicle/jlbh/JLBHOptionsTest.java create mode 100644 src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java diff --git a/README.adoc b/README.adoc index f1300a09..cce3eb07 100644 --- a/README.adoc +++ b/README.adoc @@ -77,12 +77,12 @@ Commonly used option methods include: For a full list of configuration parameters see `src/main/java/net/openhft/chronicle/jlbh/JLBHOptions.java`. -For a visual overview of how a benchmark progresses, see the link:docs/benchmark-lifecycle.adoc[benchmark lifecycle diagram]. +For a visual overview of how a benchmark progresses, see the link:src/main/adoc/benchmark-lifecycle.adoc[benchmark lifecycle diagram]. == Additional Features === OS Jitter Measurement -JLBH can optionally track operating-system jitter by running a background thread that records scheduler delays as a separate probe. This is useful when investigating latency spikes caused by kernel activity, but it incurs some overhead so it is disabled by default. You can enable it via `recordOSJitter()` when you need to quantify such noise. 
The feature is listed in the requirements specification around lines 52 and 56 of `project-requirements.adoc`. +JLBH can optionally track operating-system jitter by running a background thread that records scheduler delays as a separate probe. This is useful when investigating latency spikes caused by kernel activity; it incurs some overhead, but it is enabled by default. Disable it via `recordOSJitter(false)` when you want to avoid the extra monitoring. The feature is listed in the requirements specification around lines 52 and 56 of `project-requirements.adoc`. === Adding custom probes JLBH lets you time additional stages of your benchmark using `addProbe(String)`. The returned `NanoSampler` records its own histogram. @@ -147,11 +147,11 @@ The following table summarises the main non-functional quality attributes derive |=== | Area | Requirement -| Performance | Overhead per sample must remain below 100 ns when no additional probes are active. Histogram generation must support ≥200 M iterations without heap pressure. +| Performance | Overhead per sample must remain below 100 ns when no additional probes are active. Histogram generation must support >=200 M iterations without heap pressure. | Reliability | Harness must abort gracefully on interruptions or sample time-outs, and immutable result objects ensure thread-safe publication. | Usability | Fluent builder API with sensible defaults yields a runnable benchmark in ≤10 LOC. ASCII table outputs are human-readable and CI-friendly. | Portability | Pure-Java codebase with runtime-detected JDK optimisations; no native compilation. -| Maintainability | ≥80 % unit-test coverage and adherence to Chronicle coding standards validated by SonarCloud. +| Maintainability | >=80 % unit-test coverage and adherence to Chronicle coding standards validated by SonarCloud. | Security | No executable deserialisation; harness operates in-process. Users remain responsible for securing benchmarked code. 
|=== @@ -229,4 +229,4 @@ Configure the target rate using `throughput(int, TimeUnit)` and optionally a `La == Licensing and Release Information -The code is licensed under the terms described in the link:LICENSE[Apache 2.0 License]. Releases are available on link:https://github.com/OpenHFT/JLBH/releases[GitHub]. +The code is licensed under the terms described in the link:LICENSE.adoc[Apache 2.0 License]. Releases are available on link:https://github.com/OpenHFT/JLBH/releases[GitHub]. diff --git a/pom.xml b/pom.xml index 5b7e9c66..35ade1c4 100644 --- a/pom.xml +++ b/pom.xml @@ -140,6 +140,27 @@ + + + org.jacoco + jacoco-maven-plugin + 0.8.14 + + + prepare-agent + + prepare-agent + + + + report + verify + + report + + + + diff --git a/src/main/adoc/architecture.adoc b/src/main/adoc/architecture.adoc index fc8eda61..46445443 100644 --- a/src/main/adoc/architecture.adoc +++ b/src/main/adoc/architecture.adoc @@ -77,7 +77,7 @@ The `startTimeNs` is the calculated ideal start time for the iteration. * `OSJitterMonitor` ** An internal background thread responsible for measuring operating system scheduling jitter. -** Activated if `JLBHOptions.recordOSJitter` is true. +** Monitoring is enabled by default (`JLBHOptions.recordOSJitter` defaults to true) and can be disabled via `recordOSJitter(false)`. ** Repeatedly calls `System.nanoTime()` to detect delays greater than `recordJitterGreaterThanNs` and records these into a dedicated `Histogram`. == 3. Execution Flow @@ -140,6 +140,7 @@ JLBH has a specific threading model that users and contributors should understan * **Event Loop Integration**: ** As an alternative to `JLBH.start()`, the benchmark can be driven by an external `EventLoop` (e.g., from Chronicle Services) by calling `JLBH.eventLoopHandler(EventLoop)`. +** Coordinated omission accounting must remain enabled; calling `eventLoopHandler` with `accountForCoordinatedOmission(false)` throws an `UnsupportedOperationException`. 
** In this mode, the warm-up and measurement iterations are scheduled and executed on the provided event loop's thread, using internal `EventHandler` implementations (`WarmupHandler`, `JLBHEventHandler`). * **User-Spawned Threads within `JLBHTask`**: @@ -194,4 +195,4 @@ Once a benchmark or a run is complete, the generated data is fixed. This includes percentile tables for each probe and overall summaries. ** Programmatic access to the structured, immutable results is provided via the `JLBHResultConsumer` and the `JLBHResult` interface. ** Results can be serialized to CSV format using `net.openhft.chronicle.jlbh.util.JLBHResultSerializer`, facilitating offline analysis or import into spreadsheets and other tools. -** Integration with CI systems like TeamCity is supported via `net.openhft.chronicle.jlbh.TeamCityHelper`, which can emit build statistics. \ No newline at end of file +** Integration with CI systems like TeamCity is supported via `net.openhft.chronicle.jlbh.TeamCityHelper`, which can emit build statistics. diff --git a/src/main/adoc/benchmark-lifecycle.adoc b/src/main/adoc/benchmark-lifecycle.adoc index 5cfc71e7..322d3bf4 100644 --- a/src/main/adoc/benchmark-lifecycle.adoc +++ b/src/main/adoc/benchmark-lifecycle.adoc @@ -22,7 +22,7 @@ This initial phase prepares the benchmark environment: * **Task Initialization**: The `JLBHTask.init(JLBH jlbh)` method of the user-provided task is invoked. ** This allows the benchmark task to perform its own one-time setup, such as initializing resources or fixtures. ** Crucially, this is where additional measurement probes can be registered by calling `jlbh.addProbe("probeName")` to get `NanoSampler` instances for specific sub-sections of the task. -* **OS Jitter Monitoring**: If configured in `JLBHOptions` (`recordOSJitter = true`), the `OSJitterMonitor` background thread is started to begin measuring operating system scheduling jitter independently. 
+* **OS Jitter Monitoring**: Jitter tracking is enabled by default (`recordOSJitter` defaults to true); unless disabled via `recordOSJitter(false)`, the `OSJitterMonitor` background thread starts to measure operating system scheduling jitter independently. * **Affinity**: If an `acquireLock` supplier is configured in `JLBHOptions`, an attempt to acquire CPU affinity for the main benchmark thread might occur. == 2. Warmup diff --git a/src/main/adoc/decision-log.adoc b/src/main/adoc/decision-log.adoc index f8b87b83..1113684c 100644 --- a/src/main/adoc/decision-log.adoc +++ b/src/main/adoc/decision-log.adoc @@ -32,6 +32,6 @@ Rationale for Decision:: Impact & Consequences:: * Contributors must learn basic AsciiDoc syntax. * Build pipeline processes `.adoc` files to publish HTML. -- Some external tools may expect Markdown, requiring conversion. +* Some external tools may expect Markdown, requiring conversion. Notes/Links:: ** https://asciidoctor.org[AsciiDoc project page] diff --git a/src/main/adoc/jlbh-cookbook.adoc b/src/main/adoc/jlbh-cookbook.adoc index e2ca8047..fa91cc6d 100644 --- a/src/main/adoc/jlbh-cookbook.adoc +++ b/src/main/adoc/jlbh-cookbook.adoc @@ -520,8 +520,8 @@ public class MainApp { .iterations(10_000) // Shorter for quick demo .throughput(50_000) .jlbhTask(new CallBenchmark()) - // Enable OS Jitter recording - .recordOSJitter(true) + // OS jitter recording is enabled by default; use recordOSJitter(false) + // if you want to disable the background monitor. // Optional: Only record jitter events greater than a specific threshold (default 1_000 ns) // .recordJitterGreaterThanNs(5_000) // Optional: Pin the OS Jitter monitoring thread to a specific CPU core (if supported) @@ -535,7 +535,8 @@ public class MainApp { *Interpreting OS Jitter Results*: -* When `recordOSJitter(true)` is set, JLBH starts a background thread that continuously measures the time elapsed between calls to `System.nanoTime()`. 
+* JLBH records OS jitter by default and starts a background thread that continuously measures the time elapsed between calls to `System.nanoTime()`. +Use `recordOSJitter(false)` only when you want to suppress this monitoring. * If this measured gap exceeds `recordJitterGreaterThanNs`, it's considered an OS-induced pause/jitter event and is recorded in a special "OS Jitter" probe. * The JLBH output (both console and `JLBHResult`) will include a section for the "OS Jitter" probe, showing its own percentile distribution. * These jitter values represent pauses that are *not* caused by your application code but by the operating system or other processes interfering with the JLBH process's execution. diff --git a/src/main/adoc/project-requirements.adoc b/src/main/adoc/project-requirements.adoc index ba6ab2e6..2600c7ab 100644 --- a/src/main/adoc/project-requirements.adoc +++ b/src/main/adoc/project-requirements.adoc @@ -1,4 +1,4 @@ -= Chronicle JLBH – Software Requirements Specification += Chronicle JLBH - Software Requirements Specification Chronicle Software :revnumber: 1.0 :revdate: 23 May 2025 @@ -7,13 +7,13 @@ Chronicle Software *_Abstract_*:: Chronicle JLBH is an open-source Java library, released under the Apache Licence 2.0, that provides a high-resolution latency benchmark harness. -It allows developers to _measure_, _analyse_ and _regress-test_ latency behaviour of critical code paths “in context” rather than via isolated micro-benchmarks. +It allows developers to _measure_, _analyse_ and _regress-test_ latency behaviour of critical code paths "in context" rather than via isolated micro-benchmarks. Key features include compensation for *co-ordinated omission*, configurable throughput modelling, additional probes, optional operating-system jitter tracking, and serialisation of results for CI dashboards. == 1. 
Introduction === 1.1 Purpose -This document specifies functional and non-functional requirements for Chronicle JLBH, ensuring that contributors and integrators share a common understanding of the product’s goals and constraints. +This document specifies functional and non-functional requirements for Chronicle JLBH, ensuring that contributors and integrators share a common understanding of the product's goals and constraints. === 1.2 Scope The harness targets _low-latency_ Java applications (trading systems, micro-services, real-time analytics) that must quantify latencies at extreme percentiles under realistic load patterns. @@ -23,7 +23,7 @@ The harness targets _low-latency_ Java applications (trading systems, micro-serv |=== | *Term* | *Definition* -| JLBH | Java Latency Benchmark Harness – the library under specification. +| JLBH | Java Latency Benchmark Harness - the library under specification. | Co-ordinated Omission | Measurement error whereby pauses/back-pressure hide worst-case latencies. | Probe | A named histogram that records an additional timing within the benchmark workflow. | OS Jitter | Kernel or scheduler-induced timing variance measured by a background thread. @@ -32,14 +32,14 @@ The harness targets _low-latency_ Java applications (trading systems, micro-serv === 1.4 References -* Upstream repository: https://github.com/OpenHFT/JLBH[GitHub – OpenHFT/JLBH] +* Upstream repository: https://github.com/OpenHFT/JLBH[GitHub - OpenHFT/JLBH] * API reference: https://javadoc.io/doc/net.openhft/chronicle-core/latest/net/openhft/chronicle/core/jlbh/JLBH.html[Javadoc] * Tutorial series: http://www.rationaljava.com/2016/04/a-series-of-posts-on-jlbh-java-latency.html[RationalJava blog series] * Discussion of _co-ordinated omission_: https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHfE[m.s. 
thread] * Chronicle blog post on JLBH in event loops: https://vanilla-java.github.io/2016/04/02/Microservices-in-the-Chronicle-World-Part-5.html[Vanilla-Java article] === 1.5 Overview -Sections 2–5 describe the product context, interfaces, detailed system features, and quality attributes. +Sections 2-5 describe the product context, interfaces, detailed system features, and quality attributes. Section 6 captures licensing, whilst Section 7 presents the glossary. == 2. Overall Description @@ -49,7 +49,7 @@ JLBH is delivered as a *Maven* artefact (`net.openhft:jlbh`) and depends on Chro === 2.2 Product Functions -* Generate high-resolution histograms (≥35 bits) of end-to-end sample latencies. +* Generate high-resolution histograms (>=35 bits) of end-to-end sample latencies. * Optionally compensate for *co-ordinated omission* by adjusting the synthetic start time. * Model load via configurable `throughput(int, TimeUnit)` and *LatencyDistributor*s. * Provide additional named probes via `addProbe(String)`. @@ -92,11 +92,11 @@ JLBH is delivered as a *Maven* artefact (`net.openhft:jlbh`) and depends on Chro === 3.1 Public API -* `JLBHOptions` – builder for benchmark parameters (throughput, warm-up, runs, etc.). -* `JLBHTask` – user-defined callback with lifecycle methods `init`, `run`, `warmedUp`, `runComplete`, `complete`. -* `JLBH` – orchestrator class with `start()` and `eventLoopHandler(EventLoop)` entry points. -* `JLBHResult` – immutable projection of measured data. -* `JLBHResultSerializer` – CSV writer for analytics pipelines. +* `JLBHOptions` - builder for benchmark parameters (throughput, warm-up, runs, etc.). +* `JLBHTask` - user-defined callback with lifecycle methods `init`, `run`, `warmedUp`, `runComplete`, `complete`. +* `JLBH` - orchestrator class with `start()` and `eventLoopHandler(EventLoop)` entry points. +* `JLBHResult` - immutable projection of measured data. +* `JLBHResultSerializer` - CSV writer for analytics pipelines. 
=== 3.2 User Interface CLI usage is demonstrated in _ExampleJLBHMain_ and test fixtures; no dedicated GUI is provided. @@ -113,54 +113,54 @@ None. The harness interacts with OS timers and CPU affinity via JNI where availa == 4. System Features -=== 4.1 Latency Sampling +=== 4.1 FN-001 Latency Sampling *Description*: Collect nanosecond deltas from user code via `sample(long)` or `sampleNanos(long)`. *Stimulus/Response*: Upon each iteration, `JLBHTask.run` calls `jlbh.sample(...)`; histogram updates occur; run summary printed at completion. *Priority*: *Essential*. -=== 4.2 Coordinated-Omission Compensation +=== 4.2 FN-002 Coordinated-Omission Compensation Ensures that queue back-log does not under-represent tail latency. Enabled by default; opt-out via `accountForCoordinatedOmission(false)`. -=== 4.3 Additional Probes +=== 4.3 FN-003 Additional Probes Developers may time sub-stages (e.g., serialisation, network round-trip) through `addProbe(String)` and record them independently. -=== 4.4 OS Jitter Tracking -A background thread measures scheduling gaps exceeding a configured threshold (`recordJitterGreaterThanNs`). Results are summarised alongside core latencies. +=== 4.4 FN-004 OS Jitter Tracking +A background thread measures scheduling gaps exceeding a configured threshold (`recordJitterGreaterThanNs`). Monitoring is enabled by default and can be disabled via `recordOSJitter(false)`. Results are summarised alongside core latencies. -=== 4.5 CSV Serialisation +=== 4.5 FN-005 CSV Serialisation Post-run, results can be persisted for offline analysis (`result.csv` by default). -=== 4.6 Event Loop Integration +=== 4.6 FN-006 Event Loop Integration `eventLoopHandler(EventLoop)` allows benchmarks to operate inside Chronicle threading framework, avoiding extra threads. == 5. Non-Functional Requirements -=== 5.1 Performance +=== 5.1 NF-P-001 Performance * Overhead per sample must remain below 100 ns when no additional probes are active. 
-* Histogram generation must support ≥200 M iterations without heap pressure. +* Histogram generation must support >=200 M iterations without heap pressure. -=== 5.2 Reliability +=== 5.2 NF-R-001 Reliability * Harness must abort gracefully on thread interruptions or sample time-outs (`timeout(long)`). * Immutable result objects ensure thread-safe publication to external consumers. -=== 5.3 Usability +=== 5.3 NF-UX-001 Usability * Fluent builder API; sensible defaults provide a runnable benchmark in ≤10 LOC. * ASCII table outputs are human-readable and CI-friendly. -=== 5.4 Portability +=== 5.4 NF-O-001 Portability * Pure-Java codebase; no native compilation steps. * JDK-specific optimisations (e.g., *Zing* support) are runtime-detected. -=== 5.5 Maintainability +=== 5.5 NF-O-002 Maintainability * 80 %+ unit-test line coverage with deterministic fixtures. * Code adheres to Chronicle parent POM style and SonarCloud quality gates. -=== 5.6 Security +=== 5.6 NF-S-001 Security No executable deserialisation; harness operates in-process. Users remain responsible for securing benchmarked code. == 6. Licensing -The project is released under the *Apache Licence 2.0* (_see_ `LICENSE`). +The project is released under the *Apache Licence 2.0* (_see_ `LICENSE.adoc`). Downstream consumers must preserve licence notices and may include JLBH in commercial or OSS products, subject to the terms therein. == 7. Glossary @@ -169,7 +169,7 @@ Co-ordinated Omission:: Statistical artefact causing under-reporting of worst-ca Histogram:: Data structure that records frequency counts in logarithmic buckets, enabling percentile extraction. Percentile:: Value below which a given percentage of observations fall (e.g., 99th percentile). -== 8. Appendix A – Example Minimal Benchmark +== 8. Appendix A - Example Minimal Benchmark [source,java] ---- @@ -189,10 +189,10 @@ public class NothingBenchmark implements JLBHTask { } ---- -== 9. Appendix B – Footnotes +== 9. 
Appendix B - Footnotes -* JLBH originated within the Chronicle Software open-source stack and is actively maintained.footnote:[Project home — GitHub repository:contentReference[oaicite:1]] -* The API reference highlights the focus on _co-ordinated omission_ and event-loop support.footnote:[Javadoc for class `JLBH`:contentReference[oaicite:2]] -* A multi-part blog series gives practical guidance on designing realistic latency tests.footnote:[RationalJava series:contentReference[oaicite:3]] -* Vanilla-Java’s micro-services article demonstrates embedding JLBH in an event-driven architecture.footnote:[Vanilla-Java blog:contentReference[oaicite:4]] -* Original discussion of co-ordinated omission by Gil Tene motivates JLBH’s default settings.footnote:[Mechanical Sympathy thread:contentReference[oaicite:5]] +* JLBH originated within the Chronicle Software open-source stack and is actively maintained. See https://github.com/OpenHFT/JLBH. +* The API reference highlights the focus on _co-ordinated omission_ and event-loop support. See https://www.javadoc.io/doc/net.openhft/chronicle-core/latest/net/openhft/chronicle/jlbh/JLBH.html. +* A multi-part blog series gives practical guidance on designing realistic latency tests. See http://www.rationaljava.com/2016/04/a-series-of-posts-on-jlbh-java-latency.html. +* Vanilla-Java's micro-services article demonstrates embedding JLBH in an event-driven architecture. See https://vanilla-java.github.io/2016/04/02/Microservices-in-the-Chronicle-World-Part-5.html. +* Original discussion of co-ordinated omission by Gil Tene motivates JLBH's default settings. See https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHfE. 
diff --git a/src/main/adoc/results-interpretation-guide.adoc b/src/main/adoc/results-interpretation-guide.adoc index 92c2e1e1..e2886473 100644 --- a/src/main/adoc/results-interpretation-guide.adoc +++ b/src/main/adoc/results-interpretation-guide.adoc @@ -66,7 +66,7 @@ Co-ordinated Omission is a critical concept in latency measurement, especially f JLBH can optionally measure Operating System (OS) jitter, which refers to delays caused by the OS scheduler or other system activities interrupting the benchmark process. * **Mechanism**: -** When `recordOSJitter = true` in `JLBHOptions`, a dedicated background thread (`OSJitterMonitor`) is started. +** Jitter monitoring is enabled by default (`recordOSJitter` defaults to true); a dedicated background thread (`OSJitterMonitor`) starts unless you explicitly disable it with `recordOSJitter(false)`. ** This thread repeatedly calls `System.nanoTime()` in a tight loop and measures the time difference between consecutive calls. ** If this difference (a "gap") exceeds a configurable threshold (`recordJitterGreaterThanNs`, default 1 microsecond), it's recorded as an OS jitter event in a separate "OS Jitter" histogram. 
* **Interpreting Jitter Output**: diff --git a/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java b/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java new file mode 100644 index 00000000..1b5bae2c --- /dev/null +++ b/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java @@ -0,0 +1,185 @@ +package net.openhft.chronicle.jlbh; + +import net.openhft.chronicle.core.util.NanoSampler; +import org.jetbrains.annotations.NotNull; +import org.junit.Test; + +import java.io.ByteArrayOutputStream; +import java.io.PrintStream; +import java.lang.reflect.Method; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.Assert.*; + +public class JLBHAdditionalCoverageTest { + /** + * Exercises {@link JLBH#abort()} to ensure the running harness can be stopped safely. + */ + @Test + public void shouldAbortWhenRequested() { + AbortOnRunTask task = new AbortOnRunTask(); + JLBHOptions options = newHarness(task) + .warmUpIterations(2) + .iterations(20) + .runs(5); + JLBH jlbh = new JLBH(options, silentPrintStream(), JLBHResultConsumer.newThreadSafeInstance()); + + jlbh.start(); + + assertTrue("abort was not triggered", task.abortCount() > 0); + assertTrue("task should finish quickly after abort", task.totalRuns() < 5 * 20); + } + + /** + * Verifies that configuring a timeout starts the watchdog thread without error. 
+ */ + @Test + public void shouldStartTimeoutCheckerWhenTimeoutConfigured() { + CountingTask task = new CountingTask(); + JLBHOptions options = newHarness(task) + .warmUpIterations(1) + .iterations(5) + .runs(1) + .timeout(1) + .recordOSJitter(false); + JLBH jlbh = new JLBH(options, silentPrintStream(), JLBHResultConsumer.newThreadSafeInstance()); + + jlbh.start(); + + assertTrue("samples should have been recorded", task.totalRuns() >= 1); + } + + /** + * Ensures {@link JLBH#eventLoopHandler(net.openhft.chronicle.core.threads.EventLoop)} rejects use when + * coordinated omission compensation is disabled. + */ + @Test(expected = UnsupportedOperationException.class) + public void shouldRejectEventLoopWhenCoordinatedOmissionDisabled() { + JLBHOptions options = newHarness(new NoOpTask()) + .accountForCoordinatedOmission(false) + .recordOSJitter(false); + JLBH jlbh = new JLBH(options, silentPrintStream(), null); + jlbh.eventLoopHandler(null); + } + + /** + * Covers helper methods that format percentile output. 
+ */ + @Test + public void shouldFormatRunSummaries() throws Exception { + JLBHOptions options = newHarness(new NoOpTask()); + JLBH jlbh = new JLBH(options, silentPrintStream(), null); + Method addPr = JLBH.class.getDeclaredMethod("addPrToPrint", StringBuilder.class, String.class, int.class); + addPr.setAccessible(true); + StringBuilder sb = new StringBuilder(); + addPr.invoke(jlbh, sb, "99.9: ", 2); + assertEquals("99.9: %12.2f %12.2f %12.2f%n", sb.toString()); + + Method header = JLBH.class.getDeclaredMethod("generateRunSummaryHeader", int.class); + header.setAccessible(true); + assertEquals("Percentile run1 run2 % Variation", header.invoke(jlbh, 2)); + + Method unit = JLBH.class.getDeclaredMethod("timeUnitToString", TimeUnit.class); + unit.setAccessible(true); + assertEquals("ns", unit.invoke(jlbh, TimeUnit.NANOSECONDS)); + assertEquals("us", unit.invoke(jlbh, TimeUnit.MICROSECONDS)); + assertEquals("ms", unit.invoke(jlbh, TimeUnit.MILLISECONDS)); + assertEquals("s", unit.invoke(jlbh, TimeUnit.SECONDS)); + assertEquals("min", unit.invoke(jlbh, TimeUnit.MINUTES)); + assertEquals("h", unit.invoke(jlbh, TimeUnit.HOURS)); + assertEquals("day", unit.invoke(jlbh, TimeUnit.DAYS)); + } + + private static JLBHOptions newHarness(JLBHTask task) { + return new JLBHOptions() + .warmUpIterations(1) + .iterations(1) + .throughput(10_000) + .runs(1) + .recordOSJitter(false) + .jlbhTask(task); + } + + private static PrintStream silentPrintStream() { + return new PrintStream(new ByteArrayOutputStream()); + } + + private static final class AbortOnRunTask implements JLBHTask { + private JLBH jlbh; + private final AtomicInteger count = new AtomicInteger(); + private final AtomicInteger abortCount = new AtomicInteger(); + private final AtomicInteger totalRuns = new AtomicInteger(); + + @Override + public void init(JLBH jlbh) { + this.jlbh = jlbh; + } + + @Override + public void run(long startTimeNs) { + int current = count.incrementAndGet(); + long duration = Math.max(0, 
System.nanoTime() - startTimeNs); + jlbh.sample(duration); + if (current == 10) { + abortCount.incrementAndGet(); + jlbh.abort(); + } + totalRuns.incrementAndGet(); + } + + @Override + public void complete() { + // no-op + } + + int abortCount() { + return abortCount.get(); + } + + int totalRuns() { + return totalRuns.get(); + } + } + + private static final class CountingTask implements JLBHTask { + private final AtomicInteger totalRuns = new AtomicInteger(); + private JLBH harness; + private NanoSampler sampler; + + @Override + public void init(JLBH jlbh) { + this.harness = jlbh; + this.sampler = jlbh.addProbe("count"); + } + + @Override + public void run(long startTimeNs) { + totalRuns.incrementAndGet(); + long duration = System.nanoTime() - startTimeNs; + harness.sample(duration); + sampler.sampleNanos(duration); + } + + @Override + public void complete() { + // no-op + } + + int totalRuns() { + return totalRuns.get(); + } + } + + private static final class NoOpTask implements JLBHTask { + @Override + public void init(JLBH jlbh) { + // no-op + } + + @Override + public void run(long startTimeNs) { + // no-op + } + } +} diff --git a/src/test/java/net/openhft/chronicle/jlbh/JLBHOptionsTest.java b/src/test/java/net/openhft/chronicle/jlbh/JLBHOptionsTest.java new file mode 100644 index 00000000..5d1c666d --- /dev/null +++ b/src/test/java/net/openhft/chronicle/jlbh/JLBHOptionsTest.java @@ -0,0 +1,83 @@ +package net.openhft.chronicle.jlbh; + +import net.openhft.affinity.AffinityLock; +import org.junit.Test; + +import java.lang.reflect.Field; +import java.util.concurrent.TimeUnit; +import java.util.function.Supplier; + +import static org.junit.Assert.*; + +public class JLBHOptionsTest { + + @Test + public void shouldApplyAllConfigurationOptions() throws Exception { + JLBHOptions options = new JLBHOptions(); + LatencyDistributor distributor = averageLatencyNS -> averageLatencyNS * 2; + Supplier customSupplier = () -> null; + JLBHTask task = new NoOpTask(); + + 
options.throughput(42) + .throughput(84, TimeUnit.MILLISECONDS) + .latencyDistributor(distributor) + .accountForCoordinatedOmission(false) + .recordJitterGreaterThanNs(12) + .recordOSJitter(false) + .warmUpIterations(123) + .runs(5) + .iterations(200) + .iterations(1_000_000L) + .jlbhTask(task) + .pauseAfterWarmupMS(77) + .skipFirstRun(true) + .jitterAffinity(true) + .acquireLock(customSupplier) + .timeout(9876L); + + assertEquals(84, getField(options, "throughput", Integer.class).intValue()); + assertEquals(TimeUnit.MILLISECONDS, getField(options, "throughputTimeUnit", TimeUnit.class)); + assertSame(distributor, getField(options, "latencyDistributor", LatencyDistributor.class)); + assertFalse(getField(options, "accountForCoordinatedOmission", Boolean.class)); + assertEquals(12, getField(options, "recordJitterGreaterThanNs", Integer.class).intValue()); + assertFalse(getField(options, "recordOSJitter", Boolean.class)); + assertEquals(123, getField(options, "warmUpIterations", Integer.class).intValue()); + assertEquals(5, getField(options, "runs", Integer.class).intValue()); + assertEquals(1_000_000L, getField(options, "iterations", Long.class).longValue()); + assertSame(task, getField(options, "jlbhTask", JLBHTask.class)); + assertEquals(77, getField(options, "pauseAfterWarmupMS", Integer.class).intValue()); + assertEquals(JLBHOptions.SKIP_FIRST_RUN.SKIP, getField(options, "skipFirstRun", JLBHOptions.SKIP_FIRST_RUN.class)); + assertTrue(getField(options, "jitterAffinity", Boolean.class)); + assertSame(customSupplier, getField(options, "acquireLock", Supplier.class)); + assertEquals(9876L, getField(options, "timeout", Long.class).longValue()); + + String printable = options.toString(); + assertTrue(printable.contains("runs=5")); + assertTrue(printable.contains("iterations=1000000")); + assertTrue(printable.contains("latencyDistributor=")); + } + + @Test + public void shouldRespectSkipFirstRunFalse() throws Exception { + JLBHOptions options = new 
JLBHOptions().skipFirstRun(false); + assertEquals(JLBHOptions.SKIP_FIRST_RUN.NO_SKIP, getField(options, "skipFirstRun", JLBHOptions.SKIP_FIRST_RUN.class)); + } + + private static <T> T getField(JLBHOptions options, String name, Class<T> type) throws Exception { + Field field = JLBHOptions.class.getDeclaredField(name); + field.setAccessible(true); + return type.cast(field.get(options)); + } + + private static final class NoOpTask implements JLBHTask { + @Override + public void init(JLBH jlbh) { + // no-op + } + + @Override + public void run(long startTimeNs) { + // no-op + } + } +} diff --git a/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java new file mode 100644 index 00000000..4db78357 --- /dev/null +++ b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java @@ -0,0 +1,199 @@ +package net.openhft.chronicle.jlbh.util; + +import net.openhft.chronicle.jlbh.JLBHResult; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.IOException; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.time.Duration; +import java.util.*; + +import static org.junit.Assert.*; + +public class JLBHResultSerializerTest { + + @Rule + public final TemporaryFolder tmp = new TemporaryFolder(); + + private static final Duration P50 = Duration.ofNanos(100); + private static final Duration P90 = Duration.ofNanos(200); + private static final Duration P99 = Duration.ofNanos(300); + private static final Duration P999 = Duration.ofNanos(400); + private static final Duration WORST = Duration.ofNanos(600); + + @Test + public void shouldWriteSelectedProbesWithoutOsJitter() throws IOException { + FakeRunResult endToEnd = new FakeRunResult(P50, P90, P99, P999, null, WORST); + FakeRunResult probe = new FakeRunResult(P50.multipliedBy(2), P90, P99, P999, null, WORST); + FakeResult result = 
new FakeResult(endToEnd, Map.of("TheProbe", probe), Optional.empty()); + + Path out = tmp.newFile("subset.csv").toPath(); + JLBHResultSerializer.runResultToCSV(result, out.toString(), Collections.singletonList("TheProbe"), false); + + List<String> lines = Files.readAllLines(out); + assertEquals(3, lines.size()); + assertEquals(",50th p-le,90th p-le,99th p-le,999th p-le,9999th p-le,Worst,", lines.get(0)); + assertEquals("endToEnd,100,200,300,400,,600,", lines.get(1)); + assertEquals("TheProbe,200,200,300,400,,600,", lines.get(2)); + } + + @Test + public void shouldIncludeOsJitterAndAdditionalProbes() throws IOException { + FakeRunResult endToEnd = new FakeRunResult(P50, P90, P99, P999, Duration.ofNanos(500), WORST); + FakeRunResult probe = new FakeRunResult(P50, P90.multipliedBy(2), P99, P999, Duration.ofNanos(700), WORST); + FakeRunResult osJitter = new FakeRunResult(Duration.ofNanos(10), Duration.ofNanos(20), Duration.ofNanos(30), + Duration.ofNanos(40), Duration.ofNanos(50), Duration.ofNanos(60)); + + FakeResult result = new FakeResult(endToEnd, Map.of("Custom", probe), Optional.of(osJitter)); + + Path out = tmp.newFile("full.csv").toPath(); + JLBHResultSerializer.runResultToCSV(result, out.toString(), Arrays.asList("Custom", "Missing"), true); + + List<String> lines = Files.readAllLines(out); + assertEquals(4, lines.size()); + assertEquals("endToEnd,100,200,300,400,500,600,", lines.get(1)); + assertEquals("Custom,100,400,300,400,700,600,", lines.get(2)); + assertEquals("OSJitter,10,20,30,40,50,60,", lines.get(3)); + } + + @Test + public void shouldDefaultToResultCsvInWorkingDirectory() throws IOException { + FakeRunResult runResult = new FakeRunResult(P50, P90, P99, P999, Duration.ofNanos(500), WORST); + FakeResult result = new FakeResult(runResult, Map.of("Probe", runResult), Optional.of(runResult)); + + Path output = Paths.get(JLBHResultSerializer.RESULT_CSV); + Files.deleteIfExists(output); + JLBHResultSerializer.runResultToCSV(result); + assertTrue(Files.exists(output)); +
List<String> lines = Files.readAllLines(output); + assertEquals(4, lines.size()); + Files.deleteIfExists(output); + } + + private static final class FakeResult implements JLBHResult { + private final ProbeResult endToEnd; + private final Map<String, ProbeResult> additionalProbes; + private final Optional<ProbeResult> osJitter; + + private FakeResult(JLBHResult.RunResult endToEnd, + Map<String, JLBHResult.RunResult> probes, + Optional<JLBHResult.RunResult> osJitter) { + this.endToEnd = new FakeProbeResult(endToEnd); + Map<String, ProbeResult> probeResults = new HashMap<>(); + for (Map.Entry<String, JLBHResult.RunResult> entry : probes.entrySet()) { + probeResults.put(entry.getKey(), new FakeProbeResult(entry.getValue())); + } + this.additionalProbes = probeResults; + this.osJitter = osJitter.map(FakeProbeResult::new); + } + + @Override + public ProbeResult endToEnd() { + return endToEnd; + } + + @Override + public Optional<ProbeResult> probe(String probeName) { + return Optional.ofNullable(additionalProbes.get(probeName)); + } + + @Override + public Set<String> probeNames() { + return additionalProbes.keySet(); + } + + @Override + public Optional<ProbeResult> osJitter() { + return osJitter; + } + } + + private static final class FakeProbeResult implements JLBHResult.ProbeResult { + private final JLBHResult.RunResult runResult; + + private FakeProbeResult(JLBHResult.RunResult runResult) { + this.runResult = runResult; + } + + @Override + public JLBHResult.RunResult summaryOfLastRun() { + return runResult; + } + + @Override + public List<JLBHResult.RunResult> eachRunSummary() { + return Collections.singletonList(runResult); + } + } + + private static final class FakeRunResult implements JLBHResult.RunResult { + private final Duration p50; + private final Duration p90; + private final Duration p99; + private final Duration p999; + private final Duration p9999; + private final Duration worst; + + private FakeRunResult(Duration p50, + Duration p90, + Duration p99, + Duration p999, + Duration p9999, + Duration worst) { + this.p50 = p50; + this.p90 = p90; + this.p99 = p99; + this.p999 = p999; + this.p9999 = p9999; + this.worst = worst; + } + + @Override + public Map<Percentile, Duration>
percentiles() { + EnumMap<Percentile, Duration> map = new EnumMap<>(Percentile.class); + map.put(Percentile.PERCENTILE_50TH, p50); + map.put(Percentile.PERCENTILE_90TH, p90); + map.put(Percentile.PERCENTILE_99TH, p99); + map.put(Percentile.PERCENTILE_99_9TH, p999); + map.put(Percentile.WORST, worst); + if (p9999 != null) { + map.put(Percentile.PERCENTILE_99_99TH, p9999); + } + return map; + } + + @Override + public Duration get50thPercentile() { + return p50; + } + + @Override + public Duration get90thPercentile() { + return p90; + } + + @Override + public Duration get99thPercentile() { + return p99; + } + + @Override + public Duration get999thPercentile() { + return p999; + } + + @Override + public Duration get9999thPercentile() { + return p9999; + } + + @Override + public Duration getWorst() { + return worst; + } + } +} From b7c7439da553d67e3cd881a4cb42f49b85a546c3 Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Sat, 25 Oct 2025 08:21:16 +0100 Subject: [PATCH 02/12] Replace Map.of in serializer test --- .../chronicle/jlbh/util/JLBHResultSerializerTest.java | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java index 4db78357..18acfe37 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java +++ b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java @@ -29,7 +29,7 @@ public class JLBHResultSerializerTest { public void shouldWriteSelectedProbesWithoutOsJitter() throws IOException { FakeRunResult endToEnd = new FakeRunResult(P50, P90, P99, P999, null, WORST); FakeRunResult probe = new FakeRunResult(P50.multipliedBy(2), P90, P99, P999, null, WORST); - FakeResult result = new FakeResult(endToEnd, Map.of("TheProbe", probe), Optional.empty()); + FakeResult result = new FakeResult(endToEnd, Collections.singletonMap("TheProbe", probe), Optional.empty()); Path out =
tmp.newFile("subset.csv").toPath(); JLBHResultSerializer.runResultToCSV(result, out.toString(), Collections.singletonList("TheProbe"), false); @@ -48,7 +48,7 @@ public void shouldIncludeOsJitterAndAdditionalProbes() throws IOException { FakeRunResult osJitter = new FakeRunResult(Duration.ofNanos(10), Duration.ofNanos(20), Duration.ofNanos(30), Duration.ofNanos(40), Duration.ofNanos(50), Duration.ofNanos(60)); - FakeResult result = new FakeResult(endToEnd, Map.of("Custom", probe), Optional.of(osJitter)); + FakeResult result = new FakeResult(endToEnd, Collections.singletonMap("Custom", probe), Optional.of(osJitter)); Path out = tmp.newFile("full.csv").toPath(); JLBHResultSerializer.runResultToCSV(result, out.toString(), Arrays.asList("Custom", "Missing"), true); @@ -63,7 +63,7 @@ public void shouldIncludeOsJitterAndAdditionalProbes() throws IOException { @Test public void shouldDefaultToResultCsvInWorkingDirectory() throws IOException { FakeRunResult runResult = new FakeRunResult(P50, P90, P99, P999, Duration.ofNanos(500), WORST); - FakeResult result = new FakeResult(runResult, Map.of("Probe", runResult), Optional.of(runResult)); + FakeResult result = new FakeResult(runResult, Collections.singletonMap("Probe", runResult), Optional.of(runResult)); Path output = Paths.get(JLBHResultSerializer.RESULT_CSV); Files.deleteIfExists(output); From 141599076cbe9feac2de9916e302fe5c9e39d7a1 Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Sat, 25 Oct 2025 08:31:35 +0100 Subject: [PATCH 03/12] Enable compiler lint warnings --- pom.xml | 25 +++++++++++++++++++ .../jlbh/JLBHAdditionalCoverageTest.java | 1 - 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index 35ade1c4..56dc8cb1 100644 --- a/pom.xml +++ b/pom.xml @@ -120,6 +120,19 @@ + + org.apache.maven.plugins + maven-compiler-plugin + + true + false + + -Xlint:all + -parameters + + + + + + + + + + JLBH-OPS-201: CLI mode preserves historical behaviour (stack traces, exit-on-error, exposing 
percentile arrays). Refactor to soft-fail tracked under JLBH-OPS-260. + + + + + + + JLBH-OPS-202: Pseudo-random generator only seeds synthetic load; cryptographic strength unnecessary. + + + + + + + JLBH-OPS-203: Deterministic jitter keeps benchmarks reproducible; no attacker input. + + + + + + + JLBH-OPS-204: Caller is the benchmark harness itself; returning backing arrays avoids millions of allocations during tight loops. + + + + + + + JLBH-OPS-205: Paths come from operator scripts; RuntimeException propagates IO failures without changing the established API. + + + + + + + + + diff --git a/src/main/java/net/openhft/chronicle/jlbh/JLBH.java b/src/main/java/net/openhft/chronicle/jlbh/JLBH.java index 9170eb44..03a95cc4 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/JLBH.java +++ b/src/main/java/net/openhft/chronicle/jlbh/JLBH.java @@ -35,6 +35,7 @@ import java.util.concurrent.ConcurrentSkipListMap; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicLong; import java.util.function.Consumer; import java.util.stream.Collectors; @@ -93,8 +94,7 @@ public class JLBH implements NanoSampler { private final AtomicBoolean abortTestRun = new AtomicBoolean(); private final long mod; private final long length; - // Todo: Remove all concurrent constructs such as volatile and AtomicBoolean - private volatile long noResultsReturned; + private final AtomicLong sampleCount = new AtomicLong(); // Use non-atomic when no thread synchronisation is necessary private boolean warmedUp; private volatile Thread testThread; @@ -146,8 +146,9 @@ public JLBH(@NotNull JLBHOptions jlbhOptions, @NotNull PrintStream printStream, : jlbhOptions.iterations > 50_000_000 ? 20_000_000_000L : jlbhOptions.iterations > 10_000_000 ?
10_000_000_000L : 5_000_000_000L; - long mod2; - for (mod2 = 1000; mod2 <= jlbhOptions.iterations / 200; mod2 *= 10) { + long mod2 = 1000; + while (mod2 <= jlbhOptions.iterations / 200) { + mod2 *= 10; } this.mod = mod2; } @@ -339,7 +340,7 @@ private void startTimeoutCheckerIfRequired() { private void waitForWarmupToComplete(long warmupStart) { while (!warmUpComplete.get()) { Jvm.pause(500); - printStream.println("Complete: " + noResultsReturned); + printStream.println("Complete: " + sampleCount.get()); if (testThread.isInterrupted()) { return; } @@ -445,7 +446,7 @@ private void endOfRun(int run, long runStart) { jlbhOptions.jlbhTask.runComplete(); - noResultsReturned = 0; + sampleCount.set(0); additionHistograms.values().forEach(Histogram::reset); endToEndHistogram.reset(); osJitterMonitor.reset(); @@ -458,8 +459,9 @@ private void checkSampleTimeout() { while (true) { Jvm.pause(TimeUnit.SECONDS.toMillis(10)); - if (previousSampleCount < noResultsReturned) { - previousSampleCount = noResultsReturned; + long current = sampleCount.get(); + if (previousSampleCount < current) { + previousSampleCount = current; previousSampleTime = System.currentTimeMillis(); } else { if (previousSampleTime < (System.currentTimeMillis() - jlbhOptions.timeout)) { @@ -659,12 +661,12 @@ public void sampleNanos(long durationNs) { * @param durationNs latency in nanoseconds */ public void sample(long durationNs) { - noResultsReturned++; - if (noResultsReturned < jlbhOptions.warmUpIterations && !warmedUp) { + long current = sampleCount.incrementAndGet(); + if (current < jlbhOptions.warmUpIterations && !warmedUp) { endToEndHistogram.sample(durationNs); return; } - if (noResultsReturned == jlbhOptions.warmUpIterations && !warmedUp) { + if (current == jlbhOptions.warmUpIterations && !warmedUp) { warmedUp = true; endToEndHistogram.reset(); if (!additionHistograms.isEmpty()) { diff --git a/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java 
b/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java index 990bf1eb..6431f734 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java +++ b/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java @@ -110,7 +110,7 @@ private static void printPercentiles(@NotNull String s, private static void printPercentiles(@NotNull String s, @NotNull PrintStream printStream, double[] percentages, double[] values) { PercentileSummary summary = new PercentileSummary(false, Collections.singletonList(values), percentages); String extra = Jvm.isAzulZing() ? ".zing" : Jvm.isJava15Plus() ? ".java17" : ""; - summary.forEachRow(((percentile, rowValues, variance) -> - printStream.println("##teamcity[buildStatisticValue key='" + s + "." + percentile + extra + "' value='" + rowValues[0] + "']"))); + summary.forEachRow((percentile, rowValues, variance) -> + printStream.println("##teamcity[buildStatisticValue key='" + s + "." + percentile + extra + "' value='" + rowValues[0] + "']")); } } diff --git a/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java b/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java index 453b9519..6f030fd1 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java +++ b/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java @@ -5,9 +5,10 @@ import java.io.BufferedWriter; import java.io.IOException; -import java.io.PrintWriter; import java.io.Writer; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; +import java.nio.file.Path; import java.nio.file.Paths; import java.time.Duration; import java.util.Collections; @@ -95,7 +96,8 @@ public static void runResultToCSV(JLBHResult jlbhResult, String fileName, String * @throws IOException if the file cannot be written */ public static void runResultToCSV(JLBHResult jlbhResult, String fileName, Iterable namesOfProbes, boolean includeOSJitter) throws IOException { - try (Writer pw = new 
BufferedWriter(new PrintWriter(Files.newOutputStream(Paths.get(fileName))))) { + Path output = Paths.get(fileName); + try (BufferedWriter pw = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) { writeHeader(pw); JLBHResult.ProbeResult probeResult = jlbhResult.endToEnd(); From bfc4c58f92e6c54cd7bcf44e56b02749fd4f2a2b Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Mon, 27 Oct 2025 14:12:17 +0000 Subject: [PATCH 06/12] Fix formatting issues in JLBH.java and remove unused maven-compiler-plugin configuration from pom.xml --- pom.xml | 13 ------------- src/main/java/net/openhft/chronicle/jlbh/JLBH.java | 6 +++--- 2 files changed, 3 insertions(+), 16 deletions(-) diff --git a/pom.xml b/pom.xml index 7501444f..830b3352 100644 --- a/pom.xml +++ b/pom.xml @@ -129,19 +129,6 @@ - - org.apache.maven.plugins - maven-compiler-plugin - - true - false - - -Xlint:all - -parameters - - - - - + 4.0.0 @@ -38,7 +39,7 @@ 0.70 1.23ea6 - + jlbh 1.27ea2-SNAPSHOT OpenHFT/JLBH @@ -129,19 +130,6 @@ - - org.apache.maven.plugins - maven-compiler-plugin - - true - false - - -Xlint:all - -parameters - - - - E[Cleanup]; ---- -A typical JLBH (Java Latency Benchmark Harness) execution follows a distinct lifecycle designed to ensure accurate and repeatable latency measurements. The key phases are: +A typical JLBH (Java Latency Benchmark Harness) execution follows a distinct lifecycle designed to ensure accurate and repeatable latency measurements. +The key phases are: == Setup This initial phase prepares the benchmark environment: -* **JLBH Configuration**: The user instantiates `JLBHOptions` to define all benchmark parameters (e.g., iterations, throughput, runs, coordinated omission settings). An instance of `JLBH` is then created using these options. +* **JLBH Configuration**: The user instantiates `JLBHOptions` to define all benchmark parameters (e.g., iterations, throughput, runs, coordinated omission settings). +An instance of `JLBH` is then created using these options. 
* **Task Initialization**: The `JLBHTask.init(JLBH jlbh)` method of the user-provided task is invoked. ** This allows the benchmark task to perform its own one-time setup, such as initializing resources or fixtures. ** Crucially, this is where additional measurement probes can be registered by calling `jlbh.addProbe("probeName")` to get `NanoSampler` instances for specific sub-sections of the task. @@ -30,24 +32,29 @@ This initial phase prepares the benchmark environment: Before any measurements are formally recorded, the harness executes a warm-up phase: -* **Warmup Iterations**: The `JLBHTask.run(long startTimeNs)` method is called `warmUpIterations` times (as specified in `JLBHOptions`). The `startTimeNs` passed is typically just `System.nanoTime()` for this phase. +* **Warmup Iterations**: The `JLBHTask.run(long startTimeNs)` method is called `warmUpIterations` times (as specified in `JLBHOptions`). +The `startTimeNs` passed is typically just `System.nanoTime()` for this phase. ** The primary goal is to allow the Java Virtual Machine (JVM) to perform Just-In-Time (JIT) compilation and optimize the benchmarked code paths. ** It also helps in bringing caches to a "warm" state and letting the system reach a more stable performance profile. -* **Sample Discarding**: Latency samples generated during the warmup phase are usually recorded into the histograms but are *not* part of the final reported results. The histograms are reset after this phase. +* **Sample Discarding**: Latency samples generated during the warmup phase are usually recorded into the histograms but are *not* part of the final reported results. +The histograms are reset after this phase. * **Task Notification**: After all warmup iterations are complete, the `JLBHTask.warmedUp()` method is called. * **Optional Pause**: If `JLBHOptions.pauseAfterWarmupMS` is greater than zero, JLBH will pause for the specified duration before proceeding to the execution phase. 
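The warm-up hand-off described above (samples recorded during warm-up, then reset at the boundary) can be modelled in a few lines. This is an illustrative, self-contained sketch: the `WarmupSketch` name and list-backed histogram are stand-ins for the harness's bucketed `Histogram`, not part of JLBH itself.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the warm-up hand-off: samples taken before the
// warm-up boundary are recorded but dropped once the boundary is reached,
// so only post-warm-up samples contribute to the reported histograms.
class WarmupSketch {
    private final long warmUpIterations;
    private final List<Long> histogram = new ArrayList<>();
    private long sampleCount;
    private boolean warmedUp;

    WarmupSketch(long warmUpIterations) {
        this.warmUpIterations = warmUpIterations;
    }

    void sample(long durationNs) {
        long current = ++sampleCount;
        histogram.add(durationNs);
        if (current == warmUpIterations && !warmedUp) {
            warmedUp = true;
            histogram.clear(); // discard warm-up samples, mirroring the histogram reset
        }
    }

    boolean isWarmedUp() {
        return warmedUp;
    }

    int reportedSamples() {
        return histogram.size();
    }
}
```

After three warm-up samples the model reports zero retained samples; only the fourth, post-warm-up sample survives into the reported set.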
== Execution (Measurement Runs) -This is the core phase where timed iterations are performed and latency data is collected. This phase consists of one or more "runs" (as configured by `JLBHOptions.runs`): +This is the core phase where timed iterations are performed and latency data is collected. +This phase consists of one or more "runs" (as configured by `JLBHOptions.runs`): * **Per Run**: ** The following steps are repeated for each configured run. ** **Iteration Loop**: The `JLBH` orchestrator executes a loop for the number of `iterations` specified in `JLBHOptions`. -*** **Start Time Calculation**: For each iteration, JLBH calculates an ideal `startTimeNs`. This calculation considers: +*** **Start Time Calculation**: For each iteration, JLBH calculates an ideal `startTimeNs`. +This calculation considers: **** The configured `throughput` (e.g., messages per second). **** The `LatencyDistributor` strategy, if one is set, to potentially vary the inter-iteration delay. -**** If `accountForCoordinatedOmission` is true (the default), `startTimeNs` represents the ideal scheduled time for the operation. JLBH will then busy-wait (spin) until `System.nanoTime()` reaches this `startTimeNs`, ensuring the task is dispatched as close as possible to its intended schedule, thus accounting for delays that might otherwise be missed. +**** If `accountForCoordinatedOmission` is true (the default), `startTimeNs` represents the ideal scheduled time for the operation. +JLBH will then busy-wait (spin) until `System.nanoTime()` reaches this `startTimeNs`, ensuring the task is dispatched as close as possible to its intended schedule, thus accounting for delays that might otherwise be missed. *** **Task Invocation**: The `JLBHTask.run(startTimeNs)` method is invoked with the calculated (and potentially waited-for) `startTimeNs`. *** **Sample Recording**: **** Within the `JLBHTask.run()` method, the user's code is executed. 
@@ -65,8 +72,10 @@ This is the core phase where timed iterations are performed and latency data is After all measurement runs are completed, the aggregated results are finalized and made available: * **Final Summary**: A comprehensive summary, often comparing results across all runs and highlighting percentile variations, is printed to the `PrintStream`. -* **Result Object Creation**: The final, immutable `JLBHResult` object is constructed. This object encapsulates all recorded data for all probes across all runs. -* **Result Consumption**: If a `JLBHResultConsumer` was provided during JLBH setup, its `accept(JLBHResult)` method is called, passing the `JLBHResult` object. This allows for programmatic access to the detailed results, for example, for assertion in automated tests or for custom serialization. +* **Result Object Creation**: The final, immutable `JLBHResult` object is constructed. +This object encapsulates all recorded data for all probes across all runs. +* **Result Consumption**: If a `JLBHResultConsumer` was provided during JLBH setup, its `accept(JLBHResult)` method is called, passing the `JLBHResult` object. +This allows for programmatic access to the detailed results, for example, for assertion in automated tests or for custom serialization. * **External Tools**: Users can further process the `JLBHResult` using utilities like: ** `JLBHResultSerializer` to write results to a CSV file for analysis in spreadsheets or other tools. ** `TeamCityHelper` to output statistics in a format suitable for TeamCity CI server integration. diff --git a/src/main/adoc/decision-log.adoc b/src/main/adoc/decision-log.adoc index 1113684c..9e14c9b0 100644 --- a/src/main/adoc/decision-log.adoc +++ b/src/main/adoc/decision-log.adoc @@ -5,7 +5,8 @@ Chronicle Software :toc: :lang: en-GB -This document records key architectural and project choices made during the development of Chronicle JLBH. 
Entries follow the Decision Record Template so that contributors can trace why a change occurred. +This document records key architectural and project choices made during the development of Chronicle JLBH. +Entries follow the Decision Record Template so that contributors can trace why a change occurred. == Decisions diff --git a/src/main/adoc/project-requirements.adoc b/src/main/adoc/project-requirements.adoc index 887b1336..5050f001 100644 --- a/src/main/adoc/project-requirements.adoc +++ b/src/main/adoc/project-requirements.adoc @@ -14,9 +14,11 @@ Key features include compensation for *co-ordinated omission*, configurable thro == Introduction === Purpose + This document specifies functional and non-functional requirements for Chronicle JLBH, ensuring that contributors and integrators share a common understanding of the product's goals and constraints. === Scope + The harness targets _low-latency_ Java applications (trading systems, micro-services, real-time analytics) that must quantify latencies at extreme percentiles under realistic load patterns. === Definitions, Acronyms and Abbreviations @@ -40,13 +42,16 @@ The harness targets _low-latency_ Java applications (trading systems, micro-serv * Chronicle blog post on JLBH in event loops: https://vanilla-java.github.io/2016/04/02/Microservices-in-the-Chronicle-World-Part-5.html[Vanilla-Java article] === Overview + Sections 2-5 describe the product context, interfaces, detailed system features, and quality attributes. Section 6 captures licensing, whilst Section 7 presents the glossary. == Overall Description === Product Perspective -JLBH is delivered as a *Maven* artefact (`net.openhft:jlbh`) and depends on Chronicle Core for low-level utilities. It may be embedded in unit tests, standalone mains, or invoked as an _EventLoop_ handler. + +JLBH is delivered as a *Maven* artefact (`net.openhft:jlbh`) and depends on Chronicle Core for low-level utilities. 
+It may be embedded in unit tests, standalone mains, or invoked as an _EventLoop_ handler. === Product Functions @@ -100,10 +105,13 @@ JLBH is delivered as a *Maven* artefact (`net.openhft:jlbh`) and depends on Chro * `JLBHResultSerializer` - CSV writer for analytics pipelines. === User Interface + CLI usage is demonstrated in _ExampleJLBHMain_ and test fixtures; no dedicated GUI is provided. === Hardware Interfaces -None. The harness interacts with OS timers and CPU affinity via JNI where available. + +None. +The harness interacts with OS timers and CPU affinity via JNI where available. === Software Interfaces @@ -115,49 +123,65 @@ None. The harness interacts with OS timers and CPU affinity via JNI where availa == System Features === FN-001 Latency Sampling + *Description*: Collect nanosecond deltas from user code via `sample(long)` or `sampleNanos(long)`. *Stimulus/Response*: Upon each iteration, `JLBHTask.run` calls `jlbh.sample(...)`; histogram updates occur; run summary printed at completion. *Priority*: *Essential*. === FN-002 Coordinated-Omission Compensation -Ensures that queue back-log does not under-represent tail latency. Enabled by default; opt-out via `accountForCoordinatedOmission(false)`. + +Ensures that queue back-log does not under-represent tail latency. +Enabled by default; opt-out via `accountForCoordinatedOmission(false)`. === FN-003 Additional Probes + Developers may time sub-stages (e.g., serialisation, network round-trip) through `addProbe(String)` and record them independently. === FN-004 OS Jitter Tracking -A background thread measures scheduling gaps exceeding a configured threshold (`recordJitterGreaterThanNs`). Monitoring is enabled by default and can be disabled via `recordOSJitter(false)`. Results are summarised alongside core latencies. + +A background thread measures scheduling gaps exceeding a configured threshold (`recordJitterGreaterThanNs`). +Monitoring is enabled by default and can be disabled via `recordOSJitter(false)`. 
+Results are summarised alongside core latencies. === FN-005 CSV Serialisation + Post-run, results can be persisted for offline analysis (`result.csv` by default). === FN-006 Event Loop Integration + `eventLoopHandler(EventLoop)` allows benchmarks to operate inside Chronicle threading framework, avoiding extra threads. == Non-Functional Requirements === NF-P-001 Performance + * Overhead per sample must remain below 100 ns when no additional probes are active. * Histogram generation must support >=200 M iterations without heap pressure. === NF-R-001 Reliability + * Harness must abort gracefully on thread interruptions or sample time-outs (`timeout(long)`). * Immutable result objects ensure thread-safe publication to external consumers. === NF-UX-001 Usability + * Fluent builder API; sensible defaults provide a runnable benchmark in ≤10 LOC. * ASCII table outputs are human-readable and CI-friendly. === NF-O-001 Portability + * Pure-Java codebase; no native compilation steps. * JDK-specific optimisations (e.g., *Zing* support) are runtime-detected. === NF-O-002 Maintainability + * 80 %+ unit-test line coverage with deterministic fixtures. * Code adheres to Chronicle parent POM style and SonarCloud quality gates. === NF-S-001 Security -No executable deserialisation; harness operates in-process. Users remain responsible for securing benchmarked code. + +No executable deserialisation; harness operates in-process. +Users remain responsible for securing benchmarked code. == Licensing @@ -192,8 +216,13 @@ public class NothingBenchmark implements JLBHTask { == Appendix B - Footnotes -* JLBH originated within the Chronicle Software open-source stack and is actively maintained. See https://github.com/OpenHFT/JLBH. -* The API reference highlights the focus on _co-ordinated omission_ and event-loop support. See https://www.javadoc.io/doc/net.openhft/chronicle-core/latest/net/openhft/chronicle/jlbh/JLBH.html. 
-* A multi-part blog series gives practical guidance on designing realistic latency tests. See http://www.rationaljava.com/2016/04/a-series-of-posts-on-jlbh-java-latency.html. -* Vanilla-Java's micro-services article demonstrates embedding JLBH in an event-driven architecture. See https://vanilla-java.github.io/2016/04/02/Microservices-in-the-Chronicle-World-Part-5.html. -* Original discussion of co-ordinated omission by Gil Tene motivates JLBH's default settings. See https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHfE. +* JLBH originated within the Chronicle Software open-source stack and is actively maintained. +See https://github.com/OpenHFT/JLBH. +* The API reference highlights the focus on _co-ordinated omission_ and event-loop support. +See https://www.javadoc.io/doc/net.openhft/chronicle-core/latest/net/openhft/chronicle/jlbh/JLBH.html. +* A multi-part blog series gives practical guidance on designing realistic latency tests. +See http://www.rationaljava.com/2016/04/a-series-of-posts-on-jlbh-java-latency.html. +* Vanilla-Java's micro-services article demonstrates embedding JLBH in an event-driven architecture. +See https://vanilla-java.github.io/2016/04/02/Microservices-in-the-Chronicle-World-Part-5.html. +* Original discussion of co-ordinated omission by Gil Tene motivates JLBH's default settings. +See https://groups.google.com/g/mechanical-sympathy/c/icNZJejUHfE. diff --git a/src/main/adoc/results-interpretation-guide.adoc b/src/main/adoc/results-interpretation-guide.adoc index f71e9eb6..89452e3f 100644 --- a/src/main/adoc/results-interpretation-guide.adoc +++ b/src/main/adoc/results-interpretation-guide.adoc @@ -8,11 +8,13 @@ Chronicle Software :icons: font *_Abstract_*:: -This guide explains how to read and understand the various outputs and summaries generated by Chronicle JLBH. Effective interpretation is key to deriving meaningful insights from your latency benchmarks. 
+This guide explains how to read and understand the various outputs and summaries generated by Chronicle JLBH. +Effective interpretation is key to deriving meaningful insights from your latency benchmarks. == Understanding JLBH Console Output Structure -When JLBH runs, it typically prints information to the console (or a configured `PrintStream`). Understanding this output is the first step: +When JLBH runs, it typically prints information to the console (or a configured `PrintStream`). +Understanding this output is the first step: * **Warm-up Messages**: Indicates the progress and completion of the warm-up phase (e.g., "Warm up complete (X iterations took Y s)"). * **Per-Run Detailed Output**: For each measurement run, JLBH prints a section: @@ -20,7 +22,8 @@ When JLBH runs, it typically prints information to the console (or a configured ** Run metadata: Total run time, latency distribution strategy, coordinated omission status, target throughput. ** **Probe Statistics**: For each active probe (the default "End to End" probe, any custom probes added via `jlbh.addProbe()`, and the "OS Jitter" probe if enabled), a line showing: *** Probe name and total samples recorded for that probe in the run. -*** A compact representation of key percentiles (e.g., "50/90 99/99.9 99.99 - worst was X / Y Z / A B - C"). These values are usually in microseconds. +*** A compact representation of key percentiles (e.g., "50/90 99/99.9 99.99 - worst was X / Y Z / A B - C"). +These values are usually in microseconds. * **Final Summary Tables**: After all runs are complete, JLBH prints summary tables for each probe: ** Header: "SUMMARY (probe_name) us". ** Rows for key percentiles (50.0, 90.0, 99.0, 99.9, etc., up to 'worst'). 
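The percentile rows in these tables can be approximated from raw samples with the nearest-rank method. The sketch below is illustrative and self-contained; JLBH itself reads these values out of bucketed histograms rather than sorting raw samples, so its reported figures are approximations of this exact computation.

```java
import java.util.Arrays;

// Nearest-rank percentile over raw nanosecond samples, giving the same
// shape of output as a summary table row (50/90/99/... plus worst).
class PercentileSketch {
    static long percentile(long[] samplesNs, double fraction) {
        long[] sorted = samplesNs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length); // 1-based nearest rank
        return sorted[Math.max(0, rank - 1)];
    }

    static long worst(long[] samplesNs) {
        long max = Long.MIN_VALUE;
        for (long v : samplesNs)
            max = Math.max(max, v);
        return max;
    }
}
```

For 100 samples spread evenly from 1 us to 100 us, this reports 50 us at the median, 99 us at p99 and 100 us as the worst, matching the columns of the summary tables.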
@@ -29,28 +32,38 @@ When JLBH runs, it typically prints information to the console (or a configured == Interpreting Latency Histograms and Percentiles -JLBH uses histograms to capture the full distribution of measured latencies, rather than just averages which can hide important details. These histograms are then used to derive percentile values. +JLBH uses histograms to capture the full distribution of measured latencies, rather than just averages which can hide important details. +These histograms are then used to derive percentile values. * **What is a Percentile?** -** A percentile indicates the value below which a given percentage of observations fall. For example, the 99th percentile latency is the value X where 99% of all measured latencies were less than or equal to X, and 1% were greater. +** A percentile indicates the value below which a given percentage of observations fall. +For example, the 99th percentile latency is the value X where 99% of all measured latencies were less than or equal to X, and 1% were greater. * **Key Percentiles in JLBH Output**: -** `50.0` (Median): The middle value; 50% of latencies are lower, 50% are higher. It gives a sense of the "typical" latency but can be misleading on its own if the distribution is skewed. +** `50.0` (Median): The middle value; 50% of latencies are lower, 50% are higher. +It gives a sense of the "typical" latency but can be misleading on its own if the distribution is skewed. ** `90.0`: 90% of requests were faster than this value. -** `99.0` (99th or "p99"): A common metric for Service Level Objectives (SLOs). Indicates that 1 out of 100 requests experienced this latency or worse. +** `99.0` (99th or "p99"): A common metric for Service Level Objectives (SLOs). +Indicates that 1 out of 100 requests experienced this latency or worse. ** `99.9` (99.9th or "p999", "three nines"): Important for high-performance systems; 1 out of 1000 requests were this slow or slower. 
** `99.99` (99.99th or "p9999", "four nines"): Critical for systems demanding very high consistency in performance. -** `worst`: The single highest latency recorded during the run. This can be affected by outliers but is important to note, especially if it's dramatically different from high percentiles. +** `worst`: The single highest latency recorded during the run. +This can be affected by outliers but is important to note, especially if it's dramatically different from high percentiles. * **Why Percentiles Matter**: ** Averages can be heavily skewed by a small number of very low or very high latencies, masking the true experience of most requests. -** Tail latencies (e.g., p99 and above) often drive user-perceived performance and system stability. A system with a good average latency but terrible tail latency can still feel unresponsive or unreliable. -* **Units**: Latency values in JLBH output are typically shown in microseconds (`us`) or sometimes nanoseconds (`ns`) if the values are very small. Pay attention to the units indicated in the table headers. -* **Custom Probes**: If you've added custom probes (e.g., `jlbh.addProbe("MyStage")`), each will have its own percentile table in the output. Analyzing these helps pinpoint which stage of an operation contributes most to the overall end-to-end latency. +** Tail latencies (e.g., p99 and above) often drive user-perceived performance and system stability. +A system with a good average latency but terrible tail latency can still feel unresponsive or unreliable. +* **Units**: Latency values in JLBH output are typically shown in microseconds (`us`) or sometimes nanoseconds (`ns`) if the values are very small. +Pay attention to the units indicated in the table headers. +* **Custom Probes**: If you've added custom probes (e.g., `jlbh.addProbe("MyStage")`), each will have its own percentile table in the output. 
+Analyzing these helps pinpoint which stage of an operation contributes most to the overall end-to-end latency. == Co-ordinated Omission (CO) Co-ordinated Omission is a critical concept in latency measurement, especially for systems that process requests at a certain rate. -* **The Problem**: If a system (or the benchmark harness itself) pauses or slows down, it might temporarily stop measuring. When it resumes, it might only measure the (potentially fast) service times of requests that are processed immediately, "omitting" the time that other requests would have spent waiting *during* the pause. This leads to unrealistically optimistic latency figures, particularly at the tail. +* **The Problem**: If a system (or the benchmark harness itself) pauses or slows down, it might temporarily stop measuring. +When it resumes, it might only measure the (potentially fast) service times of requests that are processed immediately, "omitting" the time that other requests would have spent waiting *during* the pause. +This leads to unrealistically optimistic latency figures, particularly at the tail. * **How JLBH Compensates**: ** JLBH addresses this by default (`accountForCoordinatedOmission = true` in `JLBHOptions`). ** For each iteration, JLBH calculates an *ideal scheduled start time* based on the target throughput. @@ -58,9 +71,13 @@ Co-ordinated Omission is a critical concept in latency measurement, especially f ** The latency is then calculated as `System.nanoTime() - idealStartTimeNs`. ** This means if the system or harness was delayed, the measured latency for that sample will correctly include that delay, reflecting the experience a request would have had if it arrived at its scheduled time. * **Impact on Results**: -** *With CO compensation (default)*: Histograms reflect a more accurate picture of the latency a user or client would experience, including any queuing or delays caused by the system struggling to keep up with the offered load. 
Tails are usually higher and more realistic. -** *Without CO compensation*: Histograms show only the raw processing time for the operations that were actually executed, potentially missing significant backpressure effects. This can be useful for understanding raw compute time but not overall system responsiveness under load. -* **Comparison**: Always state whether CO compensation was enabled when comparing benchmark results. Comparing a CO-compensated run with a non-compensated run is misleading. JLBH prints "Correcting for co-ordinated:true/false" in each run's output. +** *With CO compensation (default)*: Histograms reflect a more accurate picture of the latency a user or client would experience, including any queuing or delays caused by the system struggling to keep up with the offered load. +Tails are usually higher and more realistic. +** *Without CO compensation*: Histograms show only the raw processing time for the operations that were actually executed, potentially missing significant backpressure effects. +This can be useful for understanding raw compute time but not overall system responsiveness under load. +* **Comparison**: Always state whether CO compensation was enabled when comparing benchmark results. +Comparing a CO-compensated run with a non-compensated run is misleading. +JLBH prints "Correcting for co-ordinated:true/false" in each run's output. == OS Jitter Probe @@ -72,7 +89,8 @@ JLBH can optionally measure Operating System (OS) jitter, which refers to delays ** If this difference (a "gap") exceeds a configurable threshold (`recordJitterGreaterThanNs`, default 1 microsecond), it's recorded as an OS jitter event in a separate "OS Jitter" histogram. * **Interpreting Jitter Output**: ** The "OS Jitter" probe will have its own percentile table in the JLBH output. 
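A back-of-the-envelope sketch of the coordinated-omission compensation arithmetic described above (the names and numbers are invented for illustration; this is not JLBH's internal code):

```java
public class CoordinatedOmissionSketch {
    // CO-compensated latency: measure from the ideal scheduled start time,
    // not from when the harness actually got around to running the iteration.
    static long compensatedLatencyNs(long actualEndNs, long idealStartNs) {
        return actualEndNs - idealStartNs;
    }

    public static void main(String[] args) {
        long throughput = 100_000;                // target ops/sec
        long gapNs = 1_000_000_000L / throughput; // ideal inter-arrival gap: 10_000 ns
        long idealStartNs = 5 * gapNs;            // 6th iteration scheduled at t = 50_000 ns

        // Suppose a pause delayed the harness: the task only started at
        // t = 120_000 ns and its service time was 2_000 ns.
        long actualEndNs = 120_000 + 2_000;

        System.out.println("service time only (ns):      " + 2_000);
        System.out.println("CO-compensated latency (ns): "
                + compensatedLatencyNs(actualEndNs, idealStartNs));
    }
}
```

Without compensation the sample records 2,000 ns; with it, 72,000 ns, because the time the request would have spent waiting is kept in the tail.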
-** *High jitter values* (e.g., multiple milliseconds at p99 or worst) suggest that the OS, other processes, or hardware interrupts are frequently pausing the benchmark threads for significant periods. This is "machine noise."
+** *High jitter values* (e.g., multiple milliseconds at p99 or worst) suggest that the OS, other processes, or hardware interrupts are frequently pausing the benchmark threads for significant periods.
+This is "machine noise."
 ** *Low jitter values* (e.g., a few microseconds) indicate that the benchmark process was generally able to run without significant external interference from the OS.
 * **Why it's Useful**:
 ** Helps distinguish between latency caused by your application code versus latency caused by the environment.
@@ -84,29 +102,39 @@ JLBH can optionally measure Operating System (OS) jitter, which refers to delays
 == Analysing Run-to-Run Variation
-Latency measurements can vary between successive runs of the same benchmark on the same machine due to various factors. JLBH's summary output includes a "% Variation" column for each percentile.
+Latency measurements can vary between successive runs of the same benchmark on the same machine due to various factors.
+JLBH's summary output includes a "% Variation" column for each percentile.
 * **The Variation Column**: This shows the percentage difference between the maximum and minimum values recorded for a given percentile across all included runs.
-** By default (`JLBHOptions.skipFirstRun = NOT_SET`), if there are more than 3 runs, the first run is often excluded from this variation calculation as it might still be affected by late-stage JIT or other one-off initializations. You can control this explicitly with `skipFirstRun(boolean)`.
+** By default (`JLBHOptions.skipFirstRun = NOT_SET`), if there are more than 3 runs, the first run is often excluded from this variation calculation as it might still be affected by late-stage JIT or other one-off initializations.
+You can control this explicitly with `skipFirstRun(boolean)`. * **Interpreting Variation**: -** *Low Variation (e.g., < 5-10%)*: Suggests a stable benchmarking environment and that your application's performance is repeatable under the test conditions. This increases confidence in the results. -** *High Variation*: Indicates instability. Possible causes include: -*** **Insufficient Warm-up**: The JVM might still be optimizing code differently across runs. Try increasing `warmUpIterations`. -*** **Garbage Collection (GC) Pauses**: Infrequent but long GC pauses can hit some runs but not others, drastically affecting tail percentiles. Monitor GC activity. +** *Low Variation (e.g., < 5-10%)*: Suggests a stable benchmarking environment and that your application's performance is repeatable under the test conditions. +This increases confidence in the results. +** *High Variation*: Indicates instability. +Possible causes include: +*** **Insufficient Warm-up**: The JVM might still be optimizing code differently across runs. +Try increasing `warmUpIterations`. +*** **Garbage Collection (GC) Pauses**: Infrequent but long GC pauses can hit some runs but not others, drastically affecting tail percentiles. +Monitor GC activity. *** **External System Noise**: Other processes on the machine, network fluctuations (if applicable), or even hardware power management. *** **Adaptive JVM Behavior**: Some JVM optimizations are adaptive and can change behavior between runs if conditions shift slightly. *** **Benchmarked Code Instability**: The code itself might have inherent variability or be interacting with unstable external resources. -* **Goal**: Aim for reasonably tight spreads to ensure your conclusions are based on consistent behavior. Focus on understanding systematic performance rather than chasing the single "best" score from one run. Longer runs (`iterations`) can also help smooth out some variations and capture rarer events more consistently. 
+* **Goal**: Aim for reasonably tight spreads to ensure your conclusions are based on consistent behavior. +Focus on understanding systematic performance rather than chasing the single "best" score from one run. +Longer runs (`iterations`) can also help smooth out some variations and capture rarer events more consistently. == The Throughput vs. Latency Relationship A fundamental aspect of performance is the trade-off between throughput (how much work is done per unit of time) and latency (how long each piece of work takes). -* **JLBH's Role**: JLBH allows you to set a target `throughput` via `JLBHOptions`. This controls the rate at which `JLBHTask.run()` is invoked. +* **JLBH's Role**: JLBH allows you to set a target `throughput` via `JLBHOptions`. +This controls the rate at which `JLBHTask.run()` is invoked. * **Typical Behavior**: ** At low throughputs, a system can often maintain low latency. ** As throughput increases, contention for resources (CPU, memory, network, locks) typically rises. -** Beyond a certain point, latency will start to increase, often non-linearly. Queues may build up, and the system struggles to keep pace. +** Beyond a certain point, latency will start to increase, often non-linearly. +Queues may build up, and the system struggles to keep pace. * **Using JLBH to Explore**: ** Run your benchmark with different `throughput` settings to understand how your application's latency responds to varying load. ** This helps identify the "sweet spot" for your system or pinpoint throughput levels where performance starts to degrade significantly. @@ -117,40 +145,56 @@ A fundamental aspect of performance is the trade-off between throughput (how muc Avoiding these common mistakes can help ensure your benchmark results are meaningful and your conclusions are sound: * **Insufficient Warm-up**: -** *Issue*: The JVM performs JIT compilation, class loading, and cache warming during initial execution. 
Short or inadequate warm-up can mean you're measuring pre-optimized code or including one-off startup costs. -** *Solution*: Ensure `warmUpIterations` is sufficient. A common rule of thumb is at least the JVM's compilation threshold (e.g., `Jvm.compileThreshold()` in Chronicle Core, typically 10,000-15,000 iterations) plus some buffer for application-specific warm-up. Monitor if results stabilize after a certain number of runs. +** *Issue*: The JVM performs JIT compilation, class loading, and cache warming during initial execution. +Short or inadequate warm-up can mean you're measuring pre-optimized code or including one-off startup costs. +** *Solution*: Ensure `warmUpIterations` is sufficient. +A common rule of thumb is at least the JVM's compilation threshold (e.g., `Jvm.compileThreshold()` in Chronicle Core, typically 10,000-15,000 iterations) plus some buffer for application-specific warm-up. +Monitor if results stabilize after a certain number of runs. * **Ignoring Co-ordinated Omission Setting**: ** *Issue*: Comparing results where one run had CO compensation enabled and another didn't will lead to incorrect conclusions about performance changes. ** *Solution*: Always be aware of the `accountForCoordinatedOmission` setting (default is `true`) and report it alongside your results. * **Benchmark Overhead**: -** *Issue*: The act of measuring itself (calling `System.nanoTime()`, `jlbh.sample()`) has a small overhead. For extremely fast operations (deep sub-microsecond), this overhead can become a significant percentage of the measured time. -** *Solution*: JLBH is designed for low overhead (aiming for <100ns). Be mindful of this when benchmarking trivial operations. Consider batching very small operations if appropriate, or if the raw speed of an isolated, tiny operation is critical, a more specialized microbenchmarking tool like JMH might be used for that specific part, with JLBH testing the larger context. 
+** *Issue*: The act of measuring itself (calling `System.nanoTime()`, `jlbh.sample()`) has a small overhead. +For extremely fast operations (deep sub-microsecond), this overhead can become a significant percentage of the measured time. +** *Solution*: JLBH is designed for low overhead (aiming for <100ns). +Be mindful of this when benchmarking trivial operations. +Consider batching very small operations if appropriate, or if the raw speed of an isolated, tiny operation is critical, a more specialized microbenchmarking tool like JMH might be used for that specific part, with JLBH testing the larger context. * **Unrealistic Throughput Targets**: -** *Issue*: Setting `throughput` far beyond what the system can handle will just measure a system under constant overload, with latencies dominated by queue times. Setting it too low might not expose bottlenecks that appear under moderate load. +** *Issue*: Setting `throughput` far beyond what the system can handle will just measure a system under constant overload, with latencies dominated by queue times. +Setting it too low might not expose bottlenecks that appear under moderate load. ** *Solution*: Test a range of throughputs relevant to your expected operational load and capacity limits. * **Focusing Only on Averages**: ** *Issue*: Averages hide outliers and don't reflect the experience of users hitting tail latencies. ** *Solution*: Prioritize percentile analysis (p50, p90, p99, p99.9, worst) as JLBH encourages. * **Ignoring Custom Probe Data**: ** *Issue*: If you've set up custom probes for different stages of an operation but only look at the end-to-end numbers, you miss valuable insights into internal bottlenecks. -** *Solution*: Analyze the percentile data for each custom probe. Compare their contributions to the overall latency. +** *Solution*: Analyze the percentile data for each custom probe. +Compare their contributions to the overall latency. 
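One simple way to act on per-probe data is to compare each stage's contribution to the end-to-end figure. A sketch with invented stage names and timings (in a real benchmark each stage would be a separate custom probe histogram):

```java
public class ProbeContributionSketch {
    // Share of end-to-end latency attributable to one stage, in percent.
    static double sharePercent(long stageNs, long endToEndNs) {
        return 100.0 * stageNs / endToEndNs;
    }

    public static void main(String[] args) {
        // Hypothetical p99 figures per stage, in nanoseconds.
        long parseNs = 1_200, businessNs = 6_500, publishNs = 2_300;
        long endToEndNs = parseNs + businessNs + publishNs; // 10_000 ns

        System.out.printf("parse:    %.1f%%%n", sharePercent(parseNs, endToEndNs));
        System.out.printf("business: %.1f%%%n", sharePercent(businessNs, endToEndNs));
        System.out.printf("publish:  %.1f%%%n", sharePercent(publishNs, endToEndNs));
    }
}
```

Bear in mind that stage percentiles do not literally sum to the end-to-end percentile (a p99 end-to-end sample need not be p99 in every stage), so treat such a breakdown as a pointer to where to look, not as exact accounting.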
* **Environmental Inconsistency**: ** *Issue*: Running benchmarks on different hardware, with varying background OS/application load, different JVM versions, or different system configurations will yield different results. -** *Solution*: For comparative analysis, always use a consistent, controlled, and quiet environment. Document the environment meticulously. +** *Solution*: For comparative analysis, always use a consistent, controlled, and quiet environment. +Document the environment meticulously. * **Thread Pinning (Affinity) Issues**: -** *Issue*: On multi-core systems, OS thread scheduling can move your benchmark thread between cores, causing cache misses and jitter. Forgetting to pin, or pinning to a busy/shared core, can skew results. -** *Solution*: Use `JLBHOptions.acquireLock(Affinity::acquireLock)` (from OpenHFT Affinity library) to attempt to pin the main benchmark thread to an isolated core. Similarly for `jitterAffinity` if using the OS Jitter probe. Ensure these cores are genuinely isolated. +** *Issue*: On multi-core systems, OS thread scheduling can move your benchmark thread between cores, causing cache misses and jitter. +Forgetting to pin, or pinning to a busy/shared core, can skew results. +** *Solution*: Use `JLBHOptions.acquireLock(Affinity::acquireLock)` (from OpenHFT Affinity library) to attempt to pin the main benchmark thread to an isolated core. +Similarly for `jitterAffinity` if using the OS Jitter probe. +Ensure these cores are genuinely isolated. * **Not Enough Iterations/Run Duration**: ** *Issue*: Very short runs (low `iterations`) may not be statistically significant or may fail to capture infrequent, high-latency events like GC pauses or network timeouts. -** *Solution*: Run enough iterations to achieve statistical stability and to give rare events a chance to occur if they are part of the system's behavior under load. This often means runs lasting at least several seconds, or even minutes for deep stability tests. 
+** *Solution*: Run enough iterations to achieve statistical stability and to give rare events a chance to occur if they are part of the system's behavior under load. +This often means runs lasting at least several seconds, or even minutes for deep stability tests. == Deriving Actionable Insights The ultimate goal of benchmarking is to gain insights that can lead to improvements. * **Identify Bottlenecks**: -** High end-to-end latency? Use custom probes to break down the operation and see which stage is the culprit. -** High p99 or p99.9 values? This often points to issues like GC, network spikes, lock contention, or OS jitter. Correlate with OS Jitter probe and GC logs. +** High end-to-end latency? +Use custom probes to break down the operation and see which stage is the culprit. +** High p99 or p99.9 values? +This often points to issues like GC, network spikes, lock contention, or OS jitter. +Correlate with OS Jitter probe and GC logs. * **Validate Optimizations**: Run benchmarks before and after a code change to quantify its impact on latency across different percentiles and throughputs. * **Capacity Planning**: Determine the throughput your system can handle while meeting latency SLOs. * **Regression Testing**: Integrate JLBH benchmarks into your CI/CD pipeline to catch performance regressions automatically. diff --git a/src/main/config/spotbugs-exclude.xml b/src/main/config/spotbugs-exclude.xml index 573b9bc5..2811fc6b 100644 --- a/src/main/config/spotbugs-exclude.xml +++ b/src/main/config/spotbugs-exclude.xml @@ -1,8 +1,8 @@ + xmlns="https://github.com/spotbugs/filter/3.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="https://github.com/spotbugs/filter/3.0.0 https://raw.githubusercontent.com/spotbugs/spotbugs/4.8.0/spotbugs/etc/findbugsfilter.xsd"> @@ -10,14 +10,18 @@ - JLBH-OPS-201: CLI mode preserves historical behaviour (stack traces, exit-on-error, exposing percentile arrays). 
Refactor to soft-fail tracked under JLBH-OPS-260. + JLBH-OPS-201: CLI mode preserves historical behaviour (stack traces, exit-on-error, exposing + percentile arrays). Refactor to soft-fail tracked under JLBH-OPS-260. + - JLBH-OPS-202: Pseudo-random generator only seeds synthetic load; cryptographic strength unnecessary. + JLBH-OPS-202: Pseudo-random generator only seeds synthetic load; cryptographic strength + unnecessary. + @@ -31,14 +35,18 @@ - JLBH-OPS-204: Caller is the benchmark harness itself; returning backing arrays avoids millions of allocations during tight loops. + JLBH-OPS-204: Caller is the benchmark harness itself; returning backing arrays avoids millions of + allocations during tight loops. + - JLBH-OPS-205: Paths come from operator scripts; RuntimeException propagates IO failures without changing the established API. + JLBH-OPS-205: Paths come from operator scripts; RuntimeException propagates IO failures without + changing the established API. + diff --git a/src/main/java/net/openhft/chronicle/jlbh/JLBHOptions.java b/src/main/java/net/openhft/chronicle/jlbh/JLBHOptions.java index 471dbcba..9aef92ba 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/JLBHOptions.java +++ b/src/main/java/net/openhft/chronicle/jlbh/JLBHOptions.java @@ -49,6 +49,7 @@ public class JLBHOptions { boolean jitterAffinity; Supplier acquireLock = Affinity::acquireLock; long timeout; + /** * Number of iterations per second to be pushed through the benchmark * diff --git a/src/main/java/net/openhft/chronicle/jlbh/JLBHResult.java b/src/main/java/net/openhft/chronicle/jlbh/JLBHResult.java index 083e2a9e..39852141 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/JLBHResult.java +++ b/src/main/java/net/openhft/chronicle/jlbh/JLBHResult.java @@ -56,7 +56,7 @@ public interface JLBHResult { * * @param probeName name of the probe for which results are requested * @return an {@code Optional} containing the probe results or - * {@link Optional#empty()} if the probe is not present 
+ * {@link Optional#empty()} if the probe is not present */ @NotNull Optional probe(String probeName); @@ -185,7 +185,7 @@ interface RunResult { * {@code null} value will be returned.

* * @return duration of the 99.99th percentile or {@code null} when - * not available + * not available */ @Nullable Duration get9999thPercentile(); @@ -199,23 +199,41 @@ interface RunResult { Duration getWorst(); enum Percentile { - /** 50th percentile (median). */ + /** + * 50th percentile (median). + */ PERCENTILE_50TH, - /** 90th percentile. */ + /** + * 90th percentile. + */ PERCENTILE_90TH, - /** 99th percentile. */ + /** + * 99th percentile. + */ PERCENTILE_99TH, - /** 99.7th percentile. */ + /** + * 99.7th percentile. + */ PERCENTILE_99_7TH, - /** 99.9th percentile. */ + /** + * 99.9th percentile. + */ PERCENTILE_99_9TH, - /** 99.97th percentile. */ + /** + * 99.97th percentile. + */ PERCENTILE_99_97TH, - /** 99.99th percentile. */ + /** + * 99.99th percentile. + */ PERCENTILE_99_99TH, - /** 99.999th percentile. */ + /** + * 99.999th percentile. + */ PERCENTILE_99_999TH, - /** Highest (worst) observed latency. */ + /** + * Highest (worst) observed latency. + */ WORST } } diff --git a/src/main/java/net/openhft/chronicle/jlbh/LatencyDistributor.java b/src/main/java/net/openhft/chronicle/jlbh/LatencyDistributor.java index 047f9cc8..056d0b47 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/LatencyDistributor.java +++ b/src/main/java/net/openhft/chronicle/jlbh/LatencyDistributor.java @@ -29,9 +29,9 @@ public interface LatencyDistributor { * Adjust the delay between benchmark iterations. 
* * @param averageLatencyNS the nominal delay in nanoseconds derived from the - * configured throughput + * configured throughput * @return the actual number of nanoseconds to wait before the next - * iteration is executed + * iteration is executed */ long apply(long averageLatencyNS); } diff --git a/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java b/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java index 6431f734..842d2f8e 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java +++ b/src/main/java/net/openhft/chronicle/jlbh/TeamCityHelper.java @@ -73,13 +73,13 @@ public static void histo(@NotNull String name, @NotNull Histogram histo, @NotNul *

For each percentile a {@code buildStatisticValue} service message is * emitted so that TeamCity can record the values as build statistics.

* - * @param prefix prefix used to construct the TeamCity statistic key. The - * generated key has the form - * {@code prefix.<probe>.<percentile>} - * @param jlbh benchmark instance providing access to the percentile - * data for the last run - * @param iterations total number of iterations executed in the run; used to - * derive the set of percentiles that will be output + * @param prefix prefix used to construct the TeamCity statistic key. The + * generated key has the form + * {@code prefix.<probe>.<percentile>} + * @param jlbh benchmark instance providing access to the percentile + * data for the last run + * @param iterations total number of iterations executed in the run; used to + * derive the set of percentiles that will be output * @param printStream destination to which the TeamCity service messages are * written */ diff --git a/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java b/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java index 6f030fd1..8e49f030 100644 --- a/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java +++ b/src/main/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializer.java @@ -31,7 +31,7 @@ * JLBHResult result = consumer.get(); * JLBHResultSerializer.runResultToCSV(result); * } - * + *

* which will create a {@code result.csv} file in the working directory. Other * overloads allow specifying the file name, the set of probes to export and * whether the OS jitter probe should be included. @@ -89,10 +89,10 @@ public static void runResultToCSV(JLBHResult jlbhResult, String fileName, String * {@code namesOfProbes} a row is written if that probe exists. When * {@code includeOSJitter} is {@code true} the OS jitter probe is appended. * - * @param jlbhResult the benchmark result to serialise - * @param fileName path of the CSV file to create - * @param namesOfProbes additional probes to export - * @param includeOSJitter whether to include OS jitter metrics + * @param jlbhResult the benchmark result to serialise + * @param fileName path of the CSV file to create + * @param namesOfProbes additional probes to export + * @param includeOSJitter whether to include OS jitter metrics * @throws IOException if the file cannot be written */ public static void runResultToCSV(JLBHResult jlbhResult, String fileName, Iterable namesOfProbes, boolean includeOSJitter) throws IOException { diff --git a/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java b/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java index 6623825d..63c70a49 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java +++ b/src/test/java/net/openhft/chronicle/jlbh/JLBHAdditionalCoverageTest.java @@ -9,7 +9,8 @@ import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; -import static org.junit.Assert.*; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; public class JLBHAdditionalCoverageTest { /** diff --git a/src/test/java/net/openhft/chronicle/jlbh/JLBHTest.java b/src/test/java/net/openhft/chronicle/jlbh/JLBHTest.java index 09da656e..e814d5f6 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/JLBHTest.java +++ 
b/src/test/java/net/openhft/chronicle/jlbh/JLBHTest.java @@ -234,11 +234,11 @@ public void histogramSummariesAreCorrect() { System.out.println(baos); assertTrue(baos.toString().replace("\r", "").contains( "-------------------------------- SUMMARY (B) us ----------------------------------------------------\n" + - "Percentile run1 run2 run3 % Variation\n" + - "50.0: 0.10 0.10 0.10 0.00\n" + - "90.0: 0.10 0.10 0.10 0.00\n" + - "99.0: 0.10 0.10 0.10 0.00\n" + - "worst: 0.10 0.10 0.10 0.00")); + "Percentile run1 run2 run3 % Variation\n" + + "50.0: 0.10 0.10 0.10 0.00\n" + + "90.0: 0.10 0.10 0.10 0.00\n" + + "99.0: 0.10 0.10 0.10 0.00\n" + + "worst: 0.10 0.10 0.10 0.00")); } @Test diff --git a/src/test/java/net/openhft/chronicle/jlbh/PercentileSummaryTest.java b/src/test/java/net/openhft/chronicle/jlbh/PercentileSummaryTest.java index 7c6e3720..c7ca5b0a 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/PercentileSummaryTest.java +++ b/src/test/java/net/openhft/chronicle/jlbh/PercentileSummaryTest.java @@ -96,10 +96,10 @@ public void testThatVarianceIsCalculatedCorrectly() { final PercentileSummary percentileSummary = new PercentileSummary(false, percentileSummaries, percentiles); percentileSummary.printSummary(); - assertEquals((0.009 - 0.002) / (0.009 + 0.002 /2) * 100, percentileSummary.calculateVariance(0), DELTA); - assertEquals((0.009 - 0.003) / (0.009 + 0.003 /2) * 100, percentileSummary.calculateVariance(1), DELTA); - assertEquals((0.009 - 0.004) / (0.009 + 0.004 /2) * 100, percentileSummary.calculateVariance(2), DELTA); - assertEquals((0.009 - 0.005) / (0.009 + 0.005 /2) * 100, percentileSummary.calculateVariance(3), DELTA); + assertEquals((0.009 - 0.002) / (0.009 + 0.002 / 2) * 100, percentileSummary.calculateVariance(0), DELTA); + assertEquals((0.009 - 0.003) / (0.009 + 0.003 / 2) * 100, percentileSummary.calculateVariance(1), DELTA); + assertEquals((0.009 - 0.004) / (0.009 + 0.004 / 2) * 100, percentileSummary.calculateVariance(2), DELTA); + 
assertEquals((0.009 - 0.005) / (0.009 + 0.005 / 2) * 100, percentileSummary.calculateVariance(3), DELTA); assertEquals(0, percentileSummary.calculateVariance(percentiles.length - 2), DELTA); } @@ -119,9 +119,9 @@ public void testVarianceSkipFirst() { percentileSummary.printSummary(); // 50th percentile - assertEquals((0.009 - 0.003) / (0.009 + 0.003 /2) * 100, percentileSummary.calculateVariance(0), DELTA); + assertEquals((0.009 - 0.003) / (0.009 + 0.003 / 2) * 100, percentileSummary.calculateVariance(0), DELTA); // 90th has no value in the first run, so nothing to skip? - assertEquals((0.009 - 0.003) / (0.009 + 0.003 /2) * 100, percentileSummary.calculateVariance(1), DELTA); + assertEquals((0.009 - 0.003) / (0.009 + 0.003 / 2) * 100, percentileSummary.calculateVariance(1), DELTA); } @Test @@ -144,10 +144,10 @@ public void testForEachRow() { receivedValues.add(values); receivedVariances.add(variance); }); - assertArrayEquals(new Double[] {0.5, 0.9, 1.0}, receivedPercentiles.toArray(new Double[] {})); - assertArrayEquals(new double[] {0.002, 0.003}, receivedValues.get(0), DELTA); - assertArrayEquals(new double[] {POSITIVE_INFINITY, 0.003}, receivedValues.get(1), DELTA); - assertArrayEquals(new double[] {0.002, 0.003}, receivedValues.get(2), DELTA); - assertArrayEquals(new Double[] {25.0, 0.0, 25.0}, receivedVariances.toArray(new Double[] {})); + assertArrayEquals(new Double[]{0.5, 0.9, 1.0}, receivedPercentiles.toArray(new Double[]{})); + assertArrayEquals(new double[]{0.002, 0.003}, receivedValues.get(0), DELTA); + assertArrayEquals(new double[]{POSITIVE_INFINITY, 0.003}, receivedValues.get(1), DELTA); + assertArrayEquals(new double[]{0.002, 0.003}, receivedValues.get(2), DELTA); + assertArrayEquals(new Double[]{25.0, 0.0, 25.0}, receivedVariances.toArray(new Double[]{})); } } diff --git a/src/test/java/net/openhft/chronicle/jlbh/SimpleOSJitterBenchmark.java b/src/test/java/net/openhft/chronicle/jlbh/SimpleOSJitterBenchmark.java index 95e2a2b4..31807efd 100644 --- 
a/src/test/java/net/openhft/chronicle/jlbh/SimpleOSJitterBenchmark.java +++ b/src/test/java/net/openhft/chronicle/jlbh/SimpleOSJitterBenchmark.java @@ -35,7 +35,7 @@ public static void main(String[] args) { .recordOSJitter(true) .runs(4) .jlbhTask(new SimpleOSJitterBenchmark()); - new JLBH(lth,System.out, jlbhResult -> { + new JLBH(lth, System.out, jlbhResult -> { jlbhResult.osJitter().ifPresent(probeResult -> { JLBHResult.RunResult runResult = probeResult.summaryOfLastRun(); System.out.println("runResult = " + runResult); diff --git a/src/test/java/net/openhft/chronicle/jlbh/SimpleResultSerializerBenchmark.java b/src/test/java/net/openhft/chronicle/jlbh/SimpleResultSerializerBenchmark.java index 84695357..b074da07 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/SimpleResultSerializerBenchmark.java +++ b/src/test/java/net/openhft/chronicle/jlbh/SimpleResultSerializerBenchmark.java @@ -33,8 +33,8 @@ public static void main(String[] args) { .throughput(100_000) // .accountForCoordinatedOmission(true) .runs(2) - .jlbhTask(new SimpleResultSerializerBenchmark() ); - new JLBH(lth, System.out,jlbhResult -> { + .jlbhTask(new SimpleResultSerializerBenchmark()); + new JLBH(lth, System.out, jlbhResult -> { try { System.out.println("Serializing result..."); JLBHResultSerializer.runResultToCSV(jlbhResult); diff --git a/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java index 18acfe37..f62673ca 100644 --- a/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java +++ b/src/test/java/net/openhft/chronicle/jlbh/util/JLBHResultSerializerTest.java @@ -12,7 +12,8 @@ import java.time.Duration; import java.util.*; -import static org.junit.Assert.*; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; public class JLBHResultSerializerTest { From 0e6775d476f421d5ad14d7b24d2d077f849de35b Mon Sep 17 00:00:00 2001 From: Peter 
Lawrey Date: Tue, 28 Oct 2025 10:59:48 +0000 Subject: [PATCH 08/12] Refactor documentation for benchmark lifecycle and latency recording --- LICENSE.adoc | 2 +- README.adoc | 50 ++--- src/main/adoc/architecture.adoc | 184 +++++++-------- src/main/adoc/benchmark-lifecycle.adoc | 93 ++++---- src/main/adoc/jlbh-cookbook.adoc | 4 +- .../adoc/results-interpretation-guide.adoc | 212 ++++++++++-------- 6 files changed, 287 insertions(+), 258 deletions(-) diff --git a/LICENSE.adoc b/LICENSE.adoc index 5c5c264f..f4505664 100644 --- a/LICENSE.adoc +++ b/LICENSE.adoc @@ -6,4 +6,4 @@ You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and limitations under the License. \ No newline at end of file +See the License for the specific language governing permissions and limitations under the License. 
diff --git a/README.adoc b/README.adoc index 1d39ab03..a8699078 100644 --- a/README.adoc +++ b/README.adoc @@ -7,7 +7,7 @@ Chronicle Software image:https://maven-badges.herokuapp.com/maven-central/net.openhft/jlbh/badge.svg[caption="",link=https://maven-badges.herokuapp.com/maven-central/net.openhft/jlbh] image:https://javadoc.io/badge2/net.openhft/JLBH/javadoc.svg[link="https://www.javadoc.io/doc/net.openhft/chronicle-wire/latest/index.html"] -//image:https://javadoc-badge.appspot.com/net.openhft/jlbh.svg?label=javadoc[JavaDoc, link=https://www.javadoc.io/doc/net.openhft/jlbh] +// image:https://javadoc-badge.appspot.com/net.openhft/jlbh.svg?label=javadoc[JavaDoc, link=https://www.javadoc.io/doc/net.openhft/jlbh] image:https://img.shields.io/github/license/OpenHFT/JLBH[GitHub] image:https://img.shields.io/badge/release%20notes-subscribe-brightgreen[link="https://chronicle.software/release-notes/"] image:https://sonarcloud.io/api/project_badges/measure?project=OpenHFT_JLBH&metric=alert_status[link="https://sonarcloud.io/dashboard?id=OpenHFT_JLBH"] @@ -173,10 +173,10 @@ The following table summarises the main non-functional quality attributes derive http://www.rationaljava.com/2016/04/jlbh-introducing-java-latency.html[Introducing JLBH] -- What is JLBH -- What was the motivation for JLBH -- Differences between JMH and JLBH -- Quick start guide +* What is JLBH +* What was the motivation for JLBH +* Differences between JMH and JLBH +* Quick start guide === Sample percentile output @@ -194,43 +194,43 @@ worst: 12.56 12.56 10.61 10.93 http://www.rationaljava.com/2016/04/jlbh-examples-1-why-code-should-be.html[Why Code Should be Benchmarked in Context] -- A side by side example using JMH and JLBH for Date serialisation -- Measuring Date serialisation in a microbenchmark -- Measuring Date serialisation as part of a proper application -- How to add a probe to your JLBH benchmark -- Understanding the importance of measuring code in context +* A side by side example using JMH 
and JLBH for Date serialisation +* Measuring Date serialisation in a microbenchmark +* Measuring Date serialisation as part of a proper application +* How to add a probe to your JLBH benchmark +* Understanding the importance of measuring code in context Co-ordinated omission occurs when latency measurements ignore the time requests wait to be serviced, leading to unrealistically low results. http://www.rationaljava.com/2016/04/jlbh-examples-2-accounting-for.html[Accounting for Coordinated Omission] -- Running JLBH with and without accounting for coordinated omission -- An example illustrating the numerical impact of coordinated omission -- A discussion about flow control +* Running JLBH with and without accounting for coordinated omission +* An example illustrating the numerical impact of coordinated omission +* A discussion about flow control http://www.rationaljava.com/2016/04/jlbh-examples-3-affects-of-throughput.html[The Affects of Throughput on Latency] Note: the blog deliberately uses "Affects" in the title while describing the effects of throughput on latency. -- A discussion about the effects of throughput on latency -- How to use JLBH to measure TCP loopback -- Adding probes to test both halves of the TCP round trip -- Watching the effect of increasing throughput on latency -- Understanding that you have to drop throughput to achieve good latencies at high percentiles. +* A discussion about the effects of throughput on latency +* How to use JLBH to measure TCP loopback +* Adding probes to test both halves of the TCP round trip +* Watching the effect of increasing throughput on latency +* Understanding that you have to drop throughput to achieve good latencies at high percentiles. 
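The numerical impact of co-ordinated omission described in these posts can be reproduced with a few lines of plain Java. The sketch below does not use JLBH, and the request count, 1 ms schedule, and single 100 ms stall are invented for illustration: recording service time alone reports a 99th percentile of 1 ms, while measuring each request from its intended start time, as JLBH does when accounting for coordinated omission, reports 100 ms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simulated timings only -- no real clock, no JLBH. Requests are scheduled
// every 1 ms against a single-threaded service; one request stalls for 100 ms,
// so everything queued behind it is delayed. Service time alone hides that.
public final class CoordinatedOmissionDemo {

    /** Returns {naive 99th, corrected 99th} in milliseconds. */
    public static double[] percentile99s() {
        int requests = 1000;
        long intervalMs = 1;    // intended schedule: one request per millisecond
        long stallMs = 100;     // request #100 takes 100 ms instead of 1 ms
        List<Long> naive = new ArrayList<>();      // service time only
        List<Long> corrected = new ArrayList<>();  // completion - intended start
        long serviceFreeAt = 0; // when the single-threaded service is next idle
        for (int i = 0; i < requests; i++) {
            long intendedStart = i * intervalMs;
            long serviceTime = (i == 100) ? stallMs : 1;
            long actualStart = Math.max(intendedStart, serviceFreeAt);
            long completion = actualStart + serviceTime;
            serviceFreeAt = completion;
            naive.add(serviceTime);                    // what a naive benchmark records
            corrected.add(completion - intendedStart); // CO-corrected latency
        }
        return new double[]{percentile(naive, 0.99), percentile(corrected, 0.99)};
    }

    // Nearest-rank percentile over a sorted copy of the samples.
    static double percentile(List<Long> samples, double fraction) {
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int idx = (int) Math.ceil(fraction * sorted.size()) - 1;
        return sorted.get(Math.max(idx, 0));
    }

    public static void main(String[] args) {
        double[] p99 = percentile99s();
        System.out.println("99th percentile, service time only: " + p99[0] + " ms");
        System.out.println("99th percentile, accounting for CO: " + p99[1] + " ms");
    }
}
```

The hundredfold gap between the two figures is the same effect the second blog post demonstrates with real workloads.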
http://www.rationaljava.com/2016/04/jlbh-examples-4-benchmarking-quickfix.html[Benchmarking QuickFix vs ChronicleFix] -- Using JLBH to benchmark QuickFIX -- Observing how QuickFix latencies degrade through the percentiles -- Comparing QuickFIX with Chronicle FIX +* Using JLBH to benchmark QuickFIX +* Observing how QuickFix latencies degrade through the percentiles +* Comparing QuickFIX with Chronicle FIX == Common Questions === Using JLBH as part of automated performance/performance regression testing -- Extract latency percentiles from `net.openhft.chronicle.jlbh.JLBHTest::shouldProvideResultData` for xUnit tests. -- Run them on a production-like CI server. -- When local hardware is adequate, execute them along with other tests. -- Integrate these benchmarks into the TDD cycle to expose latency-related design issues early. +* Extract latency percentiles from `net.openhft.chronicle.jlbh.JLBHTest::shouldProvideResultData` for xUnit tests. +* Run them on a production-like CI server. +* When local hardware is adequate, execute them along with other tests. +* Integrate these benchmarks into the TDD cycle to expose latency-related design issues early. === Warm-up iterations diff --git a/src/main/adoc/architecture.adoc b/src/main/adoc/architecture.adoc index cd223d26..df620736 100644 --- a/src/main/adoc/architecture.adoc +++ b/src/main/adoc/architecture.adoc @@ -86,114 +86,114 @@ The `startTimeNs` is the calculated ideal start time for the iteration. The harness follows a well-defined lifecycle, detailed visually in link:benchmark-lifecycle.adoc[benchmark lifecycle diagram]. A typical execution sequence is as follows: -* **Initialization**: -** The user configures `JLBHOptions` defining all benchmark parameters. -** A `JLBH` instance is created with these options, an optional `PrintStream` for output, and an optional `JLBHResultConsumer` for programmatic result access. -** The `JLBHTask.init(JLBH)` method is called, allowing the user's task to perform setup. 
-*** During `init`, custom `NanoSampler` probes can be registered via `jlbh.addProbe(String)`. +Initialization :: +* The user configures `JLBHOptions` defining all benchmark parameters. +* A `JLBH` instance is created with these options, an optional `PrintStream` for output, and an optional `JLBHResultConsumer` for programmatic result access. +* The `JLBHTask.init(JLBH)` method is called, allowing the user's task to perform setup. +** During `init`, custom `NanoSampler` probes can be registered via `jlbh.addProbe(String)`. Each probe is backed by its own `Histogram`. -** If `recordOSJitter` is enabled in `JLBHOptions`, the `OSJitterMonitor` background thread is started. - -* **Warm-up Phase**: -** The `JLBHTask.run(System.nanoTime())` method is invoked `warmUpIterations` times. -*** The primary purpose is to allow the JVM's JIT compiler to optimize the code and for the system to reach a steady operational state. -*** Latency samples collected during this phase are typically added to histograms but are reset and discarded before measurement runs begin. -** After warm-up iterations, `JLBHTask.warmedUp()` is called. -** All probe histograms (end-to-end, custom, OS jitter) are reset. - -* **Measurement Runs**: -** This phase is repeated for the number of `runs` specified in `JLBHOptions`. -** For each run: -*** A loop executes for the configured number of `iterations`. -*** In each iteration: -**** The `JLBH` orchestrator calculates the `startTimeNs` for the task invocation. -***** This calculation is influenced by the configured `throughput` and the `LatencyDistributor` strategy. -***** If `accountForCoordinatedOmission` is true, `startTimeNs` represents the ideal scheduled time. +* If `recordOSJitter` is enabled in `JLBHOptions`, the `OSJitterMonitor` background thread is started. + +Warm-up Phase :: +* The `JLBHTask.run(System.nanoTime())` method is invoked `warmUpIterations` times. 
+** The primary purpose is to allow the JVM's JIT compiler to optimize the code and for the system to reach a steady operational state. +** Latency samples collected during this phase are typically added to histograms but are reset and discarded before measurement runs begin. +* After warm-up iterations, `JLBHTask.warmedUp()` is called. +* All probe histograms (end-to-end, custom, OS jitter) are reset. + +Measurement Runs :: +* This phase is repeated for the number of `runs` specified in `JLBHOptions`. +* For each run: +** A loop executes for the configured number of `iterations`. +** In each iteration: +*** The `JLBH` orchestrator calculates the `startTimeNs` for the task invocation. +**** This calculation is influenced by the configured `throughput` and the `LatencyDistributor` strategy. +**** If `accountForCoordinatedOmission` is true, `startTimeNs` represents the ideal scheduled time. The harness may busy-wait (spin using `System.nanoTime()`) if the current time is before this ideal `startTimeNs` to ensure the task starts as close to the scheduled time as possible. -**** `JLBHTask.run(startTimeNs)` is invoked with this calculated start time. -***** The user's benchmark logic executes. -***** To record the end-to-end latency, the task calls `jlbh.sample(System.nanoTime() - startTimeNs)`. -***** To record latency for sub-stages, the task calls `sampleNanos()` on any custom `NanoSampler` probes obtained during `init`. -**** Each call to `sample()` or `sampleNanos()` on a sampler results in the duration being recorded in its corresponding `Histogram`. -*** After all iterations for the current run are complete: -**** The `JLBHTask.runComplete()` method is invoked. -**** Statistics for the completed run (percentile values from each histogram) are typically printed to the configured `PrintStream`. -**** The percentile data for this run is stored as part of the accumulating results. -**** All probe histograms are reset to prepare for the next measurement run (if any). 
- -* **Completion**: -** After all measurement runs have finished: -*** A final summary report, often comparing results across runs and showing variation, is printed to the `PrintStream`. -*** The complete, immutable `JLBHResult` object, containing all data for all probes across all runs, is constructed. -*** If a `JLBHResultConsumer` was provided, its `accept(JLBHResult)` method is called with the final result. -** The `JLBHTask.complete()` method is invoked for any final cleanup by the user's task. -** The `OSJitterMonitor` thread is terminated if it was running. +*** `JLBHTask.run(startTimeNs)` is invoked with this calculated start time. +**** The user's benchmark logic executes. +**** To record the end-to-end latency, the task calls `jlbh.sample(System.nanoTime() - startTimeNs)`. +**** To record latency for sub-stages, the task calls `sampleNanos()` on any custom `NanoSampler` probes obtained during `init`. +*** Each call to `sample()` or `sampleNanos()` on a sampler results in the duration being recorded in its corresponding `Histogram`. +** After all iterations for the current run are complete: +*** The `JLBHTask.runComplete()` method is invoked. +*** Statistics for the completed run (percentile values from each histogram) are typically printed to the configured `PrintStream`. +*** The percentile data for this run is stored as part of the accumulating results. +*** All probe histograms are reset to prepare for the next measurement run (if any). + +Completion :: +* After all measurement runs have finished: +** A final summary report, often comparing results across runs and showing variation, is printed to the `PrintStream`. +** The complete, immutable `JLBHResult` object, containing all data for all probes across all runs, is constructed. +** If a `JLBHResultConsumer` was provided, its `accept(JLBHResult)` method is called with the final result. +* The `JLBHTask.complete()` method is invoked for any final cleanup by the user's task. 
+* The `OSJitterMonitor` thread is terminated if it was running. == Threading Model JLBH has a specific threading model that users and contributors should understand: -* **Primary Harness Thread**: -** When `JLBH.start()` is called, it orchestrates the entire benchmark lifecycle (initialization, warm-up, measurement runs, completion) on the *calling thread*. -** All `JLBHTask` lifecycle methods (`init`, `run`, `warmedUp`, `runComplete`, `complete`) are invoked sequentially on this single harness thread. -*** This implies that `JLBHTask` implementations generally do not need to be concerned about thread-safety for their internal state when accessed *only* by these JLBH-invoked lifecycle methods. -** CPU affinity for this thread can be suggested via `JLBHOptions.acquireLock`. - -* **Event Loop Integration**: -** As an alternative to `JLBH.start()`, the benchmark can be driven by an external `EventLoop` (e.g., from Chronicle Services) by calling `JLBH.eventLoopHandler(EventLoop)`. -** Coordinated omission accounting must remain enabled; calling `eventLoopHandler` with `accountForCoordinatedOmission(false)` throws an `UnsupportedOperationException`. -** In this mode, the warm-up and measurement iterations are scheduled and executed on the provided event loop's thread, using internal `EventHandler` implementations (`WarmupHandler`, `JLBHEventHandler`). - -* **User-Spawned Threads within `JLBHTask`**: -** The code being benchmarked *within* `JLBHTask.run()` (i.e., the "business logic") *can* be multi-threaded if the application itself is multi-threaded. -** However, critical interactions with the `JLBH` instance *must* be managed carefully. -*** Calling `jlbh.sample()` or methods on `NanoSampler` probes (like `probe.sampleNanos()`) should ideally be done only by the thread that `JLBH` used to call `JLBHTask.run()`. 
-*** The `JLBH` class is annotated `@SingleThreaded`, and its internal state (including the histograms) is generally not designed for concurrent modification from multiple user threads without external synchronization. +Primary Harness Thread :: +* When `JLBH.start()` is called, it orchestrates the entire benchmark lifecycle (initialization, warm-up, measurement runs, completion) on the _calling thread_. +* All `JLBHTask` lifecycle methods (`init`, `run`, `warmedUp`, `runComplete`, `complete`) are invoked sequentially on this single harness thread. +** This implies that `JLBHTask` implementations generally do not need to be concerned about thread-safety for their internal state when accessed _only_ by these JLBH-invoked lifecycle methods. +* CPU affinity for this thread can be suggested via `JLBHOptions.acquireLock`. + +Event Loop Integration :: +* As an alternative to `JLBH.start()`, the benchmark can be driven by an external `EventLoop` (e.g., from Chronicle Services) by calling `JLBH.eventLoopHandler(EventLoop)`. +* Coordinated omission accounting must remain enabled; calling `eventLoopHandler` with `accountForCoordinatedOmission(false)` throws an `UnsupportedOperationException`. +* In this mode, the warm-up and measurement iterations are scheduled and executed on the provided event loop's thread, using internal `EventHandler` implementations (`WarmupHandler`, `JLBHEventHandler`). + +User-Spawned Threads within `JLBHTask` :: +* The code being benchmarked _within_ `JLBHTask.run()` (i.e., the "business logic") _can_ be multi-threaded if the application itself is multi-threaded. +* However, critical interactions with the `JLBH` instance _must_ be managed carefully. +** Calling `jlbh.sample()` or methods on `NanoSampler` probes (like `probe.sampleNanos()`) should ideally be done only by the thread that `JLBH` used to call `JLBHTask.run()`. 
+** The `JLBH` class is annotated `@SingleThreaded`, and its internal state (including the histograms) is generally not designed for concurrent modification from multiple user threads without external synchronization. Chronicle Core's `Histogram` has specific single-writer assumptions unless explicitly used with synchronization. -* **`OSJitterMonitor` Thread**: -** If OS jitter recording is enabled (`JLBHOptions.recordOSJitter == true`), a dedicated background thread (`OSJitterMonitor`) is started. -** This thread runs independently of the main harness thread. -** Its sole purpose is to repeatedly sample `System.nanoTime()` to detect scheduling delays or pauses exceeding `JLBHOptions.recordJitterGreaterThanNs` and record these durations into a dedicated `Histogram`. -** Its CPU affinity can be optionally controlled via `JLBHOptions.jitterAffinity`. +`OSJitterMonitor` Thread :: +* If OS jitter recording is enabled (`JLBHOptions.recordOSJitter == true`), a dedicated background thread (`OSJitterMonitor`) is started. +* This thread runs independently of the main harness thread. +* Its sole purpose is to repeatedly sample `System.nanoTime()` to detect scheduling delays or pauses exceeding `JLBHOptions.recordJitterGreaterThanNs` and record these durations into a dedicated `Histogram`. +* Its CPU affinity can be optionally controlled via `JLBHOptions.jitterAffinity`. -* **Result Consumption**: -** The `JLBHResult` object containing all benchmark data is immutable by design. -** To safely access this result from a thread other than the main harness thread (e.g., an application thread wanting to query results after a test run), `JLBHResultConsumer.newThreadSafeInstance()` should be used. +Result Consumption :: +* The `JLBHResult` object containing all benchmark data is immutable by design. 
+* To safely access this result from a thread other than the main harness thread (e.g., an application thread wanting to query results after a test run), `JLBHResultConsumer.newThreadSafeInstance()` should be used. This creates a `ThreadSafeJLBHResultConsumer` which uses a `volatile` field to ensure proper publication of the immutable result. == Data Management and Results JLBH pays careful attention to how latency data is recorded, structured, and reported: -* **Latency Recording**: -** All latency durations are measured and recorded in nanoseconds. -** `System.nanoTime()` is the fundamental source for time measurements, chosen for its high resolution and (on most systems) monotonic behavior. -** For each active probe (end-to-end, user-defined additional probes, OS jitter), a `net.openhft.chronicle.core.util.Histogram` instance is maintained. -*** These histograms are configured by `JLBH.createHistogram()` for high precision (typically 8 significant figures of value) and a wide dynamic range (typically 35 bits, covering from nanoseconds to many seconds). +Latency Recording :: +* All latency durations are measured and recorded in nanoseconds. +* `System.nanoTime()` is the fundamental source for time measurements, chosen for its high resolution and (on most systems) monotonic behavior. +* For each active probe (end-to-end, user-defined additional probes, OS jitter), a `net.openhft.chronicle.core.util.Histogram` instance is maintained. +** These histograms are configured by `JLBH.createHistogram()` for high precision (typically 8 significant figures of value) and a wide dynamic range (typically 35 bits, covering from nanoseconds to many seconds). -* **Immutable Results**: -** A core design principle is the immutability of benchmark results. +Immutable Results :: +* A core design principle is the immutability of benchmark results. Once a benchmark or a run is complete, the generated data is fixed. 
-** The `JLBHResult` interface and its constituent parts (`ProbeResult`, `RunResult`) are designed to be immutable. -** This is enforced by internal implementations like `ImmutableJLBHResult`, `ImmutableProbeResult`, and `ImmutableRunResult`. -** Immutability simplifies reasoning about results, ensures thread-safe access for consumers, and allows for reliable and repeatable reporting. - -* **Structure of Results (`JLBHResult`)**: -** The top-level `JLBHResult` object provides access to: -*** `endToEnd()`: Returns a `ProbeResult` for the overall end-to-end latency measurements. -*** `probe(String probeName)`: Returns an `Optional` for any user-defined probe, identified by its name. -*** `osJitter()`: Returns an `Optional` for the OS jitter measurements, if recorded. -** Each `JLBHResult.ProbeResult` object contains: -*** `summaryOfLastRun()`: A `RunResult` detailing the statistics of the final measurement run for that probe. -*** `eachRunSummary()`: A `List`, providing the statistics for every measurement run of that probe, in execution order. -** Each `JLBHResult.RunResult` object contains: -*** `percentiles()`: A `Map`, mapping enum keys (e.g., `PERCENTILE_50TH`, `PERCENTILE_99TH`, `WORST`) to their corresponding latency `Duration` values. -*** Convenience getter methods like `get50thPercentile()`, `get99thPercentile()`, `getWorst()`. - -* **Output and Serialization**: -** Human-readable summaries are printed to the configured `PrintStream` (default `System.out`) during and after benchmark runs. +* The `JLBHResult` interface and its constituent parts (`ProbeResult`, `RunResult`) are designed to be immutable. +* This is enforced by internal implementations like `ImmutableJLBHResult`, `ImmutableProbeResult`, and `ImmutableRunResult`. +* Immutability simplifies reasoning about results, ensures thread-safe access for consumers, and allows for reliable and repeatable reporting. 
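The immutability pattern described above can be sketched with stand-in types. `RunResultSketch` and its `Percentile` enum below are hypothetical, not the actual JLBH classes (`ImmutableRunResult` differs in detail); the sketch shows only the defensive-copy-plus-unmodifiable-view technique.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in illustrating the immutability pattern.
public final class RunResultSketch {

    public enum Percentile { PERCENTILE_50TH, PERCENTILE_99TH, WORST }

    private final Map<Percentile, Duration> percentiles;

    public RunResultSketch(Map<Percentile, Duration> percentiles) {
        // Defensive copy: later mutation of the caller's map cannot leak in,
        // and the exposed view rejects modification.
        Map<Percentile, Duration> copy = new EnumMap<>(Percentile.class);
        copy.putAll(percentiles);
        this.percentiles = Collections.unmodifiableMap(copy);
    }

    public Map<Percentile, Duration> percentiles() {
        return percentiles; // already unmodifiable, safe to share across threads
    }

    public Duration get50thPercentile() {
        return percentiles.get(Percentile.PERCENTILE_50TH);
    }
}
```

Because the object cannot change after construction, handing it to another thread needs only safe publication, which is exactly what the consumer described later provides.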
+ +Structure of Results (`JLBHResult`) :: +* The top-level `JLBHResult` object provides access to: +** `endToEnd()`: Returns a `ProbeResult` for the overall end-to-end latency measurements. +** `probe(String probeName)`: Returns an `Optional` for any user-defined probe, identified by its name. +** `osJitter()`: Returns an `Optional` for the OS jitter measurements, if recorded. +* Each `JLBHResult.ProbeResult` object contains: +** `summaryOfLastRun()`: A `RunResult` detailing the statistics of the final measurement run for that probe. +** `eachRunSummary()`: A `List`, providing the statistics for every measurement run of that probe, in execution order. +* Each `JLBHResult.RunResult` object contains: +** `percentiles()`: A `Map`, mapping enum keys (e.g., `PERCENTILE_50TH`, `PERCENTILE_99TH`, `WORST`) to their corresponding latency `Duration` values. +** Convenience getter methods like `get50thPercentile()`, `get99thPercentile()`, `getWorst()`. + +Output and Serialization :: +* Human-readable summaries are printed to the configured `PrintStream` (default `System.out`) during and after benchmark runs. This includes percentile tables for each probe and overall summaries. -** Programmatic access to the structured, immutable results is provided via the `JLBHResultConsumer` and the `JLBHResult` interface. -** Results can be serialized to CSV format using `net.openhft.chronicle.jlbh.util.JLBHResultSerializer`, facilitating offline analysis or import into spreadsheets and other tools. -** Integration with CI systems like TeamCity is supported via `net.openhft.chronicle.jlbh.TeamCityHelper`, which can emit build statistics. +* Programmatic access to the structured, immutable results is provided via the `JLBHResultConsumer` and the `JLBHResult` interface. +* Results can be serialized to CSV format using `net.openhft.chronicle.jlbh.util.JLBHResultSerializer`, facilitating offline analysis or import into spreadsheets and other tools. 
+* Integration with CI systems like TeamCity is supported via `net.openhft.chronicle.jlbh.TeamCityHelper`, which can emit build statistics. diff --git a/src/main/adoc/benchmark-lifecycle.adoc b/src/main/adoc/benchmark-lifecycle.adoc index 98d8b710..cdf780e6 100644 --- a/src/main/adoc/benchmark-lifecycle.adoc +++ b/src/main/adoc/benchmark-lifecycle.adoc @@ -20,70 +20,85 @@ The key phases are: This initial phase prepares the benchmark environment: -* **JLBH Configuration**: The user instantiates `JLBHOptions` to define all benchmark parameters (e.g., iterations, throughput, runs, coordinated omission settings). +JLBH Configuration :: +The user instantiates `JLBHOptions` to define all benchmark parameters (e.g., iterations, throughput, runs, coordinated omission settings). An instance of `JLBH` is then created using these options. -* **Task Initialization**: The `JLBHTask.init(JLBH jlbh)` method of the user-provided task is invoked. -** This allows the benchmark task to perform its own one-time setup, such as initializing resources or fixtures. -** Crucially, this is where additional measurement probes can be registered by calling `jlbh.addProbe("probeName")` to get `NanoSampler` instances for specific sub-sections of the task. -* **OS Jitter Monitoring**: Jitter tracking is enabled by default (`recordOSJitter` defaults to true); unless disabled via `recordOSJitter(false)`, the `OSJitterMonitor` background thread starts to measure operating system scheduling jitter independently. -* **Affinity**: If an `acquireLock` supplier is configured in `JLBHOptions`, an attempt to acquire CPU affinity for the main benchmark thread might occur. +Task Initialization :: +The `JLBHTask.init(JLBH jlbh)` method of the user-provided task is invoked. +* This allows the benchmark task to perform its own one-time setup, such as initializing resources or fixtures. 
+* Crucially, this is where additional measurement probes can be registered by calling `jlbh.addProbe("probeName")` to get `NanoSampler` instances for specific sub-sections of the task. +OS Jitter Monitoring :: +Jitter tracking is enabled by default (`recordOSJitter` defaults to true); unless disabled via `recordOSJitter(false)`, the `OSJitterMonitor` background thread starts to measure operating system scheduling jitter independently. +Affinity :: +If an `acquireLock` supplier is configured in `JLBHOptions`, an attempt to acquire CPU affinity for the main benchmark thread might occur. == Warmup Before any measurements are formally recorded, the harness executes a warm-up phase: -* **Warmup Iterations**: The `JLBHTask.run(long startTimeNs)` method is called `warmUpIterations` times (as specified in `JLBHOptions`). +Warmup Iterations :: +The `JLBHTask.run(long startTimeNs)` method is called `warmUpIterations` times (as specified in `JLBHOptions`). The `startTimeNs` passed is typically just `System.nanoTime()` for this phase. -** The primary goal is to allow the Java Virtual Machine (JVM) to perform Just-In-Time (JIT) compilation and optimize the benchmarked code paths. -** It also helps in bringing caches to a "warm" state and letting the system reach a more stable performance profile. -* **Sample Discarding**: Latency samples generated during the warmup phase are usually recorded into the histograms but are *not* part of the final reported results. +* The primary goal is to allow the Java Virtual Machine (JVM) to perform Just-In-Time (JIT) compilation and optimize the benchmarked code paths. +* It also helps in bringing caches to a "warm" state and letting the system reach a more stable performance profile. +Sample Discarding :: +Latency samples generated during the warmup phase are usually recorded into the histograms but are _not_ part of the final reported results. The histograms are reset after this phase. 
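The record-then-discard warm-up behaviour, and the latency-from-ideal-start accounting used in the measurement phase, can be sketched deterministically. `WarmupResetSketch` below is a hypothetical stand-in, not JLBH itself: a simulated nanosecond counter replaces `System.nanoTime()` so the flow can be shown without real timing noise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-harness: clockNs is a simulated nanosecond counter.
public final class WarmupResetSketch {

    /**
     * Runs warm-up iterations whose samples are recorded and then discarded,
     * followed by measured iterations scheduled every intervalNs, with latency
     * taken from each iteration's ideal start time (coordinated-omission style).
     */
    public static List<Long> benchmark(long serviceNs, int warmUpIterations,
                                       int iterations, long intervalNs) {
        List<Long> samples = new ArrayList<>();
        long clockNs = 0;
        for (int i = 0; i < warmUpIterations; i++) {
            long start = clockNs;
            clockNs += serviceNs;         // "run" the task
            samples.add(clockNs - start); // recorded during warm-up...
        }
        samples.clear();                  // ...then discarded by the histogram reset
        long base = clockNs;
        for (int i = 0; i < iterations; i++) {
            long scheduled = base + i * intervalNs;       // ideal start time
            if (clockNs < scheduled) clockNs = scheduled; // "busy-wait" to schedule
            clockNs += serviceNs;                         // run the task
            samples.add(clockNs - scheduled);             // latency from ideal start
        }
        return samples;
    }
}
```

With `serviceNs` shorter than `intervalNs`, every measured sample equals the service time; with `serviceNs` longer, samples grow iteration after iteration — the backlog that coordinated-omission accounting is designed to expose.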
-* **Task Notification**: After all warmup iterations are complete, the `JLBHTask.warmedUp()` method is called. -* **Optional Pause**: If `JLBHOptions.pauseAfterWarmupMS` is greater than zero, JLBH will pause for the specified duration before proceeding to the execution phase. +Task Notification :: +After all warmup iterations are complete, the `JLBHTask.warmedUp()` method is called. +Optional Pause :: +If `JLBHOptions.pauseAfterWarmupMS` is greater than zero, JLBH will pause for the specified duration before proceeding to the execution phase. == Execution (Measurement Runs) This is the core phase where timed iterations are performed and latency data is collected. This phase consists of one or more "runs" (as configured by `JLBHOptions.runs`): -* **Per Run**: -** The following steps are repeated for each configured run. -** **Iteration Loop**: The `JLBH` orchestrator executes a loop for the number of `iterations` specified in `JLBHOptions`. -*** **Start Time Calculation**: For each iteration, JLBH calculates an ideal `startTimeNs`. +Per Run :: +* The following steps are repeated for each configured run. +* *Iteration Loop*: The `JLBH` orchestrator executes a loop for the number of `iterations` specified in `JLBHOptions`. +** *Start Time Calculation*: For each iteration, JLBH calculates an ideal `startTimeNs`. This calculation considers: -**** The configured `throughput` (e.g., messages per second). -**** The `LatencyDistributor` strategy, if one is set, to potentially vary the inter-iteration delay. -**** If `accountForCoordinatedOmission` is true (the default), `startTimeNs` represents the ideal scheduled time for the operation. +*** The configured `throughput` (e.g., messages per second). +*** The `LatencyDistributor` strategy, if one is set, to potentially vary the inter-iteration delay. +*** If `accountForCoordinatedOmission` is true (the default), `startTimeNs` represents the ideal scheduled time for the operation. 
JLBH will then busy-wait (spin) until `System.nanoTime()` reaches this `startTimeNs`, ensuring the task is dispatched as close as possible to its intended schedule, thus accounting for delays that might otherwise be missed. -*** **Task Invocation**: The `JLBHTask.run(startTimeNs)` method is invoked with the calculated (and potentially waited-for) `startTimeNs`. -*** **Sample Recording**: -**** Within the `JLBHTask.run()` method, the user's code is executed. -**** To record the primary end-to-end latency for the iteration, the task must call `jlbh.sample(System.nanoTime() - startTimeNs)`. -**** For any custom probes registered during setup, the task can call `myProbe.sampleNanos(duration)` to record latencies for specific sub-stages. -**** Each recorded sample is added to the respective `Histogram` associated with the sampler (either the main JLBH sampler or a custom probe's sampler). -** **Run Completion**: After all iterations for the current run are finished: -*** The `JLBHTask.runComplete()` method is invoked. -*** Statistics for this specific run (e.g., percentiles derived from the histograms) are typically printed to the configured `PrintStream`. -*** The collected histogram data for this run is processed and stored internally for the final `JLBHResult`. -*** All probe histograms are then reset to ensure that measurements for the next run (if any) are independent. +** *Task Invocation*: The `JLBHTask.run(startTimeNs)` method is invoked with the calculated (and potentially waited-for) `startTimeNs`. +** *Sample Recording*: +*** Within the `JLBHTask.run()` method, the user's code is executed. +*** To record the primary end-to-end latency for the iteration, the task must call `jlbh.sample(System.nanoTime() - startTimeNs)`. +*** For any custom probes registered during setup, the task can call `myProbe.sampleNanos(duration)` to record latencies for specific sub-stages. 
+*** Each recorded sample is added to the respective `Histogram` associated with the sampler (either the main JLBH sampler or a custom probe's sampler). +* *Run Completion*: After all iterations for the current run are finished: +** The `JLBHTask.runComplete()` method is invoked. +** Statistics for this specific run (e.g., percentiles derived from the histograms) are typically printed to the configured `PrintStream`. +** The collected histogram data for this run is processed and stored internally for the final `JLBHResult`. +** All probe histograms are then reset to ensure that measurements for the next run (if any) are independent. == Reporting After all measurement runs are completed, the aggregated results are finalized and made available: -* **Final Summary**: A comprehensive summary, often comparing results across all runs and highlighting percentile variations, is printed to the `PrintStream`. -* **Result Object Creation**: The final, immutable `JLBHResult` object is constructed. +Final Summary :: +A comprehensive summary, often comparing results across all runs and highlighting percentile variations, is printed to the `PrintStream`. +Result Object Creation :: +The final, immutable `JLBHResult` object is constructed. This object encapsulates all recorded data for all probes across all runs. -* **Result Consumption**: If a `JLBHResultConsumer` was provided during JLBH setup, its `accept(JLBHResult)` method is called, passing the `JLBHResult` object. +Result Consumption :: +If a `JLBHResultConsumer` was provided during JLBH setup, its `accept(JLBHResult)` method is called, passing the `JLBHResult` object. This allows for programmatic access to the detailed results, for example, for assertion in automated tests or for custom serialization. -* **External Tools**: Users can further process the `JLBHResult` using utilities like: -** `JLBHResultSerializer` to write results to a CSV file for analysis in spreadsheets or other tools. 
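The volatile-publication technique attributed above to `ThreadSafeJLBHResultConsumer` can be sketched generically. `ThreadSafeResultHolder` below is a hypothetical stand-in, not the JLBH class:

```java
import java.util.function.Consumer;

// Hypothetical sketch of safe publication of an immutable result object.
public final class ThreadSafeResultHolder<T> implements Consumer<T> {

    // volatile guarantees that once accept() runs on the harness thread,
    // a subsequent get() on any other thread sees the fully constructed result.
    private volatile T result;

    @Override
    public void accept(T result) {
        this.result = result;
    }

    public T get() {
        return result; // null until the benchmark completes
    }
}
```

The pattern only works because the published object is itself immutable; the `volatile` write/read pair establishes the happens-before edge that makes its state visible.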
-** `TeamCityHelper` to output statistics in a format suitable for TeamCity CI server integration. +External Tools :: +Users can further process the `JLBHResult` using utilities like: +* `JLBHResultSerializer` to write results to a CSV file for analysis in spreadsheets or other tools. +* `TeamCityHelper` to output statistics in a format suitable for TeamCity CI server integration. == Cleanup The final phase ensures that any resources are properly released: -* **Task Cleanup**: The `JLBHTask.complete()` method is called, allowing the user's benchmark task to perform any necessary cleanup (e.g., closing files, releasing network connections). -* **OS Jitter Monitor**: If the `OSJitterMonitor` thread was started, it is signaled to terminate. -* **Affinity Release**: If a CPU affinity lock was acquired by JLBH at the start, it is released. +Task Cleanup :: +The `JLBHTask.complete()` method is called, allowing the user's benchmark task to perform any necessary cleanup (e.g., closing files, releasing network connections). +OS Jitter Monitor :: +If the `OSJitterMonitor` thread was started, it is signaled to terminate. +Affinity Release :: +If a CPU affinity lock was acquired by JLBH at the start, it is released. diff --git a/src/main/adoc/jlbh-cookbook.adoc b/src/main/adoc/jlbh-cookbook.adoc index 605398a6..0587d334 100644 --- a/src/main/adoc/jlbh-cookbook.adoc +++ b/src/main/adoc/jlbh-cookbook.adoc @@ -310,7 +310,7 @@ public class MainApp { * The `eventLoopHandler` method installs handlers that drive the benchmark lifecycle on the `EventLoop`'s thread. * This is useful when the code being benchmarked is designed to run on a specific event loop. * `JLBHOptions.accountForCoordinatedOmission(true)` (the default) is essential for `eventLoopHandler` to function correctly. ---- +''' == Using Multiple Probes for Staged Operations @@ -406,7 +406,7 @@ public class MainApp { * This approach is excellent for identifying bottlenecks within a larger operation. 
* The JLBH output will include separate percentile summaries for "Stage1_Processing", "Stage2_Persistence", and the default "end to end" probe. ---- +''' == Recording and Exporting Results diff --git a/src/main/adoc/results-interpretation-guide.adoc b/src/main/adoc/results-interpretation-guide.adoc index 89452e3f..830d00de 100644 --- a/src/main/adoc/results-interpretation-guide.adoc +++ b/src/main/adoc/results-interpretation-guide.adoc @@ -7,7 +7,7 @@ Chronicle Software :lang: en-GB :icons: font -*_Abstract_*:: +_Abstract_:: This guide explains how to read and understand the various outputs and summaries generated by Chronicle JLBH. Effective interpretation is key to deriving meaningful insights from your latency benchmarks. @@ -16,29 +16,32 @@ Effective interpretation is key to deriving meaningful insights from your latenc When JLBH runs, it typically prints information to the console (or a configured `PrintStream`). Understanding this output is the first step: -* **Warm-up Messages**: Indicates the progress and completion of the warm-up phase (e.g., "Warm up complete (X iterations took Y s)"). -* **Per-Run Detailed Output**: For each measurement run, JLBH prints a section: -** Header indicating the run number (e.g., "BENCHMARK RESULTS (RUN 1) us"). -** Run metadata: Total run time, latency distribution strategy, coordinated omission status, target throughput. -** **Probe Statistics**: For each active probe (the default "End to End" probe, any custom probes added via `jlbh.addProbe()`, and the "OS Jitter" probe if enabled), a line showing: -*** Probe name and total samples recorded for that probe in the run. -*** A compact representation of key percentiles (e.g., "50/90 99/99.9 99.99 - worst was X / Y Z / A B - C"). +Warm-up Messages :: +Indicates the progress and completion of the warm-up phase (e.g., "Warm up complete (X iterations took Y s)"). 
+Per-Run Detailed Output :: +For each measurement run, JLBH prints a section: +* Header indicating the run number (e.g., "BENCHMARK RESULTS (RUN 1) us"). +* Run metadata: Total run time, latency distribution strategy, coordinated omission status, target throughput. +* *Probe Statistics*: For each active probe (the default "End to End" probe, any custom probes added via `jlbh.addProbe()`, and the "OS Jitter" probe if enabled), a line showing: +** Probe name and total samples recorded for that probe in the run. +** A compact representation of key percentiles (e.g., "50/90 99/99.9 99.99 - worst was X / Y Z / A B - C"). These values are usually in microseconds. -* **Final Summary Tables**: After all runs are complete, JLBH prints summary tables for each probe: -** Header: "SUMMARY (probe_name) us". -** Rows for key percentiles (50.0, 90.0, 99.0, 99.9, etc., up to 'worst'). -** Columns for each run, showing the latency value for that percentile in that run. -** A final "% Variation" column, indicating the percentage difference between the highest and lowest values for that percentile across the runs (excluding the first run by default if `runs > 3` or if `skipFirstRun` is explicitly true). +Final Summary Tables :: +After all runs are complete, JLBH prints summary tables for each probe: +* Header: "SUMMARY (probe_name) us". +* Rows for key percentiles (50.0, 90.0, 99.0, 99.9, etc., up to 'worst'). +* Columns for each run, showing the latency value for that percentile in that run. +* A final "% Variation" column, indicating the percentage difference between the highest and lowest values for that percentile across the runs (excluding the first run by default if `runs > 3` or if `skipFirstRun` is explicitly true). == Interpreting Latency Histograms and Percentiles JLBH uses histograms to capture the full distribution of measured latencies, rather than just averages which can hide important details. These histograms are then used to derive percentile values. 
-* **What is a Percentile?**
+What is a Percentile? ::
** A percentile indicates the value below which a given percentage of observations fall.
For example, the 99th percentile latency is the value X where 99% of all measured latencies were less than or equal to X, and 1% were greater.
-* **Key Percentiles in JLBH Output**:
+Key Percentiles in JLBH Output ::
** `50.0` (Median): The middle value; 50% of latencies are lower, 50% are higher.
It gives a sense of the "typical" latency but can be misleading on its own if the distribution is skewed.
** `90.0`: 90% of requests were faster than this value.
@@ -48,34 +51,38 @@ Indicates that 1 out of 100 requests experienced this latency or worse.
** `99.99` (99.99th or "p9999", "four nines"): Critical for systems demanding very high consistency in performance.
** `worst`: The single highest latency recorded during the run.
This can be affected by outliers but is important to note, especially if it's dramatically different from high percentiles.
-* **Why Percentiles Matter**:
+Custom Probes :: +If you've added custom probes (e.g., `jlbh.addProbe("MyStage")`), each will have its own percentile table in the output. Analyzing these helps pinpoint which stage of an operation contributes most to the overall end-to-end latency. == Co-ordinated Omission (CO) Co-ordinated Omission is a critical concept in latency measurement, especially for systems that process requests at a certain rate. -* **The Problem**: If a system (or the benchmark harness itself) pauses or slows down, it might temporarily stop measuring. -When it resumes, it might only measure the (potentially fast) service times of requests that are processed immediately, "omitting" the time that other requests would have spent waiting *during* the pause. +The Problem :: +If a system (or the benchmark harness itself) pauses or slows down, it might temporarily stop measuring. +When it resumes, it might only measure the (potentially fast) service times of requests that are processed immediately, "omitting" the time that other requests would have spent waiting _during_ the pause. This leads to unrealistically optimistic latency figures, particularly at the tail. -* **How JLBH Compensates**: -** JLBH addresses this by default (`accountForCoordinatedOmission = true` in `JLBHOptions`). -** For each iteration, JLBH calculates an *ideal scheduled start time* based on the target throughput. -** It then actively waits (busy-spins using `System.nanoTime()`) until this ideal start time is reached before invoking `JLBHTask.run(startTimeNs)` with that ideal time. -** The latency is then calculated as `System.nanoTime() - idealStartTimeNs`. -** This means if the system or harness was delayed, the measured latency for that sample will correctly include that delay, reflecting the experience a request would have had if it arrived at its scheduled time. 
-* **Impact on Results**: -** *With CO compensation (default)*: Histograms reflect a more accurate picture of the latency a user or client would experience, including any queuing or delays caused by the system struggling to keep up with the offered load. +How JLBH Compensates :: +* JLBH addresses this by default (`accountForCoordinatedOmission = true` in `JLBHOptions`). +* For each iteration, JLBH calculates an _ideal scheduled start time_ based on the target throughput. +* It then actively waits (busy-spins using `System.nanoTime()`) until this ideal start time is reached before invoking `JLBHTask.run(startTimeNs)` with that ideal time. +* The latency is then calculated as `System.nanoTime() - idealStartTimeNs`. +* This means if the system or harness was delayed, the measured latency for that sample will correctly include that delay, reflecting the experience a request would have had if it arrived at its scheduled time. +Impact on Results :: +* _With CO compensation (default)_: Histograms reflect a more accurate picture of the latency a user or client would experience, including any queuing or delays caused by the system struggling to keep up with the offered load. Tails are usually higher and more realistic. -** *Without CO compensation*: Histograms show only the raw processing time for the operations that were actually executed, potentially missing significant backpressure effects. +* _Without CO compensation_: Histograms show only the raw processing time for the operations that were actually executed, potentially missing significant backpressure effects. This can be useful for understanding raw compute time but not overall system responsiveness under load. -* **Comparison**: Always state whether CO compensation was enabled when comparing benchmark results. +Comparison :: +Always state whether CO compensation was enabled when comparing benchmark results. Comparing a CO-compensated run with a non-compensated run is misleading. 
JLBH prints "Correcting for co-ordinated:true/false" in each run's output. @@ -83,44 +90,46 @@ JLBH prints "Correcting for co-ordinated:true/false" in each run's output. JLBH can optionally measure Operating System (OS) jitter, which refers to delays caused by the OS scheduler or other system activities interrupting the benchmark process. -* **Mechanism**: -** Jitter monitoring is enabled by default (`recordOSJitter` defaults to true); a dedicated background thread (`OSJitterMonitor`) starts unless you explicitly disable it with `recordOSJitter(false)`. -** This thread repeatedly calls `System.nanoTime()` in a tight loop and measures the time difference between consecutive calls. -** If this difference (a "gap") exceeds a configurable threshold (`recordJitterGreaterThanNs`, default 1 microsecond), it's recorded as an OS jitter event in a separate "OS Jitter" histogram. -* **Interpreting Jitter Output**: -** The "OS Jitter" probe will have its own percentile table in the JLBH output. -** *High jitter values* (e.g., multiple milliseconds at p99 or worst) suggest that the OS, other processes, or hardware interrupts are frequently pausing the benchmark threads for significant periods. +Mechanism :: +* Jitter monitoring is enabled by default (`recordOSJitter` defaults to true); a dedicated background thread (`OSJitterMonitor`) starts unless you explicitly disable it with `recordOSJitter(false)`. +* This thread repeatedly calls `System.nanoTime()` in a tight loop and measures the time difference between consecutive calls. +* If this difference (a "gap") exceeds a configurable threshold (`recordJitterGreaterThanNs`, default 1 microsecond), it's recorded as an OS jitter event in a separate "OS Jitter" histogram. +Interpreting Jitter Output :: +* The "OS Jitter" probe will have its own percentile table in the JLBH output. 
+* _High jitter values_ (e.g., multiple milliseconds at p99 or worst) suggest that the OS, other processes, or hardware interrupts are frequently pausing the benchmark threads for significant periods.
This is "machine noise."
-** *Low jitter values* (e.g., a few microseconds) indicate that the benchmark process was generally ableto run without significant external interference from the OS.
-* **Why it's Useful**:
-** Helps distinguish between latency caused by your application code versus latency caused by the environment.
-** If your application shows high tail latencies, and the OS Jitter probe also shows high values around the same magnitude, it's a strong indicator that external factors are at play.
-* **Actions for High Jitter**:
-** Ensure the benchmark is running on a quiet machine with minimal other load.
-** Utilize CPU isolation / shielding for benchmark threads and the OS jitter thread (see `JLBHOptions.acquireLock` and `jitterAffinity`).
-** Investigate system settings (e.g., power saving modes, transparent huge pages, kernel scheduler options, interrupt coalescing).
+* _Low jitter values_ (e.g., a few microseconds) indicate that the benchmark process was generally able to run without significant external interference from the OS.
+Why it's Useful ::
+* Helps distinguish between latency caused by your application code versus latency caused by the environment.
+* If your application shows high tail latencies, and the OS Jitter probe also shows high values around the same magnitude, it's a strong indicator that external factors are at play.
+Actions for High Jitter ::
+* Ensure the benchmark is running on a quiet machine with minimal other load.
+* Utilize CPU isolation / shielding for benchmark threads and the OS jitter thread (see `JLBHOptions.acquireLock` and `jitterAffinity`).
+* Investigate system settings (e.g., power saving modes, transparent huge pages, kernel scheduler options, interrupt coalescing).
== Analysing Run-to-Run Variation Latency measurements can vary between successive runs of the same benchmark on the same machine due to various factors. JLBH's summary output includes a "% Variation" column for each percentile. -* **The Variation Column**: This shows the percentage difference between the maximum and minimum values recorded for a given percentile across all included runs. -** By default (`JLBHOptions.skipFirstRun = NOT_SET`), if there are more than 3 runs, the first run is often excluded from this variation calculation as it might still be affected by late-stage JIT or other one-off initializations. +The Variation Column :: +This shows the percentage difference between the maximum and minimum values recorded for a given percentile across all included runs. +* By default (`JLBHOptions.skipFirstRun = NOT_SET`), if there are more than 3 runs, the first run is often excluded from this variation calculation as it might still be affected by late-stage JIT or other one-off initializations. You can control this explicitly with `skipFirstRun(boolean)`. -* **Interpreting Variation**: -** *Low Variation (e.g., < 5-10%)*: Suggests a stable benchmarking environment and that your application's performance is repeatable under the test conditions. +Interpreting Variation :: +* _Low Variation (e.g., < 5-10%)_: Suggests a stable benchmarking environment and that your application's performance is repeatable under the test conditions. This increases confidence in the results. -** *High Variation*: Indicates instability. +* _High Variation_: Indicates instability. Possible causes include: -*** **Insufficient Warm-up**: The JVM might still be optimizing code differently across runs. +** *Insufficient Warm-up*: The JVM might still be optimizing code differently across runs. Try increasing `warmUpIterations`. -*** **Garbage Collection (GC) Pauses**: Infrequent but long GC pauses can hit some runs but not others, drastically affecting tail percentiles. 
+** *Garbage Collection (GC) Pauses*: Infrequent but long GC pauses can hit some runs but not others, drastically affecting tail percentiles. Monitor GC activity. -*** **External System Noise**: Other processes on the machine, network fluctuations (if applicable), or even hardware power management. -*** **Adaptive JVM Behavior**: Some JVM optimizations are adaptive and can change behavior between runs if conditions shift slightly. -*** **Benchmarked Code Instability**: The code itself might have inherent variability or be interacting with unstable external resources. -* **Goal**: Aim for reasonably tight spreads to ensure your conclusions are based on consistent behavior. +** *External System Noise*: Other processes on the machine, network fluctuations (if applicable), or even hardware power management. +** *Adaptive JVM Behavior*: Some JVM optimizations are adaptive and can change behavior between runs if conditions shift slightly. +** *Benchmarked Code Instability*: The code itself might have inherent variability or be interacting with unstable external resources. +Goal :: +Aim for reasonably tight spreads to ensure your conclusions are based on consistent behavior. Focus on understanding systematic performance rather than chasing the single "best" score from one run. Longer runs (`iterations`) can also help smooth out some variations and capture rarer events more consistently. @@ -128,74 +137,79 @@ Longer runs (`iterations`) can also help smooth out some variations and capture A fundamental aspect of performance is the trade-off between throughput (how much work is done per unit of time) and latency (how long each piece of work takes). -* **JLBH's Role**: JLBH allows you to set a target `throughput` via `JLBHOptions`. +JLBH's Role :: +JLBH allows you to set a target `throughput` via `JLBHOptions`. This controls the rate at which `JLBHTask.run()` is invoked. -* **Typical Behavior**: -** At low throughputs, a system can often maintain low latency. 
-** As throughput increases, contention for resources (CPU, memory, network, locks) typically rises. -** Beyond a certain point, latency will start to increase, often non-linearly. +Typical Behavior :: +* At low throughputs, a system can often maintain low latency. +* As throughput increases, contention for resources (CPU, memory, network, locks) typically rises. +* Beyond a certain point, latency will start to increase, often non-linearly. Queues may build up, and the system struggles to keep pace. -* **Using JLBH to Explore**: -** Run your benchmark with different `throughput` settings to understand how your application's latency responds to varying load. -** This helps identify the "sweet spot" for your system or pinpoint throughput levels where performance starts to degrade significantly. -** The results can inform capacity planning and help set realistic performance expectations. +Using JLBH to Explore :: +* Run your benchmark with different `throughput` settings to understand how your application's latency responds to varying load. +* This helps identify the "sweet spot" for your system or pinpoint throughput levels where performance starts to degrade significantly. +* The results can inform capacity planning and help set realistic performance expectations. == Common Pitfalls in Interpretation Avoiding these common mistakes can help ensure your benchmark results are meaningful and your conclusions are sound: -* **Insufficient Warm-up**: -** *Issue*: The JVM performs JIT compilation, class loading, and cache warming during initial execution. +Insufficient Warm-up :: +* _Issue_: The JVM performs JIT compilation, class loading, and cache warming during initial execution. Short or inadequate warm-up can mean you're measuring pre-optimized code or including one-off startup costs. -** *Solution*: Ensure `warmUpIterations` is sufficient. +* _Solution_: Ensure `warmUpIterations` is sufficient. 
A common rule of thumb is at least the JVM's compilation threshold (e.g., `Jvm.compileThreshold()` in Chronicle Core, typically 10,000-15,000 iterations) plus some buffer for application-specific warm-up. Monitor if results stabilize after a certain number of runs. -* **Ignoring Co-ordinated Omission Setting**: -** *Issue*: Comparing results where one run had CO compensation enabled and another didn't will lead to incorrect conclusions about performance changes. -** *Solution*: Always be aware of the `accountForCoordinatedOmission` setting (default is `true`) and report it alongside your results. -* **Benchmark Overhead**: -** *Issue*: The act of measuring itself (calling `System.nanoTime()`, `jlbh.sample()`) has a small overhead. +Ignoring Co-ordinated Omission Setting :: +* _Issue_: Comparing results where one run had CO compensation enabled and another didn't will lead to incorrect conclusions about performance changes. +* _Solution_: Always be aware of the `accountForCoordinatedOmission` setting (default is `true`) and report it alongside your results. +Benchmark Overhead :: +* _Issue_: The act of measuring itself (calling `System.nanoTime()`, `jlbh.sample()`) has a small overhead. For extremely fast operations (deep sub-microsecond), this overhead can become a significant percentage of the measured time. -** *Solution*: JLBH is designed for low overhead (aiming for <100ns). +* _Solution_: JLBH is designed for low overhead (aiming for <100ns). Be mindful of this when benchmarking trivial operations. Consider batching very small operations if appropriate, or if the raw speed of an isolated, tiny operation is critical, a more specialized microbenchmarking tool like JMH might be used for that specific part, with JLBH testing the larger context. -* **Unrealistic Throughput Targets**: -** *Issue*: Setting `throughput` far beyond what the system can handle will just measure a system under constant overload, with latencies dominated by queue times. 
+Unrealistic Throughput Targets :: +* _Issue_: Setting `throughput` far beyond what the system can handle will just measure a system under constant overload, with latencies dominated by queue times. Setting it too low might not expose bottlenecks that appear under moderate load. -** *Solution*: Test a range of throughputs relevant to your expected operational load and capacity limits. -* **Focusing Only on Averages**: -** *Issue*: Averages hide outliers and don't reflect the experience of users hitting tail latencies. -** *Solution*: Prioritize percentile analysis (p50, p90, p99, p99.9, worst) as JLBH encourages. -* **Ignoring Custom Probe Data**: -** *Issue*: If you've set up custom probes for different stages of an operation but only look at the end-to-end numbers, you miss valuable insights into internal bottlenecks. -** *Solution*: Analyze the percentile data for each custom probe. +* _Solution_: Test a range of throughputs relevant to your expected operational load and capacity limits. +Focusing Only on Averages :: +* _Issue_: Averages hide outliers and don't reflect the experience of users hitting tail latencies. +* _Solution_: Prioritize percentile analysis (p50, p90, p99, p99.9, worst) as JLBH encourages. +Ignoring Custom Probe Data :: +* _Issue_: If you've set up custom probes for different stages of an operation but only look at the end-to-end numbers, you miss valuable insights into internal bottlenecks. +* _Solution_: Analyze the percentile data for each custom probe. Compare their contributions to the overall latency. -* **Environmental Inconsistency**: -** *Issue*: Running benchmarks on different hardware, with varying background OS/application load, different JVM versions, or different system configurations will yield different results. -** *Solution*: For comparative analysis, always use a consistent, controlled, and quiet environment. 
+Environmental Inconsistency :: +* _Issue_: Running benchmarks on different hardware, with varying background OS/application load, different JVM versions, or different system configurations will yield different results. +* _Solution_: For comparative analysis, always use a consistent, controlled, and quiet environment. Document the environment meticulously. -* **Thread Pinning (Affinity) Issues**: -** *Issue*: On multi-core systems, OS thread scheduling can move your benchmark thread between cores, causing cache misses and jitter. +Thread Pinning (Affinity) Issues :: +* _Issue_: On multi-core systems, OS thread scheduling can move your benchmark thread between cores, causing cache misses and jitter. Forgetting to pin, or pinning to a busy/shared core, can skew results. -** *Solution*: Use `JLBHOptions.acquireLock(Affinity::acquireLock)` (from OpenHFT Affinity library) to attempt to pin the main benchmark thread to an isolated core. +* _Solution_: Use `JLBHOptions.acquireLock(Affinity::acquireLock)` (from OpenHFT Affinity library) to attempt to pin the main benchmark thread to an isolated core. Similarly for `jitterAffinity` if using the OS Jitter probe. Ensure these cores are genuinely isolated. -* **Not Enough Iterations/Run Duration**: -** *Issue*: Very short runs (low `iterations`) may not be statistically significant or may fail to capture infrequent, high-latency events like GC pauses or network timeouts. -** *Solution*: Run enough iterations to achieve statistical stability and to give rare events a chance to occur if they are part of the system's behavior under load. +Not Enough Iterations/Run Duration :: +* _Issue_: Very short runs (low `iterations`) may not be statistically significant or may fail to capture infrequent, high-latency events like GC pauses or network timeouts. +* _Solution_: Run enough iterations to achieve statistical stability and to give rare events a chance to occur if they are part of the system's behavior under load. 
This often means runs lasting at least several seconds, or even minutes for deep stability tests. == Deriving Actionable Insights The ultimate goal of benchmarking is to gain insights that can lead to improvements. -* **Identify Bottlenecks**: -** High end-to-end latency? +Identify Bottlenecks :: +* High end-to-end latency? Use custom probes to break down the operation and see which stage is the culprit. -** High p99 or p99.9 values? +* High p99 or p99.9 values? This often points to issues like GC, network spikes, lock contention, or OS jitter. Correlate with OS Jitter probe and GC logs. -* **Validate Optimizations**: Run benchmarks before and after a code change to quantify its impact on latency across different percentiles and throughputs. -* **Capacity Planning**: Determine the throughput your system can handle while meeting latency SLOs. -* **Regression Testing**: Integrate JLBH benchmarks into your CI/CD pipeline to catch performance regressions automatically. -* **Understand Trade-offs**: Use JLBH to explore the impact of different configurations, algorithms, or architectural choices on the latency profile. +Validate Optimizations :: +Run benchmarks before and after a code change to quantify its impact on latency across different percentiles and throughputs. +Capacity Planning :: +Determine the throughput your system can handle while meeting latency SLOs. +Regression Testing :: +Integrate JLBH benchmarks into your CI/CD pipeline to catch performance regressions automatically. +Understand Trade-offs :: +Use JLBH to explore the impact of different configurations, algorithms, or architectural choices on the latency profile. 
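The co-ordinated omission compensation documented in the interpretation guide above can be illustrated with a small, self-contained sketch. The following Java class is an editorial illustration only: it is not part of JLBH, the `worstLatencies` helper and all its numbers are invented for the example, and a simulated clock stands in for `System.nanoTime()`. It shows why latency measured against the ideal schedule (JLBH's default) records the queueing delay that naive service-time measurement misses:

```java
import java.util.concurrent.TimeUnit;

// Editorial sketch, not part of JLBH: measuring against an ideal schedule
// (co-ordinated omission compensation) captures queueing delay that naive
// service-time measurement misses.
public final class CoSketch {

    // Returns {worst schedule-based latency, worst naive service time} in ns
    // for a simulated run: 100k ops/s target, 1 us service time, and a single
    // 50 us stall before iteration 5. All numbers are illustrative.
    static long[] worstLatencies() {
        long intervalNs = TimeUnit.SECONDS.toNanos(1) / 100_000; // 10 us between scheduled starts
        long serviceNs = 1_000;   // each operation takes 1 us
        long stallNs = 50_000;    // e.g. a GC pause delays dispatch

        long clockNs = 0;         // simulated System.nanoTime()
        long worstCompensated = 0;
        long worstNaive = 0;
        for (int i = 0; i < 10; i++) {
            long idealStartNs = i * intervalNs;        // schedule implied by the target rate
            if (i == 5)
                clockNs += stallNs;                    // the harness is delayed
            clockNs = Math.max(clockNs, idealStartNs); // busy-wait until the scheduled start
            long dispatchNs = clockNs;
            clockNs += serviceNs;                      // run the operation
            worstCompensated = Math.max(worstCompensated, clockNs - idealStartNs);
            worstNaive = Math.max(worstNaive, clockNs - dispatchNs);
        }
        return new long[]{worstCompensated, worstNaive};
    }

    public static void main(String[] args) {
        long[] worst = worstLatencies();
        System.out.println("worst compensated ns: " + worst[0]); // 42000
        System.out.println("worst naive ns: " + worst[1]);       // 1000
    }
}
```

Under these assumed numbers the compensated worst case (42 us) is dominated by the stall and its knock-on queueing, while the naive worst case stays at the 1 us service time: the same effect the "Impact on Results" section describes.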
From 979350c847e3721b8debcf746c2f2cbc5e7630e5 Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Tue, 28 Oct 2025 12:32:54 +0000 Subject: [PATCH 09/12] Uses Java 8 compatible version --- pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pom.xml b/pom.xml index cc18a575..30e2cd83 100644 --- a/pom.xml +++ b/pom.xml @@ -30,8 +30,8 @@ openhft https://sonarcloud.io 3.6.0 - 10.26.1 - 4.9.8.1 + 8.45.1 + 4.8.6.6 1.14.0 3.28.0 0.8.14 From 036f4b5b0b53559600ac5ff342f0e7dc88492444 Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Tue, 28 Oct 2025 12:44:20 +0000 Subject: [PATCH 10/12] Adjust JaCoCo coverage gates --- pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pom.xml b/pom.xml index 30e2cd83..a33396e9 100644 --- a/pom.xml +++ b/pom.xml @@ -35,8 +35,8 @@ 1.14.0 3.28.0 0.8.14 - 0.80 - 0.70 + 0.883 + 0.786 1.23ea6 From f43779e74cdfa1dd339a33ecb3731643964e4d4d Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Tue, 28 Oct 2025 14:12:55 +0000 Subject: [PATCH 11/12] Update character set policy from ASCII-7 to ISO-8859-1 in documentation --- AGENTS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 46e0b3cf..2e0e049c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -11,8 +11,8 @@ LLM-based agents can accelerate development only if they respect our house rules | Requirement | Rationale | |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------| | **British English** spelling (`organisation`, `licence`, *not* `organization`, `license`) except technical US spellings like `synchronized` | Keeps wording consistent with Chronicle's London HQ and existing docs. See the University of Oxford style guide for reference. 
|
-| **ASCII-7 only** (code-points 0-127). Avoid smart quotes, non-breaking spaces and accented characters. | ASCII-7 survives every toolchain Chronicle uses, incl. low-latency binary wire formats that expect the 8th bit to be 0. |
-| If a symbol is not available in ASCII-7, use a textual form such as `micro-second`, `>=`, `:alpha:`, `:yes:`. This is the preferred approach and Unicode must not be inserted. | Extended or '8-bit ASCII' variants are *not* portable and are therefore disallowed. |
+| **ISO-8859-1** (code-points 0-255). Avoid smart quotes, non-breaking spaces and accented characters. | ISO-8859-1 is a single-byte encoding that survives every toolchain Chronicle uses, incl. low-latency binary wire formats, because each character fits in one byte. |
+| If a symbol is not available in ISO-8859-1, use a textual form such as `micro-second`, `>=`, `:alpha:`, `:yes:`. This is the preferred approach and Unicode must not be inserted. | Characters outside ISO-8859-1 require multi-byte encodings, which are *not* portable across these toolchains and are therefore disallowed. |

## Javadoc guidelines

@@ -92,7 +92,7 @@ This tight loop informs the AI accurately and creates immediate clarity for all

When using AI agents to assist with development, please adhere to the following guidelines:

* **Respect the Language & Character-set Policy**: Ensure all AI-generated content follows the British English and
-  ASCII-7 guidelines outlined above.
+  ISO-8859-1 guidelines outlined above.
Focus on Clarity: AI-generated documentation should be clear and concise and add value beyond what is already present in the code or existing documentation.
* **Avoid Redundancy**: Do not generate content that duplicates existing documentation or code comments unless it From 31d7452d034e2299e9a138ba1719b51e9beee63a Mon Sep 17 00:00:00 2001 From: Peter Lawrey Date: Thu, 30 Oct 2025 11:00:09 +0000 Subject: [PATCH 12/12] Move Checkstyle config under src/main/config --- pom.xml | 2 +- src/main/config/checkstyle.xml | 210 +++++++++++++++++++++++++++++++++ 2 files changed, 211 insertions(+), 1 deletion(-) create mode 100644 src/main/config/checkstyle.xml diff --git a/pom.xml b/pom.xml index a33396e9..7e32bcb3 100644 --- a/pom.xml +++ b/pom.xml @@ -208,7 +208,7 @@ - net/openhft/quality/checkstyle/checkstyle.xml + src/main/config/checkstyle.xml true true warning diff --git a/src/main/config/checkstyle.xml b/src/main/config/checkstyle.xml new file mode 100644 index 00000000..844dd904 --- /dev/null +++ b/src/main/config/checkstyle.xml @@ -0,0 +1,210 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +