Skip to content

Cat 7.b: query latency distribution (YCSB percentiles)#33

Open
jphein wants to merge 2 commits into
M0nkeyFl0wer:mainfrom
techempower-org:feat/cat7b-latency-distribution
Open

Cat 7.b: query latency distribution (YCSB percentiles)#33
jphein wants to merge 2 commits into
M0nkeyFl0wer:mainfrom
techempower-org:feat/cat7b-latency-distribution

Conversation

@jphein
Copy link
Copy Markdown
Contributor

@jphein jphein commented May 25, 2026

Summary

Closes #25.

Adds Cat 7.b — query latency distribution using YCSB-standard percentile reporting (p50/p95/p99/p99.9/max).

  • New sme/categories/latency.py with LatencyCollector and LatencyReport dataclasses
  • LatencyCollector.timed_call(fn, *args) wraps any callable with wall-clock measurement
  • format_latency_report() renders human-readable output
  • Uses numpy percentiles (already a core dependency) — no new dependencies required

Design decisions:

  • numpy percentiles instead of HdrHistogram for the initial implementation — HdrHistogram is more appropriate for sub-ms benchmarks with coordinated omission concerns, but LLM-query latencies (hundreds of ms each) don't need that precision floor. Can upgrade later if warranted.
  • Raw latency array included in report for downstream reaggregation
  • [latency] optional extra added to pyproject.toml for future HdrHistogram upgrade path

Test plan

  • test_latency.py — collector recording, percentile computation, timed_call wrapper, empty report edge case, format output
  • Existing tests pass unchanged

🫏 Generated with Claude Code

Adds sme/categories/latency.py with LatencyCollector that wraps
adapter calls and captures wall-clock timing, computing YCSB-standard
percentiles (p50, p95, p99, p99.9, max) via numpy.

The [latency] optional extra reserves an HdrHistogram upgrade path
when higher-fidelity tail measurement is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 01:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new Category 7.b latency distribution utility and accompanying tests, plus an optional dependency stanza for a latency-related extra.

Changes:

  • Introduces LatencyCollector/LatencyReport for recording and reporting query latency percentiles.
  • Adds pytest coverage for collector behavior, dict conversion, and report formatting.
  • Adds a latency optional dependency extra in pyproject.toml.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
tests/test_latency.py Adds unit tests for latency collection, percentile reporting, dict serialization, and formatting output.
sme/categories/latency.py Implements latency collection/reporting and a human-readable summary formatter.
pyproject.toml Adds a latency optional-deps extra and includes it in the all extra.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sme/categories/latency.py Outdated
Comment on lines +5 to +6
Uses numpy for percentile calculations; optionally upgrades to
HdrHistogram when the [latency] extra is installed.
Comment thread sme/categories/latency.py Outdated
Comment on lines +11 to +18
import logging
from dataclasses import dataclass, field

import numpy as np

log = logging.getLogger(__name__)


Comment thread sme/categories/latency.py Outdated
Comment on lines +11 to +18
import logging
from dataclasses import dataclass, field

import numpy as np

log = logging.getLogger(__name__)


Comment thread tests/test_latency.py Outdated
Comment on lines +34 to +41
def test_timed_call():
c = LatencyCollector()
def slow_fn(x):
time.sleep(0.01)
return x * 2
result, latency = c.timed_call(slow_fn, 5)
assert result == 10
assert latency >= 10.0
Comment thread pyproject.toml
Comment on lines +50 to +52
latency = [
"hdrhistogram>=0.10",
]
@M0nkeyFl0wer
Copy link
Copy Markdown
Owner

YCSB percentiles (p50/p95/p99/p99.9) are exactly the right shape for query-latency — aggregate means hide the tail, and the tail is where memory systems differentiate. The LatencyCollector API with keep_raw for post-hoc analysis is clean.

Three small things before merge:

  1. HdrHistogram fallback. The docstring promises "optionally upgrade to HdrHistogram when the extra is installed" but the code never imports or uses it, and the optional hdrhistogram dep in pyproject.toml is unreferenced. Either wire it in (the value over np.percentile is preservation across merged runs and lower memory at large N) or drop the doc claim. Probably worth wiring — HdrHistogram's whole reason to exist is that np.percentile on a raw-keep array doesn't scale past ~1M samples.
  2. Dead logging import — drop it.
  3. time.sleep(0.01) test is timing-flaky — replace with a deterministic mock-clock or fixed timestamps.

One framing question for the spec angle: Cat 7.b (latency) and Cat 7 (cost-per-correct, #34) are both landing in the same category. Is the intended Cat 7 readout a single combined section or two side-by-side? The answer changes what shape the headline metric becomes — "cost-per-correct at p95 latency" vs. two independent columns.

…ic latency test

Back LatencyCollector with HdrHistogram when the [latency] extra is
installed (guarded optional import); fall back to np.percentile on the
base install. HdrHistogram gives fixed per-sample memory at large N and
lossless cross-run merging, fulfilling the module docstring's prior
unmet promise. Record microseconds for sub-ms precision.

Remove the unused logging import. Replace the flaky time.sleep(0.01)
timed_call test with a monkeypatched perf_counter clock. Percentile
assertions now tolerate HdrHistogram quantization on the HDR path while
staying exact on the numpy fallback; add HDR-only merge/backend tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jphein
Copy link
Copy Markdown
Contributor Author

jphein commented May 27, 2026

Pushed eb79499:

  1. HdrHistogram wired. Added a latency extra (hdrhistogram>=0.10); the collector uses an HdrHistogram backend when the extra is installed and falls back to numpy percentiles otherwise (guarded import). That delivers the lossless-merge across runs / bounded memory at large N you flagged, while the base install stays dependency-light.
  2. Removed the dead logging import.
  3. Replaced the flaky time.sleep(0.01) test with deterministic fixed timestamps.

CI green on 3.10/3.11/3.12 (the numpy fallback path — the extra isn't installed in CI).

On the Cat 7 framing question (also on #34): going with two side-by-side readouts — latency (this PR) and cost (#34) stay independent so each lands on its own.

🫏

@M0nkeyFl0wer
Copy link
Copy Markdown
Owner

Verified — HdrHistogram backend behind the [latency] extra with numpy fallback is the right shape: base install stays dependency-light, large-N runs get lossless merging and bounded memory. The deterministic monkeypatch(perf_counter) test is much nicer than the sleep version it replaced. Ship.

One downstream methodology question, not a blocker: the HDR path and the numpy path return percentile values that differ by HDR quantization bounds. For published cross-system readings, is one canonical, or does the spec require [latency] installed when reporting numbers? Worth a one-line note in sme_spec_v8.md so two researchers reporting "p99 = 480ms" can't be talking about subtly different things, but doesn't need to gate this merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cat 7.b — latency distribution (YCSB methodology + HdrHistogram)

3 participants