rohitg00 · May 13, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ## [Unreleased]
 
+### Added
+
+- **`benchmark/load-100k.ts` load harness** ([#346](https://github.com/rohitg00/agentmemory/issues/346)). Hand-rolled, dependency-free harness that seeds N synthetic memories into a local daemon at `http://localhost:3111` and records p50 / p90 / p99 latency plus throughput for `POST /agentmemory/remember`, `POST /agentmemory/smart-search`, and `GET /agentmemory/memories?latest=true` across the matrix N ∈ {1k, 10k, 100k} × concurrency C ∈ {1, 10, 100}. Content is drawn from a seedable `mulberry32` PRNG, so re-running the same build with the same `BENCH_SEED` produces the same seed corpus. Results land in `benchmark/results/load-100k-<short-git-sha>.json` (schema-versioned). Run it with `npm run bench:load`; see `benchmark/README.md` for the matrix and env knobs.
+
+### Performance
+
+- Placeholder for per-release p50 / p90 / p99 numbers from `benchmark/load-100k.ts`. Each release should commit a `benchmark/results/load-100k-<sha>.json` and cite its headline p99 here. Suggested format: one bullet per (N, C) cell that materially regressed or improved versus the previous release. p99 is the capacity-planning number; p50 and throughput are context. See [`benchmark/README.md`](benchmark/README.md) for how to reproduce.
+
 ## [0.9.12] — 2026-05-13
 
 Four landed PRs since v0.9.11 — one type-correctness fix, one search-quality fix (BM25 unicode + vector-index live-write), one viewer hardening (CSP-clean fonts + load-error surface), and one integrations security hardening (bearer token over plaintext HTTP).

diff --git a/benchmark/README.md b/benchmark/README.md
@@ -0,0 +1,100 @@
+# benchmark/
+
+Two kinds of numbers live in this directory:
+
+1. **Quality / retrieval** — `longmemeval-bench.ts`, `quality-eval.ts`,
+   `real-embeddings-eval.ts`, `scale-eval.ts`. Recall, precision, token
+   savings. Documented in `LONGMEMEVAL.md`, `QUALITY.md`,
+   `REAL-EMBEDDINGS.md`, `SCALE.md`.
+
+2. **Load shape** — `load-100k.ts`. p50 / p90 / p99 latency and
+   throughput against a running daemon. This is the file you want when
+   somebody asks "what's p99 at 100k memories under concurrency 100?".
+
+## load-100k.ts
+
+Hand-rolled, dependency-free load harness. Issues real HTTP against a
+local agentmemory daemon at `http://localhost:3111`, records per-request
+latency with `performance.now()`, and writes a JSON report per run.
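+
+The measurement loop can be pictured as a fixed pool of C workers
+draining a shared op counter, each timing its own request with
+`performance.now()`. This is a minimal sketch, not the harness's actual
+code: `runCell`, `CellSamples`, and the `task` callback are hypothetical
+names, and where the real harness issues HTTP this takes any async
+function.
+
+```typescript
+import { performance } from "node:perf_hooks";
+
+// Raw samples for one matrix cell (hypothetical shape).
+interface CellSamples {
+  latenciesMs: number[]; // successful requests only
+  errors: number; // requests whose task threw or rejected
+  wallMs: number; // wall-clock time for the whole cell
+}
+
+// Run `ops` calls of `task` with at most `concurrency` in flight.
+async function runCell(
+  ops: number,
+  concurrency: number,
+  task: (i: number) => Promise<void>,
+): Promise<CellSamples> {
+  const latenciesMs: number[] = [];
+  let errors = 0;
+  let next = 0; // shared counter; safe because JS runs one callback at a time
+
+  // Each worker claims the next op index until all ops are handed out.
+  const worker = async (): Promise<void> => {
+    while (next < ops) {
+      const i = next++;
+      const t0 = performance.now();
+      try {
+        await task(i);
+        latenciesMs.push(performance.now() - t0);
+      } catch {
+        errors++;
+      }
+    }
+  };
+
+  const start = performance.now();
+  const pool = Array.from({ length: Math.min(concurrency, ops) }, () => worker());
+  await Promise.all(pool);
+  return { latenciesMs, errors, wallMs: performance.now() - start };
+}
+```
+
+Bounding concurrency with a worker pool, rather than firing every op at
+once, keeps at most C requests in flight, which is what the C axis of
+the matrix is meant to control.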
+
+### What it measures
+
+For each cell in the matrix `(N, concurrency, endpoint)` it records:
+
+- `p50_ms`, `p90_ms`, `p99_ms` — nearest-rank percentiles.
+- `min_ms`, `max_ms`, `ops`, `errors`.
+- `throughput_per_sec` — wall-clock ops / sec for that cell.
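+
+Turning one cell's raw samples into those fields is a sort followed by
+nearest-rank lookups, using the same rank = ceil(p / 100 × n) rule as
+`benchmark/lib/percentiles.ts`. A sketch, assuming the harness keeps
+successful latencies in an array and counts errors separately;
+`summarize` and `CellStats` are hypothetical names, and the percentile
+helper is inlined so the block stands alone.
+
+```typescript
+// Per-cell fields as listed in the README; summarize() is a sketch.
+interface CellStats {
+  p50_ms: number;
+  p90_ms: number;
+  p99_ms: number;
+  min_ms: number;
+  max_ms: number;
+  ops: number;
+  errors: number;
+  throughput_per_sec: number;
+}
+
+// Nearest-rank percentile over ascending input, clamped to [0, 100].
+function nearestRank(sorted: number[], p: number): number {
+  const n = sorted.length;
+  if (n === 0) return NaN;
+  const rank = Math.ceil((Math.max(0, Math.min(100, p)) / 100) * n);
+  return sorted[Math.min(n - 1, Math.max(0, rank - 1))]!;
+}
+
+function summarize(latenciesMs: number[], errors: number, wallMs: number): CellStats {
+  // Sort a copy ascending; the percentile lookup expects pre-sorted input.
+  const sorted = [...latenciesMs].sort((a, b) => a - b);
+  // Assumption: `ops` counts every request issued, including failed ones.
+  const ops = sorted.length + errors;
+  return {
+    p50_ms: nearestRank(sorted, 50),
+    p90_ms: nearestRank(sorted, 90),
+    p99_ms: nearestRank(sorted, 99),
+    min_ms: sorted.length ? sorted[0]! : NaN,
+    max_ms: sorted.length ? sorted[sorted.length - 1]! : NaN,
+    ops,
+    errors,
+    throughput_per_sec: wallMs > 0 ? ops / (wallMs / 1000) : NaN,
+  };
+}
+```
+
+For example, `summarize([12, 3, 7, 5], 0, 500)` yields `p50_ms` 5,
+`p90_ms` and `p99_ms` 12, and `throughput_per_sec` 8.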
+
+Default matrix:
+
+- `N` ∈ {1000, 10000, 100000} — number of memories seeded before the
+  cell runs.
+- `C` ∈ {1, 10, 100} — concurrent in-flight requests during the cell.
+- Endpoints under test:
+  - `POST /agentmemory/remember`
+  - `POST /agentmemory/smart-search`
+  - `GET  /agentmemory/memories?latest=true`
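+
+The default matrix is a plain Cartesian product: 3 values of N × 3
+values of C × 3 endpoints = 27 cells. A sketch of the enumeration; the
+`Cell` type and `cells()` generator are hypothetical, and iterating N
+outermost (so each seed level is prepared once) is an assumed ordering,
+not necessarily the harness's.
+
+```typescript
+type Endpoint =
+  | "POST /agentmemory/remember"
+  | "POST /agentmemory/smart-search"
+  | "GET /agentmemory/memories?latest=true";
+
+interface Cell {
+  n: number; // memories seeded before the cell runs
+  concurrency: number; // in-flight requests during the cell
+  endpoint: Endpoint;
+}
+
+const DEFAULT_N = [1_000, 10_000, 100_000];
+const DEFAULT_C = [1, 10, 100];
+const ENDPOINTS: Endpoint[] = [
+  "POST /agentmemory/remember",
+  "POST /agentmemory/smart-search",
+  "GET /agentmemory/memories?latest=true",
+];
+
+// N is the outer loop, then concurrency, then endpoint.
+function* cells(ns = DEFAULT_N, cs = DEFAULT_C): Generator<Cell> {
+  for (const n of ns)
+    for (const concurrency of cs)
+      for (const endpoint of ENDPOINTS) yield { n, concurrency, endpoint };
+}
+```
+
+`[...cells()].length` is 27, so at `BENCH_OPS=200` a full default run
+times 5,400 requests, not counting the seeding itself.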
+
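+Each cell issues `BENCH_OPS=200` requests by default, which keeps a
+100k-seed run under tens of minutes. At 200 samples the nearest-rank
+p99 is the 198th sorted sample, so only the two slowest requests sit
+above it; raise `BENCH_OPS` when you need a steadier tail estimate.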
+
+### Why p99 is the number that matters
+
+p50 tells you the median request is fast. p90 tells you the bulk of
+requests are fast. **p99 tells you whether the slowest 1% of requests,
+the ones your tail users actually hit, are still fast.** Capacity
+planning lives here: if you want to size a fleet, scale your daemon, or
+set an SLO, p99 is the number to plan against. p50 will lie to you.
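+
+A small worked example makes the point. The latencies below are made up
+for illustration, not measured: 98 requests at 5 ms and 2 at 250 ms.
+The median and even p90 look healthy, while the nearest-rank p99 lands
+on the slow tail.
+
+```typescript
+// Nearest-rank percentile over ascending, non-empty input.
+function nearestRank(sorted: number[], p: number): number {
+  const rank = Math.ceil((p / 100) * sorted.length);
+  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))]!;
+}
+
+// Illustrative only: 100 synthetic latencies with a 2% slow tail.
+const samples = [...Array.from({ length: 98 }, () => 5), 250, 250].sort((a, b) => a - b);
+const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
+
+console.log({
+  p50: nearestRank(samples, 50), // 5
+  p90: nearestRank(samples, 90), // 5
+  p99: nearestRank(samples, 99), // 250
+  mean, // 9.9
+});
+```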
+
+### Running it
+
+```bash
+# 1. Start the daemon however you normally do (npx, Docker, etc.)
+npx @agentmemory/agentmemory
+
+# 2. From the repo root, in another shell:
+npm run bench:load
+```
+
+To override the matrix:
+
+```bash
+BENCH_N=1000 BENCH_C=1,10 BENCH_OPS=100 npm run bench:load
+```
+
+To have the harness spawn a daemon for the run (after `npm run build`):
+
+```bash
+AGENTMEMORY_BENCH_AUTOSTART=1 npm run bench:load
+```
+
+Other env knobs (see the file header for the canonical list):
+
+- `AGENTMEMORY_URL` — base URL of the daemon (default
+  `http://localhost:3111`).
+- `BENCH_SEED` — seed for the `mulberry32` content RNG. Same seed +
+  same git sha = byte-identical seed corpus.
+- `BENCH_OUT_DIR` — where the JSON report lands (default
+  `benchmark/results/`).
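+
+Reading these knobs is a few lines of parsing with fallbacks to the
+documented defaults. A sketch only: `readConfig` and `parseList` are
+hypothetical, the `BENCH_SEED` default of 1 is an assumption, and the
+file header of `load-100k.ts` remains the canonical list.
+
+```typescript
+interface BenchConfig {
+  baseUrl: string;
+  ns: number[];
+  cs: number[];
+  ops: number;
+  seed: number;
+  outDir: string;
+  autostart: boolean;
+}
+
+// Parse "1,10,100" into [1, 10, 100]; fall back when unset or malformed.
+function parseList(raw: string | undefined, fallback: number[]): number[] {
+  if (!raw) return fallback;
+  const values = raw.split(",").map((s) => Number(s.trim()));
+  return values.every((v) => Number.isInteger(v) && v > 0) ? values : fallback;
+}
+
+function readConfig(env: Record<string, string | undefined>): BenchConfig {
+  const ops = Number(env.BENCH_OPS ?? 200);
+  const seed = Number(env.BENCH_SEED ?? 1); // default seed is an assumption
+  return {
+    baseUrl: env.AGENTMEMORY_URL ?? "http://localhost:3111",
+    ns: parseList(env.BENCH_N, [1000, 10000, 100000]),
+    cs: parseList(env.BENCH_C, [1, 10, 100]),
+    ops: Number.isInteger(ops) && ops > 0 ? ops : 200,
+    seed: Number.isInteger(seed) ? seed : 1,
+    outDir: env.BENCH_OUT_DIR ?? "benchmark/results/",
+    autostart: env.AGENTMEMORY_BENCH_AUTOSTART === "1",
+  };
+}
+```
+
+With `BENCH_N=1000 BENCH_C=1,10 BENCH_OPS=100`, `readConfig(process.env)`
+gives `ns` `[1000]`, `cs` `[1, 10]`, and `ops` 100. Falling back
+silently on malformed input is a design choice; a stricter harness
+might throw instead.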
+
+### Where results land
+
+`benchmark/results/load-100k-<short-git-sha>.json`. The harness
+`mkdir -p`s the directory. The file has a `schema_version: 1` field so
+future format changes don't silently break consumers.
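+
+A consumer can key off `schema_version` before trusting the rest of the
+file. The sketch below assumes a top-level `cells` array whose entries
+carry the per-cell fields listed under "What it measures" plus `n`,
+`concurrency`, and `endpoint`; those extra key names and the
+`loadReport` / `worstP99` helpers are hypothetical, so check a real
+results file before relying on them.
+
+```typescript
+// Only schema_version and the metric names come from this README.
+interface CellResult {
+  n: number;
+  concurrency: number;
+  endpoint: string;
+  p50_ms: number;
+  p90_ms: number;
+  p99_ms: number;
+  min_ms: number;
+  max_ms: number;
+  ops: number;
+  errors: number;
+  throughput_per_sec: number;
+}
+
+interface LoadReportV1 {
+  schema_version: 1;
+  cells: CellResult[];
+}
+
+function loadReport(json: string): LoadReportV1 {
+  const data: unknown = JSON.parse(json);
+  const version = (data as { schema_version?: unknown } | null)?.schema_version;
+  // Fail loudly on an unknown version instead of misreading new fields.
+  if (version !== 1) {
+    throw new Error(`unsupported schema_version: ${String(version)}`);
+  }
+  return data as LoadReportV1;
+}
+
+// Worst p99 across all cells at one seed size, e.g. for a CHANGELOG bullet.
+function worstP99(report: LoadReportV1, n: number): CellResult | undefined {
+  return report.cells
+    .filter((c) => c.n === n)
+    .reduce<CellResult | undefined>(
+      (worst, c) => (!worst || c.p99_ms > worst.p99_ms ? c : worst),
+      undefined,
+    );
+}
+```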
+
+### Content generation is seedable
+
+Synthetic memory content is built from a small noun / verb / concept
+vocabulary fed by a `mulberry32(BENCH_SEED)` PRNG. Same seed + same
+build = same corpus. The point isn't "realistic" content (there is no
+single realistic workload); the point is **reproducibility**: re-running
+the harness against the same git sha feeds the daemon the same content
+mixture, so latency variance comes from the daemon and not from JSON
+payload jitter.
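+
+For reference, `mulberry32` is a tiny 32-bit-state PRNG that returns
+floats in [0, 1). The function below follows the widely circulated
+JavaScript form, with the state kept in uint32 range; the word lists
+and the `pick` / `makeMemory` helpers are hypothetical stand-ins for
+whatever vocabulary the harness actually uses.
+
+```typescript
+// mulberry32: one add plus a few multiply / xor-shift mixes per call.
+function mulberry32(seed: number): () => number {
+  let state = seed >>> 0;
+  return () => {
+    state = (state + 0x6d2b79f5) >>> 0;
+    let t = state;
+    t = Math.imul(t ^ (t >>> 15), t | 1);
+    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// Hypothetical vocabulary; the real harness has its own word lists.
+const NOUNS = ["cache", "schema", "session", "index", "token"];
+const VERBS = ["refactored", "migrated", "debugged", "benchmarked", "pinned"];
+const CONCEPTS = ["latency", "retries", "backpressure", "idempotency", "sharding"];
+
+function pick<T>(rng: () => number, items: readonly T[]): T {
+  return items[Math.floor(rng() * items.length)]!;
+}
+
+// Same seed in, same sequence of memory bodies out.
+function makeMemory(rng: () => number, i: number): string {
+  return `#${i}: ${pick(rng, VERBS)} the ${pick(rng, NOUNS)} to reduce ${pick(rng, CONCEPTS)}`;
+}
+```
+
+Two generators built with `mulberry32(42)` produce identical
+`makeMemory` sequences, which is the reproducibility property described
+above.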
+
+### Publishing numbers per release
+
+The release flow appends a `### Performance` section to `CHANGELOG.md`
+referencing the JSON in `benchmark/results/` for that release's git
+sha. p99 is the headline number; the JSON is the receipt.
diff --git a/benchmark/lib/percentiles.ts b/benchmark/lib/percentiles.ts
@@ -0,0 +1,22 @@
+/**
+ * Nearest-rank percentile over a pre-sorted ascending array of numbers.
+ *
+ * No dependencies, no allocation. The caller is responsible for sorting
+ * the input ascending (`arr.sort((a, b) => a - b)`) — sorting in here
+ * would hide an O(n log n) cost in what looks like a cheap lookup.
+ *
+ * @param sorted Ascending-sorted samples. Empty array returns `NaN`.
+ * @param p Percentile in [0, 100]. Values outside the range are clamped; `NaN` yields `NaN`.
+ * @returns The sample at the nearest rank, or `NaN` for empty input.
+ */
+export function pXX(sorted: number[], p: number): number {
+  const n = sorted.length;
+  if (n === 0 || Number.isNaN(p)) return NaN;
+  const clamped = Math.max(0, Math.min(100, p));
+  if (clamped === 0) return sorted[0]!;
+  if (clamped === 100) return sorted[n - 1]!;
+  // Nearest-rank: rank = ceil(p/100 * n), index = rank - 1.
+  const rank = Math.ceil((clamped / 100) * n);
+  const idx = Math.min(n - 1, Math.max(0, rank - 1));
+  return sorted[idx]!;
+}