diff --git a/CHANGELOG.md b/CHANGELOG.md index 8266e2ad..ec089052 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), ## [Unreleased] +### Added + +- **`benchmark/load-100k.ts` load harness** ([#346](https://github.com/rohitg00/agentmemory/issues/346)). Hand-rolled, dependency-free harness that seeds N synthetic memories against a local daemon at `http://localhost:3111` and records p50 / p90 / p99 latency + throughput for `POST /agentmemory/remember`, `POST /agentmemory/smart-search`, and `GET /agentmemory/memories?latest=true` across the matrix N ∈ {1k, 10k, 100k} × concurrency C ∈ {1, 10, 100}. Content is drawn from a seedable `mulberry32` PRNG, so re-running against the same build produces the same seed corpus. Results land in `benchmark/results/load-100k-<git-sha>.json` (schema-versioned). Wired as `npm run bench:load`. See `benchmark/README.md` for the matrix and env knobs. + +### Performance + +- Placeholder for per-release p50 / p90 / p99 numbers from `benchmark/load-100k.ts`. Each release should land a `benchmark/results/load-100k-<git-sha>.json` and reference the headline p99 here. Format suggestion: one bullet per (N, C) cell that materially regressed or improved versus the previous release. p99 is the capacity-planning number; p50 + throughput are context. See [`benchmark/README.md`](benchmark/README.md) for how to reproduce. + ## [0.9.12] — 2026-05-13 Four landed PRs since v0.9.11 — one type-correctness fix, one search-quality fix (BM25 unicode + vector-index live-write), one viewer hardening (CSP-clean fonts + load-error surface), and one integrations security hardening (bearer token over plaintext HTTP). diff --git a/benchmark/README.md b/benchmark/README.md new file mode 100644 index 00000000..23d22852 --- /dev/null +++ b/benchmark/README.md @@ -0,0 +1,100 @@ +# benchmark/ + +Two kinds of numbers live in this directory: + +1. 
**Quality / retrieval** — `longmemeval-bench.ts`, `quality-eval.ts`, + `real-embeddings-eval.ts`, `scale-eval.ts`. Recall, precision, token + savings. Documented in `LONGMEMEVAL.md`, `QUALITY.md`, + `REAL-EMBEDDINGS.md`, `SCALE.md`. + +2. **Load shape** — `load-100k.ts`. p50 / p90 / p99 latency and + throughput against a running daemon. This is the file you want when + somebody asks "what's p99 at 100k memories under concurrency 100?". + +## load-100k.ts + +Hand-rolled, dependency-free load harness. Issues real HTTP against a +local agentmemory daemon at `http://localhost:3111`, records per-request +latency with `performance.now()`, and writes a JSON report per run. + +### What it measures + +For each cell in the matrix `(N, concurrency, endpoint)` it records: + +- `p50_ms`, `p90_ms`, `p99_ms` — nearest-rank percentiles. +- `min_ms`, `max_ms`, `ops`, `errors`. +- `throughput_per_sec` — wall-clock ops / sec for that cell. + +Default matrix: + +- `N` ∈ {1000, 10000, 100000} — number of memories seeded before the + cell runs. +- `C` ∈ {1, 10, 100} — concurrent in-flight requests during the cell. +- Endpoints under test: + - `POST /agentmemory/remember` + - `POST /agentmemory/smart-search` + - `GET /agentmemory/memories?latest=true` + +Each cell issues `BENCH_OPS=200` requests by default — enough samples +for stable p99 without dragging a 100k-seed run past tens of minutes. + +### Why p99 is the number that matters + +p50 tells you the median request feels fast. p90 tells you the bulk of +requests feel fast. **p99 tells you the request your tail user hits when +they really need it feels fast.** Capacity planning lives here — if you +want to size a fleet, scale your daemon, or set an SLO, p99 is the +number to plan against. p50 will lie to you. + +### Running it + +```bash +# 1. Start the daemon however you normally do (npx, Docker, etc.) +npx @agentmemory/agentmemory + +# 2. 
From the repo root, in another shell: +npm run bench:load +``` + +To override the matrix: + +```bash +BENCH_N=1000 BENCH_C=1,10 BENCH_OPS=100 npm run bench:load +``` + +To have the harness spawn a daemon for the run (after `npm run build`): + +```bash +AGENTMEMORY_BENCH_AUTOSTART=1 npm run bench:load +``` + +Other env knobs (see the file header for the canonical list): + +- `AGENTMEMORY_URL` — base URL of the daemon (default + `http://localhost:3111`). +- `BENCH_SEED` — seed for the `mulberry32` content RNG. Same seed + + same daemon build = byte-identical seed corpus. +- `BENCH_OUT_DIR` — where the JSON report lands (default + `benchmark/results/`). + +### Where results land + +`benchmark/results/load-100k-<git-sha>.json`. The harness +`mkdir -p`s the directory. The file has a `schema_version: 1` field so +future format changes don't silently break consumers. + +### Content generation is seedable + +Synthetic memory content is built from a small noun / verb / concept +vocabulary fed by a `mulberry32(BENCH_SEED)` PRNG. Same seed + same +build = same corpus. The point isn't "realistic" content (there is no +single realistic corpus); the point is **reproducibility** — re-running +the harness against the same git sha should give the same content +mixture going in, so latency variance comes from the daemon and not +from JSON payload jitter. + +### Publishing numbers per release + +The release flow appends a `## Performance` section to `CHANGELOG.md` +referencing the JSON in `benchmark/results/` for that release's git +sha. p99 is the headline number; the JSON is the receipt. diff --git a/benchmark/lib/percentiles.ts b/benchmark/lib/percentiles.ts new file mode 100644 index 00000000..204feeab --- /dev/null +++ b/benchmark/lib/percentiles.ts @@ -0,0 +1,22 @@ +/** + * Nearest-rank percentile over a pre-sorted ascending array of numbers. + * + * No dependencies, no allocation. 
The caller is responsible for sorting + * the input ascending (`arr.sort((a, b) => a - b)`) — sorting in here + * would hide an O(n log n) cost in what looks like a cheap lookup. + * + * @param sorted Ascending-sorted samples. Empty array returns `NaN`. + * @param p Percentile in [0, 100]. Values outside the range are clamped. + * @returns The sample at the nearest rank, or `NaN` for empty input. + */ +export function pXX(sorted: number[], p: number): number { + const n = sorted.length; + if (n === 0) return NaN; + const clamped = Math.max(0, Math.min(100, p)); + if (clamped === 0) return sorted[0]!; + if (clamped === 100) return sorted[n - 1]!; + // Nearest-rank: rank = ceil(p/100 * n), index = rank - 1. + const rank = Math.ceil((clamped / 100) * n); + const idx = Math.min(n - 1, Math.max(0, rank - 1)); + return sorted[idx]!; +} diff --git a/benchmark/load-100k.ts b/benchmark/load-100k.ts new file mode 100644 index 00000000..676aa647 --- /dev/null +++ b/benchmark/load-100k.ts @@ -0,0 +1,528 @@ +/** + * Load harness — seeds N synthetic memories against a local agentmemory + * daemon, then drives a matrix of (N, concurrency, endpoint) cells and + * records p50 / p90 / p99 latency + throughput per cell. + * + * Spec: GitHub issue #346. + * + * Runs against an already-running daemon at `http://localhost:3111` by + * default. Set `AGENTMEMORY_BENCH_AUTOSTART=1` to spawn one via + * `node dist/cli.js start` for the duration of the run. 
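As a quick sanity check on the nearest-rank `pXX` helper, the sketch below exercises it against a known distribution. The function is copied inline so the snippet runs standalone, and the 1–100 ms sample array is made up purely for illustration.

```typescript
// Inline copy of pXX so this sketch is self-contained.
function pXX(sorted: number[], p: number): number {
  const n = sorted.length;
  if (n === 0) return NaN;
  const clamped = Math.max(0, Math.min(100, p));
  if (clamped === 0) return sorted[0]!;
  if (clamped === 100) return sorted[n - 1]!;
  // Nearest-rank: rank = ceil(p/100 * n), index = rank - 1.
  const rank = Math.ceil((clamped / 100) * n);
  return sorted[Math.min(n - 1, Math.max(0, rank - 1))]!;
}

// 100 synthetic latency samples: 1, 2, ..., 100 ms, already sorted ascending.
const samples = Array.from({ length: 100 }, (_, i) => i + 1);

console.log(pXX(samples, 50)); // rank ceil(50) -> index 49 -> 50
console.log(pXX(samples, 99)); // rank ceil(99) -> index 98 -> 99
console.log(pXX(samples, 100)); // clamped top -> 100
```

With 100 samples the nearest-rank p99 is the 99th value, i.e. exactly one sample sits above it, which is why the harness wants a few hundred ops per cell before trusting the tail.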
+ * + * Env knobs: + * AGENTMEMORY_BENCH_AUTOSTART "1" to spawn the daemon (default: assume up) + * AGENTMEMORY_URL base URL of the daemon (default: http://localhost:3111) + * BENCH_N comma-separated N sizes (default: 1000,10000,100000) + * BENCH_C comma-separated concurrency levels (default: 1,10,100) + * BENCH_OPS ops per cell during measurement (default: 200) + * BENCH_SEED seed for the mulberry32 RNG (default: 0xC0FFEE) + * BENCH_OUT_DIR results dir (default: benchmark/results) + * + * The harness writes one JSON file per run named + * `load-100k-<git-sha>.json`. The git sha is best-effort — falls + * back to a timestamp when run outside a checkout. + */ + +import { spawn, type ChildProcess } from "node:child_process"; +import { execFileSync } from "node:child_process"; +import { mkdirSync, writeFileSync, existsSync } from "node:fs"; +import { join, resolve } from "node:path"; +import { performance } from "node:perf_hooks"; + +import { pXX } from "./lib/percentiles.js"; + +/** Seedable PRNG. Mulberry32 — 32-bit state, uniform output in [0, 1). 
*/ +function mulberry32(seed: number): () => number { + let s = seed >>> 0; + return () => { + s = (s + 0x6d2b79f5) >>> 0; + let t = s; + t = Math.imul(t ^ (t >>> 15), t | 1); + t ^= t + Math.imul(t ^ (t >>> 7), t | 61); + return ((t ^ (t >>> 14)) >>> 0) / 4294967296; + }; +} + +const NOUNS = [ + "cache", "queue", "router", "stream", "shard", "lock", "buffer", "worker", + "engine", "trigger", "function", "memory", "index", "graph", "vector", + "session", "observation", "summary", "embedding", "tokenizer", "scheduler", + "consumer", "producer", "channel", "actor", "pipeline", "watcher", "pool", +]; +const VERBS = [ + "flushes", "rotates", "compacts", "rebalances", "drains", "warms", + "expires", "deduplicates", "snapshots", "replays", "promotes", "demotes", + "merges", "splits", "indexes", "scans", "compresses", "uploads", +]; +const CONCEPTS = [ + "throughput", "latency", "backpressure", "consistency", "isolation", + "durability", "idempotency", "fan-out", "cardinality", "skew", + "hot-path", "cold-start", "tail-latency", "saturation", "quiescence", +]; + +function buildContent(rng: () => number, i: number): string { + const n = NOUNS[Math.floor(rng() * NOUNS.length)]!; + const v = VERBS[Math.floor(rng() * VERBS.length)]!; + const c1 = CONCEPTS[Math.floor(rng() * CONCEPTS.length)]!; + const c2 = CONCEPTS[Math.floor(rng() * CONCEPTS.length)]!; + const k = Math.floor(rng() * 9999); + return `seed-${i} the ${n} ${v} ${c1} under ${c2} pressure (k=${k})`; +} + +interface RunConfig { + baseUrl: string; + Ns: number[]; + Cs: number[]; + opsPerCell: number; + seed: number; + outDir: string; + autoStart: boolean; +} + +function parseIntList(raw: string | undefined, fallback: number[]): number[] { + if (!raw) return fallback; + const out = raw + .split(",") + .map((s) => parseInt(s.trim(), 10)) + .filter((n) => Number.isFinite(n) && n > 0); + return out.length > 0 ? 
out : fallback; } + +function loadConfig(): RunConfig { + return { + baseUrl: (process.env["AGENTMEMORY_URL"] || "http://localhost:3111").replace( + /\/+$/, + "", + ), + Ns: parseIntList(process.env["BENCH_N"], [1000, 10000, 100000]), + Cs: parseIntList(process.env["BENCH_C"], [1, 10, 100]), + opsPerCell: parseInt(process.env["BENCH_OPS"] || "200", 10) || 200, + seed: parseInt(process.env["BENCH_SEED"] || "12648430", 10) || 12648430, + outDir: + process.env["BENCH_OUT_DIR"] || + resolve(process.cwd(), "benchmark", "results"), + autoStart: process.env["AGENTMEMORY_BENCH_AUTOSTART"] === "1", + }; +} + +async function waitForLivez(baseUrl: string, timeoutMs: number): Promise<void> { + const start = Date.now(); + let lastErr: unknown = null; + while (Date.now() - start < timeoutMs) { + try { + const res = await fetch(`${baseUrl}/agentmemory/livez`, { + signal: AbortSignal.timeout(2000), + }); + if (res.ok) return; + lastErr = new Error(`livez HTTP ${res.status}`); + } catch (err) { + lastErr = err; + } + await new Promise((r) => setTimeout(r, 500)); + } + const reason = + lastErr instanceof Error ? lastErr.message : String(lastErr ?? 
"unknown"); + throw new Error( + `daemon at ${baseUrl} did not become ready within ${timeoutMs} ms: ${reason}`, + ); +} + +function maybeStartDaemon(): ChildProcess | null { + const candidates = ["dist/cli.mjs", "dist/cli.js"].map((p) => + resolve(process.cwd(), p), + ); + const cliPath = candidates.find((p) => existsSync(p)); + if (!cliPath) { + throw new Error( + `AGENTMEMORY_BENCH_AUTOSTART=1 but neither ${candidates.join(" nor ")} exists — run \`npm run build\` first`, + ); + } + const child = spawn(process.execPath, [cliPath, "start"], { + stdio: ["ignore", "pipe", "pipe"], + detached: false, + }); + child.stdout?.on("data", () => { + /* drain */ + }); + child.stderr?.on("data", () => { + /* drain */ + }); + return child; +} + +function shortGitSha(): string { + try { + const sha = execFileSync("git", ["rev-parse", "--short", "HEAD"], { + encoding: "utf8", + stdio: ["ignore", "pipe", "ignore"], + }).trim(); + if (sha) return sha; + } catch { + /* no git */ + } + return `nogit-${Date.now().toString(36)}`; +} + +interface CellResult { + endpoint: string; + N: number; + C: number; + ops: number; + errors: number; + wall_ms: number; + throughput_per_sec: number; + p50_ms: number; + p90_ms: number; + p99_ms: number; + min_ms: number; + max_ms: number; +} + +/** + * Issue `total` requests against `fetcher`, capped at `concurrency` + * in-flight at any moment. Records per-request latency in ms. 
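The capped-concurrency pattern described in that doc comment, a shared `issued` counter drained by a fixed pool of async workers, can be exercised standalone against a fake fetcher. `drive` and `fakeFetch` below are hypothetical names for this sketch, not part of the harness.

```typescript
// Minimal worker-pool sketch mirroring the harness's driveLoad pattern:
// C workers loop, each claims the next index from a shared counter, and
// stops once `total` requests have been claimed.
async function drive(
  concurrency: number,
  total: number,
  fetcher: (i: number) => Promise<void>,
): Promise<{ completed: number; errors: number }> {
  let issued = 0;
  let completed = 0;
  let errors = 0;
  async function worker(): Promise<void> {
    while (true) {
      const i = issued++; // safe: increments never interleave on a single-threaded event loop
      if (i >= total) return;
      try {
        await fetcher(i);
        completed++;
      } catch {
        errors++;
      }
    }
  }
  await Promise.allSettled(
    Array.from({ length: Math.max(1, concurrency) }, () => worker()),
  );
  return { completed, errors };
}

// Hypothetical stand-in for a real HTTP call: fails on every 10th index.
const fakeFetch = async (i: number): Promise<void> => {
  await new Promise((r) => setTimeout(r, 1));
  if (i % 10 === 9) throw new Error("synthetic failure");
};

const res = await drive(10, 100, fakeFetch);
console.log(res); // { completed: 90, errors: 10 }
```

The design point is that exactly `total` indices get claimed no matter how workers interleave, so the per-cell op count is deterministic even when individual requests fail.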
+ */ +async function driveLoad( + concurrency: number, + total: number, + fetcher: (i: number) => Promise, +): Promise<{ latencies: number[]; errors: number; wallMs: number }> { + const latencies: number[] = []; + let errors = 0; + let issued = 0; + const wallStart = performance.now(); + + async function worker(): Promise { + while (true) { + const i = issued++; + if (i >= total) return; + const t0 = performance.now(); + try { + await fetcher(i); + latencies.push(performance.now() - t0); + } catch { + errors++; + } + } + } + + const workers = Array.from({ length: Math.max(1, concurrency) }, () => + worker(), + ); + await Promise.allSettled(workers); + const wallMs = performance.now() - wallStart; + return { latencies, errors, wallMs }; +} + +function summarize( + endpoint: string, + N: number, + C: number, + latencies: number[], + errors: number, + wallMs: number, +): CellResult { + const sorted = latencies.slice().sort((a, b) => a - b); + const ops = sorted.length; + return { + endpoint, + N, + C, + ops, + errors, + wall_ms: Math.round(wallMs * 1000) / 1000, + throughput_per_sec: + wallMs > 0 ? Math.round((ops / (wallMs / 1000)) * 100) / 100 : 0, + p50_ms: Math.round(pXX(sorted, 50) * 1000) / 1000, + p90_ms: Math.round(pXX(sorted, 90) * 1000) / 1000, + p99_ms: Math.round(pXX(sorted, 99) * 1000) / 1000, + min_ms: ops > 0 ? Math.round(sorted[0]! * 1000) / 1000 : NaN, + max_ms: ops > 0 ? Math.round(sorted[ops - 1]! 
* 1000) / 1000 : NaN, + }; +} + +async function seedMemories( + baseUrl: string, + count: number, + rng: () => number, + seedConcurrency = 32, +): Promise<{ seeded: number; errors: number; wallMs: number }> { + let issued = 0; + let seeded = 0; + let errors = 0; + const t0 = performance.now(); + async function worker(): Promise<void> { + while (true) { + const i = issued++; + if (i >= count) return; + const body = JSON.stringify({ + content: buildContent(rng, i), + type: "observation", + }); + try { + const res = await fetch(`${baseUrl}/agentmemory/remember`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + if (res.ok) { + seeded++; + } else { + errors++; + } + // drain body to free the socket + await res.text().catch(() => ""); + } catch { + errors++; + } + } + } + await Promise.allSettled( + Array.from({ length: seedConcurrency }, () => worker()), + ); + return { seeded, errors, wallMs: performance.now() - t0 }; +} + +async function measureRemember( + baseUrl: string, + rng: () => number, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const { latencies, errors, wallMs } = await driveLoad(C, ops, async (i) => { + const body = JSON.stringify({ + content: buildContent(rng, N + i), + type: "observation", + }); + const res = await fetch(`${baseUrl}/agentmemory/remember`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize("POST /agentmemory/remember", N, C, latencies, errors, wallMs); +} + +async function measureSmartSearch( + baseUrl: string, + rng: () => number, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const queries = Array.from({ length: 32 }, (_, i) => buildContent(rng, i)); + const { latencies, errors, wallMs } = await driveLoad(C, ops, async (i) => { + const body = JSON.stringify({ 
query: queries[i % queries.length], + limit: 5, + }); + const res = await fetch(`${baseUrl}/agentmemory/smart-search`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize( + "POST /agentmemory/smart-search", + N, + C, + latencies, + errors, + wallMs, + ); +} + +async function measureMemoriesLatest( + baseUrl: string, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const { latencies, errors, wallMs } = await driveLoad(C, ops, async () => { + const res = await fetch(`${baseUrl}/agentmemory/memories?latest=true`, { + method: "GET", + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize( + "GET /agentmemory/memories?latest=true", + N, + C, + latencies, + errors, + wallMs, + ); +} + +interface RunReport { + schema_version: 1; + generated_at: string; + git_sha: string; + base_url: string; + seed: number; + matrix: { N: number[]; C: number[] }; + ops_per_cell: number; + cells: CellResult[]; + notes: string; +} + +async function main(): Promise<void> { + const cfg = loadConfig(); + console.log( + `[load-100k] base=${cfg.baseUrl} N=${cfg.Ns.join(",")} C=${cfg.Cs.join(",")} ops/cell=${cfg.opsPerCell} seed=${cfg.seed}`, + ); + + let daemon: ChildProcess | null = null; + if (cfg.autoStart) { + console.log("[load-100k] AGENTMEMORY_BENCH_AUTOSTART=1 — spawning daemon"); + daemon = maybeStartDaemon(); + } + + try { + console.log("[load-100k] waiting for /agentmemory/livez (timeout 30s)"); + await waitForLivez(cfg.baseUrl, 30_000); + + const cells: CellResult[] = []; + // N sorted ascending so each cell builds on the previous seed work. 
+ const sortedNs = cfg.Ns.slice().sort((a, b) => a - b); + let seededSoFar = 0; + for (const N of sortedNs) { + const delta = N - seededSoFar; + if (delta > 0) { + console.log( + `[load-100k] seeding ${delta} memories (target N=${N})`, + ); + const rng = mulberry32(cfg.seed + seededSoFar); + const seedRes = await seedMemories(cfg.baseUrl, delta, rng); + seededSoFar += seedRes.seeded; + console.log( + `[load-100k] seeded=${seedRes.seeded} errors=${seedRes.errors} wall=${( + seedRes.wallMs / 1000 + ).toFixed(2)}s`, + ); + if (seedRes.errors > 0 && seedRes.seeded === 0) { + throw new Error( + `seeding produced 0 successes and ${seedRes.errors} errors — daemon misconfigured`, + ); + } + } + + for (const C of cfg.Cs) { + const probeRng = mulberry32(cfg.seed ^ (N * 0x9e3779b1) ^ C); + + console.log(`[load-100k] cell N=${N} C=${C} remember`); + const remember = await measureRemember( + cfg.baseUrl, + probeRng, + N, + C, + cfg.opsPerCell, + ); + cells.push(remember); + + console.log(`[load-100k] cell N=${N} C=${C} smart-search`); + const search = await measureSmartSearch( + cfg.baseUrl, + mulberry32(cfg.seed ^ (N * 0x85ebca77) ^ C), + N, + C, + cfg.opsPerCell, + ); + cells.push(search); + + console.log(`[load-100k] cell N=${N} C=${C} memories?latest=true`); + const memories = await measureMemoriesLatest( + cfg.baseUrl, + N, + C, + cfg.opsPerCell, + ); + cells.push(memories); + } + } + + const report: RunReport = { + schema_version: 1, + generated_at: new Date().toISOString(), + git_sha: shortGitSha(), + base_url: cfg.baseUrl, + seed: cfg.seed, + matrix: { N: sortedNs, C: cfg.Cs.slice() }, + ops_per_cell: cfg.opsPerCell, + cells, + notes: + "Single-process load harness. Latency in milliseconds. 
" + + "Throughput is wall-clock ops/sec for the cell (concurrent in-flight = C).", + }; + + mkdirSync(cfg.outDir, { recursive: true }); + const outPath = join(cfg.outDir, `load-100k-${report.git_sha}.json`); + writeFileSync(outPath, JSON.stringify(report, null, 2) + "\n", "utf8"); + console.log(`[load-100k] wrote ${outPath} (${cells.length} cells)`); + + // Compact table to stdout for the verification commit. + console.log(""); + console.log( + [ + "endpoint".padEnd(40), + "N".padStart(8), + "C".padStart(4), + "ops".padStart(6), + "err".padStart(4), + "p50_ms".padStart(8), + "p90_ms".padStart(8), + "p99_ms".padStart(8), + "tp/s".padStart(9), + ].join(" "), + ); + for (const c of cells) { + console.log( + [ + c.endpoint.padEnd(40), + String(c.N).padStart(8), + String(c.C).padStart(4), + String(c.ops).padStart(6), + String(c.errors).padStart(4), + c.p50_ms.toFixed(2).padStart(8), + c.p90_ms.toFixed(2).padStart(8), + c.p99_ms.toFixed(2).padStart(8), + c.throughput_per_sec.toFixed(2).padStart(9), + ].join(" "), + ); + } + } finally { + if (daemon) { + daemon.kill("SIGTERM"); + // give it 2s to exit cleanly before SIGKILL + await new Promise<void>((resolveFn) => { + const t = setTimeout(() => { + try { + daemon!.kill("SIGKILL"); + } catch { + /* already dead */ + } + resolveFn(); + }, 2000); + daemon!.once("exit", () => { + clearTimeout(t); + resolveFn(); + }); + }); + } + } +} + +main().catch((err) => { + console.error("[load-100k] failed:", err instanceof Error ? 
err.stack : err); + process.exit(1); +}); diff --git a/benchmark/results/load-100k-96c0ed0.json b/benchmark/results/load-100k-96c0ed0.json new file mode 100644 index 00000000..726a0023 --- /dev/null +++ b/benchmark/results/load-100k-96c0ed0.json @@ -0,0 +1,61 @@ +{ + "schema_version": 1, + "generated_at": "2026-05-13T19:49:26.116Z", + "git_sha": "96c0ed0", + "base_url": "http://localhost:3111", + "seed": 12648430, + "matrix": { + "N": [ + 1000 + ], + "C": [ + 10 + ] + }, + "ops_per_cell": 200, + "cells": [ + { + "endpoint": "POST /agentmemory/remember", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 11504.64, + "throughput_per_sec": 17.38, + "p50_ms": 577.435, + "p90_ms": 607.335, + "p99_ms": 675.269, + "min_ms": 64.46, + "max_ms": 683.164 + }, + { + "endpoint": "POST /agentmemory/smart-search", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 3264.572, + "throughput_per_sec": 61.26, + "p50_ms": 160.064, + "p90_ms": 185.608, + "p99_ms": 224.354, + "min_ms": 98.498, + "max_ms": 251.317 + }, + { + "endpoint": "GET /agentmemory/memories?latest=true", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 8051.764, + "throughput_per_sec": 24.84, + "p50_ms": 395.462, + "p90_ms": 475.714, + "p99_ms": 542.648, + "min_ms": 158.79, + "max_ms": 635.331 + } + ], + "notes": "Single-process load harness. Latency in milliseconds. Throughput is wall-clock ops/sec for the cell (concurrent in-flight = C)." +} diff --git a/package.json b/package.json index 5526fbea..b7b3a50a 100644 --- a/package.json +++ b/package.json @@ -24,7 +24,8 @@ "test": "vitest run --exclude test/integration.test.ts", "test:watch": "vitest --exclude test/integration.test.ts", "test:integration": "vitest run test/integration.test.ts", - "test:all": "vitest run" + "test:all": "vitest run", + "bench:load": "node --import tsx benchmark/load-100k.ts" }, "keywords": [ "ai",
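The release flow above pulls a headline p99 out of the schema-versioned results JSON. A consumer of that file might look like the sketch below; the `Report` and `Cell` shapes mirror the `RunReport` and `CellResult` interfaces in `load-100k.ts`, and the inline `report` object is a trimmed stand-in for a real `benchmark/results/load-100k-96c0ed0.json` rather than code that ships in this PR.

```typescript
// Sketch of a results-JSON consumer for the CHANGELOG "Performance" flow.
// Trimmed shapes; a real reader would parse the file from benchmark/results/.
interface Cell {
  endpoint: string;
  N: number;
  C: number;
  p99_ms: number;
}

interface Report {
  schema_version: number;
  git_sha: string;
  cells: Cell[];
}

// Stand-in data taken from the committed load-100k-96c0ed0.json.
const report: Report = {
  schema_version: 1,
  git_sha: "96c0ed0",
  cells: [
    { endpoint: "POST /agentmemory/remember", N: 1000, C: 10, p99_ms: 675.269 },
    { endpoint: "POST /agentmemory/smart-search", N: 1000, C: 10, p99_ms: 224.354 },
  ],
};

// Guard on schema_version before trusting the shape; that is what the field is for.
if (report.schema_version !== 1) throw new Error("unknown results schema");

// Headline number: the worst p99 across all cells.
const headline = report.cells.reduce((a, b) => (b.p99_ms > a.p99_ms ? b : a));
console.log(
  `${report.git_sha}: p99 ${headline.p99_ms.toFixed(2)} ms on ` +
    `${headline.endpoint} (N=${headline.N}, C=${headline.C})`,
);
```

Picking the worst cell as the headline matches the README's framing: p99 is the capacity-planning number, so the slowest cell is the one worth a CHANGELOG bullet.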