diff --git a/CHANGELOG.md b/CHANGELOG.md index 8266e2ad..ec089052 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), ## [Unreleased] +### Added + +- **`benchmark/load-100k.ts` load harness** ([#346](https://github.com/rohitg00/agentmemory/issues/346)). Hand-rolled, dependency-free harness that seeds N synthetic memories against a local daemon at `http://localhost:3111` and records p50 / p90 / p99 latency + throughput for `POST /agentmemory/remember`, `POST /agentmemory/smart-search`, and `GET /agentmemory/memories?latest=true` across the matrix N ∈ {1k, 10k, 100k} × concurrency C ∈ {1, 10, 100}. Content is drawn from a seedable `mulberry32` PRNG, so re-running against the same build produces the same seed corpus. Results land in `benchmark/results/load-100k-<git-sha>.json` (schema-versioned). Wired as `npm run bench:load`. See `benchmark/README.md` for the matrix and env knobs. + +### Performance + +- Placeholder for per-release p50 / p90 / p99 numbers from `benchmark/load-100k.ts`. Each release should land a `benchmark/results/load-100k-<git-sha>.json` and reference the headline p99 here. Format suggestion: one bullet per (N, C) cell that materially regressed or improved versus the previous release. p99 is the capacity-planning number; p50 + throughput are context. See [`benchmark/README.md`](benchmark/README.md) for how to reproduce. + ## [0.9.12] — 2026-05-13 Four landed PRs since v0.9.11 — one type-correctness fix, one search-quality fix (BM25 unicode + vector-index live-write), one viewer hardening (CSP-clean fonts + load-error surface), and one integrations security hardening (bearer token over plaintext HTTP). diff --git a/benchmark/README.md b/benchmark/README.md new file mode 100644 index 00000000..23d22852 --- /dev/null +++ b/benchmark/README.md @@ -0,0 +1,100 @@ +# benchmark/ + +Two kinds of numbers live in this directory: + +1. 
**Quality / retrieval** — `longmemeval-bench.ts`, `quality-eval.ts`, + `real-embeddings-eval.ts`, `scale-eval.ts`. Recall, precision, token + savings. Documented in `LONGMEMEVAL.md`, `QUALITY.md`, + `REAL-EMBEDDINGS.md`, `SCALE.md`. + +2. **Load shape** — `load-100k.ts`. p50 / p90 / p99 latency and + throughput against a running daemon. This is the file you want when + somebody asks "what's p99 at 100k memories under concurrency 100?". + +## load-100k.ts + +Hand-rolled, dependency-free load harness. Issues real HTTP against a +local agentmemory daemon at `http://localhost:3111`, records per-request +latency with `performance.now()`, and writes a JSON report per run. + +### What it measures + +For each cell in the matrix `(N, concurrency, endpoint)` it records: + +- `p50_ms`, `p90_ms`, `p99_ms` — nearest-rank percentiles. +- `min_ms`, `max_ms`, `ops`, `errors`. +- `throughput_per_sec` — wall-clock ops / sec for that cell. + +Default matrix: + +- `N` ∈ {1000, 10000, 100000} — number of memories seeded before the + cell runs. +- `C` ∈ {1, 10, 100} — concurrent in-flight requests during the cell. +- Endpoints under test: + - `POST /agentmemory/remember` + - `POST /agentmemory/smart-search` + - `GET /agentmemory/memories?latest=true` + +Each cell issues `BENCH_OPS=200` requests by default — enough samples +for stable p99 without dragging a 100k-seed run past tens of minutes. + +### Why p99 is the number that matters + +p50 tells you the median request feels fast. p90 tells you the bulk of +requests feel fast. **p99 tells you the request your tail user hits when +they really need it feels fast.** Capacity planning lives here — if you +want to size a fleet, scale your daemon, or set an SLO, p99 is the +number to plan against. p50 will lie to you. + +### Running it + +```bash +# 1. Start the daemon however you normally do (npx, Docker, etc.) +npx @agentmemory/agentmemory + +# 2. 
From the repo root, in another shell: +npm run bench:load +``` + +To override the matrix: + +```bash +BENCH_N=1000 BENCH_C=1,10 BENCH_OPS=100 npm run bench:load +``` + +To have the harness spawn a daemon for the run (after `npm run build`): + +```bash +AGENTMEMORY_BENCH_AUTOSTART=1 npm run bench:load +``` + +Other env knobs (see the file header for the canonical list): + +- `AGENTMEMORY_URL` — base URL of the daemon (default + `http://localhost:3111`). +- `BENCH_SEED` — seed for the `mulberry32` content RNG. Same seed + + same daemon build = byte-identical seed corpus. +- `BENCH_OUT_DIR` — where the JSON report lands (default + `benchmark/results/`). + +### Where results land + +`benchmark/results/load-100k-<git-sha>.json`. The harness +`mkdir -p`s the directory. The file has a `schema_version: 1` field so +future format changes don't silently break consumers. + +### Content generation is seedable + +Synthetic memory content is built from a small noun / verb / concept +vocabulary fed by a `mulberry32(BENCH_SEED)` PRNG. Same seed + same +build = same corpus. The point isn't "realistic" content (there is no +single realistic corpus); the point is **reproducibility** — re-running +the harness against the same git sha should give the same content +mixture going in, so latency variance comes from the daemon and not +from JSON payload jitter. + +### Publishing numbers per release + +The release flow appends a `## Performance` section to `CHANGELOG.md` +referencing the JSON in `benchmark/results/` for that release's git +sha. p99 is the headline number; the JSON is the receipt. diff --git a/benchmark/lib/percentiles.ts b/benchmark/lib/percentiles.ts new file mode 100644 index 00000000..204feeab --- /dev/null +++ b/benchmark/lib/percentiles.ts @@ -0,0 +1,22 @@ +/** + * Nearest-rank percentile over a pre-sorted ascending array of numbers. + * + * No dependencies, no allocation. 
The caller is responsible for sorting + * the input ascending (`arr.sort((a, b) => a - b)`) — sorting in here + * would hide an O(n log n) cost in what looks like a cheap lookup. + * + * @param sorted Ascending-sorted samples. Empty array returns `NaN`. + * @param p Percentile in [0, 100]. Values outside the range are clamped. + * @returns The sample at the nearest rank, or `NaN` for empty input. + */ +export function pXX(sorted: number[], p: number): number { + const n = sorted.length; + if (n === 0) return NaN; + const clamped = Math.max(0, Math.min(100, p)); + if (clamped === 0) return sorted[0]!; + if (clamped === 100) return sorted[n - 1]!; + // Nearest-rank: rank = ceil(p/100 * n), index = rank - 1. + const rank = Math.ceil((clamped / 100) * n); + const idx = Math.min(n - 1, Math.max(0, rank - 1)); + return sorted[idx]!; +} diff --git a/benchmark/load-100k.ts b/benchmark/load-100k.ts new file mode 100644 index 00000000..676aa647 --- /dev/null +++ b/benchmark/load-100k.ts @@ -0,0 +1,528 @@ +/** + * Load harness — seeds N synthetic memories against a local agentmemory + * daemon, then drives a matrix of (N, concurrency, endpoint) cells and + * records p50 / p90 / p99 latency + throughput per cell. + * + * Spec: GitHub issue #346. + * + * Runs against an already-running daemon at `http://localhost:3111` by + * default. Set `AGENTMEMORY_BENCH_AUTOSTART=1` to spawn one via + * `node dist/cli.js start` for the duration of the run. 
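As a quick sanity check on the nearest-rank `pXX` helper, the sketch below exercises it against a known distribution. The function is copied inline so the snippet runs standalone, and the 1–100 ms sample array is made up purely for illustration.

```typescript
// Inline copy of pXX so this sketch is self-contained.
function pXX(sorted: number[], p: number): number {
  const n = sorted.length;
  if (n === 0) return NaN;
  const clamped = Math.max(0, Math.min(100, p));
  if (clamped === 0) return sorted[0]!;
  if (clamped === 100) return sorted[n - 1]!;
  // Nearest-rank: rank = ceil(p/100 * n), index = rank - 1.
  const rank = Math.ceil((clamped / 100) * n);
  return sorted[Math.min(n - 1, Math.max(0, rank - 1))]!;
}

// 100 synthetic latency samples: 1, 2, ..., 100 ms, already sorted ascending.
const samples = Array.from({ length: 100 }, (_, i) => i + 1);

console.log(pXX(samples, 50)); // rank ceil(50) -> index 49 -> 50
console.log(pXX(samples, 99)); // rank ceil(99) -> index 98 -> 99
console.log(pXX(samples, 100)); // clamped top -> 100
```

With 100 samples the nearest-rank p99 is the 99th value, i.e. exactly one sample sits above it, which is why the harness wants a few hundred ops per cell before trusting the tail.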
+ * + * Env knobs: + * AGENTMEMORY_BENCH_AUTOSTART "1" to spawn the daemon (default: assume up) + * AGENTMEMORY_URL base URL of the daemon (default: http://localhost:3111) + * BENCH_N comma-separated N sizes (default: 1000,10000,100000) + * BENCH_C comma-separated concurrency levels (default: 1,10,100) + * BENCH_OPS ops per cell during measurement (default: 200) + * BENCH_SEED seed for the mulberry32 RNG (default: 0xC0FFEE) + * BENCH_OUT_DIR results dir (default: benchmark/results) + * + * The harness writes one JSON file per run named + * `load-100k-<git-sha>.json`. The git sha is best-effort — falls + * back to a timestamp when run outside a checkout. + */ + +import { spawn, type ChildProcess } from "node:child_process"; +import { execFileSync } from "node:child_process"; +import { mkdirSync, writeFileSync, existsSync } from "node:fs"; +import { join, resolve } from "node:path"; +import { performance } from "node:perf_hooks"; + +import { pXX } from "./lib/percentiles.js"; + +/** Seedable PRNG. Mulberry32 — 32-bit state, uniform output in [0, 1). 
*/ +function mulberry32(seed: number): () => number { + let s = seed >>> 0; + return () => { + s = (s + 0x6d2b79f5) >>> 0; + let t = s; + t = Math.imul(t ^ (t >>> 15), t | 1); + t ^= t + Math.imul(t ^ (t >>> 7), t | 61); + return ((t ^ (t >>> 14)) >>> 0) / 4294967296; + }; +} + +const NOUNS = [ + "cache", "queue", "router", "stream", "shard", "lock", "buffer", "worker", + "engine", "trigger", "function", "memory", "index", "graph", "vector", + "session", "observation", "summary", "embedding", "tokenizer", "scheduler", + "consumer", "producer", "channel", "actor", "pipeline", "watcher", "pool", +]; +const VERBS = [ + "flushes", "rotates", "compacts", "rebalances", "drains", "warms", + "expires", "deduplicates", "snapshots", "replays", "promotes", "demotes", + "merges", "splits", "indexes", "scans", "compresses", "uploads", +]; +const CONCEPTS = [ + "throughput", "latency", "backpressure", "consistency", "isolation", + "durability", "idempotency", "fan-out", "cardinality", "skew", + "hot-path", "cold-start", "tail-latency", "saturation", "quiescence", +]; + +function buildContent(rng: () => number, i: number): string { + const n = NOUNS[Math.floor(rng() * NOUNS.length)]!; + const v = VERBS[Math.floor(rng() * VERBS.length)]!; + const c1 = CONCEPTS[Math.floor(rng() * CONCEPTS.length)]!; + const c2 = CONCEPTS[Math.floor(rng() * CONCEPTS.length)]!; + const k = Math.floor(rng() * 9999); + return `seed-${i} the ${n} ${v} ${c1} under ${c2} pressure (k=${k})`; +} + +interface RunConfig { + baseUrl: string; + Ns: number[]; + Cs: number[]; + opsPerCell: number; + seed: number; + outDir: string; + autoStart: boolean; +} + +function parseIntList(raw: string | undefined, fallback: number[]): number[] { + if (!raw) return fallback; + const out = raw + .split(",") + .map((s) => parseInt(s.trim(), 10)) + .filter((n) => Number.isFinite(n) && n > 0); + return out.length > 0 ? 
out : fallback; } + +function loadConfig(): RunConfig { + return { + baseUrl: (process.env["AGENTMEMORY_URL"] || "http://localhost:3111").replace( + /\/+$/, + "", + ), + Ns: parseIntList(process.env["BENCH_N"], [1000, 10000, 100000]), + Cs: parseIntList(process.env["BENCH_C"], [1, 10, 100]), + opsPerCell: parseInt(process.env["BENCH_OPS"] || "200", 10) || 200, + seed: parseInt(process.env["BENCH_SEED"] || "12648430", 10) || 12648430, + outDir: + process.env["BENCH_OUT_DIR"] || + resolve(process.cwd(), "benchmark", "results"), + autoStart: process.env["AGENTMEMORY_BENCH_AUTOSTART"] === "1", + }; +} + +async function waitForLivez(baseUrl: string, timeoutMs: number): Promise<void> { + const start = Date.now(); + let lastErr: unknown = null; + while (Date.now() - start < timeoutMs) { + try { + const res = await fetch(`${baseUrl}/agentmemory/livez`, { + signal: AbortSignal.timeout(2000), + }); + if (res.ok) return; + lastErr = new Error(`livez HTTP ${res.status}`); + } catch (err) { + lastErr = err; + } + await new Promise((r) => setTimeout(r, 500)); + } + const reason = + lastErr instanceof Error ? lastErr.message : String(lastErr ?? 
"unknown"); + throw new Error( + `daemon at ${baseUrl} did not become ready within ${timeoutMs} ms: ${reason}`, + ); +} + +function maybeStartDaemon(): ChildProcess | null { + const candidates = ["dist/cli.mjs", "dist/cli.js"].map((p) => + resolve(process.cwd(), p), + ); + const cliPath = candidates.find((p) => existsSync(p)); + if (!cliPath) { + throw new Error( + `AGENTMEMORY_BENCH_AUTOSTART=1 but neither ${candidates.join(" nor ")} exists — run \`npm run build\` first`, + ); + } + const child = spawn(process.execPath, [cliPath, "start"], { + stdio: ["ignore", "pipe", "pipe"], + detached: false, + }); + child.stdout?.on("data", () => { + /* drain */ + }); + child.stderr?.on("data", () => { + /* drain */ + }); + return child; +} + +function shortGitSha(): string { + try { + const sha = execFileSync("git", ["rev-parse", "--short", "HEAD"], { + encoding: "utf8", + stdio: ["ignore", "pipe", "ignore"], + }).trim(); + if (sha) return sha; + } catch { + /* no git */ + } + return `nogit-${Date.now().toString(36)}`; +} + +interface CellResult { + endpoint: string; + N: number; + C: number; + ops: number; + errors: number; + wall_ms: number; + throughput_per_sec: number; + p50_ms: number; + p90_ms: number; + p99_ms: number; + min_ms: number; + max_ms: number; +} + +/** + * Issue `total` requests against `fetcher`, capped at `concurrency` + * in-flight at any moment. Records per-request latency in ms. 
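The capped-concurrency pattern described in that doc comment, a shared `issued` counter drained by a fixed pool of async workers, can be exercised standalone against a fake fetcher. `drive` and `fakeFetch` below are hypothetical names for this sketch, not part of the harness.

```typescript
// Minimal worker-pool sketch mirroring the harness's driveLoad pattern:
// C workers loop, each claims the next index from a shared counter, and
// stops once `total` requests have been claimed.
async function drive(
  concurrency: number,
  total: number,
  fetcher: (i: number) => Promise<void>,
): Promise<{ completed: number; errors: number }> {
  let issued = 0;
  let completed = 0;
  let errors = 0;
  async function worker(): Promise<void> {
    while (true) {
      const i = issued++; // safe: increments never interleave on a single-threaded event loop
      if (i >= total) return;
      try {
        await fetcher(i);
        completed++;
      } catch {
        errors++;
      }
    }
  }
  await Promise.allSettled(
    Array.from({ length: Math.max(1, concurrency) }, () => worker()),
  );
  return { completed, errors };
}

// Hypothetical stand-in for a real HTTP call: fails on every 10th index.
const fakeFetch = async (i: number): Promise<void> => {
  await new Promise((r) => setTimeout(r, 1));
  if (i % 10 === 9) throw new Error("synthetic failure");
};

const res = await drive(10, 100, fakeFetch);
console.log(res); // { completed: 90, errors: 10 }
```

The design point is that exactly `total` indices get claimed no matter how workers interleave, so the per-cell op count is deterministic even when individual requests fail.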
+ */ +async function driveLoad( + concurrency: number, + total: number, + fetcher: (i: number) => Promise, +): Promise<{ latencies: number[]; errors: number; wallMs: number }> { + const latencies: number[] = []; + let errors = 0; + let issued = 0; + const wallStart = performance.now(); + + async function worker(): Promise { + while (true) { + const i = issued++; + if (i >= total) return; + const t0 = performance.now(); + try { + await fetcher(i); + latencies.push(performance.now() - t0); + } catch { + errors++; + } + } + } + + const workers = Array.from({ length: Math.max(1, concurrency) }, () => + worker(), + ); + await Promise.allSettled(workers); + const wallMs = performance.now() - wallStart; + return { latencies, errors, wallMs }; +} + +function summarize( + endpoint: string, + N: number, + C: number, + latencies: number[], + errors: number, + wallMs: number, +): CellResult { + const sorted = latencies.slice().sort((a, b) => a - b); + const ops = sorted.length; + return { + endpoint, + N, + C, + ops, + errors, + wall_ms: Math.round(wallMs * 1000) / 1000, + throughput_per_sec: + wallMs > 0 ? Math.round((ops / (wallMs / 1000)) * 100) / 100 : 0, + p50_ms: Math.round(pXX(sorted, 50) * 1000) / 1000, + p90_ms: Math.round(pXX(sorted, 90) * 1000) / 1000, + p99_ms: Math.round(pXX(sorted, 99) * 1000) / 1000, + min_ms: ops > 0 ? Math.round(sorted[0]! * 1000) / 1000 : NaN, + max_ms: ops > 0 ? Math.round(sorted[ops - 1]! 
* 1000) / 1000 : NaN, + }; +} + +async function seedMemories( + baseUrl: string, + count: number, + rng: () => number, + seedConcurrency = 32, +): Promise<{ seeded: number; errors: number; wallMs: number }> { + let issued = 0; + let seeded = 0; + let errors = 0; + const t0 = performance.now(); + async function worker(): Promise<void> { + while (true) { + const i = issued++; + if (i >= count) return; + const body = JSON.stringify({ + content: buildContent(rng, i), + type: "observation", + }); + try { + const res = await fetch(`${baseUrl}/agentmemory/remember`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + if (res.ok) { + seeded++; + } else { + errors++; + } + // drain body to free the socket + await res.text().catch(() => ""); + } catch { + errors++; + } + } + } + await Promise.allSettled( + Array.from({ length: seedConcurrency }, () => worker()), + ); + return { seeded, errors, wallMs: performance.now() - t0 }; +} + +async function measureRemember( + baseUrl: string, + rng: () => number, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const { latencies, errors, wallMs } = await driveLoad(C, ops, async (i) => { + const body = JSON.stringify({ + content: buildContent(rng, N + i), + type: "observation", + }); + const res = await fetch(`${baseUrl}/agentmemory/remember`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize("POST /agentmemory/remember", N, C, latencies, errors, wallMs); +} + +async function measureSmartSearch( + baseUrl: string, + rng: () => number, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const queries = Array.from({ length: 32 }, (_, i) => buildContent(rng, i)); + const { latencies, errors, wallMs } = await driveLoad(C, ops, async (i) => { + const body = JSON.stringify({ 
query: queries[i % queries.length], + limit: 5, + }); + const res = await fetch(`${baseUrl}/agentmemory/smart-search`, { + method: "POST", + headers: { "content-type": "application/json" }, + body, + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize( + "POST /agentmemory/smart-search", + N, + C, + latencies, + errors, + wallMs, + ); +} + +async function measureMemoriesLatest( + baseUrl: string, + N: number, + C: number, + ops: number, +): Promise<CellResult> { + const { latencies, errors, wallMs } = await driveLoad(C, ops, async () => { + const res = await fetch(`${baseUrl}/agentmemory/memories?latest=true`, { + method: "GET", + signal: AbortSignal.timeout(30_000), + }); + await res.text().catch(() => ""); + if (!res.ok) throw new Error(`HTTP ${res.status}`); + }); + return summarize( + "GET /agentmemory/memories?latest=true", + N, + C, + latencies, + errors, + wallMs, + ); +} + +interface RunReport { + schema_version: 1; + generated_at: string; + git_sha: string; + base_url: string; + seed: number; + matrix: { N: number[]; C: number[] }; + ops_per_cell: number; + cells: CellResult[]; + notes: string; +} + +async function main(): Promise<void> { + const cfg = loadConfig(); + console.log( + `[load-100k] base=${cfg.baseUrl} N=${cfg.Ns.join(",")} C=${cfg.Cs.join(",")} ops/cell=${cfg.opsPerCell} seed=${cfg.seed}`, + ); + + let daemon: ChildProcess | null = null; + if (cfg.autoStart) { + console.log("[load-100k] AGENTMEMORY_BENCH_AUTOSTART=1 — spawning daemon"); + daemon = maybeStartDaemon(); + } + + try { + console.log("[load-100k] waiting for /agentmemory/livez (timeout 30s)"); + await waitForLivez(cfg.baseUrl, 30_000); + + const cells: CellResult[] = []; + // N sorted ascending so each cell builds on the previous seed work. 
+ const sortedNs = cfg.Ns.slice().sort((a, b) => a - b); + let seededSoFar = 0; + for (const N of sortedNs) { + const delta = N - seededSoFar; + if (delta > 0) { + console.log( + `[load-100k] seeding ${delta} memories (target N=${N})`, + ); + const rng = mulberry32(cfg.seed + seededSoFar); + const seedRes = await seedMemories(cfg.baseUrl, delta, rng); + seededSoFar += seedRes.seeded; + console.log( + `[load-100k] seeded=${seedRes.seeded} errors=${seedRes.errors} wall=${( + seedRes.wallMs / 1000 + ).toFixed(2)}s`, + ); + if (seedRes.errors > 0 && seedRes.seeded === 0) { + throw new Error( + `seeding produced 0 successes and ${seedRes.errors} errors — daemon misconfigured`, + ); + } + } + + for (const C of cfg.Cs) { + const probeRng = mulberry32(cfg.seed ^ (N * 0x9e3779b1) ^ C); + + console.log(`[load-100k] cell N=${N} C=${C} remember`); + const remember = await measureRemember( + cfg.baseUrl, + probeRng, + N, + C, + cfg.opsPerCell, + ); + cells.push(remember); + + console.log(`[load-100k] cell N=${N} C=${C} smart-search`); + const search = await measureSmartSearch( + cfg.baseUrl, + mulberry32(cfg.seed ^ (N * 0x85ebca77) ^ C), + N, + C, + cfg.opsPerCell, + ); + cells.push(search); + + console.log(`[load-100k] cell N=${N} C=${C} memories?latest=true`); + const memories = await measureMemoriesLatest( + cfg.baseUrl, + N, + C, + cfg.opsPerCell, + ); + cells.push(memories); + } + } + + const report: RunReport = { + schema_version: 1, + generated_at: new Date().toISOString(), + git_sha: shortGitSha(), + base_url: cfg.baseUrl, + seed: cfg.seed, + matrix: { N: sortedNs, C: cfg.Cs.slice() }, + ops_per_cell: cfg.opsPerCell, + cells, + notes: + "Single-process load harness. Latency in milliseconds. 
" + + "Throughput is wall-clock ops/sec for the cell (concurrent in-flight = C).", + }; + + mkdirSync(cfg.outDir, { recursive: true }); + const outPath = join(cfg.outDir, `load-100k-${report.git_sha}.json`); + writeFileSync(outPath, JSON.stringify(report, null, 2) + "\n", "utf8"); + console.log(`[load-100k] wrote ${outPath} (${cells.length} cells)`); + + // Compact table to stdout for the verification commit. + console.log(""); + console.log( + [ + "endpoint".padEnd(40), + "N".padStart(8), + "C".padStart(4), + "ops".padStart(6), + "err".padStart(4), + "p50_ms".padStart(8), + "p90_ms".padStart(8), + "p99_ms".padStart(8), + "tp/s".padStart(9), + ].join(" "), + ); + for (const c of cells) { + console.log( + [ + c.endpoint.padEnd(40), + String(c.N).padStart(8), + String(c.C).padStart(4), + String(c.ops).padStart(6), + String(c.errors).padStart(4), + c.p50_ms.toFixed(2).padStart(8), + c.p90_ms.toFixed(2).padStart(8), + c.p99_ms.toFixed(2).padStart(8), + c.throughput_per_sec.toFixed(2).padStart(9), + ].join(" "), + ); + } + } finally { + if (daemon) { + daemon.kill("SIGTERM"); + // give it 2s to exit cleanly before SIGKILL + await new Promise<void>((resolveFn) => { + const t = setTimeout(() => { + try { + daemon!.kill("SIGKILL"); + } catch { + /* already dead */ + } + resolveFn(); + }, 2000); + daemon!.once("exit", () => { + clearTimeout(t); + resolveFn(); + }); + }); + } + } +} + +main().catch((err) => { + console.error("[load-100k] failed:", err instanceof Error ? 
err.stack : err); + process.exit(1); +}); diff --git a/benchmark/results/load-100k-96c0ed0.json b/benchmark/results/load-100k-96c0ed0.json new file mode 100644 index 00000000..726a0023 --- /dev/null +++ b/benchmark/results/load-100k-96c0ed0.json @@ -0,0 +1,61 @@ +{ + "schema_version": 1, + "generated_at": "2026-05-13T19:49:26.116Z", + "git_sha": "96c0ed0", + "base_url": "http://localhost:3111", + "seed": 12648430, + "matrix": { + "N": [ + 1000 + ], + "C": [ + 10 + ] + }, + "ops_per_cell": 200, + "cells": [ + { + "endpoint": "POST /agentmemory/remember", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 11504.64, + "throughput_per_sec": 17.38, + "p50_ms": 577.435, + "p90_ms": 607.335, + "p99_ms": 675.269, + "min_ms": 64.46, + "max_ms": 683.164 + }, + { + "endpoint": "POST /agentmemory/smart-search", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 3264.572, + "throughput_per_sec": 61.26, + "p50_ms": 160.064, + "p90_ms": 185.608, + "p99_ms": 224.354, + "min_ms": 98.498, + "max_ms": 251.317 + }, + { + "endpoint": "GET /agentmemory/memories?latest=true", + "N": 1000, + "C": 10, + "ops": 200, + "errors": 0, + "wall_ms": 8051.764, + "throughput_per_sec": 24.84, + "p50_ms": 395.462, + "p90_ms": 475.714, + "p99_ms": 542.648, + "min_ms": 158.79, + "max_ms": 635.331 + } + ], + "notes": "Single-process load harness. Latency in milliseconds. Throughput is wall-clock ops/sec for the cell (concurrent in-flight = C)." +} diff --git a/package.json b/package.json index 5526fbea..b7b3a50a 100644 --- a/package.json +++ b/package.json @@ -24,7 +24,8 @@ "test": "vitest run --exclude test/integration.test.ts", "test:watch": "vitest --exclude test/integration.test.ts", "test:integration": "vitest run test/integration.test.ts", - "test:all": "vitest run" + "test:all": "vitest run", + "bench:load": "node --import tsx benchmark/load-100k.ts" }, "keywords": [ "ai",
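The release flow above pulls a headline p99 out of the schema-versioned results JSON. A consumer of that file might look like the sketch below; the `Report` and `Cell` shapes mirror the `RunReport` and `CellResult` interfaces in `load-100k.ts`, and the inline `report` object is a trimmed stand-in for a real `benchmark/results/load-100k-96c0ed0.json` rather than code that ships in this PR.

```typescript
// Sketch of a results-JSON consumer for the CHANGELOG "Performance" flow.
// Trimmed shapes; a real reader would parse the file from benchmark/results/.
interface Cell {
  endpoint: string;
  N: number;
  C: number;
  p99_ms: number;
}

interface Report {
  schema_version: number;
  git_sha: string;
  cells: Cell[];
}

// Stand-in data taken from the committed load-100k-96c0ed0.json.
const report: Report = {
  schema_version: 1,
  git_sha: "96c0ed0",
  cells: [
    { endpoint: "POST /agentmemory/remember", N: 1000, C: 10, p99_ms: 675.269 },
    { endpoint: "POST /agentmemory/smart-search", N: 1000, C: 10, p99_ms: 224.354 },
  ],
};

// Guard on schema_version before trusting the shape; that is what the field is for.
if (report.schema_version !== 1) throw new Error("unknown results schema");

// Headline number: the worst p99 across all cells.
const headline = report.cells.reduce((a, b) => (b.p99_ms > a.p99_ms ? b : a));
console.log(
  `${report.git_sha}: p99 ${headline.p99_ms.toFixed(2)} ms on ` +
    `${headline.endpoint} (N=${headline.N}, C=${headline.C})`,
);
```

Picking the worst cell as the headline matches the README's framing: p99 is the capacity-planning number, so the slowest cell is the one worth a CHANGELOG bullet.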