bench(atomicassets): HTTP load harness — WormDB vs the Postgres atomicassets-api#12
igorls wants to merge 3 commits into
…DB vs atomicassets-api)

End-to-end served-latency + throughput driver for the AtomicAssets read path. Cycle B made WormDB serve the identical eosio-contract-api shape + query params as the reference Postgres atomicassets-api, so the same URL corpus hits both targets.

- Samples a real corpus (asset_ids / owners / collections / (coll,schema) pairs) from a source endpoint, then runs a weighted mixed workload (point / collection / owner / faceted / browse / account) against each target under C concurrent workers.
- Reports per-query-type + overall p50/p95/p99 latency and sustained req/s; warms caches first; runs targets sequentially so the client doesn't self-contend.
- Env-driven: WORMDB / ATOMIC base URLs, N, C, SAMPLE, SAMPLE_FROM. Portable ESM (node or bun).

Resource use (CPU/RSS) is sampled separately per host while it runs. Validated against the jungle4 wormdb-aa endpoint (0 errors, full per-type breakdown). The WAX-232M side-by-side vs the production atomicassets-api is the proving run (remote env).
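The weighted mixed workload can be pictured with a small sketch. This is not the harness's actual code: `pickWeighted`, the injectable `rng` parameter, and the example weights are assumptions made for illustration. It draws one query type in proportion to its weight and refuses an empty mix instead of returning `undefined`.

```javascript
// Hypothetical helper: pick one entry from a weighted mix.
// `rng` defaults to Math.random and is injectable so the draw can be tested.
function pickWeighted(mix, rng = Math.random) {
  const usable = mix.filter((m) => m.w > 0);
  if (usable.length === 0) {
    throw new Error("weighted query mix is empty");
  }
  const total = usable.reduce((acc, m) => acc + m.w, 0);
  let r = rng() * total;
  for (const m of usable) {
    r -= m.w;
    if (r < 0) return m;
  }
  // Floating-point rounding can leave r non-negative after the loop;
  // fall back to the last usable entry in that case.
  return usable[usable.length - 1];
}

// Example weights (assumed, not the harness defaults).
const exampleMix = [
  { type: "point", w: 50 },
  { type: "coll", w: 20 },
  { type: "owner", w: 30 },
];
console.log(pickWeighted(exampleMix).type);
```

Passing a fixed `rng` (for example `() => 0.5`) makes the selection deterministic, which is handy when checking that the weights behave as intended.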
… results files, mix

Follow-up hardening on the load harness:

- DURATION=<s> steady-state mode (each worker loops to a deadline) alongside N-per-target.
- STATS_WORMDB / STATS_ATOMIC sample container CPU%/RSS via `docker stats` during that target's run (self-scheduling --no-stream polls; silently skipped if docker is absent).
- Writes <OUT>.json + <OUT>.md (per-type + overall p50/95/99, min/mean/max, a latency histogram, resource use, and a side-by-side table) as a committable proving artifact.
- MIX=type=w,… overrides the query weights; corpus now sampled across newest+oldest pages for collection/owner variety.
- README: a benchmark section (env table + the proving-run caveat: WAX-232M on native Linux is the real test; a Windows-loopback jungle4 run only validates the harness).

Validated on jungle4 wormdb-aa: 8s/c20 -> 6.5k req/s, p50 2.5ms / p99 10ms, RSS ~74MiB, 0 errors; JSON+MD emitted with the histogram + resource sample.
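For readers unfamiliar with the latency histogram mentioned above, here is a minimal sketch of one way to bucket latencies. The bucket edges and the function names `latencyHistogram` and `formatHistogram` are assumptions; the commit message does not say which buckets the harness uses.

```javascript
// Hypothetical bucket upper bounds in milliseconds (an assumption for illustration).
const BUCKET_EDGES_MS = [1, 2, 5, 10, 25, 50, 100];

// Count latencies into buckets: bucket i holds values <= edges[i] and greater
// than the previous edge; the final bucket holds everything above the last edge.
function latencyHistogram(latenciesMs, edges = BUCKET_EDGES_MS) {
  const counts = new Array(edges.length + 1).fill(0);
  for (const ms of latenciesMs) {
    let i = edges.findIndex((edge) => ms <= edge);
    if (i === -1) i = edges.length; // overflow bucket
    counts[i] += 1;
  }
  return counts;
}

// Render one text row per bucket.
function formatHistogram(counts, edges = BUCKET_EDGES_MS) {
  return counts
    .map((n, i) => {
      const label = i < edges.length ? `<= ${edges[i]} ms` : `> ${edges[edges.length - 1]} ms`;
      return `${label.padEnd(10)} ${n}`;
    })
    .join("\n");
}

console.log(formatHistogram(latencyHistogram([0.8, 2.5, 2.5, 9.9, 10, 250])));
```

Fixed edges keep two targets' histograms directly comparable row by row, which is presumably why a histogram is useful in a side-by-side artifact.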
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ccc502ea63

benchmark/atomicassets/validate/http-bench.mjs

    if (DURATION && now() >= deadline) break;
    const m = pickMix();
    const s = now();
    const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);

Treat non-2xx responses as failed requests
When a target returns an HTTP error for one of the benchmarked URLs, this still records the request as successful because any resolved fetch() response is converted to text and then true. In contexts where one implementation is missing a route/query shape or returns 4xx/5xx for part of the sampled corpus, those fast error pages are included in latency and throughput with errors=0, making the side-by-side benchmark look valid while measuring failures instead of served API responses.
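A minimal sketch of the distinction this comment draws, assuming a wrapper named `timedRequest` (hypothetical) with an injectable `fetchImpl` so it can be exercised without a network. A response only counts as served when its status is 2xx; network failures and timeouts count as errors too.

```javascript
// Hypothetical wrapper: time one request and decide whether it counts as served.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
async function timedRequest(url, fetchImpl = fetch, timeoutMs = 10000) {
  const start = performance.now();
  let ok = false;
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    await res.text(); // drain the body so the whole response is timed
    ok = res.ok; // true only for 2xx statuses
  } catch {
    ok = false; // network failure or timeout
  }
  return { ok, ms: performance.now() - start };
}
```

A caller would then add `ms` to the latency sample only when `ok` is true and bump an error counter otherwise, so fast error pages cannot pass for served responses.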
Code Review
This pull request introduces a new HTTP load benchmark script (http-bench.mjs) and updates the README.md to document its usage for comparing WormDB and Postgres atomicassets-api read paths. The review feedback highlights several key improvement opportunities and potential bugs in the benchmark script, including: correcting HTTP error handling in fetch calls to prevent non-2xx responses from being marked as successful, adding timeouts to prevent the script from hanging, handling potential crashes if the query mix is empty, improving the percentile calculation accuracy, and robustly parsing Docker memory usage when reported in bytes.

benchmark/atomicassets/validate/http-bench.mjs

    if (DURATION && now() >= deadline) break;
    const m = pickMix();
    const s = now();
    const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);

There are two issues here:

- HTTP Error Handling: `fetch` only rejects on network failures. If the server returns a non-2xx error (e.g., `500 Internal Server Error` or `502 Bad Gateway`), `fetch` resolves successfully and `r.text()` succeeds, meaning the request is incorrectly counted as successful (`ok = true`). This skews latency and throughput results and hides server failures.
- Missing Timeout: If a request hangs during the benchmark, it will block that concurrent worker indefinitely. Adding a timeout (e.g., 10 seconds) ensures the benchmark doesn't hang.

Checking `r.ok` and adding `AbortSignal.timeout` resolves both issues.

    const ok = await fetch(`${target.base}${m.url()}`, { signal: AbortSignal.timeout(10000) })
      .then((r) => r.ok ? r.text().then(() => true) : false)
      .catch(() => false);

benchmark/atomicassets/validate/http-bench.mjs

    async function getJson(url) {
      try {
        const r = await fetch(url);
        return r.ok ? await r.json() : null;
      } catch {
        return null;
      }
    }

The fetch call in getJson does not have a timeout. If the target server is unresponsive or slow during the initial corpus sampling phase, the benchmark script will hang indefinitely. Adding a timeout using AbortSignal.timeout ensures the script fails fast and gracefully.

    async function getJson(url) {
      try {
        const r = await fetch(url, { signal: AbortSignal.timeout(10000) });
        return r.ok ? await r.json() : null;
      } catch {
        return null;
      }
    }

benchmark/atomicassets/validate/http-bench.mjs

    ].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
    const totalW = sum(MIX.map((m) => m.w));

If the sampled corpus has no valid assets, collections, or owners, or if the MIX weights filter out all query types, the MIX array will be empty. This causes pickMix() to return undefined, leading to a TypeError crash when trying to call pickMix().url(). Adding a check here prevents this crash and provides a clear error message.

    ].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
    if (!MIX.length) {
      console.error("[bench] error: weighted query mix is empty. Check your sample data or MIX weights.");
      process.exit(1);
    }
    const totalW = sum(MIX.map((m) => m.w));

benchmark/atomicassets/validate/http-bench.mjs

    }

    // ── stats helpers ──
    const pctile = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);

The current percentile calculation using Math.floor(a.length * p) with Math.min can collapse values (e.g., p99 and max on small arrays) and is slightly off from standard nearest-rank percentile definitions. Using Math.round(p * (a.length - 1)) is simpler, standard, and maps 0.0 to 0 and 1.0 to a.length - 1 perfectly without needing Math.min.

    const pctile = (a, p) => (a.length ? a[Math.round(p * (a.length - 1))] : NaN);

benchmark/atomicassets/validate/http-bench.mjs

    const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGi]+)/i);
    if (m) {
      let mem = parseFloat(m[2]);
      const u = m[3].toLowerCase();
      if (u.startsWith("g")) mem *= 1024;
      else if (u.startsWith("k")) mem /= 1024; // -> MiB
      samples.push({ cpu: parseFloat(m[1]), mem });
    }

The regular expression for parsing memory usage does not match if the unit is bytes (B or b), which can happen if the container is idle or has extremely low memory usage. Updating the regex to support B (and T for completeness) and handling the conversion to MiB makes the resource sampler much more robust.

    const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGTB]i?)/i);
    if (m) {
      let mem = parseFloat(m[2]);
      const u = m[3].toLowerCase();
      if (u.startsWith("g")) mem *= 1024;
      else if (u.startsWith("t")) mem *= 1024 * 1024;
      else if (u.startsWith("k")) mem /= 1024;
      else if (u.startsWith("b")) mem /= 1024 * 1024; // -> MiB
      samples.push({ cpu: parseFloat(m[1]), mem });
    }

Pull request overview
Adds an end-to-end HTTP load benchmarking harness for the AtomicAssets read APIs to compare WormDB vs the reference Postgres atomicassets-api using the same sampled query corpus, producing committable JSON/Markdown results.
Changes:
- Introduce `http-bench.mjs` to sample real query inputs, run a weighted request mix at configurable concurrency, and report p50/p95/p99 + req/s (plus optional `docker stats` CPU/RSS sampling).
- Extend the validation README with usage instructions and the environment-variable reference table for the new harness.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| benchmark/atomicassets/validate/README.md | Documents how to run the new HTTP benchmark harness and its env configuration. |
| benchmark/atomicassets/validate/http-bench.mjs | New load harness that samples a corpus, executes a weighted request mix, gathers latency/throughput stats, and writes JSON/MD artifacts. |

benchmark/atomicassets/validate/http-bench.mjs

    }

    // ── stats helpers ──
    const pctile = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);

benchmark/atomicassets/validate/http-bench.mjs

    const totalW = sum(MIX.map((m) => m.w));
    function pickMix() {
      let r = Math.random() * totalW;
      for (const m of MIX) if ((r -= m.w) < 0) return m;
      return MIX[0];
    }

benchmark/atomicassets/validate/http-bench.mjs

    const r = await fetch(url);
    return r.ok ? await r.json() : null;

benchmark/atomicassets/validate/http-bench.mjs

    const all = [];
    let errs = 0, done = 0;
    // warm up (fill caches / JIT) before measuring + before sampling resources
    await Promise.all(Array.from({ length: Math.min(C, 20) }, async () => { for (let i = 0; i < 5; i++) { try { await fetch(`${target.base}${pickMix().url()}`).then((r) => r.text()); } catch {} } }));

benchmark/atomicassets/validate/http-bench.mjs

    if (DURATION && now() >= deadline) break;
    const m = pickMix();
    const s = now();
    const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);

…lean exits

Bot review (Codex/Gemini/Copilot) + an adversarial multi-lens self-review:

- Count non-2xx/timeout/network failures as errors, never as fast responses: drain the body and return r.ok; per-request timeout via AbortController+clearTimeout (fetchT).
- req/s counts SUCCESSFUL requests only, so fast error pages can't inflate throughput; a nonzero error count loudly flags the run as suspect.
- Percentile = nearest-rank Math.round(p*(n-1)): no p99==max collapse on small n.
- docker-stats mem regex handles B/KiB/MiB/GiB/TiB (was MiB/GiB only).
- Empty query mix fails fast with a clear message.
- Wrap execution in main() + process.exitCode (no abrupt process.exit while undici sockets are open; that tripped a libuv "handle closing" assertion on Windows).
- Fairness (adversarial review, confirmed): with 2+ targets, reduce the corpus to the cross-target INTERSECTION so a divergent dataset (live API vs lagging local) can't make targets do different work for the same URL; record dropped counts in JSON `coverage` so any divergence is visible. README documents the same-data assumption.

Verified on jungle4: single-target unaffected (intersection gated off); 2-target run (same endpoint) drops 0, emits the side-by-side + coverage; all fatal paths exit cleanly.
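The percentile and throughput changes can be illustrated with a short sketch. `oldIndex` and `newIndex` restate the two index formulas quoted in the review and in this commit message; `summarize` and the `{ ok, ms }` result shape are assumptions, not the harness's actual code. One caveat worth stating: for very small samples (roughly 50 values or fewer) both formulas still index the maximum for p99, so the difference only shows from about n = 100 upward.

```javascript
// Index used before the fix, as quoted in the review: floor(n * p), clamped.
const oldIndex = (n, p) => Math.min(n - 1, Math.floor(n * p));
// Index described in the commit message: round(p * (n - 1)).
const newIndex = (n, p) => Math.round(p * (n - 1));

// Percentile over an ascending-sorted array of latencies.
const pctile = (sorted, p) => (sorted.length ? sorted[newIndex(sorted.length, p)] : NaN);

// Hypothetical summary: latencies come from successful requests only, and
// failures are reported separately instead of inflating req/s.
function summarize(results, elapsedSec) {
  const okMs = results
    .filter((r) => r.ok)
    .map((r) => r.ms)
    .sort((a, b) => a - b);
  return {
    errors: results.length - okMs.length,
    reqPerSec: okMs.length / elapsedSec,
    p50: pctile(okMs, 0.5),
    p99: pctile(okMs, 0.99),
  };
}

// With 100 samples, the old formula indexes the maximum for p99;
// the rounded form indexes the second-largest sample.
console.log(oldIndex(100, 0.99), newIndex(100, 0.99)); // prints "99 98"
```

Keeping failed requests out of both the latency sample and the req/s numerator means a target that answers quickly with errors looks worse in the report, not better.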
Addressed all review feedback in

From the bots:

Beyond the bots:

Verified on jungle4: single-target runs are unaffected (intersection gated to 2+ targets); a 2-target run against the same endpoint drops 0 and emits the side-by-side + coverage; all fatal paths exit cleanly.
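The cross-target intersection can be sketched as follows. How the harness decides that a target "has" a sampled id is not described here, so the `seenByTarget` input, the function name `intersectCorpus`, and the returned shape are all assumptions for illustration.

```javascript
// Hypothetical: `seenByTarget` maps a target name to the sampled ids that
// target actually served. Keep only the ids every target served.
function intersectCorpus(sampledIds, seenByTarget) {
  const targets = Object.keys(seenByTarget);
  if (targets.length < 2) {
    // Single-target runs are left untouched (intersection gated to 2+ targets).
    return { kept: [...sampledIds], dropped: 0 };
  }
  const sets = targets.map((t) => new Set(seenByTarget[t]));
  const kept = sampledIds.filter((id) => sets.every((s) => s.has(id)));
  // The dropped count is what a `coverage` record could expose.
  return { kept, dropped: sampledIds.length - kept.length };
}

const coverage = intersectCorpus(["1", "2", "3"], {
  wormdb: ["1", "2", "3"],
  atomic: ["1", "3"],
});
console.log(coverage); // kept: ["1", "3"], dropped: 1
```

Running the same endpoint as both targets would give identical sets, so nothing is dropped, which matches the "drops 0" result reported above.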
What
`benchmark/atomicassets/validate/http-bench.mjs`: an end-to-end HTTP latency + throughput load harness for the AtomicAssets read path, comparing one or more endpoints under the same query corpus. Cycle B made WormDB serve the identical eosio-contract-api shape + query params as the reference Postgres `atomicassets-api`, so the same URLs hit both targets and the comparison is apples-to-apples.

This is the served-HTTP p50/95/99 + throughput half of the proving story. The existing `WSEG_RESULTS.md` micro-bench already covers the storage win (~33×) + µs in-process lookups; this measures what a consumer actually sees over HTTP, head-to-head.

How it works

- Weighted mix of `point` (`/assets/:id`), `coll`, `owner`, `faceted` (coll+schema), `browse`, `account`; override via `MIX=point=50,coll=20,…`.
- `N` requests per target, or `DURATION=<s>` steady-state; `C` concurrent workers; warms caches first; targets run sequentially so the client never self-contends.
- `STATS_WORMDB` / `STATS_ATOMIC` sample container CPU%/RSS via `docker stats` during the run (skipped if docker is absent).
- Writes `<OUT>.json` + `<OUT>.md` as a committable artifact.

Portable ESM (node or bun), env-driven; see the README env table.
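As a rough illustration of the `MIX=type=w,…` override, the sketch below parses such a string into a weights object. The default weights, the name `parseMix`, and the choice to keep unspecified types at their defaults are assumptions; the PR text does not show how the harness handles those cases.

```javascript
// Hypothetical defaults; the real harness weights are not shown in this PR text.
const DEFAULT_WEIGHTS = { point: 1, coll: 1, owner: 1, faceted: 1, browse: 1, account: 1 };

// Parse a string such as "point=50,coll=20" into a weights object.
// Types not named in the string keep their default weight (an assumption).
function parseMix(spec, defaults = DEFAULT_WEIGHTS) {
  const weights = { ...defaults };
  if (!spec) return weights;
  for (const pair of spec.split(",")) {
    const [type, raw] = pair.split("=").map((s) => s.trim());
    const w = Number(raw);
    const missing = raw === undefined || raw === "";
    if (!(type in defaults) || missing || !Number.isFinite(w) || w < 0) {
      throw new Error(`bad MIX entry: "${pair}"`);
    }
    weights[type] = w;
  }
  return weights;
}

console.log(parseMix("point=50,coll=20"));
```

In a script this would typically be called as `parseMix(process.env.MIX)`, so an unset variable falls through to the defaults.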
Validation
Run against the live jungle4 `aa-wormdb` (harness validation, not a proving number: Windows Docker-Desktop loopback adds ~2–4 ms and a 1552-asset testnet segment makes postings trivial):

JSON + markdown emitted with the per-type breakdown, histogram, and resource sample.
Proving run (later, not in this PR)
WAX-232M on native Linux, both targets on the same data. Needs a v2-segment WormDB endpoint (the `WSEG_RESULTS` segment was the Rust-POC v1 format; the current Zig reader is ASSET_VERSION 2, fail-closed on v1 → a rebuild) + an `atomicassets-api` instance (a dedicated/replica box, not production under live traffic).