bench(atomicassets): HTTP load harness — WormDB vs the Postgres atomicassets-api by igorls · Pull Request #12 · eosrio/hyperion-tools

igorls · 2026-06-07T09:08:32Z

What

benchmark/atomicassets/validate/http-bench.mjs — an end-to-end HTTP latency + throughput load harness for the AtomicAssets read path, comparing one or more endpoints under the same query corpus. Cycle B made WormDB serve the identical eosio-contract-api shape + query params as the reference Postgres atomicassets-api, so the same URLs hit both targets and the comparison is apples-to-apples.

This is the served-HTTP p50/95/99 + throughput half of the proving story. The existing WSEG_RESULTS.md micro-bench already covers the storage win (~33×) + µs in-process lookups; this measures what a consumer actually sees over HTTP, head-to-head.

How it works

Corpus — samples real ids / owners / collections / (coll,schema) pairs from a source endpoint, across newest+oldest pages for variety.
Mix — weighted point (/assets/:id), coll, owner, faceted (coll+schema), browse, account; override via MIX=point=50,coll=20,….
Load — N requests per target, or DURATION=<s> steady-state; C concurrent workers; warms caches first; targets run sequentially so the client never self-contends.
Resource — STATS_WORMDB / STATS_ATOMIC sample container CPU%/RSS via docker stats during the run (skipped if docker is absent).
Output — per-type + overall p50/95/99 (min/mean/max + a latency histogram in the JSON), req/s, and a side-by-side table; writes <OUT>.json + <OUT>.md as a committable artifact.
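The weighted mix and its MIX override can be sketched roughly as follows. This is an illustration only: `DEFAULT_WEIGHTS`, `parseMix`, and `pickWeighted` are hypothetical names, and the default weights are made-up placeholders, since the harness's actual defaults are not shown in this PR text.

```javascript
// Hypothetical default weights (placeholders, not the harness's real values).
const DEFAULT_WEIGHTS = { point: 40, coll: 15, owner: 15, faceted: 10, browse: 10, account: 10 };

// Parse a spec such as "point=50,coll=20" and layer it over the defaults.
// Unknown types and non-numeric or negative weights are ignored.
function parseMix(spec, defaults = DEFAULT_WEIGHTS) {
  const weights = { ...defaults };
  if (!spec) return weights;
  for (const pair of spec.split(",")) {
    const [type, raw] = pair.split("=").map((s) => s.trim());
    const w = Number(raw);
    if (type in weights && Number.isFinite(w) && w >= 0) weights[type] = w;
  }
  return weights;
}

// Pick one entry with probability proportional to its weight `w`.
// `rand` is injectable so the selection can be checked deterministically.
function pickWeighted(entries, rand = Math.random) {
  const total = entries.reduce((acc, e) => acc + e.w, 0);
  if (!(total > 0)) throw new Error("weighted query mix is empty");
  let r = rand() * total;
  for (const e of entries) if ((r -= e.w) < 0) return e;
  return entries[entries.length - 1]; // floating-point safety net
}
```

Throwing on an empty mix, rather than returning `undefined`, keeps a misconfigured run from failing later with an unrelated TypeError.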

Portable ESM (node or bun), env-driven — see the README env table.

Validation

Run against the live jungle4 aa-wormdb (harness validation, not a proving number — Windows Docker-Desktop loopback adds ~2–4 ms and a 1552-asset testnet segment makes postings trivial):

8s / c=20 → 6502 req/s, overall p50 2.54ms / p95 6.48ms / p99 10.02ms, RSS ~74 MiB, 0 errors

JSON + markdown emitted with the per-type breakdown, histogram, and resource sample.

Proving run (later, not in this PR)

WAX-232M on native Linux, both targets on the same data. Needs a v2-segment WormDB endpoint (the WSEG_RESULTS segment was the Rust-POC v1 format; the current Zig reader is ASSET_VERSION 2, fail-closed on v1 → a rebuild) + an atomicassets-api instance (a dedicated/replica box, not production under live traffic).

🤖 Generated with Claude Code

…DB vs atomicassets-api)

End-to-end served-latency + throughput driver for the AtomicAssets read path. Cycle B made WormDB serve the identical eosio-contract-api shape + query params as the reference Postgres atomicassets-api, so the same URL corpus hits both targets.

- Samples a real corpus (asset_ids / owners / collections / (coll,schema) pairs) from a source endpoint, then runs a weighted mixed workload (point / collection / owner / faceted / browse / account) against each target under C concurrent workers.
- Reports per-query-type + overall p50/p95/p99 latency and sustained req/s; warms caches first; runs targets sequentially so the client doesn't self-contend.
- Env-driven: WORMDB / ATOMIC base URLs, N, C, SAMPLE, SAMPLE_FROM. Portable ESM (node or bun).

Resource use (CPU/RSS) is sampled separately per host while it runs. Validated against the jungle4 wormdb-aa endpoint (0 errors, full per-type breakdown). The WAX-232M side-by-side vs the production atomicassets-api is the proving run (remote env).

… results files, mix

Follow-up hardening on the load harness:

- DURATION=<s> steady-state mode (each worker loops to a deadline) alongside N-per-target.
- STATS_WORMDB / STATS_ATOMIC sample container CPU%/RSS via `docker stats` during that target's run (self-scheduling --no-stream polls; silently skipped if docker is absent).
- Writes <OUT>.json + <OUT>.md (per-type + overall p50/95/99, min/mean/max, a latency histogram, resource use, and a side-by-side table) as a committable proving artifact.
- MIX=type=w,… overrides the query weights; corpus now sampled across newest+oldest pages for collection/owner variety.
- README: a benchmark section (env table + the proving-run caveat: WAX-232M on native Linux is the real test; a Windows-loopback jungle4 run only validates the harness).

Validated on jungle4 wormdb-aa: 8s/c20 -> 6.5k req/s, p50 2.5ms / p99 10ms, RSS ~74MiB, 0 errors; JSON+MD emitted with the histogram + resource sample.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccc502ea63


chatgpt-codex-connector · 2026-06-07T09:09:50Z

+        if (DURATION && now() >= deadline) break;
+        const m = pickMix();
+        const s = now();
+        const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);


Treat non-2xx responses as failed requests

When a target returns an HTTP error for one of the benchmarked URLs, this still records the request as successful because any resolved fetch() response is converted to text and then true. In contexts where one implementation is missing a route/query shape or returns 4xx/5xx for part of the sampled corpus, those fast error pages are included in latency and throughput with errors=0, making the side-by-side benchmark look valid while measuring failures instead of served API responses.


gemini-code-assist

Code Review

This pull request introduces a new HTTP load benchmark script (http-bench.mjs) and updates the README.md to document its usage for comparing WormDB and Postgres atomicassets-api read paths. The review feedback highlights several key improvement opportunities and potential bugs in the benchmark script, including: correcting HTTP error handling in fetch calls to prevent non-2xx responses from being marked as successful, adding timeouts to prevent the script from hanging, handling potential crashes if the query mix is empty, improving the percentile calculation accuracy, and robustly parsing Docker memory usage when reported in bytes.


gemini-code-assist · 2026-06-07T09:10:14Z

+        if (DURATION && now() >= deadline) break;
+        const m = pickMix();
+        const s = now();
+        const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);


There are two issues here:

HTTP Error Handling: fetch only rejects on network failures. If the server returns a non-2xx error (e.g., 500 Internal Server Error or 502 Bad Gateway), fetch resolves successfully and r.text() succeeds, meaning the request is incorrectly counted as successful (ok = true). This skews latency and throughput results and hides server failures.

Missing Timeout: If a request hangs during the benchmark, it will block that concurrent worker indefinitely. Adding a timeout (e.g., 10 seconds) ensures the benchmark doesn't hang.

Checking r.ok and adding AbortSignal.timeout resolves both issues.

const ok = await fetch(`${target.base}${m.url()}`, { signal: AbortSignal.timeout(10000) })
  .then((r) => r.ok ? r.text().then(() => true) : false)
  .catch(() => false);

gemini-code-assist · 2026-06-07T09:10:14Z

+async function getJson(url) {
+  try {
+    const r = await fetch(url);
+    return r.ok ? await r.json() : null;
+  } catch {
+    return null;
+  }
+}


The fetch call in getJson does not have a timeout. If the target server is unresponsive or slow during the initial corpus sampling phase, the benchmark script will hang indefinitely. Adding a timeout using AbortSignal.timeout ensures the script fails fast and gracefully.

async function getJson(url) {
  try {
    const r = await fetch(url, { signal: AbortSignal.timeout(10000) });
    return r.ok ? await r.json() : null;
  } catch {
    return null;
  }
}

gemini-code-assist · 2026-06-07T09:10:14Z

+].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
+const totalW = sum(MIX.map((m) => m.w));


If the sampled corpus has no valid assets, collections, or owners, or if the MIX weights filter out all query types, the MIX array will be empty. This causes pickMix() to return undefined, leading to a TypeError crash when trying to call pickMix().url(). Adding a check here prevents this crash and provides a clear error message.

].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
if (!MIX.length) {
  console.error("[bench] error: weighted query mix is empty. Check your sample data or MIX weights.");
  process.exit(1);
}
const totalW = sum(MIX.map((m) => m.w));

gemini-code-assist · 2026-06-07T09:10:14Z

+}
+
+// ── stats helpers ──
+const pctile = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);


The current percentile calculation using Math.floor(a.length * p) with Math.min can collapse values (e.g., p99 and max on small arrays) and is slightly off from standard nearest-rank percentile definitions. Using Math.round(p * (a.length - 1)) is simpler, standard, and maps 0.0 to 0 and 1.0 to a.length - 1 perfectly without needing Math.min.

const pctile = (a, p) => (a.length ? a[Math.round(p * (a.length - 1))] : NaN);
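To make the difference between the index rules concrete, here is a small sketch comparing the original floor-based index, the suggested round-based index, and the ceil-based rule often given as the textbook nearest-rank definition. The helper names `idxFloor`, `idxRound`, and `idxCeil` are hypothetical and exist only for this comparison.

```javascript
// Index into a sorted array of length n for percentile p (0..1), under three rules.
const idxFloor = (n, p) => Math.min(n - 1, Math.floor(n * p)); // original harness rule
const idxRound = (n, p) => Math.round(p * (n - 1));            // rule suggested in the review
const idxCeil = (n, p) => Math.max(0, Math.ceil(p * n) - 1);   // ceil-based nearest rank

// Percentile lookup over an already-sorted array; NaN when there are no samples.
const pctile = (sorted, p, idx = idxRound) =>
  sorted.length ? sorted[idx(sorted.length, p)] : NaN;

// Synthetic latencies 1..1000 (ms), already sorted ascending.
const lat = Array.from({ length: 1000 }, (_, i) => i + 1);

for (const [name, idx] of [["floor", idxFloor], ["round", idxRound], ["ceil", idxCeil]]) {
  console.log(name, "p50 =", pctile(lat, 0.5, idx), "p99 =", pctile(lat, 0.99, idx));
}
```

On 1000 samples the floor rule lands one element higher at p99 than the other two. With only 10 samples all three rules select the last element for p99, so the choice of rule matters less than having enough samples per query type.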

gemini-code-assist · 2026-06-07T09:10:14Z

+      const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGi]+)/i);
+      if (m) {
+        let mem = parseFloat(m[2]);
+        const u = m[3].toLowerCase();
+        if (u.startsWith("g")) mem *= 1024;
+        else if (u.startsWith("k")) mem /= 1024; // -> MiB
+        samples.push({ cpu: parseFloat(m[1]), mem });
+      }


The regular expression for parsing memory usage does not match if the unit is bytes (B or b), which can happen if the container is idle or has extremely low memory usage. Updating the regex to support B (and T for completeness) and handling the conversion to MiB makes the resource sampler much more robust.

const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGTB]i?)/i);
if (m) {
  let mem = parseFloat(m[2]);
  const u = m[3].toLowerCase();
  if (u.startsWith("g")) mem *= 1024;
  else if (u.startsWith("t")) mem *= 1024 * 1024;
  else if (u.startsWith("k")) mem /= 1024;
  else if (u.startsWith("b")) mem /= 1024 * 1024; // -> MiB
  samples.push({ cpu: parseFloat(m[1]), mem });
}
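A table-driven variant of the same unit conversion is sketched below. It assumes the sampled line looks like `1.23%;74.5MiB / 7.6GiB` (CPU percent, a semicolon, then memory usage), which is inferred from the regex above rather than confirmed by the PR; `parseStatsLine` and `TO_MIB` are hypothetical names.

```javascript
// Multipliers that convert a value with the given unit prefix into MiB.
// The empty prefix means plain bytes ("B").
const TO_MIB = { "": 1 / (1024 * 1024), k: 1 / 1024, m: 1, g: 1024, t: 1024 * 1024 };

// Parse one "<cpu>%;<mem><unit> / <limit>" line into { cpu, memMiB }.
// Returns null when the line does not match the assumed shape.
function parseStatsLine(line) {
  const m = line.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGT]?)i?B/i);
  if (!m) return null;
  const factor = TO_MIB[m[3].toLowerCase()];
  return { cpu: parseFloat(m[1]), memMiB: parseFloat(m[2]) * factor };
}

console.log(parseStatsLine("3.00%;74MiB / 1GiB"));
console.log(parseStatsLine("0.50%;512B / 7.6GiB"));
```

Keeping the unit factors in one table makes it harder to forget a unit than a chain of `startsWith` branches does.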

Copilot

Pull request overview

Adds an end-to-end HTTP load benchmarking harness for the AtomicAssets read APIs to compare WormDB vs the reference Postgres atomicassets-api using the same sampled query corpus, producing committable JSON/Markdown results.

Changes:

Introduce http-bench.mjs to sample real query inputs, run a weighted request mix at configurable concurrency, and report p50/p95/p99 + req/s (plus optional docker stats CPU/RSS sampling).
Extend the validation README with usage instructions and the environment-variable reference table for the new harness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
benchmark/atomicassets/validate/README.md	Documents how to run the new HTTP benchmark harness and its env configuration.
benchmark/atomicassets/validate/http-bench.mjs	New load harness that samples a corpus, executes a weighted request mix, gathers latency/throughput stats, and writes JSON/MD artifacts.


+}
+
+// ── stats helpers ──
+const pctile = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);


+const totalW = sum(MIX.map((m) => m.w));
+function pickMix() {
+  let r = Math.random() * totalW;
+  for (const m of MIX) if ((r -= m.w) < 0) return m;
+  return MIX[0];
+}


+    const r = await fetch(url);
+    return r.ok ? await r.json() : null;


+  const all = [];
+  let errs = 0, done = 0;
+  // warm up (fill caches / JIT) before measuring + before sampling resources
+  await Promise.all(Array.from({ length: Math.min(C, 20) }, async () => { for (let i = 0; i < 5; i++) { try { await fetch(`${target.base}${pickMix().url()}`).then((r) => r.text()); } catch {} } }));


+        if (DURATION && now() >= deadline) break;
+        const m = pickMix();
+        const s = now();
+        const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);


…lean exits

Bot review (Codex/Gemini/Copilot) + an adversarial multi-lens self-review:

- Count non-2xx/timeout/network failures as errors, never as fast responses: drain the body and return r.ok; per-request timeout via AbortController+clearTimeout (fetchT).
- req/s counts SUCCESSFUL requests only, so fast error pages can't inflate throughput; a nonzero error count loudly flags the run as suspect.
- Percentile = nearest-rank Math.round(p*(n-1)); no p99==max collapse on small n.
- docker-stats mem regex handles B/KiB/MiB/GiB/TiB (was MiB/GiB only).
- Empty query mix fails fast with a clear message.
- Wrap execution in main() + process.exitCode (no abrupt process.exit while undici sockets are open; that tripped a libuv "handle closing" assertion on Windows).
- Fairness (adversarial review, confirmed): with 2+ targets, reduce the corpus to the cross-target INTERSECTION so a divergent dataset (live API vs lagging local) can't make targets do different work for the same URL; record dropped counts in JSON `coverage` so any divergence is visible. README documents the same-data assumption.

Verified on jungle4: single-target unaffected (intersection gated off); 2-target run (same endpoint) drops 0, emits the side-by-side + coverage; all fatal paths exit cleanly.
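The "successful requests only" rule from this commit can be illustrated with a minimal summary function. This is a sketch under assumptions: the sample shape `{ ok, ms }` and the name `summarize` are hypothetical, not taken from http-bench.mjs.

```javascript
// Summarize one target's run from per-request samples of the form { ok, ms }.
// Failed requests contribute to `errors` only; they are excluded from the
// latency percentiles and from the throughput figure.
function summarize(samples, wallMs) {
  const okLat = samples.filter((s) => s.ok).map((s) => s.ms).sort((a, b) => a - b);
  const errors = samples.length - okLat.length;
  const pct = (p) => (okLat.length ? okLat[Math.round(p * (okLat.length - 1))] : NaN);
  return {
    requests: samples.length,
    errors,
    suspect: errors > 0, // any failure marks the run as suspect
    reqPerSec: okLat.length / (wallMs / 1000),
    p50: pct(0.5),
    p95: pct(0.95),
    p99: pct(0.99),
  };
}

const demo = [
  { ok: true, ms: 2 },
  { ok: true, ms: 4 },
  { ok: false, ms: 0.1 }, // a fast error page: must not count as served traffic
  { ok: true, ms: 3 },
];
console.log(summarize(demo, 1000));
```

In the demo, the 0.1 ms failure would have been the fastest "response" if errors were counted as successes, which is the distortion the commit guards against.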

igorls · 2026-06-07T09:54:31Z

Addressed all review feedback in a22769a, plus an adversarial multi-lens self-review.

From the bots:

non-2xx counted as success (Codex P2, Gemini high) — now drains the body and returns r.ok; 4xx/5xx/timeouts/network errors count as failures, excluded from latency.
missing timeouts (Gemini, Copilot ×3 — sampling/warmup/main) — one fetchT helper wraps every request with AbortController + clearTimeout (cleared on settle, so no lingering 10s timers pile up over a big run).
empty MIX crash (Gemini, Copilot) — fails fast with a clear message.
percentile off-by-one (Gemini, Copilot) — nearest-rank Math.round(p*(n-1)), no p99==max collapse on small n.
docker mem regex misses bytes (Gemini) — now handles B/KiB/MiB/GiB/TiB.
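A minimal sketch of the timeout-plus-`r.ok` pattern described above, assuming a helper shaped like the `fetchT` mentioned in the reply. The name `fetchOk`, its signature, and the injectable `fetchImpl` parameter are assumptions for illustration; the real helper's code is not shown in this PR text.

```javascript
// Issue one request with a per-request timeout and report whether it succeeded.
// Resolves to true only for a 2xx response whose body was fully read;
// non-2xx responses, network errors, and timeouts all resolve to false.
// `fetchImpl` is injectable purely so the behaviour can be exercised offline.
async function fetchOk(url, timeoutMs = 10000, fetchImpl = fetch) {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const r = await fetchImpl(url, { signal: ctrl.signal });
    await r.text(); // drain the body so timing covers the whole response
    return r.ok;
  } catch {
    return false; // network failure or abort on timeout
  } finally {
    clearTimeout(timer); // cleared on settle, so timers do not pile up
  }
}
```

Clearing the timer in `finally` is what keeps a long run from accumulating thousands of pending 10-second timers.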

Beyond the bots:

throughput integrity — req/s counts successful requests only, so fast error pages can't inflate it; any nonzero error count loudly flags the run as suspect.
clean exits — execution is wrapped in main() with process.exitCode instead of an abrupt process.exit() while undici keep-alive sockets are open (that tripped a libuv handle closing assertion → exit 127 on Windows; now a clean exit 1).
comparison fairness — an adversarial review flagged (and verified) that the corpus was sampled from one target only, so a divergent dataset (live API vs a lagging local WormDB) could make list queries do different work for the same URL (a miss there is HTTP 200 with a smaller page, not a flagged 404). With 2+ targets the corpus is now reduced to the cross-target intersection, and per-dimension dropped counts are recorded in the JSON coverage block so any divergence is visible. The README documents the same-data assumption.
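The cross-target intersection and its coverage record might look roughly like the sketch below. How the harness actually probes each target is not shown here, so this assumes the per-target presence sets have already been collected; `intersectCorpus` and the field names inside `coverage` are hypothetical.

```javascript
// Reduce a sampled corpus to the values every target knows about.
// corpus:          { dimension: [values...] }
// presentByTarget: one object per target, { dimension: Set(values present) }
// Returns the reduced corpus plus per-dimension coverage counts.
function intersectCorpus(corpus, presentByTarget) {
  const kept = {};
  const coverage = {};
  for (const [dim, values] of Object.entries(corpus)) {
    kept[dim] = values.filter((v) => presentByTarget.every((t) => t[dim]?.has(v)));
    coverage[dim] = {
      sampled: values.length,
      kept: kept[dim].length,
      dropped: values.length - kept[dim].length,
    };
  }
  return { corpus: kept, coverage };
}

const sampled = { ids: ["1", "2", "3"], owners: ["alice", "bob"] };
const targets = [
  { ids: new Set(["1", "2", "3"]), owners: new Set(["alice", "bob"]) }, // e.g. the live API
  { ids: new Set(["1", "3"]), owners: new Set(["alice", "bob"]) },      // e.g. a lagging local copy
];
console.log(intersectCorpus(sampled, targets));
```

A nonzero `dropped` count for any dimension is the visible signal that the targets are not serving the same data.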

Verified on jungle4: single-target runs are unaffected (intersection gated to 2+ targets); a 2-target run against the same endpoint drops 0 and emits the side-by-side + coverage; all fatal paths exit cleanly.

igorls added 2 commits June 7, 2026 06:03

igorls wants to merge 3 commits into main from bench/aa-http-loadgen
