bench(atomicassets): HTTP load harness — WormDB vs the Postgres atomicassets-api#12

Open
igorls wants to merge 3 commits into main from bench/aa-http-loadgen

igorls (Member) commented Jun 7, 2026

What

benchmark/atomicassets/validate/http-bench.mjs is an end-to-end HTTP latency + throughput load harness for the AtomicAssets read path that compares one or more endpoints under the same query corpus. Cycle B made WormDB serve the same eosio-contract-api shape and query params as the reference Postgres atomicassets-api, so the same URLs hit both targets and the comparison is apples-to-apples.

This is the served-HTTP p50/95/99 + throughput half of the proving story. The existing WSEG_RESULTS.md micro-bench already covers the storage win (~33×) + µs in-process lookups; this measures what a consumer actually sees over HTTP, head-to-head.

How it works

  • Corpus: samples real ids / owners / collections / (coll,schema) pairs from a source endpoint, across newest+oldest pages for variety.
  • Mix: weighted point (/assets/:id), coll, owner, faceted (coll+schema), browse, account; override via MIX=point=50,coll=20,….
  • Load: N requests per target, or DURATION=<s> steady-state; C concurrent workers; warms caches first; targets run sequentially so the client never self-contends.
  • Resource: STATS_WORMDB / STATS_ATOMIC sample container CPU%/RSS via docker stats during the run (skipped if docker is absent).
  • Output: per-type + overall p50/95/99 (min/mean/max + a latency histogram in the JSON), req/s, and a side-by-side table; writes <OUT>.json + <OUT>.md as a committable artifact.

Portable ESM (node or bun), env-driven — see the README env table.
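
A minimal JavaScript sketch of the weighted mix, assuming a `MIX=type=w,…` override is layered on top of default weights. The default values and the helper names `parseMix` / `pickWeighted` are hypothetical, not taken from http-bench.mjs; the selection walks the cumulative weights, the same idea as the harness's `pickMix()`:

```javascript
// Hypothetical defaults, for illustration only (not the harness's real weights).
const DEFAULT_WEIGHTS = { point: 40, coll: 20, owner: 15, faceted: 10, browse: 10, account: 5 };

// Parse an override such as "point=50,coll=20" on top of the defaults.
function parseMix(spec, defaults = DEFAULT_WEIGHTS) {
  const w = { ...defaults };
  for (const part of (spec || "").split(",")) {
    const [type, raw] = part.split("=").map((s) => s.trim());
    if (!type) continue; // tolerate empty segments like a trailing comma
    const n = Number(raw);
    if (!(type in w) || !Number.isFinite(n) || n < 0) {
      throw new Error(`bad MIX entry: ${part}`);
    }
    w[type] = n;
  }
  return w;
}

// Pick a query type with probability proportional to its weight.
// `rand` is injectable so the choice can be tested deterministically.
function pickWeighted(weights, rand = Math.random) {
  const entries = Object.entries(weights).filter(([, w]) => w > 0);
  const total = entries.reduce((acc, [, w]) => acc + w, 0);
  if (total <= 0) throw new Error("weighted query mix is empty");
  let r = rand() * total;
  for (const [type, w] of entries) if ((r -= w) < 0) return type;
  return entries[entries.length - 1][0]; // guards against float rounding near r == total
}
```

Injecting `rand` keeps the selection testable, and an all-zero mix throws instead of returning `undefined`, which is the fail-fast behavior the reviewers asked for.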

Validation

Run against the live jungle4 aa-wormdb (harness validation, not a proving number: Windows Docker Desktop loopback adds ~2–4 ms, and a 1552-asset testnet segment makes postings trivial):

8s / c=20 → 6502 req/s, overall p50 2.54ms / p95 6.48ms / p99 10.02ms, RSS ~74 MiB, 0 errors

JSON + markdown emitted with the per-type breakdown, histogram, and resource sample.

Proving run (later, not in this PR)

WAX-232M on native Linux, with both targets serving the same data. This needs a v2-segment WormDB endpoint (the WSEG_RESULTS segment used the Rust-POC v1 format, and the current Zig reader is ASSET_VERSION 2 and fails closed on v1, so the segment has to be rebuilt) plus an atomicassets-api instance on a dedicated or replica box, not production under live traffic.

🤖 Generated with Claude Code

igorls added 2 commits June 7, 2026 06:03
…DB vs atomicassets-api)

End-to-end served-latency + throughput driver for the AtomicAssets read path. Cycle B
made WormDB serve the identical eosio-contract-api shape + query params as the reference
Postgres atomicassets-api, so the same URL corpus hits both targets.

- Samples a real corpus (asset_ids / owners / collections / (coll,schema) pairs) from a
  source endpoint, then runs a weighted mixed workload (point / collection / owner /
  faceted / browse / account) against each target under C concurrent workers.
- Reports per-query-type + overall p50/p95/p99 latency and sustained req/s; warms caches
  first; runs targets sequentially so the client doesn't self-contend.
- Env-driven: WORMDB / ATOMIC base URLs, N, C, SAMPLE, SAMPLE_FROM. Portable ESM
  (node or bun). Resource use (CPU/RSS) is sampled separately per host while it runs.

Validated against the jungle4 wormdb-aa endpoint (0 errors, full per-type breakdown). The
WAX-232M side-by-side vs the production atomicassets-api is the proving run (remote env).
… results files, mix

Follow-up hardening on the load harness:
- DURATION=<s> steady-state mode (each worker loops to a deadline) alongside N-per-target.
- STATS_WORMDB / STATS_ATOMIC sample container CPU%/RSS via `docker stats` during that
  target's run (self-scheduling --no-stream polls; silently skipped if docker is absent).
- Writes <OUT>.json + <OUT>.md — per-type + overall p50/95/99, min/mean/max, a latency
  histogram, resource use, and a side-by-side table — a committable proving artifact.
- MIX=type=w,… overrides the query weights; corpus now sampled across newest+oldest pages
  for collection/owner variety.
- README: a benchmark section (env table + the proving-run caveat: WAX-232M on native
  Linux is the real test; a Windows-loopback jungle4 run only validates the harness).

Validated on jungle4 wormdb-aa: 8s/c20 -> 6.5k req/s, p50 2.5ms / p99 10ms, RSS ~74MiB,
0 errors; JSON+MD emitted with the histogram + resource sample.
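
The two load modes can be sketched as one worker pool. `runWorkers` is hypothetical, and `doRequest` stands in for the harness's timed fetch (assumed to resolve to true on success). In N mode each worker claims a slot before awaiting, so the pool issues exactly N requests; in DURATION mode each worker loops until the shared deadline:

```javascript
// Sketch: C workers driven by a shared request budget (N) or a shared deadline (durationMs).
async function runWorkers({ C, N = 0, durationMs = 0, doRequest, now = () => performance.now() }) {
  const deadline = durationMs > 0 ? now() + durationMs : Infinity;
  const latencies = [];
  let issued = 0;
  let errors = 0;

  async function worker() {
    for (;;) {
      if (durationMs > 0) {
        if (now() >= deadline) break; // DURATION mode: stop at the deadline
      } else if (issued >= N) {
        break; // N mode: stop once the budget is spent
      }
      issued++; // claimed before the await, so the pool never overshoots N
      const start = now();
      const ok = await doRequest();
      if (ok) latencies.push(now() - start); // failures are excluded from latency
      else errors++;
    }
  }

  await Promise.all(Array.from({ length: C }, () => worker()));
  return { latencies, errors, issued };
}
```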
Copilot AI review requested due to automatic review settings June 7, 2026 09:08

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccc502ea63

if (DURATION && now() >= deadline) break;
const m = pickMix();
const s = now();
const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);

P2: Treat non-2xx responses as failed requests

When a target returns an HTTP error for one of the benchmarked URLs, this still records the request as successful because any resolved fetch() response is converted to text and then true. In contexts where one implementation is missing a route/query shape or returns 4xx/5xx for part of the sampled corpus, those fast error pages are included in latency and throughput with errors=0, making the side-by-side benchmark look valid while measuring failures instead of served API responses.

gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a new HTTP load benchmark script (http-bench.mjs) and updates the README.md to document its usage for comparing WormDB and Postgres atomicassets-api read paths. The review feedback highlights several key improvement opportunities and potential bugs in the benchmark script, including: correcting HTTP error handling in fetch calls to prevent non-2xx responses from being marked as successful, adding timeouts to prevent the script from hanging, handling potential crashes if the query mix is empty, improving the percentile calculation accuracy, and robustly parsing Docker memory usage when reported in bytes.

if (DURATION && now() >= deadline) break;
const m = pickMix();
const s = now();
const ok = await fetch(`${target.base}${m.url()}`).then((r) => r.text()).then(() => true).catch(() => false);

high

There are two issues here:

  1. HTTP Error Handling: fetch only rejects on network failures. If the server returns a non-2xx error (e.g., 500 Internal Server Error or 502 Bad Gateway), fetch resolves successfully and r.text() succeeds, meaning the request is incorrectly counted as successful (ok = true). This skews latency and throughput results and hides server failures.
  2. Missing Timeout: If a request hangs during the benchmark, it will block that concurrent worker indefinitely. Adding a timeout (e.g., 10 seconds) ensures the benchmark doesn't hang.

Checking r.ok and adding AbortSignal.timeout resolves both issues.

        const ok = await fetch(`${target.base}${m.url()}`, { signal: AbortSignal.timeout(10000) })
          .then((r) => r.ok ? r.text().then(() => true) : false)
          .catch(() => false);

Comment on lines +48 to +55
async function getJson(url) {
try {
const r = await fetch(url);
return r.ok ? await r.json() : null;
} catch {
return null;
}
}

medium

The fetch call in getJson does not have a timeout. If the target server is unresponsive or slow during the initial corpus sampling phase, the benchmark script will hang indefinitely. Adding a timeout using AbortSignal.timeout ensures the script fails fast and gracefully.

async function getJson(url) {
  try {
    const r = await fetch(url, { signal: AbortSignal.timeout(10000) });
    return r.ok ? await r.json() : null;
  } catch {
    return null;
  }
}

Comment on lines +96 to +97
].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
const totalW = sum(MIX.map((m) => m.w));

medium

If the sampled corpus has no valid assets, collections, or owners, or if the MIX weights filter out all query types, the MIX array will be empty. This causes pickMix() to return undefined, leading to a TypeError crash when trying to call pickMix().url(). Adding a check here prevents this crash and provides a clear error message.

].map((m) => ({ ...m, w: W[m.type] })).filter((m) => m.w > 0 && have[m.type] > 0);
if (!MIX.length) {
  console.error("[bench] error: weighted query mix is empty. Check your sample data or MIX weights.");
  process.exit(1);
}
const totalW = sum(MIX.map((m) => m.w));

}

// ── stats helpers ──
const pctile = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);

medium

The current percentile calculation using Math.floor(a.length * p) with Math.min can collapse values (e.g., p99 and max on small arrays) and is slightly off from standard nearest-rank percentile definitions. Using Math.round(p * (a.length - 1)) is simpler, standard, and maps 0.0 to 0 and 1.0 to a.length - 1 perfectly without needing Math.min.

const pctile = (a, p) => (a.length ? a[Math.round(p * (a.length - 1))] : NaN);
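
The index rules differ in ways that matter at small n. For reference, the textbook nearest-rank percentile is the value at 1-based rank ⌈p·n⌉, while `Math.round(p * (n - 1))` is a rounded-index rule that often, but not always, lands on the same element. A small comparison (function names are illustrative):

```javascript
// Three ways to index a percentile in an ascending-sorted array `a` (0-based).

// The harness's original rule: floor(n * p), clamped to the last element.
const floorIdx = (a, p) => (a.length ? a[Math.min(a.length - 1, Math.floor(a.length * p))] : NaN);

// The suggested rule: round(p * (n - 1)).
const roundIdx = (a, p) => (a.length ? a[Math.round(p * (a.length - 1))] : NaN);

// Textbook nearest-rank: 1-based rank ceil(p * n), i.e. 0-based index ceil(p * n) - 1.
const nearestRank = (a, p) => (a.length ? a[Math.max(0, Math.ceil(p * a.length) - 1)] : NaN);
```

On 100 samples the original `Math.floor` rule reports the maximum as p99, while the other two both report the 99th value; on 10 samples the median already differs between the rounded-index and nearest-rank rules (6 vs 5 for the values 1..10).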

Comment on lines +135 to +142
const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGi]+)/i);
if (m) {
let mem = parseFloat(m[2]);
const u = m[3].toLowerCase();
if (u.startsWith("g")) mem *= 1024;
else if (u.startsWith("k")) mem /= 1024; // -> MiB
samples.push({ cpu: parseFloat(m[1]), mem });
}

medium

The regular expression for parsing memory usage does not match if the unit is bytes (B or b), which can happen if the container is idle or has extremely low memory usage. Updating the regex to support B (and T for completeness) and handling the conversion to MiB makes the resource sampler much more robust.

      const m = out.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGTB]i?)/i);
      if (m) {
        let mem = parseFloat(m[2]);
        const u = m[3].toLowerCase();
        if (u.startsWith("g")) mem *= 1024;
        else if (u.startsWith("t")) mem *= 1024 * 1024;
        else if (u.startsWith("k")) mem /= 1024;
        else if (u.startsWith("b")) mem /= 1024 * 1024; // -> MiB
        samples.push({ cpu: parseFloat(m[1]), mem });
      }
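
A table-driven variant of the parser, assuming the sampler runs `docker stats --no-stream --format "{{.CPUPerc}};{{.MemUsage}}"` (the `;` separator is inferred from the regex above, not confirmed by the PR; `parseDockerStatsLine` is a hypothetical name). Docker reports MemUsage in binary units (B, KiB, MiB, GiB, TiB), so only those are mapped, and anything else returns null instead of being silently mis-scaled:

```javascript
// Conversion factors from docker's binary memory units to MiB.
const UNIT_TO_MIB = { b: 1 / (1024 * 1024), kib: 1 / 1024, mib: 1, gib: 1024, tib: 1024 * 1024 };

// Parse e.g. "3.21%;74.5MiB / 7.6GiB" into { cpu: 3.21, mem: 74.5 } (mem in MiB).
function parseDockerStatsLine(line) {
  const m = line.trim().match(/([\d.]+)%\s*;\s*([\d.]+)\s*([KMGT]?i?B)\b/i);
  if (!m) return null;
  const factor = UNIT_TO_MIB[m[3].toLowerCase()];
  if (factor === undefined) return null; // e.g. decimal "MB": not expected from MemUsage
  return { cpu: parseFloat(m[1]), mem: parseFloat(m[2]) * factor };
}
```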

Copilot AI left a comment

Pull request overview

Adds an end-to-end HTTP load benchmarking harness for the AtomicAssets read APIs to compare WormDB vs the reference Postgres atomicassets-api using the same sampled query corpus, producing committable JSON/Markdown results.

Changes:

  • Introduce http-bench.mjs to sample real query inputs, run a weighted request mix at configurable concurrency, and report p50/p95/p99 + req/s (plus optional docker stats CPU/RSS sampling).
  • Extend the validation README with usage instructions and the environment-variable reference table for the new harness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Files:
  • benchmark/atomicassets/validate/README.md: documents how to run the new HTTP benchmark harness and its env configuration.
  • benchmark/atomicassets/validate/http-bench.mjs: new load harness that samples a corpus, executes a weighted request mix, gathers latency/throughput stats, and writes JSON/MD artifacts.

Comment on lines +97 to +102
const totalW = sum(MIX.map((m) => m.w));
function pickMix() {
let r = Math.random() * totalW;
for (const m of MIX) if ((r -= m.w) < 0) return m;
return MIX[0];
}
Comment on lines +50 to +51
const r = await fetch(url);
return r.ok ? await r.json() : null;
const all = [];
let errs = 0, done = 0;
// warm up (fill caches / JIT) before measuring + before sampling resources
await Promise.all(Array.from({ length: Math.min(C, 20) }, async () => { for (let i = 0; i < 5; i++) { try { await fetch(`${target.base}${pickMix().url()}`).then((r) => r.text()); } catch {} } }));
…lean exits

Bot review (Codex/Gemini/Copilot) + an adversarial multi-lens self-review:

- Count non-2xx/timeout/network failures as errors, never as fast responses: drain the
  body and return r.ok; per-request timeout via AbortController+clearTimeout (fetchT).
- req/s counts SUCCESSFUL requests only, so fast error pages can't inflate throughput;
  a nonzero error count loudly flags the run as suspect.
- Percentile = nearest-rank Math.round(p*(n-1)) — no p99==max collapse on small n.
- docker-stats mem regex handles B/KiB/MiB/GiB/TiB (was MiB/GiB only).
- Empty query mix fails fast with a clear message.
- Wrap execution in main() + process.exitCode (no abrupt process.exit while undici
  sockets are open — that tripped a libuv "handle closing" assertion on Windows).
- Fairness (adversarial review, confirmed): with 2+ targets, reduce the corpus to the
  cross-target INTERSECTION so a divergent dataset (live API vs lagging local) can't make
  targets do different work for the same URL; record dropped counts in JSON `coverage` so
  any divergence is visible. README documents the same-data assumption.

Verified on jungle4: single-target unaffected (intersection gated off); 2-target run
(same endpoint) drops 0, emits the side-by-side + coverage; all fatal paths exit cleanly.
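
A per-target summary in the spirit of this commit, where req/s is computed from successful requests only and any error marks the run as suspect. `summarize` and its field names are hypothetical; the percentile uses the same `Math.round(p*(n-1))` index the commit adopted:

```javascript
// Sketch: summarize one target's run from successful latencies + an error count.
function summarize({ latenciesMs, errors, wallMs }) {
  const a = [...latenciesMs].sort((x, y) => x - y);
  const pct = (p) => (a.length ? a[Math.round(p * (a.length - 1))] : NaN);
  const attempted = a.length + errors;
  return {
    ok: a.length,
    errors,
    errorRate: attempted ? errors / attempted : 0,
    // throughput from successful requests only, so fast error pages can't inflate it
    reqPerSec: wallMs > 0 ? a.length / (wallMs / 1000) : 0,
    p50: pct(0.5),
    p95: pct(0.95),
    p99: pct(0.99),
    suspect: errors > 0, // any failure flags the run
  };
}
```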

igorls (Member, Author) commented Jun 7, 2026

Addressed all review feedback in a22769a, plus an adversarial multi-lens self-review.

From the bots:

  • non-2xx counted as success (Codex P2, Gemini high) — now drains the body and returns r.ok; 4xx/5xx/timeouts/network errors count as failures, excluded from latency.
  • missing timeouts (Gemini, Copilot ×3 — sampling/warmup/main) — one fetchT helper wraps every request with AbortController + clearTimeout (cleared on settle, so no lingering 10s timers pile up over a big run).
  • empty MIX crash (Gemini, Copilot) — fails fast with a clear message.
  • percentile off-by-one (Gemini, Copilot) — nearest-rank Math.round(p*(n-1)), no p99==max collapse on small n.
  • docker mem regex misses bytes (Gemini) — now handles B/KiB/MiB/GiB/TiB.
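
A sketch of what a `fetchT` helper along these lines could look like. The `fetchImpl` parameter is an assumption added only so the helper can be exercised without a live endpoint, and the 10 s default mirrors the reviewers' suggestion:

```javascript
// Timed fetch: true only for a 2xx response whose body was fully read.
async function fetchT(url, ms = 10_000, fetchImpl = fetch) {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), ms);
  try {
    const r = await fetchImpl(url, { signal: ac.signal });
    await r.text(); // drain the body so the keep-alive connection can be reused
    return r.ok; // 4xx/5xx count as failures
  } catch {
    return false; // network error or timeout (abort)
  } finally {
    clearTimeout(timer); // cleared on settle, so no idle timers linger
  }
}
```

Because the timer is cleared in `finally`, a long run does not accumulate pending 10 s timers, and since the signal stays armed until the body is read, a stalled body also times out.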

Beyond the bots:

  • throughput integrity: req/s counts successful requests only, so fast error pages can't inflate it; any nonzero error count loudly flags the run as suspect.
  • clean exits — execution is wrapped in main() with process.exitCode instead of an abrupt process.exit() while undici keep-alive sockets are open (that tripped a libuv handle closing assertion → exit 127 on Windows; now a clean exit 1).
  • comparison fairness — an adversarial review flagged (and verified) that the corpus was sampled from one target only, so a divergent dataset (live API vs a lagging local WormDB) could make list queries do different work for the same URL (a miss there is HTTP 200 with a smaller page, not a flagged 404). With 2+ targets the corpus is now reduced to the cross-target intersection, and per-dimension dropped counts are recorded in the JSON coverage block so any divergence is visible. The README documents the same-data assumption.
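
The intersection step might look like the following. How the PR checks membership on each target is not described here, so `probe` is a hypothetical async predicate (for example, whether a target returns the sampled asset, owner, or collection), and `intersectCorpus` is an illustrative name; the per-dimension counts play the role of the JSON `coverage` block:

```javascript
// Keep only corpus values every target can serve; record drops per dimension.
async function intersectCorpus(corpus, targets, probe) {
  if (targets.length < 2) return { corpus, coverage: null }; // gated to 2+ targets
  const kept = {};
  const coverage = {};
  for (const [dim, values] of Object.entries(corpus)) {
    kept[dim] = [];
    for (const v of values) {
      const present = await Promise.all(targets.map((t) => probe(t, dim, v)));
      if (present.every(Boolean)) kept[dim].push(v);
    }
    coverage[dim] = {
      sampled: values.length,
      kept: kept[dim].length,
      dropped: values.length - kept[dim].length,
    };
  }
  return { corpus: kept, coverage };
}
```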

Verified on jungle4: single-target runs are unaffected (intersection gated to 2+ targets); a 2-target run against the same endpoint drops 0 and emits the side-by-side + coverage; all fatal paths exit cleanly.
