bench: endpoint benchmark harness for prod ↔ preview cutover comparison by alastairong1 · Pull Request #101 · ST0x-Technology/st0x.rest.api

alastairong1 · 2026-05-06T15:19:54Z

Summary

Adds a local-only benchmark harness under bench/ that compares response speed and reliability of every safe-to-bench endpoint between api.preview.st0x.io and api.st0x.io. Used today to gate the preview→prod cutover; structured so it can be wired into CI later (deferred).

15 idempotent read endpoints covered (mutating endpoints excluded by design)
Tooling: oha (load gen) + jq + bash. No Rust code touched.
Auto-discovers fixtures (token addresses, an order hash, an owner, a tx hash) per host
Runs at 0.8 qps × 1 concurrency × 30 reqs/endpoint to stay under the 60 rpm per-key rate limit (`config/rest-api.toml`)
Emits per-host JSON + a side-by-side markdown report with advisory regression thresholds (p95 +25%, success drop −2pp)
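As a rough illustration of the advisory thresholds in the last bullet, the comparison could be sketched as below. This is not the harness's actual code: the function name `check_regression`, its argument order, the use of prod as the baseline, and the strict (exclusive) comparison are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Advisory regression check (sketch, hypothetical names).
# Thresholds mirror the PR text: p95 more than 25% slower than the baseline,
# or success rate down by more than 2 percentage points.
# Assumption: prod is the baseline and preview is the candidate.

# usage: check_regression PROD_P95 PREVIEW_P95 PROD_SUCCESS_PCT PREVIEW_SUCCESS_PCT
check_regression() {
  awk -v pp="$1" -v vp="$2" -v ps="$3" -v vs="$4" 'BEGIN {
    flags = ""
    if (pp > 0 && vp > pp * 1.25) flags = flags " p95"      # latency threshold
    if (ps - vs > 2)              flags = flags " success"  # reliability threshold
    if (flags == "") print "ok"; else print "regression:" flags
  }'
}

# Figures from the findings in this PR: p95 4.8 s on prod vs 1.1 s on preview.
check_regression 4.8 1.1 100 100   # prints "ok"
```

The function only prints a verdict and always returns 0, which matches the advisory intent: a flagged regression is reported but does not fail the run.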

bench/.env and bench/results/* are gitignored.

Cutover findings (using this harness)

Preview build vs current prod, apples-to-apples (same fixtures both hosts):

Reliability — preview fixes broken-on-prod functionality:

`/v1/trades/{owner}` → 500 on prod, 200 on preview
`/v1/trades/tx/{tx_hash}` → 500 on prod, 200 on preview
`/v1/trades/token/{addr}`, `/v1/trades/taker/{addr}` → 404 on prod, 200 on preview
`/health/detailed`, `/registry/history`, `/v1/trades/batch` → 404 on prod (routes not deployed), 200 on preview

Performance:

`/v1/orders/token/{addr}` p95: 4.8 s on prod → 1.1 s on preview (4.5× faster)
Working-on-both endpoints (`health`, `tokens`, `registry`) comparable

Non-blocking follow-ups:

`order-by-hash` p95 ≈ 11 s on both hosts (RPC multicall live-quote refresh — UX concern, not a regression)
`swap-quote` / `swap-calldata` / `trades-batch` body templates in `endpoints.toml` are wrong (both hosts return 422 identically). Harness gap, not server issue.

Usage

```bash
cp bench/.env.example bench/.env   # fill in BENCH_USER and BENCH_PASS
cargo install oha                  # not yet in nix shell
bench/all.sh                       # ~21 min for both hosts
# → bench/results/-compare.md
```

See `bench/README.md` for tunables and what's deferred.

Test plan

Smoke test against preview with auth — all 15 endpoints return real data, schema parses correctly
Verified rate-limit budget: 0.8 qps stays under per-key 60 rpm; spot 429s on cached endpoints only
Verified bug fixes hold (oha 1.14 JSON schema, null percentile handling, success-rate derivation from status codes not oha's transport-level field)
Full apples-to-apples bench against both hosts produced actionable cutover findings (above)
CI integration — deferred per scope decision; `bench/all.sh` exits 0 on advisory regressions, hooks ready for future job

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… README)

- Fix 1: guard empty discover_keys loop so empty string isn't iterated
- Fix 2: null oha percentiles pass through as null (not 0); statusCodeDistribution null survives to_entries via type guard
- Fix 3: derive report timestamp from prod result filename so all artifacts share a stem
- Fix 5: copy oha .err file to bench/results/.<name>.err on non-zero exit for post-mortem
- Fix 7: README correctly states oha is not in nix develop; python3 version note
- Fix 8: clearer Python error when neither tomllib nor tomli are available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
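A guard of the kind Fix 1 describes might look like the sketch below. It is not taken from the harness: `read_keys`, the `KEYS` array, and the sample key names are hypothetical. It reads newline-separated keys with a `while read` loop, which also works on bash 3.2 where `mapfile` is unavailable, and drops blank lines so that empty input yields an empty array rather than a single empty element.

```bash
#!/usr/bin/env bash
# Read newline-separated keys from stdin into the global array KEYS.
# Blank lines are skipped, so empty input produces an empty array.
# Hypothetical helper; names are illustrative only.
read_keys() {
  KEYS=()
  local line
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && KEYS+=("$line")
  done
  return 0
}

# Call with a redirect, not a pipe: a pipe would run read_keys in a subshell
# and the array would be lost.
read_keys <<< "$(printf 'tokens\norders\n')"

# Guard the loop so an empty array is never expanded (older bash versions
# treat that as an unbound variable under "set -u").
if [ "${#KEYS[@]}" -gt 0 ]; then
  for key in "${KEYS[@]}"; do
    echo "discovering: $key"
  done
fi
```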

coderabbitai · 2026-05-06T15:20:05Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 103c92ba-2e76-4036-8d47-6dd8acf2bc77

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


- oha 1.14 replaced -j with --output-format json
- Summary now derives total_requests from statusCodeDistribution sum (oha doesn't emit a top-level request count) and uses summary.successRate directly when present
- Discover now reads camelCase fields (orderHash, txHash) and uses the trades-by-token endpoint to source orderHash + txHash, since orders can exist with zero trades
- Added defensive skip when summary computation produces empty output

- oha's summary.successRate counts 4xx/5xx as successful (it means 'transport-level success'). Derive HTTP-level success from statusCodeDistribution where status < 400. Without this fix the harness reported 100% success on endpoints that were returning 404 or 429 on every request.
- Surface the per-endpoint status-code histogram in the summary block so failures aren't masked by rolled-up percentages.
- Discover now walks every token in /v1/tokens until one has at least one trade, so order_hash and tx_hash fixtures populate even when the first token has no associated trades.
- Use a portable while-read array idiom (mapfile is bash 4+ only).
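The HTTP-level success derivation described above can be sketched as follows. This is an illustration, not the harness's implementation: `http_success_rate` is a hypothetical name, and it takes plain `STATUS COUNT` pairs on stdin so that it runs without jq. The jq expression in the comment shows one assumed way to produce those pairs from oha's JSON, on the assumption that `statusCodeDistribution` sits at the top level of the output.

```bash
#!/usr/bin/env bash
# Print the percentage of responses with HTTP status < 400, given
# "STATUS COUNT" pairs on stdin. Prints "null" when no requests were
# recorded, so a missing distribution is not reported as 0%.
#
# Assumed feed from an oha result file (not run here):
#   jq -r '(.statusCodeDistribution // {}) | to_entries[] | "\(.key) \(.value)"' result.json
http_success_rate() {
  awk '
    NF >= 2 { total += $2; if ($1 + 0 < 400) ok += $2 }
    END {
      if (total == 0) print "null"
      else printf "%.1f\n", 100 * ok / total
    }'
}

# 27 successes and 3 rate-limited responses out of 30 requests.
printf '200 27\n429 3\n' | http_success_rate   # prints 90.0
```

An endpoint that returns 404 on every request comes out at 0.0 here, whereas a transport-level success field would still report it as fully successful.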


…te limit

The API's per-key rate limit is 60 rpm and the global limit is 600 rpm (see config/rest-api.toml). The previous defaults (50 reqs at concurrency 5) ran at ~25 req/sec per endpoint and produced 429-saturated results on every endpoint touching uncached data. The fault lay with the harness settings, not with slow servers.

New defaults: BENCH_REQUESTS=30, BENCH_CONCURRENCY=1, BENCH_QPS=0.8, BENCH_INTER_ENDPOINT_SLEEP=5

This stays under the 60 rpm budget on a rolling 60s window across all 15 endpoints. One host completes in ~10 min; both hosts in ~21 min. Override via env vars for soak / ceiling tests when running with a high-rate-limit key.
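The arithmetic behind those defaults can be checked with a short sketch. The helper names `rpm_for_qps` and `host_minutes` are hypothetical, and the duration estimate assumes one sleep between consecutive endpoints and ignores response latency, so it is a lower bound rather than a prediction.

```bash
#!/usr/bin/env bash
# Back-of-envelope check of the defaults quoted in the commit message.
BENCH_REQUESTS=${BENCH_REQUESTS:-30}
BENCH_QPS=${BENCH_QPS:-0.8}
BENCH_INTER_ENDPOINT_SLEEP=${BENCH_INTER_ENDPOINT_SLEEP:-5}
ENDPOINTS=15          # endpoint count from the PR summary
PER_KEY_LIMIT_RPM=60  # per-key limit cited from config/rest-api.toml

# Requests per minute implied by a steady queries-per-second rate.
rpm_for_qps() { awk -v q="$1" 'BEGIN { printf "%.0f\n", q * 60 }'; }

# Rough minutes for one host: time spent issuing requests plus the sleeps
# between endpoints (assumed to be one fewer than the endpoint count).
# usage: host_minutes ENDPOINTS REQUESTS QPS SLEEP_SECONDS
host_minutes() {
  awk -v n="$1" -v r="$2" -v q="$3" -v s="$4" 'BEGIN {
    printf "%.1f\n", (n * r / q + (n - 1) * s) / 60
  }'
}

echo "per-key load: $(rpm_for_qps "$BENCH_QPS") rpm (limit $PER_KEY_LIMIT_RPM)"
echo "one host: ~$(host_minutes "$ENDPOINTS" "$BENCH_REQUESTS" "$BENCH_QPS" "$BENCH_INTER_ENDPOINT_SLEEP") min"
```

With the defaults this gives 48 rpm against the 60 rpm limit and about 10.5 minutes per host, in line with the ~10 min and ~21 min figures quoted above.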

alastairong1 and others added 8 commits May 6, 2026 15:00

bench: scaffold directory and gitignore

2a68b92

bench: declare endpoint coverage (14 idempotent reads)

414a2c4

bench: add shared shell helpers

560f8b5

bench: add fixture discovery script

eb3992c

bench: add per-host runner using oha

2b75f1e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bench: add comparison report generator

2b4a51f

bench: add top-level entry point and README

c079cfe

alastairong1 added 4 commits May 7, 2026 11:51

bench: per-host fixtures (.discovered-prod.env / .discovered-preview.…

2ec4aa2

…env) Discovery now writes a target-specific fixture file so each host benches against values that exist in its own data, avoiding cross-host 404s. all.sh now runs discovery against both hosts before benching.

alastairong1 changed the title ~~bench: endpoint perf+reliability harness for preview→prod cutover~~ bench: endpoint benchmark harness for prod ↔ preview cutover comparison May 7, 2026

alastairong1 wants to merge 12 commits into main from bench/endpoint-harness
