bench: endpoint benchmark harness for prod ↔ preview cutover comparison #101
Draft
alastairong1 wants to merge 12 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… README)

- Fix 1: guard empty discover_keys loop so empty string isn't iterated
- Fix 2: null oha percentiles pass through as null (not 0); statusCodeDistribution null survives to_entries via type guard
- Fix 3: derive report timestamp from prod result filename so all artifacts share a stem
- Fix 5: copy oha .err file to bench/results/.<name>.err on non-zero exit for post-mortem
- Fix 7: README correctly states oha is not in nix develop; python3 version note
- Fix 8: clearer Python error when neither tomllib nor tomli are available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- oha 1.14 replaced -j with --output-format json
- Summary now derives total_requests from statusCodeDistribution sum (oha doesn't emit a top-level request count) and uses summary.successRate directly when present
- Discover now reads camelCase fields (orderHash, txHash) and uses the trades-by-token endpoint to source orderHash + txHash, since orders can exist with zero trades
- Added defensive skip when summary computation produces empty output
- oha's summary.successRate counts 4xx/5xx as successful (it means 'transport-level success'). Derive HTTP-level success from statusCodeDistribution where status < 400. Without this fix the harness reported 100% success on endpoints that were returning 404 or 429 on every request.
- Surface the per-endpoint status-code histogram in the summary block so failures aren't masked by rolled-up percentages.
- Discover now walks every token in /v1/tokens until one has at least one trade, so order_hash and tx_hash fixtures populate even when the first token has no associated trades.
- Use a portable while-read array idiom (mapfile is bash 4+ only).
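The HTTP-level success calculation described above can be illustrated with a short sketch. It is written in Python for readability rather than in the harness's own jq, and it assumes statusCodeDistribution is an object mapping status-code strings to counts; the function names and sample numbers are made up.

```python
from typing import Mapping, Optional


def total_requests(status_counts: Optional[Mapping[str, int]]) -> int:
    """Sum the histogram, since no top-level request count is assumed."""
    return sum(status_counts.values()) if status_counts else 0


def http_success_rate(status_counts: Optional[Mapping[str, int]]) -> Optional[float]:
    """Fraction of responses with status < 400, or None when there is no data."""
    total = total_requests(status_counts)
    if total == 0:
        return None  # pass "unknown" through instead of reporting 0 or 1
    ok = sum(n for code, n in status_counts.items() if int(code) < 400)
    return ok / total


if __name__ == "__main__":
    # Hypothetical histogram: every response arrived (transport-level success),
    # but most were rate-limited or not found.
    sample = {"200": 12, "404": 3, "429": 15}
    print(total_requests(sample), http_success_rate(sample))
```

A transport-level metric would score this sample as fully successful, which is the masking effect the commit describes.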
…env)

Discovery now writes a target-specific fixture file so each host benches against values that exist in its own data, avoiding cross-host 404s. all.sh now runs discovery against both hosts before benching.
…te limit

The API's per-key rate limit is 60 rpm and the global limit is 600 rpm (see config/rest-api.toml). The previous defaults (50 reqs at concurrency 5) ran at ~25 req/sec per endpoint and produced 429-saturated results on every endpoint touching uncached data. Those results reflected a misconfigured harness, not slow servers.

New defaults:

- BENCH_REQUESTS=30
- BENCH_CONCURRENCY=1
- BENCH_QPS=0.8
- BENCH_INTER_ENDPOINT_SLEEP=5

This stays under the 60 rpm budget on a rolling 60s window across all 15 endpoints. One host completes in ~10 min; both hosts in ~21 min. Override via env vars for soak / ceiling tests when running with a high-rate-limit key.
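The arithmetic behind these defaults can be checked with a small sketch. It assumes request pacing is dominated by the QPS cap (each response returns faster than the 1.25 s interval at concurrency 1) and that the inter-endpoint sleep runs after every endpoint; the function names are hypothetical.

```python
def sustained_rpm(qps: float) -> float:
    """Requests per minute implied by a steady QPS cap."""
    return qps * 60


def host_duration_seconds(
    endpoints: int, requests: int, qps: float, inter_endpoint_sleep: float
) -> float:
    """Estimated wall time to bench one host, one endpoint at a time."""
    per_endpoint = requests / qps + inter_endpoint_sleep
    return endpoints * per_endpoint


if __name__ == "__main__":
    rpm = sustained_rpm(0.8)
    one_host = host_duration_seconds(
        endpoints=15, requests=30, qps=0.8, inter_endpoint_sleep=5
    )
    print(f"sustained rate: {rpm:.0f} rpm (per-key limit: 60 rpm)")
    print(f"one host: {one_host / 60:.1f} min, both hosts: {2 * one_host / 60:.1f} min")
```

Under those assumptions the cap works out to 48 rpm, and the estimate lands near the ~10 min and ~21 min figures quoted above.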
Summary
Adds a local-only benchmark harness under `bench/` that compares response speed and reliability of every safe-to-bench endpoint between `api.preview.st0x.io` and `api.st0x.io`. Used today to gate the preview→prod cutover; structured so it can be wired into CI later (deferred).

- `oha` (load gen) + `jq` + bash. No Rust code touched.
- `bench/.env` and `bench/results/*` are gitignored.

Cutover findings (using this harness)
Preview build vs current prod, apples-to-apples (same fixtures both hosts):
Reliability — preview fixes broken-on-prod functionality:
Performance:
Non-blocking follow-ups:
Usage
```bash
cp bench/.env.example bench/.env # fill in BENCH_USER and BENCH_PASS
cargo install oha # not yet in nix shell
bench/all.sh # ~21 min for both hosts
# → bench/results/-compare.md
```
See `bench/README.md` for tunables and what's deferred.
Test plan
🤖 Generated with Claude Code