Improve error reporting in final JSON results #29
Draft
cjonas9 wants to merge 10 commits into stepped-pacer from
What
Following the stepped-pacer redesign (#23):
This PR adds a step-interval snapshot timeline to the final JSON results. Each snapshot captures the target RPS, success/error counts, error rate (%), and latency percentiles (p50, p95, p99, p99.9) for that window only.
Visually, the results JSON now looks like:
Why
The richer JSON gives operators a clear view of how the RPC degraded as load increased. Previously, the results JSON flattened all metrics into a single summary bucket per endpoint with no notion of an RPS timeline, so there was no way to tell at what RPS the RPC buckled. The snapshot timeline gives an operator running a load test a concise degradation curve (e.g. when errors started, and how latency and error rates changed at each step) without having to dig through logs.
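To make the "at what RPS did it buckle" question concrete, here is a minimal Python sketch of reading a degradation point off a per-window snapshot timeline. Everything in it is hypothetical: the `Snapshot` field names, `build_snapshot`, `first_degraded_step`, the nearest-rank percentile method, the thresholds and the example numbers are illustrations, not the PR's actual schema or code. It also assumes latencies are recorded for successful requests only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Snapshot:
    # Hypothetical per-window record; field names are illustrative only.
    target_rps: int
    success: int
    errors: int
    p50_ms: float
    p95_ms: float
    p99_ms: float
    p999_ms: float

    @property
    def error_rate_pct(self) -> float:
        # Error rate (%) computed over this window only.
        total = self.success + self.errors
        return 100.0 * self.errors / total if total else 0.0


def percentile(sorted_ms: Sequence[float], q: float) -> float:
    # Nearest-rank percentile over an already-sorted list (q in (0, 100]).
    if not sorted_ms:
        return 0.0
    rank = math.ceil(q / 100.0 * len(sorted_ms))
    return sorted_ms[max(rank, 1) - 1]


def build_snapshot(target_rps: int, latencies_ms: Sequence[float], errors: int) -> Snapshot:
    # Assumption: latencies are recorded for successful requests only,
    # so the success count is the number of latency samples.
    lat = sorted(latencies_ms)
    return Snapshot(
        target_rps=target_rps,
        success=len(lat),
        errors=errors,
        p50_ms=percentile(lat, 50),
        p95_ms=percentile(lat, 95),
        p99_ms=percentile(lat, 99),
        p999_ms=percentile(lat, 99.9),
    )


def first_degraded_step(
    snapshots: List[Snapshot],
    max_error_rate_pct: float = 1.0,
    max_p99_ms: Optional[float] = None,
) -> Optional[int]:
    # Walk the timeline in order and return the target RPS of the first
    # window that breaches either threshold, or None if no window does.
    for s in snapshots:
        if s.error_rate_pct > max_error_rate_pct:
            return s.target_rps
        if max_p99_ms is not None and s.p99_ms > max_p99_ms:
            return s.target_rps
    return None


# Made-up example timeline, one snapshot per RPS step.
timeline = [
    build_snapshot(100, [10, 12, 11, 13], errors=0),
    build_snapshot(200, [15, 20, 18, 25], errors=0),
    build_snapshot(400, [40, 90, 120, 300], errors=2),
]
print(first_degraded_step(timeline))  # 400
```

Scanning in timeline order is the point here: because each snapshot covers only its own window, a spike at one step is not diluted by the healthy steps before it, which is exactly what a single per-endpoint summary bucket could not show.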
Known limitations
When no ramp-up is provided, the snapshot interval defaults to 5s, because without a ramp-up stepping through various RPS levels there is no natural "RPS phase" to align snapshots with.
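A minimal sketch of that fallback, assuming the run is simply sliced into fixed-length windows. The function `snapshot_windows` and its `step_interval_s` parameter (standing in for the ramp-up step length) are hypothetical; only the 5s default comes from this PR.

```python
from typing import List, Optional, Tuple

# From the PR description: used when no ramp-up is provided.
DEFAULT_SNAPSHOT_INTERVAL_S = 5.0


def snapshot_windows(
    duration_s: float, step_interval_s: Optional[float] = None
) -> List[Tuple[float, float]]:
    # Hypothetical: split a run of duration_s seconds into [start, end) windows.
    # With a ramp-up, step_interval_s is the length of one RPS step; without
    # one (None), fall back to the fixed 5s default.
    interval = DEFAULT_SNAPSHOT_INTERVAL_S if step_interval_s is None else step_interval_s
    if interval <= 0:
        raise ValueError("snapshot interval must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + interval, duration_s)  # last window may be shorter
        windows.append((start, end))
        start = end
    return windows


print(snapshot_windows(12))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Note that a trailing partial window holds fewer samples than the others, so its tail percentiles (p99, p99.9) will be noisier.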