[PoC] Add Averages to "Slow XYZ" Recorders #470
Draft
Important: This is a proof-of-concept PR seeking opinions and evaluations from maintainers and contributors.
What is this?
This is an extension to the existing "Slow XYZ" recorders, e.g. the `SlowRequests` recorder. With the newly collected data, the Slow Requests card now looks like this:

[Screenshot: the updated Slow Requests card]

Disclaimer
Note: The idea for this isn't new. It was first raised in #281, and #284 contained a draft implementation from which I took some inspiration. For my own implementation, I came up with a solution for the main issues discussed in that first draft:

- Which `average` value should be recorded: the average of all requests, or the average of only the requests above the threshold?

What has changed?
For now, this proof-of-concept implementation only extends the functionality of the `SlowRequests` recorder. Here's what's changed:
- … `average` duration of all requests.
- … `average` duration, making it more obvious what the "average" value represents.
- Added `average` and `total` to the sorting options.
- Added an ⓘ icon with a short explanation of the shown metrics.

How has this been implemented?
The recorder now tracks all requests by default and calculates their `max` and `avg` durations. Additionally, if a request's duration exceeds the configured threshold, it also increments the `count` metric.

For tracking the total number of requests, several approaches exist:
1. … (`slow-requests-total`). Then use the `->count()` aggregate.
2. … (`slow-requests`). Then (mis-)use the `->sum()` aggregate with a static value of `1`.
3. Use the existing `count` column of the `pulse_aggregates` table. This column already holds the total count and is used to calculate the average accurately.

Evaluation
I implemented, tested, and evaluated all three approaches.
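To make the comparison concrete, the snippet below models one route's aggregates in plain Python. It is a simplified sketch, not Pulse code: the class and field names (`RouteAggregates`, `avg_count`, `total_type_count`, `total_sum_of_ones`) are hypothetical, the durations are made up, and storing an `avg` row as an average plus a `count` is assumed to mirror the `pulse_aggregates` table. Every request feeds `max` and `avg`, only requests above the threshold increment the slow count, and the total is tracked in all three ways.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of the aggregates kept for one route in one
# bucket. Names are illustrative only; this is not Laravel Pulse's API.
@dataclass
class RouteAggregates:
    threshold_ms: int
    max_ms: int = 0
    avg_ms: float = 0.0
    avg_count: int = 0          # option 3: the `count` stored with the avg row
    slow_count: int = 0         # existing `count` metric (slow requests only)
    total_type_count: int = 0   # option 1: separate type, `count()` aggregate
    total_sum_of_ones: int = 0  # option 2: `sum()` over a static value of 1

    def record(self, duration_ms: int) -> None:
        # Every request now feeds the max and avg aggregates.
        self.max_ms = max(self.max_ms, duration_ms)
        # Weighted update: avg_ms stays the mean of all durations seen so far.
        self.avg_ms = (self.avg_ms * self.avg_count + duration_ms) / (self.avg_count + 1)
        self.avg_count += 1
        # Only requests above the threshold increment the slow count.
        # (Whether the boundary is > or >= is an assumption of this sketch.)
        if duration_ms > self.threshold_ms:
            self.slow_count += 1
        # Option 1: one extra entry per request under a separate type.
        self.total_type_count += 1
        # Option 2: one extra entry per request, valued 1, aggregated with sum().
        self.total_sum_of_ones += 1


durations = [100, 200, 300, 1500, 2500]  # made-up sample, in milliseconds
agg = RouteAggregates(threshold_ms=1000)
for d in durations:
    agg.record(d)

# The two candidate averages discussed in #284:
avg_all = agg.avg_ms
slow = [d for d in durations if d > agg.threshold_ms]
avg_slow_only = sum(slow) / len(slow)

print(agg.max_ms, avg_all, avg_slow_only, agg.slow_count)  # 2500 920.0 2000.0 2
print(agg.total_type_count, agg.total_sum_of_ones, agg.avg_count)  # 5 5 5
```

With this sample, the average over all requests is 920 ms while the average of the two slow requests alone is 2000 ms, so it matters which one the card shows. All three totals agree; options 1 and 2 need extra recorded data, while option 3 reuses the `count` that the `avg` row already keeps.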
- … `value: 1` instead of `value: $duration` …

Decision
I chose option 3, as its advantages seem to outweigh its disadvantages by the widest margin. The implementation is straightforward and uses already-existing data. It also shouldn't break either the ingestion or the bucket logic. What do you think?
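The claim that option 3 leaves the bucket logic intact rests on one property: an `avg` row stored as an average plus a `count` can be merged with another such row by weighting each average by its count, and the merged `count` is exactly the number of requests. The Python sketch below illustrates that property in isolation; the function names and the `(avg, count)` tuple are hypothetical and not taken from Pulse's storage code.

```python
from typing import Iterable, Tuple

# Hypothetical (avg, count) pair, modeling an `avg` aggregate row that keeps
# its average in `value` and the number of entries in `count`.
AvgRow = Tuple[float, int]


def merge(a: AvgRow, b: AvgRow) -> AvgRow:
    """Combine two partial averages, weighting each one by its count."""
    (avg_a, n_a), (avg_b, n_b) = a, b
    n = n_a + n_b
    if n == 0:
        return (0.0, 0)
    return ((avg_a * n_a + avg_b * n_b) / n, n)


def row_from(durations: Iterable[int]) -> AvgRow:
    """Build an (avg, count) row from the raw durations of one bucket."""
    values = list(durations)
    if not values:
        return (0.0, 0)
    return (sum(values) / len(values), len(values))


def combine_buckets(rows: Iterable[AvgRow]) -> AvgRow:
    """Fold per-bucket rows into one row for the whole period."""
    result: AvgRow = (0.0, 0)
    for row in rows:
        result = merge(result, row)
    return result


# Made-up durations (ms), split across three buckets.
buckets = [[100, 200], [300], [1500, 2500]]
period_avg, period_total = combine_buckets(row_from(b) for b in buckets)
print(period_avg, period_total)  # 920.0 5

# A single pass over all durations gives the same row.
print(row_from(d for b in buckets for d in b))  # (920.0, 5)
```

Averaging the three bucket averages directly would instead give about 816.7 ms, which is why the stored `count` is needed for an accurate result.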
To-Dos
- `SlowJobs`
- `SlowOutgoingRequests`
- `SlowQueries`

More Screenshots

[Screenshots]