feat: mute collectors based on exception count#56
feat: mute collectors based on exception count#56kartva wants to merge 13 commits intoopensearch-project:mainfrom
Conversation
| // since other collectors run every 5 sec, we can obtain an estimate | ||
| // of the exceptions they've thrown in the last ~10 runs or so. | ||
| int exceptionCount = StatsCollector.instance().exceptionCount(collector.getErrorMetric()); | ||
| if (!collector.inProgress() || exceptionCount > 5) { |
There was a problem hiding this comment.
We can do exceptionCount > 5 in the POC, but in the final implementation, this threshold should be configurable.
| */ | ||
| public class ServiceMetrics { | ||
| public static SampleAggregator READER_METRICS_AGGREGATOR, | ||
| STAT_METRICS_AGGREGATOR = new SampleAggregator(StatMetrics.values()), |
There was a problem hiding this comment.
Any specific reasons we removed those aggregators? Are those just for testing?
There was a problem hiding this comment.
These aggregators are unused in the entire Performance Analyzer codebase. They seem to be leftover from an earlier refactoring possibly involving RCA.
Do you think this is a change that would benefit from being its own PR?
and use them in deciding whether to mute a collector
caused by Mockito ref: https://stackoverflow.com/questions/68113065/mockito-inaccessible-object-exception-while-creating-a-mock-object To quote: "This can happen if Mockito requires reflective access to non-public parts in a Java module. you can get around this by explicitly allowing access via --add-opens in your java call"
invokePrivilegedAndLogError already exists for that purpose consider transitioning current uses to invokePrivilegedAndLogError this is important as the exception counting mechanism relies on exceptions not being swallowed earlier in call stack
by adding repetition options to runBehavior
89de19a to
7658b58
Compare
will file separate PR
|
What sort of configuration setting do you think would be appropriate for |
|
summary of changes so far:
|
Is your feature request related to a problem? Please provide an existing Issue # , or describe.
As discussed at opensearch-project/performance-analyzer#549
Describe the solution you are proposing
We use the exception counts maintained by
StatsCollectorto dynamically mute collectors based on the number of exceptions they throw in the last minute of sampling.Additional context
Currently in the proof of concept stage. If this is successful, further avenues of investigation may be:
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.