Search before asking
Motivation
Currently, we have topic load-related metrics like the following:
topic_load_times{cluster="standalone",quantile="0.5"} 140.0
topic_load_times{cluster="standalone",quantile="0.75"} 183.0
topic_load_times{cluster="standalone",quantile="0.95"} 249.0
topic_load_times{cluster="standalone",quantile="0.99"} 249.0
topic_load_times{cluster="standalone",quantile="0.999"} 249.0
topic_load_times{cluster="standalone",quantile="0.9999"} 249.0
topic_load_times_count{cluster="standalone"} 6.0
topic_load_times_sum{cluster="standalone"} 955.0
topic_load_times_created{cluster="standalone"} 1.671240308864E9
However, these metrics do not let us detect topics that failed to load due to
ZooKeeper/BookKeeper problems.
It would be better to add a new metric for failed topic loads so that users
can set up alerts based on it.
Solution
Add a topic_load_failed_count metric.
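As a rough illustration of the idea (not the broker's actual code), the counter could be incremented whenever the future for a topic load completes exceptionally. The class name `TopicLoadFailureMetrics` and the methods `track` and `render` are hypothetical and exist only for this sketch. It uses only the JDK, and a real change would presumably go through the metrics library the broker already uses.

```java
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper for illustration only; not an existing broker class.
final class TopicLoadFailureMetrics {
    // Number of topic loads that completed exceptionally.
    private final LongAdder failedCount = new LongAdder();

    // Attach to the future returned by a topic load.
    // Successful loads are ignored; exceptional completions are counted.
    <T> CompletableFuture<T> track(CompletableFuture<T> loadFuture) {
        return loadFuture.whenComplete((topic, error) -> {
            if (error != null) {
                failedCount.increment();
            }
        });
    }

    long failedCount() {
        return failedCount.sum();
    }

    // Render one line in the same text format as the existing
    // topic_load_times samples. The label set is an assumption.
    String render(String cluster) {
        return String.format(Locale.ROOT,
                "topic_load_failed_count{cluster=\"%s\"} %.1f",
                cluster, (double) failedCount.sum());
    }
}
```

An alert could then fire when this value increases over a chosen time window.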
Alternatives
No response
Anything else?
The metrics change requires a proposal.
Are you willing to submit a PR?