Bug 2007542 - Rate limit error reports#7139
Conversation
afa5945 to
4cec688
Compare
skhamis
left a comment
There was a problem hiding this comment.
Really nice! This should help cut how often we send (but still get us what we need). Did have a few nits/comments/suggestions, at least the comment updates but don't need to block on em. r+!
| static GLOBALS: Mutex<Globals> = Mutex::new(Globals::new()); | ||
|
|
||
| pub fn report_error_to_app(type_name: String, message: String) { | ||
| let mut globals = GLOBALS.lock(); |
There was a problem hiding this comment.
nit: Maybe we should do all the lock work together? Something like:
let breadcrumbs = {
let mut globals = GLOBALS.lock();
if !globals.rate_limiter.should_send_report(&type_name, Instant::now()) {
return;
}
globals.breadcrumbs.get_breadcrumbs()
};
let breadcrumbs = breadcrumbs.join("\n");
tracing_support::error!(target: "app-services-error-reporter::error", message, type_name, breadcrumbs);
}| breadcrumb[0..split_point].to_string() | ||
| } | ||
|
|
||
| /// Rate-limits error reports by type to 20 / hour |
There was a problem hiding this comment.
Just to make it more clear (based on the tests):
Rate-limits error reports per component by type to a max of 20/hr
| // The first error report is okay | ||
| assert!(rate_limiter.should_send_report("test-type", start)); | ||
| // The report should be rate limited until 3 minutes pass, then we can send another one. | ||
| // Subtract time from the instant to simulate time going forward. |
42c546b to
c41e0f4
Compare
| } | ||
| // For all other cases, fall through and allow the report to be sent. | ||
| // | ||
| // Note: this also covers the `None` case which happens when the clock is |
There was a problem hiding this comment.
@bendk This is really in the weeds but out of (morbid) curiousity are we expecting the time since the last report was checked to be wall clock time more often than not?
There was a problem hiding this comment.
Yes I think so. I think this could get slightly messed up when the OS syncs with NTP or if users manually adjust their clocks, but that should be very marginal. If that happens, maybe an error report won't will be sent, but we should get back into a good state soon after.
There was a problem hiding this comment.
Yeah, no big deal. Worse case scenario we amend this logic. Thanks for humoring me and for calling this out in the comments.
There was a problem hiding this comment.
I thought that was a great call-out and future me is probably going to have the same question . I updated the docs to explain things a bit better.
c41e0f4 to
83a8955
Compare
Pull Request checklist
[ci full]to the PR title.