perf(monitor): avoid querying the same thing twice, avoid seq scans #82
Conversation
  end

  def run!
    @memo = {}
Maybe a bit of a stretch, but this makes me think that a separate class that can calculate the metrics might be helpful, with this class more in charge of the looping/sleeping mechanics. Then we don't need this overwritable internal state; we just new up a new metric-calc class during each call to `run!`. I see that Runnable handles some of this, but I guess organizing it that way means we have this notion of resettable state, which is a bit smelly.
Yeah, agreed -- I was also starting to think about how we could reimagine Runnable to take on less of the whole lifecycle of Delayed::Monitor and Delayed::Worker kinds of classes.
If I keep pulling on that thread, I imagine that this smelliness will go away, but I didn't want to refactor too much within this one PR.
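A rough sketch of that separation might look like the following. All names here (`MetricsCalculator`, `Monitor`, `expensive_count`) are hypothetical illustrations, not the gem's actual API: the idea is just that the monitor news up a short-lived calculator on each pass, so per-run state never needs resetting.

```ruby
# Hypothetical sketch of the suggested split (names are illustrative, not
# the gem's real API): a short-lived calculator owns the per-run state, and
# the monitor keeps only the looping mechanics.
class MetricsCalculator
  def initialize
    @memo = {} # fresh on every run -- nothing to reset
  end

  def failed_count
    @memo[:failed_count] ||= expensive_count(:failed)
  end

  def live_count
    @memo[:live_count] ||= expensive_count(:live)
  end

  private

  # Stand-in for a real database count query.
  def expensive_count(kind)
    { failed: 2, live: 5 }.fetch(kind)
  end
end

class Monitor
  # Metric state lives and dies with the calculator instance.
  def run!
    calc = MetricsCalculator.new
    { failed_count: calc.failed_count, live_count: calc.live_count }
  end
end
```

Because a new calculator is created per `run!`, there is no overwritable internal state for the monitor itself to manage.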
effron
left a comment
Just some non-blocking thoughts; otherwise the changes look good!
LGTM
Worth noting that when a scan covers the entire table, **a sequential scan might actually be better than an index scan.** However, since the monitor already produces a count of failed jobs, we can memoize that result to avoid doing that work twice. Furthermore, in postgres at least, these don't just become index scans -- they become _INDEX ONLY_ scans, which should be cheaper than table scans since the index should be significantly more compact!
These concepts are now synonymous. I went as far as removing the `working` scope because it's not even truthful. (We don't know whether it's actively working, we only know that it was claimed by a worker and that the claim hasn't yet expired.)
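In plain Ruby terms, the claim semantics described here might be sketched as follows (the field names and the max-run-time value are illustrative assumptions, not the gem's actual schema): a job counts as locked, and indistinguishably "working", only while some worker holds an unexpired claim on it.

```ruby
# Plain-Ruby sketch of the claim semantics above (field names and the
# max-run-time value are illustrative assumptions). A job is "locked" --
# and, indistinguishably, "working" -- while a worker's claim is unexpired.
MAX_RUN_TIME = 4 * 60 * 60 # seconds; an assumed default

Job = Struct.new(:locked_by, :locked_at)

def claimed?(job, now)
  # Claimed by some worker, and the claim is still fresh.
  !job.locked_by.nil? && (now - job.locked_at) < MAX_RUN_TIME
end

now = Time.now
jobs = [
  Job.new('worker-1', now - 60),          # claimed a minute ago: counts
  Job.new(nil, nil),                      # never claimed: does not count
  Job.new('worker-2', now - 5 * 60 * 60)  # claim expired: does not count
]

locked_count = jobs.count { |j| claimed?(j, now) }
```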
Force-pushed from c0ce18d to f9b904a
->  Sort  (cost=...)
      Output: (CASE WHEN ((priority >= 0) AND (priority < 10)) THEN 0 WHEN ((priority >= 10) AND (priority < 20)) THEN 10 WHEN ((priority >= 20) AND (priority < 30)) THEN 20 WHEN (priority >= 30) THEN 30 ELSE NULL::integer END), queue
      Sort Key: (CASE WHEN ((delayed_jobs.priority >= 0) AND (delayed_jobs.priority < 10)) THEN 0 WHEN ((delayed_jobs.priority >= 10) AND (delayed_jobs.priority < 20)) THEN 10 WHEN ((delayed_jobs.priority >= 20) AND (delayed_jobs.priority < 30)) THEN 20 WHEN (delayed_jobs.priority >= 30) THEN 30 ELSE NULL::integer END), delayed_jobs.queue
      ->  Seq Scan on public.delayed_jobs  (cost=...)
The [legacy index] case gets two Seq Scans instead of one. In practice, though, the `@memo` means the monitor is still doing less work overall: one of these Seq Scans is identical to a scan it already had to perform to count failed rows, so its result is simply reused.
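The CASE expression in that plan is just bucketing priorities into bands. As a sketch of the same expression in Ruby terms (this is an illustration of the plan above, not code from the gem):

```ruby
# Ruby analogue of the CASE expression in the plan above: map a priority to
# its band (0, 10, 20, or 30), with nil for anything below 0.
def priority_bucket(priority)
  case priority
  when 0...10  then 0
  when 10...20 then 10
  when 20...30 then 20
  else priority >= 30 ? 30 : nil
  end
end
```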
effron
left a comment
LGTM
This does two things:

- Stops computing `locked_count` and `working_count` separately (since they are synonymous)
- Splits the `count` metric into two: `failed_count` and `live_count`
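The memoization point can be illustrated with a tiny sketch (all names here are hypothetical, and `query_count` stands in for a real database count): each metric is queried at most once per run, so splitting the old `count` metric into `failed_count` and `live_count` costs no extra scan when one of them was already computed.

```ruby
# Tiny sketch of the memoization described above (names are hypothetical):
# each count is queried at most once per run, and repeat reads hit the memo.
class MonitorRun
  attr_reader :queries_issued

  def initialize(failed: 2, live: 5)
    @memo = {}
    @queries_issued = 0
    @data = { failed: failed, live: live }
  end

  def failed_count
    @memo[:failed_count] ||= query_count(:failed)
  end

  def live_count
    @memo[:live_count] ||= query_count(:live)
  end

  private

  # Stand-in for a real database count; tracks how often we hit the "db".
  def query_count(kind)
    @queries_issued += 1
    @data.fetch(kind)
  end
end
```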
/no-platform