admission: dynamically determine burstQualification#162604
admission: dynamically determine burstQualification#162604trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
😎 Merged successfully - details. |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
sumeerbhola
left a comment
There was a problem hiding this comment.
@sumeerbhola reviewed 7 files and all commit messages, and made 6 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff and @tbg).
pkg/util/admission/work_queue.go line 298 at r1 (raw file):
// Only used if mode == usesCPUTimeTokens. defaultCPUTimeTokenEstimator cpuTimeTokenEstimator // burstBucketCapacity is the capacity for newly created tenant burst
nit: ... capacity and tokens in newly ...
since we also start with the token bucket being full
or maybe not
pkg/util/admission/work_queue.go line 731 at r1 (raw file):
// and get ahead of this request. We don't need to be fair for such // concurrent requests. if q.granter.tryGet(tenant.cpuTimeBurstBucket.burstQualification(), info.RequestedCount) {
can't reach into the tenant struct after releasing the mutex. Slight staleness is fine, in that we can grab the burstQualification value before releasing the mutex and then use it here.
Can you audit the other changes in this PR to confirm we aren't accessing tenant.cpuTimeBurstBucket without the mutex.
pkg/util/admission/work_queue.go line 1663 at r1 (raw file):
// have the same effect as sorting on burstQualification, but the // source of truth of whether a tenant can burst is cpuTimeBurstBucket's // burstQualification method, so we must call it here.
nit: if you want to have the "Sorting on used ..." sentence can you move it to the next paragraph.
And consider starting it as: "A reader may wonder if sorting on used would have the same effect. It is indeed similar, but it is not the same. <the explanation on why, from what is stated here, plus our slack thread>"
pkg/util/admission/cpu_time_token_grant_coordinator.go line 131 at r1 (raw file):
opts.mode = usesCPUTimeTokens requesters[tier] = makeWorkQueue( ambientCtx, KVWork, &childGranters[tier], settings, wqMetrics, opts)
Since the implementation of makeWorkQueue always constructs a WorkQueue it would have been better to make it return *WorkQueue instead of a requester, and avoid the type assertion below. I suppose that would require another wrapping layer since makeWorkQueue also matches makeRequesterFunc. Can you add a comment below stating that the type assertion is always valid since makeWorkQueue always returns a *WorkQueue.
pkg/util/admission/cpu_time_token_burst.go line 52 at r1 (raw file):
// to refill are made. // TODO(josh): Need to think over this more, before merge. Sumeer, if // you have thoughts, lmk.
This seems odd. I would have expected it to be full.
What did the prototype do?
pkg/util/admission/cpu_time_token_burst.go line 59 at r1 (raw file):
// priority. See the comments above cpuTimeBurstBucket for more. func (m *cpuTimeBurstBucket) burstQualification() burstQualification { if m.hardcodedRetValue != nil {
This burstQualification is on the hot path every time we deduct or add tokens. Having a testing related branch here doesn't seem right. Can we accomplish unit testing without this?
joshimhoff
left a comment
There was a problem hiding this comment.
PTAL!
@joshimhoff made 7 comments and resolved 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).
pkg/util/admission/cpu_time_token_burst.go line 52 at r1 (raw file):
Previously, sumeerbhola wrote…
This seems odd. I would have expected it to be full.
What did the prototype do?
Prototype inits it full: https://github.com/cockroachdb/cockroach/compare/master...joshimhoff:cockroach:sumeer?expand=1#diff-7a12f96aeb4b5198ddb554aa3c54c611a44c70bcba2bc0b59b740e36595078b7R1679. I've adjusted code to init it full here too.
pkg/util/admission/cpu_time_token_burst.go line 59 at r1 (raw file):
Previously, sumeerbhola wrote…
This
burstQualificationis on the hot path every time we deduct or add tokens. Having a testing related branch here doesn't seem right. Can we accomplish unit testing without this?
Good point, done. Now the test manipulates the token bucket state via calls to refillBurstBuckets and Admit.
pkg/util/admission/cpu_time_token_grant_coordinator.go line 131 at r1 (raw file):
Previously, sumeerbhola wrote…
Since the implementation of
makeWorkQueuealways constructs aWorkQueueit would have been better to make it return*WorkQueueinstead of arequester, and avoid the type assertion below. I suppose that would require another wrapping layer sincemakeWorkQueuealso matchesmakeRequesterFunc. Can you add a comment below stating that the type assertion is always valid sincemakeWorkQueuealways returns a*WorkQueue.
Done.
pkg/util/admission/work_queue.go line 298 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: ... capacity and tokens in newly ...
since we also start with the token bucket being full
or maybe not
Done.
pkg/util/admission/work_queue.go line 731 at r1 (raw file):
Previously, sumeerbhola wrote…
can't reach into the
tenantstruct after releasing the mutex. Slight staleness is fine, in that we can grab the burstQualification value before releasing the mutex and then use it here.Can you audit the other changes in this PR to confirm we aren't accessing
tenant.cpuTimeBurstBucketwithout the mutex.
Apologies for missing it. I've fixed the bug. Thanks for catching it.
I've scrutinized the PR for other similar bugs & haven't found any. One thing that I had to think a bit about was the change to Less. used was previously accessed there, so makes sense that can call the burstQualification method too, but it isn't really documented. So far, it looks to me like all interaction with the tenantHeap do take out lock.
I'm gonna make a second pass tomorrow. Want to push out fixes today, ahead of making a second pass, to keep PR review moving.
pkg/util/admission/work_queue.go line 1663 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: if you want to have the "Sorting on used ..." sentence can you move it to the next paragraph.
And consider starting it as: "A reader may wonder if sorting on used would have the same effect. It is indeed similar, but it is not the same. <the explanation on why, from what is stated here, plus our slack thread>"
Done.
tbg
left a comment
There was a problem hiding this comment.
Just catching up. Mostly nits.
@tbg reviewed 7 files and all commit messages, and made 5 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and sumeerbhola).
pkg/util/admission/cpu_time_token_burst.go line 63 at r2 (raw file):
// call to refillBurstBuckets is made by cpuTimeTokenAllocator. For this // 1ms period, we don't allow any tenants to burst. if m.capacity == 0 {
nit: won't m.tokens also be zero, and wouldn't we return noBurst from this method regardless even without this explicit zero check? It also doesn't seem to particularly matter what happens during the first millisecond.
pkg/util/admission/cpu_time_token_burst.go line 95 at r2 (raw file):
m.adjust(toAdd) min := -m.capacity / 4 if m.tokens < min {
nit: m.tokens = max(m.tokens, -m.capacity / 4)
pkg/util/admission/cpu_time_token_burst.go line 100 at r2 (raw file):
} func (m *cpuTimeBurstBucket) String() string {
Implement SafeFormatter and string through it please as seen here.
pkg/util/admission/work_queue.go line 1656 at r2 (raw file):
} func (th *tenantHeap) Less(i, j int) bool {
nit: there's probably a way to rewrite this neatly using cmp.Compare and cmp.Or as in
cockroach/pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go
Lines 554 to 557 in 16cc43f
joshimhoff
left a comment
There was a problem hiding this comment.
@joshimhoff made 4 comments and resolved 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).
pkg/util/admission/cpu_time_token_burst.go line 63 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit: won't
m.tokensalso be zero, and wouldn't we returnnoBurstfrom this method regardless even without this explicit zero check? It also doesn't seem to particularly matter what happens during the first millisecond.
Good point. Removed check & adjusted comment.
pkg/util/admission/cpu_time_token_burst.go line 95 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit:
m.tokens = max(m.tokens, -m.capacity / 4)
Done.
pkg/util/admission/cpu_time_token_burst.go line 100 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Implement
SafeFormatterandstringthrough it please as seen here.
Done. Thanks for the pointer.
pkg/util/admission/work_queue.go line 1656 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit: there's probably a way to rewrite this neatly using
cmp.Compareandcmp.Oras in.cockroach/pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go
Lines 554 to 557 in 16cc43f
Nice idea. Done.
joshimhoff
left a comment
There was a problem hiding this comment.
@joshimhoff made 1 comment.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).
pkg/util/admission/work_queue.go line 731 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Apologies for missing it. I've fixed the bug. Thanks for catching it.
I've scrutinized the PR for other similar bugs & haven't found any. One thing that I had to think a bit about was the change to
Less.usedwas previously accessed there, so makes sense that can call theburstQualificationmethod too, but it isn't really documented. So far, it looks to me like all interaction with thetenantHeapdo take out lock.I'm gonna make a second pass tomorrow. Want to push out fixes today, ahead of making a second pass, to keep PR review moving.
I see now that this is documented. tenantHeap is accessed via q.mu.tenantHeap.
sumeerbhola
left a comment
There was a problem hiding this comment.
flushing a couple of small comments. I need to look at the tests. Rest looks good.
@sumeerbhola reviewed 4 files and all commit messages, and made 3 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and tbg).
pkg/util/admission/work_queue.go line 1663 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Done.
Unless all these cmp calls inline, i am not sure this change is a good idea. Also using cmp.Or means all the cmp.Compares will be evaluated and there is no possibility of early return.
The reason I am paranoid is that the heap is adjusted on every token acquisition and return, and there is one WorkQueue for cpu for the whole node with a mutex. Every small increase in work done under that mutex can increase contention.
We sometimes hand inline code for very performance sensitive paths (e.g. in Pebble), and this seems to warrant the same care.
pkg/util/admission/cpu_time_token_grant_coordinator.go line 136 at r4 (raw file):
// returns a *WorkQueue. // TODO(josh): Consider having makeWorkQueue return *WorkQueue, instead // of requester.
you can remove this TODO. It isn't important.
sumeerbhola
left a comment
There was a problem hiding this comment.
@sumeerbhola reviewed 2 files and made 6 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and tbg).
pkg/util/admission/work_queue.go line 729 at r4 (raw file):
// Optimistically update used to avoid locking again. q.adjustTenantUsedLocked(tenant, info.RequestedCount) burstQual := tenant.cpuTimeBurstBucket.burstQualification()
we should evaluate the burstQual value before calling adjustTenantUsedLocked, since the qualification is based on what it has used prior to this request.
pkg/util/admission/testdata/cpu_time_token_work_queue line 94 at r4 (raw file):
admit id=1000 tenant=54 requested-count=100 bypass=false ---- tryGet: input noBurst, returning true
See the code comment elsewhere. This should qualify for the burst at this point.
pkg/util/admission/testdata/cpu_time_token_work_queue line 221 at r4 (raw file):
# Note that we already have TestCPUTimeTokenBurst for testing that # the burst buckets correctly categorize tenants into canBurst and # noBurst. This test is an integration test instead. It ensures that
despite this comment, how about having one transition to noBurst in this test.
pkg/util/admission/work_queue_test.go line 457 at r4 (raw file):
tenant := scanTenantID(t, d) var bypass bool d.ScanArgs(t, "bypass", &bypass)
nit: you could use d.HasArg to make bypass be false by default.
pkg/util/admission/work_queue_test.go line 788 at r4 (raw file):
_, err = q.Admit(ctx, info1) require.NoError(t, err) checkBurstQual(canBurst)
can we reach in and check the exact token counts, so we don't have to solely believe the counts based on the comments?
pkg/util/admission/work_queue_test.go line 836 at r4 (raw file):
// First refill will cap tenant 1 bucket at the min, which is // -bucketCapacity/4 = -100/4 = -25. Second refill adds 117, so
This -25 case would also benefit from explicitly checking the token counts, since noBurst can happen for many reasons.
joshimhoff
left a comment
There was a problem hiding this comment.
PTAL!
@joshimhoff made 9 comments and resolved 1 discussion.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).
pkg/util/admission/cpu_time_token_grant_coordinator.go line 136 at r4 (raw file):
Previously, sumeerbhola wrote…
you can remove this TODO. It isn't important.
Done.
pkg/util/admission/work_queue.go line 1663 at r1 (raw file):
Previously, sumeerbhola wrote…
Unless all these cmp calls inline, i am not sure this change is a good idea. Also using cmp.Or means all the cmp.Compares will be evaluated and there is no possibility of early return.
The reason I am paranoid is that the heap is adjusted on every token acquisition and return, and there is one WorkQueue for cpu for the whole node with a mutex. Every small increase in work done under that mutex can increase contention.We sometimes hand inline code for very performance sensitive paths (e.g. in Pebble), and this seems to warrant the same care.
Given the concern about early return, I have decided to revert to the earlier version.
pkg/util/admission/work_queue.go line 729 at r4 (raw file):
Previously, sumeerbhola wrote…
we should evaluate the burstQual value before calling adjustTenantUsedLocked, since the qualification is based on what it has used prior to this request.
Good point. Done.
pkg/util/admission/testdata/cpu_time_token_work_queue line 94 at r4 (raw file):
Previously, sumeerbhola wrote…
See the code comment elsewhere. This should qualify for the burst at this point.
Fixed.
pkg/util/admission/testdata/cpu_time_token_work_queue line 221 at r4 (raw file):
Previously, sumeerbhola wrote…
despite this comment, how about having one transition to noBurst in this test.
Done.
pkg/util/admission/work_queue_test.go line 457 at r4 (raw file):
Previously, sumeerbhola wrote…
nit: you could use
d.HasArgto make bypass be false by default.
Done.
pkg/util/admission/work_queue_test.go line 788 at r4 (raw file):
Previously, sumeerbhola wrote…
can we reach in and check the exact token counts, so we don't have to solely believe the counts based on the comments?
Ya good idea. I've made this test a case in the data-driven test I had, since then the token counts get printed automatically. Not sure why I didn't do that to start.
pkg/util/admission/work_queue_test.go line 836 at r4 (raw file):
Previously, sumeerbhola wrote…
This -25 case would also benefit from explicitly checking the token counts, since
noBurstcan happen for many reasons.
Done.
662bbf7 to
7b01094
Compare
tbg
left a comment
There was a problem hiding this comment.
but I didn't look at the tests and I didn't review this closely enough to spot subtle bugs. @sumeerbhola's approval is the one that counts 😄
@tbg partially reviewed 9 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on sumeerbhola).
pkg/util/admission/work_queue.go line 1663 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Given the concern about early return, I have decided to revert to the earlier version.
Oh, those are good points. Sorry about the detour!
|
Thanks for the review, Tobias! Sumeer, I'm still fixing a few race bugs in the test itself that show up in CI, but I don't think needs to block review. I'm aware of them tho & will fix before merge. |
sumeerbhola
left a comment
There was a problem hiding this comment.
@sumeerbhola reviewed 7 files and all commit messages, and made 1 comment.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on joshimhoff).
1cbaf38 to
a273998
Compare
This commit introduces token buckets to CPU time token AC for dynamically determining the burstQualification of a tenant. If a tenant qualifies for burst, two things happen. First, in the WorkQueue, its work sorts above the work of tenants that don't qualify for burst priority. Second, it has access to more CPU time tokens than tenants that don't qualify (see cpu_time_token_granter.go for more). These two things are related: Since canBurst tenants have access to more CPU time than noBurst tenants, it is important that work from a canBurst tenant always sorts before work from a noBurst tenant -- else available capacity is left on the table. Fixes: cockroachdb#155992 Release note: None.
|
bors r=sumeerbhola |
|
Bors is deprecated. Please use |
|
/trunk merge |
admission: dynamically determine burstQualification
This commit introduces token buckets to CPU time token AC for dynamically determining the burstQualification of a tenant. If a tenant qualifies for burst, two things happen. First, in the WorkQueue, its work sorts above the work of tenants that don't qualify for burst priority. Second, it has access to more CPU time tokens than tenants that don't qualify (see cpu_time_token_granter.go for more). These two things are related: Since canBurst tenants have access to more CPU time than noBurst tenants, it is important that work from a canBurst tenant always sorts before work from a noBurst tenant -- else available capacity is left on the table.
Fixes: #155992
Release note: None.