Skip to content

admission: dynamically determine burstQualification#162604

Merged
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
joshimhoff:burst
Feb 24, 2026
Merged

admission: dynamically determine burstQualification#162604
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
joshimhoff:burst

Conversation

@joshimhoff
Copy link
Copy Markdown
Collaborator

admission: dynamically determine burstQualification

This commit introduces token buckets to CPU time token AC for dynamically determining the burstQualification of a tenant. If a tenant qualifies for burst, two things happen. First, in the WorkQueue, its work sorts above the work of tenants that don't qualify for burst priority. Second, it has access to more CPU time tokens than tenants that don't qualify (see cpu_time_token_granter.go for more). These two things are related: Since canBurst tenants have access to more CPU time than noBurst tenants, it is important that work from a canBurst tenant always sorts before work from a noBurst tenant -- else available capacity is left on the table.

Fixes: #155992

Release note: None.

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Feb 6, 2026

😎 Merged successfully - details.

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Feb 6, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@joshimhoff joshimhoff marked this pull request as ready for review February 6, 2026 17:35
@joshimhoff joshimhoff requested a review from a team as a code owner February 6, 2026 17:35
@joshimhoff joshimhoff requested review from sumeerbhola and tbg February 6, 2026 17:35
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola reviewed 7 files and all commit messages, and made 6 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff and @tbg).


pkg/util/admission/work_queue.go line 298 at r1 (raw file):

		// Only used if mode == usesCPUTimeTokens.
		defaultCPUTimeTokenEstimator cpuTimeTokenEstimator
		// burstBucketCapacity is the capacity for newly created tenant burst

nit: ... capacity and tokens in newly ...

since we also start with the token bucket being full


or maybe not


pkg/util/admission/work_queue.go line 731 at r1 (raw file):

		// and get ahead of this request. We don't need to be fair for such
		// concurrent requests.
		if q.granter.tryGet(tenant.cpuTimeBurstBucket.burstQualification(), info.RequestedCount) {

can't reach into the tenant struct after releasing the mutex. Slight staleness is fine, in that we can grab the burstQualification value before releasing the mutex and then use it here.

Can you audit the other changes in this PR to confirm we aren't accessing tenant.cpuTimeBurstBucket without the mutex.


pkg/util/admission/work_queue.go line 1663 at r1 (raw file):

	// have the same effect as sorting on burstQualification, but the
	// source of truth of whether a tenant can burst is cpuTimeBurstBucket's
	// burstQualification method, so we must call it here.

nit: if you want to have the "Sorting on used ..." sentence can you move it to the next paragraph.

And consider starting it as: "A reader may wonder if sorting on used would have the same effect. It is indeed similar, but it is not the same. <the explanation on why, from what is stated here, plus our slack thread>"


pkg/util/admission/cpu_time_token_grant_coordinator.go line 131 at r1 (raw file):

		opts.mode = usesCPUTimeTokens
		requesters[tier] = makeWorkQueue(
			ambientCtx, KVWork, &childGranters[tier], settings, wqMetrics, opts)

Since the implementation of makeWorkQueue always constructs a WorkQueue it would have been better to make it return *WorkQueue instead of a requester, and avoid the type assertion below. I suppose that would require another wrapping layer since makeWorkQueue also matches makeRequesterFunc. Can you add a comment below stating that the type assertion is always valid since makeWorkQueue always returns a *WorkQueue.


pkg/util/admission/cpu_time_token_burst.go line 52 at r1 (raw file):

	// to refill are made.
	// TODO(josh): Need to think over this more, before merge. Sumeer, if
	// you have thoughts, lmk.

This seems odd. I would have expected it to be full.
What did the prototype do?


pkg/util/admission/cpu_time_token_burst.go line 59 at r1 (raw file):

// priority. See the comments above cpuTimeBurstBucket for more.
func (m *cpuTimeBurstBucket) burstQualification() burstQualification {
	if m.hardcodedRetValue != nil {

This burstQualification is on the hot path every time we deduct or add tokens. Having a testing related branch here doesn't seem right. Can we accomplish unit testing without this?

Copy link
Copy Markdown
Collaborator Author

@joshimhoff joshimhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL!

@joshimhoff made 7 comments and resolved 2 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).


pkg/util/admission/cpu_time_token_burst.go line 52 at r1 (raw file):

Previously, sumeerbhola wrote…

This seems odd. I would have expected it to be full.
What did the prototype do?

Prototype inits it full: https://github.com/cockroachdb/cockroach/compare/master...joshimhoff:cockroach:sumeer?expand=1#diff-7a12f96aeb4b5198ddb554aa3c54c611a44c70bcba2bc0b59b740e36595078b7R1679. I've adjusted code to init it full here too.


pkg/util/admission/cpu_time_token_burst.go line 59 at r1 (raw file):

Previously, sumeerbhola wrote…

This burstQualification is on the hot path every time we deduct or add tokens. Having a testing related branch here doesn't seem right. Can we accomplish unit testing without this?

Good point, done. Now the test manipulates the token bucket state via calls to refillBurstBuckets and Admit.


pkg/util/admission/cpu_time_token_grant_coordinator.go line 131 at r1 (raw file):

Previously, sumeerbhola wrote…

Since the implementation of makeWorkQueue always constructs a WorkQueue it would have been better to make it return *WorkQueue instead of a requester, and avoid the type assertion below. I suppose that would require another wrapping layer since makeWorkQueue also matches makeRequesterFunc. Can you add a comment below stating that the type assertion is always valid since makeWorkQueue always returns a *WorkQueue.

Done.


pkg/util/admission/work_queue.go line 298 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: ... capacity and tokens in newly ...

since we also start with the token bucket being full


or maybe not

Done.


pkg/util/admission/work_queue.go line 731 at r1 (raw file):

Previously, sumeerbhola wrote…

can't reach into the tenant struct after releasing the mutex. Slight staleness is fine, in that we can grab the burstQualification value before releasing the mutex and then use it here.

Can you audit the other changes in this PR to confirm we aren't accessing tenant.cpuTimeBurstBucket without the mutex.

Apologies for missing it. I've fixed the bug. Thanks for catching it.

I've scrutinized the PR for other similar bugs & haven't found any. One thing that I had to think a bit about was the change to Less. used was previously accessed there, so makes sense that can call the burstQualification method too, but it isn't really documented. So far, it looks to me like all interaction with the tenantHeap do take out lock.

I'm gonna make a second pass tomorrow. Want to push out fixes today, ahead of making a second pass, to keep PR review moving.


pkg/util/admission/work_queue.go line 1663 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: if you want to have the "Sorting on used ..." sentence can you move it to the next paragraph.

And consider starting it as: "A reader may wonder if sorting on used would have the same effect. It is indeed similar, but it is not the same. <the explanation on why, from what is stated here, plus our slack thread>"

Done.

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just catching up. Mostly nits.

@tbg reviewed 7 files and all commit messages, and made 5 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and sumeerbhola).


pkg/util/admission/cpu_time_token_burst.go line 63 at r2 (raw file):

	// call to refillBurstBuckets is made by cpuTimeTokenAllocator. For this
	// 1ms period, we don't allow any tenants to burst.
	if m.capacity == 0 {

nit: won't m.tokens also be zero, and wouldn't we return noBurst from this method regardless even without this explicit zero check? It also doesn't seem to particularly matter what happens during the first millisecond.


pkg/util/admission/cpu_time_token_burst.go line 95 at r2 (raw file):

	m.adjust(toAdd)
	min := -m.capacity / 4
	if m.tokens < min {

nit: m.tokens = max(m.tokens, -m.capacity / 4)


pkg/util/admission/cpu_time_token_burst.go line 100 at r2 (raw file):

}

func (m *cpuTimeBurstBucket) String() string {

Implement SafeFormatter and string through it please as seen here.


pkg/util/admission/work_queue.go line 1656 at r2 (raw file):

}

func (th *tenantHeap) Less(i, j int) bool {

nit: there's probably a way to rewrite this neatly using cmp.Compare and cmp.Or as in

return cmp.Or(cmp.Compare(a.sls, b.sls),
cmp.Compare(a.leasePreferenceIndex, b.leasePreferenceIndex),
cmp.Compare(a.StoreID, b.StoreID))
}
.

Copy link
Copy Markdown
Collaborator Author

@joshimhoff joshimhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshimhoff made 4 comments and resolved 2 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).


pkg/util/admission/cpu_time_token_burst.go line 63 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: won't m.tokens also be zero, and wouldn't we return noBurst from this method regardless even without this explicit zero check? It also doesn't seem to particularly matter what happens during the first millisecond.

Good point. Removed check & adjusted comment.


pkg/util/admission/cpu_time_token_burst.go line 95 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: m.tokens = max(m.tokens, -m.capacity / 4)

Done.


pkg/util/admission/cpu_time_token_burst.go line 100 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Implement SafeFormatter and string through it please as seen here.

Done. Thanks for the pointer.


pkg/util/admission/work_queue.go line 1656 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: there's probably a way to rewrite this neatly using cmp.Compare and cmp.Or as in

return cmp.Or(cmp.Compare(a.sls, b.sls),
cmp.Compare(a.leasePreferenceIndex, b.leasePreferenceIndex),
cmp.Compare(a.StoreID, b.StoreID))
}
.

Nice idea. Done.

Copy link
Copy Markdown
Collaborator Author

@joshimhoff joshimhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshimhoff made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).


pkg/util/admission/work_queue.go line 731 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Apologies for missing it. I've fixed the bug. Thanks for catching it.

I've scrutinized the PR for other similar bugs & haven't found any. One thing that I had to think a bit about was the change to Less. used was previously accessed there, so makes sense that can call the burstQualification method too, but it isn't really documented. So far, it looks to me like all interaction with the tenantHeap do take out lock.

I'm gonna make a second pass tomorrow. Want to push out fixes today, ahead of making a second pass, to keep PR review moving.

I see now that this is documented. tenantHeap is accessed via q.mu.tenantHeap.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing a couple of small comments. I need to look at the tests. Rest looks good.

@sumeerbhola reviewed 4 files and all commit messages, and made 3 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and tbg).


pkg/util/admission/work_queue.go line 1663 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Done.

Unless all these cmp calls inline, i am not sure this change is a good idea. Also using cmp.Or means all the cmp.Compares will be evaluated and there is no possibility of early return.
The reason I am paranoid is that the heap is adjusted on every token acquisition and return, and there is one WorkQueue for cpu for the whole node with a mutex. Every small increase in work done under that mutex can increase contention.

We sometimes hand inline code for very performance sensitive paths (e.g. in Pebble), and this seems to warrant the same care.


pkg/util/admission/cpu_time_token_grant_coordinator.go line 136 at r4 (raw file):

		// returns a *WorkQueue.
		// TODO(josh): Consider having makeWorkQueue return *WorkQueue, instead
		// of requester.

you can remove this TODO. It isn't important.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola reviewed 2 files and made 6 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on joshimhoff and tbg).


pkg/util/admission/work_queue.go line 729 at r4 (raw file):

		// Optimistically update used to avoid locking again.
		q.adjustTenantUsedLocked(tenant, info.RequestedCount)
		burstQual := tenant.cpuTimeBurstBucket.burstQualification()

we should evaluate the burstQual value before calling adjustTenantUsedLocked, since the qualification is based on what it has used prior to this request.


pkg/util/admission/testdata/cpu_time_token_work_queue line 94 at r4 (raw file):

admit id=1000 tenant=54 requested-count=100 bypass=false
----
tryGet: input noBurst, returning true

See the code comment elsewhere. This should qualify for the burst at this point.


pkg/util/admission/testdata/cpu_time_token_work_queue line 221 at r4 (raw file):

# Note that we already have TestCPUTimeTokenBurst for testing that
# the burst buckets correctly categorize tenants into canBurst and
# noBurst. This test is an integration test instead. It ensures that

despite this comment, how about having one transition to noBurst in this test.


pkg/util/admission/work_queue_test.go line 457 at r4 (raw file):

				tenant := scanTenantID(t, d)
				var bypass bool
				d.ScanArgs(t, "bypass", &bypass)

nit: you could use d.HasArg to make bypass be false by default.


pkg/util/admission/work_queue_test.go line 788 at r4 (raw file):

	_, err = q.Admit(ctx, info1)
	require.NoError(t, err)
	checkBurstQual(canBurst)

can we reach in and check the exact token counts, so we don't have to solely believe the counts based on the comments?


pkg/util/admission/work_queue_test.go line 836 at r4 (raw file):

	// First refill will cap tenant 1 bucket at the min, which is
	// -bucketCapacity/4 = -100/4 = -25. Second refill adds 117, so

This -25 case would also benefit from explicitly checking the token counts, since noBurst can happen for many reasons.

Copy link
Copy Markdown
Collaborator Author

@joshimhoff joshimhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL!

@joshimhoff made 9 comments and resolved 1 discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola and tbg).


pkg/util/admission/cpu_time_token_grant_coordinator.go line 136 at r4 (raw file):

Previously, sumeerbhola wrote…

you can remove this TODO. It isn't important.

Done.


pkg/util/admission/work_queue.go line 1663 at r1 (raw file):

Previously, sumeerbhola wrote…

Unless all these cmp calls inline, i am not sure this change is a good idea. Also using cmp.Or means all the cmp.Compares will be evaluated and there is no possibility of early return.
The reason I am paranoid is that the heap is adjusted on every token acquisition and return, and there is one WorkQueue for cpu for the whole node with a mutex. Every small increase in work done under that mutex can increase contention.

We sometimes hand inline code for very performance sensitive paths (e.g. in Pebble), and this seems to warrant the same care.

Given the concern about early return, I have decided to revert to the earlier version.


pkg/util/admission/work_queue.go line 729 at r4 (raw file):

Previously, sumeerbhola wrote…

we should evaluate the burstQual value before calling adjustTenantUsedLocked, since the qualification is based on what it has used prior to this request.

Good point. Done.


pkg/util/admission/testdata/cpu_time_token_work_queue line 94 at r4 (raw file):

Previously, sumeerbhola wrote…

See the code comment elsewhere. This should qualify for the burst at this point.

Fixed.


pkg/util/admission/testdata/cpu_time_token_work_queue line 221 at r4 (raw file):

Previously, sumeerbhola wrote…

despite this comment, how about having one transition to noBurst in this test.

Done.


pkg/util/admission/work_queue_test.go line 457 at r4 (raw file):

Previously, sumeerbhola wrote…

nit: you could use d.HasArg to make bypass be false by default.

Done.


pkg/util/admission/work_queue_test.go line 788 at r4 (raw file):

Previously, sumeerbhola wrote…

can we reach in and check the exact token counts, so we don't have to solely believe the counts based on the comments?

Ya good idea. I've made this test a case in the data-driven test I had, since then the token counts get printed automatically. Not sure why I didn't do that to start.


pkg/util/admission/work_queue_test.go line 836 at r4 (raw file):

Previously, sumeerbhola wrote…

This -25 case would also benefit from explicitly checking the token counts, since noBurst can happen for many reasons.

Done.

@joshimhoff joshimhoff force-pushed the burst branch 2 times, most recently from 662bbf7 to 7b01094 Compare February 19, 2026 21:25
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but I didn't look at the tests and I didn't review this closely enough to spot subtle bugs. @sumeerbhola's approval is the one that counts 😄

@tbg partially reviewed 9 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on sumeerbhola).


pkg/util/admission/work_queue.go line 1663 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Given the concern about early return, I have decided to revert to the earlier version.

Oh, those are good points. Sorry about the detour!

@joshimhoff
Copy link
Copy Markdown
Collaborator Author

joshimhoff commented Feb 20, 2026

Thanks for the review, Tobias! Sumeer, I'm still fixing a few race bugs in the test itself that show up in CI, but I don't think needs to block review. I'm aware of them tho & will fix before merge.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@sumeerbhola reviewed 7 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on joshimhoff).

@joshimhoff joshimhoff force-pushed the burst branch 3 times, most recently from 1cbaf38 to a273998 Compare February 24, 2026 01:03
This commit introduces token buckets to CPU time token AC
for dynamically determining the burstQualification of a
tenant. If a tenant qualifies for burst, two things happen.
First, in the WorkQueue, its work sorts above the work of
tenants that don't qualify for burst priority. Second, it
has access to more CPU time tokens than tenants that don't
qualify (see cpu_time_token_granter.go for more). These
two things are related: Since canBurst tenants have access
to more CPU time than noBurst tenants, it is important that
work from a canBurst tenant always sorts before work from a
noBurst tenant -- else available capacity is left on the
table.

Fixes: cockroachdb#155992

Release note: None.
@joshimhoff
Copy link
Copy Markdown
Collaborator Author

bors r=sumeerbhola

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Feb 24, 2026

Bors is deprecated. Please use /trunk merge instead.

@joshimhoff
Copy link
Copy Markdown
Collaborator Author

/trunk merge

@trunk-io trunk-io Bot merged commit dd8a593 into cockroachdb:master Feb 24, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

admission: dynamically determine burstQualification based on tenant-scoped CTT buckets

4 participants