feat(capman): Add DynamicConcurrentQueries allocation policy #8022
Open
phacops wants to merge 1 commit into
Add an allocation policy that lets every org run queries while the cluster
has spare capacity, and sheds load from the heaviest orgs once it approaches
its sustainable limit.
The policy tracks concurrent queries in two Redis buckets: a global bucket
(per storage, i.e. per ClickHouse cluster) and a per-org bucket. While the
global concurrent count is at or below the global soft limit, all queries for
all orgs are allowed. Above it, the effective per-org ceiling shrinks
proportionally to global pressure:
effective_limit = max(1, floor(per_org_soft_limit * global_soft_limit / global_concurrent))
so the orgs furthest above their per-org soft limit are rejected first, with
progressively more orgs shed as the cluster gets busier. Small orgs keep
running (an org may always run at least one query), focusing the impact on the
heaviest senders. Per-org soft limits can be overridden per organization.
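To make the shedding rule concrete, here is a minimal Python sketch of the admission check. It is not the policy's actual code. The class name DynamicConcurrentSketch, the bucket key layout, and the convention that counts exclude the incoming query are all assumptions. A collections.Counter stands in for the two Redis buckets, which in the real policy are shared across query nodes.

```python
from collections import Counter
from contextlib import contextmanager
from typing import Dict, Iterator, Mapping, Optional


class DynamicConcurrentSketch:
    """Hypothetical in-memory model of the policy. A Counter stands in for the
    Redis buckets that a real deployment shares between query nodes."""

    def __init__(
        self,
        global_soft_limit: int,
        per_org_soft_limit: int,
        organization_soft_limit_override: Optional[Mapping[int, int]] = None,
    ) -> None:
        self.global_soft_limit = global_soft_limit
        self.per_org_soft_limit = per_org_soft_limit
        self.overrides: Dict[int, int] = dict(organization_soft_limit_override or {})
        # Assumed key layout: ("global", storage) and ("org", storage, org_id).
        self.counts: Counter = Counter()

    def effective_limit(self, org_id: int, global_concurrent: int) -> Optional[int]:
        """Per-org ceiling, or None while the cluster is at or below its soft limit."""
        if global_concurrent <= self.global_soft_limit:
            return None
        org_soft = self.overrides.get(org_id, self.per_org_soft_limit)
        # For positive ints, a * b // c equals floor(a * b / c) without float
        # rounding; max(1, ...) guarantees every org can run one query.
        return max(1, org_soft * self.global_soft_limit // global_concurrent)

    def can_run(self, storage: str, org_id: int) -> bool:
        # Assumption: both counts cover queries already running, not this one.
        global_now = self.counts[("global", storage)]
        org_now = self.counts[("org", storage, org_id)]
        limit = self.effective_limit(org_id, global_now)
        return limit is None or org_now < limit

    @contextmanager
    def running(self, storage: str, org_id: int) -> Iterator[None]:
        """Hold a slot in both buckets for the query's lifetime, then release it."""
        keys = [("global", storage), ("org", storage, org_id)]
        for key in keys:
            self.counts[key] += 1
        try:
            yield
        finally:
            for key in keys:
                self.counts[key] -= 1
```

With global_soft_limit=100 and per_org_soft_limit=10, an org's ceiling falls to 5 at 200 concurrent queries and to 2 at 500, and never drops below 1, so orgs already running many queries are rejected well before small ones.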
Wire the policy into the shared EAP routing strategy allocation-policy list so
it runs in the CBRS path alongside the existing policies, dormant by default
(is_active=0, is_enforced=0) to match the staged-rollout pattern. It composes
with regular allocation policies via the existing combine logic (can_run =
all, max_threads = min) and is auto-discovered by snuba-admin capacity
management for both routing strategies and regular storages.
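The composition rule and the dormant defaults can be sketched as follows. Allowance, gate, and combine are hypothetical names, not snuba's types. The flag semantics (an inactive policy is skipped, a non-enforced policy evaluates but never blocks) are an assumption about the staged-rollout pattern rather than something this change states.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Allowance:
    """Hypothetical stand-in for one policy's decision (not snuba's real type)."""
    can_run: bool
    max_threads: int


def gate(
    is_active: bool,
    is_enforced: bool,
    max_threads: int,
    decide: Callable[[], Allowance],
) -> Allowance:
    """Assumed rollout semantics for the is_active / is_enforced flags."""
    if not is_active:
        # Dormant: the policy's decision is never computed.
        return Allowance(can_run=True, max_threads=max_threads)
    decision = decide()
    if not is_enforced:
        # Shadow mode: the decision is computed (a real system would log it)
        # but discarded, so the query is never blocked.
        return Allowance(can_run=True, max_threads=max_threads)
    return decision


def combine(allowances: Iterable[Allowance]) -> Allowance:
    """can_run = all, max_threads = min, matching the described combine logic."""
    items = list(allowances)
    if not items:
        raise ValueError("need at least one allowance")
    return Allowance(
        can_run=all(a.can_run for a in items),
        max_threads=min(a.max_threads for a in items),
    )
```

Because combine takes all() of can_run, a dormant policy that always allows can never flip the combined decision to a rejection, which is what makes it safe to ship in the shared list before turning it on.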
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/j-IJ2ngiYfk7NnJwDS2CYpZ2HFZu6AmUg4c58q0Xbc8
MeredithAnya
approved these changes
Jun 12, 2026
Add an allocation policy that lets every org run queries while the cluster has spare capacity, and progressively sheds load from the heaviest orgs as it approaches its sustainable concurrent-query limit. It is governed by two soft limits — a global one and a per-org one — rather than a single hard per-tenant cap.
Shedding behavior
Concurrent queries are tracked in two Redis buckets: a global bucket (scoped per storage, i.e. per ClickHouse cluster) and a per-org bucket. While the global concurrent count is at or below global_soft_limit, every query for every org runs. Once it crosses the soft limit, the effective per-org ceiling shrinks proportionally to global pressure:
effective_limit = max(1, floor(per_org_soft_limit * global_soft_limit / global_concurrent))
so the orgs furthest above their per-org soft limit are rejected first, with more orgs shed as the cluster gets busier. An org may always run at least one query, keeping the impact focused on the heaviest senders rather than starving small orgs. Per-org soft limits are overridable per organization via organization_soft_limit_override.
CBRS and snuba-admin integration
The policy is wired into the shared EAP routing strategy allocation-policy list so it runs in the CBRS path alongside the existing policies, dormant by default (is_active=0, is_enforced=0) to match the staged-rollout pattern of its neighbors. It composes with regular allocation policies through the existing combine logic (can_run = all, max_threads = min) and is auto-discovered by snuba-admin capacity management for both routing strategies and regular storages, with no hardcoded registration. It can also be attached to any storage YAML under allocation_policies.
Agent transcript: https://claudescope.sentry.dev/share/kU8cZ7sz8002CldlP3Ol3TOwLD2U0-I64cIH9-aj68U