feat(capman): Add DynamicConcurrentQueries allocation policy #8022

Open
phacops wants to merge 1 commit into master from feat/dynamic-concurrent-queries-allocation-policy

Conversation

phacops commented Jun 12, 2026

Contributor

Add an allocation policy that lets every org run queries while the cluster has spare capacity, and progressively sheds load from the heaviest orgs as it approaches its sustainable concurrent-query limit. It is governed by two soft limits — a global one and a per-org one — rather than a single hard per-tenant cap.

Shedding behavior

Concurrent queries are tracked in two Redis buckets: a global bucket (scoped per storage, i.e. per ClickHouse cluster) and a per-org bucket. While the global concurrent count is at or below global_soft_limit, every query for every org runs. Once it crosses the soft limit, the effective per-org ceiling shrinks proportionally to global pressure:

effective_limit = max(1, floor(per_org_soft_limit * global_soft_limit / global_concurrent))

So the orgs furthest above their per-org soft limit are rejected first, with more orgs shed as the cluster gets busier. An org may always run at least one query, keeping the impact focused on the heaviest senders rather than starving small orgs. Per-org soft limits are overridable per organization via organization_soft_limit_override.
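
A minimal Python sketch of the shedding rule above, for illustration only; it is not the code in this PR. The function names are hypothetical, and the sketch assumes that both concurrent counts already include the query being admitted and that an org is rejected when its count exceeds the effective limit.

```python
from typing import Optional


def effective_org_limit(
    per_org_soft_limit: int,
    global_soft_limit: int,
    global_concurrent: int,
) -> Optional[int]:
    """Return the per-org ceiling, or None when no shedding applies."""
    # At or below the global soft limit, every query for every org runs.
    if global_concurrent <= global_soft_limit:
        return None
    # Integer division equals floor() for positive integers and avoids
    # floating-point rounding. The max(1, ...) keeps one query per org.
    scaled = (per_org_soft_limit * global_soft_limit) // global_concurrent
    return max(1, scaled)


def can_run(
    org_concurrent: int,
    global_concurrent: int,
    per_org_soft_limit: int,
    global_soft_limit: int,
) -> bool:
    """Assumed admission check: reject only orgs above the shrunken ceiling."""
    limit = effective_org_limit(
        per_org_soft_limit, global_soft_limit, global_concurrent
    )
    return limit is None or org_concurrent <= limit
```

With `per_org_soft_limit = 10` and `global_soft_limit = 100`, the ceiling under these assumptions is 5 at 200 concurrent queries, 2 at 400, and 1 at 2000, so an org with a single in-flight query is never rejected.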

CBRS and snuba-admin integration

The policy is wired into the shared EAP routing strategy allocation-policy list so it runs in the CBRS path alongside the existing policies, dormant by default (is_active=0, is_enforced=0) to match the staged-rollout pattern of its neighbors. It composes with regular allocation policies through the existing combine logic (can_run = all, max_threads = min) and is auto-discovered by snuba-admin capacity management for both routing strategies and regular storages — no hardcoded registration. It can also be attached to any storage YAML under allocation_policies.
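
The combine rule mentioned above (can_run = all, max_threads = min) can be sketched as follows. This is an assumption-laden illustration: `Decision` and `combine` are hypothetical names, not Snuba's actual classes or functions, and the real allowance objects likely carry more fields.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Decision:
    """Hypothetical per-policy verdict; not Snuba's real allowance type."""

    can_run: bool
    max_threads: int


def combine(decisions: Iterable[Decision]) -> Decision:
    """Merge per-policy verdicts: all must allow, and the tightest thread cap wins."""
    items = list(decisions)
    if not items:
        raise ValueError("at least one policy decision is required")
    return Decision(
        can_run=all(d.can_run for d in items),
        max_threads=min(d.max_threads for d in items),
    )
```

Under this rule a single rejecting policy is enough to reject the query, which is why the new policy can sit next to the existing ones without special handling.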

Agent transcript: https://claudescope.sentry.dev/share/kU8cZ7sz8002CldlP3Ol3TOwLD2U0-I64cIH9-aj68U

Add an allocation policy that lets every org run queries while the cluster
has spare capacity, and sheds load from the heaviest orgs once it approaches
its sustainable limit.

The policy tracks concurrent queries in two Redis buckets: a global bucket
(per storage, i.e. per ClickHouse cluster) and a per-org bucket. While the
global concurrent count is at or below the global soft limit, all queries for
all orgs are allowed. Above it, the effective per-org ceiling shrinks
proportionally to global pressure

    effective_limit = max(1, floor(per_org_soft_limit * global_soft_limit / global_concurrent))

so the orgs furthest above their per-org soft limit are rejected first, with
progressively more orgs shed as the cluster gets busier. Small orgs keep
running (an org may always run at least one query), focusing the impact on the
heaviest senders. Per-org soft limits can be overridden per organization.

Wire the policy into the shared EAP routing strategy allocation-policy list so
it runs in the CBRS path alongside the existing policies, dormant by default
(is_active=0, is_enforced=0) to match the staged-rollout pattern. It composes
with regular allocation policies via the existing combine logic (can_run =
all, max_threads = min) and is auto-discovered by snuba-admin capacity
management for both routing strategies and regular storages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/j-IJ2ngiYfk7NnJwDS2CYpZ2HFZu6AmUg4c58q0Xbc8
phacops marked this pull request as ready for review June 12, 2026 15:01
phacops requested review from a team as code owners June 12, 2026 15:01

MeredithAnya left a comment

Member

Sure, why not. Let's give it a go.

2 participants