core: Implement load balancing policy delay plumbing #12807

Open
AgraVator wants to merge 4 commits into grpc:master from AgraVator:lb-policy-delay

@AgraVator AgraVator commented May 13, 2026

Contributor

This PR implements the plumbing required to propagate delay reason tokens from load balancing policies up to the transport layer and tracers, as specified in the LB policy delay design.

What changed

  • api: Added delayReasonToken to PickResult, along with the factory method withNoResult(String).
  • api: Added delayStarted(String) and delayEnded() hooks to ClientStreamTracer to track delay segments.
  • core: Updated DelayedClientTransport to track delay tokens in PendingStream and notify tracers when delays start, change, or end.
  • core/util: Updated PickFirst and RoundRobin policies to emit cached tokens when connecting.
  • xds: Updated RingHash, RLS, and CDS policies to emit specific delay tokens when buffering picks.
  • xds: Updated PriorityLoadBalancer to wrap child pickers and prepend priority_X: to child tokens to track failovers.
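
The tracer-facing behavior listed above (notify when a delay starts, changes, or ends) can be illustrated as a small state machine. This is a self-contained sketch, not the PR's code: `DelayTracer` and `PendingPick` are hypothetical stand-ins for `ClientStreamTracer` and `PendingStream`, and the rule that a token change closes the previous segment before opening the next one is an assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class DelaySegmentSketch {
  /** Hypothetical stand-in for the delayStarted/delayEnded hooks on ClientStreamTracer. */
  interface DelayTracer {
    void delayStarted(String reasonToken);

    void delayEnded();
  }

  /** Hypothetical stand-in for a buffered stream; remembers the token currently in effect. */
  static final class PendingPick {
    private final DelayTracer tracer;
    private String currentToken; // null means no delay segment is open

    PendingPick(DelayTracer tracer) {
      this.tracer = tracer;
    }

    /** Called after each pick attempt that produced no result; token may be null. */
    void onNoResult(String token) {
      if (Objects.equals(currentToken, token)) {
        return; // same reason as before, so there is nothing new to report
      }
      if (currentToken != null) {
        tracer.delayEnded(); // assumption: close the old segment before opening a new one
      }
      if (token != null) {
        tracer.delayStarted(token);
      }
      currentToken = token;
    }

    /** Called once the pick finally completes; closes any open segment. */
    void onPickComplete() {
      onNoResult(null);
    }
  }

  /** Tracer that records events so the transitions can be inspected. */
  static final class RecordingTracer implements DelayTracer {
    final List<String> events = new ArrayList<>();

    @Override
    public void delayStarted(String reasonToken) {
      events.add("start:" + reasonToken);
    }

    @Override
    public void delayEnded() {
      events.add("end");
    }
  }

  public static void main(String[] args) {
    RecordingTracer tracer = new RecordingTracer();
    PendingPick pick = new PendingPick(tracer);
    pick.onNoResult("ring_hash:connecting");
    pick.onNoResult("ring_hash:connecting"); // unchanged token: no event
    pick.onNoResult("priority_p0:ring_hash:connecting"); // changed token: end, then start
    pick.onPickComplete();
    // prints [start:ring_hash:connecting, end, start:priority_p0:ring_hash:connecting, end]
    System.out.println(tracer.events);
  }
}
```

Comparing tokens with `Objects.equals` keeps repeated picks with an unchanged reason from producing duplicate tracer events, which matters if each segment later becomes a span or metric sample.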

Notes

  • This change focuses strictly on the plumbing of the delay reasons. Implementation of actual OpenTelemetry metrics and spans is deferred to a later phase.

AgraVator added 2 commits May 13, 2026 22:15
This commit implements the plumbing required to propagate delay reason tokens from load balancing policies up to the transport layer and tracers, as specified in the LB policy delay design.
connectivityState = newState;
picker = newPicker;
if (newState == CONNECTING || newState == IDLE) {
  picker = new PriorityPicker(newPicker, priority);

Contributor Author

Prepends "priority_p0:" to the child delay token.
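
How the prefix composes across nested policies can be illustrated with a short, self-contained sketch. The `Picker` and `Result` types are hypothetical simplifications of `SubchannelPicker` and `PickResult`, not the grpc-java classes, and the leaf token is borrowed from the example token discussed in this PR.

```java
final class PrefixSketch {
  /** Hypothetical, simplified pick result: a null token means the pick was not delayed. */
  static final class Result {
    final String delayReasonToken;

    Result(String delayReasonToken) {
      this.delayReasonToken = delayReasonToken;
    }
  }

  /** Hypothetical, simplified picker. */
  interface Picker {
    Result pick();
  }

  /** Wraps a child picker so that any delay token it reports gains a priority prefix. */
  static Picker withPriorityPrefix(String priority, Picker child) {
    return () -> {
      Result childResult = child.pick();
      if (childResult.delayReasonToken == null) {
        return childResult; // nothing to prefix, pass the child result through untouched
      }
      return new Result("priority_" + priority + ":" + childResult.delayReasonToken);
    };
  }

  public static void main(String[] args) {
    Picker leaf = () -> new Result("ring_hash:connecting");
    Picker wrapped = withPriorityPrefix("p0", leaf);
    // prints priority_p0:ring_hash:connecting
    System.out.println(wrapped.pick().delayReasonToken);
  }
}
```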

@AgraVator AgraVator marked this pull request as ready for review June 8, 2026 11:13
@AgraVator AgraVator requested review from ejona86 and shivaspeaks June 8, 2026 11:14

@shivaspeaks shivaspeaks left a comment

Member

Quick review, LGTM overall. I'll take a deeper look alongside the implementation doc when I'm back in the office.

This change looks like something that should be consistent across all languages. Is there a gRFC in progress for this? If so, could you link that PR in the description?

PickResult childResult = delegate.pickSubchannel(args);
if (!childResult.hasResult() && childResult.getDelayReasonToken() != null) {
  return PickResult.withNoResult(
      "priority_" + priority + ":" + childResult.getDelayReasonToken());

Member

A question to better understand this from a performance perspective.
This string concatenation happens on the hot path for every buffered RPC. If the priority tree is deep or policies are nested, this may lead to:

  • Allocation overhead: repeated String and PickResult allocations on every pickSubchannel call.
  • Metric cardinality: these nested tokens (e.g., priority_p0:priority_p1:ring_hash:connecting) are used as metric labels, so highly nested tokens can cause a cardinality explosion.

Can we cache the concatenated PickResult in the PriorityPicker (if the child's result is also cached/static) to avoid per-pick allocations? I assume we need the new childResult's delay reason token, but I'm not sure whether it stays static or is dynamic. If it stays static, we can hoist the whole result out of the pick path; otherwise we should at least build the "priority_" + priority + ":" prefix once, statically.
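
One possible shape for the caching suggested here, sketched under the assumption that a child picker usually reports the same token across consecutive picks: build the prefix once in the constructor and remember the last wrapped result. The types are hypothetical simplifications, not the PR's `PriorityPicker` or grpc-java's `PickResult`. The cache is a single immutable entry behind a `volatile` field, so a racing pick may allocate an extra result but cannot pair a token with the wrong wrapped value.

```java
final class CachingPrefixSketch {
  /** Hypothetical, simplified pick result: a null token means the pick was not delayed. */
  static final class Result {
    final String delayReasonToken;

    Result(String delayReasonToken) {
      this.delayReasonToken = delayReasonToken;
    }
  }

  /** Hypothetical, simplified picker. */
  interface Picker {
    Result pick();
  }

  static final class CachingPriorityPicker implements Picker {
    /** Immutable pairing of a child token with the result already built for it. */
    private static final class Entry {
      final String childToken;
      final Result wrapped;

      Entry(String childToken, Result wrapped) {
        this.childToken = childToken;
        this.wrapped = wrapped;
      }
    }

    private final Picker child;
    private final String prefix; // built once instead of on every pick
    private volatile Entry last; // single-entry cache, replaced when the child token changes

    CachingPriorityPicker(String priority, Picker child) {
      this.child = child;
      this.prefix = "priority_" + priority + ":";
    }

    @Override
    public Result pick() {
      Result childResult = child.pick();
      String token = childResult.delayReasonToken;
      if (token == null) {
        return childResult; // nothing to prefix
      }
      Entry cached = last;
      if (cached != null && cached.childToken.equals(token)) {
        return cached.wrapped; // reuse: no concatenation, no new Result
      }
      Result wrapped = new Result(prefix + token);
      last = new Entry(token, wrapped);
      return wrapped;
    }
  }

  public static void main(String[] args) {
    Picker leaf = () -> new Result("ring_hash:connecting");
    Picker picker = new CachingPriorityPicker("p0", leaf);
    Result first = picker.pick();
    Result second = picker.pick();
    System.out.println(first.delayReasonToken); // prints priority_p0:ring_hash:connecting
    System.out.println(first == second); // prints true
  }
}
```

This only addresses the allocation concern. The number of distinct label values is unchanged, since it still grows with the number of distinct combinations of priority name and child token.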
