core: Implement load balancing policy delay plumbing #12807
This commit implements the plumbing required to propagate delay reason tokens from load balancing policies up to the transport layer and tracers, as specified in the LB policy delay design.
connectivityState = newState;
picker = newPicker;
if (newState == CONNECTING || newState == IDLE) {
  picker = new PriorityPicker(newPicker, priority);
This prepends "priority_p0:" to the child delay token.
shivaspeaks left a comment
Quick review, LGTM overall. I'll take a deeper look alongside the implementation doc when I'm back in the office.
This change looks like something that should be consistent across all languages. Is there a gRFC in progress for this? If so, could you link that PR in the description?
PickResult childResult = delegate.pickSubchannel(args);
if (!childResult.hasResult() && childResult.getDelayReasonToken() != null) {
  return PickResult.withNoResult(
      "priority_" + priority + ":" + childResult.getDelayReasonToken());
A question to better understand the performance implications.
This string concatenation happens on the hot path for every buffered RPC. If the priority tree is deep or policies are nested, this may lead to:
Allocation overhead: repeated String and PickResult allocations on every pickSubchannel call.
Metric cardinality: these nested tokens (e.g., priority_p0:priority_p1:ring_hash:connecting) are used as metric labels, so highly nested tokens can cause a cardinality explosion.
Could we cache the concatenated PickResult in the PriorityPicker (if the child's result is also cached or static) to avoid per-pick allocations? I assume we need the new childResult's delay reason token on each pick, and I'm not sure whether it stays static or is dynamic. If it is static, we can hoist the whole wrapped result out of pickSubchannel; otherwise we should at least build the "priority_" + priority + ":" prefix once instead of on every pick.
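One way to picture the caching idea from this comment is the minimal Java sketch below. It is not the PR's code. `PickResult`, `Picker`, and `CachingPriorityPicker` are hypothetical stand-ins that model only the delay token (the real `hasResult()` check is omitted). It assumes the child picker returns the same cached `PickResult` instance while its state is unchanged. The prefix is built once in the constructor, and the wrapped result is reused for as long as the child keeps returning that same instance.

```java
// Hypothetical stand-in for the PR's PickResult; only the delay token is modeled.
final class PickResult {
  private final String delayReasonToken; // null when the pick carries no delay reason

  private PickResult(String delayReasonToken) {
    this.delayReasonToken = delayReasonToken;
  }

  static PickResult withNoResult(String delayReasonToken) {
    return new PickResult(delayReasonToken);
  }

  String getDelayReasonToken() {
    return delayReasonToken;
  }
}

// Hypothetical stand-in for a subchannel picker.
interface Picker {
  PickResult pick();
}

final class CachingPriorityPicker implements Picker {
  // Immutable pair, published through a volatile field so pick() needs no lock.
  private static final class Entry {
    final PickResult child;
    final PickResult wrapped;

    Entry(PickResult child, PickResult wrapped) {
      this.child = child;
      this.wrapped = wrapped;
    }
  }

  private final Picker delegate;
  private final String prefix; // built once, e.g. "priority_p0:"
  private volatile Entry cache;

  CachingPriorityPicker(Picker delegate, String priority) {
    this.delegate = delegate;
    this.prefix = "priority_" + priority + ":";
  }

  @Override
  public PickResult pick() {
    PickResult child = delegate.pick();
    String token = child.getDelayReasonToken();
    if (token == null) {
      return child; // nothing to prefix
    }
    Entry e = cache;
    if (e == null || e.child != child) {
      // Identity check: only helps if the child reuses its PickResult instance.
      e = new Entry(child, PickResult.withNoResult(prefix + token));
      cache = e; // a racing thread may overwrite this; the results are equivalent
    }
    return e.wrapped;
  }
}
```

If the child allocates a fresh result on every pick, the identity check never hits and this degrades to one concatenation per pick, with only the prefix saved. Caching does not address the metric cardinality concern, since the nested token itself is unchanged.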
This PR implements the plumbing required to propagate delay reason tokens from load balancing policies up to the transport layer and tracers, as specified in the LB policy delay design.
What changed
- Adds delayReasonToken to PickResult and the factory method withNoResult(String).
- Adds delayStarted(String) and delayEnded() hooks to ClientStreamTracer to track delay segments.
- Updates DelayedClientTransport to track delay tokens in PendingStream and notify tracers when delays start, change, or end.
- Updates the PickFirst and RoundRobin policies to emit cached tokens when connecting.
- Updates the RingHash, RLS, and CDS policies to emit specific delay tokens when buffering picks.
- Updates PriorityLoadBalancer to wrap child pickers and prepend "priority_X:" to child tokens to track failovers.
Notes
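To illustrate how the tracer hooks listed under "What changed" could divide a buffered RPC's wait into delay segments, here is a minimal Java sketch. It is not the PR's implementation. `DelayTracer`, `RecordingTracer`, and `PendingStreamSketch` are hypothetical stand-ins; only the hook names `delayStarted(String)` and `delayEnded()` come from the description. The sketch assumes that a token change is reported as `delayEnded()` followed by `delayStarted(newToken)`, which the description does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical tracer interface exposing the two hooks named in the PR description.
interface DelayTracer {
  void delayStarted(String token);

  void delayEnded();
}

// Records hook calls so the resulting sequence can be inspected.
final class RecordingTracer implements DelayTracer {
  final List<String> events = new ArrayList<>();

  @Override
  public void delayStarted(String token) {
    events.add("start:" + token);
  }

  @Override
  public void delayEnded() {
    events.add("end");
  }
}

// Hypothetical stand-in for a buffered RPC that remembers its current delay token.
final class PendingStreamSketch {
  private final DelayTracer tracer;
  private String currentToken; // null while the stream is not delayed

  PendingStreamSketch(DelayTracer tracer) {
    this.tracer = tracer;
  }

  // Called after each pick attempt; newToken is null once the RPC is no longer buffered.
  void onPick(String newToken) {
    if (Objects.equals(currentToken, newToken)) {
      return; // same delay segment, nothing to report
    }
    if (currentToken != null) {
      tracer.delayEnded(); // close the previous segment (assumed behavior on change)
    }
    if (newToken != null) {
      tracer.delayStarted(newToken); // open a segment for the new reason
    }
    currentToken = newToken;
  }
}
```

With this sketch, repeated picks that return the same token produce no extra tracer calls. A failover from "priority_p0:connecting" to "priority_p1:connecting" closes one segment and opens another, and a final `onPick(null)` closes the last segment.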