Add observability for inactive nodes for getBulk #183
Merged
…g) each unique combination of metric + call should get its own counter
a06ba67 to 396ebe5
shy-1234 reviewed Feb 17, 2026
log.debug("Current Read Queue Size - " + size + " for app " + appName + " & zone " + zone + " and node : " + evcNode);

if (!canAddToOpQueue) {
    final String hostName;
Contributor
Is there a reason why we removed the hostName from the metric? At a quick glance, the hostName could be useful for identifying which node was inactive or had a full read queue.
Collaborator
Author
Because of the way the metric caching was implemented, it was reporting the wrong host anyway: the first host that reported that metric after client startup was cached forever, so the hostName may not have been useful in the past. It was removed to be mindful of memory ballooning on the client.
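The caching problem described in this reply can be illustrated with a minimal sketch. It is not the EVCache client code: `TaggedCounter`, `CounterCacheSketch`, and the host names are hypothetical, and plain `java.util.concurrent` types stand in for the real metrics library. It assumes the cache was keyed by metric name only, which is one way the "first host cached forever" behavior could arise.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a tagged counter; not the EVCache or Spectator API.
final class TaggedCounter {
    final String name;
    final String host;
    final AtomicLong count = new AtomicLong();

    TaggedCounter(String name, String host) {
        this.name = name;
        this.host = host;
    }
}

final class CounterCacheSketch {
    private final Map<String, TaggedCounter> byNameOnly = new ConcurrentHashMap<>();
    private final Map<String, TaggedCounter> byNameAndHost = new ConcurrentHashMap<>();

    // Assumed buggy pattern: the host is not part of the cache key, so the
    // counter created for the first host is returned for every later host.
    TaggedCounter cachedByName(String name, String host) {
        return byNameOnly.computeIfAbsent(name, k -> new TaggedCounter(name, host));
    }

    // Reports the right host, but keeps one entry per (name, host) pair,
    // so the cache grows with the number of hosts the client ever sees.
    TaggedCounter cachedByNameAndHost(String name, String host) {
        return byNameAndHost.computeIfAbsent(name + "|" + host, k -> new TaggedCounter(name, host));
    }

    int nameAndHostCacheSize() {
        return byNameAndHost.size();
    }
}
```

Under these assumptions, keying by name alone attributes every increment to the first host, while keying by name and host is accurate but unbounded. Dropping the host tag, as the reply describes, avoids both problems at the cost of per-node detail.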
Fixes an observability gap in getBulk and async getBulk: when a server node was detected to be inactive (due to transient or permanent connectivity loss), no INACTIVE_NODE metric was emitted. Now:
The internal.evc.client.fail metric is emitted with a tag evc.fail.reason=inactiveNode.
How this PR affects the various read paths:
All read paths now emit INACTIVE_NODE when the target node is down.
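The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `FailMetricSketch`, `recordFailure`, `checkNode`, and the `call` tag are hypothetical, and a `ConcurrentHashMap` of `AtomicLong` values stands in for the real metrics registry. Only the metric name and the `evc.fail.reason=inactiveNode` tag come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one counter per unique (metric, call, reason) combination.
final class FailMetricSketch {
    static final String FAIL_METRIC = "internal.evc.client.fail";
    static final String REASON_TAG = "evc.fail.reason";
    static final String INACTIVE_NODE = "inactiveNode";

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // The key includes every tag value, so each combination gets its own counter.
    // The "call" tag name is an assumption made for this sketch.
    private static String key(String call, String reason) {
        return FAIL_METRIC + "|call=" + call + "|" + REASON_TAG + "=" + reason;
    }

    void recordFailure(String call, String reason) {
        counters.computeIfAbsent(key(call, reason), k -> new AtomicLong()).incrementAndGet();
    }

    long count(String call, String reason) {
        AtomicLong c = counters.get(key(call, reason));
        return c == null ? 0L : c.get();
    }

    // Hypothetical read-path guard: emits the failure metric and reports
    // whether the read may proceed against this node.
    boolean checkNode(String call, boolean nodeActive) {
        if (!nodeActive) {
            recordFailure(call, INACTIVE_NODE);
            return false;
        }
        return true;
    }
}
```

In this sketch, a getBulk read against a node that is down would call `checkNode("getBulk", false)`, which increments the counter for that call and reason and leaves counters for other calls untouched. No host name appears in the key, so the number of counters stays bounded by the number of calls and reasons.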