Add observability for inactive nodes for getBulk #183

Merged
Sunjeet merged 1 commit into master from sunjeets/fix/observability
Feb 17, 2026

@Sunjeet commented Feb 12, 2026

Fixes an observability gap in getBulk and async getBulk: when a server node was detected to be inactive (due to transient or permanent connectivity loss), no INACTIVE_NODE metric was emitted.

With this change, the internal.evc.client.fail metric is emitted with the tag evc.fail.reason=inactiveNode in that case.

How this PR affects the various read paths:

  • getBulk / async getBulk: INACTIVE_NODE metric is now emitted
  • chunked paths: INACTIVE_NODE metric is now emitted
  • single get: unchanged (was already emitting the metric)

All read paths now emit INACTIVE_NODE when the target node is down.
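
The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the EVCache client's actual code: the class `InactiveNodeMetricSketch`, the `Node` type, the `getBulk` signature, and the in-memory counter map are all hypothetical stand-ins. Only the metric name `internal.evc.client.fail` and the tag `evc.fail.reason=inactiveNode` come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; none of these types are the EVCache client's real classes.
final class InactiveNodeMetricSketch {
    // Metric name and tag taken from the PR description.
    static final String FAIL_METRIC = "internal.evc.client.fail";
    static final String REASON_TAG = "evc.fail.reason";
    static final String INACTIVE_NODE = "inactiveNode";

    // Stand-in for a server node; "active" models its connectivity state.
    static final class Node {
        final String host;
        final boolean active;
        Node(String host, boolean active) { this.host = host; this.active = active; }
    }

    // Stand-in for a metrics registry: one counter per "name|tag=value" id.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    private static String counterId(String reason) {
        return FAIL_METRIC + "|" + REASON_TAG + "=" + reason;
    }

    private void incrementFail(String reason) {
        counters.computeIfAbsent(counterId(reason), k -> new LongAdder()).increment();
    }

    long failCount(String reason) {
        LongAdder counter = counters.get(counterId(reason));
        return counter == null ? 0 : counter.sum();
    }

    // Bulk read: keys on inactive nodes are skipped, and each skip is counted.
    List<String> getBulk(Map<Node, List<String>> keysByNode) {
        List<String> fetched = new ArrayList<>();
        for (Map.Entry<Node, List<String>> entry : keysByNode.entrySet()) {
            if (!entry.getKey().active) {
                incrementFail(INACTIVE_NODE); // the emission that was previously missing
                continue;
            }
            fetched.addAll(entry.getValue()); // pretend every key on an active node is read
        }
        return fetched;
    }

    public static void main(String[] args) {
        InactiveNodeMetricSketch client = new InactiveNodeMetricSketch();
        Map<Node, List<String>> keysByNode = new LinkedHashMap<>();
        keysByNode.put(new Node("host-a", true), List.of("k1", "k2"));
        keysByNode.put(new Node("host-b", false), List.of("k3"));
        System.out.println("fetched=" + client.getBulk(keysByNode));
        System.out.println("inactiveNode failures=" + client.failCount(INACTIVE_NODE));
    }
}
```

In this sketch the counter carries only the failure reason as a tag, which mirrors the single `evc.fail.reason` tag named in the description.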

@Sunjeet marked this pull request as ready for review February 12, 2026 17:58
…g) each unique combination of metric + call should get its own counter
@Sunjeet force-pushed the sunjeets/fix/observability branch from a06ba67 to 396ebe5 February 17, 2026 15:16
@Sunjeet merged commit b61b86a into master Feb 17, 2026
1 check passed
log.debug("Current Read Queue Size - " + size + " for app " + appName + " & zone " + zone + " and node : " + evcNode);

if (!canAddToOpQueue) {
final String hostName;
Contributor

Is there a reason why we remove the hostName from the metric? At a quick glance, the hostName could be useful for identifying which node was inactive or had a full read queue.

Collaborator Author

Because of the way the metric caching was implemented, it was reporting the wrong host anyway (the first host that reported that metric since client startup was cached forever), so it may not have been useful in the past. Removed to be mindful of memory ballooning on the client.
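
To illustrate the caching issue described in this reply, here is a hedged sketch under stated assumptions; it is not the client's actual implementation. It assumes a counter cache keyed only by metric name, with the host tag fixed when the counter is first created. The class `CounterCacheSketch` and its methods are hypothetical. It also shows the trade-off of the alternative, keying by host, which creates one cached counter per host.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a counter cache; not the EVCache client's real code.
final class CounterCacheSketch {
    // A counter that remembers the tags it was created with.
    static final class TaggedCounter {
        final String tags;
        final LongAdder count = new LongAdder();
        TaggedCounter(String tags) { this.tags = tags; }
    }

    private final Map<String, TaggedCounter> cache = new ConcurrentHashMap<>();

    // Assumed buggy variant: the cache key ignores the host, so the first
    // host seen is baked into the counter and reused for every later call.
    TaggedCounter byNameOnly(String metric, String host) {
        return cache.computeIfAbsent(metric, k -> new TaggedCounter("host=" + host));
    }

    // Host-aware variant: attribution is correct, but the cache grows by
    // one entry per distinct host.
    TaggedCounter byNameAndHost(String metric, String host) {
        return cache.computeIfAbsent(metric + "|host=" + host,
                k -> new TaggedCounter("host=" + host));
    }

    int cacheSize() { return cache.size(); }

    public static void main(String[] args) {
        CounterCacheSketch buggy = new CounterCacheSketch();
        buggy.byNameOnly("fail", "host-a").count.increment();
        TaggedCounter second = buggy.byNameOnly("fail", "host-b");
        second.count.increment();
        // The failure on host-b is attributed to host-a.
        System.out.println(second.tags + " count=" + second.count.sum());

        CounterCacheSketch perHost = new CounterCacheSketch();
        for (String host : new String[] {"host-a", "host-b", "host-c"}) {
            perHost.byNameAndHost("fail", host).count.increment();
        }
        System.out.println("cached counters=" + perHost.cacheSize());
    }
}
```

Dropping the host tag altogether, as the reply describes, keeps the number of cached counters independent of the number of hosts, at the cost of no longer identifying the node in this metric.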
