bug: EvaluationError failures from NodeReconciler not tracked in metrics #241

@Shreya2005-2005

Describe the bug

In internal/controller/node_controller.go, when evaluateRuleForNode
fails inside processNodeAgainstAllRules, recordNodeFailure is called
but the metrics.Failures counter is not incremented.

Line 155 (node_controller.go):

r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
// metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc() -- MISSING

Compare with processAllNodesForRule in nodereadinessrule_controller.go,
lines 275-276, which correctly does both:

r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()

Impact

Failures triggered via the NodeReconciler path (node condition changes)
are silently missing from the node_readiness_failures_total Prometheus
metric, so operators monitoring cluster health see an incomplete
failure count.

Expected behavior

metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
should be called alongside recordNodeFailure in node_controller.go.

File

internal/controller/node_controller.go line 155

Are you able to fix this issue?

Yes (I will propose a PR)
