Is your feature request related to a problem or existing issue? Please describe.
The current architecture for reporting component readiness doesn't scale well to larger clusters. Right now, every component that needs monitoring requires its own reporter sidecar container on every node. So if you're monitoring three components across a hundred nodes, that's 300 reporter containers running continuously, each one polling its component every 30 seconds and making API calls to update node conditions.
This creates a few problems. First, there's the resource overhead: all those containers consume CPU, memory, and network bandwidth even when nothing is changing. Second, the constant polling sends a steady stream of requests to the Kubernetes API server (about 10 per second in the 300-container example, growing linearly with cluster size) just to update conditions that might not have changed at all. Third, the tight coupling between reporters and components makes it difficult to monitor things that don't fit the sidecar pattern, like static pods or host-level services.
In a thousand-node cluster monitoring three components per node, you end up with 3,000 reporter containers making around 100 API calls per second (3,000 updates every 30 seconds). That's a lot of overhead for something that could be much more efficient.
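To make the arithmetic above easy to re-check for other cluster sizes, here is a small sketch. It assumes every sidecar issues exactly one node-condition update per poll, which is how the current behavior is described above; the function name is just for illustration.

```python
def reporter_load(nodes: int, components_per_node: int, poll_interval_s: float) -> tuple[int, float]:
    """Return (sidecar count, API calls per second) for the sidecar model.

    Assumption: each sidecar makes one API call per poll interval.
    """
    sidecars = nodes * components_per_node
    return sidecars, sidecars / poll_interval_s


for nodes in (100, 1000):
    sidecars, rate = reporter_load(nodes, 3, 30)
    print(f"{nodes} nodes: {sidecars} sidecars, ~{rate:.0f} API calls/s")
```

Both the container count and the call rate grow linearly with nodes × components, which is the scaling problem this request is about.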
Describe the solution you'd like
I'd like to propose a fundamentally different architecture that leverages Prometheus, which many Kubernetes clusters already have deployed for monitoring. Instead of running a reporter sidecar for every component on every node, we could have a single reporter deployment that periodically queries Prometheus to check component health across all nodes at once.
Here's how it would work: Components would expose their health status as Prometheus metrics (which many already do). The reporter would run as a single deployment and periodically execute Prometheus queries to check the readiness of all components across all nodes. When it detects that a component's state has changed, it would update the corresponding node condition. This centralizes all the monitoring logic into one place instead of scattering it across thousands of sidecars.
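The "only update when state changes" step could look roughly like the sketch below. It parses a response in the vector format returned by Prometheus's `/api/v1/query` endpoint and diffs it against the previous poll. The `node` label, the convention that a sample value of 1 means ready, and both function names are assumptions for illustration, not part of any existing reporter.

```python
import json


def readiness_from_response(body: str, node_label: str = "node") -> dict[str, bool]:
    """Map node name -> ready, from a Prometheus instant-query JSON response.

    Assumption: a sample value of "1" means ready, anything else means not ready.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    ready = {}
    for sample in payload["data"]["result"]:
        node = sample["metric"].get(node_label)
        if node is None:
            continue  # sample has no node label, so it cannot be attributed
        _timestamp, value = sample["value"]
        ready[node] = float(value) == 1.0
    return ready


def changed_nodes(previous: dict[str, bool], current: dict[str, bool]) -> dict[str, bool]:
    """Return only the nodes whose readiness differs from the last poll."""
    return {node: r for node, r in current.items() if previous.get(node) != r}


# Example response in the /api/v1/query vector format.
response = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"node": "node-a"}, "value": [1700000000, "1"]},
        {"metric": {"node": "node-b"}, "value": [1700000000, "0"]},
    ]},
})
previous = {"node-a": True, "node-b": True}
current = readiness_from_response(response)
print(changed_nodes(previous, current))  # only node-b needs a condition update
```

Nodes that disappear from the query result (for example because the metric went stale) are not handled here; a real reporter would need to decide whether a missing sample means "not ready" or "unknown".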
The benefits are substantial. Instead of thousands of sidecar containers, you'd have a single deployment (maybe 3 replicas for high availability). Instead of N×M containers (N nodes × M components) making individual API calls, you'd have one service making batch queries to Prometheus and only updating node conditions when states actually change. And because you're using Prometheus queries to define readiness, you can monitor any component that exposes metrics, not just ones that support sidecar injection.
The reporter would query Prometheus at a configurable interval (say, every 30 seconds) and evaluate readiness rules for all nodes at once. This is much more efficient than having individual sidecars each making their own checks. You'd also get much better observability since everything would be visible in your existing Prometheus/Grafana setup, and you could use the same Prometheus queries for both alerting and node readiness decisions.
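As a rough idea of what "readiness rules" might look like, the sketch below pairs a node condition type with a PromQL query and builds the instant-query URL for Prometheus's `/api/v1/query` endpoint. The rule structure, the metric names, the condition types, the Prometheus address, and the assumption that each series carries a `node` label are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ReadinessRule:
    condition_type: str  # node condition to set (hypothetical names below)
    query: str           # PromQL expected to return one 0/1 sample per node


def query_url(prometheus_base: str, rule: ReadinessRule) -> str:
    """Build an instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': rule.query})}"


# Hypothetical rules; metric names and labels are illustrative only.
RULES = [
    ReadinessRule("NetworkPluginReady", "min by (node) (cni_plugin_ready)"),
    ReadinessRule("StorageDriverReady", 'min by (node) (up{job="csi-node"})'),
]
POLL_INTERVAL_SECONDS = 30  # the configurable interval mentioned above

for rule in RULES:
    print(rule.condition_type, "->", query_url("http://prometheus.monitoring.svc:9090", rule))
```

With this shape, one query per rule covers every node, so the number of Prometheus requests per interval depends on the number of rules rather than on cluster size.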
As a future enhancement, we could make this even more efficient by integrating with Alertmanager. Instead of polling Prometheus on a schedule, Alertmanager could send webhooks to the reporter when alerts on component state fire or resolve, making the system event-driven. This would reduce latency and eliminate unnecessary queries when nothing is changing. We could also add support for Pushgateway for components that can't expose a long-lived metrics endpoint.
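For the Alertmanager idea, the reporter's webhook handler might translate the payload into condition updates along these lines. The top-level `alerts` list with per-alert `status` and `labels` follows Alertmanager's webhook payload format; the `node` and `condition` labels, and the convention that a firing alert means "not ready", are assumptions that the alerting rules would have to follow.

```python
import json


def condition_updates(webhook_body: str) -> list[tuple[str, str, bool]]:
    """Translate an Alertmanager webhook payload into (node, condition, ready) tuples.

    Assumptions: each alert carries hypothetical `node` and `condition` labels;
    a firing alert means not ready, a resolved alert means ready.
    """
    payload = json.loads(webhook_body)
    updates = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        node, condition = labels.get("node"), labels.get("condition")
        if node is None or condition is None:
            continue  # alert is not tied to a node condition; ignore it
        updates.append((node, condition, alert.get("status") == "resolved"))
    return updates


example = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "ComponentNotReady", "node": "node-b",
                    "condition": "NetworkPluginReady"}},
        {"status": "resolved",
         "labels": {"alertname": "ComponentNotReady", "node": "node-a",
                    "condition": "NetworkPluginReady"}},
    ],
})
print(condition_updates(example))
```

A webhook-only design would probably still want an occasional full poll as a fallback, since a missed notification would otherwise leave a node condition stale.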
Describe alternatives you've considered
No response