feat: added service monitor for metrics#275
Conversation
📝 WalkthroughWalkthroughThis PR adds playbook reference labels to ten alert rules across eviction and operator alert files, introduces a new Kubernetes ClusterRoleBinding for RBAC metrics access, creates a ServiceMonitor Helm template for Prometheus metrics scraping, and adds corresponding Helm values configuration for ServiceMonitor behavior. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml`:
- Around line 24-26: The RoleBinding currently binds the "metrics-reader" Role
to the operator ServiceAccount; change it to bind to the Prometheus scraper
ServiceAccount by adding a new values key (e.g.
serviceMonitor.scrapeServiceAccount) to the chart's values/schema and update the
RoleBinding template (the subject referencing include
"openstack-hypervisor-operator.serviceAccountName") to use that new value
instead; also ensure the ServiceMonitor template (servicemonitor.yaml)
references the same serviceMonitor.scrapeServiceAccount and that the Prometheus
ServiceAccount resource exists or is documented as required for the deployment.
In `@charts/openstack-hypervisor-operator/templates/servicemonitor.yaml`:
- Around line 31-33: Replace the hardcoded tlsConfig.insecureSkipVerify with a
values-driven setting (e.g. .Values.serviceMonitor.tls.insecureSkipVerify)
defaulting to false in values.yaml and update the ServiceMonitor template to
render tlsConfig.insecureSkipVerify from that value; also add optional
.Values.serviceMonitor.tls.serverName and .Values.serviceMonitor.tls.caFile keys
and conditionally include tlsConfig.serverName and tlsConfig.caFile only when
set, keeping bearerTokenFile unchanged so existing behavior is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 31661d19-e071-4166-9644-11b4cc28aa0e
📒 Files selected for processing (5)
charts/openstack-hypervisor-operator/alerts/eviction.yamlcharts/openstack-hypervisor-operator/alerts/operator.yamlcharts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yamlcharts/openstack-hypervisor-operator/templates/servicemonitor.yamlcharts/openstack-hypervisor-operator/values.yaml
| - kind: ServiceAccount | ||
| name: '{{ include "openstack-hypervisor-operator.serviceAccountName" . }}' | ||
| namespace: '{{ .Release.Namespace }}' |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify token source and RBAC subject alignment for metrics scraping.
# 1) Find how ServiceMonitor authenticates.
rg -n -C3 'kind:\s*ServiceMonitor|bearerTokenFile|authorization|bearerTokenSecret' charts/openstack-hypervisor-operator
# 2) Find who is granted nonResourceURL /metrics access.
rg -n -C3 'kind:\s*ClusterRoleBinding|metrics-reader|nonResourceURLs|/metrics|subjects:' charts/openstack-hypervisor-operator
# 3) Check whether metrics endpoint auth/authz is enabled in manager/proxy wiring.
rg -n -C3 'metrics-bind-address|metrics-secure|metrics-require-rbac|kube-rbac-proxy|WithAuthenticationAndAuthorization'Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 20446
🏁 Script executed:
# Check if .Values.serviceMonitor.scrapeServiceAccount exists in values
cat charts/openstack-hypervisor-operator/values.yaml
# Look for any serviceAccount references in values
rg -n 'serviceAccount|scrape' charts/openstack-hypervisor-operator/values.yaml
# Check the serviceAccountName helper function
rg -n 'serviceAccountName' charts/openstack-hypervisor-operator/templatesRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 3296
🏁 Script executed:
# Read the helpers file to see the serviceAccountName implementation
cat charts/openstack-hypervisor-operator/templates/_helpers.tpl | head -70Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 2317
🏁 Script executed:
# Check what the actual Prometheus service account name would be
# Prometheus operator typically uses standard naming conventions
rg -n 'prometheus|monitoring' charts/openstack-hypervisor-operator/values.yaml
# Check if there are any defaults or documented configurations for Prometheus
rg -n 'prometheus' charts/openstack-hypervisor-operator/templates/servicemonitor.yaml
# Verify if there's any deployment or documentation about Prometheus namespace/SA
fd -type f -name "README*" -o -name "*.md" | head -20Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 295
🏁 Script executed:
# Fix the fd command
find . -type f \( -name "README*" -o -name "*.md" \) 2>/dev/null | head -5
# Check if there's any mention of what service account Prometheus uses
rg -n 'prometheus' charts/openstack-hypervisor-operator/templates/servicemonitor.yaml -A5 -B5
# Look at the complete servicemonitor.yaml file
cat charts/openstack-hypervisor-operator/templates/servicemonitor.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 1635
Update metrics RBAC binding to target the Prometheus scraper's ServiceAccount instead of the operator's.
The metrics endpoint has authentication and authorization enforced (via --metrics-secure=true and the WithAuthenticationAndAuthorization filter in cmd/main.go), but the metrics-reader role is currently bound to the operator ServiceAccount. The ServiceMonitor scrapes using the bearer token from Prometheus's own ServiceAccount (line 33 of servicemonitor.yaml), creating a token-to-RBAC mismatch that will cause 403 errors.
To fix this, the binding must target the Prometheus scraper's ServiceAccount. However, the suggested configuration path (.Values.serviceMonitor.scrapeServiceAccount) does not exist in the values.yaml. Define this configuration in the values schema, then update the binding subjects to reference it:
subjects:
- kind: ServiceAccount
- name: '{{ include "openstack-hypervisor-operator.serviceAccountName" . }}'
- namespace: '{{ .Release.Namespace }}'
+ name: '{{ .Values.serviceMonitor.scrapeServiceAccount.name }}'
+ namespace: '{{ .Values.serviceMonitor.scrapeServiceAccount.namespace | default .Release.Namespace }}'Additionally, ensure the Prometheus ServiceAccount exists and is correctly referenced by the ServiceMonitor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml`
around lines 24 - 26, The RoleBinding currently binds the "metrics-reader" Role
to the operator ServiceAccount; change it to bind to the Prometheus scraper
ServiceAccount by adding a new values key (e.g.
serviceMonitor.scrapeServiceAccount) to the chart's values/schema and update the
RoleBinding template (the subject referencing include
"openstack-hypervisor-operator.serviceAccountName") to use that new value
instead; also ensure the ServiceMonitor template (servicemonitor.yaml)
references the same serviceMonitor.scrapeServiceAccount and that the Prometheus
ServiceAccount resource exists or is documented as required for the deployment.
| tlsConfig: | ||
| insecureSkipVerify: true | ||
| bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether insecure TLS is currently hardcoded and if a value toggle already exists.
rg -n -C3 'insecureSkipVerify|tlsConfig|serviceMonitor\..*insecure|caFile|serverName' charts/openstack-hypervisor-operatorRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 897
🏁 Script executed:
find charts/openstack-hypervisor-operator -name "values*.yaml" -o -name "values*.yml" | head -20Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 136
🏁 Script executed:
cat -n charts/openstack-hypervisor-operator/templates/servicemonitor.yamlRepository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 1941
🏁 Script executed:
rg -A 10 "serviceMonitor:" charts/openstack-hypervisor-operator/values* | head -50Repository: cobaltcore-dev/openstack-hypervisor-operator
Length of output: 272
Make TLS verification configurable with secure defaults.
The insecureSkipVerify: true hardcoded at line 32 disables certificate validation, creating a security risk. This should be made configurable through values.yaml with a default of false to enforce certificate verification, aligning with the existing pattern used for other ServiceMonitor settings like interval and scrapeTimeout.
Consider also adding optional support for serverName and caFile to accommodate custom certificate chains and hostname verification in air-gapped or self-signed certificate environments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@charts/openstack-hypervisor-operator/templates/servicemonitor.yaml` around
lines 31 - 33, Replace the hardcoded tlsConfig.insecureSkipVerify with a
values-driven setting (e.g. .Values.serviceMonitor.tls.insecureSkipVerify)
defaulting to false in values.yaml and update the ServiceMonitor template to
render tlsConfig.insecureSkipVerify from that value; also add optional
.Values.serviceMonitor.tls.serverName and .Values.serviceMonitor.tls.caFile keys
and conditionally include tlsConfig.serverName and tlsConfig.caFile only when
set, keeping bearerTokenFile unchanged so existing behavior is preserved.
Summary by CodeRabbit
Release Notes
New Features
Configuration
metricsService.serviceMonitorsection for managing metrics collection settings, including creation toggle, labels, annotations, and scrape intervals