Skip to content

feat: added service monitor for metrics#275

Open
christophrichtersap wants to merge 1 commit intomainfrom
feat/servicemonitor
Open

feat: added service monitor for metrics#275
christophrichtersap wants to merge 1 commit intomainfrom
feat/servicemonitor

Conversation

@christophrichtersap
Copy link
Contributor

@christophrichtersap christophrichtersap commented Mar 25, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added playbook documentation links to 10 alert rules to guide incident response procedures
    • Introduced Prometheus ServiceMonitor integration for enhanced metrics collection with configurable scraping parameters
    • Implemented RBAC permissions for secure metrics reader access
  • Configuration

    • New metricsService.serviceMonitor section for managing metrics collection settings, including creation toggle, labels, annotations, and scrape intervals

@coderabbitai
Copy link

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

This PR adds playbook reference labels to ten alert rules across eviction and operator alert files, introduces a new Kubernetes ClusterRoleBinding for RBAC metrics access, creates a ServiceMonitor Helm template for Prometheus metrics scraping, and adds corresponding Helm values configuration for ServiceMonitor behavior.

Changes

Cohort / File(s) Summary
Alert Playbook Labels
charts/openstack-hypervisor-operator/alerts/eviction.yaml, charts/openstack-hypervisor-operator/alerts/operator.yaml
Added playbook label references to 10 alert rules (3 eviction-related and 7 operator-related), each pointing to corresponding documentation playbooks; no changes to alert expressions, thresholds, or other properties.
Kubernetes RBAC Resources
charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml
Added new ClusterRoleBinding resource that binds the metrics-reader ClusterRole to the operator's ServiceAccount for RBAC metrics access.
ServiceMonitor Helm Template
charts/openstack-hypervisor-operator/templates/servicemonitor.yaml
Introduced new conditionally-rendered ServiceMonitor Kubernetes resource for Prometheus metrics discovery, configured with HTTPS endpoints, bearer token authentication, and optional metric relabeling.
Helm Values Configuration
charts/openstack-hypervisor-operator/values.yaml
Added new metricsService.serviceMonitor configuration block with settings for ServiceMonitor creation, labels, annotations, scrape intervals, timeouts, and metric/label relabelings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

Suggested reviewers

  • fwiesel

Poem

🐰 Ten alerts now have their playbooks bright,
Metrics flow in Prometheus's light,
RBAC bindings, ServiceMonitor aligned,
Hypervisor health, now well-defined! 📊✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: added service monitor for metrics' accurately describes the main change, which adds a ServiceMonitor template and related configuration to enable Prometheus metrics scraping.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/servicemonitor

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml`:
- Around line 24-26: The RoleBinding currently binds the "metrics-reader" Role
to the operator ServiceAccount; change it to bind to the Prometheus scraper
ServiceAccount by adding a new values key (e.g.
serviceMonitor.scrapeServiceAccount) to the chart's values/schema and update the
RoleBinding template (the subject referencing include
"openstack-hypervisor-operator.serviceAccountName") to use that new value
instead; also ensure the ServiceMonitor template (servicemonitor.yaml)
references the same serviceMonitor.scrapeServiceAccount and that the Prometheus
ServiceAccount resource exists or is documented as required for the deployment.

In `@charts/openstack-hypervisor-operator/templates/servicemonitor.yaml`:
- Around line 31-33: Replace the hardcoded tlsConfig.insecureSkipVerify with a
values-driven setting (e.g. .Values.serviceMonitor.tls.insecureSkipVerify)
defaulting to false in values.yaml and update the ServiceMonitor template to
render tlsConfig.insecureSkipVerify from that value; also add optional
.Values.serviceMonitor.tls.serverName and .Values.serviceMonitor.tls.caFile keys
and conditionally include tlsConfig.serverName and tlsConfig.caFile only when
set, keeping bearerTokenFile unchanged so existing behavior is preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31661d19-e071-4166-9644-11b4cc28aa0e

📥 Commits

Reviewing files that changed from the base of the PR and between 800d17e and 9644aca.

📒 Files selected for processing (5)
  • charts/openstack-hypervisor-operator/alerts/eviction.yaml
  • charts/openstack-hypervisor-operator/alerts/operator.yaml
  • charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml
  • charts/openstack-hypervisor-operator/templates/servicemonitor.yaml
  • charts/openstack-hypervisor-operator/values.yaml

Comment on lines +24 to +26
- kind: ServiceAccount
name: '{{ include "openstack-hypervisor-operator.serviceAccountName" . }}'
namespace: '{{ .Release.Namespace }}'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify token source and RBAC subject alignment for metrics scraping.

# 1) Find how ServiceMonitor authenticates.
rg -n -C3 'kind:\s*ServiceMonitor|bearerTokenFile|authorization|bearerTokenSecret' charts/openstack-hypervisor-operator

# 2) Find who is granted nonResourceURL /metrics access.
rg -n -C3 'kind:\s*ClusterRoleBinding|metrics-reader|nonResourceURLs|/metrics|subjects:' charts/openstack-hypervisor-operator

# 3) Check whether metrics endpoint auth/authz is enabled in manager/proxy wiring.
rg -n -C3 'metrics-bind-address|metrics-secure|metrics-require-rbac|kube-rbac-proxy|WithAuthenticationAndAuthorization'

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 20446


🏁 Script executed:

# Check if .Values.serviceMonitor.scrapeServiceAccount exists in values
cat charts/openstack-hypervisor-operator/values.yaml

# Look for any serviceAccount references in values
rg -n 'serviceAccount|scrape' charts/openstack-hypervisor-operator/values.yaml

# Check the serviceAccountName helper function
rg -n 'serviceAccountName' charts/openstack-hypervisor-operator/templates

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 3296


🏁 Script executed:

# Read the helpers file to see the serviceAccountName implementation
cat charts/openstack-hypervisor-operator/templates/_helpers.tpl | head -70

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 2317


🏁 Script executed:

# Check what the actual Prometheus service account name would be
# Prometheus operator typically uses standard naming conventions
rg -n 'prometheus|monitoring' charts/openstack-hypervisor-operator/values.yaml

# Check if there are any defaults or documented configurations for Prometheus
rg -n 'prometheus' charts/openstack-hypervisor-operator/templates/servicemonitor.yaml

# Verify if there's any deployment or documentation about Prometheus namespace/SA
fd -type f -name "README*" -o -name "*.md" | head -20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 295


🏁 Script executed:

# Fix the fd command
find . -type f \( -name "README*" -o -name "*.md" \) 2>/dev/null | head -5

# Check if there's any mention of what service account Prometheus uses
rg -n 'prometheus' charts/openstack-hypervisor-operator/templates/servicemonitor.yaml -A5 -B5

# Look at the complete servicemonitor.yaml file
cat charts/openstack-hypervisor-operator/templates/servicemonitor.yaml

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 1635


Update metrics RBAC binding to target the Prometheus scraper's ServiceAccount instead of the operator's.

The metrics endpoint has authentication and authorization enforced (via --metrics-secure=true and the WithAuthenticationAndAuthorization filter in cmd/main.go), but the metrics-reader role is currently bound to the operator ServiceAccount. The ServiceMonitor scrapes using the bearer token from Prometheus's own ServiceAccount (line 33 of servicemonitor.yaml), creating a token-to-RBAC mismatch that will cause 403 errors.

To fix this, the binding must target the Prometheus scraper's ServiceAccount. However, the suggested configuration path (.Values.serviceMonitor.scrapeServiceAccount) does not exist in the values.yaml. Define this configuration in the values schema, then update the binding subjects to reference it:

 subjects:
 - kind: ServiceAccount
-  name: '{{ include "openstack-hypervisor-operator.serviceAccountName" . }}'
-  namespace: '{{ .Release.Namespace }}'
+  name: '{{ .Values.serviceMonitor.scrapeServiceAccount.name }}'
+  namespace: '{{ .Values.serviceMonitor.scrapeServiceAccount.namespace | default .Release.Namespace }}'

Additionally, ensure the Prometheus ServiceAccount exists and is correctly referenced by the ServiceMonitor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/openstack-hypervisor-operator/templates/metrics-reader-rbac.yaml`
around lines 24 - 26, The RoleBinding currently binds the "metrics-reader" Role
to the operator ServiceAccount; change it to bind to the Prometheus scraper
ServiceAccount by adding a new values key (e.g.
serviceMonitor.scrapeServiceAccount) to the chart's values/schema and update the
RoleBinding template (the subject referencing include
"openstack-hypervisor-operator.serviceAccountName") to use that new value
instead; also ensure the ServiceMonitor template (servicemonitor.yaml)
references the same serviceMonitor.scrapeServiceAccount and that the Prometheus
ServiceAccount resource exists or is documented as required for the deployment.

Comment on lines +31 to +33
tlsConfig:
insecureSkipVerify: true
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether insecure TLS is currently hardcoded and if a value toggle already exists.
rg -n -C3 'insecureSkipVerify|tlsConfig|serviceMonitor\..*insecure|caFile|serverName' charts/openstack-hypervisor-operator

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 897


🏁 Script executed:

find charts/openstack-hypervisor-operator -name "values*.yaml" -o -name "values*.yml" | head -20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 136


🏁 Script executed:

cat -n charts/openstack-hypervisor-operator/templates/servicemonitor.yaml

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 1941


🏁 Script executed:

rg -A 10 "serviceMonitor:" charts/openstack-hypervisor-operator/values* | head -50

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 272


Make TLS verification configurable with secure defaults.

The insecureSkipVerify: true hardcoded at line 32 disables certificate validation, creating a security risk. This should be made configurable through values.yaml with a default of false to enforce certificate verification, aligning with the existing pattern used for other ServiceMonitor settings like interval and scrapeTimeout.

Consider also adding optional support for serverName and caFile to accommodate custom certificate chains and hostname verification in air-gapped or self-signed certificate environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/openstack-hypervisor-operator/templates/servicemonitor.yaml` around
lines 31 - 33, Replace the hardcoded tlsConfig.insecureSkipVerify with a
values-driven setting (e.g. .Values.serviceMonitor.tls.insecureSkipVerify)
defaulting to false in values.yaml and update the ServiceMonitor template to
render tlsConfig.insecureSkipVerify from that value; also add optional
.Values.serviceMonitor.tls.serverName and .Values.serviceMonitor.tls.caFile keys
and conditionally include tlsConfig.serverName and tlsConfig.caFile only when
set, keeping bearerTokenFile unchanged so existing behavior is preserved.

toanju
toanju previously approved these changes Mar 25, 2026
@toanju toanju dismissed their stale review March 25, 2026 20:47

still wip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants