Add Kepler power monitoring for TNF clusters #52

Draft
lucaconsalvi wants to merge 2 commits into openshift-eng:main from lucaconsalvi:shift-week-kepler-demo
@lucaconsalvi
Contributor

Summary

  • Add Kepler v0.11.3 deployment automation (Ansible role + playbook + shell scripts) for power
    monitoring on TNF clusters
  • Include Grafana with a pre-configured TNF Power Monitoring dashboard for visualization
  • Add a Claude Code skill (/tnf-power) that queries Kepler metrics via Prometheus and generates
    a power consumption report
  • Add documentation covering Kepler architecture, usage, and a presentation guide

Details

Deployment

  • kepler.yml playbook with kepler Ansible role handling namespace, RBAC, DaemonSet,
    ServiceMonitor, and user workload monitoring setup
  • Grafana deployment with a custom dashboard (per-node power, control plane breakdown, top
    containers)
  • make deploy-kepler / make remove-kepler targets and corresponding shell scripts
  • Supports both install and removal; removal is selected with -e kepler_state=absent
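
As a rough illustration of the install/removal switch described above, here is a minimal Python sketch that builds the ansible-playbook command line. It assumes the playbook is run with its default variables for an install and that only removal needs the -e kepler_state=absent override. The "present" state name and the kepler_playbook_command helper are hypothetical, and inventory arguments are left out because the PR description does not give them.

```python
import shlex


def kepler_playbook_command(state="present", playbook="kepler.yml"):
    """Build the ansible-playbook argv for installing or removing Kepler.

    "present" is a hypothetical name for the default (install) state;
    only kepler_state=absent appears in the PR description.
    """
    if state not in ("present", "absent"):
        raise ValueError(f"unsupported kepler_state: {state!r}")
    argv = ["ansible-playbook", playbook]
    # Assumption: install relies on the role defaults, removal needs the override.
    if state == "absent":
        argv += ["-e", "kepler_state=absent"]
    return argv


if __name__ == "__main__":
    # Print shell-quoted command lines rather than executing them.
    print(shlex.join(kepler_playbook_command()))
    print(shlex.join(kepler_playbook_command("absent")))
```

The make deploy-kepler and make remove-kepler targets presumably wrap equivalent invocations, but their exact contents are not shown in this description.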

Claude Code Skill

  • /tnf-power skill queries Prometheus for node and container CPU power metrics
  • Detects RAPL (bare metal) vs estimated (VM) measurement mode
  • Reports cluster total, per-node breakdown, control plane overhead, and top containers
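
The report sections listed above could be assembled from two Prometheus instant-query responses roughly as follows. This is only a sketch and not the skill's actual implementation. The response shape follows the Prometheus HTTP API (/api/v1/query with a vector result), while the label names (instance, namespace, container_name) and the namespace prefixes used to identify control plane workloads are assumptions that may not match what Kepler exports.

```python
from collections import defaultdict


def vector_samples(response):
    """Yield (labels, value) pairs from a Prometheus instant-query response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    for item in response["data"]["result"]:
        # Each sample looks like {"metric": {...labels}, "value": [ts, "12.5"]}.
        yield item["metric"], float(item["value"][1])


def power_report(node_resp, container_resp, top_n=3,
                 node_label="instance",           # assumed label name
                 namespace_label="namespace",     # assumed label name
                 name_label="container_name",     # assumed label name
                 control_plane_prefixes=("openshift-", "kube-")):  # assumed
    """Summarise node and container power samples (in watts) into one dict."""
    per_node = defaultdict(float)
    for labels, watts in vector_samples(node_resp):
        # A node may report several series (e.g. one per zone), so sum them.
        per_node[labels.get(node_label, "unknown")] += watts

    containers = []
    control_plane = 0.0
    for labels, watts in vector_samples(container_resp):
        namespace = labels.get(namespace_label, "")
        containers.append((f"{namespace}/{labels.get(name_label, 'unknown')}", watts))
        if namespace.startswith(control_plane_prefixes):
            control_plane += watts

    containers.sort(key=lambda entry: entry[1], reverse=True)
    return {
        "cluster_total_watts": sum(per_node.values()),
        "per_node_watts": dict(per_node),
        "control_plane_watts": control_plane,
        "top_containers": containers[:top_n],
    }
```

The function takes already-fetched JSON so it can be exercised without a cluster. Detecting RAPL versus estimated mode is left out because the description does not say how the skill distinguishes them.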

Documentation

  • docs/kepler/README.md — Setup and usage guide
  • docs/kepler/KEPLER-ARCHITECTURE.md — How Kepler works on TNF clusters
  • docs/kepler/KEPLER-PRESENTATION.md — Demo/presentation walkthrough

Test plan

  • Deploy Kepler on a TNF cluster via make deploy-kepler
  • Verify Kepler exporter pods are running on both nodes
  • Confirm metrics are scraped in Prometheus (2 active targets)
  • Run /tnf-power skill and verify report output
  • Access Grafana dashboard and confirm panels render
  • Remove Kepler via make remove-kepler and verify cleanup
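
The "2 active targets" check in the test plan could be automated along these lines. This is a hedged sketch that inspects a response from the Prometheus /api/v1/targets endpoint. Matching targets by a job label containing "kepler" is an assumption, since the actual job name created by the ServiceMonitor is not given here.

```python
def healthy_targets(targets_response, job_substring="kepler"):
    """Return scrape URLs of active targets that match the job and are up."""
    if targets_response.get("status") != "success":
        raise ValueError("Prometheus targets request failed")
    urls = []
    for target in targets_response["data"]["activeTargets"]:
        job = target.get("labels", {}).get("job", "")
        # Assumption: the Kepler scrape job has "kepler" somewhere in its name.
        if job_substring in job and target.get("health") == "up":
            urls.append(target.get("scrapeUrl", ""))
    return urls


def check_expected_targets(targets_response, expected=2):
    """True when exactly `expected` Kepler targets are healthy (one per node)."""
    return len(healthy_targets(targets_response)) == expected
```

On a TNF cluster the expected count is 2, matching the two exporter pods in the step above.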

@openshift-ci openshift-ci Bot requested review from jaypoulz and jeff-roche March 5, 2026 14:23
@openshift-ci

openshift-ci Bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lucaconsalvi
Once this PR has been reviewed and has the lgtm label, please assign jeff-roche for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lucaconsalvi lucaconsalvi reopened this Apr 29, 2026
@lucaconsalvi lucaconsalvi marked this pull request as draft April 29, 2026 08:21
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Apr 29, 2026
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: dd6b0b97-d2e3-4c26-b4c3-1de3c87d5fce

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

1 participant