Important
This repository is not intended for use in production environments. It contains files to evaluate the PSI integration into the descheduler only. For a production-level integration, please monitor the OpenShift documentation to learn when this feature GAs.
PSI metrics expose node-level (or even per-cgroup) pressure. The descheduler can leverage these metrics to rebalance workloads according to the real node usage.
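For context, the kernel exposes PSI as `some`/`full` lines under `/proc/pressure/` (e.g. `/proc/pressure/memory`). The sketch below parses a line in that documented format; the sample values are illustrative, not taken from a real node:

```shell
# Sample line in the kernel's documented PSI format (values are illustrative).
psi_line='some avg10=12.50 avg60=8.30 avg300=5.10 total=123456'

# Extract the 10-second average stall percentage.
avg10=$(echo "$psi_line" | tr ' ' '\n' | sed -n 's/^avg10=//p')
echo "avg10=${avg10}%"
```

On a node with PSI enabled, the same parsing can be applied directly to `cat /proc/pressure/memory`.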
- As a cluster administrator, I want to ensure that all my nodes are equally utilized in order to avoid pressure on individual workloads
- Limited to Virtual Machines running with OpenShift Virtualization
- Limited to worker machine pools (PSI metrics are also needed on master nodes)
Note
- Use a cluster with at least 6 worker nodes
- The cluster should have no other workloads running
Note
Two alternative test scenarios are available; replace `TEST_SCENARIO=1` with `TEST_SCENARIO=2` to switch to the second one.
```console
$ bash to.sh deploy
...
$ bash to.sh apply
...
$ TEST_SCENARIO=1 bash e2e-test.sh
...
```

Running `bash to.sh deploy` deploys all necessary parts.
In this section we look at what exactly gets deployed.
- Reconfiguration of the machine pools to enable PSI metrics at the kernel level and expose them via the `node_exporter`:

  ```shell
  oc apply -f manifests/10-mc-psi-controlplane.yaml
  oc apply -f manifests/11-mc-psi-worker.yaml
  oc apply -f manifests/12-mc-schedstats-worker.yaml
  ```
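On RHCOS the PSI accounting is typically enabled through a kernel argument delivered by a MachineConfig. Purely as a hypothetical sketch of what such a manifest could look like (the shipped manifests 10-12 are the authoritative versions; the name `99-worker-psi` is illustrative):

```shell
# Hypothetical sketch only; see manifests/10-12 for the real definitions.
mc_yaml=$(cat <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-psi
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - psi=1
EOF
)
echo "$mc_yaml"
```

The `psi=1` kernel parameter enables PSI accounting on kernels built with PSI default-disabled.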
- Deploy the Descheduler Operator and OpenShift Virtualization:

  ```shell
  oc apply -f manifests/20-namespaces.yaml
  oc apply -f manifests/30-operatorgroup.yaml
  oc apply -f manifests/31-subscriptions.yaml
  until _oc apply -f manifests/40-cnv-operator-cr.yaml ; do echo -n . ; sleep 6 ; done
  until _oc apply -f manifests/41-descheduler-operator-cr.yaml ; do echo -n . ; sleep 6 ; done
  oc adm policy add-cluster-role-to-user cluster-monitoring-view -z $SA -n $NS  # for descheduler metrics
  ```
- Deploy the node tainting component:

  ```shell
  oc create -n $NS configmap desched-taint --from-file contrib/desched-taint.sh
  oc apply -n $NS -f manifests/50-desched-taint.yaml
  oc adm policy add-cluster-role-to-user system:controller:node-controller -z $SA -n $NS  # for the tainter
  ```
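The actual tainting logic lives in `contrib/desched-taint.sh`. Purely as an illustration of the kind of decision such a script makes, here is a hedged sketch; the threshold, function name, and values are hypothetical and not taken from the repository:

```shell
# Hypothetical sketch: decide whether to taint a node based on the
# 10-second PSI average; the real script in contrib/ is authoritative.
threshold=25

decide_taint() {
  local avg10=$1
  # awk exits 0 (success) when avg10 > threshold, 1 otherwise.
  if awk -v a="$avg10" -v t="$threshold" 'BEGIN { exit !(a > t) }'; then
    echo "taint"
  else
    echo "untaint"
  fi
}

decide_taint 40.0   # high pressure -> taint
decide_taint 3.2    # low pressure  -> untaint
```

A real tainter would then call something like `oc adm taint node <node> <key>=<value>:NoSchedule` (or remove the taint with the trailing `-` form) based on this decision.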
Two dashboards are available for monitoring the descheduler behaviour:

- A Grafana dashboard for the PSI-based load-aware rebalancing profile. See monitoring/README.md for deployment instructions.
- A Perses dashboard focused on the memory-aware aspects: synthetic utilization values, dynamic thresholds, PSI pressure, node classification over time, and evictions.
The stack runs locally in containers and proxies to the remote cluster's Thanos querier, so no cluster-side deployment is required.
Prerequisites: podman (or docker) with compose support, and an active oc login
to the target cluster.
```console
$ cd monitoring/perses
$ KUBECONFIG=/path/to/kubeconfig ./start.sh
```

Open http://localhost:8080 and navigate to Projects → descheduler → Dashboards → memory-aware-rebalancing.
To stop the stack:

```console
$ ./start.sh stop
```

To refresh an expired token (tokens are short-lived), simply re-run `./start.sh`; it regenerates nginx.conf with the new token and restarts the proxy container.
| Section | Panels |
|---|---|
| Utilization | CPU utilization per node + fleet average (★); Memory utilization per node + fleet average |
| Pressure | CPU PSI pressure per node + fleet average; Memory PSI pressure per node + fleet average |
| Synthetic utilization value & thresholds | Per-node descheduler score with dynamic high (red dashed) and low (orange dashed) threshold bands |
| Node classification | StatusHistoryChart — Underutilized / Normal / Overutilized per node over time |
| Evictions | Total counter + per-node time series for KubeVirtRelieveAndMigrate |
| Detailed metrics (collapsed) | CPU & memory pressure and utilization, each per node with fleet average overlaid |
All panels are filterable by node via the Node variable at the top of the dashboard.
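For reference, `node_exporter` exposes PSI through its pressure collector (e.g. `node_pressure_cpu_waiting_seconds_total`), so the pressure panels can be backed by queries in the following spirit. This is a hypothetical sketch; the dashboards' actual queries or recording rules may differ:

```shell
# Hypothetical PromQL for the per-node pressure panels, expressed as
# stall percentage over a 5-minute rate window.
queries=$(cat <<'EOF'
avg by (instance) (rate(node_pressure_cpu_waiting_seconds_total[5m])) * 100
avg by (instance) (rate(node_pressure_memory_waiting_seconds_total[5m])) * 100
EOF
)
echo "$queries"
```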