Important
This repository is not intended for use in production environments. It contains files to evaluate the PSI integration into the descheduler only. For a production-level integration, please monitor the OpenShift documentation to learn when this feature GAs.
PSI metrics expose node-level (or even per-cgroup) pressure. The descheduler can leverage these metrics to rebalance workloads according to the real node usage.
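For context, the kernel exposes PSI as `some`/`full` lines under `/proc/pressure/` (e.g. `/proc/pressure/memory`). The sketch below parses a line in that documented format; the sample values are illustrative, not taken from a real node:

```shell
# Sample line in the kernel's documented PSI format (values are illustrative).
psi_line='some avg10=12.50 avg60=8.30 avg300=5.10 total=123456'

# Extract the 10-second average stall percentage.
avg10=$(echo "$psi_line" | tr ' ' '\n' | sed -n 's/^avg10=//p')
echo "avg10=${avg10}%"
```

On a node with PSI enabled, the same parsing can be applied directly to `cat /proc/pressure/memory`.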
- As a cluster administrator, I want to ensure that all my nodes are equally utilized in order to avoid pressure on individual workloads
- Limited to Virtual Machines running with OpenShift Virtualization
- Limited to worker machine pools (PSI metrics are also needed on master nodes)
Note
- Use a cluster with at least 6 worker nodes
- The cluster should have no other workloads running
Note
Two alternative test scenarios are available; replace `TEST_SCENARIO=1` with `TEST_SCENARIO=2` to switch to the second one.
```console
$ bash to.sh deploy
...
$ bash to.sh apply
...
$ TEST_SCENARIO=1 bash e2e-test.sh
...
```

Running `bash to.sh deploy` deploys all necessary parts.
In this section we look at what exactly gets deployed.
- Reconfiguration of the machine pools to enable PSI metrics at the kernel level and expose them via the `node_exporter`:

  ```shell
  oc apply -f manifests/10-mc-psi-controlplane.yaml
  oc apply -f manifests/11-mc-psi-worker.yaml
  oc apply -f manifests/12-mc-schedstats-worker.yaml
  ```
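On RHCOS the PSI accounting is typically enabled through a kernel argument delivered by a MachineConfig. Purely as a hypothetical sketch of what such a manifest could look like (the shipped manifests 10-12 are the authoritative versions; the name `99-worker-psi` is illustrative):

```shell
# Hypothetical sketch only; see manifests/10-12 for the real definitions.
mc_yaml=$(cat <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-psi
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - psi=1
EOF
)
echo "$mc_yaml"
```

The `psi=1` kernel parameter enables PSI accounting on kernels built with PSI default-disabled.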
- Deploy the Descheduler Operator and OpenShift Virtualization:

  ```shell
  oc apply -f manifests/20-namespaces.yaml
  oc apply -f manifests/30-operatorgroup.yaml
  oc apply -f manifests/31-subscriptions.yaml
  until _oc apply -f manifests/40-cnv-operator-cr.yaml ; do echo -n . ; sleep 6 ; done
  until _oc apply -f manifests/41-descheduler-operator-cr.yaml ; do echo -n . ; sleep 6 ; done
  oc adm policy add-cluster-role-to-user cluster-monitoring-view -z $SA -n $NS  # for descheduler metrics
  ```
- Deploy the node tainting component:

  ```shell
  oc create -n $NS configmap desched-taint --from-file contrib/desched-taint.sh
  oc apply -n $NS -f manifests/50-desched-taint.yaml
  oc adm policy add-cluster-role-to-user system:controller:node-controller -z $SA -n $NS  # for the tainter
  ```
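The actual tainting logic lives in `contrib/desched-taint.sh`. Purely as an illustration of the kind of decision such a script makes, here is a hedged sketch; the threshold, function name, and values are hypothetical and not taken from the repository:

```shell
# Hypothetical sketch: decide whether to taint a node based on the
# 10-second PSI average; the real script in contrib/ is authoritative.
threshold=25

decide_taint() {
  local avg10=$1
  # awk exits 0 (success) when avg10 > threshold, 1 otherwise.
  if awk -v a="$avg10" -v t="$threshold" 'BEGIN { exit !(a > t) }'; then
    echo "taint"
  else
    echo "untaint"
  fi
}

decide_taint 40.0   # high pressure -> taint
decide_taint 3.2    # low pressure  -> untaint
```

A real tainter would then call something like `oc adm taint node <node> <key>=<value>:NoSchedule` (or remove the taint with the trailing `-` form) based on this decision.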
Two dashboards are available for monitoring the descheduler behaviour:

- A Grafana dashboard for the PSI-based load-aware rebalancing profile. See monitoring/README.md for deployment instructions.
- A Perses dashboard focused on the memory-aware aspects: synthetic utilization values, dynamic thresholds, PSI pressure, node classification over time, and evictions.
The stack runs locally in containers and proxies to the remote cluster's Thanos querier, so no cluster-side deployment is required.
Prerequisites: podman (or docker) with compose support, and an active oc login
to the target cluster.
```console
$ cd monitoring/perses
$ KUBECONFIG=/path/to/kubeconfig ./start.sh
```

Open http://localhost:8080 and navigate to Projects → descheduler → Dashboards → memory-aware-rebalancing.
To stop the stack:

```console
$ ./start.sh stop
```

To refresh an expired token (tokens are short-lived), simply re-run `./start.sh`; it regenerates nginx.conf with the new token and restarts the proxy container.
| Section | Panels |
|---|---|
| Utilization | CPU utilization per node + fleet average (★); Memory utilization per node + fleet average |
| Pressure | CPU PSI pressure per node + fleet average; Memory PSI pressure per node + fleet average |
| Synthetic utilization value & thresholds | Per-node descheduler score with dynamic high (red dashed) and low (orange dashed) threshold bands |
| Node classification | StatusHistoryChart — Underutilized / Normal / Overutilized per node over time |
| Evictions | Total counter + per-node time series for KubeVirtRelieveAndMigrate |
| Detailed metrics (collapsed) | CPU & memory pressure and utilization, each per node with fleet average overlaid |
All panels are filterable by node via the Node variable at the top of the dashboard.
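For reference, `node_exporter` exposes PSI through its pressure collector (e.g. `node_pressure_cpu_waiting_seconds_total`), so the pressure panels can be backed by queries in the following spirit. This is a hypothetical sketch; the dashboards' actual queries or recording rules may differ:

```shell
# Hypothetical PromQL for the per-node pressure panels, expressed as
# stall percentage over a 5-minute rate window.
queries=$(cat <<'EOF'
avg by (instance) (rate(node_pressure_cpu_waiting_seconds_total[5m])) * 100
avg by (instance) (rate(node_pressure_memory_waiting_seconds_total[5m])) * 100
EOF
)
echo "$queries"
```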