e2e: PP: cover ExecCPUAffinity support in tests #1432

shajmakh · 2025-11-13T13:45:17Z

Add basic e2e tests that check the default behavior of the performance profile when `ExecCPUAffinity: first` is enabled by default.

openshift-ci · 2025-11-13T13:46:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shajmakh
Once this PR has been reviewed and has the lgtm label, please assign ffromani for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shajmakh · 2025-11-13T15:13:51Z

Depends on #1426.

shajmakh · 2026-01-20T11:14:22Z

Regarding ci/prow/e2e-gcp-pao-updating-profile: the test newly added in this PR fails because the exec process was pinned to the first CPU of the set on every attempt (20 retries), even though the execCPUAffinity feature is disabled.
The test was run locally several times and passed. Looking deeper into the must-gather, the PP cpu config is as follows:
cpu: isolated: 1-3 reserved: "0"
while the test logs show that the exclusive CPUs assigned to the running (guaranteed) container were:
first exclusive CPU: 1, all exclusive CPUs: 1,4
which means CPU 4 is likely offline, leaving only CPU 1 for the process to be pinned to.
Looking at the node's allocatable CPUs:
allocatable: cpu: "5"
which means the PP did not cover the rest of the unreserved CPUs, and that caused misalignment when scheduling a workload.
The investigation is ongoing.
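To make the mismatch above concrete, here is a minimal sketch (not code from this PR) that computes which node CPUs are left out of the PP cpu section. It assumes the node exposes 6 vCPUs numbered 0-5; the helper names `parse_cpuset` and `uncovered_cpus` are hypothetical.

```python
from __future__ import annotations


def parse_cpuset(spec: str) -> set[int]:
    """Parse a Linux-style CPU list such as "0,2-4" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def uncovered_cpus(node_cpu_count: int, reserved: str, isolated: str) -> set[int]:
    """Return node CPUs that appear in neither the reserved nor the isolated set.

    Assumption: node CPUs are numbered 0..node_cpu_count-1.
    """
    all_cpus = set(range(node_cpu_count))
    return all_cpus - parse_cpuset(reserved) - parse_cpuset(isolated)


if __name__ == "__main__":
    # Values taken from the comment above: reserved "0", isolated "1-3".
    print(sorted(uncovered_cpus(6, "0", "1-3")))  # prints [4, 5]
```

With the values from the must-gather, CPUs 4 and 5 fall outside the profile, which matches CPU 4 showing up among the exclusive CPUs handed to the container.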

/test e2e-gcp-pao-workloadhints

The GCP cluster profile uses the ipi-gcp flow, which by default uses 6 vCPUs for compute machines (see `step-registry/ipi/conf/gcp/ipi-conf-ref.yaml`). The performance profile suites configure a profile with `reserved: "0"` and `isolated: "1-3"` (see openshift/cluster-node-tuning-operator#909) unless environment variables are specified. In general it is good practice to include all of the node's CPUs in the PP cpu section, but the reason we need this now is that some new tests require most of the CPUs to be distributed using the PP (see openshift/cluster-node-tuning-operator#1432 (comment)). In this commit we start by updating only the affected job on which the test runs; later we will need to add this setting to all other jobs that consume the ipi-gcp cluster configuration. Note: this is subject to change should the CPU specifications on GCP be modified. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
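As an illustration of what "include all of the node's CPUs in the PP cpu section" could look like, the sketch below derives a reserved/isolated pair that covers every CPU of a node. It is only one possible split (the first CPU reserved, the rest isolated) and is not necessarily the value chosen in the job configuration; `format_cpuset` and `full_node_split` are hypothetical helper names.

```python
def format_cpuset(cpus):
    """Render CPU ids as a compact Linux CPU list, e.g. {1, 2, 3, 5} -> "1-3,5"."""
    ordered = sorted(set(cpus))
    ranges = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the current range while the next id is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            ranges.append(str(ordered[i]))
        else:
            ranges.append(f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(ranges)


def full_node_split(node_cpu_count, reserved_count=1):
    """Split CPUs 0..node_cpu_count-1 into (reserved, isolated) with none left out.

    Assumption: the lowest-numbered CPUs are the reserved ones.
    """
    if not 0 < reserved_count < node_cpu_count:
        raise ValueError("reserved_count must be between 1 and node_cpu_count - 1")
    reserved = range(reserved_count)
    isolated = range(reserved_count, node_cpu_count)
    return format_cpuset(reserved), format_cpuset(isolated)


if __name__ == "__main__":
    # 6 vCPUs is the default compute machine size mentioned above.
    print(full_node_split(6))  # prints ('0', '1-5')
```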

shajmakh · 2026-01-22T07:53:12Z

/retest

shajmakh · 2026-01-22T13:22:18Z

When the failing test was temporarily removed (it fails because the node topology is misaligned with the PP cpu section), the ci/prow/e2e-gcp-pao-updating-profile lane passed. A fix for the infra issue is proposed here: openshift/release#73835

shajmakh · 2026-01-23T10:50:00Z

/hold
until the Prow change is merged

shajmakh · 2026-01-26T10:55:32Z

/test e2e-aws-ovn
/test e2e-aws-operator

openshift-ci · 2026-01-26T13:22:00Z

@shajmakh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-pao | `41afeca` | link | true | `/test e2e-hypershift-pao` |
| ci/prow/e2e-aws-ovn | `41afeca` | link | true | `/test e2e-aws-ovn` |
| ci/prow/e2e-gcp-pao | `41afeca` | link | true | `/test e2e-gcp-pao` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Reflect the changes made in Dockerfile.rhel9 in PR openshift#1436 to the Dockerfile used by OKD. OKD nightly tests are failing with:

```
time="2026-01-12T08:38:47Z" level=info msg="Completed image extract for release image \"registry.ci.openshift.org/origin/4.22-okd-scos-2026-01-12-053025@sha256:ff55c198c9fe6b1f64915ad339e2d2185ba20cc887547fdb44f4dfa97f3bbef9\" in 9.859313051s"
error: couldn't retrieve test suites: failed to extract test binaries: encountered errors while extracting binaries: extracted binary at path "/tmp/home/.cache/openshift-tests/registry_ci_openshift_org_origin_release-scos_4_22_0-0_okd-scos-2026-01-12-053025_178412928257/cluster-node-tuning-operator-test-ext.gz" does not exist. the src path "/usr/bin/cluster-node-tuning-operator-test-ext.gz" doesn't exist in image "registry.ci.openshift.org/origin/4.22-okd-scos-2026-01-12-053025@sha256:ff55c198c9fe6b1f64915ad339e2d2185ba20cc887547fdb44f4dfa97f3bbef9". note the version of origin needs to match the version of the cluster under test
```

For high-performance configurations, cri-o started supporting the exec-cpu-affinity feature. When configured to `first`, it pins exec processes to the first CPU of the shared CPUs if they are set, or otherwise to the first CPU of the isolated set (see cri-o/cri-o@4dd7fb9). In the performance profile we want to enable this high-performance feature by default and provide an annotation option to disable it (legacy behavior). The annotation is there only as a fallback in case bugs are reported as a consequence of enabling this feature, and should be removed in 2 releases. Run `./hack/render-sync.sh` to update the expected outputs of the (no-cluster) e2e tests. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
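The selection rule described in this commit message can be sketched as follows. This is an illustration of the stated behavior, not cri-o's implementation; it assumes that "first" means the lowest-numbered CPU of a set, and `expected_exec_cpu` is a hypothetical name.

```python
def expected_exec_cpu(exclusive_cpus, shared_cpus=None):
    """Return the CPU an exec process is expected to be pinned to under `first`.

    Rule from the commit message: prefer the first shared CPU if shared CPUs
    are set, otherwise use the first CPU of the isolated (exclusive) set.
    Assumption: "first" is the lowest-numbered CPU in the set.
    """
    if shared_cpus:
        return min(shared_cpus)
    if not exclusive_cpus:
        raise ValueError("no shared or exclusive CPUs to pin to")
    return min(exclusive_cpus)


if __name__ == "__main__":
    # Exclusive CPUs as reported in the CI logs quoted in this PR: 1,4.
    print(expected_exec_cpu({1, 4}))          # prints 1
    # With shared CPUs configured, the first shared CPU wins.
    print(expected_exec_cpu({1, 4}, {2, 5}))  # prints 2
```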

u/s CI stops the run after 2 failures of the same test, which prevents a full run of the tests. The fail-fast option is better removed from test runs that do not involve node reboots; that way the CI reflects a full run of the suite, which saves the reruns otherwise needed to see the next failing test. In other words, removing it reports all of the failing tests in one run. Remove this flag initially for the PP suites that are known not to involve reboots. Signed-off-by: Shereen Haj <shajmakh@redhat.com>

The annotation tells crun to manage the container's cgroup using the systemd default subgroup behavior. This is meant to reduce nested cgroups so that the container cgroup is placed on the pod/system slice (the parent) rather than creating a cgroup per container. Signed-off-by: Shereen Haj <shajmakh@redhat.com>

Add main e2e tests that check the behavior of the performance profile with `ExecCPUAffinity: first` and without it (legacy). Signed-off-by: Shereen Haj <shajmakh@redhat.com>
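A rough sketch of the kind of check such tests perform, inferred from the CI discussion in this PR rather than copied from the test code: with `first` enabled every exec is expected to land on the first CPU, while in legacy mode at least one of the retries is expected to land elsewhere. The function name and the exact pass criteria are assumptions.

```python
def check_exec_pinning(observed_cpus, first_cpu, affinity_first):
    """Evaluate the CPUs observed for repeated exec processes.

    observed_cpus: the CPU each exec attempt was pinned to (one per retry).
    first_cpu: the first CPU of the container's CPU set.
    affinity_first: True when ExecCPUAffinity is `first`, False for legacy.
    """
    if not observed_cpus:
        raise ValueError("at least one observation is required")
    if affinity_first:
        # Every exec process must be pinned to the first CPU.
        return all(cpu == first_cpu for cpu in observed_cpus)
    # Legacy: the exec process should not always land on the first CPU.
    return any(cpu != first_cpu for cpu in observed_cpus)


if __name__ == "__main__":
    # Feature enabled, 20 retries all on CPU 1: passes.
    print(check_exec_pinning([1] * 20, 1, affinity_first=True))   # prints True
    # Feature disabled, yet 20 retries all on CPU 1: the failure seen in CI.
    print(check_exec_pinning([1] * 20, 1, affinity_first=False))  # prints False
```

The second call mirrors the e2e-gcp-pao-updating-profile failure: with only one usable exclusive CPU, the legacy check can never observe a different CPU.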

Add unit tests for functions in resources helper package for tests. Assisted-by: Cursor v1.2.2 AI-Attribution: AIA Entirely AI, Human-initiated, Reviewed, Cursor v1.2.2 v1.0 Signed-off-by: Shereen Haj <shajmakh@redhat.com>

openshift-ci bot requested review from MarSik and swatisehgal November 13, 2025 13:46

shajmakh mentioned this pull request Nov 13, 2025

perfprof: enable exec-cpu-affinity by default (annotation) #1426

Open

shajmakh force-pushed the exec-affinity-pp-e2e branch 3 times, most recently from 6ef4f1a to beeea3d Compare November 18, 2025 10:34

shajmakh force-pushed the exec-affinity-pp-e2e branch 4 times, most recently from 565820a to 0af067c Compare January 20, 2026 11:14

shajmakh mentioned this pull request Jan 22, 2026

telco: PP: configure the isolated and reserved cpus on gcp openshift/release#73835

Merged

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2026

shajmakh force-pushed the exec-affinity-pp-e2e branch 2 times, most recently from 0af067c to 41afeca Compare January 26, 2026 06:56

shajmakh force-pushed the exec-affinity-pp-e2e branch from 41afeca to acdb51a Compare January 27, 2026 10:03

Prashanth684 and others added 3 commits January 27, 2026 12:03

shajmakh added 3 commits January 27, 2026 12:03

e2e: PP: cover ExecCPUAffinity support in tests

981aab7

Add main e2e tests that check the behavior of the performance profile with `ExecCPUAffinity: first` and without it (legacy). Signed-off-by: Shereen Haj <shajmakh@redhat.com>

PP: e2e utils:add unit tests

acdb51a

Add unit tests for functions in resources helper package for tests. Assisted-by: Cursor v1.2.2 AI-Attribution: AIA Entirely AI, Human-initiated, Reviewed, Cursor v1.2.2 v1.0 Signed-off-by: Shereen Haj <shajmakh@redhat.com>
