e2e: PP: cover ExecCPUAffinity support in tests #1432
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shajmakh. The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files: Approvers can indicate their approval by writing |
|
Depends on #1426 |
6ef4f1a to
beeea3d
565820a to
0af067c
|
regarding /test e2e-gcp-pao-workloadhints |
The GCP cluster profile uses the ipi-gcp flow, which by default uses 6 vCPUs for compute machines (see `step-registry/ipi/conf/gcp/ipi-conf-ref.yaml`). The performance profile suite configures a profile with `reserved: "0"` and `isolated: "1-3"` (see openshift/cluster-node-tuning-operator#909) unless environment variables are specified. In general it is good practice to include all of the node's CPUs in the PP cpu section, but the reason we need this now is that some new tests require most of the CPUs to be distributed through the PP (see openshift/cluster-node-tuning-operator#1432 (comment)). In this commit we start by updating only the affected job on which the test would run; later we will need to add this setting to all other jobs that consume the ipi-gcp cluster configuration. Note: this is subject to change should the CPU specifications on GCP get modified. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
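To make the mismatch described in this commit message concrete, here is a minimal sketch that computes which CPUs of a 6-vCPU node are covered by neither the reserved nor the isolated set. It assumes the vCPUs are numbered 0 through 5 and that the CPU sets use the usual comma-and-range list syntax. The helper names `parse_cpu_list` and `uncovered_cpus` are hypothetical and are not part of the operator or its test suite.

```python
def parse_cpu_list(spec):
    """Parse a CPU list such as "0,2-3" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "1-3" is inclusive on both ends.
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def uncovered_cpus(node_cpu_count, reserved, isolated):
    """Return node CPUs that appear in neither the reserved nor the isolated set.

    Assumes CPU ids run from 0 to node_cpu_count - 1.
    """
    covered = parse_cpu_list(reserved) | parse_cpu_list(isolated)
    return set(range(node_cpu_count)) - covered


# Values quoted in the commit message: 6 vCPUs, reserved "0", isolated "1-3".
print(sorted(uncovered_cpus(6, "0", "1-3")))  # [4, 5]
```

With the default profile, CPUs 4 and 5 are left out of the PP cpu section, which is the gap the job configuration change is meant to close.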
|
/retest |
The GCP cluster profile uses the ipi-gcp flow, which by default uses 6 vCPUs for compute machines (see `step-registry/ipi/conf/gcp/ipi-conf-ref.yaml`). The performance profile suite configures a profile with `reserved: "0"` and `isolated: "1-3"` (see openshift/cluster-node-tuning-operator#909) unless environment variables are specified. In general it is good practice to include all of the node's CPUs in the PP cpu section, but the reason we need this now is that some new tests require most of the CPUs to be distributed through the PP (see openshift/cluster-node-tuning-operator#1432 (comment)). Note: this is subject to change should the CPU specifications on GCP get modified. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
|
when temporarily removed the failing test due to the node topology not aligning with the PP cpu section, |
|
/hold |
0af067c to
41afeca
|
/test e2e-aws-ovn |
|
@shajmakh: The following tests failed, say
Full PR test history. Your PR dashboard. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
41afeca to
acdb51a
Reflect the changes made in Dockerfile.rhel9 in PR openshift#1436 to the Dockerfile used by OKD. OKD nightly tests are failing with:
```
time="2026-01-12T08:38:47Z" level=info msg="Completed image extract for release image \"registry.ci.openshift.org/origin/4.22-okd-scos-2026-01-12-053025@sha256:ff55c198c9fe6b1f64915ad339e2d2185ba20cc887547fdb44f4dfa97f3bbef9\" in 9.859313051s"
error: couldn't retrieve test suites: failed to extract test binaries: encountered errors while extracting binaries: extracted binary at path "/tmp/home/.cache/openshift-tests/registry_ci_openshift_org_origin_release-scos_4_22_0-0_okd-scos-2026-01-12-053025_178412928257/cluster-node-tuning-operator-test-ext.gz" does not exist.
the src path "/usr/bin/cluster-node-tuning-operator-test-ext.gz" doesn't exist in image "registry.ci.openshift.org/origin/4.22-okd-scos-2026-01-12-053025@sha256:ff55c198c9fe6b1f64915ad339e2d2185ba20cc887547fdb44f4dfa97f3bbef9".
note the version of origin needs to match the version of the cluster under test
```
For high-performance configurations, cri-o started supporting the exec-cpu-affinity feature. When it is configured to `first`, exec processes are pinned to the first CPU of the shared CPU set if one is set, or otherwise to the first CPU of the isolated set (see cri-o/cri-o@4dd7fb9). In the performance profile, we want to enable this high-performance feature by default and provide an annotation option to disable it (legacy behavior). The annotation is there only as a fallback in case bugs are reported as a consequence of enabling this feature, and it should be removed in 2 releases. Run `./hack/render-sync.sh` to update the expected outputs of the (no-cluster) e2e tests. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
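The selection rule described in this commit message can be sketched as a small function. This illustrates only the rule as worded here and is not cri-o's implementation. It assumes that "first" means the lowest-numbered CPU in a set, and the function name `exec_cpu_for_first` is hypothetical.

```python
def exec_cpu_for_first(shared_cpus, isolated_cpus):
    """Pick the CPU an exec process would be pinned to under the `first` policy.

    Assumption: "first CPU" means the lowest-numbered CPU in the set.
    Returns None when both sets are empty.
    """
    if shared_cpus:
        # Shared CPUs take precedence whenever they are configured.
        return min(shared_cpus)
    if isolated_cpus:
        # Otherwise fall back to the isolated set.
        return min(isolated_cpus)
    return None


# Shared CPUs are set, so the exec process lands on the first shared CPU.
print(exec_cpu_for_first({4, 5}, {1, 2, 3}))  # 4
# No shared CPUs, so the first isolated CPU is used instead.
print(exec_cpu_for_first(set(), {1, 2, 3}))  # 1
```

An e2e check along these lines would compare the affinity observed for an exec process against the value this rule predicts for the pod's CPU sets.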
u/s CI stops the run after 2 failures of the same test, which prevents a full run of the tests. The fail-fast option is better removed from test runs that do not involve node reboots; that way the CI reflects a full run of the suite, which saves us reruns to see the next failing test. In other words, removing it will show all of the failing tests in one run. Remove this flag initially for the PP suites that are known not to have reboots. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
The annotation tells crun to manage the container's cgroup using the systemd default subgroup behavior. This is meant to reduce nested cgroups, so that the container cgroup is placed on the pod/system slice (the parent) rather than creating a cgroup per container. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
Add main e2e tests that check the behavior of the performance profile with `ExecCPUAffinity: first` and without it (legacy). Signed-off-by: Shereen Haj <shajmakh@redhat.com>
Add unit tests for functions in resources helper package for tests. Assisted-by: Cursor v1.2.2 AI-Attribution: AIA Entirely AI, Human-initiated, Reviewed, Cursor v1.2.2 v1.0 Signed-off-by: Shereen Haj <shajmakh@redhat.com>
Add basic e2e tests that check the default behavior of the performance profile with `ExecCPUAffinity: first` enabled by default.