From 8cf0de936fb4f29939201c3e69e883c95b10eba9 Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Sun, 4 Jan 2026 15:12:34 +0200 Subject: [PATCH 1/8] [IUO] multi-arch STP Signed-off-by: Harel Meir --- .../sig-iuo/golden_image_multiarch_support.md | 308 ++++++++++++++++++ 1 file changed, 308 insertions(+) create mode 100644 stps/sig-iuo/golden_image_multiarch_support.md diff --git a/stps/sig-iuo/golden_image_multiarch_support.md b/stps/sig-iuo/golden_image_multiarch_support.md new file mode 100644 index 0000000..00f42ec --- /dev/null +++ b/stps/sig-iuo/golden_image_multiarch_support.md @@ -0,0 +1,308 @@ +# Openshift-virtualization-tests Test plan + +## **Golden Images Support For Heterogeneous Clusters - Quality Engineering Plan** + +### **Metadata & Tracking** + +| Field | Details | +|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------| +| **Enhancement(s)** | [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) | +| **Feature in Jira** | [VIRTSTRAT-494](https://issues.redhat.com/browse/VIRTSTRAT-494) | +| **Jira Tracking** | [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | +| **QE Owner(s)** | Harel Meir | +| **Owning SIG** | sig-iuo | +| **Participating SIGs** | sig-infra, sig-storage, sig-virt | +| **Current Status** | Draft | + +--- + +### **I. Motivation and Requirements Review (QE Review Guidelines)** +This section documents the mandatory QE review process. The goal is to understand the feature's value, technology, and testability prior to formal test planning. + +#### **1. Requirement & User Story Review Checklist** + +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| +| **Review Requirements** | [V] | Reviewed the relevant requirements. | | +| **Understand Value** | [V] | Confirmed that user stories are clear and understood. Understand the difference between U/S and D/S requirements. **What is the value of the feature for RH customers**. | | +| **Customer Use Cases** | [V] | Ensured requirements contain relevant **customer use cases**. | | +| **Testability** | [V] | Confirmed requirements are **testable and unambiguous**. | | +| **Acceptance Criteria** | [v] | Ensured acceptance criteria are **defined clearly** (clear user stories; D/S requirements clearly defined in Jira). | | +| **Non-Functional Requirements (NFRs)** | [V] | Confirmed coverage for NFRs, including Performance, Security, Usability, Downtime, Connectivity, Monitoring (alerts/metrics), Scalability, Portability (e.g., cloud support), and Docs. | | + +#### Overview + +Golden images are commonly used OS boot disk images that are used to create virtual machines (VMs) in a Kubernetes +cluster. Their purpose is to ensure these images are automatically available and kept up-to-date. The original design of +the golden images is documented in the [kubevirt/community repository](https://github.com/kubevirt/community/blob/69d061862e0839608932d225a728a7a6e7a89f29/design-proposals/golden-image-delivery-and-update-pipeline.md). + +The initial design assumed homogeneous clusters, where all nodes in the cluster share the same architecture. However, as +there is a need to support heterogeneous clusters, where nodes may have different architectures (e.g., `arm64`, `amd64`, +`s390x`), this assumption does not apply anymore, and some changes are required to support this use-case. + +#### Motivation + +The high level flow of the golden images is as follows: + +1. The HyperConverged Cluster Operator (HCO) image, includes predefined DataImportCronTemplate files. HCO generates + a list of `DataImportCronTemplate` objects in the `SSP` CR, based on these files. +2. SSP creates `DataImportCron` CRs from the `DataImportCronTemplate` objects in the `SSP` CR. +3. Either SSP or CDI create a `DataSource` CRs based on the `DataImportCron` CRs. 
+ The CDI monitors the `DataImportCron` CRs, and ensures the corresponding `DataSource` CRs are updated as needed. +4. CDI checks the image URL or the ImageStream periodically (according to the cron expression in the + `DataImportCron` CR), and if the image is updated, it creates a new `VolumeSnapshot` (or a `PVC`), imports the + latest version of the requested image into this new `VolumeSnapshot`/`PVC`, and modifies the `DataSource` CR to point + to the latest `VolumeSnapshot`/`PVC`. + + To perform the actual import, CDI creates a `DataVolume` CR, from the `spec.template` field in the `DataImportCron` + CR. +5. When creating a VM, users set the VM's `spec.dataVolumeTemplate` field, to point to the desired `DataSource` CR. + CDI creates a new `PVC`, and clones the `VolumeSnapshot`/`PVC` that is referenced by the `DataSource`, + into this `PVC`. + +Cluster administrators can create custom golden images, by adding `DataImportCronTemplate` objects to the +`HyperConverged` CR. The HCO adds these custom templates to the `SSP` custom resource, initiating the same workflow. + +**Technology Challenges** +The current design assumes homogeneous clusters, where all nodes in the cluster share the same architecture. However, +heterogeneous clusters with nodes of different architectures (e.g., `arm64`, `amd64`, `s390x`) introduce challenges: + +* CDI-importer may pick wrong arch image to pull: +The predefined `DataImportCronTemplate` already configured with multi-architecture images (image manifests pointing to +multiple architecture-specific images). However, when the cdi-importer pulls an image, it selects one suitable for the +node's architecture, which may not match the architecture of the VM being created. For example, pulling an `arm64` +image for a VM running on an `amd64` node will cause the VM to fail. + +* Custom templates arch specific +Users may need to create VMs with specific architectures to run architecture-specific applications. 
For that, they +will want to create a custom golden image with a specific architecture, and use it to create VMs with the same +architecture. This use-case is not supported by the current design. + +##### HCO part + +#### User stories + +1. As a VM creator, I want to create virtual machines with specific architectures to run architecture-specific applications. +2. As a cluster administrator, I want to define custom golden images with multi-architecture support, enabling VM creators to deploy VMs with the desired architecture. +3. As a VM creator, I want my existing tools and script will continue to work as before, without any changes. + + +#### **2. Technology and Design Review** + +| Check | Done | Details/Notes | Comments | +|:---------------------------------|:-----|:----------------------------------------------------------------------------------|:-------------------------| +| **Developer Handoff/QE Kickoff** | [V] | Met with Nahshon from HCO team. | | +| **Technology Challenges** | [V] | CDI-importer picking the correct image arch, custom templates arch specific | | +| **Test Environment Needs** | [V] | HA cluster for simple functionality testing, MultiArch cluster for full coverage. | Jenkins deploy job exist | +| **API Extensions** | [V] | HCO nodeInfo, dataImportCronTemplates | | +| **Topology Considerations** | [V] | ARM64 and S390X vms are supported since cnv-4.19 | | + + + + +### **II. Software Test Plan (STP)** + +This STP serves as the **overall roadmap for testing**, detailing the scope, approach, resources, and schedule. + +#### **1. Scope of Testing** + +This testplan covers FG activation, propagation to other components and new alerts & metrics tested. + +**Document Conventions (if applicable):** +- **MultiArch cluster** - heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1/2 arm64 worker nodes. +- **HA cluster** - homogenous cluster with 3 control-plane and 3 amd64 worker node. +- **Archs** - architectures. 
+- **Related resources** - golden images accosiated DataImportCron,DataSource,DV/VMsnapshot CRs. +- **FG** - enableMultiArchBootImageImport FeatureGate in HCO CR enabling this feature. + +**In Scope:** +##### Functional Testing +- HCO monitors the cluster's nodes architectures correctly. +- HCO dataImportCronTemplates annotations listing the correct architectures and propagates to SSP. +- Correct resources created and ready to use for common golden images per arch. +- Correct resources created and ready to use for custom golden images per arch. + +##### Non-Functional Testing +**Regression Testing** +- spec.workloads.nodePlacement should determine the workload nodes architecture. +- spec.workloads.nodePlacement of not existing arch. + +**Backward Compatibility Testing** +- Legacy Datasource points to the arch specific Datasource + +**Upgrade testing** +- Verify that resources names preserved as expected after upgrade. + + +#### **2. Testing Goals** + +Define specific, measurable testing objectives for this feature, such as: + +- [ ] Achieve 100% feature coverage for core functionality. +- [x] Validate feature enablement until related resources created as expected. +- [x] Verify backward compatibility with legacy dataSources. +- [x] Verify NodePlacement. +- [x] Verify negative scenarios. +- [x] Verify new alerts & metrics +- [x] Verify related resources names preserved as expected after upgrade. +- [ ] Automate 100% of functional tests. + + +#### **3. Non-Goals (Testing Scope Exclusions)** + +Explicitly document what is **out of scope** for testing. **Critical:** All non-goals require explicit stakeholder agreement to prevent "I assumed you were testing that" issues. 
+ + +| Non-Goal | Rationale | PM/ Lead Agreement | +|:---------------------------------|:------------------------------------------------------------------|:-------------------| +| Update existing VM | If a VM already running, it won't use the arch-specific resources | [ ] Name/Date | +| Performance Testing | Feature not related to scale | [ ] Name/Date | +| Security Testing | Feature not security related | [ ] Name/Date | +| Usability testing | Should be done by UI team | [ ] Name/Date | +| Compatibility | Should be done by Virt/SSP team(create vms from multiple archs) | [ ] Name/Date | +| Templates creation & utilization | Should be done by SSP team | [ ] Name/Date | +| Import & datasource new API | Should be done by Storage team | [ ] Name/Date | + + +#### **4. Test Strategy** + +##### **A. Types of Testing** + +**Note:** Mark "Y" if applicable, "N/A" if not applicable (with justification in Comments). Empty cells indicate incomplete review. + +| Item (Testing Type) | Applicable (Y/N or N/A) | Comments | +|:-------------------------------|:------------------------|:-------------------------------| +| Functional Testing | Yes | Defined above | +| Automation Testing | Yes | All tests should be automated. | +| Performance Testing | N/A | | +| Security Testing | N/A | | +| Usability Testing | N/A | | +| Compatibility Testing | N/A | | +| Regression Testing | Yes | Defined above | +| Upgrade Testing | Yes | Defined above | +| Backward Compatibility Testing | Yes | Defined above | + +##### **B. Potential Areas to Consider** + +**Note:** Mark "Y" if applicable, "N/A" if not applicable (with justification in Comment). Empty cells indicate incomplete review. 
+ +| Item | Description | Applicable (Y/N or N/A) | Comment | +|:-----------------------|:-------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------| +| **Dependencies** | Dependent on deliverables from other components/products? Identify what is tested by which team. | N | | +| **Monitoring** | Does the feature require metrics and/or alerts? | Y | Two new alerts & metrics added | +| **Cross Integrations** | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | SSP, storage, virt, upgrade | +| **UI** | Does the feature require UI? If so, ensure the UI aligns with the requirements. | Y | | + +**Dependencies** + + +###### Dependencies/Cross Integration +Since it's a cross-team feature for CNV, the following should be tested: +**IUO** +- Testing HCO monitors new nodes architectures correctly. +- Testing FG activation and propagation to CNV components. +- Verifies new metrics & alerts. +- Verifies upgrade. + +**SSP** +- Templates creation & utilization +- Verify new SSP API. + +**Storage** +- Verify that cdi-importer doing so successfully. +- Verify Legacy datasource's backwards compatibility. +- etc + +**Virt** +- Verify VMs picked correctly the appropriate node by arch. +- Verify VMs migrated to nodes with the same arch. +- Verify upgrade. +- etc + +#### **5. Test Environment** + +**Note:** "N/A" means explicitly not applicable. Cannot leave empty cells. + +| Environment Component | Configuration | Comments | +|:----------------------------------------------|:-------------------------|:---------------------------------------------------------------| +| **Cluster Topology** | MultiArch cluster | 3 control-plan and 3/4 worker nodes. | +| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and openshift-virtualization 4.21. 
| +| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1/2 arm64 workers. | +| **Compute Resources** | N/A | | +| **Special Hardware** | | | +| **Storage** | io2-csi storage class | | +| **Network** | OVN-Kubernetes (default) | No network requirements | +| **Required Operators** | | | +| **Platform** | AWS | | +| **Special Configurations** | | | + +#### **5.5. Testing Tools & Frameworks** + +Document any **new or additional** testing tools, frameworks, or infrastructure required specifically for this feature. + +**Note:** Only list tools that are **new** or **different** from standard testing infrastructure. Leave empty if using standard tools. + +| Category | Tools/Frameworks | +|:-------------------|:------------------------------| +| **Test Framework** | MultiArch cluster, HA cluster | +| **CI/CD** | Jenkins pipeline | +| **Other Tools** | | + +#### **6. Entry Criteria** + +The following conditions must be met before testing can begin: + +- [ ] Requirements and design documents are **approved and merged** +- [ ] Test environment can be **set up and configured** +- [ ] Multi-CPU architecture support is enabled in the openshift-virtualization repo +- [ ] [Add feature-specific entry criteria as needed] + +#### **7. Risks and Limitations** + +Document specific risks and limitations for this feature. If a risk category is not applicable, mark as "N/A" with justification in mitigation strategy. + +**Note:** Empty "Specific Risk" cells mean this must be filled. "N/A" means explicitly not applicable with justification. 
+ +| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | +|:-----------------------------------|:--------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:-------| +| Timeline/Schedule | Code-Freeze in 2 weeks | prioritize this week | [ ] | +| Test Coverage | Cannot perform automation testing until virt team enable multi-cpu architecture | | [ ] | +| Test Environment | Cannot test upgrade well until additional ARM64 worker will be added by Devops | Try to do so manually | [ ] | +| Untestable Aspects | N/A | | [ ] | +| Resource Constraints | Feature includes almost all CNV teams | [Your mitigation, e.g., "Focus automation on critical paths, coordinate with dev for testing"] | [ ] | +| Dependencies | SSP, Storage and Virt team finishing their part | Sync | [ ] | +| Blocker Bug for legacy DataSources | CNV-75762 | Already fixed, Storage QE need to verify | [ ] | + +#### **8. Known Limitations** + +Document any known limitations, constraints, or trade-offs in the feature implementation or testing approach. + +--- + +### **III. Test Scenarios & Traceability** + +This section links requirements to test coverage, enabling reviewers to verify all requirements are tested. + +| Requirement ID | Requirement Summary | Test Scenario(s) | Test Type(s) | Priority | +|:---------------|:----------------------|:-----------------------------------------------------------------------------|:-----------------------|:---------| +| [Jira-xxx] | As a user... | HCO listing supported archs correctly | Functional | P0 | +| [Jira-xxx] | As an admin... | HCO annotating dataImportCronTemplates with supported archs | Functional | P0 | +| [Jira-xxx] | NFR-2 (Security) | Correct resources created and ready to use for common golden images per arch | Functional | P1 | +| [Jira-xxx] | As a cluster admin... 
| Correct resources created and ready to use for custom golden images per arch | Functional | P1 | +| [Jira-xxx] | As a cluster admin... | nodePlacement should determine the workload nodes architecture | Regression | P1 | +| [Jira-xxx] | As a cluster admin... | nodePlacement of not existing arch | Regression | P1 | +| [Jira-xxx] | As a cluster admin... | Legacy Datasource points to the arch specific Datasource | Backward Compatibility | P1 | +| [Jira-xxx] | As a cluster admin... | related resources names preserved as expected after upgrade | Upgrade | P1 | + +--- + +### **IV. Sign-off and Approval** + +This Software Test Plan requires approval from the following stakeholders: + +* **Reviewers:** + - [Name / @github-username] + - [Name / @github-username] +* **Approvers:** + - [Name / @github-username] + - [Name / @github-username] From b618e3e2f1652c3f20efb64b459f9e8759f1d387 Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Mon, 5 Jan 2026 11:50:19 +0200 Subject: [PATCH 2/8] Added monitoring and focused on HCO Signed-off-by: Harel Meir --- .../sig-iuo/golden_image_multiarch_support.md | 326 +++++++++++------- 1 file changed, 205 insertions(+), 121 deletions(-) diff --git a/stps/sig-iuo/golden_image_multiarch_support.md b/stps/sig-iuo/golden_image_multiarch_support.md index 00f42ec..cc9a253 100644 --- a/stps/sig-iuo/golden_image_multiarch_support.md +++ b/stps/sig-iuo/golden_image_multiarch_support.md @@ -17,6 +17,7 @@ --- ### **I. Motivation and Requirements Review (QE Review Guidelines)** + This section documents the mandatory QE review process. The goal is to understand the feature's value, technology, and testability prior to formal test planning. #### **1. Requirement & User Story Review Checklist** @@ -30,62 +31,46 @@ This section documents the mandatory QE review process. The goal is to understan | **Acceptance Criteria** | [v] | Ensured acceptance criteria are **defined clearly** (clear user stories; D/S requirements clearly defined in Jira). 
| | | **Non-Functional Requirements (NFRs)** | [V] | Confirmed coverage for NFRs, including Performance, Security, Usability, Downtime, Connectivity, Monitoring (alerts/metrics), Scalability, Portability (e.g., cloud support), and Docs. | | -#### Overview +--- -Golden images are commonly used OS boot disk images that are used to create virtual machines (VMs) in a Kubernetes -cluster. Their purpose is to ensure these images are automatically available and kept up-to-date. The original design of -the golden images is documented in the [kubevirt/community repository](https://github.com/kubevirt/community/blob/69d061862e0839608932d225a728a7a6e7a89f29/design-proposals/golden-image-delivery-and-update-pipeline.md). +##### **Background** -The initial design assumed homogeneous clusters, where all nodes in the cluster share the same architecture. However, as -there is a need to support heterogeneous clusters, where nodes may have different architectures (e.g., `arm64`, `amd64`, -`s390x`), this assumption does not apply anymore, and some changes are required to support this use-case. +**What are Golden Images?** -#### Motivation +Golden images are pre-configured OS boot disk images used to create virtual machines (VMs) in OpenShift Virtualization. They serve as ready-to-use templates that are automatically available and kept up-to-date, eliminating the need for users to manually configure OS images for each VM. -The high level flow of the golden images is as follows: +The original golden images design (documented in the [kubevirt/community repository](https://github.com/kubevirt/community/blob/69d061862e0839608932d225a728a7a6e7a89f29/design-proposals/golden-image-delivery-and-update-pipeline.md)) was built for **homogeneous clusters** where all nodes share the same CPU architecture. -1. The HyperConverged Cluster Operator (HCO) image, includes predefined DataImportCronTemplate files. 
HCO generates - a list of `DataImportCronTemplate` objects in the `SSP` CR, based on these files. -2. SSP creates `DataImportCron` CRs from the `DataImportCronTemplate` objects in the `SSP` CR. -3. Either SSP or CDI create a `DataSource` CRs based on the `DataImportCron` CRs. - The CDI monitors the `DataImportCron` CRs, and ensures the corresponding `DataSource` CRs are updated as needed. -4. CDI checks the image URL or the ImageStream periodically (according to the cron expression in the - `DataImportCron` CR), and if the image is updated, it creates a new `VolumeSnapshot` (or a `PVC`), imports the - latest version of the requested image into this new `VolumeSnapshot`/`PVC`, and modifies the `DataSource` CR to point - to the latest `VolumeSnapshot`/`PVC`. - - To perform the actual import, CDI creates a `DataVolume` CR, from the `spec.template` field in the `DataImportCron` - CR. -5. When creating a VM, users set the VM's `spec.dataVolumeTemplate` field, to point to the desired `DataSource` CR. - CDI creates a new `PVC`, and clones the `VolumeSnapshot`/`PVC` that is referenced by the `DataSource`, - into this `PVC`. +**The Problem: Heterogeneous Clusters** -Cluster administrators can create custom golden images, by adding `DataImportCronTemplate` objects to the -`HyperConverged` CR. The HCO adds these custom templates to the `SSP` custom resource, initiating the same workflow. +Modern Kubernetes clusters increasingly run nodes with different CPU architectures (e.g., `amd64`, `arm64`, `s390x`). The current golden images implementation does not support this: -**Technology Challenges** -The current design assumes homogeneous clusters, where all nodes in the cluster share the same architecture. However, -heterogeneous clusters with nodes of different architectures (e.g., `arm64`, `amd64`, `s390x`) introduce challenges: +1. 
**Wrong architecture image selection**: When the CDI-importer pulls a multi-architecture image, it selects the image variant matching the *importer pod's node architecture*, not the target VM's architecture. For example, if the importer runs on an `arm64` node but the VM will run on `amd64`, the wrong image is imported and the VM fails to boot. -* CDI-importer may pick wrong arch image to pull: -The predefined `DataImportCronTemplate` already configured with multi-architecture images (image manifests pointing to -multiple architecture-specific images). However, when the cdi-importer pulls an image, it selects one suitable for the -node's architecture, which may not match the architecture of the VM being created. For example, pulling an `arm64` -image for a VM running on an `amd64` node will cause the VM to fail. +2. **No custom multi-arch support**: Cluster administrators cannot define custom golden images that work across multiple architectures. -* Custom templates arch specific -Users may need to create VMs with specific architectures to run architecture-specific applications. For that, they -will want to create a custom golden image with a specific architecture, and use it to create VMs with the same -architecture. This use-case is not supported by the current design. +--- -##### HCO part +##### **Motivation** -#### User stories +This feature enables OpenShift Virtualization to properly manage golden images in heterogeneous clusters by: -1. As a VM creator, I want to create virtual machines with specific architectures to run architecture-specific applications. -2. As a cluster administrator, I want to define custom golden images with multi-architecture support, enabling VM creators to deploy VMs with the desired architecture. -3. As a VM creator, I want my existing tools and script will continue to work as before, without any changes. 
+- Tracking which CPU architectures exist in the cluster +- Creating architecture-specific `DataSource` resources for each golden image +- Ensuring VMs boot from images matching their target architecture +- Maintaining backward compatibility with existing tooling and scripts + +--- +##### **User Stories** + +| ID | User Story | +|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| US-1 | As a **VM creator**, I want to create virtual machines with specific CPU architectures to run architecture-specific applications. | +| US-2 | As a **cluster administrator**, I want to define custom golden images with multi-architecture support, so VM creators can deploy VMs on any supported architecture. | +| US-3 | As a **VM creator**, I want my existing tools and scripts to continue working without modification after this feature is enabled. | + +--- #### **2. Technology and Design Review** @@ -94,44 +79,149 @@ architecture. This use-case is not supported by the current design. | **Developer Handoff/QE Kickoff** | [V] | Met with Nahshon from HCO team. | | | **Technology Challenges** | [V] | CDI-importer picking the correct image arch, custom templates arch specific | | | **Test Environment Needs** | [V] | HA cluster for simple functionality testing, MultiArch cluster for full coverage. 
| Jenkins deploy job exist | -| **API Extensions** | [V] | HCO nodeInfo, dataImportCronTemplates | | -| **Topology Considerations** | [V] | ARM64 and S390X vms are supported since cnv-4.19 | | +| **API Extensions** | [V] | HCO `status.nodeInfo`, `status.dataImportCronTemplates` with new fields | | +| **Topology Considerations** | [V] | ARM64 and S390X VMs are supported since CNV-4.19 | | + +#### **HCO Operator Changes** + +📖 **Feature Documentation:** [Golden Images in Heterogeneous Clusters](https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/virtualization/advanced-vm-creation#virt-golden-image-heterogeneous-clusters) +The HyperConverged Cluster Operator (HCO) is highly affected by this feature. The following changes are introduced: + +##### **New Feature Gate** + +| Item | Details | +|:------------------|:---------------------------------------------------------------------------| +| **Feature Gate** | `enableMultiArchBootImageImport` | +| **Default Value** | `false` (disabled) | +| **Purpose** | Enables heterogeneous cluster support for golden images when set to `true` | + +##### **New Status Fields** + +HCO `status` is extended with a `nodeInfo` field that tracks cluster node architectures. 
For example: + +```yaml +status: + nodeInfo: + controlPlaneArchitectures: + - amd64 + workloadsArchitectures: + - amd64 + - arm64 +``` + +| Field | Description | +|:----------------------------|:---------------------------------------------------------| +| `controlPlaneArchitectures` | List of CPU architectures present on control-plane nodes | +| `workloadsArchitectures` | List of CPU architectures present on workload nodes | + +##### **DataImportCronTemplates Enhancements** + +Each `DataImportCronTemplate` in the HCO status is enhanced with: +| Field/Annotation | Description | +|:-------------------------------------|:--------------------------------------------------------------------------------------------------------| +| `ssp.kubevirt.io/dict.architectures` | Annotation listing architectures supported by the image (filtered to cluster's available architectures) | +| `originalSupportedArchitectures` | Status field showing original architectures from the image manifest | +| `conditions` | Status conditions for issues (e.g., `UnsupportedArchitectures`) | +Example status with an unsupported architecture: + +```yaml +status: + dataImportCronTemplates: + - metadata: + name: centos-stream10-image-cron + annotations: + ssp.kubevirt.io/dict.architectures: "amd64" + status: + originalSupportedArchitectures: "amd64,arm64,s390x" + conditions: + - type: Deployed + status: "False" + reason: UnsupportedArchitectures + message: "DataImportCronTemplate has no supported architectures for the current cluster" +``` + +##### **Node Architecture Tracking** + +| Behavior | Description | +|:----------------------------|:-----------------------------------------------------------------------------------------------| +| **Workload Node Detection** | By default, nodes labeled with `node-role.kubernetes.io/worker` | +| **Custom Node Placement** | If `spec.workloads.nodePlacement` is set, HCO uses it to determine workload nodes | +| **Control-Plane Detection** | Nodes labeled with 
`node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` | +| **Dynamic Updates** | Architecture lists are updated automatically as nodes are added/removed | + +##### **Architecture Filtering** + +When the feature gate is enabled, HCO: + +1. Reads the `ssp.kubevirt.io/dict.architectures` annotation from each predefined `DataImportCronTemplate` +2. Filters out architectures not present in the cluster's workload nodes +3. Propagates the filtered list to the SSP CR for `DataImportCron` and `DataSource` creation + +##### **Monitoring & Alerts** + +New alerts are introduced to help administrators manage golden images in heterogeneous clusters: + +| Alert | Description | +|:-----------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) | Fires when the cluster has workload nodes with different CPU architectures but `enableMultiArchBootImageImport` is disabled. Without this feature gate, golden images may be imported with the wrong architecture, causing VMs to fail to start. | +| [`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) | Fires when a custom `DataImportCronTemplate` is missing the `ssp.kubevirt.io/dict.architectures` annotation (only when FG is enabled). Without this annotation, the golden image has no defined architecture and VMs may fail to start. 
| +| [`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) | Fires when a `DataImportCronTemplate`'s `ssp.kubevirt.io/dict.architectures` annotation doesn't include any architecture supported by the cluster (only when FG is enabled). The golden image won't be created since no cluster nodes can use it. | + +**Examples:** +- **`HCOMultiArchGoldenImagesDisabled`**: A cluster with both `amd64` and `arm64` worker nodes triggers this alert if the feature gate is disabled, warning that VMs may boot from incompatible images. +- **`HCOGoldenImageWithNoArchitectureAnnotation`**: A custom DICT `my-custom-image` defined in `spec.dataImportCronTemplates` without the `ssp.kubevirt.io/dict.architectures` annotation triggers this alert. +- **`HCOGoldenImageWithNoSupportedArchitecture`**: A DICT annotated with `ssp.kubevirt.io/dict.architectures: s390x` on a cluster with only `amd64` and `arm64` workers triggers this alert, and the DICT status shows `reason: UnsupportedArchitectures`. + +--- ### **II. Software Test Plan (STP)** This STP serves as the **overall roadmap for testing**, detailing the scope, approach, resources, and schedule. +**Document Conventions:** + +| Term | Definition | +|:----------------------|:---------------------------------------------------------------------------------------------------------------------------| +| **MultiArch cluster** | Heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1-2 arm64 worker nodes | +| **HA cluster** | Homogeneous cluster with 3 control-plane nodes and 3 amd64 worker nodes | +| **FG** | `enableMultiArchBootImageImport` feature gate in HCO CR (see Section I.5.1) | +| **Related resources** | Golden image associated resources: `DataImportCron`, `DataSource`, `DataVolume`, `VolumeSnapshot` | +| **nodeInfo** | HCO status field tracking cluster architectures (`status.nodeInfo`) | + #### **1. 
Scope of Testing** -This testplan covers FG activation, propagation to other components and new alerts & metrics tested. +This test plan covers scenarios related to HCO, and monitoring: -**Document Conventions (if applicable):** -- **MultiArch cluster** - heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1/2 arm64 worker nodes. -- **HA cluster** - homogenous cluster with 3 control-plane and 3 amd64 worker node. -- **Archs** - architectures. -- **Related resources** - golden images accosiated DataImportCron,DataSource,DV/VMsnapshot CRs. -- **FG** - enableMultiArchBootImageImport FeatureGate in HCO CR enabling this feature. +* HCO node architecture tracking nodeInfo. +* FG activation and propagation to CNV components. +* DataImportCronTemplates architecture annotations and filtering. +* New alerts & metrics validation. **In Scope:** ##### Functional Testing -- HCO monitors the cluster's nodes architectures correctly. -- HCO dataImportCronTemplates annotations listing the correct architectures and propagates to SSP. -- Correct resources created and ready to use for common golden images per arch. -- Correct resources created and ready to use for custom golden images per arch. +* HCO monitors the cluster's nodes architectures correctly. +* HCO dataImportCronTemplates annotations listing the correct architectures and propagates to SSP. +* Correct resources created and ready to use for common golden images per arch. +* Correct resources created and ready to use for custom golden images per arch. -##### Non-Functional Testing +##### Alerts Testing +* `HCOMultiArchGoldenImagesDisabled` fires on heterogeneous cluster with FG disabled. +* `HCOGoldenImageWithNoArchitectureAnnotation` fires when custom DICT missing architecture annotation. +* `HCOGoldenImageWithNoSupportedArchitecture` fires when DICT has no cluster-supported architecture. 
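
The alert conditions listed above can be sketched as a small helper that predicts which alerts a test should expect for a given cluster state. This is only an illustrative sketch, not HCO's implementation: the `Template` class and `expected_alerts` function are hypothetical names, and the comma-separated annotation format is an assumption based on the status example in this plan.

```python
from dataclasses import dataclass, field

# Annotation name taken from this test plan.
ARCH_ANNOTATION = "ssp.kubevirt.io/dict.architectures"


@dataclass
class Template:
    """Hypothetical stand-in for a DataImportCronTemplate (DICT)."""
    name: str
    custom: bool = False
    annotations: dict = field(default_factory=dict)


def expected_alerts(workload_archs, fg_enabled, templates):
    """Return the set of alert names expected to fire for this state."""
    cluster_archs = set(workload_archs)
    alerts = set()

    # Heterogeneous workload nodes while the feature gate is disabled.
    if len(cluster_archs) > 1 and not fg_enabled:
        alerts.add("HCOMultiArchGoldenImagesDisabled")

    # The two per-template alerts apply only when the feature gate is enabled.
    if fg_enabled:
        for template in templates:
            raw = template.annotations.get(ARCH_ANNOTATION)
            if raw is None:
                # Only custom templates are expected to trigger this alert.
                if template.custom:
                    alerts.add("HCOGoldenImageWithNoArchitectureAnnotation")
                continue
            # Assumption: architectures are listed as a comma-separated string.
            archs = {a.strip() for a in raw.split(",") if a.strip()}
            if not archs & cluster_archs:
                alerts.add("HCOGoldenImageWithNoSupportedArchitecture")
    return alerts
```

A test could compare the returned set with the alerts actually firing in the cluster, for example after adding a custom DICT annotated with `s390x` to a cluster that has only `amd64` and `arm64` workers.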
+ +##### Non-Functional Testing **Regression Testing** -- spec.workloads.nodePlacement should determine the workload nodes architecture. -- spec.workloads.nodePlacement of not existing arch. +* spec.workloads.nodePlacement should determine the workload nodes architecture. +* spec.workloads.nodePlacement of not existing arch. **Backward Compatibility Testing** -- Legacy Datasource points to the arch specific Datasource +* Legacy Datasource points to the arch specific Datasource **Upgrade testing** -- Verify that resources names preserved as expected after upgrade. +* Verify that resources names preserved as expected after upgrade. #### **2. Testing Goals** @@ -188,36 +278,21 @@ Explicitly document what is **out of scope** for testing. **Critical:** All non- | Item | Description | Applicable (Y/N or N/A) | Comment | |:-----------------------|:-------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------| -| **Dependencies** | Dependent on deliverables from other components/products? Identify what is tested by which team. | N | | -| **Monitoring** | Does the feature require metrics and/or alerts? | Y | Two new alerts & metrics added | +| **Dependencies** | Dependent on deliverables from other components/products? Identify what is tested by which team. | | | +| **Monitoring** | Does the feature require metrics and/or alerts? | Y | Three new alerts added | | **Cross Integrations** | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | SSP, storage, virt, upgrade | | **UI** | Does the feature require UI? If so, ensure the UI aligns with the requirements. | Y | | -**Dependencies** - - -###### Dependencies/Cross Integration -Since it's a cross-team feature for CNV, the following should be tested: -**IUO** -- Testing HCO monitors new nodes architectures correctly. 
-- Testing FG activation and propagation to CNV components. -- Verifies new metrics & alerts. -- Verifies upgrade. - -**SSP** -- Templates creation & utilization -- Verify new SSP API. +##### **C. Dependencies/Cross Integration** -**Storage** -- Verify that cdi-importer doing so successfully. -- Verify Legacy datasource's backwards compatibility. -- etc +This is a cross-team feature for CNV. Testing responsibilities are divided as follows: -**Virt** -- Verify VMs picked correctly the appropriate node by arch. -- Verify VMs migrated to nodes with the same arch. -- Verify upgrade. -- etc +| Team | Testing Responsibility | +|:------------|:-------------------------------------------------------------------------------------------------------------| +| **IUO** | HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade | +| **SSP** | Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields) | +| **Storage** | CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API | +| **Virt** | VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade | #### **5. Test Environment** @@ -225,16 +300,16 @@ Since it's a cross-team feature for CNV, the following should be tested: | Environment Component | Configuration | Comments | |:----------------------------------------------|:-------------------------|:---------------------------------------------------------------| -| **Cluster Topology** | MultiArch cluster | 3 control-plan and 3/4 worker nodes. | -| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and openshift-virtualization 4.21. | -| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1/2 arm64 workers. 
| -| **Compute Resources** | N/A | | -| **Special Hardware** | | | -| **Storage** | io2-csi storage class | | -| **Network** | OVN-Kubernetes (default) | No network requirements | -| **Required Operators** | | | -| **Platform** | AWS | | -| **Special Configurations** | | | +| **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | +| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and OpenShift Virtualization 4.21 | +| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1-2 arm64 workers | +| **Compute Resources** | N/A | No special compute requirements | +| **Special Hardware** | N/A | No special hardware required | +| **Storage** | io2-csi storage class | AWS EBS io2 CSI driver | +| **Network** | OVN-Kubernetes (default) | No special network requirements | +| **Required Operators** | HCO, SSP, CDI | Standard OpenShift Virtualization operators | +| **Platform** | AWS | ARM64 workers available on AWS | +| **Special Configurations** | N/A | No special configurations required | #### **5.5. Testing Tools & Frameworks** @@ -246,16 +321,15 @@ Document any **new or additional** testing tools, frameworks, or infrastructure |:-------------------|:------------------------------| | **Test Framework** | MultiArch cluster, HA cluster | | **CI/CD** | Jenkins pipeline | -| **Other Tools** | | +| **Other Tools** | N/A | #### **6. 
Entry Criteria** The following conditions must be met before testing can begin: -- [] Requirements and design documents are **approved and merged** -- [] Test environment can be **set up and configured** -- [] multi-cpu architecture enabled in openshift-virtualization repo -- [ ] [ֿAdd feature-specific entry criteria as needed] +- [X] VEP [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) is **approved and merged** +- [x] Test environment (MultiArch cluster) can be **set up and configured** +- [ ] Multi-CPU architecture support enabled in openshift-virtualization repo #### **7. Risks and Limitations** @@ -263,19 +337,24 @@ Document specific risks and limitations for this feature. If a risk category is **Note:** Empty "Specific Risk" cells mean this must be filled. "N/A" means explicitly not applicable with justification. -| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | -|:-----------------------------------|:--------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:-------| -| Timeline/Schedule | Code-Freeze in 2 weeks | prioritize this week | [ ] | -| Test Coverage | Cannot perform automation testing until virt team enable multi-cpu architecture | | [ ] | -| Test Environment | Cannot test upgrade well until additional ARM64 worker will be added by Devops | Try to do so manually | [ ] | -| Untestable Aspects | N/A | | [ ] | -| Resource Constraints | Feature includes almost all CNV teams | [Your mitigation, e.g., "Focus automation on critical paths, coordinate with dev for testing"] | [ ] | -| Dependencies | SSP, Storage and Virt team finishing their part | Sync | [ ] | -| Blocker Bug for legacy DataSources | CNV-75762 | Already fixed, Storage QE need to verify | [ ] | +| Risk Category | Specific Risk for This Feature | Mitigation 
Strategy | Status | +|:-----------------------------------|:--------------------------------------------------------------------------------|:--------------------------------------------------------------------|:-------| +| Timeline/Schedule | Code-Freeze in 2 weeks | Prioritize HCO-specific testing this week | [ ] | +| Test Coverage | Cannot perform automation testing until virt team enable multi-cpu architecture | Coordinate with virt team on timeline; prepare test code in advance | [ ] | +| Test Environment | Cannot test upgrade well until additional ARM64 worker added by DevOps | Manual testing as fallback | [ ] | +| Untestable Aspects | N/A | N/A | [x] | +| Resource Constraints | Feature spans multiple CNV teams (IUO, SSP, Storage, Virt) | Focus automation on HCO-specific paths; coordinate with other teams | [ ] | +| Dependencies | SSP, Storage, and Virt teams must complete their implementations | Regular sync meetings; track dependencies in Jira | [ ] | +| Blocker Bug for legacy DataSources | CNV-75762 | Already fixed; Storage QE to verify | [ ] | #### **8. Known Limitations** -Document any known limitations, constraints, or trade-offs in the feature implementation or testing approach. 
+| Limitation | Description | Impact | +|:--------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------| +| Existing VMs not updated | VMs already running will not automatically use new architecture-specific resources | Users must recreate VMs to use new resources | +| Platform variant format not supported | Format like `linux/arm64/v8` is not supported in this phase | Future enhancement if needed | +| Manual annotation in HCO image build | `ssp.kubevirt.io/dict.architectures` annotation is set manually during HCO image build | May be automated in future phases | +| Architecture validation not performed | Custom golden image architectures are not validated by the system | Users responsible for correct configuration | --- @@ -283,16 +362,21 @@ Document any known limitations, constraints, or trade-offs in the feature implem This section links requirements to test coverage, enabling reviewers to verify all requirements are tested. -| Requirement ID | Requirement Summary | Test Scenario(s) | Test Type(s) | Priority | -|:---------------|:----------------------|:-----------------------------------------------------------------------------|:-----------------------|:---------| -| [Jira-xxx] | As a user... | HCO listing supported archs correctly | Functional | P0 | -| [Jira-xxx] | As an admin... | HCO annotating dataImportCronTemplates with supported archs | Functional | P0 | -| [Jira-xxx] | NFR-2 (Security) | Correct resources created and ready to use for common golden images per arch | Functional | P1 | -| [Jira-xxx] | As a cluster admin... | Correct resources created and ready to use for custom golden images per arch | Functional | P1 | -| [Jira-xxx] | As a cluster admin... | nodePlacement should determine the workload nodes architecture | Regression | P1 | -| [Jira-xxx] | As a cluster admin... 
| nodePlacement of not existing arch | Regression | P1 | -| [Jira-xxx] | As a cluster admin... | Legacy Datasource points to the arch specific Datasource | Backward Compatibility | P1 | -| [Jira-xxx] | As a cluster admin... | related resources names preserved as expected after upgrade | Upgrade | P1 | +| Requirement ID | User Story | Test Scenario | Test Type(s) | Priority | +|:---------------|:-----------|:--------------------------------------------------------------------------------------------------------|:-----------------------|:---------| +| | US-1 | Enable FG and verify feature activates | Functional | P0 | +| | US-1 | Verify `status.nodeInfo.workloadsArchitectures` lists correct architectures | Functional | P0 | +| | US-1 | Verify `status.nodeInfo.controlPlaneArchitectures` lists correct architectures | Functional | P0 | +| | US-1, US-2 | Verify `ssp.kubevirt.io/dict.architectures` annotation is set on DataImportCronTemplates | Functional | P0 | +| | US-1 | Verify unsupported architectures are filtered from annotations and presented in status | Functional | P1 | +| | US-2 | Verify custom golden images get correct architecture annotations | Functional | P1 | +| | US-1 | Verify `spec.workloads.nodePlacement` overrides workload node detection | Regression | P1 | +| | US-1 | Verify behavior with `nodePlacement` specifying non-existing architecture | Regression | P1 | +| | US-3 | Verify legacy `DataSource` points to architecture-specific `DataSource` | Backward Compatibility | P1 | +| | US-3 | Verify resource names preserved after upgrade | Upgrade | P1 | +| | US-1 | Verify `HCOMultiArchGoldenImagesDisabled` alert fires on heterogeneous cluster with FG disabled | Functional | P2 | +| | US-2 | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires when custom DICT missing annotation | Functional | P2 | +| | US-2 | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires when DICT has no cluster-supported arch | Functional | P2 | --- From 
a02e6e9bbec08c3c3aa8c9146d9ae7216e2b1119 Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Mon, 12 Jan 2026 13:51:20 +0200 Subject: [PATCH 3/8] Align with new template Signed-off-by: Harel Meir --- .../sig-iuo/golden_image_multiarch_support.md | 553 ++++++++---------- 1 file changed, 231 insertions(+), 322 deletions(-) diff --git a/stps/sig-iuo/golden_image_multiarch_support.md b/stps/sig-iuo/golden_image_multiarch_support.md index cc9a253..57237e7 100644 --- a/stps/sig-iuo/golden_image_multiarch_support.md +++ b/stps/sig-iuo/golden_image_multiarch_support.md @@ -1,6 +1,6 @@ # Openshift-virtualization-tests Test plan -## Golden Images Support For Heterogeneous Clusters - Quality Engineering Plan** +## HCO support for heterogeneous multi-arch clusters (golden images support) - Quality Engineering Plan ### **Metadata & Tracking** @@ -16,314 +16,209 @@ --- -### **I. Motivation and Requirements Review (QE Review Guidelines)** - -This section documents the mandatory QE review process. The goal is to understand the feature's value, technology, and testability prior to formal test planning. - -#### **1. Requirement & User Story Review Checklist** - -| Check | Done | Details/Notes | Comments | -|:---------------------------------------|:-----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| -| **Review Requirements** | [V] | Reviewed the relevant requirements. | | -| **Understand Value** | [V] | Confirmed clear user stories and understood.
Understand the difference between U/S and D/S requirements
**What is the value of the feature for RH customers**. | | -| **Customer Use Cases** | [V] | Ensured requirements contain relevant **customer use cases**. | | -| **Testability** | [V] | Confirmed requirements are **testable and unambiguous**. | | -| **Acceptance Criteria** | [v] | Ensured acceptance criteria are **defined clearly** (clear user stories; D/S requirements clearly defined in Jira). | | -| **Non-Functional Requirements (NFRs)** | [V] | Confirmed coverage for NFRs, including Performance, Security, Usability, Downtime, Connectivity, Monitoring (alerts/metrics), Scalability, Portability (e.g., cloud support), and Docs. | | - ---- - -##### **Background** +**Document Conventions (if applicable):** [Define acronyms or terms specific to this document] -**What are Golden Images?** +| Term | Definition | +|:----------------------|:---------------------------------------------------------------------------------------------------------| +| **MultiArch cluster** | Heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1-2 arm64 worker nodes | +| **HA cluster** | Homogeneous cluster with 3 control-plane nodes and 3 amd64 worker nodes | +| **MultiArch FG** | `enableMultiArchBootImageImport` feature gate in HCO CR. | +| **Related resources** | Golden image associated resources: `DataImportCron`, `DataSource`, `DataVolume`, `VolumeSnapshot` | +| **nodeInfo** | HCO status field tracking cluster architectures (`status.nodeInfo`) | -Golden images are pre-configured OS boot disk images used to create virtual machines (VMs) in OpenShift Virtualization. They serve as ready-to-use templates that are automatically available and kept up-to-date, eliminating the need for users to manually configure OS images for each VM. 
+### **Feature Overview** -The original golden images design (documented in the [kubevirt/community repository](https://github.com/kubevirt/community/blob/69d061862e0839608932d225a728a7a6e7a89f29/design-proposals/golden-image-delivery-and-update-pipeline.md)) was built for **homogeneous clusters** where all nodes share the same CPU architecture. + -**The Problem: Heterogeneous Clusters** - -Modern Kubernetes clusters increasingly run nodes with different CPU architectures (e.g., `amd64`, `arm64`, `s390x`). The current golden images implementation does not support this: - -1. **Wrong architecture image selection**: When the CDI-importer pulls a multi-architecture image, it selects the image variant matching the *importer pod's node architecture*, not the target VM's architecture. For example, if the importer runs on an `arm64` node but the VM will run on `amd64`, the wrong image is imported and the VM fails to boot. - -2. **No custom multi-arch support**: Cluster administrators cannot define custom golden images that work across multiple architectures. +This feature enables golden images support for heterogeneous clusters, where nodes may have different CPU architectures. It allows customers to create persistent virtual machines with specific architectures by automatically managing architecture-specific `DataImportCron` and `DataSource` resources for each supported architecture in the cluster. The feature is controlled by the `enableMultiArchBootImageImport` feature gate in the HyperConverged CR, and involves coordination between HCO (which tracks cluster node architectures), SSP (which creates architecture-specific templates), and CDI (which imports the correct architecture-specific images). --- -##### **Motivation** - -This feature enables OpenShift Virtualization to properly manage golden images in heterogeneous clusters by: +### **I. 
Motivation and Requirements Review (QE Review Guidelines)** -- Tracking which CPU architectures exist in the cluster -- Creating architecture-specific `DataSource` resources for each golden image -- Ensuring VMs boot from images matching their target architecture -- Maintaining backward compatibility with existing tooling and scripts +This section documents the mandatory QE review process. The goal is to understand the feature's value, +technology, and testability before formal test planning. ---- +#### **1. Requirement & User Story Review Checklist** -##### **User Stories** + -| ID | User Story | -|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| US-1 | As a **VM creator**, I want to create virtual machines with specific CPU architectures to run architecture-specific applications. | -| US-2 | As a **cluster administrator**, I want to define custom golden images with multi-architecture support, so VM creators can deploy VMs on any supported architecture. | -| US-3 | As a **VM creator**, I want my existing tools and scripts to continue working without modification after this feature is enabled. 
| - ---- +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes) | | +| **Understand Value** | [x] | **User Stories & Value:**
1. *As a VM creator*, I want to create VMs with specific architectures to run architecture-specific applications. **Value**: Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. *As a cluster admin*, I want to define custom golden images with multi-architecture support. **Value**: Allow users to define custom golden images with multi-architecture support.
3. *As a cluster admin*, I want custom golden images for specific architectures ensuring VMs run only on matching nodes. **Value**: Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. *As a VM creator*, I want my existing tools and scripts to continue working without changes. **Value**: Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs. | | +| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | +| **Testability** | [X] | Everything is testable, except for the upgrade to a version in which this FG is enabled by default. | Should be done in 4.22 timeframe | +| **Acceptance Criteria** | [x] | - HCO must consistently report accurate node architectures
- Common golden images should only be annotated with architectures that are actually supported
- Related resources should be created only for golden images annotated with architectures that are actually supported
- Custom golden images without an architecture annotation should remain backward-compatible
- Unsupported architectures shouldn't result in resource creation
- Legacy DataSources should be backward-compatible
- Trigger alert when a golden image is annotated with an unsupported architecture
- Trigger alert when running on a multi-arch cluster while the MultiArch FG is disabled
- Trigger alert when a custom golden image lacks an architecture annotation
- VMs migrate across worker nodes of the same architecture during upgrades | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team | | #### **2. Technology and Design Review** -| Check | Done | Details/Notes | Comments | -|:---------------------------------|:-----|:----------------------------------------------------------------------------------|:-------------------------| -| **Developer Handoff/QE Kickoff** | [V] | Met with Nahshon from HCO team. | | -| **Technology Challenges** | [V] | CDI-importer picking the correct image arch, custom templates arch specific | | -| **Test Environment Needs** | [V] | HA cluster for simple functionality testing, MultiArch cluster for full coverage. | Jenkins deploy job exist | -| **API Extensions** | [V] | HCO `status.nodeInfo`, `status.dataImportCronTemplates` with new fields | | -| **Topology Considerations** | [V] | ARM64 and S390X VMs are supported since CNV-4.19 | | - -#### **HCO Operator Changes** - -📖 **Feature Documentation:** [Golden Images in Heterogeneous Clusters](https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/virtualization/advanced-vm-creation#virt-golden-image-heterogeneous-clusters) -The HyperConverged Cluster Operator (HCO) is highly affected by this feature. The following changes are introduced: - -##### **New Feature Gate** - -| Item | Details | -|:------------------|:---------------------------------------------------------------------------| -| **Feature Gate** | `enableMultiArchBootImageImport` | -| **Default Value** | `false` (disabled) | -| **Purpose** | Enables heterogeneous cluster support for golden images when set to `true` | - -##### **New Status Fields** - -HCO `status` is extended with a `nodeInfo` field that tracks cluster node architectures. 
for example: - -```yaml -status: - nodeInfo: - controlPlaneArchitectures: - - amd64 - workloadsArchitectures: - - amd64 - - arm64 -``` - -| Field | Description | -|:----------------------------|:---------------------------------------------------------| -| `controlPlaneArchitectures` | List of CPU architectures present on control-plane nodes | -| `workloadsArchitectures` | List of CPU architectures present on workload nodes | - -##### **DataImportCronTemplates Enhancements** - -Each `DataImportCronTemplate` in the HCO status is enhanced with: - -| Field/Annotation | Description | -|:-------------------------------------|:--------------------------------------------------------------------------------------------------------| -| `ssp.kubevirt.io/dict.architectures` | Annotation listing architectures supported by the image (filtered to cluster's available architectures) | -| `originalSupportedArchitectures` | Status field showing original architectures from the image manifest | -| `conditions` | Status conditions for issues (e.g., `UnsupportedArchitectures`) | - -Example status with an unsupported architecture: - -```yaml -status: - dataImportCronTemplates: - - metadata: - name: centos-stream10-image-cron - annotations: - ssp.kubevirt.io/dict.architectures: "amd64" - status: - originalSupportedArchitectures: "amd64,arm64,s390x" - conditions: - - type: Deployed - status: "False" - reason: UnsupportedArchitectures - message: "DataImportCronTemplate has no supported architectures for the current cluster" -``` - -##### **Node Architecture Tracking** - -| Behavior | Description | -|:----------------------------|:-----------------------------------------------------------------------------------------------| -| **Workload Node Detection** | By default, nodes labeled with `node-role.kubernetes.io/worker` | -| **Custom Node Placement** | If `spec.workloads.nodePlacement` is set, HCO uses it to determine workload nodes | -| **Control-Plane Detection** | Nodes labeled with 
`node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` | -| **Dynamic Updates** | Architecture lists are updated automatically as nodes are added/removed | - -##### **Architecture Filtering** - -When the feature gate is enabled, HCO: - -1. Reads the `ssp.kubevirt.io/dict.architectures` annotation from each predefined `DataImportCronTemplate` -2. Filters out architectures not present in the cluster's workload nodes -3. Propagates the filtered list to the SSP CR for `DataImportCron` and `DataSource` creation - -##### **Monitoring & Alerts** - -New alerts are introduced to help administrators manage golden images in heterogeneous clusters: - -| Alert | Description | -|:-----------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) | Fires when the cluster has workload nodes with different CPU architectures but `enableMultiArchBootImageImport` is disabled. Without this feature gate, golden images may be imported with the wrong architecture, causing VMs to fail to start. | -| [`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) | Fires when a custom `DataImportCronTemplate` is missing the `ssp.kubevirt.io/dict.architectures` annotation (only when FG is enabled). Without this annotation, the golden image has no defined architecture and VMs may fail to start. 
| -| [`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) | Fires when a `DataImportCronTemplate`'s `ssp.kubevirt.io/dict.architectures` annotation doesn't include any architecture supported by the cluster (only when FG is enabled). The golden image won't be created since no cluster nodes can use it. | - -**Examples:** -- **`HCOMultiArchGoldenImagesDisabled`**: A cluster with both `amd64` and `arm64` worker nodes triggers this alert if the feature gate is disabled, warning that VMs may boot from incompatible images. -- **`HCOGoldenImageWithNoArchitectureAnnotation`**: A custom DICT `my-custom-image` defined in `spec.dataImportCronTemplates` without the `ssp.kubevirt.io/dict.architectures` annotation triggers this alert. -- **`HCOGoldenImageWithNoSupportedArchitecture`**: A DICT annotated with `ssp.kubevirt.io/dict.architectures: s390x` on a cluster with only `amd64` and `arm64` workers triggers this alert, and the DICT status shows `reason: UnsupportedArchitectures`. 
+ ---- +| Check | Done | Details/Notes | Comments | +|:---------------------------------|:-----|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| +| **Developer Handoff/QE Kickoff** | [x] | Met with Nahshon from HCO team | | +| **Technology Challenges** | [x] | Can use HA cluster, but should be verified on Multiarch cluster which is available only for 12 hours | Initial testing on HA, final verification on Multiarch | +| **Test Environment Needs** | [x] | MultiArch cluster, HA cluster | | +| **API Extensions** | [x] | **HCO**: `status.nodeInfo` (controlPlaneArchitectures, workloadsArchitectures), `status.dataImportCronTemplates` (originalSupportedArchitectures, conditions)
**SSP**: `enableMultipleArchitectures`, `cluster` fields (workloadArchitectures, controlPlaneArchitectures)
**CDI**: `platform.architecture` field in `DataVolumeSourceRegistry`, arch-specific `DataSource` (`-`), legacy `DataSource` redirects to arch-specific one | | +| **Topology Considerations** | [x] | Related resources should be created per worker node architecture. Currently its ARM64 and AMD64. | | ### **II. Software Test Plan (STP)** This STP serves as the **overall roadmap for testing**, detailing the scope, approach, resources, and schedule. -**Document Conventions:** - -| Term | Definition | -|:----------------------|:---------------------------------------------------------------------------------------------------------------------------| -| **MultiArch cluster** | Heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1-2 arm64 worker nodes | -| **HA cluster** | Homogeneous cluster with 3 control-plane nodes and 3 amd64 worker nodes | -| **FG** | `enableMultiArchBootImageImport` feature gate in HCO CR (see Section I.5.1) | -| **Related resources** | Golden image associated resources: `DataImportCron`, `DataSource`, `DataVolume`, `VolumeSnapshot` | -| **nodeInfo** | HCO status field tracking cluster architectures (`status.nodeInfo`) | - #### **1. Scope of Testing** -This test plan covers scenarios related to HCO, and monitoring: - -* HCO node architecture tracking nodeInfo. -* FG activation and propagation to CNV components. -* DataImportCronTemplates architecture annotations and filtering. -* New alerts & metrics validation. - -**In Scope:** -##### Functional Testing -* HCO monitors the cluster's nodes architectures correctly. -* HCO dataImportCronTemplates annotations listing the correct architectures and propagates to SSP. -* Correct resources created and ready to use for common golden images per arch. -* Correct resources created and ready to use for custom golden images per arch. - -##### Alerts Testing -* `HCOMultiArchGoldenImagesDisabled` fires on heterogeneous cluster with FG disabled. 
-* `HCOGoldenImageWithNoArchitectureAnnotation` fires when custom DICT missing architecture annotation. -* `HCOGoldenImageWithNoSupportedArchitecture` fires when DICT has no cluster-supported architecture. - -##### Non-Functional Testing -**Regression Testing** -* spec.workloads.nodePlacement should determine the workload nodes architecture. -* spec.workloads.nodePlacement of not existing arch. - -**Backward Compatibility Testing** -* Legacy Datasource points to the arch specific Datasource - -**Upgrade testing** -* Verify that resources names preserved as expected after upgrade. - - -#### **2. Testing Goals** - -Define specific, measurable testing objectives for this feature, such as: - -- [ ] Achieve 100% feature coverage for core functionality. -- [x] Validate feature enablement until related resources created as expected. -- [x] Verify backward compatibility with legacy dataSources. -- [x] Verify NodePlacement. -- [x] Verify negative scenarios. -- [x] Verify new alerts & metrics -- [x] Verify related resources names preserved as expected after upgrade. -- [ ] Automate 100% of functional tests. - - -#### **3. Non-Goals (Testing Scope Exclusions)** - -Explicitly document what is **out of scope** for testing. **Critical:** All non-goals require explicit stakeholder agreement to prevent "I assumed you were testing that" issues. 
- - -| Non-Goal | Rationale | PM/ Lead Agreement | -|:---------------------------------|:------------------------------------------------------------------|:-------------------| -| Update existing VM | If a VM already running, it won't use the arch-specific resources | [ ] Name/Date | -| Performance Testing | Feature not related to scale | [ ] Name/Date | -| Security Testing | Feature not security related | [ ] Name/Date | -| Usability testing | Should be done by UI team | [ ] Name/Date | -| Compatibility | Should be done by Virt/SSP team(create vms from multiple archs) | [ ] Name/Date | -| Templates creation & utilization | Should be done by SSP team | [ ] Name/Date | -| Import & datasource new API | Should be done by Storage team | [ ] Name/Date | - - -#### **4. Test Strategy** - -##### **A. Types of Testing** - -**Note:** Mark "Y" if applicable, "N/A" if not applicable (with justification in Comments). Empty cells indicate incomplete review. - -| Item (Testing Type) | Applicable (Y/N or N/A) | Comments | -|:-------------------------------|:------------------------|:-------------------------------| -| Functional Testing | Yes | Defined above | -| Automation Testing | Yes | All tests should be automated. | -| Performance Testing | N/A | | -| Security Testing | N/A | | -| Usability Testing | N/A | | -| Compatibility Testing | N/A | | -| Regression Testing | Yes | Defined above | -| Upgrade Testing | Yes | Defined above | -| Backward Compatibility Testing | Yes | Defined above | - -##### **B. Potential Areas to Consider** - -**Note:** Mark "Y" if applicable, "N/A" if not applicable (with justification in Comment). Empty cells indicate incomplete review. 
- -| Item | Description | Applicable (Y/N or N/A) | Comment | -|:-----------------------|:-------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------| -| **Dependencies** | Dependent on deliverables from other components/products? Identify what is tested by which team. | | | -| **Monitoring** | Does the feature require metrics and/or alerts? | Y | Three new alerts added | -| **Cross Integrations** | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | SSP, storage, virt, upgrade | -| **UI** | Does the feature require UI? If so, ensure the UI aligns with the requirements. | Y | | - -##### **C. Dependencies/Cross Integration** - -This is a cross-team feature for CNV. Testing responsibilities are divided as follows: - -| Team | Testing Responsibility | -|:------------|:-------------------------------------------------------------------------------------------------------------| -| **IUO** | HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade | -| **SSP** | Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields) | -| **Storage** | CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API | -| **Virt** | VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade | - -#### **5. Test Environment** - -**Note:** "N/A" means explicitly not applicable. Cannot leave empty cells. 
- -| Environment Component | Configuration | Comments | -|:----------------------------------------------|:-------------------------|:---------------------------------------------------------------| -| **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | -| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and OpenShift Virtualization 4.21 | -| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1-2 arm64 workers | -| **Compute Resources** | N/A | No special compute requirements | -| **Special Hardware** | N/A | No special hardware required | -| **Storage** | io2-csi storage class | AWS EBS io2 CSI driver | -| **Network** | OVN-Kubernetes (default) | No special network requirements | -| **Required Operators** | HCO, SSP, CDI | Standard OpenShift Virtualization operators | -| **Platform** | AWS | ARM64 workers available on AWS | -| **Special Configurations** | N/A | No special configurations required | - -#### **5.5. Testing Tools & Frameworks** - -Document any **new or additional** testing tools, frameworks, or infrastructure required specifically for this feature. - -**Note:** Only list tools that are **new** or **different** from standard testing infrastructure. Leave empty if using standard tools. - -| Category | Tools/Frameworks | -|:-------------------|:------------------------------| -| **Test Framework** | MultiArch cluster, HA cluster | -| **CI/CD** | Jenkins pipeline | -| **Other Tools** | N/A | - -#### **6. Entry Criteria** + + +**Testing Goals** + + + + + +**Functional Goals**: +- **[P0]** Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes. +- **[P0]** Verify golden images are annotated only with architectures that are actually supported in HCO+SSP. 
+- **[P0]** Verify related resources are created only for golden images annotated with a supported architecture, are named with the architecture suffix, and are ready to use. +- **[P1]** Verify golden images annotated only with unsupported architectures present the fail status in the HCO `dataImportCronTemplates` status. + + +**Monitoring Goals**: +- **[P1]** Verify an alert is fired when a golden image is annotated with an unsupported architecture. +- **[P1]** Verify an alert is fired when running on a multi-arch cluster while the Multiarch FG is disabled. +- **[P1]** Verify an alert is fired when a custom golden image lacks an architecture annotation. + +**Backward Compatibility Goals**: +- **[P0]** Verify legacy DataSources point to the default arch-annotated DataSources. +- **[P0]** Verify nodePlacement affects related resources creation. + +**Upgrade Goals**: +- **[P0]** Verify ARM64 and AMD64 VMs are migrated across worker nodes of the same architecture during upgrades. +- **[P0]** Verify related resources are preserved after upgrade. +- **[P1]** Verify the functional tests post-upgrade to a version where the FG is enabled by default. 
+ + + + +**Out of Scope (Testing Scope Exclusions)** + + + +| Non-Goal | Rationale | PM/ Lead Agreement | +|:---------------------------------|:---------------------------------------------------------------------|:-------------------| +| Update existing VM | If a VM is already running, it won't use the arch-specific resources | [ ] Name/Date | +| Performance Testing | Feature not scale related | [ ] Name/Date | +| Security Testing | Feature not security related | [ ] Name/Date | +| Usability testing | Should be done by UI team | [ ] Name/Date | +| Compatibility | Should be done by Virt/SSP team(create vms from multiple archs) | [ ] Name/Date | +| Templates creation & utilization | Should be done by SSP team | [ ] Name/Date | +| Imports & datasource new API | Should be done by Storage team | [ ] Name/Date | +| Testing with s390x architecture | The feature is "Multiarch Support enablement for ARM" | [ ] Name/Date | + +#### **2. Test Strategy** + + + +| Item | Description | Applicable (Y/N or N/A) | Comments | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | nodes architecture monitoring, arch-annotations, related resources creation | +| Automation Testing | Ensures test cases are automated for continuous integration and regression 
coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. | +| Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | +| Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | +| Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specify requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | +| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by SSP/Virt team | +| Regression Testing | Verifies that new changes do not break existing functionality | N/A | | +| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | VMs migrated and updated successfully, related resources preserved. | +| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | Legacy Datasource pointers, custom golden images without arch annotation. | +| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | N/A | | +| Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**SSP**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**Storage**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**Virt**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade | +| Monitoring | Does the feature require metrics and/or alerts? | Y | [`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html),
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation),
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) | +| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | N/A | not related to cloud | + +#### **3. Test Environment** + + + +| Environment Component | Configuration | Comments | +|:----------------------------------------------|:-------------------------|:--------------------------------------------------------------| +| **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | +| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and OpenShift Virtualization 4.21 | +| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1-2 arm64 workers | +| **Compute Resources** | N/A | No special compute requirements | +| **Special Hardware** | N/A | No special hardware required | +| **Storage** | io2-csi storage class | AWS EBS io2 CSI driver | +| **Network** | OVN-Kubernetes (default) | No special network requirements | +| **Required Operators** | N/A | N/A | +| **Platform** | AWS | ARM64 workers available on AWS | +| **Special Configurations** | N/A | No special configurations required | + +#### **3.1. Testing Tools & Frameworks** + + + +| Category | Tools/Frameworks | +|:-------------------|:------------------| +| **Test Framework** | MultiArch cluster | +| **CI/CD** | | +| **Other Tools** | | + +#### **4. Entry Criteria** The following conditions must be met before testing can begin: @@ -331,52 +226,66 @@ The following conditions must be met before testing can begin: - [x] Test environment (MultiArch cluster) can be **set up and configured** - [ ] Multi-CPU architecture support enabled in openshift-virtualization repo -#### **7. Risks and Limitations** +#### **5. Risks** + + -Document specific risks and limitations for this feature. 
If a risk category is not applicable, mark as "N/A" with justification in mitigation strategy. +| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | +|:-----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:-------| +| Timeline/Schedule | Code-Freeze this week | Prioritize HCO-specific testing this week. Upgrade automation can wait, since its impacting 4.22 anyway | [ ] | +| Test Coverage | | | [X] | +| Test Environment | Requires additional ARM64 workers node - [Jira ticket](https://issues.redhat.com/browse/CNV-73894) | Can be done manually | [ ] | +| Untestable Aspects | | | [X] | +| Resource Constraints | N/A | N/A | [ ] | +| Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | [ ] | +| Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | +| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996)
3. [[Storage] Bootable volumes are re-imported after set enableMultiArchBootImageImport to true for AMD64](https://issues.redhat.com/browse/CNV-75084) | | [ ] | -**Note:** Empty "Specific Risk" cells mean this must be filled. "N/A" means explicitly not applicable with justification. +#### **6. Known Limitations** + + -| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | -|:-----------------------------------|:--------------------------------------------------------------------------------|:--------------------------------------------------------------------|:-------| -| Timeline/Schedule | Code-Freeze in 2 weeks | Prioritize HCO-specific testing this week | [ ] | -| Test Coverage | Cannot perform automation testing until virt team enable multi-cpu architecture | Coordinate with virt team on timeline; prepare test code in advance | [ ] | -| Test Environment | Cannot test upgrade well until additional ARM64 worker added by DevOps | Manual testing as fallback | [ ] | -| Untestable Aspects | N/A | N/A | [x] | -| Resource Constraints | Feature spans multiple CNV teams (IUO, SSP, Storage, Virt) | Focus automation on HCO-specific paths; coordinate with other teams | [ ] | -| Dependencies | SSP, Storage, and Virt teams must complete their implementations | Regular sync meetings; track dependencies in Jira | [ ] | -| Blocker Bug for legacy DataSources | CNV-75762 | Already fixed; Storage QE to verify | [ ] | +- Existing VMs are not updated: VMs that are already running will not automatically use new architecture-specific resources. Users must recreate VMs to take advantage of new resources. +- Platform variant format not supported: Formats like `linux/arm64/v8` are not supported at this time; this may be enhanced in future releases if needed. +- Architecture validation not performed: The system does not validate custom golden image architectures. It is the user's responsibility to ensure correct configuration. -#### **8. 
Known Limitations** -| Limitation | Description | Impact | -|:--------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------| -| Existing VMs not updated | VMs already running will not automatically use new architecture-specific resources | Users must recreate VMs to use new resources | -| Platform variant format not supported | Format like `linux/arm64/v8` is not supported in this phase | Future enhancement if needed | -| Manual annotation in HCO image build | `ssp.kubevirt.io/dict.architectures` annotation is set manually during HCO image build | May be automated in future phases | -| Architecture validation not performed | Custom golden image architectures are not validated by the system | Users responsible for correct configuration | --- ### **III. Test Scenarios & Traceability** -This section links requirements to test coverage, enabling reviewers to verify all requirements are tested. 
- -| Requirement ID | User Story | Test Scenario | Test Type(s) | Priority | -|:---------------|:-----------|:--------------------------------------------------------------------------------------------------------|:-----------------------|:---------| -| | US-1 | Enable FG and verify feature activates | Functional | P0 | -| | US-1 | Verify `status.nodeInfo.workloadsArchitectures` lists correct architectures | Functional | P0 | -| | US-1 | Verify `status.nodeInfo.controlPlaneArchitectures` lists correct architectures | Functional | P0 | -| | US-1, US-2 | Verify `ssp.kubevirt.io/dict.architectures` annotation is set on DataImportCronTemplates | Functional | P0 | -| | US-1 | Verify unsupported architectures are filtered from annotations and presented in status | Functional | P1 | -| | US-2 | Verify custom golden images get correct architecture annotations | Functional | P1 | -| | US-1 | Verify `spec.workloads.nodePlacement` overrides workload node detection | Regression | P1 | -| | US-1 | Verify behavior with `nodePlacement` specifying non-existing architecture | Regression | P1 | -| | US-3 | Verify legacy `DataSource` points to architecture-specific `DataSource` | Backward Compatibility | P1 | -| | US-3 | Verify resource names preserved after upgrade | Upgrade | P1 | -| | US-1 | Verify `HCOMultiArchGoldenImagesDisabled` alert fires on heterogeneous cluster with FG disabled | Functional | P2 | -| | US-2 | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires when custom DICT missing annotation | Functional | P2 | -| | US-2 | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires when DICT has no cluster-supported arch | Functional | P2 | + + + + +| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | 
+|:---------------|:------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| +| | As a VM creator, I want to create VMs with specific architectures to run architecture-specific applications | Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes | Tier 1 | P0 | +| | As a cluster admin, I want to define custom golden images with multi-architecture support | Verify golden images are annotated only with architectures that are actually supported in HCO+SSP | Tier 1 | P0 | +| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use | Tier 2 | P0 | +| | As a cluster admin, I want to define custom golden images for specific architectures ensuring VMs run on matching nodes | Verify golden images annotated only with unsupported architectures present the fail status in HCO dataImportCronTemplates status | Tier 1 | P1 | +| | As a cluster admin, I want custom golden images for specific architectures ensuring VMs run only on matching nodes | Verify alert fired when golden image is annotated with an unsupported architecture | Tier 1 | P1 | +| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify alert fired when running on a multi-arch cluster while Multiarch FG is disabled | Tier 1 | P1 | +| | As a cluster admin, I want to define custom golden images with multi-architecture support | Verify alert fired when a custom golden image lacks an architecture annotation | Tier 1 | P1 | +| | As a VM creator, I want my existing tools and scripts to 
continue working without changes | Verify legacy Datasources point to default arch-annotated Datasources | Tier 2 | P0 | +| | As a cluster admin, I want custom golden images for specific architectures ensuring VMs run only on matching nodes | Verify nodePlacement affects related resources creation | Tier 2 | P0 | +| | As a VM creator, I want to create VMs with specific architectures to run architecture-specific applications | Verify ARM64 and AMD64 VMs are migrated across worker nodes of the same architecture during upgrades | Tier 2 | P0 | +| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify related resources preserved after upgrade | Tier 2 | P0 | +| | As a VM creator/cluster admin, I want to create VMs / define custom golden images with multi-architecture support | Verify the functional tests post-upgrade to version when Multiarch FG is enabled by default | Tier 2 | P1 | --- From bd69bf2dd1e5c8b5d46f6f69c9ad5a3a996b0e15 Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Mon, 16 Feb 2026 08:23:06 +0200 Subject: [PATCH 4/8] Update additional test cases & address comments Signed-off-by: Harel Meir --- .../sig-iuo/golden_image_multiarch_support.md | 112 +++++++++++------- 1 file changed, 66 insertions(+), 46 deletions(-) diff --git a/stps/sig-iuo/golden_image_multiarch_support.md b/stps/sig-iuo/golden_image_multiarch_support.md index 57237e7..b83f0c2 100644 --- a/stps/sig-iuo/golden_image_multiarch_support.md +++ b/stps/sig-iuo/golden_image_multiarch_support.md @@ -11,7 +11,7 @@ | **Jira Tracking** | [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | | **QE Owner(s)** | Harel Meir | | **Owning SIG** | sig-iuo | -| **Participating SIGs** | sig-infra, sig-storage, sig-virt | +| **Participating SIGs** | sig-infra, sig-storage, sig-virt, sig-network | | **Current Status** | Draft | --- @@ -31,7 +31,7 @@ -This feature enables golden images support for heterogeneous clusters, where nodes may have different 
CPU architectures. It allows customers to create persistent virtual machines with specific architectures by automatically managing architecture-specific `DataImportCron` and `DataSource` resources for each supported architecture in the cluster. The feature is controlled by the `enableMultiArchBootImageImport` feature gate in the HyperConverged CR, and involves coordination between HCO (which tracks cluster node architectures), SSP (which creates architecture-specific templates), and CDI (which imports the correct architecture-specific images). +This feature enables golden images support for heterogeneous clusters, where nodes may have different CPU architectures. It allows customers to create persistent virtual machines with specific architectures by automatically managing architecture-specific `DataImportCron` and `DataSource` resources for each supported architecture in the cluster. The feature is controlled by the `enableMultiArchBootImageImport` feature gate in the HyperConverged CR, and involves coordination between HCO (which tracks cluster node architectures), SSP (which creates architecture-specific templates), and CDI (which imports the correct architecture-specific images). For 4.22, this feature gate will be disabled by default. It will move to be enabled by default based on test stability, customer usage, and other factors, not earlier than 4.23. --- @@ -47,14 +47,14 @@ technology, and testability before formal test planning. 2. **Details/Notes column**: Summary of the topic (e.g., list key requirements, describe customer value, note acceptance criteria) 3. 
**Comments column**: Document any concerns, gaps, or follow-up items needed --> -| Check | Done | Details/Notes | Comments | -|:---------------------------------------|:-----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| -| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes) | | -| **Understand Value** | [x] | **User Stories & Value:**
1. *As a VM creator*, I want to create VMs with specific architectures to run architecture-specific applications. **Value**: Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. *As a cluster admin*, I want to define custom golden images with multi-architecture support. **Value**: Allow users to define custom golden images with multi-architecture support.
3. *As a cluster admin*, I want custom golden images for specific architectures ensuring VMs run only on matching nodes. **Value**: Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. *As a VM creator*, I want my existing tools and scripts to continue working without changes. **Value**: Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs. | | -| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | -| **Testability** | [X] | Everything is testable, despite upgrade to a version which this FG enabled by default. | Should be done in 4.22 timeframe | -| **Acceptance Criteria** | [x] | - HCO must consistently report accurate node architectures
- common golden images should only be annotated with architectures that are actually supported
- Related resources should be created only for golden images annotated with architectures that are actually supported
- custom golden images without arch-annotation should be backward-compatible
-Non supported architectures shouldn't result in resource creation
-Legacy Datasources should be backward-compatible
- Trigger alert when a golden image is annotated with an unsupported architecture
- Trigger alert when running on a multi-arch cluster while Multiarch FG is disabled
- Trigger alert when a custom golden image lacks an architecture annotation
- VMs migrate across worker nodes of the same architecture during upgrades | | -| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team | | +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes) | | +| **Understand Value** | [x] | 1. Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. Allow users to define custom golden images with multi-architecture support.
3. Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs. | | +| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | +| **Testability** | [X] | Everything is testable except the upgrade to a version in which this FG is enabled by default (it is currently disabled by default). | Should be done in 4.23 timeframe | +| **Acceptance Criteria** | [x] | - HCO must consistently report accurate node architectures
- Common golden images should only be annotated with architectures that are actually supported
- Related resources should be created only for golden images annotated with architectures that are actually supported
- Custom golden images without arch-annotation should be backward-compatible
- Unsupported architectures shouldn't result in resource creation
- Legacy DataSources should be backward-compatible
- Trigger alert when a golden image is annotated with an unsupported architecture
- Trigger alert when running on a multi-arch cluster while Multiarch FG is disabled
- Trigger alert when a custom golden image lacks an architecture annotation
- VMs migrate across worker nodes of the same architecture during upgrades | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VM creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team | | #### **2. Technology and Design Review** @@ -130,9 +130,12 @@ Each goal should tie back to requirements from Section I and be independently ve **Monitoring Goals**: -- **[P1]** Verify alert fired when golden image is annotated with an unsupported architecture. -- **[P1]** Verify alert fired when running on a multi-arch cluster while Multiarch FG is disabled. -- **[P1]** Verify alert fired when a custom golden image lacks an architecture annotation +- **[P1]** Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fired when golden image is annotated with an unsupported architecture (T1). +- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fired when running on a multi-arch cluster while Multiarch FG is disabled (T1). +- **[P1]** Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fired when a custom golden image lacks an architecture annotation (T1). +- **[P1]** Verify metric `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` value is correctly exposed via Prometheus (T2). +- **[P1]** Verify metric `kubevirt_hco_multi_arch_boot_images_enabled` value is correctly exposed via Prometheus (T2). +- **[P1]** Verify metric `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` value is correctly exposed via Prometheus (T2). **Backward compatibility Goals**: - **[P0]** Verify Legacy Datasources points to default arch-annotated Datasources. @@ -162,10 +165,12 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. 
| Update existing VM | If a VM is already running, it won't use the arch-specific resources | [ ] Name/Date | | Performance Testing | Feature not scale related | [ ] Name/Date | | Security Testing | Feature not security related | [ ] Name/Date | -| Usability testing | Should be done by UI team | [ ] Name/Date | -| Compatibility | Should be done by Virt/SSP team(create vms from multiple archs) | [ ] Name/Date | -| Templates creation & utilization | Should be done by SSP team | [ ] Name/Date | -| Imports & datasource new API | Should be done by Storage team | [ ] Name/Date | +| Usability testing | Should be done by [UI team](https://issues.redhat.com/browse/CNV-61832) (4.20) | [ ] Name/Date | +| Compatibility | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) (create vms from multiple archs/templates) | [ ] Name/Date | +| Templates creation & utilization | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) | [ ] Name/Date | +| Imports & datasource new API | Should be done by [Storage team](https://issues.redhat.com/browse/CNV-76732) | [ ] Name/Date | +| Test VMs migration between same arch nodes | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | +| 'defaultCPUModel' integrations | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | | Testing with s390x architecture | The feature is "Multiarch Support enablement for ARM" | [ ] Name/Date | #### **2. Test Strategy** @@ -180,14 +185,14 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. | Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | | Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. 
| | Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specifies the requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | -| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by SSP/Virt team | -| Regression Testing | Verifies that new changes do not break existing functionality | N/A | | +| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714)/[Virt team](https://issues.redhat.com/browse/CNV-26818) | +| Regression Testing | Verifies that new changes do not break existing functionality | Y | Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular) to verify no regressions in existing functionality. | | Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | VMs migrated and updated successfully, related resources preserved. | | Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | Legacy Datasource pointers, custom golden images without arch annotation. | -| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | N/A | | -| Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**SSP**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**Storage**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**Virt**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade | -| Monitoring | Does the feature require metrics and/or alerts? | Y | [`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html),
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation),
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) | -| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | N/A | not related to cloud | +| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) once it is ready for review. | +| Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**[SSP/Infra](https://issues.redhat.com/browse/CNV-76714)**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**[Storage](https://issues.redhat.com/browse/CNV-76732)**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**[Virt](https://issues.redhat.com/browse/CNV-26818)**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade, defaultCPUModel
**[Network](https://issues.redhat.com/browse/CNV-76741)**: Network-related multiarch testing | +| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts (T1):**
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html),
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation),
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture)

**Metrics (T2):**
`kubevirt_hco_multi_arch_boot_images_enabled`,
`kubevirt_hco_dataimportcrontemplate_with_architecture_annotation`,
`kubevirt_hco_dataimportcrontemplate_with_supported_architectures` | +| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | Y | Testing environment AWS cluster | #### **3. Test Environment** @@ -224,7 +229,8 @@ The following conditions must be met before testing can begin: - [X] VEP [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) is **approved and merged** - [x] Test environment (MultiArch cluster) can be **set up and configured** -- [ ] Multi-CPU architecture support enabled in openshift-virtualization repo +- [ ] Multi-CPU architecture support enabled in openshift-virtualization-tests repo +- [ ] HCO jenkins jobs created & scheduled #### **5. Risks** @@ -236,14 +242,14 @@ with justification. --> | Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | |:-----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:-------| -| Timeline/Schedule | Code-Freeze this week | Prioritize HCO-specific testing this week. 
Upgrade automation can wait, since it impacts 4.22 anyway | [ ] | -| Test Coverage | | | [X] | -| Test Environment | Requires additional ARM64 worker nodes - [Jira ticket](https://issues.redhat.com/browse/CNV-73894) | Can be done manually | [ ] | -| Untestable Aspects | | | [X] | -| Resource Constraints | N/A | N/A | [ ] | +| Timeline/Schedule | N/A | N/A | [X] | +| Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | +| Test Environment | N/A | N/A | [X] | +| Untestable Aspects | N/A | N/A | [X] | +| Resource Constraints | MultiArch cluster available only for 12 hours | Test automation on HA cluster first, final verification on MultiArch | [X] | | Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) once it is ready for review. | [ ] | | Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | -| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996)
3. [[Storage] Bootable volumes are re-imported after set enableMultiArchBootImageImport to true for AMD64](https://issues.redhat.com/browse/CNV-75084) | | [ ] | +| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | #### **6. Known Limitations** @@ -274,18 +280,21 @@ tested. --> | Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | |:---------------|:------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| -| | As a VM creator, I want to create VMs with specific architectures to run architecture-specific applications | Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes | Tier 1 | P0 | -| | As a cluster admin, I want to define custom golden images with multi-architecture support | Verify golden images are annotated only with architectures that are actually supported in HCO+SSP | Tier 1 | P0 | -| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use | Tier 2 | P0 | -| | As a cluster admin, I want to define custom golden images for specific architectures ensuring VMs run on matching nodes | Verify golden images annotated only with unsupported architectures present the fail status in HCO dataImportCronTemplates status | Tier 1 | P1 | -| | As a cluster admin, I want custom golden images for specific architectures ensuring VMs run only on matching nodes | Verify alert fired when golden image is annotated with an unsupported architecture | Tier 1 | P1 | -| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify alert fired when running on a 
multi-arch cluster while Multiarch FG is disabled | Tier 1 | P1 | -| | As a cluster admin, I want to define custom golden images with multi-architecture support | Verify alert fired when a custom golden image lacks an architecture annotation | Tier 1 | P1 | -| | As a VM creator, I want my existing tools and scripts to continue working without changes | Verify legacy Datasources point to default arch-annotated Datasources | Tier 2 | P0 | -| | As a cluster admin, I want custom golden images for specific architectures ensuring VMs run only on matching nodes | Verify nodePlacement affects related resources creation | Tier 2 | P0 | -| | As a VM creator, I want to create VMs with specific architectures to run architecture-specific applications | Verify ARM64 and AMD64 VMs are migrated across worker nodes of the same architecture during upgrades | Tier 2 | P0 | -| | As a VM creator/cluster admin, I want to create VMs / custom golden images for specific architectures | Verify related resources preserved after upgrade | Tier 2 | P0 | -| | As a VM creator/cluster admin, I want to create VMs / define custom golden images with multi-architecture support | Verify the functional tests post-upgrade to version when Multiarch FG is enabled by default | Tier 2 | P1 | +| | As an admin, I expect HCO to detect and report the correct node architectures in my cluster | Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes | Tier 1 | P0 | +| | As an admin, I expect golden images to be annotated only with architectures my cluster actually supports | Verify golden images are annotated only with architectures that are actually supported in HCO+SSP | Tier 1 | P0 | +| | As a user, I expect arch-specific boot sources to be created and ready so I can create VMs on the correct architecture | Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use | 
Tier 2 | P0 | +| | As an admin, I expect a clear failure status when a golden image targets an unsupported architecture | Verify golden images annotated only with unsupported architectures present the fail status in HCO dataImportCronTemplates status | Tier 1 | P1 | +| | As an admin, I expect an alert when a golden image is annotated with an unsupported architecture | Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fired when golden image is annotated with an unsupported architecture | Tier 1 | P1 | +| | As an admin, I expect an alert when Multiarch FG is disabled on a multi-arch cluster | Verify alert `HCOMultiArchGoldenImagesDisabled` fired when running on a multi-arch cluster while Multiarch FG is disabled | Tier 1 | P1 | +| | As an admin, I expect an alert when a custom golden image is missing an architecture annotation | Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fired when a custom golden image lacks an architecture annotation | Tier 1 | P1 | +| | As an admin, I expect the `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric to reflect golden image arch support status | Verify metric value is correctly exposed via Prometheus and matches expected state per golden image | Tier 2 | P2 | +| | As an admin, I expect the `kubevirt_hco_multi_arch_boot_images_enabled` metric to reflect whether multiarch FG is enabled | Verify metric value is correctly exposed via Prometheus (1=enabled, 0=disabled, absent on single-arch) | Tier 2 | P2 | +| | As an admin, I expect the `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric to reflect annotation presence | Verify metric value is correctly exposed via Prometheus and matches expected state per golden image | Tier 2 | P2 | +| | As a user, I expect legacy DataSource references to keep working without breaking my existing tools | Verify legacy Datasources point to default arch-annotated Datasources | Tier 2 | P0 | +| | As an admin, I expect nodePlacement settings 
to be respected for arch-specific resources | Verify nodePlacement affects related resources creation | Tier 2 | P0 | +| | As a user, I expect my running VMs to migrate to same-architecture nodes during upgrades | Verify ARM64 and AMD64 VMs are migrated across worker nodes of the same architecture during upgrades | Tier 2 | P0 | +| | As an admin, I expect arch-specific resources to be preserved after an upgrade | Verify related resources preserved after upgrade | Tier 2 | P0 | +| | As an admin, I expect the feature to work when the Multiarch FG becomes enabled by default post-upgrade | Verify the functional tests post-upgrade to version when Multiarch FG is enabled by default | Tier 2 | P1 | --- @@ -294,8 +303,19 @@ tested. --> This Software Test Plan requires approval from the following stakeholders: * **Reviewers:** - - [Name / @github-username] - - [Name / @github-username] + - [QE Lead / @rnester] + - [sig-iuo representative / @nunnatsa @rllobilo @OhadRevah @albarker-rh] + - [sig-storage representative] + - [sig-virt representative] + - [sig-infra representative] + +* **Approvers:** + - [QE Lead / @rnester] + +* **Reviewers:** + - QE Architect: Ruth Netser + - QE Members: Yossi Segev, Anat Wax, Sergei Volkov * **Approvers:** - - [Name / @github-username] - - [Name / @github-username] + - QE Architect: Ruth Netser + - Product Manager/Owner: Ronen Sde-Or, Petr Horacek + - Principal Developer: Edward Haas From ed1f1b84ef6727b45180b1368bf6037dbaedd986 Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Tue, 17 Feb 2026 12:09:32 +0200 Subject: [PATCH 5/8] Improvements Signed-off-by: Harel Meir --- ...ch_support.md => multiarch_arm_support.md} | 166 +++++++++--------- 1 file changed, 79 insertions(+), 87 deletions(-) rename stps/sig-iuo/{golden_image_multiarch_support.md => multiarch_arm_support.md} (67%) diff --git a/stps/sig-iuo/golden_image_multiarch_support.md b/stps/sig-iuo/multiarch_arm_support.md similarity index 67% rename from 
stps/sig-iuo/golden_image_multiarch_support.md rename to stps/sig-iuo/multiarch_arm_support.md index b83f0c2..4a3ed72 100644 --- a/stps/sig-iuo/golden_image_multiarch_support.md +++ b/stps/sig-iuo/multiarch_arm_support.md @@ -1,18 +1,18 @@ # Openshift-virtualization-tests Test plan -## HCO support for heterogeneous multi-arch clusters (golden images support) - Quality Engineering Plan +## Multiarch Support enablement for ARM - Quality Engineering Plan ### **Metadata & Tracking** | Field | Details | |:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------| | **Enhancement(s)** | [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) | -| **Feature in Jira** | [VIRTSTRAT-494](https://issues.redhat.com/browse/VIRTSTRAT-494) | +| **Feature in Jira** | [VIRTSTRAT-494 - Multiarch Support enablement for ARM](https://issues.redhat.com/browse/VIRTSTRAT-494) | | **Jira Tracking** | [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | | **QE Owner(s)** | Harel Meir | | **Owning SIG** | sig-iuo | | **Participating SIGs** | sig-infra, sig-storage, sig-virt, sig-network | -| **Current Status** | Draft | +| **Current Status** | Review | --- @@ -31,7 +31,7 @@ -This feature enables golden images support for heterogeneous clusters, where nodes may have different CPU architectures. It allows customers to create persistent virtual machines with specific architectures by automatically managing architecture-specific `DataImportCron` and `DataSource` resources for each supported architecture in the cluster. The feature is controlled by the `enableMultiArchBootImageImport` feature gate in the HyperConverged CR, and involves coordination between HCO (which tracks cluster node architectures), SSP (which creates architecture-specific templates), and CDI (which imports the correct architecture-specific images). 
For 4.22, this feature gate will be disabled by default. It will move to be enabled by default based on test stability, customer usage, and other factors, not earlier than 4.23. +This feature enables ARM VM support in mixed-architecture (amd64/arm64) OpenShift Virtualization clusters. Architecture-specific golden image resources (`DataImportCron`, `DataSource`, `DataVolume`) are automatically managed per supported architecture, controlled by the `enableMultiArchBootImageImport` feature gate in HCO. The feature coordinates across HCO (node architecture tracking, feature gate), SSP (templates, instance types), CDI (image imports), and KubeVirt (scheduling, lifecycle). For 4.22 the feature gate is disabled by default, moving to enabled-by-default not earlier than 4.23. This STP covers sig-iuo responsibilities while documenting cross-SIG testing ownership. --- @@ -47,14 +47,14 @@ technology, and testability before formal test planning. 2. **Details/Notes column**: Summary of the topic (e.g., list key requirements, describe customer value, note acceptance criteria) 3. 
**Comments column**: Document any concerns, gaps, or follow-up items needed --> -| Check | Done | Details/Notes | Comments | -|:---------------------------------------|:-----|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| -| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes) | | -| **Understand Value** | [x] | 1. Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. Allow users to define custom golden images with multi-architecture support.
3. Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs. | | -| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | -| **Testability** | [X] | Everything is testable except the upgrade to a version in which this FG is enabled by default (it is currently disabled by default). | Should be done in 4.23 timeframe | -| **Acceptance Criteria** | [x] | - HCO must consistently report accurate node architectures
- Common golden images should only be annotated with architectures that are actually supported
- Related resources should be created only for golden images annotated with architectures that are actually supported
- Custom golden images without arch-annotation should be backward-compatible
- Unsupported architectures shouldn't result in resource creation
- Legacy DataSources should be backward-compatible
- Trigger alert when a golden image is annotated with an unsupported architecture
- Trigger alert when running on a multi-arch cluster while Multiarch FG is disabled
- Trigger alert when a custom golden image lacks an architecture annotation
- VMs migrate across worker nodes of the same architecture during upgrades | | -| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VM creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team | | +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes)
- Golden images managed per-architecture with correct DataImportCron/DataSource resources
- Feature gate (`enableMultiArchBootImageImport`) controls golden image behavior
- Instance types and preferences support architecture-specific configurations
- Consistent VM lifecycle management (start, stop, migrate) across architectures
- Unified monitoring and logging across multiarch VMs | | +| **Understand Value** | [x] | 1. Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. Allow users to define custom golden images with multi-architecture support.
3. Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs.
5. ARM VM provisioning in mixed-architecture clusters without requiring separate management planes.
6. Architecture-aware scheduling eliminates manual node selection for VM placement.
7. Unified monitoring provides consistent observability across x86 and ARM workloads. | | +| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | +| **Testability** | [X] | Everything is testable except the upgrade to a version in which this FG is enabled by default (it is currently disabled by default). | Should be done in 4.23 timeframe | +| **Acceptance Criteria** | [x] | - HCO accurately reports node architectures in `status.nodeInfo`
- Golden images annotated only with supported architectures
- Arch-specific resources created only for supported architectures, correctly named, and ready to use
- Unsupported-only arch annotations result in a failure status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
- VMs migrate to same-arch nodes during upgrades; arch-specific resources preserved
- Functional tests pass post-upgrade when the FG is enabled by default | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: Creation of new boot sources and VMs
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved | | #### **2. Technology and Design Review** @@ -122,6 +122,7 @@ Each goal should tie back to requirements from Section I and be independently ve - **[P1]** Test integration with OpenShift monitoring stack: metrics appear in Prometheus, alerts fire correctly in Alertmanager --> + **Functional Goals**: - **[P0]** Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes. - **[P0]** Verify golden images are annotated only with architectures that are actually supported in HCO+SSP. @@ -130,22 +131,24 @@ Each goal should tie back to requirements from Section I and be independently ve **Monitoring Goals**: -- **[P1]** Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fired when golden image is annotated with an unsupported architecture (T1). -- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fired when running on a multi-arch cluster while Multiarch FG is disabled (T1). -- **[P1]** Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fired when a custom golden image lacks an architecture annotation (T1). -- **[P1]** Verify metric `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` value is correctly exposed via Prometheus (T2). -- **[P1]** Verify metric `kubevirt_hco_multi_arch_boot_images_enabled` value is correctly exposed via Prometheus (T2). -- **[P1]** Verify metric `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` value is correctly exposed via Prometheus (T2). +- **[P1]** Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fires when DataImportCronTemplate annotated with unsupported architectures, and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value. 
+- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fires when running on a multi-arch cluster while Multiarch FG is disabled, and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value.
+- **[P1]** Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fires when a custom golden image lacks an architecture annotation, and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value.
+- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` does not fire when running on a multi-arch cluster while Multiarch FG is disabled but nodePlacement restricts to supported architectures only.
 **Backward compatibility Goals**:
 - **[P0]** Verify Legacy Datasources points to default arch-annotated Datasources.
-- **[P0]** Verify nodePlacement affects related resources creation.
 **Upgrade goals**
 - **[P0]** Verify ARM64 and AMD64 vms are migrated across worker nodes of the same architecture during upgrades
 - **[P0]** Verify related resources preserved after upgrade.
 - **[P1]** Verify the functional test post-upgrade to version when FG is enabled by default.
+**Regression Goals**
+All participating SIGs should run t1 and t2 tests on multiarch clusters to make sure existing functionality doesn't break.
+- **[P0]** sig-iuo t1 tests
+- **[P0]** sig-iuo t2 tests with both CPU architectures
+- **[P1]** Monitoring t2 tests with both CPU architectures.
@@ -160,39 +163,39 @@ that" issues; each out-of-scope item must have PM/Lead sign-off.
 **Note:** Replace example rows with your actual out-of-scope items.
--> -| Non-Goal | Rationale | PM/ Lead Agreement | -|:---------------------------------|:---------------------------------------------------------------------|:-------------------| -| Update existing VM | If a VM is already running, it won't use the arch-specific resources | [ ] Name/Date | -| Performance Testing | Feature not scale related | [ ] Name/Date | -| Security Testing | Feature not security related | [ ] Name/Date | -| Usability testing | Should be done by [UI team](https://issues.redhat.com/browse/CNV-61832) (4.20) | [ ] Name/Date | -| Compatibility | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) (create vms from multiple archs/templates) | [ ] Name/Date | -| Templates creation & utilization | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) | [ ] Name/Date | -| Imports & datasource new API | Should be done by [Storage team](https://issues.redhat.com/browse/CNV-76732) | [ ] Name/Date | -| Test VMs migration between same arch nodes | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | -| 'defaultCPUModel' integrations | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | -| Testing with s390x architecture | The feature is "Multiarch Support enablement for ARM" | [ ] Name/Date | +| Non-Goal | Rationale | PM/ Lead Agreement | +|:-------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------|:-------------------| +| Update existing VM | If a VM is already running, it won't use the arch-specific resources | [ ] Name/Date | +| Performance Testing | Feature not scale related | [ ] Name/Date | +| Security Testing | Feature not security related | [ ] Name/Date | +| Usability testing | Should be done by [UI team](https://issues.redhat.com/browse/CNV-61832) (4.20) | [ ] Name/Date | +| Compatibility | Should be done by 
[SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) (create vms from multiple archs/templates) | [ ] Name/Date | +| Templates creation & utilization | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) | [ ] Name/Date | +| Imports & datasource new API | Should be done by [Storage team](https://issues.redhat.com/browse/CNV-76732) | [ ] Name/Date | +| Test VMs migration between same arch nodes | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | +| 'defaultCPUModel' integrations | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-74480) | [ ] Name/Date | +| Testing with s390x architecture | The feature is "Multiarch Support enablement for ARM" | [ ] Name/Date | #### **2. Test Strategy** -| Item | Description | Applicable (Y/N or N/A) | Comments | -|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | nodes architecture monitoring, arch-annotations, related resources creation | -| Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. 
| -| Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | -| Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | -| Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specify requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | -| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714)/[Virt team](https://issues.redhat.com/browse/CNV-26818) | -| Regression Testing | Verifies that new changes do not break existing functionality | Y | Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular) to verify no regressions in existing functionality. | -| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | VMs migrated and updated successfully, related resources preserved. | -| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | Legacy Datasource pointers, custom golden images without arch annotation. | -| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. 
| +| Item | Description | Applicable (Y/N or N/A) | Comments | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | nodes architecture monitoring, arch-annotations, related resources creation | +| Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. | +| Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | +| Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | +| Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? 
If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specifies the requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) |
+| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714)/[Virt team](https://issues.redhat.com/browse/CNV-26818) |
+| Regression Testing | Verifies that new changes do not break existing functionality | Y | Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular) to verify no regressions in existing functionality. |
+| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | VMs migrated and updated successfully, related resources preserved. |
+| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | Legacy Datasource pointers, custom golden images without arch annotation. |
+| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever it is ready to review. |
 | Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**[SSP/Infra](https://issues.redhat.com/browse/CNV-76714)**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**[Storage](https://issues.redhat.com/browse/CNV-76732)**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**[Virt](https://issues.redhat.com/browse/CNV-26818)**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade, defaultCPUModel
**[Network](https://issues.redhat.com/browse/CNV-76741)**: Network-related multiarch testing | -| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts (T1):**
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html),
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation),
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture)

**Metrics (T2):**
`kubevirt_hco_multi_arch_boot_images_enabled`,
`kubevirt_hco_dataimportcrontemplate_with_architecture_annotation`,
`kubevirt_hco_dataimportcrontemplate_with_supported_architectures` | -| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | Y | Testing environment AWS cluster | +| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics (T2):**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | +| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | Y | Testing environment AWS cluster | #### **3. Test Environment** @@ -201,7 +204,7 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. | Environment Component | Configuration | Comments | |:----------------------------------------------|:-------------------------|:--------------------------------------------------------------| | **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | -| **OCP & OpenShift Virtualization Version(s)** | OCP 4.21, CNV-4.21 | OCP 4.21 and OpenShift Virtualization 4.21 | +| **OCP & OpenShift Virtualization Version(s)** | OCP 4.22, CNV-4.22 | OCP 4.22 and OpenShift Virtualization 4.22 | | **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1-2 arm64 workers | | **Compute Resources** | N/A | No special compute requirements | | **Special Hardware** | N/A | No special hardware required | @@ -217,11 +220,11 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. for this feature. **Note:** Only list tools that are **new** or **different** from standard testing infrastructure. Leave empty if using standard tools. --> -| Category | Tools/Frameworks | -|:-------------------|:------------------| -| **Test Framework** | MultiArch cluster | -| **CI/CD** | | -| **Other Tools** | | +| Category | Tools/Frameworks | +|:-------------------|:------------------------------------------| +| **Test Framework** | MultiArch cluster | +| **CI/CD** | Dedicated t2 jenkins job with --cpu-archs | +| **Other Tools** | | #### **4. 
Entry Criteria** @@ -230,7 +233,7 @@ The following conditions must be met before testing can begin: - [X] VEP [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) is **approved and merged** - [x] Test environment (MultiArch cluster) can be **set up and configured** - [ ] Multi-CPU architecture support enabled in openshift-virtualization-tests repo -- [ ] HCO jenkins jobs created & scheduled +- [ ] HCO t2 jenkins jobs created & scheduled #### **5. Risks** @@ -240,16 +243,16 @@ justification in mitigation strategy. **Note:** Empty "Specific Risk" cells mean this must be filled. "N/A" means explicitly not applicable with justification. --> -| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | -|:-----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:-------| -| Timeline/Schedule | N/A | N/A | [X] | -| Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | -| Test Environment | N/A | N/A | [X] | -| Untestable Aspects | N/A | N/A | [X] | -| Resource Constraints | MultiArch cluster available only for 12 hours | Test automation on HA cluster first, final verification on MultiArch | [X] | -| Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. 
| [ ] | -| Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | -| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | +| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | +|:-----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:-------| +| Timeline/Schedule | N/A | N/A | [X] | +| Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | +| Test Environment | N/A | N/A | [X] | +| Untestable Aspects | N/A | N/A | [X] | +| Resource Constraints | MultiArch cluster available only for 12 hours | Test automation on HA cluster first, final verification on MultiArch | [X] | +| Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | [ ] | +| Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | +| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | #### **6. Known Limitations** @@ -278,23 +281,20 @@ tested. --> **Requirement Summary:** Brief description from the Jira issue (user story format preferred) --> -| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | -|:---------------|:------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| -| | As an admin, I expect HCO to detect and report the correct node architectures in my cluster | Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes | Tier 1 | P0 | -| | As an admin, I expect golden images to be annotated only with architectures my cluster actually supports | Verify golden images are annotated only with architectures that are actually supported in HCO+SSP | Tier 1 | P0 | -| | As a user, I expect arch-specific boot sources to be created and ready so I can create VMs on the correct architecture | Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use | Tier 2 | P0 | -| | As an admin, I expect a clear failure status when a golden image targets an unsupported architecture | Verify golden images annotated only with unsupported architectures present the fail status in HCO dataImportCronTemplates status | Tier 1 | P1 | -| | As an admin, I expect an alert when a golden image is annotated with an unsupported architecture | Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fired when golden image is annotated with an unsupported architecture | Tier 1 | P1 | -| | As an admin, I expect 
an alert when Multiarch FG is disabled on a multi-arch cluster | Verify alert `HCOMultiArchGoldenImagesDisabled` fired when running on a multi-arch cluster while Multiarch FG is disabled | Tier 1 | P1 | -| | As an admin, I expect an alert when a custom golden image is missing an architecture annotation | Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fired when a custom golden image lacks an architecture annotation | Tier 1 | P1 | -| | As an admin, I expect the `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric to reflect golden image arch support status | Verify metric value is correctly exposed via Prometheus and matches expected state per golden image | Tier 2 | P2 | -| | As an admin, I expect the `kubevirt_hco_multi_arch_boot_images_enabled` metric to reflect whether multiarch FG is enabled | Verify metric value is correctly exposed via Prometheus (1=enabled, 0=disabled, absent on single-arch) | Tier 2 | P2 | -| | As an admin, I expect the `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric to reflect annotation presence | Verify metric value is correctly exposed via Prometheus and matches expected state per golden image | Tier 2 | P2 | -| | As a user, I expect legacy DataSource references to keep working without breaking my existing tools | Verify legacy Datasources point to default arch-annotated Datasources | Tier 2 | P0 | -| | As an admin, I expect nodePlacement settings to be respected for arch-specific resources | Verify nodePlacement affects related resources creation | Tier 2 | P0 | -| | As a user, I expect my running VMs to migrate to same-architecture nodes during upgrades | Verify ARM64 and AMD64 VMs are migrated across worker nodes of the same architecture during upgrades | Tier 2 | P0 | -| | As an admin, I expect arch-specific resources to be preserved after an upgrade | Verify related resources preserved after upgrade | Tier 2 | P0 | -| | As an admin, I expect the feature to work when the 
Multiarch FG becomes enabled by default post-upgrade | Verify the functional tests post-upgrade to version when Multiarch FG is enabled by default | Tier 2 | P1 | +| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | +|:--------------------------------------------------------|:--------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | HCO detects and reports node architectures | Verify HCO monitors node architectures correctly, including on node addition/removal | Tier 1 | P0 | +| | Golden images annotated only with supported architectures | Verify golden images are annotated only with architectures actually supported in HCO+SSP | Tier 1 | P0 | +| | Arch-specific resources created and ready to use | Verify arch-specific resources created for supported architectures, named with architecture suffix, and are ready to use | Tier 2 | P0 | +| | Fail status for golden images with only unsupported architectures | Verify HCO `dataImportCronTemplates` status shows failure for golden images annotated only with unsupported architectures | Tier 1 | P1 | +| | Alert and metric for golden image with unsupported architecture | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value | Tier 2 | P1 | +| | Alert and metric for disabled Multiarch FG on multi-arch cluster | Verify `HCOMultiArchGoldenImagesDisabled` alert fires and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value | Tier 2 | P1 | +| | Alert and metric for custom golden image missing arch annotation | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires and 
`kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value | Tier 2 | P1 | +| | Alert not fired when nodePlacement limits to supported architectures only | Verify `HCOMultiArchGoldenImagesDisabled` not fired when Multiarch FG is disabled but nodePlacement restricts to supported architectures only | Tier 2 | P1 | +| | Legacy DataSources remain backward-compatible | Verify legacy DataSources point to default arch-annotated DataSources | Tier 2 | P0 | +| | VMs migrate to same-arch nodes during upgrades | Verify ARM64 and AMD64 VMs migrate to same-architecture worker nodes during upgrades | Tier 2 | P0 | +| | Arch-specific resources preserved after upgrade | Verify arch-specific resources persist after upgrade | Tier 2 | P0 | +| | Functional validation post-upgrade when FG enabled by default | Verify functional tests pass post-upgrade to version with Multiarch FG enabled by default | Tier 2 | P1 | --- @@ -311,11 +311,3 @@ This Software Test Plan requires approval from the following stakeholders: * **Approvers:** - [QE Lead / @rnester] - -* **Reviewers:** - - QE Architect: Ruth Netser - - QE Members: Yossi Segev, Anat Wax, Sergei Volkov -* **Approvers:** - - QE Architect: Ruth Netser - - Product Manager/Owner: Ronen Sde-Or, Petr Horacek - - Principal Developer: Edward Haas From 699bda007ee214541230fe31520e636559dc022d Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Tue, 17 Feb 2026 15:40:47 +0200 Subject: [PATCH 6/8] Implement "master stp" approach to centerlized all sigs related information Signed-off-by: Harel Meir --- stps/sig-iuo/multiarch_arm_support.md | 144 ++++++++++++++++---------- 1 file changed, 88 insertions(+), 56 deletions(-) diff --git a/stps/sig-iuo/multiarch_arm_support.md b/stps/sig-iuo/multiarch_arm_support.md index 4a3ed72..c02865c 100644 --- a/stps/sig-iuo/multiarch_arm_support.md +++ b/stps/sig-iuo/multiarch_arm_support.md @@ -18,20 +18,21 @@ **Document Conventions (if applicable):** [Define acronyms or 
terms specific to this document] -| Term | Definition | -|:----------------------|:---------------------------------------------------------------------------------------------------------| -| **MultiArch cluster** | Heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 1-2 arm64 worker nodes | -| **HA cluster** | Homogeneous cluster with 3 control-plane nodes and 3 amd64 worker nodes | -| **MultiArch FG** | `enableMultiArchBootImageImport` feature gate in HCO CR. | -| **Related resources** | Golden image associated resources: `DataImportCron`, `DataSource`, `DataVolume`, `VolumeSnapshot` | -| **nodeInfo** | HCO status field tracking cluster architectures (`status.nodeInfo`) | +| Term | Definition | +|:----------------------|:-------------------------------------------------------------------------------------------------------| +| **MultiArch cluster** | Heterogeneous cluster with 3 amd64 control-plane nodes, 2 amd64 worker nodes, and 2 arm64 worker nodes | +| **HA cluster** | Homogeneous cluster with 3 control-plane nodes and 3 amd64 worker nodes | +| **MultiArch FG** | `enableMultiArchBootImageImport` feature gate in HCO CR. | +| **Related resources** | Golden image associated resources: `DataImportCron`, `DataSource`, `DataVolume`, `VolumeSnapshot` | +| **nodeInfo** | HCO status field tracking cluster architectures (`status.nodeInfo`) | ### **Feature Overview** -This feature enables ARM VM support in mixed-architecture (amd64/arm64) OpenShift Virtualization clusters. Architecture-specific golden image resources (`DataImportCron`, `DataSource`, `DataVolume`) are automatically managed per supported architecture, controlled by the `enableMultiArchBootImageImport` feature gate in HCO. The feature coordinates across HCO (node architecture tracking, feature gate), SSP (templates, instance types), CDI (image imports), and KubeVirt (scheduling, lifecycle). 
For 4.22 the feature gate is disabled by default, moving to enabled-by-default not earlier than 4.23. This STP covers sig-iuo responsibilities while documenting cross-SIG testing ownership. +This feature enables ARM VM support in mixed-architecture (amd64/arm64) OpenShift Virtualization clusters. Architecture-specific golden image resources (`DataImportCron`, `DataSource`, `DataVolume`) are automatically managed per supported architecture, controlled by the `enableMultiArchBootImageImport` feature gate in HCO. +For 4.22 the feature gate is disabled by default, moving to enabled-by-default not earlier than 4.23. This STP covers sig-iuo responsibilities while documenting cross-SIG testing ownership. --- @@ -47,14 +48,14 @@ technology, and testability before formal test planning. 2. **Details/Notes column**: Summary of the topic (e.g., list key requirements, describe customer value, note acceptance criteria) 3. **Comments column**: Document any concerns, gaps, or follow-up items needed --> -| Check | Done | Details/Notes | Comments | 
-|:---------------------------------------|:-----|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| -| **Review Requirements** | [x] | - Nodes must be properly labeled to differentiate supported architectures
- Allow migration across same-arch nodes
- ARM + x86 workload observability
- VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes)
- Golden images managed per-architecture with correct DataImportCron/DataSource resources
- Feature gate (`enableMultiArchBootImageImport`) controls golden image behavior
- Instance types and preferences support architecture-specific configurations
- Consistent VM lifecycle management (start, stop, migrate) across architectures
- Unified monitoring and logging across multiarch VMs | | -| **Understand Value** | [x] | 1. Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
2. Allow users to define custom golden images with multi-architecture support.
3. Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
4. Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs.
5. ARM VM provisioning in mixed-architecture clusters without requiring separate management planes.
6. Architecture-aware scheduling eliminates manual node selection for VM placement.
7. Unified monitoring provides consistent observability across x86 and ARM workloads. | | -| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | -| **Testability** | [X] | Everything is testable, despite upgrade to a version which this FG enabled by default. (currently disabled by default) | Should be done in 4.23 timeframe | -| **Acceptance Criteria** | [x] | - HCO accurately reports node architectures in `status.nodeInfo`
- Golden images annotated only with supported architectures
- Arch-specific resources created only for supported architectures, correctly named, and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
- VMs migrate to same-arch nodes during upgrades; arch-specific resources preserved
- Functional tests pass post-upgrade when FG enabled by default | | -| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved | | +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| **Review Requirements** | [x] | - VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes)
- Golden images created and managed per-architecture with correct DataImportCron/DataSource resources
- Instance types and preferences support architecture-specific configurations
- Unified monitoring and logging across multiarch VMs | | +| **Understand Value** | [x] | - Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
- Allow users to define custom golden images with multi-architecture support.
- Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
- Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs.
- Unified monitoring provides consistent observability across x86 and ARM workloads. | | +| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | +| **Testability** | [X] | Everything is testable, despite upgrade to a version which this FG enabled by default. (currently disabled by default) | Should be done in 4.23 timeframe | +| **Acceptance Criteria** | [x] | - HCO & SSP accurately reports node architectures in `status.nodeInfo`
- Golden images annotated only with supported architectures
- Arch-specific resources created only for supported architectures, correctly named, and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
- VMs migrate to same-arch nodes during upgrades; arch-specific resources preserved
- Functional tests pass post-upgrade when FG enabled by default | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved | | #### **2. Technology and Design Review** @@ -124,10 +125,18 @@ Each goal should tie back to requirements from Section I and be independently ve **Functional Goals**: -- **[P0]** Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes. -- **[P0]** Verify golden images are annotated only with architectures that are actually supported in HCO+SSP. -- **[P0]** Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use. -- **[P1]** Verify Golden images annotated only with unsupported architectures should present the fail status in HCO dataImportCronTemplates status. +- **[P0]** Verify HCO monitors the cluster's nodes architectures correctly, and updated in addition/removal of nodes (sig-iuo). +- **[P0]** Verify golden images are annotated only with architectures that are actually supported in HCO+SSP (sig-iuo). +- **[P0]** Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use (sig-iuo). +- **[P1]** Verify Golden images annotated only with unsupported architectures should present the fail status in HCO dataImportCronTemplates status (sig-iuo). +- **[P0]** Verify arch-specific templates created with correct configurations ([sig-infra](https://issues.redhat.com/browse/CNV-76714)). +- **[P0]** Verify instance types expose architecture-specific hardware profiles (sig-infra). +- **[P0]** Verify VMs created from arch-specific templates run on matching architecture nodes (sig-infra). +- **[P0]** Verify CDI imports correct architecture-specific images via `platform.architecture` ([sig-storage](https://issues.redhat.com/browse/CNV-76732)). +- **[P0]** Verify arch-specific DataSources created with correct naming convention (sig-storage). 
+- **[P0]** Verify VMs scheduled only on nodes matching their CPU architecture ([sig-virt](https://issues.redhat.com/browse/CNV-26818)). +- **[P0]** Verify VM migration between same-architecture nodes works correctly (sig-virt). +- **[P1]** Verify `defaultCPUModel` integrations work with multiarch configurations ([sig-virt](https://issues.redhat.com/browse/CNV-74480)). **Monitoring Goals**: @@ -137,18 +146,43 @@ Each goal should tie back to requirements from Section I and be independently ve - **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` not fired when running on a multi-arch cluster while Multiarch FG is disabled, but nodePlacement affecting only supported architectures. **Backward compatibility Goals**: -- **[P0]** Verify Legacy Datasources points to default arch-annotated Datasources. +- **[P0]** Verify Legacy Datasources points to default arch-annotated Datasources (sig-iuo). +- **[P0]** Verify legacy `DataSource` API backward-compatible with new CDI arch-specific naming (sig-storage). +- **[P0]** Verify custom golden images without arch annotation remain functional (sig-iuo). **Upgrade goals** -- **[P0]** Verify ARM64 and AMD64 vms are migrated across worker nodes of the same architecture during upgrades -- **[P0]** Verify related resources preserved after upgrade. -- **[P1]** Verify the functional test post-upgrade to version when FG is enabled by default. +- **[P0]** Verify ARM64 and AMD64 vms are migrated across worker nodes of the same architecture during upgrades (sig-iuo). +- **[P0]** Verify related resources preserved after upgrade (sig-iuo). +- **[P1]** Verify the functional test post-upgrade to version when FG is enabled by default (sig-iuo). +- **[P0]** Verify VM scheduling to correct arch nodes preserved post-upgrade (sig-virt). **Regression Goals** All participating-sigs should run t1 and t2 on multiarch clusters, to make sure functionality doesn't break. 
+ +*sig-iuo:* + - **[P0]** sig-iuo t1 tests - **[P0]** sig-iuo t2 tests with both cpu-arch -- **[P1]** Monitring t2 tests with both cpu-arch. + +*sig-infra:* + +- **[P0]** sig-infra t1 tests +- **[P0]** sig-infra t2 tests with both cpu-arch + +*sig-storage:* + +- **[P0]** sig-storage t1 tests +- **[P0]** sig-storage t2 tests with both cpu-arch + +*sig-virt:* + +- **[P0]** sig-virt t1 tests +- **[P0]** sig-virt t2 tests with both cpu-arch + +*sig-network:* + +- **[P0]** sig-network t1 tests +- **[P0]** sig-network t2 tests with both cpu-arch @@ -163,18 +197,13 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. **Note:** Replace example rows with your actual out-of-scope items. --> -| Non-Goal | Rationale | PM/ Lead Agreement | -|:-------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------|:-------------------| -| Update existing VM | If a VM is already running, it won't use the arch-specific resources | [ ] Name/Date | -| Performance Testing | Feature not scale related | [ ] Name/Date | -| Security Testing | Feature not security related | [ ] Name/Date | -| Usability testing | Should be done by [UI team](https://issues.redhat.com/browse/CNV-61832) (4.20) | [ ] Name/Date | -| Compatibility | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) (create vms from multiple archs/templates) | [ ] Name/Date | -| Templates creation & utilization | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714) | [ ] Name/Date | -| Imports & datasource new API | Should be done by [Storage team](https://issues.redhat.com/browse/CNV-76732) | [ ] Name/Date | -| Test VMs migration between same arch nodes | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-26818) | [ ] Name/Date | -| 'defaultCPUModel' integrations | Should be done by [Virt team](https://issues.redhat.com/browse/CNV-74480) | [ ] 
Name/Date | -| Testing with s390x architecture | The feature is "Multiarch Support enablement for ARM" | [ ] Name/Date | +| Non-Goal | Rationale | PM/ Lead Agreement | +|:--------------------------------|:--------------------------------------------------------------------------------------------------|:-------------------| +| Update existing VM | Running VMs won't use new arch-specific resources | [ ] Name/Date | +| Performance Testing | Feature not scale related | [ ] Name/Date | +| Security Testing | Feature not security related | [ ] Name/Date | +| Usability Testing | Done by UI team in 4.20 ([CNV-61832](https://issues.redhat.com/browse/CNV-61832)) | [ ] Name/Date | +| Testing with s390x architecture | Feature scope is ARM enablement only | [ ] Name/Date | #### **2. Test Strategy** @@ -183,15 +212,15 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. | Item | Description | Applicable (Y/N or N/A) | Comments | |:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | 
nodes architecture monitoring, arch-annotations, related resources creation | +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | **sig-iuo:** node arch monitoring, arch-annotations, related resources creation, FG management
**sig-infra:** templates, instance types, VM creation from arch-specific templates
**sig-storage:** CDI imports, arch-specific DataSources, `platform.architecture` API
**sig-virt:** VM scheduling, migration between same-arch nodes, `defaultCPUModel` | | Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. | | Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | | Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | | Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specify requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | -| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | N/A | Should be done by [SSP/Infra team](https://issues.redhat.com/browse/CNV-76714)/[Virt team](https://issues.redhat.com/browse/CNV-26818) | -| Regression Testing | Verifies that new changes do not break existing functionality | Y | Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular) to verify no regressions in existing functionality. | -| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | VMs migrated and updated successfully, related resources preserved. | -| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | Legacy Datasource pointers, custom golden images without arch annotation. | +| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | sig-infra owns cross-arch VM creation validation ([CNV-76714](https://issues.redhat.com/browse/CNV-76714)) | +| Regression Testing | Verifies that new changes do not break existing functionality | Y | All participating SIGs run t1 and t2 on multiarch clusters. sig-iuo: must-gather, nodeplacement, golden images; other SIGs per their scope. | +| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | **sig-iuo:** VMs migrated and updated successfully, related resources preserved
**sig-virt:** VM scheduling to correct arch nodes preserved post-upgrade | +| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | **sig-iuo:** Legacy Datasource pointers, custom golden images without arch annotation
**sig-storage:** Legacy `DataSource` API backward-compatible with new CDI arch-specific naming | | Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | | Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**[SSP/Infra](https://issues.redhat.com/browse/CNV-76714)**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**[Storage](https://issues.redhat.com/browse/CNV-76732)**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**[Virt](https://issues.redhat.com/browse/CNV-26818)**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade, defaultCPUModel
**[Network](https://issues.redhat.com/browse/CNV-76741)**: Network-related multiarch testing | | Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics (T2):**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | @@ -201,18 +230,18 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. -| Environment Component | Configuration | Comments | -|:----------------------------------------------|:-------------------------|:--------------------------------------------------------------| -| **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | -| **OCP & OpenShift Virtualization Version(s)** | OCP 4.22, CNV-4.22 | OCP 4.22 and OpenShift Virtualization 4.22 | -| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 1-2 arm64 workers | -| **Compute Resources** | N/A | No special compute requirements | -| **Special Hardware** | N/A | No special hardware required | -| **Storage** | io2-csi storage class | AWS EBS io2 CSI driver | -| **Network** | OVN-Kubernetes (default) | No special network requirements | -| **Required Operators** | N/A | N/A | -| **Platform** | AWS | ARM64 workers available on AWS | -| **Special Configurations** | N/A | No special configurations required | +| Environment Component | Configuration | Comments | +|:----------------------------------------------|:-------------------------|:------------------------------------------------------------| +| **Cluster Topology** | MultiArch cluster | 3 control-plane and 3-4 worker nodes | +| **OCP & OpenShift Virtualization Version(s)** | OCP 4.22, CNV-4.22 | OCP 4.22 and OpenShift Virtualization 4.22 | +| **CPU Virtualization** | Multi-arch cluster | 3 amd64 control-plane, 2 amd64 workers, and 2 arm64 workers | +| **Compute Resources** | N/A | No special compute requirements | +| **Special Hardware** | N/A | No special hardware required | +| **Storage** | io2-csi storage class | AWS EBS io2 CSI driver | +| **Network** | OVN-Kubernetes (default) | No 
special network requirements | +| **Required Operators** | N/A | N/A | +| **Platform** | AWS | ARM64 workers available on AWS | +| **Special Configurations** | N/A | No special configurations required | #### **3.1. Testing Tools & Frameworks** @@ -223,7 +252,7 @@ Leave empty if using standard tools. --> | Category | Tools/Frameworks | |:-------------------|:------------------------------------------| | **Test Framework** | MultiArch cluster | -| **CI/CD** | Dedicated t2 jenkins job with --cpu-archs | +| **CI/CD** | Dedicated t2 jenkins jobs with multiarch markers:
- `test-pytest-cnv-4.22-iuo-multiarch`
- `test-pytest-cnv-4.22-ssp-multiarch`
- `test-pytest-cnv-4.22-storage-multiarch`
- `test-pytest-cnv-4.22-virt-multiarch` | | **Other Tools** | | #### **4. Entry Criteria** @@ -270,7 +299,9 @@ with justification. --> --- -### **III. Test Scenarios & Traceability** +### **III. Test Scenarios & Traceability (HCO)** + +This table covers HCO (sig-iuo) test cases only. Other participating SIGs track their scenarios in their own deliverables. @@ -295,6 +326,7 @@ tested. --> | | VMs migrate to same-arch nodes during upgrades | Verify ARM64 and AMD64 VMs migrate to same-architecture worker nodes during upgrades | Tier 2 | P0 | | | Arch-specific resources preserved after upgrade | Verify arch-specific resources persist after upgrade | Tier 2 | P0 | | | Functional validation post-upgrade when FG enabled by default | Verify functional tests pass post-upgrade to version with Multiarch FG enabled by default | Tier 2 | P1 | +| | Custom golden images without arch annotation remain functional | Verify custom golden images without arch annotation remain functional | Tier 2 | P0 | --- From 801c6df16c2c84737a0b841fe687a66e7a049bdf Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Mon, 23 Feb 2026 14:07:52 +0200 Subject: [PATCH 7/8] small modifications Signed-off-by: Harel Meir --- stps/sig-iuo/multiarch_arm_support.md | 100 ++++++++++---------------- 1 file changed, 38 insertions(+), 62 deletions(-) diff --git a/stps/sig-iuo/multiarch_arm_support.md b/stps/sig-iuo/multiarch_arm_support.md index c02865c..436a78b 100644 --- a/stps/sig-iuo/multiarch_arm_support.md +++ b/stps/sig-iuo/multiarch_arm_support.md @@ -54,8 +54,8 @@ technology, and testability before formal test planning. | **Understand Value** | [x] | - Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
- Allow users to define custom golden images with multi-architecture support.
- Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
- Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs.
- Unified monitoring provides consistent observability across x86 and ARM workloads. | | | **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | | **Testability** | [X] | Everything is testable except the upgrade to a version in which this FG is enabled by default (it is currently disabled by default). | Should be done in 4.23 timeframe | -| **Acceptance Criteria** | [x] | - HCO & SSP accurately reports node architectures in `status.nodeInfo`
- Golden images annotated only with supported architectures
- Arch-specific resources created only for supported architectures, correctly named, and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
- VMs migrate to same-arch nodes during upgrades; arch-specific resources preserved
- Functional tests pass post-upgrade when FG enabled by default | | -| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests (must-gather, nodeplacement, golden images tests in particular)
- **Doc**: Should be documented by doc team
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved | | +| **Acceptance Criteria** | [x] | - HCO & SSP accurately report node architectures in `status.nodeInfo`
- Golden images annotated and arch-specific resources created only for supported architectures; resources correctly named and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
 | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: Creation of new boot sources and VMs
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved
- **Doc**: Drop the TP note from docs
 | | #### **2. Technology and Design Review** @@ -126,63 +126,37 @@ Each goal should tie back to requirements from Section I and be independently ve **Functional Goals**: - **[P0]** Verify HCO monitors the cluster's node architectures correctly and updates them when nodes are added or removed (sig-iuo). -- **[P0]** Verify golden images are annotated only with architectures that are actually supported in HCO+SSP (sig-iuo). -- **[P0]** Verify related resources created only for golden images annotated with supported architecture, named with the architecture suffix, and are ready to use (sig-iuo). -- **[P1]** Verify Golden images annotated only with unsupported architectures should present the fail status in HCO dataImportCronTemplates status (sig-iuo). +- **[P0]** Verify golden images are annotated only with supported architectures and related resources are created only for those, named with the architecture suffix (sig-iuo). +- **[P1]** Verify golden images annotated only with unsupported architectures present fail status in HCO dataImportCronTemplates status (sig-iuo). - **[P0]** Verify arch-specific templates created with correct configurations ([sig-infra](https://issues.redhat.com/browse/CNV-76714)). -- **[P0]** Verify instance types expose architecture-specific hardware profiles (sig-infra). - **[P0]** Verify VMs created from arch-specific templates run on matching architecture nodes (sig-infra). - **[P0]** Verify CDI imports correct architecture-specific images via `platform.architecture` ([sig-storage](https://issues.redhat.com/browse/CNV-76732)). -- **[P0]** Verify arch-specific DataSources created with correct naming convention (sig-storage). - **[P0]** Verify VMs scheduled only on nodes matching their CPU architecture ([sig-virt](https://issues.redhat.com/browse/CNV-26818)). - **[P0]** Verify VM migration between same-architecture nodes works correctly (sig-virt). 
-- **[P1]** Verify `defaultCPUModel` integrations work with multiarch configurations ([sig-virt](https://issues.redhat.com/browse/CNV-74480)). - **Monitoring Goals**: - **[P1]** Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fires when DataImportCronTemplate annotated with unsupported architectures, and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value. -- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fires when running on a multi-arch cluster while Multiarch FG is disabled, and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value +- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fires when running on a multi-arch cluster while Multiarch FG is disabled, and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value. - **[P1]** Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fires when a custom golden image lacks an architecture annotation, and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value. -- **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` not fired when running on a multi-arch cluster while Multiarch FG is disabled, but nodePlacement affecting only supported architectures. -**Backward compatibility Goals**: +**Backward Compatibility Goals**: - **[P0]** Verify legacy DataSources point to default arch-annotated DataSources (sig-iuo). -- **[P0]** Verify legacy `DataSource` API backward-compatible with new CDI arch-specific naming (sig-storage). - **[P0]** Verify custom golden images without arch annotation remain functional (sig-iuo). +- **[P0]** Verify legacy `DataSource` API is backward-compatible with new CDI arch-specific naming (sig-storage). -**Upgrade goals** -- **[P0]** Verify ARM64 and AMD64 vms are migrated across worker nodes of the same architecture during upgrades (sig-iuo). 
-- **[P0]** Verify related resources preserved after upgrade (sig-iuo). -- **[P1]** Verify the functional test post-upgrade to version when FG is enabled by default (sig-iuo). -- **[P0]** Verify VM scheduling to correct arch nodes preserved post-upgrade (sig-virt). - -**Regression Goals** -All participating-sigs should run t1 and t2 on multiarch clusters, to make sure functionality doesn't break. - -*sig-iuo:* - -- **[P0]** sig-iuo t1 tests -- **[P0]** sig-iuo t2 tests with both cpu-arch - -*sig-infra:* - -- **[P0]** sig-infra t1 tests -- **[P0]** sig-infra t2 tests with both cpu-arch - -*sig-storage:* - -- **[P0]** sig-storage t1 tests -- **[P0]** sig-storage t2 tests with both cpu-arch - -*sig-virt:* +**Upgrade Goals**: +- **[P0]** Verify ARM64 and AMD64 VMs are migrated to same-architecture nodes during upgrades and related resources are preserved (sig-iuo, sig-virt). +- **[P0]** Verify functional tests pass post-upgrade to a version in which the FG is enabled by default (all sigs). -- **[P0]** sig-virt t1 tests -- **[P0]** sig-virt t2 tests with both cpu-arch +**Regression Goals**: -*sig-network:* +Regression testing ensures that existing functionality continues to work correctly on multiarch clusters after the introduction of multiarch support. Each participating SIG must run its Tier 1 (functional) and Tier 2 (end-to-end) test suites on multiarch clusters with both CPU architectures to confirm no regressions are introduced. -- **[P0]** sig-network t1 tests -- **[P0]** sig-network t2 tests with both cpu-arch +- **[P0]** sig-iuo: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. +- **[P0]** sig-infra: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. +- **[P0]** sig-storage: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. +- **[P0]** sig-virt: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. 
+- **[P0]** sig-network: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. @@ -204,6 +178,7 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. | Security Testing | Feature not security related | [ ] Name/Date | | Usability Testing | Done by UI team in 4.20 ([CNV-61832](https://issues.redhat.com/browse/CNV-61832)) | [ ] Name/Date | | Testing with s390x architecture | Feature scope is ARM enablement only | [ ] Name/Date | +| Regression with on single-arch cluster | Our existing scheduled runs already covering it | [ ] Name/Date | #### **2. Test Strategy** @@ -212,18 +187,18 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. | Item | Description | Applicable (Y/N or N/A) | Comments | |:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | **sig-iuo:** node arch monitoring, arch-annotations, related resources creation, FG management
**sig-infra:** templates, instance types, VM creation from arch-specific templates
**sig-storage:** CDI imports, arch-specific DataSources, `platform.architecture` API
**sig-virt:** VM scheduling, migration between same-arch nodes, `defaultCPUModel` | +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | | | Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. | | Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | | Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | | Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specify requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | | Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | sig-infra owns cross-arch VM creation validation ([CNV-76714](https://issues.redhat.com/browse/CNV-76714)) | -| Regression Testing | Verifies that new changes do not break existing functionality | Y | All participating SIGs run t1 and t2 on multiarch clusters. sig-iuo: must-gather, nodeplacement, golden images; other SIGs per their scope. | -| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | **sig-iuo:** VMs migrated and updated successfully, related resources preserved
**sig-virt:** VM scheduling to correct arch nodes preserved post-upgrade | +| Regression Testing | Verifies that new changes do not break existing functionality | Y | All participating SIGs run t1 and t2 on multiarch clusters. | +| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | **sig-iuo & sig-virt:** VMs migrated and updated successfully, related resources preserved
| | Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | **sig-iuo:** Legacy Datasource pointers, custom golden images without arch annotation
**sig-storage:** Legacy `DataSource` API backward-compatible with new CDI arch-specific naming | | Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | | Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**[SSP/Infra](https://issues.redhat.com/browse/CNV-76714)**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**[Storage](https://issues.redhat.com/browse/CNV-76732)**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**[Virt](https://issues.redhat.com/browse/CNV-26818)**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade, defaultCPUModel
**[Network](https://issues.redhat.com/browse/CNV-76741)**: Network-related multiarch testing | -| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics (T2):**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | +| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics:**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | | Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | Y | Testing environment AWS cluster | #### **3. Test Environment** @@ -278,7 +253,7 @@ with justification. --> | Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | | Test Environment | N/A | N/A | [X] | | Untestable Aspects | N/A | N/A | [X] | -| Resource Constraints | MultiArch cluster available only for 12 hours | Test automation on HA cluster first, final verification on MultiArch | [X] | +| Resource Constraints | MultiArch cluster available only for 12 hours; limited number of AWS clusters available | Test automation on HA cluster first, final verification on MultiArch. Verify that DevOps are investigating increasing the number of available AWS clusters. | [ ] | | Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | [ ] | | Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | | Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | @@ -312,21 +287,22 @@ tested. --> **Requirement Summary:** Brief description from the Jira issue (user story format preferred) --> -| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | -|:--------------------------------------------------------|:--------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| -| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | HCO detects and reports node architectures | Verify HCO monitors node architectures correctly, including on node addition/removal | Tier 1 | P0 | -| | Golden images annotated only with supported architectures | Verify golden images are annotated only with architectures actually supported in HCO+SSP | Tier 1 | P0 | -| | Arch-specific resources created and ready to use | Verify arch-specific resources created for supported architectures, named with architecture suffix, and are ready to use | Tier 2 | P0 | -| | Fail status for golden images with only unsupported architectures | Verify HCO `dataImportCronTemplates` status shows failure for golden images annotated only with unsupported architectures | Tier 1 | P1 | -| | Alert and metric for golden image with unsupported architecture | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value | Tier 2 | P1 | -| | Alert and metric for disabled Multiarch FG on multi-arch cluster | Verify `HCOMultiArchGoldenImagesDisabled` alert fires and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value | Tier 2 | P1 | +| Requirement 
ID | Requirement Summary | Test Scenario(s) | Tier | Priority | +|:--------------------------------------------------------|:--------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | HCO detects and reports node architectures | Verify HCO monitors node architectures correctly | Tier 1 | P0 | +| | Golden images annotated only with supported architectures | Verify golden images are annotated only with architectures actually supported in HCO+SSP | Tier 1 | P0 | +| | Arch-specific resources created and ready to use | Verify arch-specific resources created for supported architectures, named with architecture suffix, and are ready to use | Tier 2 | P0 | +| | Fail status for golden images with only unsupported architectures | Verify HCO `dataImportCronTemplates` status shows failure for dataImportCronTemplate annotated only with unsupported architectures | Tier 1 | P1 | +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | Alert and metric for golden image with unsupported architecture | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value | Tier 2 | P1 | +| | Alert and metric for disabled Multiarch FG on multi-arch cluster | Verify `HCOMultiArchGoldenImagesDisabled` alert fires and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value | Tier 2 | P1 | | | Alert and metric for custom golden image missing arch annotation | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value | Tier 2 | P1 | -| | Alert not fired when nodePlacement limits to 
supported architectures only | Verify `HCOMultiArchGoldenImagesDisabled` not fired when Multiarch FG is disabled but nodePlacement restricts to supported architectures only | Tier 2 | P1 | -| | Legacy DataSources remain backward-compatible | Verify legacy DataSources point to default arch-annotated DataSources | Tier 2 | P0 | -| | VMs migrate to same-arch nodes during upgrades | Verify ARM64 and AMD64 VMs migrate to same-architecture worker nodes during upgrades | Tier 2 | P0 | -| | Arch-specific resources preserved after upgrade | Verify arch-specific resources persist after upgrade | Tier 2 | P0 | -| | Functional validation post-upgrade when FG enabled by default | Verify functional tests pass post-upgrade to version with Multiarch FG enabled by default | Tier 2 | P1 | -| | Custom golden images without arch annotation remain functional | Verify custom golden images without arch annotation remain functional | Tier 2 | P0 | +| | Alert not fired when nodePlacement limits to supported architectures only | Verify `HCOMultiArchGoldenImagesDisabled` not fired when Multiarch FG is disabled but nodePlacement restricts to supported architectures only | Tier 2 | P1 | +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | Legacy DataSources remain backward-compatible | Verify legacy DataSources point to default arch-annotated DataSources | Tier 2 | P0 | +| | Custom golden images without arch annotation remain functional | Verify custom golden images without arch annotation remain functional | Tier 2 | P0 | +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | VMs migrate to same-arch nodes during upgrades | Verify ARM64 and AMD64 VMs migrate to same-architecture worker nodes during upgrades, and related resources preserved | Tier 2 | P0 | +| | Functional validation post-upgrade when FG enabled by default | Verify functional tests pass post-upgrade to version with Multiarch FG enabled by default | Tier 2 | P1 | + + --- From 
3b3c5b41746025074684a169329ac43d9349e7ca Mon Sep 17 00:00:00 2001 From: Harel Meir Date: Wed, 11 Mar 2026 17:29:12 +0200 Subject: [PATCH 8/8] Shorten the motivation section an update tests cases Signed-off-by: Harel Meir --- stps/sig-iuo/multiarch_arm_support.md | 122 +++++++++++++------------- stps/stp-template/stp.md | 1 - 2 files changed, 61 insertions(+), 62 deletions(-) diff --git a/stps/sig-iuo/multiarch_arm_support.md b/stps/sig-iuo/multiarch_arm_support.md index 436a78b..46e3a12 100644 --- a/stps/sig-iuo/multiarch_arm_support.md +++ b/stps/sig-iuo/multiarch_arm_support.md @@ -48,14 +48,14 @@ technology, and testability before formal test planning. 2. **Details/Notes column**: Summary of the topic (e.g., list key requirements, describe customer value, note acceptance criteria) 3. **Comments column**: Document any concerns, gaps, or follow-up items needed --> -| Check | Done | Details/Notes | Comments | -|:---------------------------------------|:-----|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| -| **Review Requirements** | [x] | - VMs must only run on nodes supporting their architecture (e.g., ARM VMs on ARM nodes)
- Golden images created and managed per-architecture with correct DataImportCron/DataSource resources
- Instance types and preferences support architecture-specific configurations
- Unified monitoring and logging across multiarch VMs | | -| **Understand Value** | [x] | - Enable users to create persistent VMs with specific architectures on heterogeneous clusters.
- Allow users to define custom golden images with multi-architecture support.
- Reliable VM deployment by automatically matching image architectures to compatible node hardware without breaking existing workflows.
- Backward compatibility for users with existing scripts/tools referencing specific DataSource CRs.
- Unified monitoring provides consistent observability across x86 and ARM workloads. | | -| **Customer Use Cases** | [x] | **UC1 - Hybrid Development/Testing**: Developer builds app targeting x86 servers and ARM edge devices, running test VMs for both architectures in a single cluster.
**UC2 - Edge + Data Center Integration**: Operator provisions ARM VMs for edge workloads and x86 VMs for core workloads within the same management plane.
**UC3 - ISV Application Validation**: QA team runs ARM VM test environments in the same cluster used for x86-based CI/CD pipelines. | | -| **Testability** | [X] | Everything is testable, despite upgrade to a version which this FG enabled by default. (currently disabled by default) | Should be done in 4.23 timeframe | -| **Acceptance Criteria** | [x] | - HCO & SSP accurately reports node architectures in `status.nodeInfo`
- Golden images annotated and arch-specific resources created only for supported architectures; resources correctly named and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- Each alert fires and corresponding metric reports correct value (unsupported arch, disabled FG, missing annotation)
- Legacy DataSources remain backward-compatible
| | -| **Non-Functional Requirements (NFRs)** | [x] | - **Usability**: New boot sources and VMs creation
- **Monitoring**: 3 new alerts and metrics
- **Regression**: Run IUO T2 tests
- **Portability**: Feature operates on AWS multiarch clusters
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved
- **Doc**: Drop the TP note from docs
| | +| Check | Done | Details/Notes | Comments | +|:---------------------------------------|:-----|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------| +| **Review Requirements** | [x] | - VMs can be created on Multiarch cluster using openshift-virtualization API's(e.g instancetypes,templates,golden-images,etc..)
- VMs must run, be managed, and migrate only on nodes supporting their architecture (e.g., ARM VMs on ARM nodes)
- Nodes in a Multiarch cluster are labeled to differentiate supported architectures.
- ARM-compatible images can be imported and run
- Unified monitoring and logging across Multiarch VMs | | +| **Understand Value** | [x] | - Allow users to run ARM-based VMs alongside x86-based VMs within the same OpenShift Virtualization cluster, enabling a true multiarch environment.
- Allow users to define custom golden images with multi-architecture support for easier import and VM creation. | | +| **Customer Use Cases** | [x] | - **Developers**: Test and validate apps across x86 and ARM without separate infrastructure
- **Admins**: Consolidate heterogeneous workloads on a single cluster
- **Enterprises/ISVs**: Reduce cost and complexity when adopting ARM for edge and efficiency | | +| **Testability** | [X] | Everything is testable except the upgrade to a version in which this FG is enabled by default (it is currently disabled by default). | Should be done in 4.23 timeframe | +| **Acceptance Criteria** | [x] | - HCO accurately reports node architectures
- VMs can be created and managed on a Multiarch cluster using standard OpenShift Virtualization APIs.
- Golden images are annotated, and arch-specific related resources are created only for supported architectures, named with an arch suffix, and ready to use
- Unsupported-only arch annotations result in fail status in HCO `dataImportCronTemplates`
- CDI pulls new images according to the image's arch and pullMethod (pod/node).
- VMs are scheduled and migrated successfully. | | +| **Non-Functional Requirements (NFRs)** | [x] | - **Monitoring**: 3 new alerts and metrics
- **Regression**: All teams should run T1+T2 tests on Multiarch cluster
- **Upgrade**: VMs survive upgrade with correct architecture placement preserved
- **UI**: New boot sources and VM creation pages (templates/instance types)
- **Doc**: Drop the TP note from docs
| | #### **2. Technology and Design Review** @@ -64,13 +64,13 @@ technology, and testability before formal test planning. 2. **Details/Notes column**: Summary of the item (e.g., list technology challenges, special environment needs, significant API changes) 3. **Comments column**: Note any blockers, risks, or items requiring follow-up --> -| Check | Done | Details/Notes | Comments | -|:---------------------------------|:-----|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| -| **Developer Handoff/QE Kickoff** | [x] | Met with Nahshon from HCO team | | -| **Technology Challenges** | [x] | Can use HA cluster, but should be verified on Multiarch cluster which is available only for 12 hours | Initial testing on HA, final verification on Multiarch | -| **Test Environment Needs** | [x] | MultiArch cluster, HA cluster | | -| **API Extensions** | [x] | **HCO**: `status.nodeInfo` (controlPlaneArchitectures, workloadsArchitectures), `status.dataImportCronTemplates` (originalSupportedArchitectures, conditions)
**SSP**: `enableMultipleArchitectures`, `cluster` fields (workloadArchitectures, controlPlaneArchitectures)
**CDI**: `platform.architecture` field in `DataVolumeSourceRegistry`, arch-specific `DataSource` (`-`), legacy `DataSource` redirects to arch-specific one | | -| **Topology Considerations** | [x] | Related resources should be created per worker node architecture. Currently its ARM64 and AMD64. | | +| Check | Done | Details/Notes | Comments | +|:---------------------------------|:-----|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| +| **Developer Handoff/QE Kickoff** | [x] | Met with Nahshon from HCO team | | +| **Technology Challenges** | [x] | Can use HA cluster, but should be verified on Multiarch cluster which is available only for 12 hours | Initial testing on HA, final verification on Multiarch | +| **Test Environment Needs** | [x] | MultiArch cluster, HA cluster | | +| **API Extensions** | [x] | **HCO**: `status.nodeInfo` (controlPlaneArchitectures, workloadsArchitectures), `status.dataImportCronTemplates` (originalSupportedArchitectures, conditions)
**SSP**: `enableMultipleArchitectures`, `cluster` fields (workloadArchitectures, controlPlaneArchitectures)
**CDI**: `platform.architecture` field in `DataVolumeSourceRegistry`, arch-specific `DataSource` (`-`), legacy `DataSource` redirects to arch-specific one
**Kubevirt**: `VirtualMachineInstanceSpec.Architecture` to target architecture | | +| **Topology Considerations** | [x] | Related resources should be created per worker node architecture. Currently its ARM64 and AMD64. | | ### **II. Software Test Plan (STP)** @@ -131,6 +131,8 @@ Each goal should tie back to requirements from Section I and be independently ve - **[P0]** Verify arch-specific templates created with correct configurations ([sig-infra](https://issues.redhat.com/browse/CNV-76714)). - **[P0]** Verify VMs created from arch-specific templates run on matching architecture nodes (sig-infra). - **[P0]** Verify CDI imports correct architecture-specific images via `platform.architecture` ([sig-storage](https://issues.redhat.com/browse/CNV-76732)). +- **[P0]** Verify Datasource new pointer API ([sig-storage](https://issues.redhat.com/browse/CNV-76732)). + - **[P0]** Verify VMs scheduled only on nodes matching their CPU architecture ([sig-virt](https://issues.redhat.com/browse/CNV-26818)). - **[P0]** Verify VM migration between same-architecture nodes works correctly (sig-virt). @@ -138,6 +140,7 @@ Each goal should tie back to requirements from Section I and be independently ve - **[P1]** Verify alert `HCOGoldenImageWithNoSupportedArchitecture` fires when DataImportCronTemplate annotated with unsupported architectures, and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value. - **[P1]** Verify alert `HCOMultiArchGoldenImagesDisabled` fires when running on a multi-arch cluster while Multiarch FG is disabled, and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value. - **[P1]** Verify alert `HCOGoldenImageWithNoArchitectureAnnotation` fires when a custom golden image lacks an architecture annotation, and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value. 
+- **[P1]** Verify `HCOMultiArchGoldenImagesDisabled` alert does not fire when Multiarch FG is disabled but nodePlacement restricts to supported architectures only. **Backward Compatibility Goals**: - **[P0]** Verify Legacy Datasources points to default arch-annotated Datasources (sig-iuo). @@ -150,7 +153,7 @@ Each goal should tie back to requirements from Section I and be independently ve **Regression Goals**: -Regression testing ensures that existing functionality continues to work correctly on multiarch clusters after the introduction of multiarch support. Each participating SIG must run its Tier 1 (functional) and Tier 2 (end-to-end) test suites on multiarch clusters with both CPU architectures to confirm no regressions are introduced. +All participating SIGs run regression on multiarch clusters. - **[P0]** sig-iuo: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. - **[P0]** sig-infra: Run Tier 1 and Tier 2 test suites on multiarch clusters with both CPU architectures. @@ -171,14 +174,12 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. **Note:** Replace example rows with your actual out-of-scope items. 
--> -| Non-Goal | Rationale | PM/ Lead Agreement | -|:--------------------------------|:--------------------------------------------------------------------------------------------------|:-------------------| -| Update existing VM | Running VMs won't use new arch-specific resources | [ ] Name/Date | -| Performance Testing | Feature not scale related | [ ] Name/Date | -| Security Testing | Feature not security related | [ ] Name/Date | -| Usability Testing | Done by UI team in 4.20 ([CNV-61832](https://issues.redhat.com/browse/CNV-61832)) | [ ] Name/Date | -| Testing with s390x architecture | Feature scope is ARM enablement only | [ ] Name/Date | -| Regression with on single-arch cluster | Our existing scheduled runs already covering it | [ ] Name/Date | +| Non-Goal | Rationale | PM/ Lead Agreement | +|:------------------------------------|:----------------------------------------------------------------------------------|:-------------------| +| Update existing VM | Running VMs won't use new arch-specific resources | [ ] Name/Date | +| Performance Testing | Feature not scale related | [ ] Name/Date | +| Security Testing | Feature not security related | [ ] Name/Date | +| Testing with s390x architecture | Feature scope is ARM enablement only | [ ] Name/Date | #### **2. Test Strategy** @@ -187,18 +188,18 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. 
| Item | Description | Applicable (Y/N or N/A) | Comments | |:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | | +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | | | Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | All test cases should be automated at openshift-virtualization-tests repo. | | Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N/A | Not related to scale. | | Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | Not related to security. | | Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? 
If so, ensure the UI aligns with the requirements | Y | [UI/UX design doc](https://docs.google.com/document/d/18UKIXiAlyLTABQZdvDD5N85A6uM2CdBbif4eN1dVj-0/edit?usp=sharing) specify requirements.
Done by UI team [CNV-62535](https://issues.redhat.com/browse/CNV-62535) | -| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | sig-infra owns cross-arch VM creation validation ([CNV-76714](https://issues.redhat.com/browse/CNV-76714)) | -| Regression Testing | Verifies that new changes do not break existing functionality | Y | All participating SIGs run t1 and t2 on multiarch clusters. | -| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | **sig-iuo & sig-virt:** VMs migrated and updated successfully, related resources preserved
| -| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | **sig-iuo:** Legacy Datasource pointers, custom golden images without arch annotation
**sig-storage:** Legacy `DataSource` API backward-compatible with new CDI arch-specific naming | +| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | sig-infra owns cross-arch VM creation validation ([CNV-76714](https://issues.redhat.com/browse/CNV-76714)) | +| Regression Testing | Verifies that new changes do not break existing functionality | Y | All participating SIGs run t1 and t2 on multiarch clusters. | +| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | Y | **sig-iuo & sig-virt:** VMs migrated and updated successfully, related resources preserved
| +| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | Y | **sig-iuo:** Legacy Datasource pointers, custom golden images without arch annotation
**sig-storage:** Legacy `DataSource` API backward-compatible with new CDI arch-specific naming | | Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Allowing multi-cpu architecture on openshift-virtualization-tests. Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) once it is ready for review. | | Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | **IUO**: HCO node architecture tracking (`status.nodeInfo`), FG activation/propagation, new metrics & alerts, upgrade
**[SSP/Infra](https://issues.redhat.com/browse/CNV-76714)**: Templates creation & utilization, new SSP API (`enableMultipleArchitectures`, `cluster` fields)
**[Storage](https://issues.redhat.com/browse/CNV-76732)**: CDI-importer architecture selection, legacy `DataSource` backward compatibility, new CDI `platform` API
**[Virt](https://issues.redhat.com/browse/CNV-26818)**: VM scheduling to correct architecture nodes, VM migration between same-arch nodes, upgrade, defaultCPUModel
**[Network](https://issues.redhat.com/browse/CNV-76741)**: Network-related multiarch testing | -| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics:**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | +| Monitoring | Does the feature require metrics and/or alerts? | Y | **Alerts + Metrics:**
[`HCOGoldenImageWithNoSupportedArchitecture`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoSupportedArchitecture) + `kubevirt_hco_dataimportcrontemplate_with_supported_architectures`,
[`HCOMultiArchGoldenImagesDisabled`](https://kubevirt.io/monitoring/runbooks/HCOMultiArchGoldenImagesDisabled.html) + `kubevirt_hco_multi_arch_boot_images_enabled`,
[`HCOGoldenImageWithNoArchitectureAnnotation`](https://kubevirt.io/monitoring/runbooks/HCOGoldenImageWithNoArchitectureAnnotation) + `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` | | Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | Y | Testing environment AWS cluster | #### **3. Test Environment** @@ -224,20 +225,20 @@ that" issues; each out-of-scope item must have PM/Lead sign-off. for this feature. **Note:** Only list tools that are **new** or **different** from standard testing infrastructure. Leave empty if using standard tools. --> -| Category | Tools/Frameworks | -|:-------------------|:------------------------------------------| -| **Test Framework** | MultiArch cluster | -| **CI/CD** | Dedicated t2 jenkins jobs with multiarch markers:
- `test-pytest-cnv-4.22-iuo-multiarch`
- `test-pytest-cnv-4.22-ssp-multiarch`
- `test-pytest-cnv-4.22-storage-multiarch`
- `test-pytest-cnv-4.22-virt-multiarch` | -| **Other Tools** | | +| Category | Tools/Frameworks | +|:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Test Framework** | MultiArch cluster | +| **CI/CD** | Dedicated t2 jenkins jobs with multiarch markers:
- `test-pytest-cnv-4.22-iuo-multiarch`
- `test-pytest-cnv-4.22-observability-multiarch`
- `test-pytest-cnv-4.22-ssp-multiarch`
- `test-pytest-cnv-4.22-storage-multiarch`
- `test-pytest-cnv-4.22-virt-multiarch` | +| **Other Tools** | | #### **4. Entry Criteria** The following conditions must be met before testing can begin: -- [X] VEP [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) is **approved and merged** +- [x] VEP [dic-on-heterogeneous-cluster](https://github.com/kubevirt/enhancements/tree/main/veps/sig-storage/dic-on-heterogeneous-cluster) is **approved and merged** - [x] Test environment (MultiArch cluster) can be **set up and configured** - [ ] Multi-CPU architecture support enabled in openshift-virtualization-tests repo -- [ ] HCO t2 jenkins jobs created & scheduled +- [ ] Multiarch t2 Jenkins jobs created for the related SIGs #### **5. Risks** @@ -247,16 +248,15 @@ justification in mitigation strategy. **Note:** Empty "Specific Risk" cells mean this must be filled. "N/A" means explicitly not applicable with justification. --> -| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | -|:-----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:-------| -| Timeline/Schedule | N/A | N/A | [X] | -| Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | -| Test Environment | N/A | N/A | [X] | -| Untestable Aspects | N/A | N/A | [X] | -| Resource Constraints | MultiArch cluster available only for 12 hours; limited number of AWS clusters available | Test automation on HA cluster first, final verification on MultiArch. Verify that DevOps are investigating increasing the number of available AWS clusters.
| [ ] | -| Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) whenever its ready to review. | [ ] | -| Blocker Bug for legacy DataSources | [CNV-75762](https://issues.redhat.com/browse/CNV-75762) | on POST - Storage QE to verify | [ ] | -| Other non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | +| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | +|:---------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------| +| Timeline/Schedule | Ongoing war in Israel; some SIGs are impacted | N/A | [X] | +| Test Coverage | Should be coordinated with all cnv-sigs | Review & sync with other sigs | [ ] | +| Test Environment | N/A | N/A | [X] | +| Untestable Aspects | N/A | N/A | [X] | +| Resource Constraints | MultiArch cluster available only for 12 hours; limited number of AWS clusters available | Test automation on HA cluster first, final verification on MultiArch. Verify that DevOps are investigating increasing the number of available AWS clusters. | [ ] | +| Dependencies | Allowing multi-cpu architecture on openshift-virtualization-tests | Review the [PR](https://github.com/RedHatQE/openshift-virtualization-tests/pull/3147) once it is ready for review. | [ ] | +| Non-blocker bugs | 1. [[UI] architecture is incorrect for fedora arm and inconsistent on UI for other os](https://issues.redhat.com/browse/CNV-68981)
2. [[Storage] Arch-specific DataSources (arm64) persist after removing arm64 nodes](https://issues.redhat.com/browse/CNV-68996) | Make sure they are fixed & verified | [ ] | #### **6. Known Limitations** @@ -287,20 +287,20 @@ tested. --> **Requirement Summary:** Brief description from the Jira issue (user story format preferred) --> -| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | -|:--------------------------------------------------------|:--------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| -| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | HCO detects and reports node architectures | Verify HCO monitors node architectures correctly | Tier 1 | P0 | -| | Golden images annotated only with supported architectures | Verify golden images are annotated only with architectures actually supported in HCO+SSP | Tier 1 | P0 | -| | Arch-specific resources created and ready to use | Verify arch-specific resources created for supported architectures, named with architecture suffix, and are ready to use | Tier 2 | P0 | -| | Fail status for golden images with only unsupported architectures | Verify HCO `dataImportCronTemplates` status shows failure for dataImportCronTemplate annotated only with unsupported architectures | Tier 1 | P1 | -| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | Alert and metric for golden image with unsupported architecture | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value | Tier 2 | P1 | -| | Alert and metric for disabled Multiarch FG on multi-arch cluster | Verify `HCOMultiArchGoldenImagesDisabled` alert fires and `kubevirt_hco_multi_arch_boot_images_enabled` metric 
reports the appropriate value | Tier 2 | P1 | -| | Alert and metric for custom golden image missing arch annotation | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value | Tier 2 | P1 | -| | Alert not fired when nodePlacement limits to supported architectures only | Verify `HCOMultiArchGoldenImagesDisabled` not fired when Multiarch FG is disabled but nodePlacement restricts to supported architectures only | Tier 2 | P1 | -| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | Legacy DataSources remain backward-compatible | Verify legacy DataSources point to default arch-annotated DataSources | Tier 2 | P0 | -| | Custom golden images without arch annotation remain functional | Verify custom golden images without arch annotation remain functional | Tier 2 | P0 | -| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | VMs migrate to same-arch nodes during upgrades | Verify ARM64 and AMD64 VMs migrate to same-architecture worker nodes during upgrades, and related resources preserved | Tier 2 | P0 | -| | Functional validation post-upgrade when FG enabled by default | Verify functional tests pass post-upgrade to version with Multiarch FG enabled by default | Tier 2 | P1 | +| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | +|:--------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------|:---------| +| [CNV-67900](https://issues.redhat.com/browse/CNV-67900) | As an Admin, I want HCO to detect and report node architectures so that the system is aware of available architectures 
in the cluster | Verify HCO monitors the cluster's node architectures correctly and updates them when nodes are added or removed | Tier 1 | P0 | +| | As a User, I want golden images to be annotated only with supported architectures so that I only see relevant boot sources | Verify golden images are annotated only with supported architectures | Tier 1 | P0 | +| | As an Admin, I want to see a failure status for golden images annotated only with unsupported architectures so that I can identify misconfigured boot sources | Verify golden images annotated only with unsupported architectures show a failure status in the HCO `dataImportCronTemplates` status | Tier 1 | P1 | +| | As a User, I want arch-specific resources to be created and ready to use so that I can boot VMs from the correct architecture images | Verify arch-specific resources are created only for supported architectures, are named with an architecture suffix, and are ready to use | Tier 2 | P0 | +| | As an Admin, I want to receive an alert and metric when a golden image has no supported architecture so that I can take corrective action | Verify `HCOGoldenImageWithNoSupportedArchitecture` alert fires and `kubevirt_hco_dataimportcrontemplate_with_supported_architectures` metric reports the appropriate value | Tier 2 | P1 | +| | As an Admin, I want to receive an alert and metric when the Multiarch FG is disabled on a multi-arch cluster so that I am aware of the misconfiguration | Verify `HCOMultiArchGoldenImagesDisabled` alert fires and `kubevirt_hco_multi_arch_boot_images_enabled` metric reports the appropriate value | Tier 2 | P1 | +| | As an Admin, I want to receive an alert and metric when a custom golden image is missing an architecture annotation so that I can update it accordingly | Verify `HCOGoldenImageWithNoArchitectureAnnotation` alert fires and `kubevirt_hco_dataimportcrontemplate_with_architecture_annotation` metric reports the appropriate value | Tier 2 | P1 | +| | As an Admin, I want the `HCOMultiArchGoldenImagesDisabled` alert not to fire
when nodePlacement limits scheduling to supported architectures only so that I am not alerted unnecessarily | Verify `HCOMultiArchGoldenImagesDisabled` is not fired when the Multiarch FG is disabled but nodePlacement restricts scheduling to supported architectures only | Tier 2 | P1 | +| | As an Admin, I want legacy DataSources to remain backward-compatible so that existing workflows are not disrupted after enabling multi-arch support | Verify legacy DataSources point to default arch-annotated DataSources | Tier 2 | P0 | +| | As an Admin, I want custom golden images without architecture annotations to remain functional so that non-annotated images continue to work | Verify custom golden images without arch annotation remain functional | Tier 2 | P0 | +| | As an Admin, I want VMs to migrate to same-architecture nodes during upgrades so that workloads remain stable and architecture-compatible | Verify ARM64 and AMD64 VMs are migrated to same-architecture nodes during upgrades and related resources are preserved | Tier 2 | P0 | +| | As an Admin, I want functional validation to pass post-upgrade when the Multiarch FG is enabled by default so that the upgrade does not break multi-arch functionality | Verify functional tests pass post-upgrade to a version in which the Multiarch FG is enabled by default | Tier 2 | P0 | diff --git a/stps/stp-template/stp.md b/stps/stp-template/stp.md index 8dc40fa..93147a8 100644 --- a/stps/stp-template/stp.md +++ b/stps/stp-template/stp.md @@ -56,7 +56,6 @@ technology, and testability before formal test planning. 1. **Done column**: Mark [x] when the review is complete 2. **Details/Notes column**: Summary of the item (e.g., list technology challenges, special environment needs, significant API changes) 3.
**Comments column**: Note any blockers, risks, or items requiring follow-up --> - | Check | Done | Details/Notes | Comments | |:---------------------------------|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| | **Developer Handoff/QE Kickoff** | [ ] | A meeting where Dev/Arch walked QE through the design, architecture, and implementation details. **Critical for identifying untestable aspects early.** | |