| 1 | +# Openshift-virtualization-tests Test plan |
| 2 | + |
| 3 | +## **Stuntime Measurement - Quality Engineering Plan** |
| 4 | + |
| 5 | +### **Metadata & Tracking** |
| 6 | + |
| 7 | +| Field | Details | |
| 8 | +|:-----------------------|:-----------------------------------------------------------------------| |
| 9 | +| **Enhancement(s)** | - | |
| 10 | +| **Feature in Jira** | https://issues.redhat.com/browse/CNV-72773 | |
| 11 | +| **Jira Tracking** | https://issues.redhat.com/browse/CNV-78676 | |
| 12 | +| **QE Owner(s)** | Anat Wax (awax@redhat.com) | |
| 13 | +| **Owning SIG** | sig-network | |
| 14 | +| **Participating SIGs** | sig-network | |
| 15 | +| **Current Status** | Draft | |
| 16 | + |
| 17 | +**Document Conventions:** |
| 18 | +- **Stuntime:** VM downtime (unreachability window) during live migration - the connectivity gap from first connectivity loss to first recovery. |
| 19 | +- **Ping:** Command that sends ICMP ECHO_REQUEST packets to a host to check reachability and measure round-trip time. Official documentation: [ping(8) - Linux manual page](https://man7.org/linux/man-pages/man8/ping.8.html). |
| 20 | +- **KCS:** [Knowledge Centered Support](https://access.redhat.com/articles/7031392). |
| 21 | + |
| 22 | + |
| 23 | +### **Feature Overview** |
| 24 | + |
| 25 | +Customers running live migration on secondary networks need predictable VM downtime. We need a way to detect regressions in migration behavior. |
| 26 | +The feature defines and measures VM stuntime during live migration and establishes a baseline and a pass/fail threshold for testing. Testing focuses on configurations used by the vast majority of our customers - secondary network configurations: Linux bridge and OVN localnet. |
| 27 | + |
| 28 | +--- |
| 29 | + |
| 30 | +### **I. Motivation and Requirements Review (QE Review Guidelines)** |
| 31 | + |
| 32 | +#### **1. Requirement & User Story Review Checklist** |
| 33 | + |
| 34 | +| Check | Done | Details/Notes | Comments | |
| 35 | +|:---------------------------------------|:-----|:----------------------------------------------------------------------------------------------------------------------------------------------------|:---------| |
| 36 | +| **Review Requirements** | [x] | Publish VM stuntime during live migration for users' awareness. To be measured on secondary networks - Linux bridge and OVN localnet. | | |
| 37 | +| **Understand Value** | [x] | Clear value for customers running live migration on secondary networks. Also helps new customers moving to OCP-V from other virtualization platforms. | | |
| 38 | +| **Customer Use Cases** | [x] | Customers need predictable VM downtime during VM live migration. | | |
| 39 | +| **Testability** | [x] | Stuntime is testable: connectivity gap is measurable from first packet loss to first packet recovery. The measuring scope (secondary networks, topologies) is well-defined. | | |
| 40 | +| **Acceptance Criteria** | [x] | Stuntime measured in a BM environment to allow later publication in blog/KCS. Stuntime value must be easily retrievable from test logs to enable baseline updates and reports. | | |
| 41 | +| **Non-Functional Requirements (NFRs)** | [x] | Measured stuntime will be documented in a KCS or a Red Hat blog post. | | |
| 42 | + |
| 43 | +#### **2. Technology and Design Review** |
| 44 | + |
| 45 | +| Check | Done | Details/Notes | Comments | |
| 46 | +|:---------------------------------|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| |
| 47 | +| **Developer Handoff/QE Kickoff** | [x] | Not a new feature. | | |
| 48 | +| **Technology Challenges** | [x] | No special challenge. | | |
| 49 | +| **Test Environment Needs** | [x] | Default OCP-V deployment on Bare Metal, with worker nodes that have multiple NICs for secondary networks. | | |
| 50 | +| **API Extensions** | [x] | No new or modified APIs. | | |
| 51 | +| **Topology Considerations** | [x] | Limited to BM clusters with secondary NICs. | | |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +### **II. Software Test Plan (STP)** |
| 56 | + |
| 57 | +#### **1. Scope of Testing** |
| 58 | + |
| 59 | +Tests aim to measure VM stuntime during live migration on secondary networks (Linux bridge and OVN localnet). Measurements cover three migration scenarios, described relative to the node hosting the static VM: from the same node to a different node; from a different node to the same node; and between two other nodes. These scenarios exercise different network paths and ARP/CNI behavior during migration. Because stuntime can differ depending on which side initiates traffic, each scenario is measured in both ping directions: initiated from the migrated VM and from the static VM. |
| 60 | + |
| 61 | +We will only test the IPv4 family for now. While IPv6 uses NDP instead of ARP, we assume the L2 resolution stuntime will be within the same order of magnitude. IPv6 can be added later if needed. |
| 62 | + |
| 63 | +**Testing Goals** |
| 64 | + |
| 65 | +All scenarios below are **[P0]**. |
| 66 | + |
| 67 | +**VM with secondary network connected to a Linux bridge:** |
| 68 | + |
| 69 | +- Measure stuntime from the migrated VM, live migrated from the same node to a different node. |
| 70 | +- Measure stuntime from the migrated VM, live migrated from a different node to the same node. |
| 71 | +- Measure stuntime from the migrated VM, live migrated between two different nodes. |
| 72 | +- Measure stuntime from the static VM, with the peer VM live migrated from the same node to a different node. |
| 73 | +- Measure stuntime from the static VM, with the peer VM live migrated from a different node to the same node. |
| 74 | +- Measure stuntime from the static VM, with the peer VM live migrated between two different nodes. |
| 75 | + |
| 76 | +**VM with secondary network connected to OVN localnet:** |
| 77 | + |
| 78 | +- Measure stuntime from the migrated VM, live migrated from the same node to a different node. |
| 79 | +- Measure stuntime from the migrated VM, live migrated from a different node to the same node. |
| 80 | +- Measure stuntime from the migrated VM, live migrated between two different nodes. |
| 81 | +- Measure stuntime from the static VM, with the peer VM live migrated from the same node to a different node. |
| 82 | +- Measure stuntime from the static VM, with the peer VM live migrated from a different node to the same node. |
| 83 | +- Measure stuntime from the static VM, with the peer VM live migrated between two different nodes. |
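
The twelve goals listed above are the cross product of network type, migration scenario, and ping initiator. A minimal sketch of that enumeration follows; the identifiers (`NETWORKS`, `MIGRATION_PATHS`, `PING_INITIATORS`, `scenario_ids`) and the label strings are hypothetical names chosen for illustration, not names taken from the test suite.

```python
from itertools import product

# Hypothetical labels for illustration only; not taken from the test suite.
NETWORKS = ("linux-bridge", "ovn-localnet")
MIGRATION_PATHS = ("same-to-different", "different-to-same", "different-to-different")
PING_INITIATORS = ("migrated-vm", "static-vm")


def scenario_ids():
    """Return one ID per (network, migration path, ping initiator) combination."""
    return [
        f"{network}/{path}/{initiator}"
        for network, path, initiator in product(NETWORKS, MIGRATION_PATHS, PING_INITIATORS)
    ]


if __name__ == "__main__":
    for scenario in scenario_ids():
        print(scenario)
```

A list like this could be passed to `pytest.mark.parametrize` so that each combination is reported as a separate test case; whether the suite structures its tests this way is an assumption.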
| 84 | + |
| 85 | +**Out of Scope (Testing Scope Exclusions)** |
| 86 | + |
| 87 | +| Out-of-Scope Item | Rationale | PM/ Lead Agreement | |
| 88 | +|:--------------------------------------------------|:----------------------------------------------------------------------------------------------|:-------------------| |
| 89 | +| IPv6 | Scope limited to IPv4 for initial baseline and threshold. | [ ] Name/Date | |
| 90 | +| Default pod network / masquerade | To save capacity, stuntime coverage is limited to the configurations used by the vast majority of customers: secondary networks. | [ ] Name/Date | |
| 91 | +| Other secondary CNIs (e.g. SR-IOV, other plugins) | Stuntime coverage is limited to the configurations used by the vast majority of customers: Linux bridge and OVN localnet. | [ ] Name/Date | |
| 92 | +| Worst-case guarantee | Testing does not establish or assert a worst-case (upper bound) stuntime value or SLA; we measure and baseline stuntime but do not guarantee an upper bound. | [ ] Name/Date | |
| 93 | +| General performance testing | Testing is limited to a two-VM setup (one migrated VM and one static peer). We will not perform high-density stress testing or measure stuntime under heavy cluster load. | [ ] Name/Date | |
| 94 | +| Upgrade | Stuntime before and after an OCP-V upgrade will not be compared directly. Measured stuntime is reported per version only. | [ ] Name/Date | |
| 95 | + |
| 96 | +**Baseline and threshold** |
| 97 | + |
| 98 | +Tests will use a pass/fail threshold defined during the development phase. The threshold is derived by running the stuntime measurement scenarios 10 times on a BM cluster, taking the maximum stuntime observed, then setting the threshold to the lower of (max × 4) or 5 seconds. If measured stuntime exceeds this threshold, the test fails and is treated as a regression. |
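
The threshold rule above can be sketched as follows. This is a minimal illustration, assuming stuntime values are collected in seconds; the function and constant names are hypothetical, and only the multiplier (4) and the cap (5 seconds) come from the plan.

```python
MULTIPLIER = 4       # factor applied to the worst baseline run (from the plan)
CAP_SECONDS = 5.0    # upper limit for the threshold (from the plan)


def derive_threshold(baseline_runs):
    """Return the lower of (max baseline stuntime * 4) and 5 seconds."""
    if not baseline_runs:
        raise ValueError("at least one baseline measurement is required")
    return min(max(baseline_runs) * MULTIPLIER, CAP_SECONDS)


def is_regression(measured, threshold):
    """A run fails only when the measured stuntime exceeds the threshold."""
    return measured > threshold
```

With made-up baseline values (not measurements), a worst run of 0.6 s gives a threshold of about 2.4 s, while a worst run of 2.0 s would give 8 s and is therefore capped at 5 s.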
| 99 | + |
| 100 | +**Measurement approach** |
| 101 | +Stuntime will be measured with ICMP ping: it is simple, already used in the codebase, and sufficient for the main use case, because the goal is to measure the duration of the connectivity gap, not to verify connection behavior. Alternatives considered: tcping (not in the codebase, adds a dependency), iperf3 (heavier, overkill for drop/return timing), and curl (requires a server in the VM). ICMP packets will be sent at 100 ms intervals with UNIX timestamps enabled and explicit reporting of unanswered packets (`ping -D -O -i 0.1`), so the measurement resolution is about 100 ms. Stuntime is the duration of the connectivity gap: the timestamp of the first successful packet after recovery minus the timestamp of the last successful packet before the failure. |
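
The stuntime calculation described above can be sketched as a small parser over captured ping output. This is an illustrative sketch, not the suite's implementation: the function name is hypothetical, and the line shapes shown in the comments are an assumption about iputils `ping -D -O` output that should be verified against the guest image.

```python
import re

# Assumed line shapes for `ping -D -O` (iputils); verify against the guest image:
#   [1700000000.100000] 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.3 ms
#   [1700000000.300000] no answer yet for icmp_seq=2
REPLY = re.compile(r"^\[(\d+\.\d+)\] \d+ bytes from ")
NO_ANSWER = re.compile(r"no answer yet for icmp_seq=\d+")


def compute_stuntime(ping_output):
    """Return the longest connectivity gap in seconds, or 0.0 if no packet was lost.

    A gap runs from the last reply before a loss to the first reply after it.
    """
    last_reply = None    # timestamp of the most recent successful reply
    loss_seen = False    # True once a loss has been reported after last_reply
    longest_gap = 0.0
    for line in ping_output.splitlines():
        reply = REPLY.match(line)
        if reply:
            timestamp = float(reply.group(1))
            if loss_seen and last_reply is not None:
                longest_gap = max(longest_gap, timestamp - last_reply)
            last_reply = timestamp
            loss_seen = False
        elif NO_ANSWER.search(line):
            loss_seen = True
    return longest_gap
```

The sketch reports the longest gap when connectivity drops more than once, and it ignores a loss that never recovers before the capture ends; a real test would likely treat an unrecovered loss as a failure in its own right.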
| 102 | + |
| 103 | + |
| 104 | +#### **2. Test Strategy** |
| 105 | + |
| 106 | +| Item | Description | Applicable (Y/N or N/A) | Comments | |
| 107 | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:------------------------------------------------------------------------------------------| |
| 108 | +| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | | |
| 109 | +| Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | | |
| 110 | +| Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | Y | Stuntime is a downtime measurement. | |
| 111 | +| Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N | Not applicable to stuntime measurement. | |
| 112 | +| Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | N | No UI planned for this feature. | |
| 113 | +| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | Linux bridge and OVN localnet. | |
| 114 | +| Regression Testing | Verifies that new changes do not break existing functionality | Y | | |
| 115 | +| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | N/A | Not in scope for the current phase. | |
| 116 | +| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | N/A | No API changes. | |
| 117 | +| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | N | Uses existing migration and secondary network topologies. | |
| 118 | +| Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | N | | |
| 119 | +| Monitoring | Does the feature require metrics and/or alerts? | N | | |
| 120 | +| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | N/A | Tests run on BM; official stuntime data from BM only. | |
| 121 | + |
| 122 | +#### **3. Test Environment** |
| 123 | + |
| 124 | +| Environment Component | Configuration | Specification Examples | |
| 125 | +|:----------------------------------------------|:--------------|:---------------------------------------------------------------------------------------| |
| 126 | +| **Cluster Topology** | Multi-node BM cluster | Three workers for all migration scenarios (same node → different node, different node → same node, between different nodes). | |
| 127 | +| **OCP & OpenShift Virtualization Version(s)** | v4.22 | Can be backported to all versions. | |
| 128 | +| **CPU Virtualization** | Agnostic | | |
| 129 | +| **Compute Resources** | N/A | Not required. | |
| 130 | +| **Special Hardware** | N/A | Not required. | |
| 131 | +| **Storage** | Agnostic | | |
| 132 | +| **Network** | | IPv4; Multi-NIC nodes - secondary networks only. | |
| 133 | +| **Required Operators** | NMState | For secondary network configurations. | |
| 134 | +| **Platform** | Bare Metal | The product is meant for BM. Stuntime is measured there so data reflects product behavior, not cloud/PSI networking. Official/published stuntime is sourced from BM runs only. | |
| 135 | +| **Special Configurations** | N/A | None. | |
| 136 | + |
| 137 | +#### **3.1. Testing Tools & Frameworks** |
| 138 | + |
| 139 | +| Category | Tools/Frameworks | |
| 140 | +|:-------------------|:--------------------------------------------------------------------------------------------------| |
| 141 | +| **Test Framework** | Standard pytest/openshift-virtualization-tests. | |
| 142 | +| **CI/CD** | - | |
| 143 | +| **Other Tools** | - | |
| 144 | + |
| 145 | +#### **4. Entry Criteria** |
| 146 | + |
| 147 | +The following conditions must be met before testing can begin: |
| 148 | + |
| 149 | +- [ ] Requirements and design documents are **approved and merged** |
| 150 | +- [ ] Test environment can be **set up and configured** (see Section II.3 - Test Environment) |
| 151 | + |
| 152 | +#### **5. Risks** |
| 153 | + |
| 154 | +| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status | |
| 155 | +|:---------------------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|:-------| |
| 156 | +| Timeline/Schedule | N/A | No known risks. | [x] | |
| 157 | +| Test Coverage | IPv6, default pod network, and other CNIs out of scope; coverage limited to bridge and localnet. | Explicit out-of-scope notes and explanations in the STP; extend later if requirements change. | [x] | |
| 158 | +| Test Environment | N/A | Run tests on BM; publish only BM-derived stuntime. | [x] | |
| 159 | +| Untestable Aspects | N/A | All components are testable. | [x] | |
| 160 | +| Resource Constraints | N/A | No known risks. | [x] | |
| 161 | +| Dependencies | N/A | No new product dependencies. | [x] | |
| 162 | +| Other | - | - | [x] | |
| 163 | + |
| 164 | +#### **6. Known Limitations** |
| 165 | + |
| 166 | +- IPv6 stuntime measurement is not covered. |
| 167 | +- The primary (pod) network is not covered; coverage is limited to secondary network scenarios. |
| 168 | +- Special hardware and operators (e.g. SR-IOV) are not covered. |
| 169 | +- Guest OS coverage is limited to Fedora VMs; no other guest OS is tested. |
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +### **III. Test Scenarios & Traceability** |
| 174 | + |
| 175 | +| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority | |
| 176 | +|:---------------|:---------------------|:-----------------|:-------|:---------| |
| 177 | +| CNV-72773 | As a user, I want to know what stuntime I can expect from a VM during live migration, in different migration scenarios. | Measure stuntime for all 12 scenarios: 2 network types (Linux bridge, OVN localnet) × 3 migration scenarios × 2 ping initiation directions (migrated VM, static VM) | Tier 2 | P0 | |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +### **IV. Sign-off and Approval** |
| 182 | + |
| 183 | +This Software Test Plan requires approval from the following stakeholders: |
| 184 | + |
| 185 | +* **Reviewers:** |
| 186 | + - QE Architect (OCP-V): Ruth Netser (@rnetser) |
| 187 | + - QE Members (OCP-V): Yossi Segev (@yossisegev), Asia Zhivov Khromov (@azhivovk), Sergei Volkov (@servolkov) |
| 188 | +* **Approvers:** |
| 189 | + - QE Architect (OCP-V): Ruth Netser (@rnetser) |
| 190 | + - Product Manager/Owner: Ronen Sde-Or (ronen@redhat.com), Petr Horacek (@phoracek) |
| 191 | + - Principal Developer: Edward Haas (@EdDev) |