stp, sig-network: Introduce the stuntime measurement STP#37

Open
Anatw wants to merge 1 commit into RedHatQE:main from Anatw:stuntime_measurement_stp

Conversation


@Anatw Anatw commented Feb 18, 2026

What this PR does

Introduce an STP for stuntime measurement of VMs during live migration across different migration scenarios, focusing on secondary networks: Linux bridge and OVN localnet.

Summary by CodeRabbit

  • Documentation
    • Added a QE test plan for measuring VM live-migration stuntime on secondary networks (Linux bridge and OVN localnet).
    • Defines measurement method (ICMP with high-resolution timestamps), baseline/threshold calculation, and bidirectional testing across 12 scenarios and three migration paths.
    • Includes environment blueprint, tooling, entry criteria, risks/mitigations, known limitations, traceability, and approvers.


coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new QE Software Test Plan document for measuring stuntime during VM live migration on secondary networks (Linux bridge and OVN localnet). Specifies measurement method (ICMP ping with high-resolution timestamps, IPv4), baseline/threshold logic, 12 test scenarios, environment, tooling, entry criteria, risks, limitations, traceability, and approvers.

Changes

| Cohort / File(s) | Summary |
|:---|:---|
| **Test Plan Documentation**<br>`stps/sig-network/stuntime_measurement.md` | Adds a comprehensive QE Software Test Plan for stuntime measurement during VM live migration on secondary networks. Includes metadata/conventions, feature overview, scope/out-of-scope, measurement approach (ICMP ping, high-resolution timestamps, IPv4-only), baseline/threshold calculation (global baseline from repeated BM runs; per-scenario thresholds allowed), 12 scenarios (two topology types × three migration paths, bidirectional initiators), environment blueprint (Bare Metal focus, multi-node, NMState), tools (pytest / openshift-virtualization-tests), entry criteria, risks & mitigations, known limitations, traceability to CNV-72773, and approvers. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|:---|:---|:---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'stp, sig-network: Introduce the stuntime measurement STP' directly and clearly summarizes the main change: introducing a new STP document for stuntime measurement in the sig-network domain. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot-5

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No WIP, hold, conflict labels
  5. Verified: PR must be marked as verified (if verification is enabled)

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev

Reviewers:

  • Anatw
  • EdDev
  • azhivovk
  • servolkov
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
stps/sig-network/stuntime_measurement.md (1)

98-100: Clarify baseline storage and consider a minimum threshold floor.

Two gaps worth addressing before the plan is finalised:

  1. Where will the baseline values live? The STP says the threshold is "defined during the development phase" but does not specify where the concrete measured values (max stuntime per scenario) will be recorded — e.g., a Jira comment, a constants file in the test code, or a follow-up doc. Capturing this keeps the baseline auditable and makes future updates traceable.

  2. No minimum floor on the threshold. If the BM baseline yields a very low max (e.g., 50 ms), min(50 ms × 4, 5 s) = 200 ms. At 100 ms ping granularity that leaves only a ~1-packet margin, which may cause spurious failures. Consider establishing a minimum threshold floor (e.g., 500 ms) to absorb measurement noise.
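The reviewer's proposed rule can be expressed as a small helper. This is a sketch: the `min(baseline_max × 4, 5 s)` part is from the STP, while the `floor_s` default of 0.5 s is the reviewer's suggested addition, not something the plan mandates.

```python
def scenario_threshold(baseline_max_s: float,
                       factor: int = 4,
                       cap_s: float = 5.0,
                       floor_s: float = 0.5) -> float:
    """Per-scenario pass/fail threshold in seconds.

    STP rule: min(baseline_max * factor, cap_s).
    Reviewer addition: apply a floor afterwards so a very low
    bare-metal baseline cannot produce a near-zero threshold.
    """
    return max(min(baseline_max_s * factor, cap_s), floor_s)
```

With a 50 ms baseline this yields min(0.2 s, 5 s) = 0.2 s, lifted to the 0.5 s floor; a 2 s baseline is capped at 5 s.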

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` around lines 98 - 100, Update the
"Baseline and threshold" section in stuntime_measurement.md to (1) specify where
baseline measurements will be recorded (e.g., a dedicated
"stuntime-baselines.md" or a constants file in the test repo and an associated
Jira ticket ID) and the required metadata (scenario name, run date, BM cluster
ID, max stuntime, author) so baselines are auditable and traceable, and (2) add
a minimum threshold floor (e.g., floor = 500 ms) applied after computing
min(max*4, 5s) to avoid spurious failures from very low baselines; reference the
section header "Baseline and threshold" when adding these requirements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Line 166: Update the "Known Limitations" line that currently reads "Not
covering any special hardware or operators (e.g. no SR-IOV or service-mesh
reference)." by removing the misleading "service-mesh" reference and replacing
it with a more relevant exclusion such as "DPDK" or "macvlan"; specifically edit
the markdown under the Known Limitations heading (the line starting with "Not
covering any special hardware or operators") to mention only applicable
secondary CNIs/hardware (e.g., SR-IOV, DPDK, macvlan) and drop any
application-layer technologies like Istio/OSSM.
- Line 61: Add a note to the "Known Limitations" or "Test Environment" section
that the chosen command invocation (ping -D -O -i 0.1) requires elevated
privileges because unprivileged users are limited to a 200ms minimum interval;
explicitly state that tests must run as root, with CAP_NET_RAW, or with
appropriate kernel settings (e.g., net.ipv4.ping_group_range) inside the Fedora
VM, and update the test framework invocation to perform privilege escalation or
validate/abort if those privileges/settings are not present so the 100ms
interval measurement will succeed.

---

Nitpick comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Around line 98-100: Update the "Baseline and threshold" section in
stuntime_measurement.md to (1) specify where baseline measurements will be
recorded (e.g., a dedicated "stuntime-baselines.md" or a constants file in the
test repo and an associated Jira ticket ID) and the required metadata (scenario
name, run date, BM cluster ID, max stuntime, author) so baselines are auditable
and traceable, and (2) add a minimum threshold floor (e.g., floor = 500 ms)
applied after computing min(max*4, 5s) to avoid spurious failures from very low
baselines; reference the section header "Baseline and threshold" when adding
these requirements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
stps/sig-network/stuntime_measurement.md (1)

101-101: Unresolved: ping -i 0.1 requires elevated privileges inside the Fedora VM.

ping enforces a 200ms minimum interval for unprivileged users — sub-200ms intervals require root, CAP_NET_RAW on the binary, or a sufficiently broad net.ipv4.ping_group_range. The test environment section and known limitations do not document this requirement. Please add a note to either the Test Environment or Known Limitations section specifying the required privilege configuration for the Fedora VM.
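The validate-or-abort idea could be sketched as a preflight check in the test framework. This is a hypothetical helper: it covers the root and `net.ipv4.ping_group_range` cases that are visible from Python, while CAP_NET_RAW on the binary would need a separate `getcap` check.

```python
import os
from pathlib import Path

def assert_fast_ping_possible(interval_s: float = 0.1) -> None:
    """Fail fast if `ping -i <interval>` would be rejected for this user.

    iputils ping enforces a 200 ms minimum interval for unprivileged
    users; root, CAP_NET_RAW on the binary, or a ping_group_range
    covering the user's group lifts the restriction. This sketch checks
    only root and the group range (simplified to the primary GID).
    """
    if interval_s >= 0.2 or os.geteuid() == 0:
        return  # interval allowed for everyone, or we are root
    lo, hi = map(int, Path("/proc/sys/net/ipv4/ping_group_range")
                 .read_text().split())
    if not (lo <= os.getgid() <= hi):
        raise RuntimeError(
            f"ping -i {interval_s} needs root, CAP_NET_RAW, or "
            f"net.ipv4.ping_group_range covering gid {os.getgid()}"
        )
```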

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` at line 101, Add a note in the Test
Environment or Known Limitations section that the proposed command "ping -D -O
-i 0.1" requires elevated privileges on the Fedora VM (unprivileged users are
limited to 200ms intervals), and specify the acceptable remediation options
(running as root, granting CAP_NET_RAW to the ping binary, or configuring
net.ipv4.ping_group_range) so the reader knows the required privilege
configuration for accurate 100ms ICMP measurements of Stuntime.
🧹 Nitpick comments (3)
stps/sig-network/stuntime_measurement.md (3)

145-151: Add Exit Criteria to complement the Entry Criteria.

Section 4 only defines conditions to start testing. Adding Exit Criteria (e.g., all P0 scenarios executed, baseline thresholds recorded and committed, no open blockers) provides a clear definition of done for the testing phase and is standard STP practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` around lines 145 - 151, The
document only lists "4. Entry Criteria" and lacks a matching "Exit Criteria" to
define test completion; add a new subsection (e.g., "4a. Exit Criteria" or "5.
Exit Criteria") after the Entry Criteria heading in stuntime_measurement.md that
enumerates concrete done conditions such as "All P0/P1 test scenarios executed
and passed or documented failures," "Baseline thresholds recorded and
committed," "No open blocking defects," "Test reports and logs uploaded," and
"Acceptance sign-off obtained" so testers have a clear definition of done;
ensure the new section mirrors the style/format of the Entry Criteria list and
references the same terms used elsewhere in the document (Entry Criteria, P0
scenarios, baseline thresholds).

7-9: Consider linking the HLD in the Enhancement(s) field.

The field currently shows "-". If an HLD document exists for this feature, it should be referenced here — an enhancement PR is not required, but at minimum the HLD link provides traceability context for reviewers.

Based on learnings: in this repository, when no enhancement PR exists, it is acceptable to reference only the HLD document in the Enhancement(s) field.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` around lines 7 - 9, Update the
Enhancement(s) field in stuntime_measurement.md to reference the HLD document
instead of a "-" placeholder: locate the table row labeled "**Enhancement(s)**"
and replace the "-" with a link to the HLD (or a short note like "HLD: <URL or
doc name>") so reviewers have traceability; if no HLD exists, replace "-" with
"None" or "No enhancement PR; see HLD: <if available>" to make intent explicit.

63-83: Consider condensing the test goals into a matrix.

The 12 bullet points are fully determined by the cross-product of two dimensions, resulting in repeated phrasing across both network sections. A compact table would present the same information without redundancy:

| Migration Scenario | Linux Bridge — Migrated VM | Linux Bridge — Static VM | OVN Localnet — Migrated VM | OVN Localnet — Static VM |
|:---|:---:|:---:|:---:|:---:|
| Same node → different node | | | | |
| Different node → same node | | | | |
| Between two different nodes | | | | |
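The cross-product structure also means the 12 scenario names could be generated rather than hand-maintained. A sketch, with labels taken from the matrix above (the exact string format is illustrative):

```python
from itertools import product

NETWORKS = ("Linux bridge", "OVN localnet")
MIGRATIONS = (
    "same node -> different node",
    "different node -> same node",
    "between two different nodes",
)
INITIATORS = ("migrated VM", "static VM")  # bidirectional connectivity initiation

# One scenario per (network, migration path, ping initiator) combination.
SCENARIOS = [
    f"{net} | {mig} | initiated by {who}"
    for net, mig, who in product(NETWORKS, MIGRATIONS, INITIATORS)
]
```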
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` around lines 63 - 83, The "Testing
Goals" section repeats 12 nearly identical bullet points; replace the two
repeated network subsections ("VM with secondary network connected to a Linux
bridge" and "VM with secondary network connected to OVN localnet") with a
compact matrix/table that cross-products the three migration scenarios ("Same
node → different node", "Different node → same node", "Between two different
nodes") against the four test targets ("Linux Bridge — Migrated VM", "Linux
Bridge — Static VM", "OVN Localnet — Migrated VM", "OVN Localnet — Static VM"),
keep the P0 priority note, and ensure the table entries mark which scenarios
apply (e.g., ✓) to remove redundancy while preserving all original test cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Around line 96-98: Update the "Baseline and threshold" section to explicitly
state that baselines and thresholds are derived per scenario (i.e., compute a
separate baseline for each of the 12 stuntime measurement scenarios) and clarify
that "10 runs" means 10 repetitions per scenario (120 measurements total);
describe the computation as: for each scenario, run it 10 times on the BM
cluster, take the maximum stuntime observed for that scenario, then set that
scenario's threshold to min(max × 4, 5s). Reference the "Baseline and threshold"
heading and the "stuntime measurement scenarios" phrasing so readers know which
items get per-scenario baselines.

---

Duplicate comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Line 101: Add a note in the Test Environment or Known Limitations section that
the proposed command "ping -D -O -i 0.1" requires elevated privileges on the
Fedora VM (unprivileged users are limited to 200ms intervals), and specify the
acceptable remediation options (running as root, granting CAP_NET_RAW to the
ping binary, or configuring net.ipv4.ping_group_range) so the reader knows the
required privilege configuration for accurate 100ms ICMP measurements of
Stuntime.

---

Nitpick comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Around line 145-151: The document only lists "4. Entry Criteria" and lacks a
matching "Exit Criteria" to define test completion; add a new subsection (e.g.,
"4a. Exit Criteria" or "5. Exit Criteria") after the Entry Criteria heading in
stuntime_measurement.md that enumerates concrete done conditions such as "All
P0/P1 test scenarios executed and passed or documented failures," "Baseline
thresholds recorded and committed," "No open blocking defects," "Test reports
and logs uploaded," and "Acceptance sign-off obtained" so testers have a clear
definition of done; ensure the new section mirrors the style/format of the Entry
Criteria list and references the same terms used elsewhere in the document
(Entry Criteria, P0 scenarios, baseline thresholds).
- Around line 7-9: Update the Enhancement(s) field in stuntime_measurement.md to
reference the HLD document instead of a "-" placeholder: locate the table row
labeled "**Enhancement(s)**" and replace the "-" with a link to the HLD (or a
short note like "HLD: <URL or doc name>") so reviewers have traceability; if no
HLD exists, replace "-" with "None" or "No enhancement PR; see HLD: <if
available>" to make intent explicit.
- Around line 63-83: The "Testing Goals" section repeats 12 nearly identical
bullet points; replace the two repeated network subsections ("VM with secondary
network connected to a Linux bridge" and "VM with secondary network connected to
OVN localnet") with a compact matrix/table that cross-products the three
migration scenarios ("Same node → different node", "Different node → same node",
"Between two different nodes") against the four test targets ("Linux Bridge —
Migrated VM", "Linux Bridge — Static VM", "OVN Localnet — Migrated VM", "OVN
Localnet — Static VM"), keep the P0 priority note, and ensure the table entries
mark which scenarios apply (e.g., ✓) to remove redundancy while preserving all
original test cases.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efd6fcc and 4cac21a.

📒 Files selected for processing (1)
  • stps/sig-network/stuntime_measurement.md

@Anatw
Author

Anatw commented Feb 23, 2026

Change:

  • Change marking to reflect review status.
  • Move stuntime measurement details into its own section (Measurement approach).
  • Reword IPv6 out-of-scope for clarity.
  • Reword worst-case guarantee rationale (scope exclusion, not product limitation).
  • Expand Platform rationale (BM-only; product behavior, not cloud/PSI).
  • Remove service-mesh from Known Limitations.

@Anatw
Author

Anatw commented Feb 23, 2026

Change:

  • Drop another IPv6 future reference.
  • Refactor threshold definition for clarity.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
stps/sig-network/stuntime_measurement.md (1)

100-101: Consider breaking the measurement approach into a short structured list.

The single paragraph packs tool selection, discarded alternatives, command flags, and the stuntime calculation formula into one dense run-on sentence. Splitting it into sub-bullets (one per concern) would make this section much easier to scan.

♻️ Suggested restructure
-**Measurement approach**
-Stuntime will be measured using the ICMP ping tool (simple, already in codebase and matches main use case with a simple stuntime measurement). No need for more robust connectivity tools since the goal is stuntime duration, not connection verification. Alternatives considered: tcping (not in codebase, adds dependency), iperf3 (heavier, overkill for drop/return timing), curl (requires server in VM). ICMP packets will be sent at 100ms intervals with UNIX timestamps enabled and explicit reporting of dropped packets (ping -D -O -i 0.1) to achieve high-resolution measurement. Stuntime is defined as the connectivity gap duration, calculated by subtracting the timestamp of the last successful packet before failure from the timestamp of the first successful packet after recovery.
+**Measurement approach**
+
+Stuntime is measured using the ICMP `ping` tool — it is already in the codebase, is simple, and directly matches the use case (stuntime duration, not connection verification).
+
+Alternatives considered and discarded:
+- **tcping**: not in codebase, adds a dependency.
+- **iperf3**: heavier protocol, overkill for drop/return timing.
+- **curl**: requires a server process inside the VM.
+
+ICMP packets are sent at 100 ms intervals using `ping -D -O -i 0.1`:
+- `-D`: prints a UNIX timestamp before each output line.
+- `-O`: reports dropped packets explicitly ("no answer yet for icmp_seq=N").
+- `-i 0.1`: sets the 100 ms inter-packet interval.
+
+**Stuntime calculation:** connectivity gap duration = timestamp of the first successful packet after recovery − timestamp of the last successful packet before failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/stuntime_measurement.md` around lines 100 - 101, Split the
dense "Measurement approach" paragraph into a short structured list: one bullet
for the chosen tool and why (ICMP ping), one bullet listing discarded
alternatives and brief reasons, one bullet showing the exact command/flags to
use (ping -D -O -i 0.1) and their purpose, and one bullet stating the stuntime
definition and calculation (subtract timestamp of last successful packet before
failure from first successful packet after recovery); keep each bullet short and
use the existing headings/phrasing ("Measurement approach", "ICMP ping", "ping
-D -O -i 0.1", "stuntime") so readers can easily locate and scan the content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Line 132: The Test Environment table's Network row is missing a Configuration
cell; update the table in stuntime_measurement.md by adding a short, consistent
value such as "IPv4 / Multi-NIC" (or "IPv4; Multi-NIC") into the Configuration
column for the row labeled "Network" so the table matches other rows and remains
consistent.

---

Nitpick comments:
In `@stps/sig-network/stuntime_measurement.md`:
- Around line 100-101: Split the dense "Measurement approach" paragraph into a
short structured list: one bullet for the chosen tool and why (ICMP ping), one
bullet listing discarded alternatives and brief reasons, one bullet showing the
exact command/flags to use (ping -D -O -i 0.1) and their purpose, and one bullet
stating the stuntime definition and calculation (subtract timestamp of last
successful packet before failure from first successful packet after recovery);
keep each bullet short and use the existing headings/phrasing ("Measurement
approach", "ICMP ping", "ping -D -O -i 0.1", "stuntime") so readers can easily
locate and scan the content.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cac21a and b28b6da.

📒 Files selected for processing (1)
  • stps/sig-network/stuntime_measurement.md


| Category | Tools/Frameworks |
|:-------------------|:--------------------------------------------------------------------------------------------------|
| **Test Framework** | Standard pytest/openshift-virtualization-tests. |


I am not in favor of this section; I see it is empty in other STPs. It is an implementation detail: tomorrow we get rid of pytest and this section immediately becomes obsolete.

But I understand you are following the template. Let's see what @rnetser says.


| Requirement ID | Requirement Summary | Test Scenario(s) | Tier | Priority |
|:---------------|:---------------------|:-----------------|:-------|:---------|
| CNV-72773 | As a user, I want to know what stuntime I can expect from a VM during live migration, in different migration scenarios. | Measure stuntime for all 12 scenarios (Linux bridge + OVN localnet × 3 migration scenarios × bidirectional connectivity initiation) | Tier 2 | P0 |


Hm, as for me, I get it; hopefully other folks will not force you to disclose the list.

@Anatw
Author

Anatw commented Feb 26, 2026

Changes:

  • Defer threshold details to STD (Feature Overview + Technology Challenges)
  • Trim Scope of Testing duplication; consolidate IPv4/IPv6 reasoning into Out-of-Scope table
  • Expand IPv6 out-of-scope rationale; add IP-family-independent risk mitigation
  • Fix Test Strategy: Security N→N/A, Compatibility Y→N, add Regression cross-version note
  • Add Guest OS to Test Environment; replace Known Limitations list with Out-of-Scope reference

@servolkov

servolkov commented Feb 26, 2026

Generally LGTM. I have caught the idea and the direction you are going to move in; I don't want to spend more time and focus on nit details. Thanks.

@Anatw
Author

Anatw commented Mar 1, 2026

Changes:

  • Remove OS reference (move to the STD).
  • Remove Test Framework reference.

Contributor

@azhivovk azhivovk left a comment

Thanks

Collaborator

@EdDev EdDev left a comment

I reviewed the first section: "Motivation and Requirements".

I will continue with the rest later, please start addressing the current ones.

Comment on lines +10 to +11
| **Feature in Jira** | https://issues.redhat.com/browse/CNV-72773 |
| **Jira Tracking** | https://issues.redhat.com/browse/CNV-78676 |
Collaborator

"Jira Tracking" refers to the epic that tracks this effort and in which all the work is expressed.
"Feature in Jira" is referring to a Feature type ticket, but there seems to be none, so you need to explain why not or provide an alternative.

@rnetser , maybe the fields should be named "Feature ref" and "Epic ref", WDYT?

Author

I followed the pattern I saw in previous STPs. Changed according to your explanation.

| **QE Owner(s)** | Anat Wax (awax@redhat.com) |
| **Owning SIG** | sig-network |
| **Participating SIGs** | sig-network |
| **Current Status** | Draft |
Collaborator

When merged, it will not be a draft.

@rnetser , this is a repeating issue I see. Maybe it needs to be dropped as no one will update this field.
In the kubevirt project, the status is tracked by an issue that has links to everything, but here the tracker is the epic and it seems good enough IMO.

Author

Yes, I was wondering how this will work as well.

Comment on lines +21 to +22
- **Migrated VM:** The VM that is live-migrated during the test.
- **Static VM:** The peer VM that remains on its node throughout the test. Used as the reference point for migration direction (e.g., "from the static VM's node").
Collaborator

This is a strong hint that you defined an STP in this document.

Author

I assume you meant "STD".
A question arose during the review about what "live migrated from the same node to a different node" or "live migrated from a different node to the same node" mean. I added these definitions (Migrated VM, Static VM) so the test scenarios would be clear.
#37 (comment)


### **Feature Overview**

Customers running live migration on secondary networks need predictable VM downtime. We need a way to detect regressions in migration behavior.
Collaborator

running live migration on secondary networks

This describes something different from what I suspect you intended.
There is such a feature on which migration actually runs on secondary networks (and not on the primary). What I suspect you intended to describe here is that the network connectivity traffic which is relevant for the "downtime" is passing through these networks.

IMO you should generalize the description here and do not mention which network.
In the scope, you can emphasize that the primary or other secondary network types are not in the scope.

Author

I wasn't aware of this feature, thanks for the explanation. This is indeed not what I meant to describe.
I simplified this paragraph. The network scope (default pod network and other secondary types out of scope) is already covered in the Out of Scope section.

### **Feature Overview**

Customers running live migration on secondary networks need predictable VM downtime. We need a way to detect regressions in migration behavior.
The feature defines and measures VM stuntime during live migration and establishes a baseline and a pass/fail threshold for testing (to be defined in the STD). Testing focuses on configurations used by the vast majority of our customers - secondary network configurations: Linux bridge and OVN localnet.
Collaborator

Suggested change
The feature defines and measures VM stuntime during live migration and establishes a baseline and a pass/fail threshold for testing (to be defined in the STD). Testing focuses on configurations used by the vast majority of our customers - secondary network configurations: Linux bridge and OVN localnet.
The feature defines and measures VM stuntime during live migration and establishes a baseline with a pass/fail threshold. Cover and focus on configurations used by the vast majority of our customers - secondary network configurations: Linux bridge and OVN localnet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

| **Review Requirements** | [x] | Publish VM stuntime during live migration for users' awareness. To be measured on secondary networks - Linux bridge and OVN localnet. | |
| **Understand Value** | [x] | Published stuntime lets users set expectations for VM downtime during live migration and compare OCP-V with other virtualization solutions. For QE, automated measurement provides on-demand stuntime data, and continuous test coverage catches regressions in live migration behavior. | |
| **Customer Use Cases** | [x] | Customers need predictable VM downtime during VM live migration. | |
| **Testability** | [x] | Stuntime is testable: connectivity gap is measurable from first packet loss to first packet recovery. The measuring scope (secondary networks, topologies) is well-defined. | |
Collaborator

Suggested change
| **Testability** | [x] | Stuntime is testable: connectivity gap is measurable from first packet loss to first packet recovery. The measuring scope (secondary networks, topologies) is well-defined. | |
| **Testability** | [x] | Testable by measuring the time period in which network traffic is lost. | |

Author

Ack.

| **Understand Value** | [x] | Published stuntime lets users set expectations for VM downtime during live migration and compare OCP-V with other virtualization solutions. For QE, automated measurement provides on-demand stuntime data, and continuous test coverage catches regressions in live migration behavior. | |
| **Customer Use Cases** | [x] | Customers need predictable VM downtime during VM live migration. | |
| **Testability** | [x] | Stuntime is testable: connectivity gap is measurable from first packet loss to first packet recovery. The measuring scope (secondary networks, topologies) is well-defined. | |
| **Acceptance Criteria** | [x] | Stuntime measured in a BM environment to allow later publication in blog/KCS. Stuntime value must be easily retrievable from test logs to enable baseline updates and reports. | |
Collaborator

  • In which environment it is measured is irrelevant in this context.
  • "Easily retrievable" and "from test logs" are relative and assume implementation details. Please focus on the end result, e.g., "Measured values can be publicly shared and used as formal thresholds".

To clarify, it is unlikely that we will take the numbers measured from the logs and post them anywhere.
Most likely, and this is my expectation, the tests will measure values and assert that they are in a certain range. The range will be published, and the tests will just make sure we are not outside of it.
If we detect that the range has changed, there is either a product issue or we need to update the published values.
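The range-assertion approach described here could be sketched minimally as below. The threshold value, constant, and function name are placeholders, not values from the STP or the eventual published range:

```python
# Hypothetical sketch: assert a measured stuntime falls within the published
# range instead of reporting raw numbers. The threshold is a placeholder.

PUBLISHED_STUNTIME_MAX = 2.0  # seconds; stands in for the published value


def check_stuntime(measured_seconds, threshold=PUBLISHED_STUNTIME_MAX):
    """Fail the test run if the measured stuntime exceeds the published range."""
    assert measured_seconds <= threshold, (
        f"stuntime {measured_seconds:.2f}s exceeds published threshold "
        f"{threshold:.2f}s: either a product regression, or the published "
        f"values need updating"
    )


check_stuntime(0.3)  # within range: passes silently
```

A failing assertion then surfaces exactly the decision the reviewer describes: investigate a product issue, or update the published values.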

Author

Makes sense, thanks.

| **Customer Use Cases** | [x] | Customers need predictable VM downtime during VM live migration. | |
| **Testability** | [x] | Stuntime is testable: connectivity gap is measurable from first packet loss to first packet recovery. The measuring scope (secondary networks, topologies) is well-defined. | |
| **Acceptance Criteria** | [x] | Stuntime measured in a BM environment to allow later publication in blog/KCS. Stuntime value must be easily retrievable from test logs to enable baseline updates and reports. | |
| **Non-Functional Requirements (NFRs)** | [x] | Measured stuntime will be documented in a KCS or a Red Hat blog post. | |
Collaborator

In what way is this relevant to the STP? I do not understand.

Author

Maybe I don't understand NFRs correctly; I thought the documentation (KCS/blog) fell under this category.
Reading through the example, it does mention docs specifically:

Confirmed coverage for NFRs, including Performance, Security, Usability, Downtime, Connectivity, Monitoring (alerts/metrics), Scalability, Portability (e.g., cloud support), and Docs.

If it doesn’t fit in this context, I’m happy to change it. Could you clarify what you’d expect here?


| Check | Done | Details/Notes | Comments |
|:---------------------------------|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:---------|
| **Developer Handoff/QE Kickoff** | [x] | Not a new feature. | |
Collaborator

The fact that it is not a new product feature is not convincing. Any work is expected to pass through some kind of kickoff to sync the relevant engineering members and clarify open questions.

Author

I had a meeting with you and with Petr to go over the epic and discuss measurement strategies. I've updated the details.

| Check | Done | Details/Notes | Comments |
|:---------------------------------|:-----|:--------------------------------------------------------------------------------------------------------------------------------------------------------|:---------|
| **Developer Handoff/QE Kickoff** | [x] | Not a new feature. | |
| **Technology Challenges** | [x] | Stuntime is sensitive to network workload and infrastructure, so measured values may vary across different labs and environments. A single threshold will be established and validated across available BM environments (see STD). Per-environment adjustments will be considered only if specific labs show consistent deviation. | |
Collaborator

  • I do not think you should mention the STD in an STP.
  • You can mention what is needed, something like "a relatively stable setup environment which can provide as stable results as possible".

Author

Ack.

Introduce STP for stuntime measurement of VMs through live migration
across different migration scenarios, focusing on secondary networks:
Linux bridge and OVN localnet.
@Anatw (Author) commented Mar 4, 2026

Changed according to CR:

  • Feature Overview: Generalized; removed network-specific wording; scope now points to Section II.
  • Metadata: Clarified Feature in Jira (N/A, work tracked under Epic).
  • Requirements checklist: Updated Review Requirements, Understand Value, Customer Use Cases, Testability, Acceptance Criteria, Developer Handoff, and Technology Challenges per review feedback.
  • Scope of Testing: Clarified connectivity vs migration; added a pointer to Testing Goals.
  • Out of Scope: Confirmed default pod network and other secondary CNIs are listed.
  • Other: Minor edits and consistency fixes (e.g., Storage row, Upgrade wording).
