Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions utilities/virt.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,12 +138,23 @@ def wait_for_vm_interfaces(vmi: VirtualMachineInstance, timeout: int = TIMEOUT_1
timeout=timeout,
)
LOGGER.info(f"Wait for {vmi.name} network interfaces")
expected_names = {
iface.name for iface in vmi.instance.spec.domain.devices.interfaces if iface.get("state") != "absent"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you ever seen the different in number of interfaces between vmi.instance.spec.domain.devices.interfaces and .status.interfaces`?

wrapping to try/except and print clear error message is a good initiative, but may be it would be enough to keep old logic and print interfaces and active_interfaces

Copy link
Copy Markdown
Contributor Author

@geetikakay geetikakay Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done these tests with this PR:

Scenario Spec Interfaces Status Interfaces expected_names Result Log Output old logic
All interfaces up default, nic-1, nic-2 default(enp1s0), nic-1(enp2s0), nic-2(enp3s0) {'default','nic-1','nic-2'} works  
Hot-plug nic-3 default, nic-1, nic-2, nic-3 default(enp1s0), nic-1(enp2s0), nic-2(enp3s0), nic-3(enp4s0) {'default','nic-1','nic-2','nic-3'} works  
Hot-unplug nic-3 (state: absent) default, nic-1, nic-2, nic-3(absent) default(enp1s0), nic-1(enp2s0), nic-2(enp3s0) {'default','nic-1','nic-2'} works  
nic-test missing after migration default, nic-1, nic-2, nic-test default(enp1s0), nic-1(enp2s0), nic-2(enp3s0) {'default','nic-1','nic-2','nic-test'} timeout Missing (not reported by guest agent): {'nic-test'} silently returns True (3 == 3)
  • scenario 4 tracks it,, missing interface.. old logic would silently pass it.

  • by comparing reported_names == expected_names we now detecting when a spec interface doesn't show up in status and report

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scenario 4 is interesting, but the guest-agent output already checked by if interface.get("interfaceName")]

We have observed some flaky issue with guest-agent interfaces after migration, and it was caught by existing automation:

12:29:12  2026-03-03T19:29:12.058304 timeout_sampler INFO Waiting for 720 seconds [0:12:00], retry every 1 seconds. (Function: utilities.virt.<locals>.lambda: vmi.instance)
12:29:12  2026-03-03T19:29:12.058163 utilities.virt INFO Wait for first-vm-for-aaq-test-1772566081-4850352 network interfaces
12:41:18  FAILED

and in collected VMI status we don't have guest-agent as a infoSource and don't have "interfaceName":

  interfaces:
  - infoSource: domain
    ipAddress: 10.128.0.99
    ipAddresses:
    - 10.128.0.99
    - fd02:0:0:1::63
    linkState: up
    mac: 02:7e:eb:3c:67:42
    name: default
    podInterfaceName: eth0
    queueCount: 1

I agree - adding something meaninfgul like echo interfaces and active_interfaces to error message can be very useful, but may be the existing logic works fine and you are hunting for the same flaky issue..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at https://github.com/kubevirt/kubevirt/blob/main/tests/libnet/hotplug.go#L51-L54 and for dynamic changes it looked for vmi.spec.
this code change introduces similar spec comparison approach as VerifyDynamicInterfaceChange which will make checks more reliable.from the testing i did this change behaves as expected in all scenarios

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against of updating logic, just trying to understand if we really need it and how safe is it.

You are updating the global function which is used in almost all tests in the repo.
However, from my understanding, we can get absent state only when we hot-unplug the interface. If we have any "absent" interface in all other regular scenarios it would sound as a bug which your changes potentially could hide.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dshchedr thanks for reviewing this, I really appreciate it. These are very valid questions and I agree we should be thoughtful about the impact since this helper is used across many scenarios. It makes sense to have clear discussion around the behavior change
I looked in code about state: absent ,based on upstream code validation check it can happen only on non default interfaces and never on default interface. admission webhooks checks and rejects for default.
there is a test coverage in that area as well .

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sharing the links I came across while digging into this concept in case they are useful for discussion and can be referred later if needed

}
sampler = TimeoutSampler(wait_timeout=timeout, sleep=1, func=lambda: vmi.instance)
for sample in sampler:
interfaces = sample.get("status", {}).get("interfaces", [])
active_interfaces = [interface for interface in interfaces if interface.get("interfaceName")]
if len(active_interfaces) == len(interfaces):
return True
reported_names = set()
try:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sample in sampler:
interfaces = sample.get("status", {}).get("interfaces", [])
reported_names = {iface.get("name") for iface in interfaces if iface.get("name")}
if reported_names == expected_names and all(iface.get("interfaceName") for iface in interfaces):
Comment thread
coderabbitai[bot] marked this conversation as resolved.
return True
Comment thread
geetikakay marked this conversation as resolved.
except TimeoutExpiredError:
LOGGER.error(
f"VMI {vmi.name}: expected interfaces: {expected_names}, reported: {reported_names}. "
f"Missing (not reported by guest agent): {expected_names - reported_names}."
Comment thread
geetikakay marked this conversation as resolved.
)
raise
return False


Expand Down
Loading