Log printableStatus for timeout and skip adding VNC screenshot for non-booted VMs #4041
geetikakay wants to merge 1 commit into RedHatQE:main
📝 Walkthrough: Updated VNC screenshot collection to accept a …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@utilities/virt.py`:
- Around lines 1797-1805: The else branch lacks a status log before taking a VNC
screenshot; update the timeout handling so that before calling
collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace) you
log the VM printable status (use printable_status variable and LOGGER.error
consistent with the existing pattern), e.g. an error message indicating VM
{vm.name} timed out with printableStatus '{printable_status}' after
{wait_until_running_timeout}s and that a VNC screenshot is being captured; keep
the existing VM_ERROR_STATUSES branch and re-raise behavior unchanged.
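A minimal sketch of the handler this comment asks for, under stated assumptions: `TimeoutExpiredError`, `VM_ERROR_STATUSES`, `FakeVM`, and the `collect_vnc_screenshot_for_vms` stub are hypothetical stand-ins for the project's real objects, and the error-status branch is reduced to a log line because its real body is not shown here.

```python
import logging
from dataclasses import dataclass
from typing import Callable

LOGGER = logging.getLogger(__name__)

# Hypothetical stand-ins; the real values and classes live in the project.
VM_ERROR_STATUSES = ("CrashLoopBackOff", "ErrImagePull", "ImagePullBackOff")


class TimeoutExpiredError(Exception):
    """Stand-in for the timeout exception raised by the wait helper."""


@dataclass
class FakeVM:
    name: str
    namespace: str
    printable_status: str


SCREENSHOTS: list = []


def collect_vnc_screenshot_for_vms(vm_name: str, vm_namespace: str) -> None:
    # Stub: record the request instead of running `virtctl vnc screenshot`.
    SCREENSHOTS.append(f"{vm_namespace}/{vm_name}")


def wait_for_running_vm_sketch(
    vm: FakeVM, wait_until_running_timeout: int, wait: Callable[[], None]
) -> None:
    try:
        wait()
    except TimeoutExpiredError:
        printable_status = vm.printable_status
        if printable_status in VM_ERROR_STATUSES:
            # Existing branch (simplified): report the error status.
            LOGGER.error(f"VM {vm.name} is in error status '{printable_status}'")
        else:
            # Suggested change: log the status before capturing the screenshot.
            LOGGER.error(
                f"VM {vm.name} timed out with printableStatus '{printable_status}' "
                f"after {wait_until_running_timeout}s; capturing VNC screenshot"
            )
            collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
        # Re-raise the original TimeoutExpiredError unchanged.
        raise
```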
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #4041      +/-   ##
==========================================
+ Coverage   98.56%   98.60%   +0.04%
==========================================
  Files          25       25
  Lines        2297     2372      +75
==========================================
+ Hits         2264     2339      +75
  Misses         33       33
Force-pushed 0d81668 to 959ba32.
Actionable comments posted: 1
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@utilities/virt.py`:
- Around lines 1797-1809: The current handler can lose the original
TimeoutExpiredError if collect_vnc_screenshot_for_vms raises, and it uses a
forbidden "# type: ignore[arg-type]"; fix by preserving and re-raising the
original exception and by not suppressing type errors: capture the original
exception into a variable in the except block (e.g., original_exc), then call
collect_vnc_screenshot_for_vms inside its own try/except that logs any
screenshot failures via LOGGER.exception/LOGGER.error (don’t let that exception
propagate), remove the "# type: ignore[arg-type]" and pass correct typed
arguments (or adjust collect_vnc_screenshot_for_vms signature to accept vm.name
and vm.namespace as strings), and finally re-raise the preserved original
exception (use "raise" or re-raise original_exc to keep the original traceback).
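A minimal sketch of that pattern, assuming hypothetical stand-ins for `TimeoutExpiredError` and `collect_vnc_screenshot_for_vms` (the stub always fails, mimicking virtctl's "VMI is not running" error). The inner `try/except` logs the screenshot failure, and the bare `raise` then re-raises the original timeout with its traceback, because Python restores the outer exception once the inner handler finishes.

```python
import logging
from typing import Callable

LOGGER = logging.getLogger(__name__)


class TimeoutExpiredError(Exception):
    """Hypothetical stand-in for the sampler's timeout exception."""


def collect_vnc_screenshot_for_vms(vm_name: str, vm_namespace: str) -> None:
    # Stub simulating a screenshot failure for a VM that never booted.
    raise RuntimeError(f"Can't access VMI {vm_name}: VMI is not running")


def wait_with_diagnostics(vm_name: str, vm_namespace: str, wait: Callable[[], None]) -> None:
    try:
        wait()
    except TimeoutExpiredError:
        try:
            collect_vnc_screenshot_for_vms(vm_name=vm_name, vm_namespace=vm_namespace)
        except Exception:
            # Log and swallow: a failed screenshot must not mask the timeout.
            LOGGER.exception(f"Failed to collect VNC screenshot for VM {vm_name}")
        # Bare raise re-raises the original TimeoutExpiredError and its traceback.
        raise
```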
/build-and-push-container
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-4041 published
/verified
utilities/virt.py (outdated)
except TimeoutExpiredError:
    LOGGER.error(f"VM {vm.name} unresponsive after migration; getting VNC screenshot")
    collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
    LOGGER.error(f"VM {vm.name} unresponsive after migration.")
The error is partial; this could also fail in wait_for_vm_interfaces.
Based on the PR description, a proper error message should be added to wait_for_interfaces and check_ssh_connectivity, rather than a generic error here.
Right, the reference to failed_phase was missing. I have added it.
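One way to know which step timed out is to update a `failed_phase` variable before each wait step, so the handler can name it. This is a hypothetical sketch: `run_phases`, the phase names, and the `TimeoutExpiredError` stand-in are illustrative, not the PR's code.

```python
import logging
from typing import Callable, Sequence, Tuple

LOGGER = logging.getLogger(__name__)


class TimeoutExpiredError(Exception):
    """Stand-in for the timeout exception raised by wait helpers."""


def run_phases(vm_name: str, phases: Sequence[Tuple[str, Callable[[], None]]]) -> None:
    """Run wait steps in order; on timeout, log the phase that failed and re-raise."""
    failed_phase = ""
    try:
        for phase_name, step in phases:
            failed_phase = phase_name  # remember which step is running
            step()
    except TimeoutExpiredError:
        LOGGER.error(f"VM {vm_name} timed out during '{failed_phase}'")
        raise
```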
utilities/data_collector.py (outdated)
if printable_status not in (VirtualMachine.Status.RUNNING, VirtualMachine.Status.MIGRATING):
    LOGGER.warning(f"Skipping VNC screenshot for VM {vm.name}, status is '{printable_status}'.")
    return

base_dir = get_data_collector_base_directory()
utilities.infra.run_virtctl_command(
    command=shlex.split(f"vnc screenshot {vm_name} -f {base_dir}/{vm_namespace}-{vm_name}.png"),
    namespace=vm_namespace,
    command=shlex.split(f"vnc screenshot {vm.name} -f {base_dir}/{vm.namespace}-{vm.name}.png"),
    namespace=vm.namespace,
)
Please rewrite:
if status in (running, migrating):
    do screenshot
else:
    print message
I used a negative check to avoid extra nesting and keep the happy path flat.
I don't see the benefit of your approach.
Positive checks are easier to read.
Early returns make sense in big functions, but this is a short function with only two paths: if possible, take the screenshot; otherwise, print a message.
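A sketch of the positive-check shape the reviewer asks for, assuming simplified stand-ins: `RUNNING` and `MIGRATING` play the role of `VirtualMachine.Status.RUNNING` and `MIGRATING`, `FakeVM` replaces the real VM object, and `run_virtctl` is an injected callable in place of `utilities.infra.run_virtctl_command`. The `bool` return value is added only to make the sketch testable.

```python
import logging
import shlex
from dataclasses import dataclass
from typing import Callable

LOGGER = logging.getLogger(__name__)

# Stand-ins for VirtualMachine.Status.RUNNING / MIGRATING.
RUNNING = "Running"
MIGRATING = "Migrating"


@dataclass
class FakeVM:
    name: str
    namespace: str
    printable_status: str


def collect_vnc_screenshot_for_vm(vm: FakeVM, base_dir: str, run_virtctl: Callable[..., None]) -> bool:
    printable_status = vm.printable_status
    if printable_status in (RUNNING, MIGRATING):
        # Happy path: the VMI is up, so virtctl can reach its VNC console.
        run_virtctl(
            command=shlex.split(f"vnc screenshot {vm.name} -f {base_dir}/{vm.namespace}-{vm.name}.png"),
            namespace=vm.namespace,
        )
        return True
    else:
        LOGGER.warning(f"Skipping VNC screenshot for VM {vm.name}, status is '{printable_status}'.")
        return False
```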
/verified
utilities/virt.py (outdated)
LOGGER.error(
    f"VM {vm.name} timed out during '{failed_phase}' with status '{printable_status}' "
    f"after {wait_until_running_timeout}s."
)
As @rnetser said, proper error messages should be added directly in the functions wait_for_interfaces and check_ssh_connectivity.
Here you should not print any generic error messages; the purpose of the current try/except was to collect a VNC screenshot before raising the exception. Adding logger messages related to other functions here does not make sense.
Do not get and print the VM status here. Do all of that in collect_vnc_screenshot_for_vms.
wait_for_vm_interfaces:
- Validate reported interface count against VMI spec.
wait_for_ssh_connectivity:
- Add try/except TimeoutExpiredError with error log on timeout.
- Replace sleep=5 with TIMEOUT_5SEC constant.
wait_for_running_vm / verify_vm_migrated:
- Move specific error logging into each helper so callers own their diagnostics.
collect_vnc_screenshot_for_vms:
- Fetch printable_status; capture screenshots only for Running or Migrating VMs, and log a warning and skip for non-booted VMs.

Signed-off-by: Geetika Kapoor <gkapoor@redhat.com>
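A minimal sketch of the `wait_for_ssh_connectivity` change described above, in which the helper logs its own timeout and re-raises. `wait_until` is a hypothetical polling loop standing in for the project's sampler, `is_reachable` is an injected check, and `TimeoutExpiredError` is a local stand-in; only the `TIMEOUT_5SEC` name comes from the commit message, and its value of 5 is inferred from the `sleep=5` it replaces.

```python
import logging
import time
from typing import Callable

LOGGER = logging.getLogger(__name__)

TIMEOUT_5SEC = 5  # replaces the former literal sleep=5


class TimeoutExpiredError(Exception):
    """Stand-in for the sampler's timeout exception."""


def wait_until(predicate: Callable[[], bool], wait_timeout: float, sleep: float) -> None:
    """Poll predicate until it returns True or wait_timeout seconds elapse."""
    deadline = time.monotonic() + wait_timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutExpiredError(f"Timed out after {wait_timeout}s")
        time.sleep(sleep)


def wait_for_ssh_connectivity(
    vm_name: str,
    is_reachable: Callable[[], bool],
    timeout: float,
    sleep: float = TIMEOUT_5SEC,
) -> None:
    try:
        wait_until(predicate=is_reachable, wait_timeout=timeout, sleep=sleep)
    except TimeoutExpiredError:
        # The helper owns its diagnostics, so callers need no generic message.
        LOGGER.error(f"VM {vm_name}: SSH connectivity not established within {timeout}s")
        raise
```

With this shape, a caller's `try/except` only needs to collect the VNC screenshot and re-raise, as the reviewers requested.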
Short description:
All VM timeout failures produced a generic TimeoutExpiredError with no indication of the root cause. The code also always attempted a VNC screenshot on timeout, even though that step is pointless when the VMI is not running. For example:
$ virtctl -n default vnc screenshot rhel-10-peach-whippet-55 -f /tmp/abc.png
Can't access VMI rhel-10-peach-whippet-55: VMI is not running
More details:
What this PR does / why we need it:
This was observed while triaging a test failure related to "reason": "ErrImagePull" with status 500 (Internal Server Error).
Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket: