[5763448][ONNX][Autocast] Fix Resize input type mismatch error #757
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #757   +/-   ##
=======================================
  Coverage   74.22%   74.23%
=======================================
  Files         192      192
  Lines       19027    19033      +6
=======================================
+ Hits        14123    14129      +6
  Misses       4904     4904

☔ View full report in Codecov by Sentry.
galagam left a comment:
LGTM
96abee2 to e8bc39e (Compare)
Important: Review skipped. Auto incremental reviews are disabled on this repository; check the settings in the CodeRabbit UI.

📝 Walkthrough

The changes introduce a new constant duplication feature to GraphSanitizer that is invoked during graph sanitization, along with test coverage for Conv-Resize model handling in mixed-precision conversion workflows.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks (✅ 2 | ❌ 1)
- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/_test_utils/onnx/lib_test_models.py (1)
958-972: Consider adding a brief comment explaining the shared constant pattern.

The intentional reuse of `resize_roi_scales` for both the roi and scales inputs is the core of this test. A brief comment would help future maintainers understand this is deliberate, not accidental.

📝 Suggested clarifying comment

```diff
 helper.make_node(
     op_type="Resize",
+    # Note: resize_roi_scales is intentionally used for both roi and scales inputs
+    # to test the shared constant duplication fix (PR #757)
     inputs=[
         "conv1_conv/Conv2D:0",
         "resize_roi_scales",
         "resize_roi_scales",
         "resize_sizes",
     ],
     outputs=["output_0"],
     name="resize1_resize/Resize",
     coordinate_transformation_mode="asymmetric",
     cubic_coeff_a=-0.75,
     mode="nearest",
     nearest_mode="floor",
 ),
```

tests/unit/onnx/autocast/test_autocast.py (1)
193-210: Test validates the fix but could be more explicit about what it's testing.

The test successfully verifies that a Conv-Resize model can be converted to mixed precision without errors. However:

- The comment on line 205 mentions "QDQ node placements", but the test doesn't check QDQ nodes; consider updating the comment to match the actual assertion.
- Since this PR specifically fixes the Resize input type mismatch, consider adding an assertion that verifies the Resize node's data input is FP16 while its `roi`, `scales`, and `sizes` inputs remain their original types (the core bug was that shared constants caused type conflicts).

💡 Suggested improvements

```diff
     # Output model should be produced in the same tmp_path
     output_onnx_path = onnx_path.replace(".onnx", ".fp16.onnx")
     onnx.save(converted_model, output_onnx_path)
-    # Load the output model and check QDQ node placements
+    # Load the output model and verify conversion
     graph = gs.import_onnx(converted_model)
     # Check that Conv is converted
     conv_nodes = [n for n in graph.nodes if "Conv" in n.op]
     assert assert_input_precision(conv_nodes)
+    # Check that Resize node has correct input types:
+    # - Data input (index 0) should be FP16
+    # - roi/scales/sizes inputs should remain their original types
+    resize_nodes = [n for n in graph.nodes if n.op == "Resize"]
+    assert len(resize_nodes) == 1
+    resize_node = resize_nodes[0]
+    assert resize_node.inputs[0].dtype == np.float16, "Resize data input should be FP16"
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- modelopt/onnx/autocast/graphsanitizer.py
- tests/_test_utils/onnx/lib_test_models.py
- tests/unit/onnx/autocast/test_autocast.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/onnx/autocast/test_autocast.py (2)
- tests/_test_utils/onnx/lib_test_models.py (1)
  - build_conv_resize_model (929-1009)
- modelopt/onnx/autocast/convert.py (1)
  - convert_to_mixed_precision (46-189)

tests/_test_utils/onnx/lib_test_models.py (1)
- modelopt/onnx/utils.py (2)
  - infer_shapes (731-744)
  - check_model (555-567)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (4)
modelopt/onnx/autocast/graphsanitizer.py (2)
70-70: Placement of `duplicate_shared_constants()` looks appropriate.

Calling this early in `sanitize()`, after ensuring the graph name but before naming nodes, is a sensible ordering. This ensures shared constants are duplicated before subsequent transformations that might depend on unique tensor names.
258-262: API contract verified; no issues found.

The utility function `onnx_utils.duplicate_shared_constants` exists in `modelopt/onnx/utils.py` at line 524 with the expected signature: `def duplicate_shared_constants(onnx_model: onnx.ModelProto) -> tuple[onnx.ModelProto, bool]:`. The import statement is correct, and the tuple unpacking in the method correctly handles the `(model, bool)` return type. The code is implemented correctly.
tests/_test_utils/onnx/lib_test_models.py (1)

929-1009: Test model correctly reproduces the shared constant scenario.

The `build_conv_resize_model()` function intentionally uses `resize_roi_scales` for both the `roi` and `scales` inputs of the Resize node (inputs at indices 1 and 2). This shared constant pattern is exactly what triggers the type mismatch error during FP16 conversion, making this an effective regression test for the bug fix. The model structure and validation follow the established patterns in this file.
tests/unit/onnx/autocast/test_autocast.py (1)
23-23: LGTM! Import correctly extended to include `build_conv_resize_model`.
80325df to aa68114 (Compare)
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
2fdc09f to 7e9c6f4 (Compare)
What does this PR do?

Type of change: Bug fix

Overview: This PR fixes an input type mismatch in Resize layers when they are converted to FP16.

Usage

```bash
python -m modelopt.onnx.autocast --onnx_path=$MODEL_NAME.onnx
```

Testing

Added a unit test.

Before your PR is "Ready for review"

- Make sure you read and follow the [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed.
- Is this change backward compatible?: Yes
- Did you write any new necessary tests?: Yes
- Did you add or update any necessary documentation?: No
- Did you update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: No

Additional Information

This issue is also fixed by using the standalone type inference logic from #719.

Summary by CodeRabbit

Release Notes

Improvements
- Enhanced the graph sanitization process to automatically duplicate shared constants during optimization, ensuring improved model handling and consistency.

Tests
- Added test coverage for mixed-precision conversion of Conv-Resize model architectures.