[5750013][5591945][5360813]: AutoCast standalone implementation for type inference #719
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #719      +/-   ##
==========================================
- Coverage   74.23%   74.19%   -0.04%
==========================================
  Files         192      192
  Lines       19033    19236     +203
==========================================
+ Hits        14129    14273     +144
- Misses       4904     4963      +59
```

☔ View full report in Codecov by Sentry.
Force-pushed from a659cad to 7caedc7
This isn't related to this PR, but it's a shape inference issue I encountered previously. It was caused by using strict mode in shape inference. Would it be possible to not use strict mode and use the default mode instead?
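(For reference, the mode in question is the `strict_mode` flag of ONNX shape inference; a minimal sketch of the difference, not code from this PR:)

```python
from onnx import shape_inference

# Default (non-strict) mode: per-node inference failures are tolerated and the
# affected value_info entries are simply left unset.
model = shape_inference.infer_shapes(model, strict_mode=False)

# Strict mode: the same failures raise an exception instead of being skipped.
# model = shape_inference.infer_shapes(model, strict_mode=True)
```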
modelopt/onnx/autocast/convert.py
Outdated
```python
if use_standalone_type_inference:
    model = onnx_utils.infer_types(model)
else:
    model = onnx_utils.infer_shapes(model)
```
Could you create a util function for this as it is reused in multiple places?
Sure, done
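For readers following along, the resulting utility presumably looks roughly like this (a sketch reconstructed from the review comments; the exact signature and helper names may differ from the PR's implementation):

```python
from onnx import shape_inference

def infer_types(model, use_standalone_type_inference: bool = False, **kwargs):
    """Toggle between standalone type-only inference and ONNX shape inference."""
    if use_standalone_type_inference:
        return _infer_types_only(model)  # type-only pass introduced by this PR
    return shape_inference.infer_shapes(model, **kwargs)  # existing shape-inference path
```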
```python
if not self.use_standalone_type_inference:
    for idx, d in enumerate(inp.type.tensor_type.shape.dim):
        if d.dim_value:
            inp.type.tensor_type.shape.dim[idx].dim_param = "unk"
```
Similarly for this, can we create a util function?
Leaving it to the next refactor, see #717 (comment).
```python
if not inp_name:
    continue
```
Do we expect any inputs with empty names?
It's a node input, not a graph input, and this happens for optional inputs; e.g., a Resize node where the scales and roi input slots are empty. Added a comment to clarify.
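To illustrate the case being described (a standalone sketch, not the PR's code):

```python
from onnx import helper

# Resize with only 'sizes' provided: the optional 'roi' and 'scales' slots are
# encoded as empty strings in node.input.
node = helper.make_node("Resize", inputs=["X", "", "", "sizes"], outputs=["Y"], mode="nearest")

for inp_name in node.input:
    if not inp_name:  # optional input left empty -> nothing to look up
        continue
    print(inp_name)   # only "X" and "sizes" are processed
```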
```python
for attr in node.attribute:
    if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
        if attr.t.HasField("data_type"):
            const_type = attr.t.data_type
```
This pattern is used in multiple places in our codebase, should we create a utility function for it?
Same as #719 (comment)
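If such a utility is added in a later refactor, it could be as simple as the following sketch (the function name is hypothetical):

```python
import onnx

def get_constant_value_dtype(node: onnx.NodeProto) -> int | None:
    """Return the data_type of a node's TENSOR-typed 'value' attribute, if present."""
    for attr in node.attribute:
        if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
            if attr.t.HasField("data_type"):
                return attr.t.data_type
    return None
```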
modelopt/onnx/utils.py
Outdated
```python
tensor_types[init_name] = init.data_type

# Helper function to get tensor type
def get_tensor_type(tensor_name: str) -> int | None:
```
Can we re-use _get_tensor_type() in modelopt.onnx.utils.py or, if not, move this function to modelopt.onnx.utils.py as _get_tensor_type_from_tensor_name or a variation of that?
Refactored to reuse _get_tensor_type() to get the type from the value info, and renamed local function to get_tensor_type_from_name for clarity.
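Conceptually, the renamed helper does something like this (a sketch; the container names are assumptions about the surrounding code, not the PR's exact implementation):

```python
def get_tensor_type_from_name(tensor_name: str) -> int | None:
    """Resolve a tensor's element type by name, from initializers or value info."""
    if tensor_name in tensor_types:        # initializer types collected earlier
        return tensor_types[tensor_name]
    vi = value_info_map.get(tensor_name)   # graph inputs/outputs/value_info keyed by name
    return _get_tensor_type(vi) if vi is not None else None
```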
```python
            break
    assert const_type is not None
    output_types = [const_type]
elif node.op_type == "ConstantOfShape":
```
Would this PR fix the issue in bug 5763424?
@gcunhase Yes, it should fix that as well. I address ConstantOfShape as one of the special cases:
```python
elif node.op_type == "ConstantOfShape":
    # ConstantOfShape: output type is from the value attribute's tensor data_type
    # If no value attribute, defaults to FLOAT
    # Note: Schema allows multiple types, so we need to check the value attribute
    const_type = None
    for attr in node.attribute:
        if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
            if attr.t.HasField("data_type"):
                const_type = attr.t.data_type
                break
    assert const_type is not None
    output_types = [const_type]
```
Force-pushed from 7caedc7 to 1d2fdf6
Important: Review skipped. Auto incremental reviews are disabled on this repository; please check the settings in the CodeRabbit UI.

📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI (__main__.py)
    participant Convert as Convert Logic (convert.py)
    participant Utils as Type Inference (utils.py)
    participant Converter as PrecisionConverter
    CLI->>Convert: convert_to_mixed_precision(use_standalone_type_inference=True)
    Convert->>Utils: infer_types(model, use_standalone_type_inference=True)
    Utils->>Utils: _infer_types_only(model)
    Utils->>Utils: Iterate graphs & compute types<br/>(Cast, Quantize, Constant, etc.)
    Utils->>Utils: infer_types_verification(model)
    Utils-->>Convert: model with types set
    Convert->>Converter: PrecisionConverter(model, use_standalone_type_inference=True)
    Converter->>Converter: Conditional type/shape clearing<br/>based on flag
    Converter-->>Convert: Precision conversion complete
    Convert-->>CLI: Modified ONNX model
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning), ✅ 2 passed
Force-pushed from 1d2fdf6 to a35b63c
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In tests/unit/onnx/autocast/test_precisionconverter.py, around lines 577-585: the test's expected low_precision_nodes list uses the outdated node names "concat" and "concat_dims", which no longer match the model nodes "concat1" and "concat2". Update the test to reference the actual node names by replacing "concat" with "concat1" and "concat_dims" with "concat2" in the low_precision_nodes array (the list that currently reads ["transpose", "concat", "size", "div", "concat_dims", "reshape"]) so that the nodes "concat1" and "concat2" (created by helper.make_node) are correctly classified during the precision conversion assertions.
🧹 Nitpick comments (3)
modelopt/onnx/utils.py (1)
731-1091: Consider refactoring for maintainability. The `_infer_types_only` function is ~360 lines long with significant complexity. While functionally sound, consider extracting helper functions for better maintainability:
- `str_to_tensor_dtype` (lines 806-838) could be a module-level constant dictionary (see the sketch below)
- Special operator handlers (Cast, DequantizeLinear, QuantizeLinear, etc.) could be separate functions
- Schema-based inference logic (lines 1005-1063) could be extracted

This would improve testability and make the code easier to maintain as the feature matures beyond its experimental status.
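For illustration, the suggested module-level constant could look like this (a sketch; the name and the exact set of entries are assumptions, not the PR's code):

```python
import onnx

STR_TO_TENSOR_DTYPE = {
    "tensor(float)": onnx.TensorProto.FLOAT,
    "tensor(float16)": onnx.TensorProto.FLOAT16,
    "tensor(bfloat16)": onnx.TensorProto.BFLOAT16,
    "tensor(double)": onnx.TensorProto.DOUBLE,
    "tensor(int32)": onnx.TensorProto.INT32,
    "tensor(int64)": onnx.TensorProto.INT64,
    "tensor(bool)": onnx.TensorProto.BOOL,
}
```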
modelopt/onnx/autocast/precisionconverter.py (1)
260-267: Consider clarifying the redundant call pattern when using standalone type inference. When `use_standalone_type_inference=True`, the `_infer_types_only` function is called (which ignores the `strict_mode` and `check_type` kwargs). The second call at lines 265-267 effectively duplicates the first call, since `_ensure_types_are_defined()` at line 263 only modifies UNDEFINED types and doesn't require re-inference. This differs from the pattern in `_remove_redundant_casts` (lines 1191-1197), where the second call is conditionally skipped when using standalone type inference. The current code works correctly, but consider either:
- Adding a comment explaining why both calls are needed in standalone mode, or
- Aligning with the conditional pattern used in `_remove_redundant_casts`

tests/unit/onnx/autocast/test_precisionconverter.py (1)

1672-1674: Intentional exclusion from standalone type inference testing. This fixture and its test (`test_if_subgraph_outer_scope_type_preservation`) use `infer_shapes` directly and are not parametrized with `use_standalone_type_inference`. This appears intentional, since this specific test focuses on preserving outer scope variable types in If subgraphs, which may have behavior tied to shape inference. Consider adding a comment explaining why this test doesn't use the standalone type inference path, to prevent future maintainers from inadvertently "fixing" this.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- CHANGELOG.rst
- docs/source/guides/8_autocast.rst
- modelopt/onnx/autocast/__main__.py
- modelopt/onnx/autocast/convert.py
- modelopt/onnx/autocast/precisionconverter.py
- modelopt/onnx/utils.py
- tests/unit/onnx/autocast/test_precisionconverter.py
🧰 Additional context used
🧬 Code graph analysis (4)
modelopt/onnx/autocast/convert.py (1)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)

modelopt/onnx/utils.py (1)
- modelopt/onnx/autocast/utils.py (1): `walk_subgraphs_recursive` (126-158)

modelopt/onnx/autocast/precisionconverter.py (1)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)

tests/unit/onnx/autocast/test_precisionconverter.py (2)
- modelopt/onnx/autocast/utils.py (1): `setup_mappings` (33-60)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (14)
CHANGELOG.rst (1)
21-21: LGTM! Clear changelog entry for experimental feature. The changelog entry appropriately describes the new standalone type inference option as experimental and clearly explains its purpose as a workaround for shape inference issues.
docs/source/guides/8_autocast.rst (1)
45-45: LGTM! Comprehensive documentation of the new feature. The documentation properly:
- Marks the feature as experimental
- Explains it's a workaround for shape inference failures
- Notes implementation details (graphsurgeon, special operators)
- Warns about potential limitations
- Provides clear usage examples for both Python API and CLI
Also applies to: 86-89, 152-159, 214-218
modelopt/onnx/autocast/__main__.py (1)
188-197: LGTM! Clean CLI integration. The CLI flag is properly integrated with appropriate help text that aligns with the documentation and marks the feature as experimental.
Also applies to: 231-231
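The CLI wiring is presumably along these lines (a sketch; only the flag name is taken from this PR, and the help text is paraphrased):

```python
parser.add_argument(
    "--use_standalone_type_inference",
    action="store_true",
    help="Experimental: run standalone type-only inference instead of ONNX shape inference.",
)
```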
modelopt/onnx/utils.py (3)
1005-1063: Schema-based inference is complex but well-structured. The fallback schema-based inference implements a multi-strategy approach:
- Explicit tensor types from schema
- Type constraints with single allowed types
- Placeholder matching to input types
- Default propagation from first input
The debug logging and fallback behavior are appropriate for an experimental feature. Be aware that this may not handle all operator type relationships correctly, but this aligns with the documented limitations of standalone type inference being "less robust than ONNX's implementation for edge cases."
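As a rough illustration of that multi-strategy fallback (a sketch only; the variable names and the `STR_TO_TENSOR_DTYPE` / `get_tensor_type_from_name` helpers sketched earlier are assumptions, not the PR's actual code):

```python
import onnx

schema = onnx.defs.get_schema(node.op_type)
out_type_str = schema.outputs[0].type_str  # e.g. "tensor(bool)" or a placeholder like "T"

if out_type_str in STR_TO_TENSOR_DTYPE:
    # Strategy 1: the schema pins the output to an explicit tensor type.
    output_type = STR_TO_TENSOR_DTYPE[out_type_str]
else:
    constraint = next(
        (tc for tc in schema.type_constraints if tc.type_param_str == out_type_str), None
    )
    if constraint is not None and len(constraint.allowed_type_strs) == 1:
        # Strategy 2: the type constraint admits exactly one type.
        output_type = STR_TO_TENSOR_DTYPE[constraint.allowed_type_strs[0]]
    else:
        # Strategies 3/4: match the placeholder to an input, or propagate the first input's type.
        output_type = get_tensor_type_from_name(node.input[0])
```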
1094-1143: LGTM! Solid verification logic. The `infer_types_verification` function properly:
- Computes reachable tensors from inputs/initializers
- Only validates reachable tensors (avoiding false positives from unreachable nodes)
- Provides clear error messages with tensor names
- Uses appropriate error handling
This defensive programming will help catch incomplete type inference early.
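In spirit, that verification amounts to something like the following sketch (illustrative only; the real implementation lives in modelopt/onnx/utils.py and will differ in detail):

```python
import onnx

def infer_types_verification(model: onnx.ModelProto) -> None:
    """Check that every tensor reachable from graph inputs/initializers has a type."""
    reachable = {i.name for i in model.graph.input} | {t.name for t in model.graph.initializer}
    typed = set(reachable)
    typed |= {vi.name for vi in model.graph.value_info if vi.type.tensor_type.elem_type != 0}
    typed |= {o.name for o in model.graph.output if o.type.tensor_type.elem_type != 0}
    for node in model.graph.node:  # ONNX graphs are topologically sorted
        if all((not name) or name in reachable for name in node.input):
            for out in node.output:
                reachable.add(out)
                if out and out not in typed:
                    raise RuntimeError(f"Type inference left reachable tensor '{out}' untyped")
```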
1162-1183: LGTM! Clean and well-documented public API. The `infer_types` function provides a clear toggle between standalone type inference and ONNX shape inference. The docstring is comprehensive, and passing `**kwargs` to `infer_shapes` maintains flexibility.

modelopt/onnx/autocast/convert.py (2)

64-64: LGTM! Consistent API integration in convert_to_mixed_precision. The new `use_standalone_type_inference` parameter is properly:
- Added to the function signature with appropriate default (False)
- Documented as a workaround for shape inference issues
- Passed to both `infer_types` and `PrecisionConverter`
- Marked as experimental in the docstring

Also applies to: 89-91, 139-139, 171-171

204-204: LGTM! Consistent API integration in convert_to_f16. The parameter integration mirrors `convert_to_mixed_precision` consistently, maintaining API coherence across both conversion functions.

Also applies to: 217-219, 237-237, 249-249
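From the Python API, usage would look roughly like this (the keyword argument comes from this PR; the positional model-path argument and any other defaults are assumptions about the existing convert API):

```python
from modelopt.onnx.autocast.convert import convert_to_mixed_precision

converted_model = convert_to_mixed_precision(
    "model.onnx",
    use_standalone_type_inference=True,  # experimental type-only inference path
)
```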
modelopt/onnx/autocast/precisionconverter.py (4)
100-145: LGTM! The new parameter `use_standalone_type_inference` is properly added to the constructor signature, documented in the docstring, and stored as an instance attribute.

287-345: LGTM! The conditional shape-clearing logic is well-implemented. When `use_standalone_type_inference=True`, only types are cleared (not shapes), which correctly aligns with the feature's intent to separate type inference from shape inference. The gating at lines 311-314, 326-329, and 340-343 properly preserves shape information in standalone mode.

1188-1197: LGTM! The conditional skipping of the second `infer_types` call is correct. When using standalone type inference, the `check_type=True` validation isn't supported, so skipping this call avoids passing unused parameters.

1282-1287: LGTM! The type inference integration in `_fix_network_output_names` correctly uses the new `infer_types` wrapper with the instance flag.

tests/unit/onnx/autocast/test_precisionconverter.py (2)

35-41: LGTM! The helper function centralizes the type/shape inference and mapping setup pattern, reducing duplication across test fixtures. Note that this intentionally has a different signature than `utils.setup_mappings` (it returns a 4-tuple including the model, versus a 3-tuple without it).

73-104: LGTM! Tests are properly parametrized with `use_standalone_type_inference`, and the parameter is correctly threaded through to `PrecisionConverter` initialization.
Force-pushed from a35b63c to 58fe117
CHANGELOG.rst
Outdated
- Add support for parallel draft heads in Eagle speculative decoding.
- Add support to enable custom emulated quantization backend. See :meth:`register_quant_backend <modelopt.torch.quantization.nn.modules.tensor_quantizer.register_quant_backend>` for more details. See an example in ``tests/unit/torch/quantization/test_custom_backend.py``.
- Add ``examples/llm_qad`` for QAD training with Megatron-LM.
- Add standalone type inference option (``--use_standalone_type_inference``) in ONNX AutoCast as an alternative to ONNX's ``infer_shapes``. This experimental feature performs type-only inference without shape inference, useful as a workaround when shape inference fails or to avoid unnecessary shape inference overhead.
Is this a bug fix for 0.41? If not, please move it to the 0.42 section.
@kevalmorabia97 it's a long-standing known issue. I don't know how to classify this exactly.
If it can be merged by EoD today, we can include it in 0.41; otherwise it will need to wait until 0.42.
nvm, will push to 0.42.0
## What does this PR do?

**Type of change:** Bug fix

**Overview:** This PR fixes an input type mismatch in Resize layers when being converted to FP16.

## Usage

```
$ python -m modelopt.onnx.autocast --onnx_path=$MODEL_NAME.onnx
```

## Testing

Added unittest.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

## Additional Information

This issue is also fixed by using the standalone type inference logic from #719.

## Summary by CodeRabbit

## Release Notes

* **Improvements**
  * Enhanced the graph sanitization process to automatically duplicate shared constants during optimization, ensuring improved model handling and consistency.
* **Tests**
  * Added test coverage for mixed precision conversion of Conv-Resize model architectures.

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
created by cursor and committed by mistake Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
…add to changelog Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Force-pushed from c5efc96 to 9656fca
What does this PR do?
Type of change: New feature
Overview:
AutoCast runs full type inference to get the new tensor types after adding casts. ONNX doesn't have a separate function for type inference; it is done as part of shape inference. Shape inference is a much more complex task than type inference, especially when dynamic shapes are involved. We're seeing some shape-inference-related bugs in AutoCast. Typically we can work around them, but it's cumbersome. A local type-inference implementation gives users a way to work around shape inference issues. This is opt-in and marked as experimental.
Usage
python -m modelopt.onnx.autocast --onnx_path /path/to/input.onnx [options] --use_standalone_type_inference
Testing
Added use_standalone_type_inference=True to all existing PrecisionConverter tests.
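(Roughly, the parametrization looks like the sketch below; the fixture, helper, and constructor argument names are illustrative rather than the exact test code.)

```python
import pytest

@pytest.mark.parametrize("use_standalone_type_inference", [False, True])
def test_simple_conversion(simple_model, use_standalone_type_inference):
    # Hypothetical helper mirroring the tests' setup: runs type/shape inference
    # and builds the value-info/initializer mappings for the converter.
    model, value_info_map, initializer_map, node_to_init_map = _setup_model(
        simple_model, use_standalone_type_inference
    )
    converter = PrecisionConverter(
        model,
        value_info_map,
        initializer_map,
        node_to_init_map,
        use_standalone_type_inference=use_standalone_type_inference,
    )
```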
Before your PR is "Ready for review"
Additional Information
A more permanent fix would be to decouple type and shape inference in ONNX; we should invest in that when we have the resources (see onnx/onnx#7100). This is a quick fix, which is also why it is opt-in and not the default mode.
Summary by CodeRabbit
New Features
- `--use_standalone_type_inference` flag to ONNX AutoCast, enabling type-only inference as an alternative to standard shape inference. Useful as a workaround when shape inference fails or to reduce computational overhead.

Documentation
Tests