[5750013][5591945][5360813]: AutoCast standalone implementation for type inference #719
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #719      +/-   ##
==========================================
- Coverage   74.23%   74.19%   -0.04%
==========================================
  Files         192      192
  Lines       19033    19236     +203
==========================================
+ Hits        14129    14273     +144
- Misses       4904     4963      +59
```

☔ View full report in Codecov by Sentry.
Force-pushed from a659cad to 7caedc7
This isn't related to this PR, but it's a shape inference issue I encountered previously. It was caused by using strict mode in shape inference. Would it be possible to not use strict mode and use the default mode instead?
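(For reference, the mode in question is the `strict_mode` flag of ONNX shape inference; a minimal sketch of the difference, not code from this PR:)

```python
from onnx import shape_inference

# Default (non-strict) mode: per-node inference failures are tolerated and the
# affected value_info entries are simply left unset.
model = shape_inference.infer_shapes(model, strict_mode=False)

# Strict mode: the same failures raise an exception instead of being skipped.
# model = shape_inference.infer_shapes(model, strict_mode=True)
```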
modelopt/onnx/autocast/convert.py
Outdated
```python
if use_standalone_type_inference:
    model = onnx_utils.infer_types(model)
else:
    model = onnx_utils.infer_shapes(model)
```
Could you create a util function for this as it is reused in multiple places?
Sure, done
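For readers following along, the resulting utility presumably looks roughly like this (a sketch reconstructed from the review comments; the exact signature and helper names may differ from the PR's implementation):

```python
from onnx import shape_inference

def infer_types(model, use_standalone_type_inference: bool = False, **kwargs):
    """Toggle between standalone type-only inference and ONNX shape inference."""
    if use_standalone_type_inference:
        return _infer_types_only(model)  # type-only pass introduced by this PR
    return shape_inference.infer_shapes(model, **kwargs)  # existing shape-inference path
```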
```python
if not self.use_standalone_type_inference:
    for idx, d in enumerate(inp.type.tensor_type.shape.dim):
        if d.dim_value:
            inp.type.tensor_type.shape.dim[idx].dim_param = "unk"
```
Similarly for this, can we create a util function?
Leaving it to the next refactor, see #717 (comment).
```python
if not inp_name:
    continue
```
Do we expect any inputs with empty names?
It's a node input, not a graph input, and this happens for optional inputs; e.g., a Resize node where the scales and roi input slots are empty. Added a comment to clarify.
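To illustrate the case being described (a standalone sketch, not the PR's code):

```python
from onnx import helper

# Resize with only 'sizes' provided: the optional 'roi' and 'scales' slots are
# encoded as empty strings in node.input.
node = helper.make_node("Resize", inputs=["X", "", "", "sizes"], outputs=["Y"], mode="nearest")

for inp_name in node.input:
    if not inp_name:  # optional input left empty -> nothing to look up
        continue
    print(inp_name)   # only "X" and "sizes" are processed
```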
```python
for attr in node.attribute:
    if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
        if attr.t.HasField("data_type"):
            const_type = attr.t.data_type
```
This pattern is used in multiple places in our codebase, should we create a utility function for it?
Same as #719 (comment)
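If such a utility is added in a later refactor, it could be as simple as the following sketch (the function name is hypothetical):

```python
import onnx

def get_constant_value_dtype(node: onnx.NodeProto) -> int | None:
    """Return the data_type of a node's TENSOR-typed 'value' attribute, if present."""
    for attr in node.attribute:
        if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
            if attr.t.HasField("data_type"):
                return attr.t.data_type
    return None
```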
modelopt/onnx/utils.py
Outdated
```python
tensor_types[init_name] = init.data_type

# Helper function to get tensor type
def get_tensor_type(tensor_name: str) -> int | None:
```
Can we re-use _get_tensor_type() in modelopt.onnx.utils.py or, if not, move this function to modelopt.onnx.utils.py as _get_tensor_type_from_tensor_name or a variation of that?
Refactored to reuse _get_tensor_type() to get the type from the value info, and renamed local function to get_tensor_type_from_name for clarity.
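Conceptually, the renamed helper does something like this (a sketch; the container names are assumptions about the surrounding code, not the PR's exact implementation):

```python
def get_tensor_type_from_name(tensor_name: str) -> int | None:
    """Resolve a tensor's element type by name, from initializers or value info."""
    if tensor_name in tensor_types:        # initializer types collected earlier
        return tensor_types[tensor_name]
    vi = value_info_map.get(tensor_name)   # graph inputs/outputs/value_info keyed by name
    return _get_tensor_type(vi) if vi is not None else None
```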
```python
            break
    assert const_type is not None
    output_types = [const_type]
elif node.op_type == "ConstantOfShape":
```
Would this PR fix the issue in bug 5763424?
@gcunhase Yes, it should fix that as well. I address ConstantOfShape as one of the special cases:
```python
elif node.op_type == "ConstantOfShape":
    # ConstantOfShape: output type is from the value attribute's tensor data_type
    # If no value attribute, defaults to FLOAT
    # Note: Schema allows multiple types, so we need to check the value attribute
    const_type = None
    for attr in node.attribute:
        if attr.name == "value" and attr.type == onnx.AttributeProto.TENSOR:
            if attr.t.HasField("data_type"):
                const_type = attr.t.data_type
                break
    assert const_type is not None
    output_types = [const_type]
```
Force-pushed from 7caedc7 to 1d2fdf6
Important: Review skipped. Auto incremental reviews are disabled on this repository; please check the settings in the CodeRabbit UI.

📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI (__main__.py)
    participant Convert as Convert Logic (convert.py)
    participant Utils as Type Inference (utils.py)
    participant Converter as PrecisionConverter
    CLI->>Convert: convert_to_mixed_precision(use_standalone_type_inference=True)
    Convert->>Utils: infer_types(model, use_standalone_type_inference=True)
    Utils->>Utils: _infer_types_only(model)
    Utils->>Utils: Iterate graphs & compute types<br/>(Cast, Quantize, Constant, etc.)
    Utils->>Utils: infer_types_verification(model)
    Utils-->>Convert: model with types set
    Convert->>Converter: PrecisionConverter(model, use_standalone_type_inference=True)
    Converter->>Converter: Conditional type/shape clearing<br/>based on flag
    Converter-->>Convert: Precision conversion complete
    Convert-->>CLI: Modified ONNX model
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning), ✅ 2 passed
Force-pushed from 1d2fdf6 to a35b63c
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In tests/unit/onnx/autocast/test_precisionconverter.py, around lines 577-585: the test's expected low_precision_nodes list uses the outdated node names "concat" and "concat_dims", which no longer match the model nodes "concat1" and "concat2". Update the test to reference the actual node names by replacing "concat" with "concat1" and "concat_dims" with "concat2" in the low_precision_nodes array (the list that currently reads ["transpose", "concat", "size", "div", "concat_dims", "reshape"]) so that the nodes "concat1" and "concat2" (created by helper.make_node) are correctly classified during the precision conversion assertions.
🧹 Nitpick comments (3)
modelopt/onnx/utils.py (1)
731-1091: Consider refactoring for maintainability. The `_infer_types_only` function is ~360 lines long with significant complexity. While functionally sound, consider extracting helper functions for better maintainability:
- `str_to_tensor_dtype` (lines 806-838) could be a module-level constant dictionary (see the sketch below)
- Special operator handlers (Cast, DequantizeLinear, QuantizeLinear, etc.) could be separate functions
- Schema-based inference logic (lines 1005-1063) could be extracted

This would improve testability and make the code easier to maintain as the feature matures beyond its experimental status.
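For illustration, the suggested module-level constant could look like this (a sketch; the name and the exact set of entries are assumptions, not the PR's code):

```python
import onnx

STR_TO_TENSOR_DTYPE = {
    "tensor(float)": onnx.TensorProto.FLOAT,
    "tensor(float16)": onnx.TensorProto.FLOAT16,
    "tensor(bfloat16)": onnx.TensorProto.BFLOAT16,
    "tensor(double)": onnx.TensorProto.DOUBLE,
    "tensor(int32)": onnx.TensorProto.INT32,
    "tensor(int64)": onnx.TensorProto.INT64,
    "tensor(bool)": onnx.TensorProto.BOOL,
}
```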
modelopt/onnx/autocast/precisionconverter.py (1)
260-267: Consider clarifying the redundant call pattern when using standalone type inference. When `use_standalone_type_inference=True`, the `_infer_types_only` function is called (which ignores the `strict_mode` and `check_type` kwargs). The second call at lines 265-267 effectively duplicates the first call, since `_ensure_types_are_defined()` at line 263 only modifies UNDEFINED types and doesn't require re-inference. This differs from the pattern in `_remove_redundant_casts` (lines 1191-1197), where the second call is conditionally skipped when using standalone type inference. The current code works correctly, but consider either:
- Adding a comment explaining why both calls are needed in standalone mode, or
- Aligning with the conditional pattern used in `_remove_redundant_casts`

tests/unit/onnx/autocast/test_precisionconverter.py (1)

1672-1674: Intentional exclusion from standalone type inference testing. This fixture and its test (`test_if_subgraph_outer_scope_type_preservation`) use `infer_shapes` directly and are not parametrized with `use_standalone_type_inference`. This appears intentional, since this specific test focuses on preserving outer scope variable types in If subgraphs, which may have behavior tied to shape inference. Consider adding a comment explaining why this test doesn't use the standalone type inference path, to prevent future maintainers from inadvertently "fixing" this.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- CHANGELOG.rst
- docs/source/guides/8_autocast.rst
- modelopt/onnx/autocast/__main__.py
- modelopt/onnx/autocast/convert.py
- modelopt/onnx/autocast/precisionconverter.py
- modelopt/onnx/utils.py
- tests/unit/onnx/autocast/test_precisionconverter.py
🧰 Additional context used
🧬 Code graph analysis (4)
modelopt/onnx/autocast/convert.py (1)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)

modelopt/onnx/utils.py (1)
- modelopt/onnx/autocast/utils.py (1): `walk_subgraphs_recursive` (126-158)

modelopt/onnx/autocast/precisionconverter.py (1)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)

tests/unit/onnx/autocast/test_precisionconverter.py (2)
- modelopt/onnx/autocast/utils.py (1): `setup_mappings` (33-60)
- modelopt/onnx/utils.py (1): `infer_types` (1162-1182)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (14)
CHANGELOG.rst (1)
21-21: LGTM! Clear changelog entry for experimental feature. The changelog entry appropriately describes the new standalone type inference option as experimental and clearly explains its purpose as a workaround for shape inference issues.
docs/source/guides/8_autocast.rst (1)
45-45: LGTM! Comprehensive documentation of the new feature. The documentation properly:
- Marks the feature as experimental
- Explains it's a workaround for shape inference failures
- Notes implementation details (graphsurgeon, special operators)
- Warns about potential limitations
- Provides clear usage examples for both Python API and CLI
Also applies to: 86-89, 152-159, 214-218
modelopt/onnx/autocast/__main__.py (1)
188-197: LGTM! Clean CLI integration. The CLI flag is properly integrated with appropriate help text that aligns with the documentation and marks the feature as experimental.
Also applies to: 231-231
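The CLI wiring is presumably along these lines (a sketch; only the flag name is taken from this PR, and the help text is paraphrased):

```python
parser.add_argument(
    "--use_standalone_type_inference",
    action="store_true",
    help="Experimental: run standalone type-only inference instead of ONNX shape inference.",
)
```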
modelopt/onnx/utils.py (3)
1005-1063: Schema-based inference is complex but well-structured. The fallback schema-based inference implements a multi-strategy approach:
- Explicit tensor types from schema
- Type constraints with single allowed types
- Placeholder matching to input types
- Default propagation from first input
The debug logging and fallback behavior are appropriate for an experimental feature. Be aware that this may not handle all operator type relationships correctly, but this aligns with the documented limitations of standalone type inference being "less robust than ONNX's implementation for edge cases."
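As a rough illustration of that multi-strategy fallback (a sketch only; the variable names and the `STR_TO_TENSOR_DTYPE` / `get_tensor_type_from_name` helpers sketched earlier are assumptions, not the PR's actual code):

```python
import onnx

schema = onnx.defs.get_schema(node.op_type)
out_type_str = schema.outputs[0].type_str  # e.g. "tensor(bool)" or a placeholder like "T"

if out_type_str in STR_TO_TENSOR_DTYPE:
    # Strategy 1: the schema pins the output to an explicit tensor type.
    output_type = STR_TO_TENSOR_DTYPE[out_type_str]
else:
    constraint = next(
        (tc for tc in schema.type_constraints if tc.type_param_str == out_type_str), None
    )
    if constraint is not None and len(constraint.allowed_type_strs) == 1:
        # Strategy 2: the type constraint admits exactly one type.
        output_type = STR_TO_TENSOR_DTYPE[constraint.allowed_type_strs[0]]
    else:
        # Strategies 3/4: match the placeholder to an input, or propagate the first input's type.
        output_type = get_tensor_type_from_name(node.input[0])
```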
1094-1143: LGTM! Solid verification logic. The `infer_types_verification` function properly:
- Computes reachable tensors from inputs/initializers
- Only validates reachable tensors (avoiding false positives from unreachable nodes)
- Provides clear error messages with tensor names
- Uses appropriate error handling
This defensive programming will help catch incomplete type inference early.
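In spirit, that verification amounts to something like the following sketch (illustrative only; the real implementation lives in modelopt/onnx/utils.py and will differ in detail):

```python
import onnx

def infer_types_verification(model: onnx.ModelProto) -> None:
    """Check that every tensor reachable from graph inputs/initializers has a type."""
    reachable = {i.name for i in model.graph.input} | {t.name for t in model.graph.initializer}
    typed = set(reachable)
    typed |= {vi.name for vi in model.graph.value_info if vi.type.tensor_type.elem_type != 0}
    typed |= {o.name for o in model.graph.output if o.type.tensor_type.elem_type != 0}
    for node in model.graph.node:  # ONNX graphs are topologically sorted
        if all((not name) or name in reachable for name in node.input):
            for out in node.output:
                reachable.add(out)
                if out and out not in typed:
                    raise RuntimeError(f"Type inference left reachable tensor '{out}' untyped")
```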
1162-1183: LGTM! Clean and well-documented public API. The `infer_types` function provides a clear toggle between standalone type inference and ONNX shape inference. The docstring is comprehensive, and passing `**kwargs` to `infer_shapes` maintains flexibility.

modelopt/onnx/autocast/convert.py (2)

64-64: LGTM! Consistent API integration in convert_to_mixed_precision. The new `use_standalone_type_inference` parameter is properly:
- Added to the function signature with appropriate default (False)
- Documented as a workaround for shape inference issues
- Passed to both `infer_types` and `PrecisionConverter`
- Marked as experimental in the docstring

Also applies to: 89-91, 139-139, 171-171

204-204: LGTM! Consistent API integration in convert_to_f16. The parameter integration mirrors `convert_to_mixed_precision` consistently, maintaining API coherence across both conversion functions.

Also applies to: 217-219, 237-237, 249-249
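From the Python API, usage would look roughly like this (the keyword argument comes from this PR; the positional model-path argument and any other defaults are assumptions about the existing convert API):

```python
from modelopt.onnx.autocast.convert import convert_to_mixed_precision

converted_model = convert_to_mixed_precision(
    "model.onnx",
    use_standalone_type_inference=True,  # experimental type-only inference path
)
```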
modelopt/onnx/autocast/precisionconverter.py (4)
100-145: LGTM! The new parameter `use_standalone_type_inference` is properly added to the constructor signature, documented in the docstring, and stored as an instance attribute.

287-345: LGTM! The conditional shape-clearing logic is well-implemented. When `use_standalone_type_inference=True`, only types are cleared (not shapes), which correctly aligns with the feature's intent to separate type inference from shape inference. The gating at lines 311-314, 326-329, and 340-343 properly preserves shape information in standalone mode.

1188-1197: LGTM! The conditional skipping of the second `infer_types` call is correct. When using standalone type inference, the `check_type=True` validation isn't supported, so skipping this call avoids passing unused parameters.

1282-1287: LGTM! The type inference integration in `_fix_network_output_names` correctly uses the new `infer_types` wrapper with the instance flag.

tests/unit/onnx/autocast/test_precisionconverter.py (2)

35-41: LGTM! The helper function centralizes the type/shape inference and mapping setup pattern, reducing duplication across test fixtures. Note that this intentionally has a different signature than `utils.setup_mappings` (it returns a 4-tuple including the model, versus a 3-tuple without it).

73-104: LGTM! Tests are properly parametrized with `use_standalone_type_inference`, and the parameter is correctly threaded through to `PrecisionConverter` initialization.
Force-pushed from a35b63c to 58fe117
CHANGELOG.rst
Outdated
- Add support for parallel draft heads in Eagle speculative decoding.
- Add support to enable custom emulated quantization backend. See :meth:`register_quant_backend <modelopt.torch.quantization.nn.modules.tensor_quantizer.register_quant_backend>` for more details. See an example in ``tests/unit/torch/quantization/test_custom_backend.py``.
- Add ``examples/llm_qad`` for QAD training with Megatron-LM.
- Add standalone type inference option (``--use_standalone_type_inference``) in ONNX AutoCast as an alternative to ONNX's ``infer_shapes``. This experimental feature performs type-only inference without shape inference, useful as a workaround when shape inference fails or to avoid unnecessary shape inference overhead.
Is this a bug fix for 0.41? If not, please move it to the 0.42 section.
@kevalmorabia97 it's a long-standing known issue. I don't know how to classify this exactly.
If it can be merged by EoD today, we can include it in 0.41; otherwise it will need to wait until 0.42.
nvm, will push to 0.42.0
## What does this PR do?

**Type of change:** Bug fix

**Overview:** This PR fixes an input type mismatch in Resize layers when being converted to FP16.

## Usage

```
$ python -m modelopt.onnx.autocast --onnx_path=$MODEL_NAME.onnx
```

## Testing

Added unittest.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

## Additional Information

This issue is also fixed by using the standalone type inference logic from #719.

## Summary by CodeRabbit

## Release Notes

* **Improvements**
  * Enhanced the graph sanitization process to automatically duplicate shared constants during optimization, ensuring improved model handling and consistency.
* **Tests**
  * Added test coverage for mixed precision conversion of Conv-Resize model architectures.

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
created by cursor and committed by mistake Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
…add to changelog Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Force-pushed from c5efc96 to 9656fca
What does this PR do?
Type of change: New feature
Overview:
AutoCast runs full type inference to get the new tensor types after adding casts. ONNX doesn't have a separate function for type inference; it is done as part of shape inference. Shape inference is a much more complex task than type inference, especially when dynamic shapes are involved. We're seeing some shape-inference-related bugs in AutoCast. Typically we can work around them, but it's cumbersome. A local type-inference implementation gives users a way to work around shape inference issues. This is opt-in and marked as experimental.
Usage
python -m modelopt.onnx.autocast --onnx_path /path/to/input.onnx [options] --use_standalone_type_inference
Testing
Added use_standalone_type_inference=True to all existing PrecisionConverter tests.
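(Roughly, the parametrization looks like the sketch below; the fixture, helper, and constructor argument names are illustrative rather than the exact test code.)

```python
import pytest

@pytest.mark.parametrize("use_standalone_type_inference", [False, True])
def test_simple_conversion(simple_model, use_standalone_type_inference):
    # Hypothetical helper mirroring the tests' setup: runs type/shape inference
    # and builds the value-info/initializer mappings for the converter.
    model, value_info_map, initializer_map, node_to_init_map = _setup_model(
        simple_model, use_standalone_type_inference
    )
    converter = PrecisionConverter(
        model,
        value_info_map,
        initializer_map,
        node_to_init_map,
        use_standalone_type_inference=use_standalone_type_inference,
    )
```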
Before your PR is "Ready for review"
Additional Information
A more permanent fix would be to decouple type and shape inference in ONNX; we should invest in that when we have the resources (see onnx/onnx#7100). This is a quick fix, which is also why it is opt-in and not the default mode.
Summary by CodeRabbit
New Features
- `--use_standalone_type_inference` flag to ONNX AutoCast, enabling type-only inference as an alternative to standard shape inference. Useful as a workaround when shape inference fails or to reduce computational overhead.

Documentation
Tests