Support multiple-batch input for autocast calibration. #760
Conversation
📝 Walkthrough

The changes introduce multi-batch calibration support to the autocast module. A new TensorStats data structure aggregates tensor statistics (absmax, min, max) across multiple calibration batches. The reference runner now supports directory-based multi-batch inputs and computes aggregated statistics. Node classification rules are enhanced to use these statistics for precision conversion decisions.
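To make the data structure concrete, here is an illustrative sketch of per-tensor aggregation, assuming only the field names this PR mentions (absmax, min_val, max_val, shape) and the size property discussed in the review below; it is a reconstruction, not the PR's actual code:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TensorStats:
    """Aggregated statistics for one tensor across all calibration batches."""

    absmax: float
    min_val: float
    max_val: float
    shape: tuple

    @property
    def size(self):
        """Return total number of elements."""
        return int(np.prod(self.shape))


def aggregate_tensor_stats(batches: list) -> TensorStats:
    """Fold the per-batch arrays for a single tensor into one TensorStats."""
    return TensorStats(
        absmax=max(float(np.abs(b).max()) for b in batches),
        min_val=min(float(b.min()) for b in batches),
        max_val=max(float(b.max()) for b in batches),
        shape=tuple(batches[0].shape),
    )
```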
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant ReferenceRunner
    participant TensorStats
    participant NodeClassifier
    CLI->>ReferenceRunner: run() with multiple calibration batches
    activate ReferenceRunner
    loop for each batch
        ReferenceRunner->>ReferenceRunner: load batch data (NPZ directory)
        ReferenceRunner->>ReferenceRunner: execute model, collect outputs
    end
    ReferenceRunner->>ReferenceRunner: _aggregate_tensor_stats(all_batches)
    ReferenceRunner->>TensorStats: create aggregated statistics<br/>(absmax, min, max per tensor)
    activate TensorStats
    TensorStats-->>ReferenceRunner: TensorStats objects
    deactivate TensorStats
    ReferenceRunner->>NodeClassifier: pass aggregated TensorStats
    deactivate ReferenceRunner
    activate NodeClassifier
    NodeClassifier->>NodeClassifier: IORangeRule._get_tensor_stats()
    NodeClassifier->>NodeClassifier: DepthOfReductionRule._get_tensor_shape()
    NodeClassifier->>NodeClassifier: evaluate precision conversion rules<br/>using aggregated statistics
    NodeClassifier-->>CLI: precision decisions
    deactivate NodeClassifier
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/referencerunner.py (1)
153-170: Directory paths are not handled, breaking multi-batch support.

The `_load_inputs` method only checks for `.json` or `.npz` file extensions. A directory path (e.g., `calibration_data_dir/`) will fall through to the `raise ValueError` branch, making the multi-batch directory feature non-functional despite being documented in the CLI help text.

🐛 Proposed fix
```diff
+import os
+
 if inputs is not None:
     if isinstance(inputs, str):
         if inputs.endswith(".json"):
             data_loader = self._load_inputs_from_json(inputs)
-        elif inputs.endswith(".npz"):
+        elif inputs.endswith(".npz") or os.path.isdir(inputs):
             data_loader = self._load_inputs_from_npz(inputs)
         else:
             raise ValueError(
-                f"Invalid input file: {inputs}. Supported input file types: .json (Polygraphy JSON format), "
-                ".npz (Numpy)"
+                f"Invalid input file: {inputs}. Supported input types: .json (Polygraphy JSON format), "
+                ".npz (Numpy), or a directory containing .npz files"
             )
```
🧹 Nitpick comments (3)
modelopt/onnx/autocast/referencerunner.py (2)
62-68: Consider using `math.prod` for the `size` property.

The manual loop works correctly, but `math.prod` (Python 3.8+) would be more concise and idiomatic.

♻️ Suggested refactor

```diff
+import math
+
 @property
 def size(self):
     """Return total number of elements."""
-    result = 1
-    for dim in self.shape:
-        result *= dim
-    return result
+    return math.prod(self.shape)
```
199-201: Silently skipping missing tensors may mask data inconsistencies.

If a tensor present in the first batch is missing from subsequent batches, the aggregated statistics will only reflect partial data without any warning. Consider logging a debug message when tensors are skipped.

♻️ Suggested enhancement

```diff
 for batch_data in all_batch_data:
     if name not in batch_data:
+        logger.debug(f"Tensor '{name}' not found in batch, skipping for aggregation")
         continue
```

modelopt/onnx/autocast/nodeclassifier.py (1)
282-288: Redundant isinstance check.

Both branches return `ref_data.shape`, and both numpy arrays and `TensorStats` objects have a `.shape` attribute. The conditional is unnecessary.

♻️ Suggested simplification

```diff
 if tensor_name in self.reference_data:
     ref_data = self.reference_data[tensor_name]
-    # Import here to avoid circular imports
-    from modelopt.onnx.autocast.referencerunner import TensorStats
-
-    if isinstance(ref_data, TensorStats):
-        return ref_data.shape
+    # Both numpy arrays and TensorStats have .shape attribute
     return ref_data.shape
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
modelopt/onnx/autocast/__main__.py
modelopt/onnx/autocast/nodeclassifier.py
modelopt/onnx/autocast/referencerunner.py
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/onnx/autocast/nodeclassifier.py (1)
modelopt/onnx/autocast/referencerunner.py (2)
TensorStats (43-68)
size (63-68)
🔇 Additional comments (7)
modelopt/onnx/autocast/__main__.py (1)
69-74: LGTM!

The updated help text clearly documents the three supported calibration data formats and explains the multi-batch aggregation behavior.
modelopt/onnx/autocast/referencerunner.py (3)
22-25: LGTM!

Module docstring appropriately updated to reflect multi-batch aggregation behavior.
104-127: LGTM!

The directory loading implementation is well-structured with proper error handling for empty directories and deterministic file ordering via sorting.
276-302: LGTM!

The multi-batch processing logic correctly combines inputs and outputs per batch and delegates to `_aggregate_tensor_stats` for aggregation. The fallback for exhausted data loaders handles the random input generation case appropriately.

modelopt/onnx/autocast/nodeclassifier.py (3)
152-172: LGTM!

The updated docstring clearly documents support for both single-batch and multi-batch reference data formats, and the new `output_stats` attribute enables proper logging for TensorStats.
174-197: LGTM!

Clean abstraction that properly handles both TensorStats and numpy arrays. The local import correctly avoids circular dependencies, and edge cases for empty arrays are handled appropriately.
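For context, a hedged sketch of the kind of dual-format handling this comment refers to, simplified to a free function; the empty-array behavior and exact signature are assumptions, not the PR's actual code:

```python
import numpy as np

from modelopt.onnx.autocast.referencerunner import TensorStats  # the real code imports this locally to avoid circular imports


def _get_tensor_stats(ref_data):
    """Return (absmax, min_val, max_val) for TensorStats or a raw numpy array."""
    if isinstance(ref_data, TensorStats):
        # Multi-batch path: statistics were pre-aggregated across calibration batches.
        return ref_data.absmax, ref_data.min_val, ref_data.max_val
    if ref_data.size == 0:
        # Assumed edge case: an empty array contributes no range information.
        return 0.0, 0.0, 0.0
    # Single-batch path: compute statistics directly from the raw array.
    return float(np.abs(ref_data).max()), float(ref_data.min()), float(ref_data.max())
```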
210-228: LGTM!

The refactored `is_io_out_of_range` function properly uses the abstracted `_get_tensor_stats` method, providing consistent handling for both single-batch and multi-batch data while maintaining clear debug logging.
Signed-off-by: Tony Yin <toyin@nvidia.com>
Force-pushed from 583c9f5 to 6d665cf
@gcunhase can you please review with the context of https://nvbugspro.nvidia.com/bug/5676209 ?
galagam left a comment
I want to make sure I understand the need here:
If, for example, the model accepts (N, 3, 256, 256) where N is the batch size, we can pass calibration_data of size (N, 3, 256, 256) with N>1, and the reference data statistics will take all N examples into account.
However, if the model accepts (1, 3, 256, 256) - that is, the batch dim is static - and we want to pass N examples for calibration, the current code doesn't handle it well.
If the input is provided as Polygraphy JSON, only the first example (index 0) is used and the rest are ignored. If the input is provided in NPZ format, it will fail due to a shape mismatch.
@byte-deve Please confirm or correct.
@galagam I think you are right. Assuming the model takes a static-shape input of (N, 3, 256, 256), we can pass 2 or more inputs of (N, 3, 256, 256). The original naming "frame" is possibly better, to avoid confusion with the model batch. As for the difference between the Polygraphy JSON and NPZ formats, shall I add a test to clarify? Thanks!
If N is a dynamic dimension - you don't need this, right? Because you can pass (N*K, 3, 256, 256).
@byte-deve Yes, please add a test |
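To make the static-batch scenario from this discussion concrete, here is a sketch of preparing a multi-example calibration directory; the directory, file, and input names are hypothetical and must match the actual model's input names:

```python
import os

import numpy as np

os.makedirs("calib_data", exist_ok=True)
for i in range(4):
    # Each file holds one example with the model's static input shape (1, 3, 256, 256).
    example = np.random.rand(1, 3, 256, 256).astype(np.float32)
    # The NPZ key must match the model's input name; "input" is a placeholder.
    np.savez(os.path.join("calib_data", f"example_{i}.npz"), input=example)
```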
What does this PR do?
Add multi-batch calibration data support for autocast precision conversion. This enhancement allows users to provide multiple batches of calibration data (via a directory of NPZ files or Polygraphy JSON with multiple batches) to aggregate tensor statistics across batches, resulting in more robust precision conversion decisions.
Usage
Single NPZ file (existing behavior)
Directory containing multiple NPZ files for multi-batch calibration (new)
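A hedged sketch of both invocations; the module path and the --calibration_data flag come from this PR, while the other flag names and file paths are assumptions:

```bash
# Single NPZ file (existing behavior)
python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calib.npz

# Directory of NPZ files, one batch per file (new); statistics are aggregated across files
python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calib_data/
```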
Testing
Before your PR is "Ready for review"
Additional Information
Key changes:
- Added a `TensorStats` dataclass to store aggregated tensor statistics (absmax, min_val, max_val, shape)
- Updated `ReferenceRunner` to:
  - load multi-batch inputs from a directory of NPZ files (`_load_inputs_from_npz`)
  - aggregate per-tensor statistics across batches (`_aggregate_tensor_stats`)
  - return the aggregated statistics from the `run()` method
- Updated `IORangeRule` and `DepthOfReductionRule` to handle both raw numpy arrays and `TensorStats` objects
- Updated the `--calibration_data` CLI help text to document multi-batch support

Summary by CodeRabbit
New Features
- Calibration data can now be supplied as multiple batches (a directory of NPZ files), with tensor statistics aggregated across batches for precision conversion decisions.

Documentation
- The --calibration_data CLI help text now documents the supported formats and multi-batch aggregation behavior.