Support multiple-batch input for autocast calibration. #760
Conversation
📝 Walkthrough

The changes introduce multi-batch calibration support to the autocast module. A new TensorStats data structure aggregates tensor statistics (absmax, min, max) across multiple calibration batches. The reference runner now supports directory-based multi-batch inputs and computes aggregated statistics. Node classification rules are enhanced to use these statistics for precision conversion decisions.
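To make the data structure concrete, here is an illustrative sketch of per-tensor aggregation, assuming only the field names this PR mentions (absmax, min_val, max_val, shape) and the size property discussed in the review below; it is a reconstruction, not the PR's actual code:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TensorStats:
    """Aggregated statistics for one tensor across all calibration batches."""

    absmax: float
    min_val: float
    max_val: float
    shape: tuple

    @property
    def size(self):
        """Return total number of elements."""
        return int(np.prod(self.shape))


def aggregate_tensor_stats(batches: list) -> TensorStats:
    """Fold the per-batch arrays for a single tensor into one TensorStats."""
    return TensorStats(
        absmax=max(float(np.abs(b).max()) for b in batches),
        min_val=min(float(b.min()) for b in batches),
        max_val=max(float(b.max()) for b in batches),
        shape=tuple(batches[0].shape),
    )
```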
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant ReferenceRunner
    participant TensorStats
    participant NodeClassifier
    CLI->>ReferenceRunner: run() with multiple calibration batches
    activate ReferenceRunner
    loop for each batch
        ReferenceRunner->>ReferenceRunner: load batch data (NPZ directory)
        ReferenceRunner->>ReferenceRunner: execute model, collect outputs
    end
    ReferenceRunner->>ReferenceRunner: _aggregate_tensor_stats(all_batches)
    ReferenceRunner->>TensorStats: create aggregated statistics<br/>(absmax, min, max per tensor)
    activate TensorStats
    TensorStats-->>ReferenceRunner: TensorStats objects
    deactivate TensorStats
    ReferenceRunner->>NodeClassifier: pass aggregated TensorStats
    deactivate ReferenceRunner
    activate NodeClassifier
    NodeClassifier->>NodeClassifier: IORangeRule._get_tensor_stats()
    NodeClassifier->>NodeClassifier: DepthOfReductionRule._get_tensor_shape()
    NodeClassifier->>NodeClassifier: evaluate precision conversion rules<br/>using aggregated statistics
    NodeClassifier-->>CLI: precision decisions
    deactivate NodeClassifier
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/referencerunner.py (1)
153-170: Directory paths are not handled, breaking multi-batch support.

The `_load_inputs` method only checks for `.json` or `.npz` file extensions. A directory path (e.g., `calibration_data_dir/`) will fall through to the `raise ValueError` branch, making the multi-batch directory feature non-functional despite being documented in the CLI help text.

🐛 Proposed fix
```diff
+import os
+
 if inputs is not None:
     if isinstance(inputs, str):
         if inputs.endswith(".json"):
             data_loader = self._load_inputs_from_json(inputs)
-        elif inputs.endswith(".npz"):
+        elif inputs.endswith(".npz") or os.path.isdir(inputs):
             data_loader = self._load_inputs_from_npz(inputs)
         else:
             raise ValueError(
-                f"Invalid input file: {inputs}. Supported input file types: .json (Polygraphy JSON format), "
-                ".npz (Numpy)"
+                f"Invalid input file: {inputs}. Supported input types: .json (Polygraphy JSON format), "
+                ".npz (Numpy), or a directory containing .npz files"
             )
```
🧹 Nitpick comments (3)
modelopt/onnx/autocast/referencerunner.py (2)
62-68: Consider using `math.prod` for the `size` property.

The manual loop works correctly, but `math.prod` (Python 3.8+) would be more concise and idiomatic.

♻️ Suggested refactor

```diff
+import math
+
 @property
 def size(self):
     """Return total number of elements."""
-    result = 1
-    for dim in self.shape:
-        result *= dim
-    return result
+    return math.prod(self.shape)
```
199-201: Silently skipping missing tensors may mask data inconsistencies.

If a tensor present in the first batch is missing from subsequent batches, the aggregated statistics will only reflect partial data without any warning. Consider logging a debug message when tensors are skipped.

♻️ Suggested enhancement

```diff
 for batch_data in all_batch_data:
     if name not in batch_data:
+        logger.debug(f"Tensor '{name}' not found in batch, skipping for aggregation")
         continue
```

modelopt/onnx/autocast/nodeclassifier.py (1)
282-288: Redundant isinstance check.

Both branches return `ref_data.shape`, and both numpy arrays and `TensorStats` objects have a `.shape` attribute. The conditional is unnecessary.

♻️ Suggested simplification

```diff
 if tensor_name in self.reference_data:
     ref_data = self.reference_data[tensor_name]
-    # Import here to avoid circular imports
-    from modelopt.onnx.autocast.referencerunner import TensorStats
-
-    if isinstance(ref_data, TensorStats):
-        return ref_data.shape
+    # Both numpy arrays and TensorStats have .shape attribute
     return ref_data.shape
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
modelopt/onnx/autocast/__main__.py
modelopt/onnx/autocast/nodeclassifier.py
modelopt/onnx/autocast/referencerunner.py
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/onnx/autocast/nodeclassifier.py (1)
modelopt/onnx/autocast/referencerunner.py (2)
TensorStats (43-68)
size (63-68)
🔇 Additional comments (7)
modelopt/onnx/autocast/__main__.py (1)
69-74: LGTM!

The updated help text clearly documents the three supported calibration data formats and explains the multi-batch aggregation behavior.
modelopt/onnx/autocast/referencerunner.py (3)
22-25: LGTM!

Module docstring appropriately updated to reflect multi-batch aggregation behavior.
104-127: LGTM!

The directory loading implementation is well-structured with proper error handling for empty directories and deterministic file ordering via sorting.
276-302: LGTM!

The multi-batch processing logic correctly combines inputs and outputs per batch and delegates to `_aggregate_tensor_stats` for aggregation. The fallback for exhausted data loaders handles the random input generation case appropriately.

modelopt/onnx/autocast/nodeclassifier.py (3)
152-172: LGTM!

The updated docstring clearly documents support for both single-batch and multi-batch reference data formats, and the new `output_stats` attribute enables proper logging for TensorStats.
174-197: LGTM!

Clean abstraction that properly handles both TensorStats and numpy arrays. The local import correctly avoids circular dependencies, and edge cases for empty arrays are handled appropriately.
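For context, a hedged sketch of the kind of dual-format handling this comment refers to, simplified to a free function; the empty-array behavior and exact signature are assumptions, not the PR's actual code:

```python
import numpy as np

from modelopt.onnx.autocast.referencerunner import TensorStats  # the real code imports this locally to avoid circular imports


def _get_tensor_stats(ref_data):
    """Return (absmax, min_val, max_val) for TensorStats or a raw numpy array."""
    if isinstance(ref_data, TensorStats):
        # Multi-batch path: statistics were pre-aggregated across calibration batches.
        return ref_data.absmax, ref_data.min_val, ref_data.max_val
    if ref_data.size == 0:
        # Assumed edge case: an empty array contributes no range information.
        return 0.0, 0.0, 0.0
    # Single-batch path: compute statistics directly from the raw array.
    return float(np.abs(ref_data).max()), float(ref_data.min()), float(ref_data.max())
```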
210-228: LGTM!

The refactored `is_io_out_of_range` function properly uses the abstracted `_get_tensor_stats` method, providing consistent handling for both single-batch and multi-batch data while maintaining clear debug logging.
Signed-off-by: Tony Yin <toyin@nvidia.com>
Force-pushed from 583c9f5 to 6d665cf
@gcunhase can you please review with the context of https://nvbugspro.nvidia.com/bug/5676209 ?
galagam left a comment
I want to make sure I understand the need here:
If, for example, the model accepts (N, 3, 256, 256) where N is the batch size, we can pass calibration_data of size (N, 3, 256, 256) with N>1, and the reference data statistics will take all N examples into account.
However, if the model accepts (1, 3, 256, 256) - that is, the batch dim is static - and we want to pass N examples for calibration, the current code doesn't handle it well.
If the input is provided as Polygraphy JSON, only the first example (index 0) is used and the rest are ignored. If the input is provided in NPZ format, it will fail due to a shape mismatch.
@byte-deve Please confirm or correct.
@galagam I think you are right. Assuming the model takes a static-shape input of (N, 3, 256, 256), we can pass 2 or more inputs of (N, 3, 256, 256). The original naming "frame" is possibly better, to avoid confusion with the model batch. As for the difference between the Polygraphy JSON and NPZ formats, shall I add a test to clarify? Thanks!
If N is a dynamic dimension - you don't need this, right? Because you can pass (N*K, 3, 256, 256).
@byte-deve Yes, please add a test |
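To make the static-batch scenario from this discussion concrete, here is a sketch of preparing a multi-example calibration directory; the directory, file, and input names are hypothetical and must match the actual model's input names:

```python
import os

import numpy as np

os.makedirs("calib_data", exist_ok=True)
for i in range(4):
    # Each file holds one example with the model's static input shape (1, 3, 256, 256).
    example = np.random.rand(1, 3, 256, 256).astype(np.float32)
    # The NPZ key must match the model's input name; "input" is a placeholder.
    np.savez(os.path.join("calib_data", f"example_{i}.npz"), input=example)
```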
What does this PR do?
Add multi-batch calibration data support for autocast precision conversion. This enhancement allows users to provide multiple batches of calibration data (via a directory of NPZ files or Polygraphy JSON with multiple batches) to aggregate tensor statistics across batches, resulting in more robust precision conversion decisions.
Usage
Single NPZ file (existing behavior)
Directory containing multiple NPZ files for multi-batch calibration (new)
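A hedged sketch of both invocations; the module path and the --calibration_data flag come from this PR, while the other flag names and file paths are assumptions:

```bash
# Single NPZ file (existing behavior)
python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calib.npz

# Directory of NPZ files, one batch per file (new); statistics are aggregated across files
python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calib_data/
```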
Testing
Before your PR is "Ready for review"
Additional Information
Key changes:
- Added a `TensorStats` dataclass to store aggregated tensor statistics (absmax, min_val, max_val, shape)
- Updated `ReferenceRunner` to:
  - load multi-batch inputs from a directory of NPZ files (`_load_inputs_from_npz`)
  - aggregate per-tensor statistics across batches (`_aggregate_tensor_stats`)
  - return the aggregated statistics from the `run()` method
- Updated `IORangeRule` and `DepthOfReductionRule` to handle both raw numpy arrays and `TensorStats` objects
- Updated the `--calibration_data` CLI help text to document multi-batch support

Summary by CodeRabbit
New Features
- Calibration data can now be supplied as multiple batches (a directory of NPZ files), with tensor statistics aggregated across batches for precision conversion decisions.

Documentation
- The --calibration_data CLI help text now documents the supported formats and multi-batch aggregation behavior.