feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457
Open
titaiwangms wants to merge 6 commits intofeat/mobius-builder-components-filterfrom
Open
feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457titaiwangms wants to merge 6 commits intofeat/mobius-builder-components-filterfrom
titaiwangms wants to merge 6 commits intofeat/mobius-builder-components-filterfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR extends Olive’s ONNX tooling to better support multi-component (composite) models by allowing selective processing of components during export and RTN quantization.
Changes:
- Add
components_to_skiptoOnnxBlockWiseRtnQuantizationto copy selected composite components through unmodified while quantizing the rest. - Add tests validating
components_to_skipbehavior on composite and non-composite models, including a warning on unknown component names. - Add
components_to_exporttoMobiusBuilderto export only a subset of components from multi-component mobius packages, with validation + new unit tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
olive/passes/onnx/rtn_quantization.py |
Adds components_to_skip config and custom composite-handling run() logic to skip quantization for selected components. |
test/passes/onnx/test_rtn_quantization.py |
Adds unit tests covering components_to_skip behavior and logging. |
olive/passes/onnx/mobius_model_builder.py |
Adds components_to_export config and filters which mobius package components are saved/returned. |
test/passes/onnx/test_mobius_model_builder.py |
Adds unit tests for components_to_export filtering/validation and updates the fake package save behavior. |
Comment on lines
+140
to
+142
| output_component = ONNXModelHandler( | ||
| model_path=str(component_output_path), | ||
| onnx_file_name=component_model.onnx_file_name, |
Comment on lines
207
to
211
| # ModelPackage.save() handles both single and multi-component layouts: | ||
| # single component → <output_dir>/model.onnx | ||
| # multi-component → <output_dir>/<name>/model.onnx for each key | ||
| pkg.save(str(output_dir)) | ||
| pkg.save(str(output_dir), components=components_filter) | ||
|
|
Comment on lines
107
to
+122
| "configs alongside the ONNX models. 'none' to skip." | ||
| ), | ||
| ), | ||
| "components_to_export": PassConfigParam( | ||
| type_=list[str], | ||
| required=False, | ||
| default_value=None, | ||
| description=( | ||
| "Optional list of component names to export from a multi-component model " | ||
| "(e.g. ['vision', 'embedding'] to skip the decoder). " | ||
| "When set, only the named components are saved and returned; " | ||
| "all others are discarded after the mobius build step. " | ||
| "When not set (None), all components are exported (default, backward compatible). " | ||
| "Raises ValueError if any specified name is not found in the model's components." | ||
| ), | ||
| ), |
Add optional `components_to_skip` parameter (list of component names) to OnnxBlockWiseRtnQuantization pass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. When not set (default None), all components are quantized (backward compatible). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused List from typing import (only Optional is needed) - Remove redundant required=False (implied by default_value=None) to match conventions of neighboring PassConfigParam declarations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rride The base Pass.run() gates on self._initialized before calling _initialize(). When components_to_skip is set and the model is a CompositeModelHandler, the override bypasses super().run() entirely, so the init guard must be replicated explicitly. This ensures consistent lifecycle semantics if a subclass ever overrides _initialize(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Call _carry_forward_additional_files per-skipped component to mirror what base Pass.run() does for each individual component - Remove redundant AcceleratorSpec reimport in test _make_pass - Wrap long lines in test helpers to stay under 120 chars (Black) - Split compound assertion into two separate assertions for clarity Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use ONNXModelHandler directly (remove OnnxHandler alias imported from internal module path) - Use list[str] type annotation for components_to_skip (matches mobius_model_builder.py convention from feat/mobius-builder-components-filter) - Add warning when components_to_skip names a component not found in the composite model (misspellings were silently ignored before) - Add comment explaining the _initialized guard in run() override: it mirrors Pass.run() behavior and is required because we bypass super().run() for composite models - Remove dead code: model_attributes was set in the constructor and then redundantly reassigned on the next line for skipped components Also adds test_components_to_skip_unknown_name_warns to cover the warning case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…WiseRtnQuantization When copying a skipped component, onnx_file_name may be None if the ONNXModelHandler was constructed without an explicit file name. Fall back to 'model.onnx' which is the standard Olive convention to avoid passing None to the output ONNXModelHandler constructor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
49d8455 to
4da7446
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add optional
components_to_skipparameter (list of component names) to theOnnxBlockWiseRtnQuantizationpass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. DefaultNonequantizes all components (fully backward compatible).Depends on #2456 (MobiusBuilder
components_to_export). This PR's diff is scoped to RTN changes only — MobiusBuilder changes are in the base branch.Motivation
When quantizing large multi-component VLMs (e.g. Mistral-3 with decoder + vision_encoder + embedding), users may want to skip RTN quantization for specific components such as the embedding model — which may need to stay in higher precision (e.g. due to a
GatherBlockQuantizedbug with certain inputs).Changes
olive/passes/onnx/rtn_quantization.py:from typing import Optional(removed unusedList)components_to_skip: listPassConfigParam(defaultNone)run()to interceptCompositeModelHandlerprocessing: copy skipped components viashutil.copytree(handles file vs directorymodel_path), quantize the rest via recursiveself.run()onnx_file_namefallback: usegetattr(component_model, "onnx_file_name", None) or "model.onnx"to handle handlers constructed without an explicit file name_initialize()guard and post-processing (model_attributespropagation +_carry_forward_additional_filesper component)test/passes/onnx/test_rtn_quantization.py:TestRTNQuantizationComponentsToSkipclass with 5 testscomponents_to_skip=Nonequantizes all (backward compat)AcceleratorSpecreimport; wrapped long lines (120-char limit); split compound assertionsTesting
All 16 tests pass: