Skip to content

feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457

Open
titaiwangms wants to merge 6 commits intofeat/mobius-builder-components-filterfrom
feat/rtn-quantization-components-to-skip
Open

feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457
titaiwangms wants to merge 6 commits intofeat/mobius-builder-components-filterfrom
feat/rtn-quantization-components-to-skip

Conversation

@titaiwangms
Copy link
Copy Markdown
Contributor

@titaiwangms titaiwangms commented May 8, 2026

Summary

Add optional components_to_skip parameter (list of component names) to the OnnxBlockWiseRtnQuantization pass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. Default None quantizes all components (fully backward compatible).

Depends on #2456 (MobiusBuilder components_to_export). This PR's diff is scoped to RTN changes only — MobiusBuilder changes are in the base branch.

Motivation

When quantizing large multi-component VLMs (e.g. Mistral-3 with decoder + vision_encoder + embedding), users may want to skip RTN quantization for specific components such as the embedding model — which may need to stay in higher precision (e.g. due to a GatherBlockQuantized bug with certain inputs).

Changes

  • olive/passes/onnx/rtn_quantization.py:

    • from typing import Optional (removed unused List)
    • Add components_to_skip: list PassConfigParam (default None)
    • Override run() to intercept CompositeModelHandler processing: copy skipped components via shutil.copytree (handles file vs directory model_path), quantize the rest via recursive self.run()
    • onnx_file_name fallback: use getattr(component_model, "onnx_file_name", None) or "model.onnx" to handle handlers constructed without an explicit file name
    • Mirror base class _initialize() guard and post-processing (model_attributes propagation + _carry_forward_additional_files per component)
    • Non-composite (single) models are unaffected
  • test/passes/onnx/test_rtn_quantization.py:

    • New TestRTNQuantizationComponentsToSkip class with 5 tests
    • Skipped component passes through unchanged (MatMul present, no MatMulNBits)
    • Non-skipped components are quantized
    • components_to_skip=None quantizes all (backward compat)
    • Single-model (non-composite) unaffected
    • Unknown component name logs a warning (does not raise)
    • Removed redundant AcceleratorSpec reimport; wrapped long lines (120-char limit); split compound assertions

Testing

All 16 tests pass:

python3 -m pytest test/passes/onnx/test_rtn_quantization.py -v

Copilot AI review requested due to automatic review settings May 8, 2026 22:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends Olive’s ONNX tooling to better support multi-component (composite) models by allowing selective processing of components during export and RTN quantization.

Changes:

  • Add components_to_skip to OnnxBlockWiseRtnQuantization to copy selected composite components through unmodified while quantizing the rest.
  • Add tests validating components_to_skip behavior on composite and non-composite models, including a warning on unknown component names.
  • Add components_to_export to MobiusBuilder to export only a subset of components from multi-component mobius packages, with validation + new unit tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
olive/passes/onnx/rtn_quantization.py Adds components_to_skip config and custom composite-handling run() logic to skip quantization for selected components.
test/passes/onnx/test_rtn_quantization.py Adds unit tests covering components_to_skip behavior and logging.
olive/passes/onnx/mobius_model_builder.py Adds components_to_export config and filters which mobius package components are saved/returned.
test/passes/onnx/test_mobius_model_builder.py Adds unit tests for components_to_export filtering/validation and updates the fake package save behavior.

Comment thread olive/passes/onnx/rtn_quantization.py Outdated
Comment on lines +140 to +142
output_component = ONNXModelHandler(
model_path=str(component_output_path),
onnx_file_name=component_model.onnx_file_name,
Comment on lines 207 to 211
# ModelPackage.save() handles both single and multi-component layouts:
# single component → <output_dir>/model.onnx
# multi-component → <output_dir>/<name>/model.onnx for each key
pkg.save(str(output_dir))
pkg.save(str(output_dir), components=components_filter)

Comment on lines 107 to +122
"configs alongside the ONNX models. 'none' to skip."
),
),
"components_to_export": PassConfigParam(
type_=list[str],
required=False,
default_value=None,
description=(
"Optional list of component names to export from a multi-component model "
"(e.g. ['vision', 'embedding'] to skip the decoder). "
"When set, only the named components are saved and returned; "
"all others are discarded after the mobius build step. "
"When not set (None), all components are exported (default, backward compatible). "
"Raises ValueError if any specified name is not found in the model's components."
),
),
titaiwangms and others added 6 commits May 8, 2026 23:12
Add optional `components_to_skip` parameter (list of component names) to
OnnxBlockWiseRtnQuantization pass. When a composite model component's name
appears in this list, its model files are copied to the output path unchanged
instead of being quantized. Non-listed components are quantized as usual.
When not set (default None), all components are quantized (backward compatible).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused List from typing import (only Optional is needed)
- Remove redundant required=False (implied by default_value=None)
  to match conventions of neighboring PassConfigParam declarations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rride

The base Pass.run() gates on self._initialized before calling _initialize().
When components_to_skip is set and the model is a CompositeModelHandler,
the override bypasses super().run() entirely, so the init guard must be
replicated explicitly. This ensures consistent lifecycle semantics if a
subclass ever overrides _initialize().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Call _carry_forward_additional_files per-skipped component to mirror
  what base Pass.run() does for each individual component
- Remove redundant AcceleratorSpec reimport in test _make_pass
- Wrap long lines in test helpers to stay under 120 chars (Black)
- Split compound assertion into two separate assertions for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use ONNXModelHandler directly (remove OnnxHandler alias imported from
  internal module path)
- Use list[str] type annotation for components_to_skip (matches
  mobius_model_builder.py convention from feat/mobius-builder-components-filter)
- Add warning when components_to_skip names a component not found in the
  composite model (misspellings were silently ignored before)
- Add comment explaining the _initialized guard in run() override: it mirrors
  Pass.run() behavior and is required because we bypass super().run() for
  composite models
- Remove dead code: model_attributes was set in the constructor and then
  redundantly reassigned on the next line for skipped components

Also adds test_components_to_skip_unknown_name_warns to cover the warning case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…WiseRtnQuantization

When copying a skipped component, onnx_file_name may be None if the
ONNXModelHandler was constructed without an explicit file name. Fall back to
'model.onnx' which is the standard Olive convention to avoid passing None to
the output ONNXModelHandler constructor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms force-pushed the feat/rtn-quantization-components-to-skip branch from 49d8455 to 4da7446 Compare May 8, 2026 23:12
@titaiwangms titaiwangms changed the base branch from main to feat/mobius-builder-components-filter May 8, 2026 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants