feat(quant): Quantizer class with BaseQuantPass pipeline (#964) by DingmaomaoBJTU · Pull Request #985 · microsoft/winml-cli

DingmaomaoBJTU · 2026-06-26T05:42:07Z

Summary

Refactors the quantizer from a flat dispatch function into an extensible pass-based pipeline, as tracked in #964.

Changes

New: `passes/` sub-package

| File | Class | What it does |
| --- | --- | --- |
| `passes/base.py` | `BaseQuantPass` | ABC: `__init__(config)` + abstract `run(model_path, output_path) -> QuantizeResult` |
| `passes/fp16.py` | `FP16Pass` | Reads `fp16_keep_io_types`, `fp16_op_block_list` from config |
| `passes/rtn.py` | `RTNPass` | Reads `rtn_bits`, `rtn_block_size`, `rtn_symmetric`, `rtn_accuracy_level` from config |
| `passes/static.py` | `StaticPass` | Reads all QDQ/calibration fields from config |

All passes accept a single WinMLQuantizationConfig — each reads only its relevant fields.
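A minimal sketch of the pass contract described in the table, for readers who want to see its shape. The `QuantizeResult` and `WinMLQuantizationConfig` classes below are hypothetical stand-ins with assumed fields and defaults, and the `FP16Pass` body only copies the file instead of converting it; the actual implementations in `passes/` may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class QuantizeResult:
    """Hypothetical stand-in for the real result type."""
    success: bool
    output_path: Optional[Path] = None
    stats: dict = field(default_factory=dict)


@dataclass
class WinMLQuantizationConfig:
    """Hypothetical subset of the real config; defaults are assumptions."""
    mode: str = "static"
    fp16_keep_io_types: bool = True
    fp16_op_block_list: Optional[list] = None
    rtn_bits: int = 4
    rtn_block_size: int = 128


class BaseQuantPass(ABC):
    """One quantization step: reads a model file, writes a model file."""

    def __init__(self, config: WinMLQuantizationConfig) -> None:
        self.config = config

    @abstractmethod
    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        """Quantize model_path and write the result to output_path."""


class FP16Pass(BaseQuantPass):
    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        # The pass receives the whole config but reads only its fp16_* fields.
        keep_io = self.config.fp16_keep_io_types
        block_list = self.config.fp16_op_block_list or []
        # Placeholder for the real conversion: copy the bytes unchanged.
        output_path.write_bytes(model_path.read_bytes())
        return QuantizeResult(
            success=True,
            output_path=output_path,
            stats={"keep_io_types": keep_io, "op_block_list": block_list},
        )
```

Because `run()` is abstract, `BaseQuantPass` itself cannot be instantiated; each concrete pass only has to implement the file-to-file step.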

Refactored: `quantizer.py`

- `Quantizer(passes)`: chains passes sequentially. A single pass takes the direct path; multiple passes route intermediates through a `TemporaryDirectory`. `QuantizeResult` stats are merged across passes.
- `expand_precision(mode, config)`: maps mode strings to pass lists; `mode` is optional and falls back to `config.mode`:
  - `"fp16"` → `[FP16Pass(config)]`
  - `"rtn"` → `[RTNPass(config)]`
  - `"static"` / `"dynamic"` → `[StaticPass(config)]`
- `quantize_onnx()`: kept as the backward-compatible entry point; now delegates to `Quantizer`.
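The chaining and mode mapping above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `quantizer.py`: the stub passes append their class name to a text file so the order is visible, `Config` and the `QuantizeResult` fields are hypothetical, and raising `ValueError` for an unknown mode or substituting a default config for `None` are guesses at behaviour the description does not spell out.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class QuantizeResult:
    success: bool
    warnings: list = field(default_factory=list)
    error: Optional[str] = None


@dataclass
class Config:
    mode: str = "static"


class _StubPass:
    """Hypothetical pass: appends its own name so chaining is observable."""

    def __init__(self, config: Config) -> None:
        self.config = config

    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        tag = type(self).__name__
        output_path.write_text(model_path.read_text() + "|" + tag)
        return QuantizeResult(success=True, warnings=[tag + " ran"])


class FP16Pass(_StubPass): pass
class RTNPass(_StubPass): pass
class StaticPass(_StubPass): pass


# Classes, not instances: only the selected pass gets constructed.
_PASS_TYPES = {"fp16": FP16Pass, "rtn": RTNPass,
               "static": StaticPass, "dynamic": StaticPass}


def expand_precision(mode: Optional[str] = None, config: Optional[Config] = None):
    config = config or Config()          # assumption: None means defaults
    effective = mode if mode is not None else config.mode
    if effective not in _PASS_TYPES:
        raise ValueError("unknown mode: %r" % effective)  # assumed error type
    return [_PASS_TYPES[effective](config)]


class Quantizer:
    def __init__(self, passes) -> None:
        if not passes:
            raise ValueError("at least one pass is required")
        self.passes = list(passes)

    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        if len(self.passes) == 1:        # direct path, no temp directory
            return self.passes[0].run(model_path, output_path)
        merged = QuantizeResult(success=True)
        with tempfile.TemporaryDirectory() as tmp:
            current = model_path
            for i, step in enumerate(self.passes):
                is_last = i == len(self.passes) - 1
                target = output_path if is_last else Path(tmp) / ("pass_%d.onnx" % i)
                result = step.run(current, target)
                merged.warnings.extend(result.warnings)
                if not result.success:   # abort the pipeline on first failure
                    merged.success = False
                    merged.error = result.error
                    break
                current = target
        return merged
```

A two-pass pipeline is then built by concatenating the lists, for example `Quantizer(expand_precision("rtn", cfg) + expand_precision("fp16", cfg))`.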

Updated: `commands/quantize.py`

- `--precision` now accepts multiple values to compose a pass pipeline (e.g. `--precision int4 --precision fp16` runs RTN then FP16).
- Default output name for multi-pass: `{stem}_int4_fp16.onnx`
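The multi-pass naming rule can be illustrated with a small helper. The function name `default_output_path` is hypothetical, and placing the output next to the input model is an assumption; only the `{stem}_{p1}_{p2}.onnx` pattern comes from this PR.

```python
from pathlib import Path


def default_output_path(model_path: Path, precisions) -> Path:
    """Join the requested precisions, in order, onto the input stem."""
    suffix = "_".join(precisions)
    # Assumption: the output is written beside the input model.
    return model_path.with_name(model_path.stem + "_" + suffix + ".onnx")
```

For instance, `default_output_path(Path("models/bert.onnx"), ("int4", "fp16"))` yields `models/bert_int4_fp16.onnx`.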

Tests

`tests/unit/test_quant_passes.py`: 19 tests, all passing:

- `TestExpandPrecision` (7): mapping correctness, unknown mode, `None` config, mode-from-config fallback
- `TestQuantizerSinglePass` (4): path routing, missing model, exception handling, empty passes guard
- `TestQuantizeOnnxKwargsGuard` (1): unexpected kwargs raise `TypeError`
- `TestQuantizerMultiPass` (4): chaining, stat merging, abort on failure, warning accumulation
- `TestFP16PassConfig` (1): config field wiring
- `TestRTNPassConfig` (2): config field wiring, `accuracy_level=0` → `None`

`tests/e2e/test_quantize_e2e.py`: `TestMultiPrecision` (2 tests):

- `test_int4_then_fp16_pipeline`: verifies `MatMulNBits` nodes (RTN) + `FLOAT16` initializers (FP16) are both present
- `test_pipeline_default_output_path`: verifies auto-named output file

- Add passes/ sub-package with BaseQuantPass ABC
- Implement FP16Pass, RTNPass, QDQPass; each accepts WinMLQuantizationConfig and reads only the fields relevant to that pass
- Add Quantizer class: chains passes sequentially, uses tempfile for intermediates, merges QuantizeResult stats across passes
- Add expand_precision(mode, config) to map precision strings to pass lists (supports 'fp16', 'rtn', 'static', 'dynamic', 'w4a16')
- Keep quantize_onnx() as backward-compatible entry point
- Add tests/unit/test_quant_passes.py (19 tests, all passing)

- QDQPass.run(): forward use_external_data to final save_onnx call
- WinMLQuantizationConfig: add 'w4a16' to mode Literal; to_dict() now serialises rtn_* and fp16_* fields when mode is 'w4a16'
- quantize_onnx(): raise TypeError on unrecognised kwargs instead of silently discarding them
- Tests: add TestW4a16Config (3 cases) and TestQuantizeOnnxKwargsGuard (1 case)
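A rough sketch of the kwargs guard mentioned above, assuming the entry point still accepts a small set of legacy keyword arguments. The names in `_KNOWN_KWARGS`, the signature, and the returned dict are hypothetical and exist only to make the example runnable.

```python
# Hypothetical set of legacy keyword arguments that are still honoured.
_KNOWN_KWARGS = {"weight_type", "activation_type"}


def quantize_onnx(model_path, output_path=None, **kwargs):
    unexpected = sorted(set(kwargs) - _KNOWN_KWARGS)
    if unexpected:
        # Fail loudly instead of silently dropping a misspelled option.
        raise TypeError(
            "quantize_onnx() got unexpected keyword argument(s): "
            + ", ".join(unexpected)
        )
    # Placeholder for building a config and delegating to the pipeline.
    return {"model_path": model_path, "output_path": output_path, **kwargs}
```

With this guard a typo such as `weigth_type="int8"` raises immediately rather than producing a model quantized with default settings.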

- Add TYPE_CHECKING import block for Quantizer, expand_precision, and quantize_onnx so mypy resolves their types instead of falling back to Any? (fixes 'Any? not callable [misc]' in hf.py and onnx.py)
- Same TYPE_CHECKING imports satisfy CodeQL's 'Explicit export is not defined' alerts for those names in __all__
- Remove trailing ... after docstring in BaseQuantPass.run() to fix CodeQL 'Statement has no effect' alert

… onto main

w4a16 is a composite pipeline concept, not a single-pass quantization mode. Removing it from the mode Literal keeps config.py focused on atomic pass modes (static, dynamic, rtn, fp16). Multi-pass pipelines are expressed through Quantizer + expand_precision at a higher level.

Changes:
- config.py: revert mode Literal to [static, dynamic, rtn, fp16], revert to_dict() guards back to equality checks, remove w4a16 docstring example
- quantizer.py: remove w4a16 from _COMPOSITE_PRECISIONS and docstrings
- __init__.py: update module docstring example
- commands/build.py: fix stale 'single-pass' comment
- tests: remove TestW4a16Config and test_w4a16_returns_rtn_then_fp16

… pipeline

Rename:
- passes/qdq.py → passes/static.py; QDQPass → StaticPass throughout
- Update all imports, __all__, quantizer.py pass_factories, and tests

Multi-precision --precision:
- precision_option() gains multiple=True support
- quantize command accepts repeated --precision flags; len > 1 routes to _run_multi_precision() which chains expand_precision() calls into a single Quantizer pipeline
- Default output path for multi-pass: {stem}_{p1}_{p2}.onnx
- Calibration-unused warning emitted when no static pass is in pipeline

E2E tests (TestMultiPrecision):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) and FLOAT16 initializers (FP16 pass) are both present in output
- test_pipeline_default_output_path: verifies auto-named output file

- static.py: fix model_name -> model_id (WinMLQuantizationConfig has no model_name field; correct field is model_id); fixes mypy [attr-defined]
- cli.py: widen precision_option default type to str | tuple[str,...] | None so passing default=() for multiple=True passes mypy; fixes [arg-type]
- quantizer.py: make expand_precision mode optional, falling back to config.mode when not provided; removes redundant arg from quantize_onnx caller (addresses reviewer: 'why have mode param when config has it')
- quantize.py: remove redundant 'from typing import cast' inside _run_multi_precision (cast already imported at module level)
- passes/base.py: add note in run() docstring explaining why file-based I/O is used (addresses reviewer suggestion about in-memory model proto)
- tests: add test_no_mode_uses_config_mode to cover expand_precision(config=) path

xieofxie

A few correctness concerns (2 should-fix, 2 to consider) on the pass-pipeline refactor. Overall the design is clean and well-tested — these are mostly about parity with the previous single-dispatch behavior.

…ntized suffix

- expand_precision(): rename parameter mode -> precision (more accurate: 'mode' maps only to config.mode values, while 'precision' also covers future _COMPOSITE_PRECISIONS entries like w4a16)
- Rename internal effective_mode -> effective_precision
- quantize_onnx() default output: {stem}_qdq.onnx -> {stem}_quantized.onnx (_qdq is an implementation detail of one pass; _quantized is generic)
- Update all callers, tests, CLI help text, and docstrings

xieofxie

Two minor nits to go with the earlier review.

… types, nits

Should-fix:
- passes/static.py: restore per-target symmetry override. weight_symmetric and activation_symmetric now fall back to config.symmetric only when None, fixing the w8a16 regression where both were collapsed to config.symmetric
- commands/quantize.py: resolve weight/activation types from the first static precision in the multi-pass pipeline via _resolve_quant_types, so '-p int16 -p fp16' correctly uses int16 instead of silently defaulting to uint8

Nits:
- quantizer.py: replace eager _pass_factories (instances) with lazy _pass_types (classes); only the selected pass is instantiated
- passes/fp16.py, passes/rtn.py: remove redundant __init__ that only calls super().__init__(config)
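The per-target symmetry fallback described in this commit can be sketched like this. `Config` is a hypothetical three-field stand-in and `resolve_symmetry` is an illustrative helper name, not a function from `passes/static.py`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Config:
    symmetric: bool = False
    weight_symmetric: Optional[bool] = None      # None means "not set"
    activation_symmetric: Optional[bool] = None  # None means "not set"


def resolve_symmetry(config: Config) -> Tuple[bool, bool]:
    """Return (weight_symmetric, activation_symmetric) after fallback."""
    # Compare against None explicitly: an explicit False is a real override
    # and must not be replaced by config.symmetric (as `a or b` would do).
    weight = (config.symmetric if config.weight_symmetric is None
              else config.weight_symmetric)
    activation = (config.symmetric if config.activation_symmetric is None
                  else config.activation_symmetric)
    return weight, activation
```

The `is None` check matters because `activation_symmetric=False` combined with `symmetric=True` should stay `False` for activations.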

Resolve conflict in src/winml/modelkit/quant/quantizer.py against main's #985 Quantizer/BaseQuantPass pipeline refactor (which replaced the old _quantize_single_pass / _quantize_qdq dispatch):
- Drop the obsolete old-architecture handlers; keep main's expand_precision + Quantizer(passes).run pipeline.
- Re-wire the disk-full input guard (_check_input_model_opset) into quantize_onnx, ordered after the kwargs TypeError check and before the model-type finalizer. Not placed in Quantizer.run() because its orchestration unit tests drive it with dummy non-ONNX inputs.
- Fix two finalizer tests that #985 silently broke (they monkeypatched the removed _quantize_qdq): retarget them to the new Quantizer seam.
- Cover the standalone multi-precision CLI path (_run_multi_precision), which drives Quantizer directly, with the same guard for parity.

Address PR review comments in onnx/persistence.py:
- ONNXSaveError.errno was always None; preserve the originating errno via a new errno_code argument so `except OSError` callers can inspect e.errno.
- Add a zero-byte stat() fast-path to _check_input_model_opset so the healthy success path avoids a full proto parse.
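One way the errno preservation could look, as a sketch only. It assumes `ONNXSaveError` subclasses `OSError` (implied by the `except OSError` remark but not shown here); the constructor signature and the `save_bytes` helper are hypothetical.

```python
import errno


class ONNXSaveError(OSError):
    """Assumed to subclass OSError so existing handlers keep working."""

    def __init__(self, message, errno_code=None):
        if errno_code is None:
            super().__init__(message)
        else:
            # The two-argument form populates .errno and .strerror.
            super().__init__(errno_code, message)


def save_bytes(path, data):
    """Hypothetical writer that wraps low-level failures."""
    try:
        with open(path, "wb") as handle:
            handle.write(data)
    except OSError as exc:
        # Carry the original errno across the wrapper instead of losing it.
        raise ONNXSaveError(
            "failed to save %s: %s" % (path, exc.strerror),
            errno_code=exc.errno,
        ) from exc
```

A caller can then distinguish, say, a full disk with `except OSError as e: if e.errno == errno.ENOSPC: ...` without knowing about the wrapper type.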

DingmaomaoBJTU requested a review from a team as a code owner June 26, 2026 05:42

github-advanced-security AI found potential problems Jun 26, 2026

Comment thread src/winml/modelkit/quant/__init__.py Fixed

Comment thread src/winml/modelkit/quant/__init__.py Fixed

Comment thread src/winml/modelkit/quant/passes/base.py Fixed

DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 5a4da8c to 519b562 Compare June 26, 2026 07:56

xieofxie reviewed Jun 26, 2026

Comment thread src/winml/modelkit/quant/passes/base.py

xieofxie reviewed Jun 26, 2026

Comment thread src/winml/modelkit/quant/passes/static.py

xieofxie reviewed Jun 26, 2026

Comment thread src/winml/modelkit/quant/quantizer.py Outdated

timenick approved these changes Jun 29, 2026

github-actions Bot added 5 commits June 29, 2026 10:21

DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 0b6098a to 7ccce14 Compare June 29, 2026 02:22

xieofxie reviewed Jun 29, 2026

Comment thread src/winml/modelkit/quant/quantizer.py Outdated

xieofxie reviewed Jun 29, 2026

Comment thread src/winml/modelkit/quant/quantizer.py Outdated

xieofxie reviewed Jun 29, 2026

Comment thread src/winml/modelkit/quant/passes/static.py Outdated

Comment thread src/winml/modelkit/commands/quantize.py Outdated

Comment thread src/winml/modelkit/quant/quantizer.py

Comment thread src/winml/modelkit/commands/quantize.py

xieofxie reviewed Jun 29, 2026

Comment thread src/winml/modelkit/quant/quantizer.py Outdated

Comment thread src/winml/modelkit/quant/passes/fp16.py Outdated

xieofxie approved these changes Jun 29, 2026

DingmaomaoBJTU merged commit c9a611c into main Jun 29, 2026
9 checks passed

DingmaomaoBJTU deleted the dingmaomaobjtu-feat-quantizer-pass-pipeline branch June 29, 2026 08:58

DingmaomaoBJTU merged 8 commits into main from dingmaomaobjtu-feat-quantizer-pass-pipeline

4 participants