feat(quant): Quantizer class with BaseQuantPass pipeline (#964) #985

Merged
DingmaomaoBJTU merged 8 commits into main from dingmaomaobjtu-feat-quantizer-pass-pipeline
Jun 29, 2026

@DingmaomaoBJTU DingmaomaoBJTU commented Jun 26, 2026

Summary

Refactors the quantizer from a flat dispatch function into an extensible pass-based pipeline, as tracked in #964.

Changes

New: passes/ sub-package

| File | Class | What it does |
|---|---|---|
| passes/base.py | BaseQuantPass | ABC: __init__(config) + abstract run(model_path, output_path) -> QuantizeResult |
| passes/fp16.py | FP16Pass | Reads fp16_keep_io_types, fp16_op_block_list from config |
| passes/rtn.py | RTNPass | Reads rtn_bits, rtn_block_size, rtn_symmetric, rtn_accuracy_level from config |
| passes/static.py | StaticPass | Reads all QDQ/calibration fields from config |

All passes accept a single WinMLQuantizationConfig — each reads only its relevant fields.
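
A minimal sketch of the pass contract described above. Only the class names come from this PR; the field layout of WinMLQuantizationConfig and QuantizeResult is assumed for illustration, and the FP16Pass body is a placeholder that copies the file instead of converting it.

```python
from __future__ import annotations

import shutil
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class WinMLQuantizationConfig:
    # Hypothetical subset of fields; the real config carries more.
    mode: str = "fp16"
    fp16_keep_io_types: bool = True
    fp16_op_block_list: list[str] = field(default_factory=list)


@dataclass
class QuantizeResult:
    # Hypothetical shape; the real result type may differ.
    success: bool
    output_path: Path | None = None
    warnings: list[str] = field(default_factory=list)


class BaseQuantPass(ABC):
    """Every pass stores the shared config and implements run()."""

    def __init__(self, config: WinMLQuantizationConfig) -> None:
        self.config = config

    @abstractmethod
    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        """Read model_path and write the transformed model to output_path."""


class FP16Pass(BaseQuantPass):
    def run(self, model_path, output_path):
        # Each pass reads only the config fields it cares about.
        keep_io = self.config.fp16_keep_io_types
        blocked = self.config.fp16_op_block_list
        # Placeholder for the real FP16 conversion: copy the file unchanged.
        shutil.copyfile(model_path, output_path)
        note = f"fp16 placeholder: keep_io={keep_io}, blocked_ops={len(blocked)}"
        return QuantizeResult(True, Path(output_path), [note])
```

Because run() is abstract, instantiating BaseQuantPass directly raises TypeError, so a new pass cannot forget to implement it.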

Refactored: quantizer.py

  • Quantizer(passes) — chains passes sequentially; single-pass takes the direct path, multi-pass routes intermediates through a TemporaryDirectory; merges QuantizeResult stats across passes
  • expand_precision(mode, config) — maps mode strings to pass lists; mode is optional and falls back to config.mode:
    • "fp16" → [FP16Pass(config)]
    • "rtn" → [RTNPass(config)]
    • "static" / "dynamic" → [StaticPass(config)]
  • quantize_onnx() — kept as backward-compatible entry point; now delegates to Quantizer
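
The chaining behavior in the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: the QuantizeResult fields (nodes_changed, warnings), the intermediate file names, and the ValueError for an empty pass list are all hypothetical.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class QuantizeResult:
    # Hypothetical fields standing in for the real result type.
    success: bool
    nodes_changed: int = 0
    warnings: list[str] = field(default_factory=list)


class Quantizer:
    def __init__(self, passes):
        if not passes:
            # Exception type is an assumption; the PR only mentions a guard.
            raise ValueError("Quantizer needs at least one pass")
        self.passes = list(passes)

    def run(self, model_path, output_path) -> QuantizeResult:
        if len(self.passes) == 1:
            # Single pass: write straight to the final output.
            return self.passes[0].run(Path(model_path), Path(output_path))

        merged = QuantizeResult(success=True)
        with tempfile.TemporaryDirectory() as tmp:
            current = Path(model_path)
            last = len(self.passes) - 1
            for i, quant_pass in enumerate(self.passes):
                # Intermediates stay in the temp dir; only the last pass
                # writes to the caller's output path.
                target = Path(output_path) if i == last else Path(tmp) / f"pass_{i}.onnx"
                result = quant_pass.run(current, target)
                merged.nodes_changed += result.nodes_changed
                merged.warnings.extend(result.warnings)
                if not result.success:
                    merged.success = False
                    break  # abort the remaining passes on failure
                current = target
        return merged
```

Any object with a run(model_path, output_path) method returning a QuantizeResult can be chained, which keeps the orchestrator independent of what each pass does to the model.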

Updated: commands/quantize.py

  • --precision now accepts multiple values to compose a pass pipeline (e.g. --precision int4 --precision fp16 runs RTN then FP16)
  • Default output name for multi-pass: {stem}_int4_fp16.onnx
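
As an illustration of the two bullets above, the sketch below collects repeated --precision flags and derives the default multi-pass output name. argparse is used here only as a stand-in for the project's actual CLI layer, and parse_precisions and default_output_path are hypothetical helper names; placing the output next to the input model is also an assumption.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_precisions(argv: list[str]) -> list[str]:
    """Collect every --precision / -p occurrence, in order."""
    parser = argparse.ArgumentParser()
    # action="append" gathers repeated flags into a list.
    parser.add_argument("--precision", "-p", action="append", default=None)
    args = parser.parse_args(argv)
    return args.precision or []


def default_output_path(model_path: Path, precisions: list[str]) -> Path:
    """Build {stem}_{p1}_{p2}.onnx in the same directory as the input."""
    suffix = "_".join(precisions)
    return model_path.with_name(f"{model_path.stem}_{suffix}.onnx")


precisions = parse_precisions(["--precision", "int4", "--precision", "fp16"])
print(default_output_path(Path("models/model.onnx"), precisions))
```

Flag order matters because it becomes the pass order, so the output name also records the order in which the passes ran.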

Tests

tests/unit/test_quant_passes.py — 19 tests, all passing:

  • TestExpandPrecision (7) — mapping correctness, unknown mode, None config, mode-from-config fallback
  • TestQuantizerSinglePass (4) — path routing, missing model, exception handling, empty passes guard
  • TestQuantizeOnnxKwargsGuard (1) — unexpected kwargs raise TypeError
  • TestQuantizerMultiPass (4) — chaining, stat merging, abort on failure, warning accumulation
  • TestFP16PassConfig (1) — config field wiring
  • TestRTNPassConfig (2) — config field wiring, accuracy_level=0 → None

tests/e2e/test_quantize_e2e.py, class TestMultiPrecision (2 tests):

  • test_int4_then_fp16_pipeline — verifies MatMulNBits nodes (RTN) + FLOAT16 initializers (FP16) are both present
  • test_pipeline_default_output_path — verifies auto-named output file

@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner June 26, 2026 05:42
Comment thread src/winml/modelkit/quant/__init__.py Fixed
Comment thread src/winml/modelkit/quant/__init__.py Fixed
Comment thread src/winml/modelkit/quant/passes/base.py Fixed
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 5a4da8c to 519b562 on June 26, 2026 07:56
Comment thread src/winml/modelkit/quant/passes/base.py
Comment thread src/winml/modelkit/quant/passes/static.py
Comment thread src/winml/modelkit/quant/quantizer.py Outdated
- Add passes/ sub-package with BaseQuantPass ABC
- Implement FP16Pass, RTNPass, QDQPass — each accepts WinMLQuantizationConfig
  and reads only the fields relevant to that pass
- Add Quantizer class: chains passes sequentially, uses tempfile for
  intermediates, merges QuantizeResult stats across passes
- Add expand_precision(mode, config) to map precision strings to pass lists
  (supports 'fp16', 'rtn', 'static', 'dynamic', 'w4a16')
- Keep quantize_onnx() as backward-compatible entry point
- Add tests/unit/test_quant_passes.py (19 tests, all passing)
github-actions Bot added 5 commits June 29, 2026 10:21
- QDQPass.run(): forward use_external_data to final save_onnx call
- WinMLQuantizationConfig: add 'w4a16' to mode Literal; to_dict() now
  serialises rtn_* and fp16_* fields when mode is 'w4a16'
- quantize_onnx(): raise TypeError on unrecognised kwargs instead of
  silently discarding them
- Tests: add TestW4a16Config (3 cases) and TestQuantizeOnnxKwargsGuard (1 case)
- Add TYPE_CHECKING import block for Quantizer, expand_precision, and
  quantize_onnx so mypy resolves their types instead of falling back to
  Any? (fixes 'Any? not callable [misc]' in hf.py and onnx.py)
- Same TYPE_CHECKING imports satisfy CodeQL's 'Explicit export is not
  defined' alerts for those names in __all__
- Remove trailing ... after docstring in BaseQuantPass.run() to fix
  CodeQL 'Statement has no effect' alert
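
The quantize_onnx() kwargs guard mentioned in the commit notes above could look roughly like this. The signature, the _KNOWN_OVERRIDES set, and the returned dictionary are hypothetical placeholders; only the behavior (raise TypeError rather than silently drop unknown keywords) comes from the PR.

```python
# Hypothetical set of legacy keyword overrides the entry point still accepts.
_KNOWN_OVERRIDES = frozenset({"mode", "rtn_bits", "fp16_keep_io_types"})


def quantize_onnx(model_path, output_path=None, config=None, **kwargs):
    """Backward-compatible entry point (sketch)."""
    unknown = sorted(set(kwargs) - _KNOWN_OVERRIDES)
    if unknown:
        # Fail loudly so a misspelled option is not silently ignored.
        raise TypeError(
            "quantize_onnx() got unexpected keyword argument(s): "
            + ", ".join(unknown)
        )
    # Placeholder: the real function would build passes and delegate to
    # Quantizer; here we only return the accepted overrides.
    return dict(kwargs)
```

With this guard, a typo such as rtn_bitz=4 surfaces immediately at the call site instead of producing a model quantized with default settings.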
… onto main

w4a16 is a composite pipeline concept, not a single-pass quantization mode.
Removing it from the mode Literal keeps config.py focused on atomic pass
modes (static, dynamic, rtn, fp16). Multi-pass pipelines are expressed
through Quantizer + expand_precision at a higher level.

Changes:
- config.py: revert mode Literal to [static, dynamic, rtn, fp16], revert
  to_dict() guards back to equality checks, remove w4a16 docstring example
- quantizer.py: remove w4a16 from _COMPOSITE_PRECISIONS and docstrings
- __init__.py: update module docstring example
- commands/build.py: fix stale 'single-pass' comment
- tests: remove TestW4a16Config and test_w4a16_returns_rtn_then_fp16
… pipeline

Rename:
- passes/qdq.py → passes/static.py; QDQPass → StaticPass throughout
- Update all imports, __all__, quantizer.py pass_factories, and tests

Multi-precision --precision:
- precision_option() gains multiple=True support
- quantize command accepts repeated --precision flags; len > 1 routes to
  _run_multi_precision() which chains expand_precision() calls into a
  single Quantizer pipeline
- Default output path for multi-pass: {stem}_{p1}_{p2}.onnx
- Calibration-unused warning emitted when no static pass is in pipeline

E2E tests (TestMultiPrecision):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) and
  FLOAT16 initializers (FP16 pass) are both present in output
- test_pipeline_default_output_path: verifies auto-named output file
- static.py: fix model_name -> model_id (WinMLQuantizationConfig has no
  model_name field; correct field is model_id) — fixes mypy [attr-defined]
- cli.py: widen precision_option default type to str | tuple[str,...] | None
  so passing default=() for multiple=True passes mypy — fixes [arg-type]
- quantizer.py: make expand_precision mode optional, falling back to
  config.mode when not provided; removes redundant arg from quantize_onnx
  caller (addresses reviewer: 'why have mode param when config has it')
- quantize.py: remove redundant 'from typing import cast' inside
  _run_multi_precision (cast already imported at module level)
- passes/base.py: add note in run() docstring explaining why file-based I/O
  is used (addresses reviewer suggestion about in-memory model proto)
- tests: add test_no_mode_uses_config_mode to cover expand_precision(config=)
  path
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 0b6098a to 7ccce14 on June 29, 2026 02:22
Comment thread src/winml/modelkit/quant/quantizer.py Outdated
Comment thread src/winml/modelkit/quant/quantizer.py Outdated

@xieofxie xieofxie left a comment

A few correctness concerns (2 should-fix, 2 to consider) on the pass-pipeline refactor. Overall the design is clean and well-tested — these are mostly about parity with the previous single-dispatch behavior.

Comment thread src/winml/modelkit/quant/passes/static.py Outdated
Comment thread src/winml/modelkit/commands/quantize.py Outdated
Comment thread src/winml/modelkit/quant/quantizer.py
Comment thread src/winml/modelkit/commands/quantize.py
…ntized suffix

- expand_precision(): rename parameter mode -> precision (more accurate:
  'mode' maps only to config.mode values, while 'precision' also covers
  future _COMPOSITE_PRECISIONS entries like w4a16)
- Rename internal effective_mode -> effective_precision
- quantize_onnx() default output: {stem}_qdq.onnx -> {stem}_quantized.onnx
  (_qdq is an implementation detail of one pass; _quantized is generic)
- Update all callers, tests, CLI help text, and docstrings

@xieofxie xieofxie left a comment

Two minor nits to go with the earlier review.

Comment thread src/winml/modelkit/quant/quantizer.py Outdated
Comment thread src/winml/modelkit/quant/passes/fp16.py Outdated
… types, nits

Should-fix:
- passes/static.py: restore per-target symmetry override — weight_symmetric
  and activation_symmetric now fall back to config.symmetric only when None,
  fixing the w8a16 regression where both were collapsed to config.symmetric
- commands/quantize.py: resolve weight/activation types from the first static
  precision in the multi-pass pipeline via _resolve_quant_types, so
  '-p int16 -p fp16' correctly uses int16 instead of silently defaulting to uint8
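
The symmetry fix in the first should-fix item amounts to a None-aware fallback. A minimal sketch, assuming a config with the three fields named in the commit message (StaticSettings and resolve_symmetry are hypothetical names):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StaticSettings:
    # Hypothetical subset of WinMLQuantizationConfig.
    symmetric: bool = False
    weight_symmetric: bool | None = None
    activation_symmetric: bool | None = None


def resolve_symmetry(cfg: StaticSettings) -> tuple[bool, bool]:
    """Return (weight_symmetric, activation_symmetric).

    A per-target value wins when it is set; config.symmetric is used
    only when the per-target value is None.
    """
    weight = cfg.symmetric if cfg.weight_symmetric is None else cfg.weight_symmetric
    activation = (
        cfg.symmetric if cfg.activation_symmetric is None else cfg.activation_symmetric
    )
    return weight, activation
```

The explicit `is None` test matters: a fallback written as `cfg.weight_symmetric or cfg.symmetric` would discard an explicit False override.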

Nits:
- quantizer.py: replace eager _pass_factories (instances) with lazy _pass_types
  (classes); only the selected pass is instantiated
- passes/fp16.py, passes/rtn.py: remove redundant __init__ that only calls
  super().__init__(config)
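
The lazy _pass_types nit, combined with the earlier precision-falls-back-to-config.mode change, can be sketched as below. The pass classes are empty stand-ins, Config is a hypothetical substitute for WinMLQuantizationConfig, and the ValueError choices for a missing config or unknown precision are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for WinMLQuantizationConfig.
    mode: str = "static"


class _Pass:
    def __init__(self, config: Config) -> None:
        self.config = config


class FP16Pass(_Pass): ...
class RTNPass(_Pass): ...
class StaticPass(_Pass): ...


# Classes rather than instances: nothing is constructed until selected.
_pass_types = {
    "fp16": FP16Pass,
    "rtn": RTNPass,
    "static": StaticPass,
    "dynamic": StaticPass,
}


def expand_precision(precision: str | None = None, config: Config | None = None):
    """Map a precision string to a list of pass instances."""
    if config is None:
        raise ValueError("config is required")  # error type is an assumption
    # Fall back to the mode stored in the config when no precision is given.
    effective_precision = precision if precision is not None else config.mode
    try:
        pass_type = _pass_types[effective_precision]
    except KeyError:
        raise ValueError(f"unknown precision: {effective_precision!r}") from None
    return [pass_type(config)]
```

Returning a list even for single-pass precisions leaves room for composite precisions that expand to several passes.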
@DingmaomaoBJTU DingmaomaoBJTU merged commit c9a611c into main Jun 29, 2026
9 checks passed
@DingmaomaoBJTU DingmaomaoBJTU deleted the dingmaomaobjtu-feat-quantizer-pass-pipeline branch June 29, 2026 08:58
timenick added a commit that referenced this pull request Jun 29, 2026
Resolve conflict in src/winml/modelkit/quant/quantizer.py against main's
#985 Quantizer/BaseQuantPass pipeline refactor (which replaced the old
_quantize_single_pass / _quantize_qdq dispatch):

- Drop the obsolete old-architecture handlers; keep main's expand_precision
  + Quantizer(passes).run pipeline.
- Re-wire the disk-full input guard (_check_input_model_opset) into
  quantize_onnx, ordered after the kwargs TypeError check and before the
  model-type finalizer. Not placed in Quantizer.run() because its
  orchestration unit tests drive it with dummy non-ONNX inputs.
- Fix two finalizer tests that #985 silently broke (they monkeypatched the
  removed _quantize_qdq): retarget them to the new Quantizer seam.
- Cover the standalone multi-precision CLI path (_run_multi_precision), which
  drives Quantizer directly, with the same guard for parity.

Address PR review comments in onnx/persistence.py:
- ONNXSaveError.errno was always None; preserve the originating errno via a
  new errno_code argument so `except OSError` callers can inspect e.errno.
- Add a zero-byte stat() fast-path to _check_input_model_opset so the healthy
  success path avoids a full proto parse.
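
The ONNXSaveError.errno fix described above can be sketched as follows. It assumes ONNXSaveError subclasses OSError (implied by the `except OSError` remark); the constructor layout and the save_bytes helper are hypothetical.

```python
import errno


class ONNXSaveError(OSError):
    """Save failure that keeps the originating errno visible to callers."""

    def __init__(self, message, errno_code=None):
        if errno_code is None:
            # One-argument form: OSError leaves .errno as None.
            super().__init__(message)
        else:
            # Two-argument form (errno, strerror) populates .errno.
            super().__init__(errno_code, message)


def save_bytes(path, data):
    """Hypothetical writer that wraps low-level failures."""
    try:
        with open(path, "wb") as fh:
            fh.write(data)
    except OSError as exc:
        raise ONNXSaveError(
            f"failed to save {path}: {exc.strerror}", errno_code=exc.errno
        ) from exc
```

A caller can then distinguish a full disk from other failures with `except OSError as e: if e.errno == errno.ENOSPC: ...` without knowing about ONNXSaveError at all.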

4 participants