new architecture for auto_round #1542

Open
n1ck-guo wants to merge 63 commits into main from hengguo/new_ar_arch

Conversation

Contributor

@n1ck-guo n1ck-guo commented Mar 13, 2026

Description

  • Compressor:
    Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
  • Calibration: Handles the calibration process (Work in Progress)
  • Context: Manages shared configurations and model states throughout the quantization pipeline, providing centralized control to prevent cross-module dependencies
    • ModelContext: Handles model loading and tracks model states and relevant configurations
    • CompressContext: Stores shared compression settings such as low_cpu_mem_usage, enable_torch_compile, etc.
  • Algorithms: Concrete quantization and weight transformation implementations
    • Quantization: Various quantization algorithms, including AutoRound, RTN, OptRTN, etc.
    • Transform: Weight transformation algorithms such as Hadamard transform
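The Copilot review below notes that the new `context` package is built on a "simple singleton context base", so `ModelContext` and `CompressContext` are presumably shared per-process objects. A rough sketch of that pattern follows; the class and attribute names are assumptions drawn from the description above, not the PR's actual implementation:

```python
class SingletonContext:
    """Hypothetical base class: each subclass gets one shared instance."""

    _instances: dict = {}

    def __new__(cls, *args, **kwargs):
        # Create the instance only on first construction; later calls
        # return the same shared object.
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class CompressContext(SingletonContext):
    """Illustrative shared compression settings (names from the bullet above)."""

    def __init__(self, low_cpu_mem_usage=False, enable_torch_compile=False):
        # Apply settings only once so that later lookups like
        # CompressContext() do not reset the shared state.
        if getattr(self, "_initialized", False):
            return
        self.low_cpu_mem_usage = low_cpu_mem_usage
        self.enable_torch_compile = enable_torch_compile
        self._initialized = True


ctx_a = CompressContext(low_cpu_mem_usage=True)
ctx_b = CompressContext()
print(ctx_a is ctx_b)  # True: both names refer to the same shared context
```

A design like this lets any module read `CompressContext()` without the configuration being threaded through every call, which matches the stated goal of preventing cross-module dependencies.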

Usage of the new API:

from auto_round.algorithms.rotation import HadamardConfig
# AutoRoundConfig and Compressor are also required here; their import
# paths are not shown in this snippet.

quant_cfg = AutoRoundConfig(bits=4, group_size=128, iters=200)
had_cfg_1 = HadamardConfig(hadamard_type="hadamard", block_size=32)
had_cfg_2 = HadamardConfig(hadamard_type="random_hadamard", block_size=64, random_seed=True)

compressor = Compressor(
    config=[quant_cfg, had_cfg_1, had_cfg_2],
    model="facebook/opt-125m",
    scheme="MXFP4",
    format="auto_round",
)

model, layer_config = compressor.quantize_and_save(output_dir="./output")
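Since `config` accepts a heterogeneous list (one quantization config plus several transform configs), the Compressor presumably routes each entry to the matching algorithm by type. A minimal sketch of that dispatch, using stand-in dataclasses rather than the PR's real classes:

```python
from dataclasses import dataclass


# Stand-in configs mirroring the shapes used in the snippet above;
# the real classes live in the auto_round algorithm packages.
@dataclass
class AutoRoundConfig:
    bits: int = 4
    group_size: int = 128
    iters: int = 200


@dataclass
class HadamardConfig:
    hadamard_type: str = "hadamard"
    block_size: int = 32


def split_configs(configs):
    """Separate quantization configs from transform configs by type."""
    quant = [c for c in configs if isinstance(c, AutoRoundConfig)]
    transform = [c for c in configs if isinstance(c, HadamardConfig)]
    return quant, transform


quant, transform = split_configs(
    [AutoRoundConfig(), HadamardConfig(), HadamardConfig(block_size=64)]
)
print(len(quant), len(transform))  # 1 2
```

This is illustrative only; the actual routing in the PR may happen inside `Compressor` or a registry keyed on config class.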

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

Copilot AI left a comment

Pull request overview

Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.

Changes:

  • Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
  • Expanded scheme parsing to reconcile bits/data_type and support user overrides + AutoScheme integration.
  • Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.

File | Description
auto_round/utils/model.py | Avoids runtime import cycles via TYPE_CHECKING for QuantizationScheme.
auto_round/schemes.py | Adds scheme override + parsing helpers and bits/dtype reconciliation.
auto_round/formats.py | Switches divisibility checks to global supported-layer constants.
auto_round/context/model_context.py | Introduces model lifecycle/loading + AMP setup and forward-hook management.
auto_round/context/compress_context.py | Introduces device/device_map and memory-usage knobs as shared context.
auto_round/context/base.py | Adds simple singleton context base.
auto_round/context/__init__.py | Package init for new context module.
auto_round/compressors_new/utils.py | New utility module (layer config, gguf mapping, caching helpers, forward helpers).
auto_round/compressors_new/shard_writer.py | New shard-based saver with optional safetensors support.
auto_round/compressors_new/config.py | Introduces extra/legacy config dataclasses for the new compressor path.
auto_round/compressors_new/base.py | New "BaseCompressor" implementation wiring contexts, formats, caching, quant loop.
auto_round/compressors_new/__init__.py | Package init for compressors_new.
auto_round/compressors/utils.py | Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules.
auto_round/calibration/utils.py | Adds helpers for "early stop" caching and input reshaping for block tuning.
auto_round/calibration/__init__.py | Package init for calibration.
auto_round/algorithms/quantization/rtn/rtn.py | Adds placeholder RTN quantization module file.
auto_round/algorithms/quantization/rtn/config.py | Adds RTN algorithm config stub.
auto_round/algorithms/quantization/rtn/__init__.py | Package init for RTN quantization.
auto_round/algorithms/quantization/base.py | Adds base quantization class stub.
auto_round/algorithms/quantization/auto_round/quantize.py | Adds new AutoRound quantizer implementation (algorithm object).
auto_round/algorithms/quantization/auto_round/config.py | Adds new AutoRound algorithm config.
auto_round/algorithms/quantization/auto_round/__init__.py | Package init for AutoRound quantization algorithm.
auto_round/algorithms/quantization/__init__.py | Package init for quantization algorithms.
auto_round/algorithms/base.py | Adds base algorithm stub.
auto_round/algorithms/alg_config.py | Adds base algorithm config stub.
auto_round/algorithms/__init__.py | Package init for algorithms.

@wenhuach21

If there is already an algorithm folder, what is the purpose of the compressor folder?

@n1ck-guo n1ck-guo requested review from WeiweiZhang1 and yiliu30 and removed request for xin3he March 13, 2026 05:31
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026
n1ck-guo and others added 3 commits March 17, 2026 17:02
n1ck-guo added 7 commits April 2, 2026 09:47
n1ck-guo added 2 commits April 3, 2026 13:54
pre-commit-ci bot and others added 10 commits April 7, 2026 06:48
@n1ck-guo n1ck-guo added the "ready" label (only add when the PR is ready to merge) Apr 8, 2026
n1ck-guo added 3 commits April 9, 2026 10:21
Labels

api/new, engineering, ready (only add when the PR is ready to merge)

8 participants