Add compressed-tensors format export support for W4A16 and W8A16 #1669
Conversation
Pull request overview
Adds llm_compressor (compressed-tensors) export support for INT weight-only schemes (W4A16, W8A16), and updates docs/tests accordingly.
Changes:
- Extend the `llm_compressor` format to accept W4A16/W8A16 and route them through a new backend path.
- Update compressed-tensors scheme construction to omit activation quantization for weight-only exports.
- Add/adjust CPU export tests and document the newly supported schemes (EN + CN).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| `auto_round/formats.py` | Adds W4A16/W8A16 to `llm_compressor` support and introduces a WOQ backend selector (`wint_a16`). |
| `auto_round/export/export_to_llmcompressor/export.py` | Treats W*A16 as weight-only in compressed-tensors scheme creation; tightens dependency expectations around `compress_module`. |
| `auto_round/compressors/utils.py` | Adds a helper to detect integer weight-only quantization (WOQ). |
| `test/test_cpu/export/test_export.py` | Refactors the INT8_W8A8 export test and adds new W4A16/W8A16 `llm_compressor` export assertions. |
| `README.md` | Documents `llm_compressor` support for FP8_BLOCK, INT8_W8A8, W4A16, W8A16. |
| `README_CN.md` | Mirrors the README support-matrix update in Chinese. |
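The diff summary above mentions a WOQ-detection helper in `auto_round/compressors/utils.py` without showing it. A minimal sketch of what such a predicate might look like (the name, signature, and fields are assumptions, not the PR's actual code):

```python
def is_int_woq(bits: int, act_bits: int, data_type: str = "int") -> bool:
    """Return True for integer weight-only quantization (e.g. W4A16, W8A16).

    Hypothetical helper: the real signature in auto_round/compressors/utils.py
    may differ. "Weight-only" here means the weights are quantized to INT
    while activations stay at 16-bit (act_bits >= 16, i.e. unquantized).
    """
    return data_type == "int" and bits in (4, 8) and act_bits >= 16
```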
```python
autoround = AutoRound(
    self.model_name,
    iters=2,
    nsamples=2,
    seqlen=2,
    scheme=scheme,
```
This test constructs AutoRound with iters=2 (tuning enabled) but doesn’t pass an explicit dataset. That will fall back to the default HF dataset (e.g., "NeelNanda/pile-10k"), which can introduce unwanted network dependence/flakiness in CI. Consider passing the dataloader fixture (or setting iters=0 for RTN-only) so the test is hermetic.
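The two hermetic options above can be sketched as a small kwargs builder (the helper and its parameter names are hypothetical, not AutoRound's actual API):

```python
def hermetic_autoround_kwargs(model_name, scheme, use_tuning=False, dataloader=None):
    """Build AutoRound-style kwargs that never trigger an implicit dataset download.

    Hypothetical helper for illustration only. Either disable tuning entirely
    (iters=0, RTN-only) or force the caller to supply calibration data.
    """
    kwargs = {"model": model_name, "scheme": scheme, "nsamples": 2, "seqlen": 2}
    if use_tuning:
        if dataloader is None:
            # Tuning needs calibration data; fail loudly instead of silently
            # falling back to a hub-hosted default dataset.
            raise ValueError("tuning (iters > 0) requires an explicit dataloader")
        kwargs.update(iters=2, dataset=dataloader)
    else:
        kwargs["iters"] = 0  # RTN-only: no tuning, no calibration data needed
    return kwargs
```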
```python
    group_size=group_size,
    sym=sym,
)
quantized_model_path = "./saved"
```
Using a hard-coded relative output directory (./saved) can collide with other tests, depends on the process working directory, and risks deleting a real local folder when shutil.rmtree runs. Prefer using self.save_dir (tmp_path-based) or a tmp_path subdirectory for isolation.
```diff
-quantized_model_path = "./saved"
+quantized_model_path = os.path.join(
+    self.save_dir,
+    f"llmc_wint_a16_export_{scheme}_{bits}_{group_size}_{'sym' if sym else 'asym'}",
+)
```
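The suggestion keys the directory off `self.save_dir`. An alternative sketch using the standard library's `tempfile` for per-test isolation (the helper name is hypothetical):

```python
import shutil
import tempfile
from pathlib import Path

def make_export_dir(tag: str) -> Path:
    """Create a unique, collision-free export directory for one test case.

    tempfile.mkdtemp avoids the problems of a hard-coded "./saved": it never
    depends on the process CWD, two tests can't collide on the same path, and
    shutil.rmtree on it cannot delete a user's real local folder.
    """
    return Path(tempfile.mkdtemp(prefix=f"llmc_export_{tag}_"))

# Usage sketch:
out = make_export_dir("W4A16_sym")
try:
    # autoround.save_quantized(out, format="llm_compressor")  # export call (assumed)
    assert out.is_dir()
finally:
    shutil.rmtree(out)
```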
```python
logger.error(
    "Unable to import compress_module from compressed_tensors. "
    "This functionality requires compressed_tensors >= 0.15.0. "
    "Please upgrade: pip install --upgrade compressed_tensors"
)
```
_compress_and_set_format now hard-requires compressed_tensors.compressors.compress_module and raises if it’s missing, but the repo doesn’t appear to pin/enforce a minimum compressed-tensors version in top-level dependencies. Either keep the previous fallback for older versions or add an explicit version check / dependency constraint so users get a deterministic install-time failure instead of a runtime error here.
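One way to get the deterministic failure this comment asks for is a small version gate at import time. A sketch (the package name and minimum version come from the PR; the helper itself is hypothetical and compares numeric tuples to avoid a third-party dependency):

```python
from importlib import metadata

def version_tuple(v: str) -> tuple:
    """Parse '0.15.0' -> (0, 15, 0); ignores non-numeric suffixes like 'rc1'."""
    parts = []
    for p in v.split("."):
        num = ""
        for ch in p:
            if not ch.isdigit():
                break  # stop at the first non-digit in this component
            num += ch
        if not num:
            break
        parts.append(int(num))
    return tuple(parts)

def require_min_version(dist: str, minimum: str) -> None:
    """Raise ImportError with an actionable message if `dist` is absent or too old."""
    try:
        installed = metadata.version(dist)
    except metadata.PackageNotFoundError as e:
        raise ImportError(
            f"{dist} >= {minimum} is required: pip install '{dist}>={minimum}'"
        ) from e
    if version_tuple(installed) < version_tuple(minimum):
        raise ImportError(
            f"{dist} {installed} found, but >= {minimum} is required: "
            f"pip install --upgrade '{dist}>={minimum}'"
        )

# Intended use (sketch): require_min_version("compressed-tensors", "0.15.0")
```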
README_CN.md
Outdated
```diff
 |**auto_awq**| W4A16 (recommended), BF16 |
 |**auto_gptq**| W4A16 (recommended), W2A16, W3A16, W8A16, W2A16G64, W2A16G32, BF16 |
-|**llm_compressor**| NVFP4 (recommended), `MXFP4`, `MXFP8`, `FPW8A16`, `FP8_STATIC` |
+|**llm_compressor**| NVFP4 (recommended), `MXFP4`, `MXFP8`, `FPW8A16`, `FP8_STATIC`, `FP8_BLOCK`, `INT8_W8A8`, W4A16, W8A16 |
```
In AR terminology, INT8 explicitly denotes W8A8 INT quantization. If the W/A suffix is omitted, INT8 defaults to W8A8, similar to MXFP4.
INT8_W8A8 is an existing scheme; this PR just exposes it in the docs. If we plan to change the name, we can open a new PR to update both the doc and the code.
By the way, since native INT8 without smoothing does not scale well from an accuracy perspective, maybe we should remove it from the doc until the smoothing feature is ready. What do you think? @wenhuach21
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Description
Added compressed-tensors format export support for W4A16 and W8A16.
Replaced the previous INT W8A8 support, which used the internal NaiveQuantizationCompressor interface, with the new compress_module interface (requires compressed_tensors >= 0.15.0).
Updated the PR to use a BaseCompressor class method so it stays compatible with older versions.
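The compatibility behavior described above is commonly implemented as a guarded import with a fallback path. A sketch (the helper is hypothetical, not the PR's exact code):

```python
def resolve_compress_fn():
    """Prefer the new compress_module API (compressed_tensors >= 0.15.0, per the PR).

    Hypothetical helper: returns None when the new API is unavailable so the
    caller can fall back to the older BaseCompressor class-method path instead
    of failing at runtime.
    """
    try:
        from compressed_tensors.compressors import compress_module
        return compress_module
    except ImportError:
        return None

compress_fn = resolve_compress_fn()
# If compress_fn is None, take the older BaseCompressor-based fallback path.
```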
Type of Change
Related Issues
Fixes or relates to #1567
Checklist Before Submitting