
Enable NextStepDiffusion and support multi-device tuning for diffusion#1640

Open
xin3he wants to merge 18 commits into main from xinhe/3-30a

Conversation

@xin3he
Contributor

@xin3he xin3he commented Mar 30, 2026

Description

fix nextstep loading issue

example_prompt = "A REALISTIC PHOTOGRAPH OF A WALL WITH \"TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE\" PROMINENTLY DISPLAYED"

Raw model output:

[image]

W4A16 model output with torch backend on CPU:

[image]

W4A16 model output with gptqmodel:marlin backend on CUDA:

[image]

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Xin He <xin3.he@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes model loading for the “nextstep” model type by selecting an appropriate AutoModel loader, and adjusts multimodal key detection to recognize “image”-named components.

Changes:

  • Force AutoModel for model_type == "nextstep" during MLLM model loading.
  • Add "image" to MM_KEYS to broaden multimodal component detection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Changed files:

  • auto_round/utils/model.py: Adds a NextStep-specific loader class override to resolve loading failures.
  • auto_round/utils/common.py: Extends multimodal key matching to include "image" for downstream detection/mapping.

Signed-off-by: Xin He <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Mar 30, 2026

The exllama backend has an accuracy issue for nextstep generation.
The marlin backend requires the gptqmodel main branch, so that is fixed in this PR.
cc @wenhuach21

@xin3he xin3he requested a review from wenhuach21 March 30, 2026 13:57
@wenhuach21
Contributor

Better to add next_step to the MLLM support matrix.

@xin3he
Contributor Author

xin3he commented Mar 31, 2026

I need to upstream a model before updating the support matrix (requires model link).

@wenhuach21
Contributor

I need to upstream a model before updating the support matrix (requires model link).

If the model’s license allows upstreaming, we can upload it. Otherwise, we can leave the link blank.

@xin3he xin3he marked this pull request as draft April 3, 2026 01:52
@xin3he
Contributor Author

xin3he commented Apr 3, 2026

The status has been reverted to "Draft", as only RTN is currently supported; upstream adaptation and optimization work is underway.

xin3he added 2 commits April 7, 2026 12:23
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
xin3he and others added 5 commits April 8, 2026 02:46
… gptqmodel fix

Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
…imports

Signed-off-by: Xin He <xin3.he@intel.com>
@xin3he xin3he changed the title fix nextstep loading issue Enable NextStepDiffusion and support multi-device tuning for diffusion Apr 8, 2026
@xin3he xin3he requested a review from changwangss April 8, 2026 07:51
@xin3he xin3he marked this pull request as ready for review April 8, 2026 07:54
@xin3he
Contributor Author

xin3he commented Apr 8, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

xin3he and others added 7 commits April 9, 2026 07:42
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
**kwargs,
):
logger.warning("Diffusion model quantization is experimental and is only validated on Flux models.")
if dataset == "NeelNanda/pile-10k":
Contributor


This is not very robust; I suspect that none of our supported LLM datasets are suitable for diffusion models.
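One way to address the robustness concern above is a sentinel default, so "the user left the default" is distinguishable from "the user explicitly passed the LLM default dataset". A minimal sketch; the names `_UNSET`, `DIFFUSION_DEFAULT_DATASET`, and `resolve_dataset` are illustrative, not AutoRound APIs:

```python
# Sentinel default for the dataset argument. Comparing against the literal
# string "NeelNanda/pile-10k" cannot tell an explicit choice from the default.
_UNSET = object()
DIFFUSION_DEFAULT_DATASET = "coco-captions"  # hypothetical diffusion-friendly default

def resolve_dataset(dataset=_UNSET):
    """Return the dataset to use for diffusion tuning.

    If the caller did not pass a dataset at all, silently switch to a
    diffusion-appropriate default instead of an LLM text corpus.
    """
    if dataset is _UNSET:
        return DIFFUSION_DEFAULT_DATASET
    return dataset
```

With this pattern, `resolve_dataset()` yields the diffusion default, while `resolve_dataset("NeelNanda/pile-10k")` still honors an explicit user choice.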

"""
# Replace special characters to make the folder name filesystem-safe
sanitized_format = format.get_backend_name().replace(":", "-").replace("_", "-")
if hasattr(self.model, "config") and getattr(self.model.config, "model_type", None) == "nextstep":
Contributor


This is very tricky. It would be better to handle this in the special-model handling code.
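The reviewer's suggestion amounts to moving the inline `model_type == "nextstep"` check behind a per-model hook. A minimal sketch of that shape; `SAVE_HOOKS`, `register_save_hook`, and `run_pre_save` are hypothetical names, not existing AutoRound functions:

```python
# Registry of model-specific pre-save behavior, keyed by model_type.
# Special models register themselves; the common save path stays generic.
SAVE_HOOKS = {}

def register_save_hook(model_type):
    """Decorator: register fn as the pre-save hook for model_type."""
    def deco(fn):
        SAVE_HOOKS[model_type] = fn
        return fn
    return deco

@register_save_hook("nextstep")
def _nextstep_pre_save(model, output_dir):
    # NextStep-specific preparation would go here.
    return f"prepared {output_dir} for nextstep"

def run_pre_save(model_type, model, output_dir):
    """Common save path calls this; models without a hook are a no-op."""
    hook = SAVE_HOOKS.get(model_type)
    return hook(model, output_dir) if hook else None
```

The common `save_quantized` code then never mentions any specific model type.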

return super().save_quantized(output_dir, format=format, inplace=inplace, **kwargs)

compressed_model = None
if hasattr(self.model, "config") and getattr(self.model.config, "model_type", None) == "nextstep":
Contributor


The same tricky issue: we do not handle model-specific issues in the common code. Better to name it as a specific behavior and write a function/class that handles it for all models with the same behavior.


if isinstance(model, DiffusionPipeline):
pipe = model
_device_map = 0 if device_map is None else device_map
Contributor


Only wrap the code that may throw exceptions in the try block; I guess that is the from diffusers.pipelines.pipeline_utils import DiffusionPipeline here.


# This function is designed for Auto Scheme and Diffusion Pipeline,
# which requires dispatching the whole model on all available devices.
def dispatch_model_by_all_available_devices(
Contributor


Could we consolidate this with the other function in auto-scheme?

try:
from transformers import AutoConfig

config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
Contributor


trust_remote_code should follow AutoRound's setting. We have disable_trust

config = AutoConfig.from_pretrained(model_or_path, trust_remote_code=True)
model_type = getattr(config, "model_type", "")
# A special case for NextStep
if model_type == "nextstep":
Contributor

@wenhuach21 wenhuach21 Apr 10, 2026


Same issue: you could register the model type in handle_special_model.py or in the diffusion folder.



def load_next_step_diffusion(pretrained_model_name_or_path, device_str):
from models.gen_pipeline import NextStepPipeline # pylint: disable=E0401
Contributor


Better to create a new file or folder that handles special-model loading; you could use a registry or something similar. Then other developers only need to call load_mllm_model to load all of our supported models.
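The registry-based loading suggested above could look like the following sketch. `load_mllm_model` appears in the review comment; the registry machinery (`_LOADERS`, `register_loader`) and the loader signature are hypothetical:

```python
# Loader registry keyed by model_type. Special models (like NextStep)
# register a loader; callers only ever use load_mllm_model.
_LOADERS = {}

def register_loader(model_type):
    """Decorator: register fn as the loader for model_type."""
    def deco(fn):
        _LOADERS[model_type] = fn
        return fn
    return deco

@register_loader("nextstep")
def load_next_step(path, device_str):
    # The real loader would construct a NextStepPipeline here.
    return ("nextstep-pipeline", path, device_str)

def load_mllm_model(path, model_type, device_str, default_loader=None):
    """Single entry point: dispatch to a registered loader or a default."""
    loader = _LOADERS.get(model_type, default_loader)
    if loader is None:
        raise ValueError(f"No loader registered for model_type={model_type!r}")
    return loader(path, device_str)
```

New special models then add one registered function instead of another branch in common code.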

assert device in environ_mapping, f"Device {device} not supported for vllm tensor parallelism."
environ_name = environ_mapping[device]
assert device in DEVICE_ENVIRON_VARIABLE_MAPPING, f"Device {device} not supported for vllm tensor parallelism."
environ_name = DEVICE_ENVIRON_VARIABLE_MAPPING[device]
Contributor


Suggest renaming to env_name.
