
Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Jan 14, 2026

What does this PR do?

Type of change: New feature

Overview:

This PR adds support for exporting quantized diffusers models (DiT, Flux, SD3, UNet, etc.) to HuggingFace checkpoint format, enabling deployment to inference frameworks like SGLang, vLLM, and TensorRT-LLM.

Changes

New file: diffusers_utils.py

  • Dummy input generation for various diffusion models
  • Pipeline component extraction helpers
  • QKV projection detection and grouping
  • hide_quantizers_from_state_dict() context manager for clean saves (see the sketch after this list)
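Not the PR's exact implementation, but a minimal sketch of how such a context manager can temporarily detach quantizer submodules so that save_pretrained() writes a clean, quantizer-free state dict. The *_quantizer attribute names are assumptions based on modelopt's TensorQuantizer convention:

from contextlib import contextmanager

import torch.nn as nn

@contextmanager
def hide_quantizers_from_state_dict(model: nn.Module):
    """Temporarily detach *_quantizer submodules so saves omit them."""
    hidden: list[tuple[nn.Module, str, nn.Module]] = []
    # Snapshot the module list first: delattr mutates _modules during traversal.
    for module in list(model.modules()):
        for name in ("input_quantizer", "weight_quantizer", "output_quantizer"):
            child = getattr(module, name, None)
            if isinstance(child, nn.Module):
                hidden.append((module, name, child))
                delattr(module, name)
    try:
        yield model
    finally:
        # Restore quantizers even if the save raised.
        for module, name, child in hidden:
            setattr(module, name, child)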

Refactored: unified_export_hf.py

  • New _fuse_qkv_linears_diffusion() for QKV amax fusion (see the sketch after this list)
  • _export_diffusers_checkpoint() to export full pipelines (models, tokenizers, schedulers, etc.)
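The core of QKV amax fusion, sketched under the assumption that each projection is a modelopt quantized linear whose input_quantizer carries a calibrated amax buffer. The helper name below is illustrative; the PR's _fuse_qkv_linears_diffusion additionally discovers and groups the projections per attention block:

import torch.nn as nn

def fuse_qkv_input_amax(q_proj: nn.Module, k_proj: nn.Module, v_proj: nn.Module) -> None:
    """Unify the activation range across one Q/K/V group (their inputs are identical)."""
    quantizers = [
        m.input_quantizer for m in (q_proj, k_proj, v_proj) if hasattr(m, "input_quantizer")
    ]
    amaxes = [qz.amax for qz in quantizers if getattr(qz, "amax", None) is not None]
    if not amaxes:
        return  # calibration never populated amax; nothing to fuse
    fused = max(float(a.max()) for a in amaxes)  # shared worst-case range
    for qz in quantizers:
        if qz.amax is not None:
            qz.amax.fill_(fused)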

Plans

  • [1/3] Add the basic functionality to support a limited set of image models with NVFP4 + FP8, with some refactoring of the previous LLM export code and the diffusers example. PIC: @jingyu-ml
  • [2/3] Add support for more video-generation models, plus export support for SVDQuant. PIC: @jingyu-ml
  • [3/3] Add test cases and refactor the docs and all related READMEs. PIC: @jingyu-ml

Usage

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

mtq.quantize(pipe, quant_config, forward_call)
export_hf_checkpoint(pipe, export_dir=hf_ckpt_dir)
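Here, forward_call is the calibration loop that mtq.quantize() invokes; a minimal illustrative version (the prompts and step count are arbitrary assumptions, not from this PR):

def forward_call(model):
    # Run a few representative prompts so the quantizers calibrate their amax values.
    for prompt in ["a photo of an astronaut riding a horse", "a watercolor landscape"]:
        pipe(prompt, num_inference_steps=4)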

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

New Features

  • Added HuggingFace checkpoint export support for quantized diffusion models with configurable output directory
  • Introduced new --hf-ckpt-dir CLI argument for specifying checkpoint export destination
  • Extended export functionality to support selective component exports from diffusion pipelines
  • Enhanced quantized model export with improved component handling and multi-stage checkpoint generation


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 14, 2026 03:56
@jingyu-ml jingyu-ml marked this pull request as draft January 14, 2026 03:56
@coderabbitai
Contributor

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


This PR adds HuggingFace checkpoint export support for quantized diffusion models. It introduces a new CLI option and export configuration field for specifying a checkpoint directory, then extends the unified export module to route and handle diffusion pipeline exports with quantizer management and QKV fusion.

Changes

Cohort / File(s) Summary
Quantization Script Enhancement
examples/diffusers/quantization/quantize.py
Adds an hf_ckpt_dir field to ExportConfig, a new export_hf_ckpt() method on ExportManager, a --hf-ckpt-dir CLI argument for the export directory, and wiring to trigger HF checkpoint export after ONNX export and at the end of the main export flow.
Unified Export Module Extension
modelopt/torch/export/unified_export_hf.py
Introduces context manager and helper functions for quantizer handling during export (_hide_quantizers_from_state_dict, _process_quantized_modules). Adds diffusion model support via _export_diffusers_checkpoint with per-component export, model_index.json creation, and non-nn.Module component handling. Implements QKV fusion utilities (_is_qkv_projection, _get_qkv_group_key, _fuse_qkv_linears_diffusion). Adds diffusion-specific helpers (_generate_diffusion_dummy_inputs, _get_diffusers_components, _has_quantized_modules, _infer_dtype_from_model). Updates export_hf_checkpoint signature to accept components parameter and route DiffusionPipeline models to diffusion-specific export path.
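Based on that walkthrough, the routing at the public entry point plausibly looks like the following outline (a sketch, not the PR's code: _export_transformers_checkpoint is a hypothetical name for the pre-existing LLM path, and diffusers is treated as an optional dependency):

def export_hf_checkpoint(model, dtype=None, export_dir="/tmp", components=None):
    """Route diffusers pipelines and standalone diffusers models to the diffusion path."""
    try:
        from diffusers import DiffusionPipeline, ModelMixin

        if isinstance(model, (DiffusionPipeline, ModelMixin)):
            _export_diffusers_checkpoint(model, export_dir, components)
            return
    except ImportError:
        pass  # diffusers not installed: fall through to the transformers path
    _export_transformers_checkpoint(model, dtype, export_dir)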

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Config as ExportConfig
    participant Manager as ExportManager
    participant Export as export_hf_ckpt()
    participant Router as Route Logic
    participant DiffusionExp as _export_diffusers_checkpoint()
    participant Components as Component Handler

    User->>Config: Pass --hf-ckpt-dir
    Config->>Manager: Create with hf_ckpt_dir set
    Manager->>Export: Call export_hf_ckpt(pipe)
    Export->>Router: Detect model type
    Router->>DiffusionExp: Route DiffusionPipeline
    DiffusionExp->>Components: Extract & process components
    Components->>Components: Hide quantizers
    Components->>Components: Fuse QKV linears
    Components->>Components: Save per-component subdirs
    DiffusionExp->>DiffusionExp: Save model_index.json
    DiffusionExp->>Export: Export complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title '[1/3] Diffusion ckpt export for NVFP4 & FP8' directly reflects the main change: adding diffusion checkpoint export support for quantized models.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.48%, which meets the required threshold of 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


@copy-pr-bot

copy-pr-bot bot commented Jan 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@jingyu-ml jingyu-ml requested a review from Edwardf0t1 January 14, 2026 03:59
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `modelopt/torch/export/unified_export_hf.py`:
- Around line 1055-1057: The _get_diffusers_components currently raises for
anything not a DiffusionPipeline but _export_diffusers_checkpoint accepts
DiffusionPipeline | ModelMixin; update _get_diffusers_components to also accept
instances of ModelMixin (e.g., a standalone UNet) by detecting isinstance(model,
ModelMixin) and returning a components mapping consistent with what
_export_diffusers_checkpoint expects (for example {'unet': model} or the
appropriate single-component keys used downstream); ensure the DiffusionPipeline
branch behavior is unchanged and that callers handle the returned mapping
uniformly.
- Around line 452-463: The loop over model.named_modules() sets
fsdp_module_to_reshard for each FSDPModule but never reshards the last one
after the loop, leaving it unsharded; after the loop completes add a final check
and call to reshard on fsdp_module_to_reshard (i.e., if fsdp_module_to_reshard
is not None: fsdp_module_to_reshard.reshard()) so the last FSDPModule is
properly resharded; locate symbols model.named_modules, FSDPModule,
fsdp_module_to_reshard, and reshard() to apply the fix.
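A sketch of the suggested fix, assuming FSDP2's FSDPModule with unshard()/reshard(); the per-module weight processing is elided:

from torch.distributed.fsdp import FSDPModule  # FSDP2 (recent PyTorch)

import torch.nn as nn

def _process_with_resharding(model: nn.Module) -> None:
    fsdp_module_to_reshard = None
    for _, module in model.named_modules():
        if isinstance(module, FSDPModule):
            if fsdp_module_to_reshard is not None:
                fsdp_module_to_reshard.reshard()  # reshard the previous module
            module.unshard()
            # ... process this module's now-gathered weights ...
            fsdp_module_to_reshard = module
    # The fix: without this, the last FSDPModule is left unsharded.
    if fsdp_module_to_reshard is not None:
        fsdp_module_to_reshard.reshard()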
🧹 Nitpick comments (5)
modelopt/torch/export/unified_export_hf.py (5)

82-83: Move import to top of file with other imports.

The contextmanager import should be grouped with other imports at the top of the file (around lines 18-27) rather than inserted mid-file.

Suggested fix

Move to the imports section at the top:

from contextlib import contextmanager

954-955: Consider using logging instead of print statements.

The function uses print() for debug output, which is inconsistent with the rest of the codebase, which uses warnings.warn() (or could use a logger). This also pollutes production output.

Suggested approach

Replace print() calls with warnings.warn() for warning-level messages, or consider adding an optional logger parameter:

-            print("No quantized linear modules found for QKV fusion.")
+            warnings.warn("No quantized linear modules found for QKV fusion.")
...
-                print(f"Warning: Unknown model type '{model_class_name}', skipping QKV fusion.")
+                warnings.warn(f"Unknown model type '{model_class_name}', skipping QKV fusion.")
...
-        print(f"Warning: Failed to run dummy forward for QKV fusion: {e}")
-        print("Skipping QKV fusion. Quantization may still work but amax values won't be unified.")
+        warnings.warn(f"Failed to run dummy forward for QKV fusion: {e}. Skipping QKV fusion.")

Also applies to: 970-970, 979-980, 1014-1015, 1017-1020


1113-1125: Minor: Step numbering is inconsistent - "Step 2" is missing.

The comments jump from "Step 1" (line 1113) to "Step 3" (line 1125). Consider renumbering for clarity.


1129-1129: Consider using warnings.warn() or a logger instead of print() statements.

Multiple print() calls throughout this function for status messages. For consistency with the rest of the codebase and to allow users to control output, consider using warnings.warn() or passing in a logger.

Also applies to: 1147-1148, 1178-1178, 1190-1190, 1206-1206, 1229-1229


23-23: Unnecessary import of builtin ValueError.

ValueError is a Python builtin and doesn't need to be imported.

Suggested fix
-from builtins import ValueError
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18d9b1e and a33cf13.

📒 Files selected for processing (2)
  • examples/diffusers/quantization/quantize.py
  • modelopt/torch/export/unified_export_hf.py
🧰 Additional context used
🧬 Code graph analysis (2)
examples/diffusers/quantization/quantize.py (1)
modelopt/torch/export/unified_export_hf.py (1)
  • export_hf_checkpoint (1232-1303)
modelopt/torch/export/unified_export_hf.py (6)
modelopt/torch/export/layer_utils.py (1)
  • is_quantlinear (346-348)
modelopt/torch/export/quant_utils.py (1)
  • get_quantization_format (432-533)
modelopt/torch/quantization/utils.py (1)
  • fsdp2_aware_weight_update (689-797)
modelopt/torch/quantization/conversion.py (1)
  • set_quantizer_by_cfg_context (305-327)
modelopt/torch/export/convert_hf_config.py (1)
  • convert_hf_quant_config_format (21-117)
modelopt/torch/export/plugins/hf_spec_export.py (2)
  • spec_opt_only (107-112)
  • export_spec_ckpt_state_dict (115-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (13)
modelopt/torch/export/unified_export_hf.py (7)

85-119: LGTM - Context manager for hiding quantizers during save.

The implementation correctly backs up and restores quantizer attributes. Using delattr/setattr with a try/finally ensures quantizers are restored even if an exception occurs during save.


617-816: LGTM - Comprehensive dummy input generation for diffusion models.

Good implementation with defensive coding:

  • Uses isinstance checks with fallback to string matching when imports fail
  • Provides sensible defaults via getattr for missing config attributes
  • Returns None for unsupported models, which the caller handles gracefully

818-869: LGTM - QKV projection identification logic.

The pattern matching is comprehensive, covering common diffusers naming conventions (to_q, to_k, to_v, etc.) and correctly handles nested module paths.
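For illustration, name-based detection along the lines the review describes might look like this (the suffix list is an assumption drawn from common diffusers attention naming, not the PR's exact patterns):

_QKV_SUFFIXES = frozenset(
    {"to_q", "to_k", "to_v", "add_q_proj", "add_k_proj", "add_v_proj"}
)

def _is_qkv_projection(module_name: str) -> bool:
    # Match on the final path segment so nested names such as
    # "transformer_blocks.0.attn.to_q" are detected correctly.
    return module_name.rsplit(".", 1)[-1] in _QKV_SUFFIXES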


871-909: LGTM - QKV grouping logic.

Correctly groups QKV projections by parent attention block and distinguishes between main and added (cross-attention) QKV types.


1059-1072: LGTM - Simple quantization check.

Clean implementation using generator expression with any() for early termination.
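One plausible shape for that check, using is_quantlinear from layer_utils.py as referenced in the code graph above:

from modelopt.torch.export.layer_utils import is_quantlinear

def _has_quantized_modules(model) -> bool:
    # any() short-circuits on the first quantized linear it encounters.
    return any(is_quantlinear(m) for m in model.modules())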


1074-1086: LGTM - dtype inference with sensible default.

Returns the dtype of the first parameter found, with a reasonable float16 fallback for edge cases (e.g., models with no parameters).
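A sketch of that behavior:

import torch
import torch.nn as nn

def _infer_dtype_from_model(model: nn.Module) -> torch.dtype:
    # dtype of the first parameter; float16 is the fallback for
    # parameter-less edge cases.
    for param in model.parameters():
        return param.dtype
    return torch.float16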


1232-1261: LGTM - Clean routing between diffusers and transformers export.

The updated public API correctly routes to the appropriate export function based on model type. The components parameter documentation clearly states it's only for diffusers pipelines.

examples/diffusers/quantization/quantize.py (6)

69-69: LGTM - Import follows existing pattern.

The import is correctly placed with other modelopt imports.


352-352: LGTM - ExportConfig extension follows existing patterns.

The hf_ckpt_dir field and its validation mirror the existing onnx_dir handling.

Also applies to: 368-370


870-883: LGTM - Method follows existing ExportManager patterns.

Clean implementation that mirrors other export methods like save_checkpoint and export_onnx.


1016-1020: LGTM - CLI argument follows existing conventions.

The --hf-ckpt-dir argument is consistent with the existing --onnx-dir pattern.


1097-1097: LGTM - Config construction follows existing pattern.

Correctly handles the optional hf_ckpt_dir argument.


1153-1155: LGTM - HF checkpoint export integrated at appropriate point in flow.

Placed after ONNX export, following the logical export sequence.


@codecov

codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.23%. Comparing base (e6e4efd) to head (302e2f4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #781   +/-   ##
=======================================
  Coverage   74.23%   74.23%           
=======================================
  Files         192      192           
  Lines       19033    19033           
=======================================
  Hits        14129    14129           
  Misses       4904     4904           


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml self-assigned this Jan 14, 2026
@jingyu-ml jingyu-ml marked this pull request as ready for review January 14, 2026 06:02
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
jingyu-ml added a commit that referenced this pull request Jan 15, 2026
See #781

This MR includes only the refactoring of the LLM export; please ignore
the change to quantize.py in the diffusion example.


## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added `--hf-ckpt-dir` CLI option to save checkpoints in HuggingFace
format
  * Enabled support for exporting Diffusers-based pipelines
* Unified export system now handles both transformer and diffusion model
architectures



---------

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>