
Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Jan 14, 2026

See #781

This MR includes only the refactoring of the LLM export; please ignore the change to quantize.py from the diffusion example.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added --hf-ckpt-dir CLI option to save checkpoints in HuggingFace format
    • Enabled support for exporting Diffusers-based pipelines
    • Unified export system now handles both transformer and diffusion model architectures


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 14, 2026 20:49
@coderabbitai
Contributor

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


📝 Walkthrough


This PR adds HuggingFace checkpoint export functionality for quantized Diffusers models. It introduces new helper functions for module collection and fusion, adds diffusers pipeline routing in the export framework, and integrates HuggingFace checkpoint export into the quantization workflow via a new CLI argument.

Changes

Quantization Workflow Integration: examples/diffusers/quantization/quantize.py
Added an optional hf_ckpt_dir field to the ExportConfig dataclass with directory validation. Introduced an export_hf_ckpt() method on ExportManager. Wired the new --hf-ckpt-dir CLI argument through argument parsing into the main quantization flow, invoking the export after the ONNX export steps. (A sketch of the new wrapper follows after this table.)
Export Framework Enhancement: modelopt/torch/export/unified_export_hf.py
Added helper functions: _collect_shared_input_modules() for gathering modules with shared inputs, _fuse_shared_input_modules() for the fusion step, and _process_quantized_modules() for quantized module iteration and export. Introduced _export_diffusers_checkpoint() (currently raising NotImplementedError) and _export_transformers_checkpoint() for model-type-specific export routing. Enhanced export_hf_checkpoint() with diffusers detection and routing, and extended its signature with an optional components parameter. Added imports for Callable, DiffusionPipeline, and ModelMixin.
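
For the quantize.py cohort above, here is a minimal sketch of what the new ExportManager.export_hf_ckpt() wrapper might look like; the constructor, attribute names, and keyword arguments are assumptions, not the actual implementation:

from pathlib import Path

from modelopt.torch.export import export_hf_checkpoint


class ExportManager:
    """Illustrative subset of the example's export manager; the real class holds more state."""

    def __init__(self, pipe):
        self.pipe = pipe  # the quantized diffusers pipeline

    def export_hf_ckpt(self, hf_ckpt_dir: Path) -> None:
        # Assumed call shape; the real method may pass additional components
        # (tokenizers, schedulers) or different keyword arguments.
        export_hf_checkpoint(self.pipe, export_dir=hf_ckpt_dir)

In the example script this wrapper would run only when --hf-ckpt-dir is provided, after the ONNX export step.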

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant Main as Main Flow
    participant ExportMgr as ExportManager
    participant Export as export_hf_checkpoint
    participant DiffExp as _export_diffusers_checkpoint

    User->>CLI: Provide --hf-ckpt-dir argument
    CLI->>Main: Parse args to ExportConfig
    Main->>Main: Quantize model
    Main->>ExportMgr: export_hf_ckpt(pipe, hf_ckpt_dir)
    ExportMgr->>Export: export_hf_checkpoint(pipe, ...)
    Export->>Export: Detect DiffusionPipeline type
    Export->>DiffExp: Route to diffusers export
    DiffExp-->>Export: Process checkpoint (NotImplemented)
    Export-->>ExportMgr: Complete
    ExportMgr-->>Main: Export finished

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change, adding diffusion checkpoint export support for quantized models (NVFP4 and FP8 formats), which aligns with the primary purpose of the changeset across both files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, which meets the required threshold of 80.00%.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `modelopt/torch/export/unified_export_hf.py`:
- Around line 193-218: The qkv_only branch currently raises
NotImplementedError("Diffusion only") but leaves a dangling print block that is
unreachable; either implement the qkv_only handling or remove/relocate the dead
prints. Specifically, either (A) implement the QKV fusion logic used to update
fused_count and fused_linears when qkv_only is True (ensuring any weight updates
use fsdp2_aware_weight_update and
preprocess_linear_fusion/fuse_prequant_layernorm as in the non-qkv path), or (B)
remove the trailing conditional that checks fused_count and prints the "Fused
..." / "No QKV groups found ..." messages, or at minimum guard that print block
with if not qkv_only so it only runs when the non-qkv branch executed; update
references to qkv_only, fused_count, fused_linears, and the NotImplementedError
accordingly. (A sketch of the option (B) guard follows after this list.)
- Around line 670-693: The stub _export_diffusers_checkpoint currently raises
NotImplementedError which will break export_hf_checkpoint when passed a
DiffusionPipeline or ModelMixin; either implement the full export logic for
diffusers (handling components, tokenizers, schedulers, dtype conversion,
sharding/max_shard_size, and writing .safetensors.index.json) inside
_export_diffusers_checkpoint, or change export_hf_checkpoint to detect
DiffusionPipeline/ModelMixin and raise a clear, actionable error referencing
_export_diffusers_checkpoint (e.g., "diffusers export not implemented yet; see
_export_diffusers_checkpoint") or guard routing so only supported types reach
this function; update tests or docs accordingly. (A sketch of the clear-error option also follows after this list.)
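
A minimal sketch of the lighter-weight options from both comments; the names qkv_only, fused_count, and model come from the comments above, and the surrounding code is paraphrased rather than copied from the file:

from diffusers import DiffusionPipeline, ModelMixin

# First comment, option (B)-style guard: only report fusion results when the
# non-QKV branch actually ran, so no dead print block follows the qkv_only raise.
if not qkv_only:
    if fused_count:
        print(f"Fused {fused_count} shared-input linear group(s).")
    else:
        print("No QKV groups found; nothing to fuse.")

# Second comment: fail early with an actionable message instead of letting the
# stub's bare NotImplementedError surface from deep inside the export call.
if isinstance(model, (DiffusionPipeline, ModelMixin)):
    raise NotImplementedError(
        "Diffusers export is not implemented yet; see _export_diffusers_checkpoint."
    )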
📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6038451 and c783509.

📒 Files selected for processing (2)
  • examples/diffusers/quantization/quantize.py
  • modelopt/torch/export/unified_export_hf.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/diffusers/quantization/quantize.py (1)
modelopt/torch/export/unified_export_hf.py (1)
  • export_hf_checkpoint (696-767)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (11)
modelopt/torch/export/unified_export_hf.py (6)

25-31: New imports for diffusers support look appropriate.

The imports for Callable and diffusers types (DiffusionPipeline, ModelMixin) are correctly added to support the new diffusers export functionality.


92-154: Well-structured helper function with proper resource cleanup.

The hook management is correctly implemented with a finally block ensuring hooks are always removed. The function properly handles the collection of shared input modules for both layernorms and quantized linear layers.
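
To illustrate the pattern this comment describes (hook registration with guaranteed cleanup in a finally block), here is a self-contained sketch with simplified behavior; the real _collect_shared_input_modules also tracks layernorms and quantized linears, which is not reproduced here:

from collections import defaultdict

import torch
import torch.nn as nn


def collect_shared_input_modules(model: nn.Module, example_input: torch.Tensor):
    """Group linear layers by the identity of the input tensor they receive."""
    groups = defaultdict(list)
    handles = []

    def make_hook(module):
        def hook(_mod, inputs, _output):
            # Modules that see the same tensor in one forward pass share a group.
            groups[id(inputs[0])].append(module)
        return hook

    try:
        for module in model.modules():
            if isinstance(module, nn.Linear):
                handles.append(module.register_forward_hook(make_hook(module)))
        with torch.no_grad():
            model(example_input)
    finally:
        # The point raised in the comment: hooks are removed even if the forward fails.
        for handle in handles:
            handle.remove()
    return dict(groups)

For a block whose forward feeds the same hidden state into separate q/k/v projections, all three projections land in one group, which is the precondition for fusing them later.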


223-300: Clean refactoring to use modular helper functions.

The function has been well-refactored to use _collect_shared_input_modules and _fuse_shared_input_modules, improving code reusability between LLM and diffusion model fusion paths.


488-544: Good extraction of quantized module processing logic.

The function is well-documented and correctly handles FSDP resharding optimization to prevent OOM. The logic for handling both standard linear layers and expert modules is preserved from the original implementation.
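
For illustration, the overall shape of such a processing loop might look like the sketch below; the weight_quantizer attribute check is an assumption, and the FSDP resharding and expert-module handling mentioned above are omitted:

import torch.nn as nn


def process_quantized_modules(model: nn.Module) -> None:
    """Walk the model and post-process each quantized linear layer in place."""
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        # Assumed marker for a quantized layer; modelopt attaches quantizer
        # submodules to converted linears, but the exact attributes may differ.
        if getattr(module, "weight_quantizer", None) is None:
            continue
        # Placeholder for the real export-time work (requantizing weights and
        # recording scales into the checkpoint state dict).
        print(f"exporting quantized layer: {name}")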


547-667: Well-refactored transformer export function.

The function maintains all the original functionality while delegating module processing to the new _process_quantized_modules helper. The comprehensive error handling for various MoE model structures is preserved.


696-724: Function signature and routing logic are well-structured.

The updated export_hf_checkpoint correctly detects diffusers models using isinstance(model, (DiffusionPipeline, ModelMixin)) and routes them appropriately. The docstring clearly documents the new components parameter.

Note: The diffusers routing will currently raise NotImplementedError as flagged in the previous comment.
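
As an illustration only (default values, parameter order, and the helper bodies are assumptions), the routing could look like this:

from diffusers import DiffusionPipeline, ModelMixin


def _export_transformers_checkpoint(model, dtype=None, export_dir="exported_model"):
    """Existing LLM/VLM export path (body not reproduced here)."""


def _export_diffusers_checkpoint(model, export_dir="exported_model", components=None):
    """Diffusers export path; a stub for now, as noted in the review."""
    raise NotImplementedError("Diffusers export is not implemented yet.")


def export_hf_checkpoint(model, dtype=None, export_dir="exported_model", components=None):
    """Route to the architecture-specific exporter based on the model type."""
    if isinstance(model, (DiffusionPipeline, ModelMixin)):
        return _export_diffusers_checkpoint(model, export_dir=export_dir, components=components)
    return _export_transformers_checkpoint(model, dtype=dtype, export_dir=export_dir)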

examples/diffusers/quantization/quantize.py (5)

69-69: Import is correct.

export_hf_checkpoint is properly exported from modelopt.torch.export.


346-370: Consistent addition of hf_ckpt_dir to ExportConfig.

The new field follows the same pattern as onnx_dir, and the validation correctly creates the directory if it doesn't exist.
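
A sketch of a field with create-on-validate behavior, mirroring the onnx_dir pattern; the exact field set and validation hook in the real ExportConfig are assumptions here:

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExportConfig:
    """Subset of the export configuration relevant to checkpoint output paths."""

    onnx_dir: Path | None = None
    hf_ckpt_dir: Path | None = None  # new: HuggingFace-format checkpoint output

    def __post_init__(self) -> None:
        # Create each output directory if it does not already exist.
        for directory in (self.onnx_dir, self.hf_ckpt_dir):
            if directory is not None:
                directory.mkdir(parents=True, exist_ok=True)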


1016-1020: CLI argument follows existing patterns.

The --hf-ckpt-dir argument is consistent with other export directory arguments like --onnx-dir.
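
The wiring would follow the existing pattern, roughly as below; the help text and the str-then-Path conversion are assumptions based on this comment and the next one:

import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument(
    "--hf-ckpt-dir",
    type=str,
    default=None,
    help="Directory in which to save the quantized model as a HuggingFace-format checkpoint.",
)
args = parser.parse_args()

# Converted to Path only when provided, mirroring the other path arguments.
hf_ckpt_dir = Path(args.hf_ckpt_dir) if args.hf_ckpt_dir else None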


1092-1099: Configuration initialization follows existing patterns.

The hf_ckpt_dir is correctly converted to Path when provided, consistent with other path arguments.


1147-1158: Export call placement is appropriate.

The HuggingFace checkpoint export is correctly placed after ONNX export in the workflow. The execution flow is logical.

Note: The underlying NotImplementedError issue was flagged in earlier comments.
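
A hypothetical main-flow excerpt showing that ordering; every name here is assumed for illustration:

quantize_pipeline(pipe, quant_config)                          # assumed helper
if export_config.onnx_dir is not None:
    export_manager.export_onnx(pipe, export_config.onnx_dir)   # assumed method
if export_config.hf_ckpt_dir is not None:
    export_manager.export_hf_ckpt(export_config.hf_ckpt_dir)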


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 14, 2026 20:57
@codecov

codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.23%. Comparing base (db76b1e) to head (d392fb7).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #783   +/-   ##
=======================================
  Coverage   74.23%   74.23%           
=======================================
  Files         192      192           
  Lines       19033    19033           
=======================================
  Hits        14129    14129           
  Misses       4904     4904           

☔ View full report in Codecov by Sentry.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Collaborator

@cjluo-nv cjluo-nv left a comment


The refactoring change LGTM.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@Edwardf0t1
Contributor

Thanks @jingyu-ml for kicking off the support for diffusion ckpt export! 👍 If not already, could we run a few validation tests to ensure the export logic is robust for our supported models? For example:

  • LLM: Qwen3-Next-80B-A3B-Thinking
  • VLM: Nemotron-Nano-12B-v2-VL

@jingyu-ml
Contributor Author

> Thanks @jingyu-ml for kicking off the support for diffusion ckpt export! 👍 If not already, could we run a few validation tests to ensure the export logic is robust for our supported models? For example:
>
>   • LLM: Qwen3-Next-80B-A3B-Thinking
>   • VLM: Nemotron-Nano-12B-v2-VL
This MR is ready. I’ve tested the models you mentioned, as well as a few smaller models locally using NVFP4, and all of them work as expected.

qwen3.moe.txt
nemotron.vlm.v2.txt

@jingyu-ml jingyu-ml requested a review from Edwardf0t1 January 15, 2026 18:59
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


LGTM

@jingyu-ml jingyu-ml merged commit e6e4efd into main Jan 15, 2026
36 checks passed
@jingyu-ml jingyu-ml deleted the jingyux/diffusion.export-refactor-llm branch January 15, 2026 19:49