
Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Jan 14, 2026

See #781

This MR includes only the refactoring of the LLM export; please ignore the change to quantize.py from the diffusion example.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added --hf-ckpt-dir CLI option to save checkpoints in HuggingFace format
    • Enabled support for exporting Diffusers-based pipelines
    • Unified export system now handles both transformer and diffusion model architectures


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 14, 2026 20:49
@coderabbitai
Contributor

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


📝 Walkthrough


This PR adds HuggingFace checkpoint export functionality for quantized Diffusers models. It introduces new helper functions for module collection and fusion, adds diffusers pipeline routing in the export framework, and integrates HuggingFace checkpoint export into the quantization workflow via a new CLI argument.

Changes

Quantization Workflow Integration: examples/diffusers/quantization/quantize.py
Added an optional hf_ckpt_dir field to the ExportConfig dataclass with directory validation. Introduced an export_hf_ckpt() method on ExportManager. Wired the new --hf-ckpt-dir CLI argument through argument parsing into the main quantization flow, invoking the export after the ONNX export steps. (A sketch of the new wrapper follows after this table.)
Export Framework Enhancement: modelopt/torch/export/unified_export_hf.py
Added helper functions: _collect_shared_input_modules() for gathering modules with shared inputs, _fuse_shared_input_modules() for the fusion step, and _process_quantized_modules() for quantized module iteration and export. Introduced _export_diffusers_checkpoint() (currently raising NotImplementedError) and _export_transformers_checkpoint() for model-type-specific export routing. Enhanced export_hf_checkpoint() with diffusers detection and routing, and extended its signature with an optional components parameter. Added imports for Callable, DiffusionPipeline, and ModelMixin.
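
For the quantize.py cohort above, here is a minimal sketch of what the new ExportManager.export_hf_ckpt() wrapper might look like; the constructor, attribute names, and keyword arguments are assumptions, not the actual implementation:

from pathlib import Path

from modelopt.torch.export import export_hf_checkpoint


class ExportManager:
    """Illustrative subset of the example's export manager; the real class holds more state."""

    def __init__(self, pipe):
        self.pipe = pipe  # the quantized diffusers pipeline

    def export_hf_ckpt(self, hf_ckpt_dir: Path) -> None:
        # Assumed call shape; the real method may pass additional components
        # (tokenizers, schedulers) or different keyword arguments.
        export_hf_checkpoint(self.pipe, export_dir=hf_ckpt_dir)

In the example script this wrapper would run only when --hf-ckpt-dir is provided, after the ONNX export step.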

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant Main as Main Flow
    participant ExportMgr as ExportManager
    participant Export as export_hf_checkpoint
    participant DiffExp as _export_diffusers_checkpoint

    User->>CLI: Provide --hf-ckpt-dir argument
    CLI->>Main: Parse args to ExportConfig
    Main->>Main: Quantize model
    Main->>ExportMgr: export_hf_ckpt(pipe, hf_ckpt_dir)
    ExportMgr->>Export: export_hf_checkpoint(pipe, ...)
    Export->>Export: Detect DiffusionPipeline type
    Export->>DiffExp: Route to diffusers export
    DiffExp-->>Export: Process checkpoint (NotImplemented)
    Export-->>ExportMgr: Complete
    ExportMgr-->>Main: Export finished

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change, adding diffusion checkpoint export support for quantized models (NVFP4 and FP8 formats), which aligns with the primary purpose of the changeset across both files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, which meets the required threshold of 80.00%.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `modelopt/torch/export/unified_export_hf.py`:
- Around line 193-218: The qkv_only branch currently raises
NotImplementedError("Diffusion only") but leaves a dangling print block that is
unreachable; either implement the qkv_only handling or remove/relocate the dead
prints. Specifically, either (A) implement the QKV fusion logic used to update
fused_count and fused_linears when qkv_only is True (ensuring any weight updates
use fsdp2_aware_weight_update and
preprocess_linear_fusion/fuse_prequant_layernorm as in the non-qkv path), or (B)
remove the trailing conditional that checks fused_count and prints the "Fused
..." / "No QKV groups found ..." messages, or at minimum guard that print block
with if not qkv_only so it only runs when the non-qkv branch executed; update
references to qkv_only, fused_count, fused_linears, and the NotImplementedError
accordingly. (A sketch of the option (B) guard follows after this list.)
- Around line 670-693: The stub _export_diffusers_checkpoint currently raises
NotImplementedError which will break export_hf_checkpoint when passed a
DiffusionPipeline or ModelMixin; either implement the full export logic for
diffusers (handling components, tokenizers, schedulers, dtype conversion,
sharding/max_shard_size, and writing .safetensors.index.json) inside
_export_diffusers_checkpoint, or change export_hf_checkpoint to detect
DiffusionPipeline/ModelMixin and raise a clear, actionable error referencing
_export_diffusers_checkpoint (e.g., "diffusers export not implemented yet; see
_export_diffusers_checkpoint") or guard routing so only supported types reach
this function; update tests or docs accordingly. (A sketch of the clear-error option also follows after this list.)
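
A minimal sketch of the lighter-weight options from both comments; the names qkv_only, fused_count, and model come from the comments above, and the surrounding code is paraphrased rather than copied from the file:

from diffusers import DiffusionPipeline, ModelMixin

# First comment, option (B)-style guard: only report fusion results when the
# non-QKV branch actually ran, so no dead print block follows the qkv_only raise.
if not qkv_only:
    if fused_count:
        print(f"Fused {fused_count} shared-input linear group(s).")
    else:
        print("No QKV groups found; nothing to fuse.")

# Second comment: fail early with an actionable message instead of letting the
# stub's bare NotImplementedError surface from deep inside the export call.
if isinstance(model, (DiffusionPipeline, ModelMixin)):
    raise NotImplementedError(
        "Diffusers export is not implemented yet; see _export_diffusers_checkpoint."
    )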
📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6038451 and c783509.

📒 Files selected for processing (2)
  • examples/diffusers/quantization/quantize.py
  • modelopt/torch/export/unified_export_hf.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/diffusers/quantization/quantize.py (1)
modelopt/torch/export/unified_export_hf.py (1)
  • export_hf_checkpoint (696-767)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (11)
modelopt/torch/export/unified_export_hf.py (6)

25-31: New imports for diffusers support look appropriate.

The imports for Callable and diffusers types (DiffusionPipeline, ModelMixin) are correctly added to support the new diffusers export functionality.


92-154: Well-structured helper function with proper resource cleanup.

The hook management is correctly implemented with a finally block ensuring hooks are always removed. The function properly handles the collection of shared input modules for both layernorms and quantized linear layers.
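
To illustrate the pattern this comment describes (hook registration with guaranteed cleanup in a finally block), here is a self-contained sketch with simplified behavior; the real _collect_shared_input_modules also tracks layernorms and quantized linears, which is not reproduced here:

from collections import defaultdict

import torch
import torch.nn as nn


def collect_shared_input_modules(model: nn.Module, example_input: torch.Tensor):
    """Group linear layers by the identity of the input tensor they receive."""
    groups = defaultdict(list)
    handles = []

    def make_hook(module):
        def hook(_mod, inputs, _output):
            # Modules that see the same tensor in one forward pass share a group.
            groups[id(inputs[0])].append(module)
        return hook

    try:
        for module in model.modules():
            if isinstance(module, nn.Linear):
                handles.append(module.register_forward_hook(make_hook(module)))
        with torch.no_grad():
            model(example_input)
    finally:
        # The point raised in the comment: hooks are removed even if the forward fails.
        for handle in handles:
            handle.remove()
    return dict(groups)

For a block whose forward feeds the same hidden state into separate q/k/v projections, all three projections land in one group, which is the precondition for fusing them later.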


223-300: Clean refactoring to use modular helper functions.

The function has been well-refactored to use _collect_shared_input_modules and _fuse_shared_input_modules, improving code reusability between LLM and diffusion model fusion paths.


488-544: Good extraction of quantized module processing logic.

The function is well-documented and correctly handles FSDP resharding optimization to prevent OOM. The logic for handling both standard linear layers and expert modules is preserved from the original implementation.
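
For illustration, the overall shape of such a processing loop might look like the sketch below; the weight_quantizer attribute check is an assumption, and the FSDP resharding and expert-module handling mentioned above are omitted:

import torch.nn as nn


def process_quantized_modules(model: nn.Module) -> None:
    """Walk the model and post-process each quantized linear layer in place."""
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        # Assumed marker for a quantized layer; modelopt attaches quantizer
        # submodules to converted linears, but the exact attributes may differ.
        if getattr(module, "weight_quantizer", None) is None:
            continue
        # Placeholder for the real export-time work (requantizing weights and
        # recording scales into the checkpoint state dict).
        print(f"exporting quantized layer: {name}")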


547-667: Well-refactored transformer export function.

The function maintains all the original functionality while delegating module processing to the new _process_quantized_modules helper. The comprehensive error handling for various MoE model structures is preserved.


696-724: Function signature and routing logic are well-structured.

The updated export_hf_checkpoint correctly detects diffusers models using isinstance(model, (DiffusionPipeline, ModelMixin)) and routes them appropriately. The docstring clearly documents the new components parameter.

Note: The diffusers routing will currently raise NotImplementedError as flagged in the previous comment.
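
As an illustration only (default values, parameter order, and the helper bodies are assumptions), the routing could look like this:

from diffusers import DiffusionPipeline, ModelMixin


def _export_transformers_checkpoint(model, dtype=None, export_dir="exported_model"):
    """Existing LLM/VLM export path (body not reproduced here)."""


def _export_diffusers_checkpoint(model, export_dir="exported_model", components=None):
    """Diffusers export path; a stub for now, as noted in the review."""
    raise NotImplementedError("Diffusers export is not implemented yet.")


def export_hf_checkpoint(model, dtype=None, export_dir="exported_model", components=None):
    """Route to the architecture-specific exporter based on the model type."""
    if isinstance(model, (DiffusionPipeline, ModelMixin)):
        return _export_diffusers_checkpoint(model, export_dir=export_dir, components=components)
    return _export_transformers_checkpoint(model, dtype=dtype, export_dir=export_dir)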

examples/diffusers/quantization/quantize.py (5)

69-69: Import is correct.

export_hf_checkpoint is properly exported from modelopt.torch.export.


346-370: Consistent addition of hf_ckpt_dir to ExportConfig.

The new field follows the same pattern as onnx_dir, and the validation correctly creates the directory if it doesn't exist.
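
A sketch of a field with create-on-validate behavior, mirroring the onnx_dir pattern; the exact field set and validation hook in the real ExportConfig are assumptions here:

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExportConfig:
    """Subset of the export configuration relevant to checkpoint output paths."""

    onnx_dir: Path | None = None
    hf_ckpt_dir: Path | None = None  # new: HuggingFace-format checkpoint output

    def __post_init__(self) -> None:
        # Create each output directory if it does not already exist.
        for directory in (self.onnx_dir, self.hf_ckpt_dir):
            if directory is not None:
                directory.mkdir(parents=True, exist_ok=True)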


1016-1020: CLI argument follows existing patterns.

The --hf-ckpt-dir argument is consistent with other export directory arguments like --onnx-dir.
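
The wiring would follow the existing pattern, roughly as below; the help text and the str-then-Path conversion are assumptions based on this comment and the next one:

import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument(
    "--hf-ckpt-dir",
    type=str,
    default=None,
    help="Directory in which to save the quantized model as a HuggingFace-format checkpoint.",
)
args = parser.parse_args()

# Converted to Path only when provided, mirroring the other path arguments.
hf_ckpt_dir = Path(args.hf_ckpt_dir) if args.hf_ckpt_dir else None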


1092-1099: Configuration initialization follows existing patterns.

The hf_ckpt_dir is correctly converted to Path when provided, consistent with other path arguments.


1147-1158: Export call placement is appropriate.

The HuggingFace checkpoint export is correctly placed after ONNX export in the workflow. The execution flow is logical.

Note: The underlying NotImplementedError issue was flagged in earlier comments.
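
A hypothetical main-flow excerpt showing that ordering; every name here is assumed for illustration:

quantize_pipeline(pipe, quant_config)                          # assumed helper
if export_config.onnx_dir is not None:
    export_manager.export_onnx(pipe, export_config.onnx_dir)   # assumed method
if export_config.hf_ckpt_dir is not None:
    export_manager.export_hf_ckpt(export_config.hf_ckpt_dir)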


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 14, 2026 20:57
@codecov

codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.23%. Comparing base (db76b1e) to head (d392fb7).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #783   +/-   ##
=======================================
  Coverage   74.23%   74.23%           
=======================================
  Files         192      192           
  Lines       19033    19033           
=======================================
  Hits        14129    14129           
  Misses       4904     4904           

☔ View full report in Codecov by Sentry.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Collaborator

@cjluo-nv cjluo-nv left a comment


The refactoring change LGTM.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@Edwardf0t1
Contributor

Thanks @jingyu-ml for kicking off the support for diffusion ckpt export! 👍 If not already, could we run a few validation tests to ensure the export logic is robust for our supported models? For example:

  • LLM: Qwen3-Next-80B-A3B-Thinking
  • VLM: Nemotron-Nano-12B-v2-VL

@jingyu-ml
Contributor Author

> Thanks @jingyu-ml for kicking off the support for diffusion ckpt export! 👍 If not already, could we run a few validation tests to ensure the export logic is robust for our supported models? For example:
>
>   • LLM: Qwen3-Next-80B-A3B-Thinking
>   • VLM: Nemotron-Nano-12B-v2-VL
This MR is ready. I’ve tested the models you mentioned, as well as a few smaller models locally using NVFP4, and all of them work as expected.

qwen3.moe.txt
nemotron.vlm.v2.txt

@jingyu-ml jingyu-ml requested a review from Edwardf0t1 January 15, 2026 18:59
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


LGTM

@jingyu-ml jingyu-ml merged commit e6e4efd into main Jan 15, 2026
36 checks passed
@jingyu-ml jingyu-ml deleted the jingyux/diffusion.export-refactor-llm branch January 15, 2026 19:49