[0.5/3] Diffusion ckpt export for NVFP4 & FP8 #783
Conversation
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
📝 Walkthrough

This PR adds HuggingFace checkpoint export functionality for quantized Diffusers models. It introduces new helper functions for module collection and fusion, adds diffusers pipeline routing in the export framework, and integrates HuggingFace checkpoint export into the quantization workflow via a new CLI argument.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant Main as Main Flow
    participant ExportMgr as ExportManager
    participant Export as export_hf_checkpoint
    participant DiffExp as _export_diffusers_checkpoint
    User->>CLI: Provide --hf-ckpt-dir argument
    CLI->>Main: Parse args to ExportConfig
    Main->>Main: Quantize model
    Main->>ExportMgr: export_hf_ckpt(pipe, hf_ckpt_dir)
    ExportMgr->>Export: export_hf_checkpoint(pipe, ...)
    Export->>Export: Detect DiffusionPipeline type
    Export->>DiffExp: Route to diffusers export
    DiffExp-->>Export: Process checkpoint (NotImplemented)
    Export-->>ExportMgr: Complete
    ExportMgr-->>Main: Export finished
```
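The dispatch at the center of the diagram is a small `isinstance` check. A minimal sketch, assuming the names shown in the diagram and walkthrough (not the exact code in `unified_export_hf.py`):

```python
# Sketch of the routing step from the diagram above; the signature and
# the private helper name are assumptions based on the walkthrough.
from diffusers import DiffusionPipeline, ModelMixin


def export_hf_checkpoint(model, dtype=None, export_dir="exported_model", components=None):
    if isinstance(model, (DiffusionPipeline, ModelMixin)):
        # Diffusers pipelines/models take the dedicated export path.
        return _export_diffusers_checkpoint(model, dtype, export_dir, components)
    # ... existing transformers (LLM) export path continues here ...
```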
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@modelopt/torch/export/unified_export_hf.py`:
- Around lines 193-218: The `qkv_only` branch currently raises `NotImplementedError("Diffusion only")` but leaves a dangling print block behind it that is unreachable; either implement the `qkv_only` handling or remove/relocate the dead prints. Specifically, either (A) implement the QKV fusion logic that updates `fused_count` and `fused_linears` when `qkv_only` is True (ensuring any weight updates use `fsdp2_aware_weight_update` and `preprocess_linear_fusion`/`fuse_prequant_layernorm` as in the non-qkv path), or (B) remove the trailing conditional that checks `fused_count` and prints the "Fused ..." / "No QKV groups found ..." messages, or at minimum guard that print block with `if not qkv_only` so it only runs when the non-qkv branch executed; update references to `qkv_only`, `fused_count`, `fused_linears`, and the `NotImplementedError` accordingly (see the guard sketch after this list).
- Around lines 670-693: The stub `_export_diffusers_checkpoint` currently raises `NotImplementedError`, which will break `export_hf_checkpoint` when passed a `DiffusionPipeline` or `ModelMixin`; either implement the full export logic for diffusers (handling components, tokenizers, schedulers, dtype conversion, sharding/`max_shard_size`, and writing `.safetensors.index.json`) inside `_export_diffusers_checkpoint`, or change `export_hf_checkpoint` to detect `DiffusionPipeline`/`ModelMixin` and raise a clear, actionable error referencing `_export_diffusers_checkpoint` (e.g., "diffusers export not implemented yet; see _export_diffusers_checkpoint"), or guard the routing so only supported types reach this function; update tests or docs accordingly (see the error sketch after this list).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/diffusers/quantization/quantize.py
- modelopt/torch/export/unified_export_hf.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/diffusers/quantization/quantize.py (1)
modelopt/torch/export/unified_export_hf.py (1)
`export_hf_checkpoint` (696-767)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (11)
modelopt/torch/export/unified_export_hf.py (6)
25-31: New imports for diffusers support look appropriate. The imports for `Callable` and the diffusers types (`DiffusionPipeline`, `ModelMixin`) are correctly added to support the new diffusers export functionality.
92-154: Well-structured helper function with proper resource cleanup. The hook management is correctly implemented with a `finally` block ensuring hooks are always removed. The function properly handles the collection of shared-input modules for both layernorms and quantized linear layers.
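A generic illustration of the try/finally hook lifecycle being praised here; the module filtering and grouping key are hypothetical, not the helper's actual logic:

```python
# Illustrative only: groups Linear/LayerNorm modules by the identity of
# their first input tensor; real grouping logic in the PR may differ.
import torch
import torch.nn as nn


def collect_shared_input_modules(model: nn.Module, sample_input: torch.Tensor):
    shared_inputs: dict[int, list[str]] = {}
    handles = []

    def make_hook(name: str):
        def hook(module, inputs, output):
            # Modules fed by the same activation land in the same group.
            shared_inputs.setdefault(id(inputs[0]), []).append(name)
        return hook

    try:
        for name, module in model.named_modules():
            if isinstance(module, (nn.Linear, nn.LayerNorm)):
                handles.append(module.register_forward_hook(make_hook(name)))
        model(sample_input)  # one forward pass to trigger the hooks
    finally:
        for h in handles:
            h.remove()  # hooks are removed even if the forward pass raises
    return shared_inputs
```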
223-300: Clean refactoring to use modular helper functions. The function has been well refactored to use `_collect_shared_input_modules` and `_fuse_shared_input_modules`, improving code reuse between the LLM and diffusion model fusion paths.
488-544: Good extraction of quantized module processing logic. The function is well documented and correctly handles the FSDP resharding optimization to prevent OOM. The logic for handling both standard linear layers and expert modules is preserved from the original implementation.
547-667: Well-refactored transformer export function. The function maintains all the original functionality while delegating module processing to the new `_process_quantized_modules` helper. The comprehensive error handling for various MoE model structures is preserved.
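For orientation, the general shape of such a per-module pass; the attribute check below is hypothetical and not modelopt's actual API, and the real helper additionally handles FSDP resharding and MoE expert modules:

```python
# Hypothetical outline only: "weight_quantizer" is an illustrative
# attribute name; the real _process_quantized_modules does more.
def process_quantized_modules(model):
    for name, module in model.named_modules():
        if getattr(module, "weight_quantizer", None) is not None:
            # Fold/compress the module's quantized weights for export here.
            ...
```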
696-724: Function signature and routing logic are well-structured. The updated `export_hf_checkpoint` correctly detects diffusers models using `isinstance(model, (DiffusionPipeline, ModelMixin))` and routes them appropriately. The docstring clearly documents the new `components` parameter. Note: the diffusers routing will currently raise `NotImplementedError`, as flagged in the previous comment.

examples/diffusers/quantization/quantize.py (5)
69-69: Import is correct. `export_hf_checkpoint` is properly exported from `modelopt.torch.export`.
346-370: Consistent addition of `hf_ckpt_dir` to `ExportConfig`. The new field follows the same pattern as `onnx_dir`, and the validation correctly creates the directory if it doesn't exist.
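A sketch of the pattern described, assuming a dataclass-style config; field and hook names are assumptions rather than the file's actual definitions:

```python
# Illustrative config shape: hf_ckpt_dir sits next to onnx_dir, and
# validation creates the target directory if it doesn't exist.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExportConfig:
    onnx_dir: Path | None = None
    hf_ckpt_dir: Path | None = None

    def __post_init__(self):
        if self.hf_ckpt_dir is not None:
            self.hf_ckpt_dir.mkdir(parents=True, exist_ok=True)
```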
1016-1020: CLI argument follows existing patterns. The `--hf-ckpt-dir` argument is consistent with other export directory arguments like `--onnx-dir`.
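Typical argparse wiring for such a flag; the help text is an assumption, not the example script's actual wording:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--hf-ckpt-dir",
    type=str,
    default=None,
    help="Directory to save the quantized checkpoint in HuggingFace format.",
)
```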
1092-1099: Configuration initialization follows existing patterns. `hf_ckpt_dir` is correctly converted to `Path` when provided, consistent with other path arguments.
1147-1158: Export call placement is appropriate. The HuggingFace checkpoint export is correctly placed after the ONNX export in the workflow, and the execution flow is logical. Note: the underlying `NotImplementedError` issue was flagged in earlier comments.
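A sketch of the ordering described: ONNX export first, then the optional HF checkpoint export. The function and field names are assumptions based on the walkthrough, not the exact code in quantize.py:

```python
# Hypothetical call-site ordering in the quantization workflow.
if config.onnx_dir is not None:
    export_onnx_model(...)  # existing ONNX export step (name assumed)
if config.hf_ckpt_dir is not None:
    export_hf_checkpoint(pipe, export_dir=config.hf_ckpt_dir)
```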
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #783   +/-   ##
=======================================
  Coverage   74.23%   74.23%
=======================================
  Files         192      192
  Lines       19033    19033
=======================================
  Hits        14129    14129
  Misses       4904     4904
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
cjluo-nv left a comment
The refactoring change LGTM
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Thanks @jingyu-ml for kicking off the support for diffusion ckpt export! 👍 If not already, could we run a few validation tests to ensure the export logic is robust for our supported models? For example:
This MR is ready. I’ve tested the models you mentioned, as well as a few smaller models locally using NVFP4, and all of them work as expected.
Edwardf0t1 left a comment
LGTM
See #781
This is the MR that includes only the refactoring of the LLM export; please ignore the change to quantize.py from the diffusion example.
Summary by CodeRabbit

Release Notes

- `--hf-ckpt-dir` CLI option to save checkpoints in HuggingFace format