[1/3] Diffusion ckpt export for NVFP4 & FP8 #781
base: main
Conversation
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

📝 Walkthrough

This PR adds HuggingFace checkpoint export support for quantized diffusion models. It introduces a new CLI option and export configuration field for specifying a checkpoint directory, then extends the unified export module to route and handle diffusion pipeline exports with quantizer management and QKV fusion.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Config as ExportConfig
    participant Manager as ExportManager
    participant Export as export_hf_ckpt()
    participant Router as Route Logic
    participant DiffusionExp as _export_diffusers_checkpoint()
    participant Components as Component Handler

    User->>Config: Pass --hf-ckpt-dir
    Config->>Manager: Create with hf_ckpt_dir set
    Manager->>Export: Call export_hf_ckpt(pipe)
    Export->>Router: Detect model type
    Router->>DiffusionExp: Route DiffusionPipeline
    DiffusionExp->>Components: Extract & process components
    Components->>Components: Hide quantizers
    Components->>Components: Fuse QKV linears
    Components->>Components: Save per-component subdirs
    DiffusionExp->>DiffusionExp: Save model_index.json
    DiffusionExp->>Export: Export complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 1055-1057: The _get_diffusers_components currently raises for
anything not a DiffusionPipeline but _export_diffusers_checkpoint accepts
DiffusionPipeline | ModelMixin; update _get_diffusers_components to also accept
instances of ModelMixin (e.g., a standalone UNet) by detecting isinstance(model,
ModelMixin) and returning a components mapping consistent with what
_export_diffusers_checkpoint expects (for example {'unet': model} or the
appropriate single-component keys used downstream); ensure the DiffusionPipeline
branch behavior is unchanged and that callers handle the returned mapping
uniformly.
- Around line 452-463: The loop over model.named_modules() sets
fsdp_module_to_reshard for each FSDPModule but never reshards the last one
after the loop, leaving it unsharded; after the loop completes add a final check
and call to reshard on fsdp_module_to_reshard (i.e., if fsdp_module_to_reshard
is not None: fsdp_module_to_reshard.reshard()) so the last FSDPModule is
properly resharded; locate symbols model.named_modules, FSDPModule,
fsdp_module_to_reshard, and reshard() to apply the fix.
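A minimal sketch of the first fix, assuming the diffusers imports shown and a `{name: component}` return shape (the keys actually used downstream may differ):

```python
from diffusers import DiffusionPipeline
from diffusers.models.modeling_utils import ModelMixin


def _get_diffusers_components(model):
    """Return a {name: component} mapping for a pipeline or a standalone diffusers model."""
    if isinstance(model, DiffusionPipeline):
        # Unchanged pipeline branch: keep every non-None registered component.
        return {name: comp for name, comp in model.components.items() if comp is not None}
    if isinstance(model, ModelMixin):
        # Standalone model (e.g. a bare UNet or transformer): single-component mapping.
        return {model.__class__.__name__.lower(): model}
    raise ValueError(f"Unsupported model type for diffusers export: {type(model)}")
```

And a sketch of the second fix, resharding the last tracked FSDP module once the loop is done (the per-module body is elided; `FSDPModule`'s import path varies by torch version):

```python
from torch.distributed.fsdp import FSDPModule  # older torch exposes this under torch.distributed._composable.fsdp


def _reshard_after_export(model):
    fsdp_module_to_reshard = None
    for _, module in model.named_modules():
        if isinstance(module, FSDPModule):
            if fsdp_module_to_reshard is not None:
                fsdp_module_to_reshard.reshard()
            fsdp_module_to_reshard = module
        # ... existing per-module weight post-processing ...
    if fsdp_module_to_reshard is not None:
        fsdp_module_to_reshard.reshard()  # the fix: don't leave the final module unsharded
```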
🧹 Nitpick comments (5)
modelopt/torch/export/unified_export_hf.py (5)
82-83: Move import to top of file with other imports.

The `contextmanager` import should be grouped with other imports at the top of the file (around lines 18-27) rather than inserted mid-file.

Suggested fix: move `from contextlib import contextmanager` to the imports section at the top.
954-955: Consider using logging instead of print statements.

The function uses `print()` for debug output, which is inconsistent with the rest of the codebase that uses `warnings.warn()` or could use a logger. This also affects production output.

Suggested approach: replace `print()` calls with `warnings.warn()` for warning-level messages, or consider adding an optional `logger` parameter:

```diff
- print("No quantized linear modules found for QKV fusion.")
+ warnings.warn("No quantized linear modules found for QKV fusion.")
  ...
- print(f"Warning: Unknown model type '{model_class_name}', skipping QKV fusion.")
+ warnings.warn(f"Unknown model type '{model_class_name}', skipping QKV fusion.")
  ...
- print(f"Warning: Failed to run dummy forward for QKV fusion: {e}")
- print("Skipping QKV fusion. Quantization may still work but amax values won't be unified.")
+ warnings.warn(f"Failed to run dummy forward for QKV fusion: {e}. Skipping QKV fusion.")
```

Also applies to: 970-970, 979-980, 1014-1015, 1017-1020
1113-1125: Minor: Step numbering is inconsistent - "Step 2" is missing.

The comments jump from "Step 1" (line 1113) to "Step 3" (line 1125). Consider renumbering for clarity.
1129-1129: Consider using `warnings.warn()` or a logger instead of `print()` statements.

Multiple `print()` calls throughout this function emit status messages. For consistency with the rest of the codebase and to allow users to control output, consider using `warnings.warn()` or passing in a logger.

Also applies to: 1147-1148, 1178-1178, 1190-1190, 1206-1206, 1229-1229
23-23: Unnecessary import of builtin `ValueError`.

`ValueError` is a Python builtin and doesn't need to be imported.

Suggested fix: remove the line `from builtins import ValueError`.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/diffusers/quantization/quantize.py
- modelopt/torch/export/unified_export_hf.py
🧰 Additional context used
🧬 Code graph analysis (2)
examples/diffusers/quantization/quantize.py (1)
- modelopt/torch/export/unified_export_hf.py (1): `export_hf_checkpoint` (1232-1303)
modelopt/torch/export/unified_export_hf.py (6)
- modelopt/torch/export/layer_utils.py (1): `is_quantlinear` (346-348)
- modelopt/torch/export/quant_utils.py (1): `get_quantization_format` (432-533)
- modelopt/torch/quantization/utils.py (1): `fsdp2_aware_weight_update` (689-797)
- modelopt/torch/quantization/conversion.py (1): `set_quantizer_by_cfg_context` (305-327)
- modelopt/torch/export/convert_hf_config.py (1): `convert_hf_quant_config_format` (21-117)
- modelopt/torch/export/plugins/hf_spec_export.py (2): `spec_opt_only` (107-112), `export_spec_ckpt_state_dict` (115-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (13)
modelopt/torch/export/unified_export_hf.py (7)
85-119: LGTM - Context manager for hiding quantizers during save.

The implementation correctly backs up and restores quantizer attributes. Using `delattr`/`setattr` with a try/finally ensures quantizers are restored even if an exception occurs during save.
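For illustration, a minimal sketch of that pattern (the name and the attribute list are assumptions; the helper introduced in this PR is `hide_quantizers_from_state_dict()`):

```python
from contextlib import contextmanager

import torch.nn as nn


@contextmanager
def hide_quantizers(model: nn.Module):
    """Temporarily detach *_quantizer submodules so save_pretrained() writes a plain state dict."""
    backup = {}
    for name, module in model.named_modules():
        for attr in ("input_quantizer", "weight_quantizer", "output_quantizer"):
            quantizer = getattr(module, attr, None)
            if quantizer is not None:
                backup[(name, attr)] = quantizer
                delattr(module, attr)
    try:
        yield model
    finally:
        # Restore quantizers even if the save raised.
        for (name, attr), quantizer in backup.items():
            setattr(model.get_submodule(name), attr, quantizer)
```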
617-816: LGTM - Comprehensive dummy input generation for diffusion models.

Good implementation with defensive coding:
- Uses `isinstance` checks with fallback to string matching when imports fail
- Provides sensible defaults via `getattr` for missing config attributes
- Returns `None` for unsupported models, which the caller handles gracefully
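A much-abbreviated illustration of that shape of logic, covering only a classic UNet (the real helper handles DiT/Flux/SD3 variants as well; the defaults below are assumptions):

```python
import torch


def _make_dummy_inputs(model, device="cuda", dtype=torch.float16):
    """Build minimal inputs for one denoiser forward pass; return None if the model is unsupported."""
    cfg = model.config
    if "UNet2DConditionModel" in model.__class__.__name__:  # string match survives import failures
        size = getattr(cfg, "sample_size", 64)
        return {
            "sample": torch.randn(1, getattr(cfg, "in_channels", 4), size, size, device=device, dtype=dtype),
            "timestep": torch.tensor([1], device=device),
            "encoder_hidden_states": torch.randn(
                1, 77, getattr(cfg, "cross_attention_dim", 768), device=device, dtype=dtype
            ),
        }
    return None  # unsupported model; the caller skips the dummy forward
```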
818-869: LGTM - QKV projection identification logic.

The pattern matching is comprehensive, covering common diffusers naming conventions (`to_q`, `to_k`, `to_v`, etc.) and correctly handles nested module paths.
871-909: LGTM - QKV grouping logic.

Correctly groups QKV projections by parent attention block and distinguishes between main and added (cross-attention) QKV types.
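An abbreviated sketch of the grouping-and-fusion idea (the actual `_fuse_qkv_linears_diffusion()` also handles the added/cross-attention projections and the format-specific scale handling; a writable `amax` on the quantizers is assumed):

```python
from collections import defaultdict

import torch
import torch.nn as nn

QKV_SUFFIXES = ("to_q", "to_k", "to_v")  # common diffusers attention projection names


def _group_qkv_by_attention_block(model: nn.Module):
    """Map each attention-block prefix to its q/k/v linear modules."""
    groups = defaultdict(dict)
    for name, module in model.named_modules():
        parent, _, leaf = name.rpartition(".")
        if leaf in QKV_SUFFIXES and isinstance(module, nn.Linear):
            groups[parent][leaf] = module
    return {prefix: mods for prefix, mods in groups.items() if len(mods) == 3}


def _unify_qkv_input_amax(groups):
    """Give q/k/v within one block the same input amax so their activation scales agree."""
    for mods in groups.values():
        quantizers = [m.input_quantizer for m in mods.values() if getattr(m, "input_quantizer", None) is not None]
        amaxes = [q.amax for q in quantizers if q.amax is not None]
        if not amaxes:
            continue
        unified = torch.stack([a.reshape(-1).max() for a in amaxes]).max()
        for q in quantizers:
            q.amax = unified.clone()
```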
1059-1072: LGTM - Simple quantization check.

Clean implementation using a generator expression with `any()` for early termination.
1074-1086: LGTM - dtype inference with sensible default.

Returns the dtype of the first parameter found, with a reasonable `float16` fallback for edge cases (e.g., models with no parameters).
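Both helpers reviewed here are small; sketches of the two patterns (names assumed):

```python
import torch
import torch.nn as nn


def _is_quantized(model: nn.Module) -> bool:
    """True if any submodule carries a quantizer; any() stops at the first hit."""
    return any(
        getattr(m, "input_quantizer", None) is not None or getattr(m, "weight_quantizer", None) is not None
        for m in model.modules()
    )


def _infer_dtype(model: nn.Module) -> torch.dtype:
    """dtype of the first parameter, falling back to float16 for parameter-free models."""
    return next((p.dtype for p in model.parameters()), torch.float16)
```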
1232-1261: LGTM - Clean routing between diffusers and transformers export.

The updated public API correctly routes to the appropriate export function based on model type. The `components` parameter documentation clearly states it's only for diffusers pipelines.

examples/diffusers/quantization/quantize.py (6)
69-69: LGTM - Import follows existing pattern.

The import is correctly placed with other modelopt imports.
352-352: LGTM - ExportConfig extension follows existing patterns.

The `hf_ckpt_dir` field and its validation mirror the existing `onnx_dir` handling.

Also applies to: 368-370
870-883: LGTM - Method follows existing ExportManager patterns.

Clean implementation that mirrors other export methods like `save_checkpoint` and `export_onnx`.
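Roughly what that wiring could look like; the field, method, and validation details here are inferred from the review rather than copied from quantize.py:

```python
from dataclasses import dataclass
from pathlib import Path

from modelopt.torch.export import export_hf_checkpoint


@dataclass
class ExportConfig:
    onnx_dir: Path | None = None
    hf_ckpt_dir: Path | None = None  # new: destination for the HuggingFace checkpoint

    def __post_init__(self):
        # Mirror the onnx_dir handling: normalize to Path and create the directory up front.
        if self.hf_ckpt_dir is not None:
            self.hf_ckpt_dir = Path(self.hf_ckpt_dir)
            self.hf_ckpt_dir.mkdir(parents=True, exist_ok=True)


class ExportManager:
    def __init__(self, config: ExportConfig, pipe):
        self.config = config
        self.pipe = pipe

    def export_hf_ckpt(self) -> None:
        if self.config.hf_ckpt_dir is None:
            return
        export_hf_checkpoint(self.pipe, export_dir=self.config.hf_ckpt_dir)
```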
1016-1020: LGTM - CLI argument follows existing conventions.

The `--hf-ckpt-dir` argument is consistent with the existing `--onnx-dir` pattern.
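The flag itself is just another optional path argument next to `--onnx-dir`; a sketch with guessed help text:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--onnx-dir",
    type=str,
    default=None,
    help="Directory to export the quantized model to ONNX.",
)
parser.add_argument(
    "--hf-ckpt-dir",
    type=str,
    default=None,
    help="Directory to save the quantized pipeline as a HuggingFace checkpoint.",
)
args = parser.parse_args()
```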
1097-1097: LGTM - Config construction follows existing pattern.

Correctly handles the optional `hf_ckpt_dir` argument.
1153-1155: LGTM - HF checkpoint export integrated at appropriate point in flow.

Placed after ONNX export, following the logical export sequence.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #781   +/-   ##
=======================================
  Coverage   74.23%   74.23%
=======================================
  Files         192      192
  Lines       19033    19033
=======================================
  Hits        14129    14129
  Misses       4904     4904
=======================================
```

☔ View full report in Codecov by Sentry.
See #781. This is the MR that only includes the refactoring of the LLM export; please ignore the change to quantize.py from the diffusion example.

Summary by CodeRabbit

Release Notes

* **New Features**
  * Added `--hf-ckpt-dir` CLI option to save checkpoints in HuggingFace format
  * Enabled support for exporting Diffusers-based pipelines
  * Unified export system now handles both transformer and diffusion model architectures

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
What does this PR do?
Type of change: New feature
Overview:
This PR adds support for exporting quantized diffusers models (DiT, Flux, SD3, UNet, etc.) to HuggingFace checkpoint format, enabling deployment to inference frameworks like SGLang, vLLM, and TensorRT-LLM.
Changes
New file: diffusers_utils.py
- `hide_quantizers_from_state_dict()` context manager for clean saves

Refactored: unified_export_hf.py
- `_fuse_qkv_linears_diffusion()` for QKV amax fusion
- `_export_diffusers_checkpoint()` to export full pipelines (models + tokenizers + schedulers, etc.)

Plans
Usage
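The Usage section is left empty in the PR body; a plausible end-to-end sketch in Python (the model name, quantization config, and prompt are placeholders, and the example script exposes the same flow via `--hf-ckpt-dir`):

```python
import torch
from diffusers import DiffusionPipeline

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")


def forward_loop(unet):
    # Calibrate by running a few prompts through the full pipeline.
    pipe("a photo of an astronaut riding a horse", num_inference_steps=4)


# FP8 shown here; NVFP4 uses the corresponding config.
mtq.quantize(pipe.unet, mtq.FP8_DEFAULT_CFG, forward_loop)

# Export the whole pipeline (models + tokenizers + schedulers) as an HF checkpoint.
export_hf_checkpoint(pipe, export_dir="./sdxl-fp8-hf-ckpt")
```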
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
- `--hf-ckpt-dir` CLI argument for specifying checkpoint export destination