Add NeMo Conversion Scripts to Puzzletron #784
Conversation
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
📝 Walkthrough

Adds comprehensive support for converting Puzzletron Llama-Nemotron models between HuggingFace and NeMo formats. Introduces configuration classes, bidirectional model importers/exporters, heterogeneous transformer layer specifications, conversion utilities, and command-line tools enabling end-to-end model conversion workflows with knowledge distillation support.
Sequence Diagrams

HF → NeMo (convert_hf_to_nemo.py):

```mermaid
sequenceDiagram
    actor User
    participant CLI as convert_hf_to_nemo.py
    participant NeMo as NeMo Framework
    participant FSys as Filesystem
    User->>CLI: invoke with HF model path
    CLI->>FSys: validate input HF model exists
    CLI->>FSys: ensure output directory parent exists
    CLI->>NeMo: initialize PuzzletronLlamaNemotronModel config
    CLI->>NeMo: call import_ckpt(source=HF_path)
    NeMo->>NeMo: map HF weights to NeMo format<br/>apply Puzzletron transforms
    NeMo->>FSys: write converted NeMo model
    CLI->>User: return output path
```
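For orientation, here is a minimal Python sketch of the HF → NeMo direction. It assumes NeMo 2.0's `llm.import_ckpt` API and the class/module names introduced in this PR (`PuzzletronLlamaNemotronModel`, `PuzzletronNemotronModelConfig`); the actual argument handling in `convert_hf_to_nemo.py` may differ.

```python
# Hedged sketch of the HF -> NeMo import flow shown above; paths are illustrative
# and the exact wiring in convert_hf_to_nemo.py may differ.
from pathlib import Path

from nemo.collections import llm

from modelopt.torch.puzzletron.export.MCore.llama_nemotron import (
    PuzzletronLlamaNemotronModel,
    PuzzletronNemotronModelConfig,
)


def hf_to_nemo(hf_model_dir: str, output_ckpt_path: str) -> Path:
    # Build the NeMo model wrapper from an *instance* of the Puzzletron config.
    model = PuzzletronLlamaNemotronModel(config=PuzzletronNemotronModelConfig())
    # import_ckpt maps the HF weights into the NeMo checkpoint layout.
    return llm.import_ckpt(
        model=model,
        source=f"hf://{hf_model_dir}",
        output_path=Path(output_ckpt_path),
    )
```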
NeMo → HF (convert_nemo_to_hf.py):

```mermaid
sequenceDiagram
    actor User
    participant CLI as convert_nemo_to_hf.py
    participant NeMo as NeMo Framework
    participant Config as Config Converter
    participant FSys as Filesystem
    User->>CLI: invoke with NeMo model path
    CLI->>FSys: validate input NeMo model exists
    CLI->>FSys: ensure output directory exists
    CLI->>NeMo: instantiate NeMo model from checkpoint
    CLI->>Config: convert_nemo_config_to_hf_decilm_config()
    Config->>Config: adapt attention/MLP configs<br/>preserve metadata for round-trip
    CLI->>NeMo: call export_ckpt(target="hf")
    NeMo->>NeMo: map NeMo weights to HF format<br/>apply inverse Puzzletron transforms
    NeMo->>FSys: write converted HF model
    CLI->>FSys: embed chat template in tokenizer config
    CLI->>User: return output path
```
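And the reverse direction, again as a hedged sketch assuming NeMo 2.0's `llm.export_ckpt`; the PR's `convert_nemo_to_hf.py` additionally converts the config via `convert_nemo_config_to_hf_decilm_config()` and embeds the chat template, which is omitted here.

```python
# Hedged sketch of the NeMo -> HF export flow shown above; the DeciLM config
# conversion and chat-template embedding from the diagram are omitted.
from pathlib import Path

from nemo.collections import llm


def nemo_to_hf(nemo_ckpt_path: str, output_ckpt_path: str) -> Path:
    return llm.export_ckpt(
        path=Path(nemo_ckpt_path),
        target="hf",  # matches export_ckpt(target="hf") in the diagram
        output_path=Path(output_ckpt_path),
    )
```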
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
Actionable comments posted: 16
🤖 Fix all issues with AI agents
In `@examples/puzzletron/nemo_export/convert_hf_to_nemo.py`:
- Line 45: The code passes the PuzzletronNemotronModelConfig class to
PuzzletronLlamaNemotronModel instead of an instance; change the call to
instantiate the config (use PuzzletronNemotronModelConfig() when constructing
PuzzletronLlamaNemotronModel) so the config parameter receives an object
instance compatible with the __init__ signature of PuzzletronLlamaNemotronModel.
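A sketch of the suggested fix (the class names come from this PR; the full `__init__` signature, e.g. a `tokenizer` argument, is an assumption):

```python
from modelopt.torch.puzzletron.export.MCore.llama_nemotron import (
    PuzzletronLlamaNemotronModel,
    PuzzletronNemotronModelConfig,
)

# Before: the class object itself was passed as the config.
# model = PuzzletronLlamaNemotronModel(PuzzletronNemotronModelConfig)

# After: instantiate the config so __init__ receives a config object.
model = PuzzletronLlamaNemotronModel(config=PuzzletronNemotronModelConfig())
```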
- Around line 16-21: The file has a duplicate import of Path and an unused
timedelta import; remove the second "from pathlib import Path" and delete "from
datetime import timedelta" (and also remove "from typing import Any" if Any is
unused in the rest of convert_hf_to_nemo.py) so imports are unique and only
include symbols actually referenced by functions like any conversion helpers in
this module.
- Around line 90-91: The os.makedirs call using
os.path.dirname(args.output_ckpt_path) can receive an empty string if
output_ckpt_path is a bare filename; change the logic around the directory
creation in convert_hf_to_nemo.py to compute dirpath =
os.path.dirname(args.output_ckpt_path) and only call os.makedirs when dirpath is
non-empty (or use dirpath = dirpath or '.' and call os.makedirs(dirpath,
exist_ok=True)); update the code that creates the output directory (the
os.makedirs line) accordingly to avoid passing an empty string to os.makedirs.
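A minimal sketch of the guard as a small helper (the helper name is illustrative; in the script it would be called with `args.output_ckpt_path`). The same pattern addresses the identical issue noted below for `convert_nemo_to_hf.py`:

```python
import os


def ensure_parent_dir(output_ckpt_path: str) -> None:
    """Create the parent directory only when the path actually has one."""
    dirpath = os.path.dirname(output_ckpt_path)
    if dirpath:  # empty string when output_ckpt_path is a bare filename
        os.makedirs(dirpath, exist_ok=True)
    # Equivalent one-liner: os.makedirs(dirpath or ".", exist_ok=True)
```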
- Around line 28-32: The imports HFLlamaNemotronImporter,
Llama33NemotronSuper49BConfig, and LlamaNemotronModel from
nemo.collections.llm.gpt.model.llama_nemotron are unused; remove these three
symbols from the import list in convert_hf_to_nemo.py (or replace the
multi-import with only the actually used symbols) to eliminate unused-import
warnings and dead code. If any of these are intended for future use, add a short
comment near their import or reference them explicitly where needed (e.g., in
functions that perform conversion) to avoid linter complaints.
In `@examples/puzzletron/nemo_export/convert_nemo_to_hf.py`:
- Around line 79-80: The os.makedirs call uses
os.path.dirname(args.output_ckpt_path) which can be an empty string when the
path has no directory component; guard against that by computing a directory
variable (e.g., out_dir = os.path.dirname(args.output_ckpt_path) or out_dir =
os.path.dirname(...) or '.' if that result is empty) and call
os.makedirs(out_dir, exist_ok=True) so you never pass an empty string to
os.makedirs.
- Around line 21-24: The file imports PuzzletronLlamaNemotronModel and
PuzzletronNemotronModelConfig but never uses them; remove these two unused
imports from the import statement in convert_nemo_to_hf.py (the import that
currently references modelopt.torch.puzzletron.export.MCore.llama_nemotron) so
only actually used symbols remain, or if they are intended to be used later,
reference them where needed (e.g., in functions that construct or configure the
model) to avoid unused-import warnings.
In `@examples/puzzletron/README.md`:
- Around line 234-250: In the "Knowledge Distillation" section fix three small
issues: remove the trailing whitespace after the "## Knowledge Distillation"
heading, correct the typo "contianer" to "container" in the sentence mentioning
the nemo:25.07 container, and correct "distillaiton" to "distillation" in the
link text and sentence referencing the NeMo documentation; also tidy the code
fences so the bash command block contains only the command (remove the stray
"bash" token shown in the proposed fix).
In `@modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py`:
- Around line 601-610: The package_dir path in
export/MCore/llama_nemotron_utils.py is incorrect and points to a non-existent
"puzzle_tools" directory causing the warning/early return; update the expression
that sets package_dir (currently using Path(__file__).parent.parent.parent /
"puzzle_tools" / "deci_lm_hf_code") to use "decilm" instead
(Path(__file__).parent.parent.parent / "decilm" / "deci_lm_hf_code") so the real
decilm/deci_lm_hf_code directory is found, and also update the accompanying
docstring/text that references "puzzle_tools/deci_lm_hf_code/" to
"decilm/deci_lm_hf_code/" to keep documentation consistent (this touches the
package_dir assignment and the log/doc comment surrounding it).
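The corrected path expression, per the comment above (the surrounding warning/early-return logic is not shown):

```python
from pathlib import Path

# Resolve the bundled HF code directory relative to llama_nemotron_utils.py.
# "decilm" replaces the non-existent "puzzle_tools" directory.
package_dir = Path(__file__).parent.parent.parent / "decilm" / "deci_lm_hf_code"
```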
In `@modelopt/torch/puzzletron/export/MCore/llama_nemotron.py`:
- Around line 390-400: Replace the debug print statements with logging calls:
obtain a module logger (e.g., logging.getLogger(__name__)) if one isn't already
present, change each print(...) to logger.debug(...) and preserve the same
formatted message content (including f-strings referring to INSTANTIATION_KEYS,
metadata_keys, adapted_cfg_dict, and key/value details); ensure callable
handling still shows either value.__name__ or 'callable' and that lengths for
list/dict use len(value), so behavior remains identical but uses the logging
module and debug level.
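A sketch of the logging replacement; the exact messages and loop structure in `llama_nemotron.py` are assumptions reconstructed from the comment above, and the helper name is illustrative:

```python
import logging

logger = logging.getLogger(__name__)


def log_adapted_config(instantiation_keys, metadata_keys, adapted_cfg_dict):
    """Debug-log the adapted config instead of print()-ing it."""
    logger.debug(f"instantiation keys: {instantiation_keys}")
    logger.debug(f"metadata keys: {metadata_keys}")
    for key, value in adapted_cfg_dict.items():
        if callable(value):
            logger.debug(f"{key}: {getattr(value, '__name__', 'callable')}")
        elif isinstance(value, (list, dict)):
            logger.debug(f"{key}: len={len(value)}")
        else:
            logger.debug(f"{key}: {value}")
```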
- Around line 533-537: Remove the debug print statements in the model factory
path that output config dtype settings; instead either delete those lines or
replace them with proper logging using the project's logger (e.g., use an
existing logger to log at DEBUG level), and then return
PuzzletronLlamaNemotronModel(config, tokenizer=self.tokenizer) as before; target
the block that references config.bf16, config.fp16, config.params_dtype and
PuzzletronLlamaNemotronModel to locate the code to change.
- Around line 466-476: The code currently uses "assert False, (...)" to abort on
heterogeneous activations; replace this with an explicit exception raise so it
cannot be disabled by Python optimizations: construct the same formatted error
message (using activation_funcs, not_none_activations, unique_not_none,
func_names, unique_func_names) and raise ValueError(...) (or raise
AssertionError(...)) with that message, keeping the existing explanatory text
about MCore and the suggested remedies.
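A sketch of the explicit raise; the helper structure and message wording are illustrative, not the PR's exact code:

```python
def check_homogeneous_activations(activation_funcs):
    """Raise instead of `assert False` so the check survives `python -O`."""
    not_none_activations = [f for f in activation_funcs if f is not None]
    unique_not_none = set(not_none_activations)
    if len(unique_not_none) > 1:
        func_names = [getattr(f, "__name__", str(f)) for f in not_none_activations]
        unique_func_names = sorted(set(func_names))
        raise ValueError(
            "MCore requires a single activation function across layers; "
            f"found {unique_func_names} (per layer: {func_names}). "
            "Normalize the activations in the source config before converting."
        )
```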
- Around line 274-282: Remove the three debug print calls and replace them with
proper logging: import Python's logging and get a module logger (e.g., logger =
logging.getLogger(__name__)), then change print(...) on detection of
orig_cfg_dict["dtype"] and orig_cfg_dict["torch_dtype"] to logger.debug(...) and
change the warning print(...) to logger.warning(...); keep the existing
assignment to adapted_cfg_dict["torch_dtype"] and preserve the same messages but
via the logger so production logging config controls output.
In `@modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py`:
- Around line 90-91: The error message in the MLP gating check is misspelled
("notgated"); update the NotImplementedError raised in the conditional that
checks mlp_config.gated (the block containing "if not hasattr(mlp_config,
'gated') or mlp_config.gated is False:") to use a clear phrase such as
"non-gated MLP is not supported" or "not gated MLP is not supported" instead of
"notgated MLP is not supported".
- Around line 55-73: The variable is_mamba is set to attention_config.mamba
(which may be an object) then passed to PuzzletronAttentionConfig(is_mamba=...),
so coerce it to a boolean: set is_mamba = bool(attention_config.mamba) (or use
hasattr check then bool) and update the mamba_state_dim conditional to check the
boolean (e.g., if is_mamba and hasattr(attention_config.mamba, "state_dim") then
use state_dim else default 128) so PuzzletronAttentionConfig always receives a
true boolean for is_mamba and the state_dim lookup only happens when the mamba
config exists.
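A sketch of the coercion as a small helper (attribute names and the default of 128 follow the comment above; the helper itself is illustrative):

```python
def coerce_mamba_fields(attention_config):
    """Return (is_mamba, mamba_state_dim) as plain Python values.

    The results would feed PuzzletronAttentionConfig(is_mamba=is_mamba, ...).
    """
    mamba_cfg = getattr(attention_config, "mamba", None)
    is_mamba = bool(mamba_cfg)
    mamba_state_dim = getattr(mamba_cfg, "state_dim", 128) if is_mamba else 128
    return is_mamba, mamba_state_dim
```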
In `@modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py`:
- Around line 413-414: There's a misspelled instance attribute:
self.hetereogenous_dist_checkpoint should be renamed to
self.heterogeneous_dist_checkpoint; update the assignment in
puzzletron_layer_specs (where self.heterogeneous_block_specs is set) to use the
correct spelling and search/replace any other references to the misspelled
symbol so all code reads/writes self.heterogeneous_dist_checkpoint consistently
(or add the correctly-named attribute and deprecate the misspelled one if you
need backward compatibility).
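If backward compatibility is wanted, a deprecated alias for the misspelled name could look like the following (purely illustrative; the simplest fix is a plain rename):

```python
import warnings


class _HeterogeneousCheckpointMixin:
    """Illustrative mixin: correct attribute plus a deprecated misspelled alias."""

    def __init__(self):
        self.heterogeneous_dist_checkpoint = True  # corrected spelling

    @property
    def hetereogenous_dist_checkpoint(self):
        warnings.warn(
            "hetereogenous_dist_checkpoint is deprecated; use "
            "heterogeneous_dist_checkpoint",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.heterogeneous_dist_checkpoint
```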
In `@setup.py`:
- Line 115: Update the pinned lm-eval dependency in setup.py: replace the exact
version string "lm-eval==0.4.9" with either the latest patch "lm-eval==0.4.9.2"
or a minimum constraint "lm-eval>=0.4.9" depending on whether you want to lock
to a specific patch (use "lm-eval==0.4.9.2") or allow future patch upgrades (use
"lm-eval>=0.4.9"); modify the string literal in setup.py where "lm-eval==0.4.9"
appears.
🧹 Nitpick comments (6)
modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (3)
226-235: Consider extracting the magic number `8` as a named constant. The fallback value `num_attention_heads = 8` is used to avoid division by zero. Consider defining a constant like `DEFAULT_NUM_ATTENTION_HEADS = 8` for clarity and maintainability.
630-658: Missing docstring for `WrappedTELinear` class. The TODO on line 631 indicates a missing docstring. Consider adding documentation to maintain consistency with other wrapper classes.
870-872: Redundant `mtp_block_spec` assignment. The variable `mtp_block_spec` is set to `None` on line 870 and then checked on line 871, raising an error if MTP is requested. Since MTP is not supported, the assignment is unnecessary. Consider removing it or leaving a comment explaining its purpose for future support.

Suggested simplification:

```diff
-mtp_block_spec = None
 if args.mtp_num_layers is not None:
     raise ValueError("MTP is not supported")
+mtp_block_spec = None  # MTP not supported for Puzzletron models
```

modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (1)
682-685: Catching bare `Exception` is too broad. Catching generic `Exception` on line 684 may hide unexpected errors. Consider catching specific exceptions like `FileNotFoundError` or `OSError`, or at least logging the exception type.

Suggested improvement:

```diff
 try:
     generation_config = GenerationConfig.from_pretrained(str(self))
-except Exception:
+except (FileNotFoundError, OSError) as e:
+    logging.debug(f"No generation_config found: {e}")
     generation_config = None
```

modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py (1)
40-135: Consolidate duplicate conversion functions between `llama_nemotron_utils.py` and `puzzletron_hf_config_utils.py`. `convert_attention_config_from_cfg_object` is identically duplicated in both files. `convert_mlp_config_from_cfg_object` is also duplicated, with intentional differences in the signature (puzzletron_hf_config_utils has an additional `default_hidden_act` parameter). Currently, only the versions in `llama_nemotron_utils.py` are used (imported by `llama_nemotron.py`), making the versions in `puzzletron_hf_config_utils.py` dead code. Consolidate these functions into a shared module to reduce maintenance burden and avoid sync issues.

examples/puzzletron/nemo_export/convert_hf_to_nemo.py (1)
34-34: `mprint` imported but not used. `mprint` is imported but `print()` is used throughout the file (lines 47, 57, 93). If this script may run in distributed environments, consider using `mprint` for rank-0-only logging, or remove the unused import if distributed logging is not needed here.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/puzzletron/README.md
- examples/puzzletron/nemo_export/convert_hf_to_nemo.py
- examples/puzzletron/nemo_export/convert_nemo_to_hf.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
- modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
- modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
- setup.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/puzzletron/nemo_export/convert_hf_to_nemo.py (4)
- modelopt/torch/puzzletron/puzzletron.py (1)
  - `puzzletron` (29-74)
- modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (4)
  - `PuzzletronLlamaNemotronModel` (326-342)
  - `PuzzletronNemotronModelConfig` (92-323)
  - `config` (659-688)
  - `config` (728-756)
- modelopt/torch/puzzletron/tools/logger.py (1)
  - `mprint` (157-161)
- examples/puzzletron/nemo_export/convert_nemo_to_hf.py (2)
  - `convert_model` (28-51)
  - `main` (54-83)

modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py (1)
- modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (3)
  - `PuzzletronAttentionConfig` (165-235)
  - `PuzzletronHeterogeneousTransformerConfig` (340-476)
  - `get_gpt_heterogeneous_layer_spec_puzzletron` (767-827)

examples/puzzletron/nemo_export/convert_nemo_to_hf.py (2)
- modelopt/torch/puzzletron/puzzletron.py (1)
  - `puzzletron` (29-74)
- modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (2)
  - `PuzzletronLlamaNemotronModel` (326-342)
  - `PuzzletronNemotronModelConfig` (92-323)
🪛 LanguageTool
examples/puzzletron/README.md
[grammar] ~236-~236: Ensure spelling is correct
Context: ...s/nvidia/containers/nemo?version=25.07) contianer. First, convert the HF model to NeMo f...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~244-~244: Ensure spelling is correct
Context: ...distillation. Please refer to the [NeMo distillaiton documentation](https://docs.nvidia.com/...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/puzzletron/README.md
234-234: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (16)
modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (8)
61-73: LGTM - Appropriate handling of optional Transformer Engine import. The try/except pattern for importing Transformer Engine with a fallback flag is correct and follows best practices for optional dependencies.

82-122: Consider addressing the TODO comment clarifying the difference from the referenced implementation. The function logic appears correct for generating sharded state dict key mappings. The TODO on line 83 suggests uncertainty about the difference from the original Megatron-LM implementation. Consider documenting or addressing this before the code stabilizes.

126-162: LGTM - Well-designed base config class with proper validation. The mutual exclusivity check in `__post_init__` and the `build_config_from_dict` factory method provide clean configuration building with proper fallback to main config values.

238-276: LGTM - MLP configuration with proper no-op/linear handling. The `__post_init__` correctly clears MLP-specific fields when the subblock is configured as no-op or linear replacement.

278-316: LGTM - Block config with flexible key handling. The `build_from_dict` method correctly handles both "mlp" and "ffn" keys for compatibility with different config formats.

561-577: Unusual super() call bypasses parent class initialization. The call `super(TELayerNormColumnParallelLinear, self).__init__(...)` intentionally skips `TELayerNormColumnParallelLinear.__init__` to call the grandparent class directly. While this works, it's fragile and could break if the TE class hierarchy changes. Consider adding a comment explaining why this pattern is necessary.

687-763: LGTM - Comprehensive layer spec generation. The function correctly handles all attention and MLP variants (standard, no-op, linear replacement, Mamba, MoE) with appropriate module specs for both TE and non-TE modes.

767-827: LGTM - Heterogeneous layer spec assembly with pipeline parallel support. The function correctly assembles layer specs with proper pipeline parallel handling, including support for `pipeline_model_parallel_layout`.

modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py (6)
138-285: LGTM - Comprehensive NeMo to HF config conversion. The function handles complex conversion logic well, including heterogeneous block configs, Mamba settings, MoE configurations, and rope scaling. The error handling appropriately catches both JSON parsing and key access errors.

302-464: LGTM - Well-structured mapping builder for heterogeneous models. The function correctly handles both HF and NeMo config formats, building per-layer mappings for all layer types. The validation at lines 346-351 ensures proper error reporting when no block configs are found.

467-509: LGTM - QKV merge with heterogeneous attention support. The function correctly handles per-layer configuration through the `idx` parameter and derives `heads_per_group` from tensor shapes, enabling support for heterogeneous attention configurations.

512-560: LGTM - QKV split correctly inverts the merge operation. The function properly derives dimensions from the input tensor shape and creates appropriate slice indices for extracting q, k, v projections. The `.cpu()` calls ensure compatibility with HF model saving.

563-581: LGTM - Robust dtype extraction with format compatibility. The function correctly handles both legacy (`torch_dtype`) and new (`dtype`) config formats with appropriate type conversion.

624-677: LGTM - Safe chat template embedding with proper checks. The function includes appropriate guards for missing files and already-embedded templates, with clear logging for debugging.

modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (1)

691-962: LGTM - Well-structured exporter with proper weight mapping reversal. The exporter correctly reverses the importer's mappings and transforms. The closure pattern for `make_split_qkv_fn` properly captures layer indices to avoid variable capture issues.

examples/puzzletron/nemo_export/convert_nemo_to_hf.py (1)

28-51: LGTM! The `convert_model` function correctly uses `llm.export_ckpt` with appropriate parameters for NeMo to HuggingFace conversion.
Outdated review threads (resolved) on: modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py (×2), modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##        feature/compress    #784   +/-   ##
=============================================
  Coverage          74.62%  74.62%
=============================================
  Files                192     192
  Lines              18989   18989
=============================================
  Hits               14171   14171
  Misses              4818    4818
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Amazing, is it basically just copied from puzzletron? Are there any changes from the original code? (I didn't see anything but I might have missed something)
danielkorzekwa left a comment
I did not review the internals of export/MCore/...... and nemo_export/.... - I assume this came directly from puzzletron
```diff
@@ -0,0 +1,92 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
I would move it to modelopt/torch/puzzletron/...., the same for convert_nemo_to_hf.
Examples should not keep the logic - they should be just examples.
Similarly, example/puzzletron/main.py should go to modelopt/torch/puzzletron/...
These are cmdline scripts which we should keep in the examples folder, just like all other modelopt examples. ModelOpt installation would be somewhere in /usr/local/... and we don't want users to run scripts from there.
And the more things we add to the modelopt library, the more dependencies modelopt will require. Keeping example scripts separate means we can keep modelopt dependencies leaner and move extra dependencies to examples/<example_name>/requirements.txt.
I'd prefer we keep it in examples; the scripts rely on the nemo dependency too.
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
kevalmorabia97 left a comment
Didn't look too much into the details, given that all this is temporary and works for the current customer use case, hence approving.
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
What does this PR do?
This PR adds scripts to convert the Puzzle model from HuggingFace to NeMo format and back, which is required for running knowledge distillation. The tutorial has been updated with the corresponding instructions.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Dependencies