
Conversation


@LianaMikael LianaMikael commented Jan 14, 2026

What does this PR do?

This PR adds scripts to convert the Puzzle model from HuggingFace to NeMo format and back, which is required for running knowledge distillation. The tutorial is updated with the corresponding instructions.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Puzzletron Llama-Nemotron models within the NeMo framework.
    • New model conversion tools enabling seamless switching between HuggingFace and NeMo formats.
    • Knowledge distillation workflow to restore model quality after compression.
  • Documentation

    • Added knowledge distillation process guide with step-by-step instructions.
  • Dependencies

    • Added lm-eval as an optional dependency.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
@LianaMikael LianaMikael requested review from a team as code owners January 14, 2026 22:37
Contributor

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


Adds comprehensive support for converting Puzzletron Llama-Nemotron models between HuggingFace and NeMo formats. Introduces configuration classes, bidirectional model importers/exporters, heterogeneous transformer layer specifications, conversion utilities, and command-line tools enabling end-to-end model conversion workflows with knowledge distillation support.

Changes

  • Documentation (examples/puzzletron/README.md): Added a Knowledge Distillation section documenting the workflow to recover quality loss in compressed models using NeMo, including HF-to-NeMo conversion steps and optional reverse conversion.
  • CLI Conversion Tools (examples/puzzletron/nemo_export/convert_hf_to_nemo.py, examples/puzzletron/nemo_export/convert_nemo_to_hf.py): New command-line utilities providing convert_model() functions and argparse-based interfaces for bidirectional model format conversion with input validation, path handling, and logging.
  • Core Model Integration (modelopt/torch/puzzletron/export/MCore/llama_nemotron.py): Introduces PuzzletronNemotronModelConfig, PuzzletronLlamaNemotronModel, importer/exporter classes, and config adaptation helpers to enable HuggingFace↔NeMo model import/export with support for heterogeneous layer configurations and RoPE scaling.
  • Configuration & Layer Specifications (modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py): Adds dataclass-based layer specification primitives (PuzzletronAttentionConfig, PuzzletronMLPConfig, PuzzletronHeterogeneousTransformerConfig) and heterogeneous transformer builders supporting MHA/MLA/Mamba variants, MoE configurations, and per-layer orchestration via get_gpt_heterogeneous_layer_spec_puzzletron().
  • Conversion Utilities (modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py, modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py): Helper functions for HF↔NeMo config conversion, QKV merge/split transforms for Puzzletron, dtype handling, tokenizer embedding, and validation of heterogeneous block configurations with error handling.
  • Dependency Updates (setup.py): Added lm-eval==0.4.9 to the puzzletron extras.
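
To make the per-layer specification idea concrete, here is a minimal, hypothetical sketch of how dataclass-based specs can drive a heterogeneous transformer build. The class and field names (AttentionSpec, MLPSpec, is_mamba, num_query_groups, ffn_hidden_size, is_noop) are illustrative assumptions, not the real PuzzletronAttentionConfig / PuzzletronMLPConfig APIs.

```python
# Hypothetical sketch only: class and field names are illustrative and are
# NOT the real PuzzletronAttentionConfig / PuzzletronMLPConfig APIs.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionSpec:
    is_mamba: bool = False                  # Mamba mixer instead of attention
    num_query_groups: Optional[int] = None  # GQA group count; None means MHA


@dataclass
class MLPSpec:
    is_noop: bool = False                   # layer has no FFN sub-block
    ffn_hidden_size: Optional[int] = None


@dataclass
class LayerSpec:
    attention: AttentionSpec
    mlp: MLPSpec


def build_layer_specs(block_configs: list) -> list:
    """Turn per-layer config dicts into per-layer specs (illustrative only)."""
    specs = []
    for block in block_configs:
        attn = AttentionSpec(
            is_mamba=bool(block.get("mamba")),
            num_query_groups=block.get("num_query_groups"),
        )
        ffn = block.get("ffn") or {}
        mlp = MLPSpec(is_noop=not ffn, ffn_hidden_size=ffn.get("hidden_size"))
        specs.append(LayerSpec(attention=attn, mlp=mlp))
    return specs
```

The real implementation additionally has to emit Megatron-Core module specs per layer, which is what get_gpt_heterogeneous_layer_spec_puzzletron() appears to orchestrate.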

Sequence Diagrams

sequenceDiagram
    actor User
    participant CLI as convert_hf_to_nemo.py
    participant NeMo as NeMo Framework
    participant FSys as Filesystem
    
    User->>CLI: invoke with HF model path
    CLI->>FSys: validate input HF model exists
    CLI->>FSys: ensure output directory parent exists
    CLI->>NeMo: initialize PuzzletronLlamaNemotronModel config
    CLI->>NeMo: call import_ckpt(source=HF_path)
    NeMo->>NeMo: map HF weights to NeMo format<br/>apply Puzzletron transforms
    NeMo->>FSys: write converted NeMo model
    CLI->>User: return output path
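
A minimal Python sketch of this flow, assuming NeMo 2.0's llm.import_ckpt API and the Puzzletron classes added in this PR; the actual convert_hf_to_nemo.py likely differs in argument handling and source formatting.

```python
# Sketch of the HF -> NeMo conversion flow above; assumes NeMo 2.0's
# llm.import_ckpt and the Puzzletron classes introduced by this PR.
import os

from nemo.collections import llm

from modelopt.torch.puzzletron.export.MCore.llama_nemotron import (
    PuzzletronLlamaNemotronModel,
    PuzzletronNemotronModelConfig,
)


def convert_hf_to_nemo(hf_model_path: str, output_ckpt_path: str) -> str:
    if not os.path.isdir(hf_model_path):
        raise FileNotFoundError(f"HF model not found: {hf_model_path}")

    out_dir = os.path.dirname(output_ckpt_path) or "."
    os.makedirs(out_dir, exist_ok=True)

    # The config must be an instance, not the class (see the review comments below).
    model = PuzzletronLlamaNemotronModel(config=PuzzletronNemotronModelConfig())
    # The exact source format ("hf://..." vs. a plain local path) may differ.
    llm.import_ckpt(model=model, source=f"hf://{hf_model_path}", output_path=output_ckpt_path)
    return output_ckpt_path
```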
sequenceDiagram
    actor User
    participant CLI as convert_nemo_to_hf.py
    participant NeMo as NeMo Framework
    participant Config as Config Converter
    participant FSys as Filesystem
    
    User->>CLI: invoke with NeMo model path
    CLI->>FSys: validate input NeMo model exists
    CLI->>FSys: ensure output directory exists
    CLI->>NeMo: instantiate NeMo model from checkpoint
    CLI->>Config: convert_nemo_config_to_hf_decilm_config()
    Config->>Config: adapt attention/MLP configs<br/>preserve metadata for round-trip
    CLI->>NeMo: call export_ckpt(target="hf")
    NeMo->>NeMo: map NeMo weights to HF format<br/>apply inverse Puzzletron transforms
    NeMo->>FSys: write converted HF model
    CLI->>FSys: embed chat template in tokenizer config
    CLI->>User: return output path
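
And the reverse direction, again as a sketch assuming NeMo 2.0's llm.export_ckpt; the config adaptation via convert_nemo_config_to_hf_decilm_config() and the chat-template embedding shown in the diagram are omitted here.

```python
# Sketch of the NeMo -> HF conversion flow above; assumes NeMo 2.0's
# llm.export_ckpt. Config adaptation and chat-template embedding are omitted.
from pathlib import Path

from nemo.collections import llm


def convert_nemo_to_hf(nemo_ckpt_path: str, output_ckpt_path: str) -> str:
    nemo_path = Path(nemo_ckpt_path)
    if not nemo_path.exists():
        raise FileNotFoundError(f"NeMo checkpoint not found: {nemo_path}")

    out_path = Path(output_ckpt_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # target="hf" asks NeMo to write HuggingFace-format weights and config.
    llm.export_ckpt(path=nemo_path, target="hf", output_path=out_path)
    return str(out_path)
```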

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks (✅ 2 passed | ❌ 1 failed)
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 48.21%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Add NeMo Conversion Scripts to Puzzletron' accurately captures the primary changes: new scripts for converting Puzzletron models between HuggingFace and NeMo formats, along with supporting infrastructure and documentation updates for knowledge distillation workflows.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

🤖 Fix all issues with AI agents
In `@examples/puzzletron/nemo_export/convert_hf_to_nemo.py`:
- Line 45: The code passes the PuzzletronNemotronModelConfig class to
PuzzletronLlamaNemotronModel instead of an instance; change the call to
instantiate the config (use PuzzletronNemotronModelConfig() when constructing
PuzzletronLlamaNemotronModel) so the config parameter receives an object
instance compatible with the __init__ signature of PuzzletronLlamaNemotronModel.
- Around line 16-21: The file has a duplicate import of Path and an unused
timedelta import; remove the second "from pathlib import Path" and delete "from
datetime import timedelta" (and also remove "from typing import Any" if Any is
unused in the rest of convert_hf_to_nemo.py) so imports are unique and only
include symbols actually referenced by functions like any conversion helpers in
this module.
- Around line 90-91: The os.makedirs call using
os.path.dirname(args.output_ckpt_path) can receive an empty string if
output_ckpt_path is a bare filename; change the logic around the directory
creation in convert_hf_to_nemo.py to compute dirpath =
os.path.dirname(args.output_ckpt_path) and only call os.makedirs when dirpath is
non-empty (or use dirpath = dirpath or '.' and call os.makedirs(dirpath,
exist_ok=True)); update the code that creates the output directory (the
os.makedirs line) accordingly to avoid passing an empty string to os.makedirs.
- Around line 28-32: The imports HFLlamaNemotronImporter,
Llama33NemotronSuper49BConfig, and LlamaNemotronModel from
nemo.collections.llm.gpt.model.llama_nemotron are unused; remove these three
symbols from the import list in convert_hf_to_nemo.py (or replace the
multi-import with only the actually used symbols) to eliminate unused-import
warnings and dead code. If any of these are intended for future use, add a short
comment near their import or reference them explicitly where needed (e.g., in
functions that perform conversion) to avoid linter complaints.

In `@examples/puzzletron/nemo_export/convert_nemo_to_hf.py`:
- Around line 79-80: The os.makedirs call uses
os.path.dirname(args.output_ckpt_path) which can be an empty string when the
path has no directory component; guard against that by computing a directory
variable (e.g., out_dir = os.path.dirname(args.output_ckpt_path) or out_dir =
os.path.dirname(...) or '.' if that result is empty) and call
os.makedirs(out_dir, exist_ok=True) so you never pass an empty string to
os.makedirs.
- Around line 21-24: The file imports PuzzletronLlamaNemotronModel and
PuzzletronNemotronModelConfig but never uses them; remove these two unused
imports from the import statement in convert_nemo_to_hf.py (the import that
currently references modelopt.torch.puzzletron.export.MCore.llama_nemotron) so
only actually used symbols remain, or if they are intended to be used later,
reference them where needed (e.g., in functions that construct or configure the
model) to avoid unused-import warnings.
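
Both scripts get the same path-handling comment; a minimal sketch of the suggested guard (argument names follow the comments and may not match the scripts exactly):

```python
# os.path.dirname() returns "" for a bare filename; fall back to "." so
# os.makedirs never receives an empty string.
import os


def ensure_parent_dir(output_ckpt_path: str) -> None:
    out_dir = os.path.dirname(output_ckpt_path) or "."
    os.makedirs(out_dir, exist_ok=True)


# e.g. ensure_parent_dir(args.output_ckpt_path) before writing the checkpoint
```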

In `@examples/puzzletron/README.md`:
- Around line 234-250: In the "Knowledge Distillation" section fix three small
issues: remove the trailing whitespace after the "## Knowledge Distillation"
heading, correct the typo "contianer" to "container" in the sentence mentioning
the nemo:25.07 container, and correct "distillaiton" to "distillation" in the
link text and sentence referencing the NeMo documentation; also tidy the code
fences so the bash command block contains only the command (remove the stray
"bash" token shown in the proposed fix).

In `@modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py`:
- Around line 601-610: The package_dir path in
export/MCore/llama_nemotron_utils.py is incorrect and points to a non-existent
"puzzle_tools" directory causing the warning/early return; update the expression
that sets package_dir (currently using Path(__file__).parent.parent.parent /
"puzzle_tools" / "deci_lm_hf_code") to use "decilm" instead
(Path(__file__).parent.parent.parent / "decilm" / "deci_lm_hf_code") so the real
decilm/deci_lm_hf_code directory is found, and also update the accompanying
docstring/text that references "puzzle_tools/deci_lm_hf_code/" to
"decilm/deci_lm_hf_code/" to keep documentation consistent (this touches the
package_dir assignment and the log/doc comment surrounding it).

In `@modelopt/torch/puzzletron/export/MCore/llama_nemotron.py`:
- Around line 390-400: Replace the debug print statements with logging calls:
obtain a module logger (e.g., logging.getLogger(__name__)) if one isn't already
present, change each print(...) to logger.debug(...) and preserve the same
formatted message content (including f-strings referring to INSTANTIATION_KEYS,
metadata_keys, adapted_cfg_dict, and key/value details); ensure callable
handling still shows either value.__name__ or 'callable' and that lengths for
list/dict use len(value), so behavior remains identical but uses the logging
module and debug level.
- Around line 533-537: Remove the debug print statements in the model factory
path that output config dtype settings; instead either delete those lines or
replace them with proper logging using the project's logger (e.g., use an
existing logger to log at DEBUG level), and then return
PuzzletronLlamaNemotronModel(config, tokenizer=self.tokenizer) as before; target
the block that references config.bf16, config.fp16, config.params_dtype and
PuzzletronLlamaNemotronModel to locate the code to change.
- Around line 466-476: The code currently uses "assert False, (...)" to abort on
heterogeneous activations; replace this with an explicit exception raise so it
cannot be disabled by Python optimizations: construct the same formatted error
message (using activation_funcs, not_none_activations, unique_not_none,
func_names, unique_func_names) and raise ValueError(...) (or raise
AssertionError(...)) with that message, keeping the existing explanatory text
about MCore and the suggested remedies.
- Around line 274-282: Remove the three debug print calls and replace them with
proper logging: import Python's logging and get a module logger (e.g., logger =
logging.getLogger(__name__)), then change print(...) on detection of
orig_cfg_dict["dtype"] and orig_cfg_dict["torch_dtype"] to logger.debug(...) and
change the warning print(...) to logger.warning(...); keep the existing
assignment to adapted_cfg_dict["torch_dtype"] and preserve the same messages but
via the logger so production logging config controls output.
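
The print-to-logging items and the assert-to-raise item above share a common shape; a sketch of both replacements, with message text and variable names taken from the comments rather than the file itself:

```python
# Module-level logger instead of print(), so production logging config
# controls output; and an explicit raise instead of `assert False`, which
# would be stripped when Python runs with -O.
import logging

logger = logging.getLogger(__name__)


def check_single_activation(func_names: list) -> None:
    unique_func_names = sorted(set(func_names))
    logger.debug("Activation functions per layer: %s", func_names)
    if len(unique_func_names) > 1:
        raise ValueError(
            f"MCore expects a single activation function across layers; found {unique_func_names}"
        )
```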

In `@modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py`:
- Around line 90-91: The error message in the MLP gating check is misspelled
("notgated"); update the NotImplementedError raised in the conditional that
checks mlp_config.gated (the block containing "if not hasattr(mlp_config,
'gated') or mlp_config.gated is False:") to use a clear phrase such as
"non-gated MLP is not supported" or "not gated MLP is not supported" instead of
"notgated MLP is not supported".
- Around line 55-73: The variable is_mamba is set to attention_config.mamba
(which may be an object) then passed to PuzzletronAttentionConfig(is_mamba=...),
so coerce it to a boolean: set is_mamba = bool(attention_config.mamba) (or use
hasattr check then bool) and update the mamba_state_dim conditional to check the
boolean (e.g., if is_mamba and hasattr(attention_config.mamba, "state_dim") then
use state_dim else default 128) so PuzzletronAttentionConfig always receives a
true boolean for is_mamba and the state_dim lookup only happens when the mamba
config exists.
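
A sketch of the boolean coercion the second item asks for; attention_config, state_dim, and the default of 128 come from the comment, while the helper name is hypothetical:

```python
def coerce_mamba_fields(attention_config) -> tuple:
    """Return (is_mamba, mamba_state_dim) as plain Python values (sketch)."""
    mamba_cfg = getattr(attention_config, "mamba", None)
    is_mamba = bool(mamba_cfg)  # PuzzletronAttentionConfig expects a real bool
    state_dim = getattr(mamba_cfg, "state_dim", 128) if is_mamba else 128
    return is_mamba, state_dim
```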

In `@modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py`:
- Around line 413-414: There's a misspelled instance attribute:
self.hetereogenous_dist_checkpoint should be renamed to
self.heterogeneous_dist_checkpoint; update the assignment in
puzzletron_layer_specs (where self.heterogeneous_block_specs is set) to use the
correct spelling and search/replace any other references to the misspelled
symbol so all code reads/writes self.heterogeneous_dist_checkpoint consistently
(or add the correctly-named attribute and deprecate the misspelled one if you
need backward compatibility).

In `@setup.py`:
- Line 115: Update the pinned lm-eval dependency in setup.py: replace the exact
version string "lm-eval==0.4.9" with either the latest patch "lm-eval==0.4.9.2"
or a minimum constraint "lm-eval>=0.4.9" depending on whether you want to lock
to a specific patch (use "lm-eval==0.4.9.2") or allow future patch upgrades (use
"lm-eval>=0.4.9"); modify the string literal in setup.py where "lm-eval==0.4.9"
appears.
🧹 Nitpick comments (6)
modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (3)

226-235: Consider extracting the magic number 8 as a named constant.

The fallback value num_attention_heads = 8 is used to avoid division by zero. Consider defining a constant like DEFAULT_NUM_ATTENTION_HEADS = 8 for clarity and maintainability.


630-658: Missing docstring for WrappedTELinear class.

The TODO on line 631 indicates a missing docstring. Consider adding documentation to maintain consistency with other wrapper classes.


870-872: Redundant mtp_block_spec assignment.

The variable mtp_block_spec is set to None on line 870 and then checked on line 871, raising an error if MTP is requested. Since MTP is not supported, the assignment is unnecessary. Consider removing or leaving a comment explaining its purpose for future support.

Suggested simplification
-    mtp_block_spec = None
     if args.mtp_num_layers is not None:
         raise ValueError("MTP is not supported")
+    mtp_block_spec = None  # MTP not supported for Puzzletron models
modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (1)

682-685: Catching bare Exception is too broad.

Catching generic Exception on line 684 may hide unexpected errors. Consider catching specific exceptions like FileNotFoundError or OSError, or at least logging the exception type.

Suggested improvement
         try:
             generation_config = GenerationConfig.from_pretrained(str(self))
-        except Exception:
+        except (FileNotFoundError, OSError) as e:
+            logging.debug(f"No generation_config found: {e}")
             generation_config = None
modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py (1)

40-135: Consolidate duplicate conversion functions between llama_nemotron_utils.py and puzzletron_hf_config_utils.py.

convert_attention_config_from_cfg_object is identically duplicated in both files. convert_mlp_config_from_cfg_object is also duplicated with intentional differences in the signature (puzzletron_hf_config_utils has an additional default_hidden_act parameter). Currently, only the versions in llama_nemotron_utils.py are used (imported by llama_nemotron.py), making the versions in puzzletron_hf_config_utils.py dead code. Consolidate these functions into a shared module to reduce maintenance burden and avoid sync issues.

examples/puzzletron/nemo_export/convert_hf_to_nemo.py (1)

34-34: mprint imported but not used.

mprint is imported but print() is used throughout the file (lines 47, 57, 93). If this script may run in distributed environments, consider using mprint for rank-0 only logging, or remove the unused import if distributed logging is not needed here.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43b3cfa and c57d21e.

📒 Files selected for processing (8)
  • examples/puzzletron/README.md
  • examples/puzzletron/nemo_export/convert_hf_to_nemo.py
  • examples/puzzletron/nemo_export/convert_nemo_to_hf.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron.py
  • modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py
  • modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py
  • setup.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/puzzletron/nemo_export/convert_hf_to_nemo.py (4)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (29-74)
modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (4)
  • PuzzletronLlamaNemotronModel (326-342)
  • PuzzletronNemotronModelConfig (92-323)
  • config (659-688)
  • config (728-756)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (157-161)
examples/puzzletron/nemo_export/convert_nemo_to_hf.py (2)
  • convert_model (28-51)
  • main (54-83)
modelopt/torch/puzzletron/export/MCore/puzzletron_hf_config_utils.py (1)
modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (3)
  • PuzzletronAttentionConfig (165-235)
  • PuzzletronHeterogeneousTransformerConfig (340-476)
  • get_gpt_heterogeneous_layer_spec_puzzletron (767-827)
examples/puzzletron/nemo_export/convert_nemo_to_hf.py (2)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (29-74)
modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (2)
  • PuzzletronLlamaNemotronModel (326-342)
  • PuzzletronNemotronModelConfig (92-323)
🪛 LanguageTool
examples/puzzletron/README.md

[grammar] ~236-~236: Ensure spelling is correct
Context: ...s/nvidia/containers/nemo?version=25.07) contianer. First, convert the HF model to NeMo f...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~244-~244: Ensure spelling is correct
Context: ...distillation. Please refer to the [NeMo distillaiton documentation](https://docs.nvidia.com/...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/puzzletron/README.md

234-234: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (16)
modelopt/torch/puzzletron/export/MCore/puzzletron_layer_specs.py (8)

61-73: LGTM - Appropriate handling of optional Transformer Engine import.

The try/except pattern for importing Transformer Engine with a fallback flag is correct and follows best practices for optional dependencies.


82-122: Consider addressing the TODO comment clarifying the difference from the referenced implementation.

The function logic appears correct for generating sharded state dict key mappings. The TODO on line 83 suggests uncertainty about the difference from the original Megatron-LM implementation. Consider documenting or addressing this before the code stabilizes.


126-162: LGTM - Well-designed base config class with proper validation.

The mutual exclusivity check in __post_init__ and the build_config_from_dict factory method provide clean configuration building with proper fallback to main config values.


238-276: LGTM - MLP configuration with proper no-op/linear handling.

The __post_init__ correctly clears MLP-specific fields when the subblock is configured as no-op or linear replacement.


278-316: LGTM - Block config with flexible key handling.

The build_from_dict method correctly handles both "mlp" and "ffn" keys for compatibility with different config formats.


561-577: Unusual super() call bypasses parent class initialization.

The call super(TELayerNormColumnParallelLinear, self).__init__(...) intentionally skips TELayerNormColumnParallelLinear.__init__ to call the grandparent class directly. While this works, it's fragile and could break if the TE class hierarchy changes. Consider adding a comment explaining why this pattern is necessary.


687-763: LGTM - Comprehensive layer spec generation.

The function correctly handles all attention and MLP variants (standard, no-op, linear replacement, Mamba, MoE) with appropriate module specs for both TE and non-TE modes.


767-827: LGTM - Heterogeneous layer spec assembly with pipeline parallel support.

The function correctly assembles layer specs with proper pipeline parallel handling, including support for pipeline_model_parallel_layout.

modelopt/torch/puzzletron/export/MCore/llama_nemotron_utils.py (6)

138-285: LGTM - Comprehensive NeMo to HF config conversion.

The function handles complex conversion logic well, including heterogeneous block configs, Mamba settings, MoE configurations, and rope scaling. The error handling appropriately catches both JSON parsing and key access errors.


302-464: LGTM - Well-structured mapping builder for heterogeneous models.

The function correctly handles both HF and NeMo config formats, building per-layer mappings for all layer types. The validation at line 346-351 ensures proper error reporting when no block configs are found.


467-509: LGTM - QKV merge with heterogeneous attention support.

The function correctly handles per-layer configuration through the idx parameter and derives heads_per_group from tensor shapes, enabling support for heterogeneous attention configurations.


512-560: LGTM - QKV split correctly inverts the merge operation.

The function properly derives dimensions from the input tensor shape and creates appropriate slice indices for extracting q, k, v projections. The .cpu() calls ensure compatibility with HF model saving.


563-581: LGTM - Robust dtype extraction with format compatibility.

The function correctly handles both legacy (torch_dtype) and new (dtype) config formats with appropriate type conversion.


624-677: LGTM - Safe chat template embedding with proper checks.

The function includes appropriate guards for missing files and already-embedded templates, with clear logging for debugging.

modelopt/torch/puzzletron/export/MCore/llama_nemotron.py (1)

691-962: LGTM - Well-structured exporter with proper weight mapping reversal.

The exporter correctly reverses the importer's mappings and transforms. The closure pattern for make_split_qkv_fn properly captures layer indices to avoid variable capture issues.

examples/puzzletron/nemo_export/convert_nemo_to_hf.py (1)

28-51: LGTM!

The convert_model function correctly uses llm.export_ckpt with appropriate parameters for NeMo to HuggingFace conversion.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.62%. Comparing base (43b3cfa) to head (ef36626).
⚠️ Report is 1 commit behind head on feature/compress.

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #784   +/-   ##
=================================================
  Coverage             74.62%   74.62%           
=================================================
  Files                   192      192           
  Lines                 18989    18989           
=================================================
  Hits                  14171    14171           
  Misses                 4818     4818           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
@Separius

Amazing! Is it basically just copied from puzzletron? Are there any changes from the original code? (I didn't see anything, but I might have missed something.)


@danielkorzekwa danielkorzekwa left a comment


I did not review the internals of export/MCore/...... and nemo_export/.... - I assume this came directly from puzzletron

@@ -0,0 +1,92 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.


I would move it to modelopt/torch/puzzletron/...., and the same for convert_nemo_to_hf.

Examples should not hold the logic; they should be just examples.

Similarly, examples/puzzletron/main.py should go to modelopt/torch/puzzletron/...

Collaborator


These are command-line scripts, which we should keep in the examples folder just like all other ModelOpt examples. The ModelOpt installation would be somewhere in /usr/local/..., and we don't want users to run scripts from there.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the more things we add to the modelopt library, the more dependencies modelopt will require. Keeping example scripts separate means we can keep modelopt's dependencies leaner and move extra dependencies to examples/<example_name>/requirements.txt.

Author


I'd prefer we keep it in examples; the scripts rely on the nemo dependency too.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Collaborator

@kevalmorabia97 kevalmorabia97 left a comment


Didn't look too much into the details, given that all of this is temporary and works for the current customer use case; hence approving.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
@LianaMikael LianaMikael enabled auto-merge (squash) January 15, 2026 15:55
@LianaMikael LianaMikael merged commit 4c30bd5 into feature/compress Jan 15, 2026
23 checks passed
@LianaMikael LianaMikael deleted the lmikaelyan/add-nemo-conversion-puzzletron branch January 15, 2026 17:37