feat(customizer): expose all hyperparams supported in backend lib by anubhutivyas · Pull Request #371 · NVIDIA-NeMo/nemo-platform

anubhutivyas · 2026-06-17T07:08:42Z

Summary by CodeRabbit

Release Notes

New Features

Added flexible optimizer configuration for training, including optimizer type selection and configurable learning rate decay scheduling.
Introduced attention implementation selection options for model training.
Expanded LoRA adapter configuration with module exclusion and kernel optimization support.
Added sequence packing controls to optimize training batch processing.

Documentation

Expanded hyperparameter reference documentation to include newly supported training configuration options.

Signed-off-by: anubhutiv <anubhutiv@nvidia.com>

coderabbitai · 2026-06-17T07:18:11Z

📝 Walkthrough

Walkthrough

Two parallel expansions add optional hyperparameter fields to automodel and unsloth services. Automodel gains LoRA exclusions, Triton toggle, attention backend selection, sequence-packing controls, and optimizer/LR-decay fields. Unsloth gains rope scaling, DoRA/LoRA init options, lr_scheduler_kwargs, and additional Adam/regularization parameters. All changes propagate through schemas, adapters, compilers, backends, and docs.

Changes

Automodel pass-2 hyperparameter expansion

Layer / File(s)	Summary
Automodel schema contracts `plugins/nemo-automodel/src/nemo_automodel_plugin/schema.py`, `services/automodel/src/nmp/automodel/api/v2/jobs/schemas.py`, `services/automodel/src/nmp/automodel/app/jobs/training/schemas.py`	Plugin `LoRAParams`, `TrainingSpec`, `BatchSpec`, `OptimizerSpec` gain new fields; API v2 `LoRAParams` and `_TrainingBase` mirror them; internal `TrainingStepConfig.BatchConfig` and `OptimizerConfig` are extended with `split_across_pack`, `optimizer_name`, and `lr_decay_style`.
Adapter and compiler field plumbing `services/automodel/src/nmp/automodel/adapter.py`, `services/automodel/src/nmp/automodel/app/jobs/training/compiler.py`	Adapter maps new plugin fields into v2 `SFTTraining`; compiler threads them from API schemas into `TrainingStepConfig`, replacing hardcoded `use_triton=True`.
Training backend dynamic wiring `services/automodel/src/nmp/automodel/tasks/training/backends/config.py`	`compile_automodel_config` looks up optimizer class by name, reads `lr_decay_style` and `split_across_pack` from config, and conditionally injects `exclude_modules` into PEFT config.
Adapter and compiler tests `services/automodel/tests/test_adapter.py`, `services/automodel/tests/test_compiler.py`	Two adapter tests cover explicit pass-2 field propagation and default behavior; one compiler test verifies all pass-2 fields are carried through `compile_training_step`.
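The conditional `exclude_modules` injection and the `use_triton` pass-through described for `compile_automodel_config` can be illustrated with a minimal sketch. It assumes a simplified config shape: `LoRAConfig`, `build_peft_section`, and the `dim`/`alpha` keys are hypothetical names, not the service's actual code; only `exclude_modules` and `use_triton` correspond to fields named in this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-in for the internal LoRA settings; only exclude_modules
# and use_triton correspond to fields named in this PR.
@dataclass
class LoRAConfig:
    rank: int = 8
    alpha: int = 16
    use_triton: bool = True
    exclude_modules: Optional[list[str]] = None


def build_peft_section(lora: LoRAConfig) -> dict[str, Any]:
    """Build a PEFT config dict, adding exclude_modules only when it is set."""
    peft: dict[str, Any] = {
        "dim": lora.rank,     # hypothetical key name
        "alpha": lora.alpha,  # hypothetical key name
        # Threaded through from the request instead of being hardcoded to True.
        "use_triton": lora.use_triton,
    }
    # Omit the key entirely when unset so the backend keeps its own default
    # instead of receiving an explicit None.
    if lora.exclude_modules is not None:
        peft["exclude_modules"] = list(lora.exclude_modules)
    return peft
```

Leaving the key out, rather than writing `None`, keeps the backend's default behavior intact for requests that never mention exclusions.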

Unsloth pass-2 hyperparameter expansion

Layer / File(s)	Summary
Unsloth schema contracts `services/unsloth/src/nmp/unsloth/schemas.py`	`ModelLoadSpec` gains `rope_scaling`; `LoRAParams` gains DoRA/init/module-selection fields; `ScheduleSpec` gains `lr_scheduler_kwargs`; `OptimizerSpec` gains Adam betas/epsilon, grad norm, label smoothing, NEFTune alpha.
`build_peft_kwargs` helper and SFTConfig wiring `services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py`	`build_model_load_kwargs` conditionally includes `rope_scaling`; new `build_peft_kwargs` centralizes PEFT arg construction with optional fields; `SFTConfig` wiring adds optimizer fields and conditional `neftune_noise_alpha`/`lr_scheduler_kwargs`.
Unsloth kwargs builder tests `services/unsloth/tests/test_model_load_kwargs.py`	`_spec` helper extended; new tests cover `rope_scaling` forwarding and `build_peft_kwargs` defaults and optional field emission.
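The `build_peft_kwargs` pattern (map the `use_gradient_checkpointing` JSON literal to `True` / `False` / `"unsloth"`, then emit optional knobs only when set) can be sketched as below. This is a simplified, hypothetical version: `LoRASettings`, `resolve_gradient_checkpointing`, and `build_peft_kwargs_sketch` are illustrative names, and the real helper takes an `UnslothJobOutput` and emits more fields.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional, Union


# Hypothetical, simplified LoRA settings; the real LoRAParams has more fields.
@dataclass
class LoRASettings:
    rank: int = 16
    alpha: int = 16
    modules_to_save: Optional[list[str]] = None
    layers_to_transform: Optional[list[int]] = None
    loftq_config: Optional[dict[str, Any]] = None
    layer_replication: Optional[list[tuple[int, int]]] = None


# JSON literal -> value passed as use_gradient_checkpointing.
_GC_MAP: dict[str, Union[bool, str]] = {"true": True, "false": False, "unsloth": "unsloth"}

# Knobs that are forwarded only when the user set them.
_OPTIONAL_KEYS = ("loftq_config", "modules_to_save", "layers_to_transform", "layer_replication")


def resolve_gradient_checkpointing(flag: Literal["unsloth", "true", "false"]) -> Union[bool, str]:
    return _GC_MAP[flag]


def build_peft_kwargs_sketch(lora: LoRASettings, *, gradient_checkpointing: Union[bool, str]) -> dict[str, Any]:
    kwargs: dict[str, Any] = {
        "r": lora.rank,
        "lora_alpha": lora.alpha,
        "use_gradient_checkpointing": gradient_checkpointing,
    }
    # Emit optional knobs only when set, so the library sees absence, not None.
    for key in _OPTIONAL_KEYS:
        value = getattr(lora, key)
        if value is not None:
            kwargs[key] = value
    return kwargs
```

Because the function is torch-free, its defaults and optional-field emission can be unit-tested directly, which matches the intent of the tests added in `test_model_load_kwargs.py`.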

Hyperparameter reference documentation

Layer / File(s)	Summary
Automodel and unsloth hyperparameter docs `plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md`	JSON templates and field glossary tables updated for all newly added schema fields in both automodel and unsloth sections.

Possibly related PRs

NVIDIA-NeMo/nemo-platform#346: Modifies build_model_load_kwargs and train_sft in unsloth_sft.py with full_finetuning/kwargs construction that this PR further extends with conditional rope_scaling and updated PEFT/optimizer wiring.

Suggested labels

chore

Suggested reviewers

albcui
soluwalana

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00%, which is below the required threshold of 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title directly describes the main change: exposing additional hyperparameters from backend libraries in the customizer. Aligns with the extensive schema additions across multiple files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/automodel/src/nmp/automodel/tasks/training/backends/config.py`:
- Around line 257-261: The optimizer_targets.get() call with a default fallback
to Adam silently masks configuration errors when an unknown optimizer_name is
provided. Instead of using the get method with a default, explicitly check if
customizer_config.optimizer.optimizer_name exists in the optimizer_targets
dictionary and raise a clear error if it does not, ensuring invalid optimizer
configurations fail immediately rather than silently defaulting to Adam.

In `@services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py`:
- Around line 85-96: The issue is that the `build_peft_kwargs` function relies
on a runtime assertion to check that `lora` is not None, but the schema allows
`finetuning_type="lora"` with `lora=None`, which will cause a crash at runtime.
Instead of relying on runtime assertions, enforce this invariant at the schema
level in `TrainingSpec` by either making `lora` required when
`finetuning_type="lora"` or providing an appropriate default value. This ensures
backend helper functions like `build_peft_kwargs` never receive invalid input
shapes without needing runtime assertions.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: af7a496d-dc75-454e-8083-359c419f6870

📥 Commits

Reviewing files that changed from the base of the PR and between d36974f and 3544f77.

📒 Files selected for processing (12)

plugins/nemo-automodel/src/nemo_automodel_plugin/schema.py
plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md
services/automodel/src/nmp/automodel/adapter.py
services/automodel/src/nmp/automodel/api/v2/jobs/schemas.py
services/automodel/src/nmp/automodel/app/jobs/training/compiler.py
services/automodel/src/nmp/automodel/app/jobs/training/schemas.py
services/automodel/src/nmp/automodel/tasks/training/backends/config.py
services/automodel/tests/test_adapter.py
services/automodel/tests/test_compiler.py
services/unsloth/src/nmp/unsloth/schemas.py
services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py
services/unsloth/tests/test_model_load_kwargs.py

coderabbitai · 2026-06-17T07:18:14Z

+    # Map the optimizer choice to its torch class. Defaults to Adam
+    optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}
    cfg["optimizer"] = {
-        "_target_": "torch.optim.Adam",
+        "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"),
        "lr": customizer_config.optimizer.learning_rate,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unknown optimizer names instead of silently using Adam.

Line 260 masks invalid config values by silently defaulting to Adam, so a job can run with a different optimizer than the one requested without raising any error.

Suggested fix

     # Map the optimizer choice to its torch class. Defaults to Adam
     optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}
+    optimizer_target = optimizer_targets.get(customizer_config.optimizer.optimizer_name)
+    if optimizer_target is None:
+        raise ValueError(
+            f"Unsupported optimizer '{customizer_config.optimizer.optimizer_name}'. "
+            f"Expected one of: {', '.join(sorted(optimizer_targets))}."
+        )
     cfg["optimizer"] = {
-        "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"),
+        "_target_": optimizer_target,
         "lr": customizer_config.optimizer.learning_rate,
         "weight_decay": customizer_config.optimizer.weight_decay,
         "betas": [customizer_config.optimizer.beta1, customizer_config.optimizer.beta2],
         "eps": customizer_config.optimizer.eps,  # Adam epsilon for numerical stability
     }
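The same fail-fast lookup can be isolated into a small, testable helper. This is a sketch under the assumption that the mapping stays a plain dict of names to class paths; `resolve_optimizer_target` and `build_optimizer_section` are hypothetical names, not functions in `config.py`.

```python
from typing import Any

# Supported optimizer names mapped to their torch class paths (as in the PR diff).
OPTIMIZER_TARGETS: dict[str, str] = {
    "adam": "torch.optim.Adam",
    "adamw": "torch.optim.AdamW",
}


def resolve_optimizer_target(optimizer_name: str) -> str:
    """Return the class path for optimizer_name, or raise if it is unsupported."""
    try:
        return OPTIMIZER_TARGETS[optimizer_name]
    except KeyError:
        raise ValueError(
            f"Unsupported optimizer {optimizer_name!r}. "
            f"Expected one of: {', '.join(sorted(OPTIMIZER_TARGETS))}."
        ) from None


def build_optimizer_section(optimizer_name: str, lr: float, weight_decay: float) -> dict[str, Any]:
    # Resolving the target first means a bad name fails before any config is built.
    return {
        "_target_": resolve_optimizer_target(optimizer_name),
        "lr": lr,
        "weight_decay": weight_decay,
    }
```

If the schema already constrains `optimizer_name` (for example with a `Literal["adam", "adamw"]` type, which is an assumption about the schema), the error would surface at request validation, and this check would act as a second guard.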


coderabbitai · 2026-06-17T07:18:14Z

+def build_peft_kwargs(spec: UnslothJobOutput, *, gradient_checkpointing: bool | str) -> dict[str, Any]:
+    """Assemble ``FastLanguageModel.get_peft_model`` kwargs for a LoRA run.
+
+    Torch-free (unit-testable). Caller resolves ``gradient_checkpointing`` from
+    ``spec.training.use_gradient_checkpointing`` (the JSON literal → ``True`` /
+    ``False`` / ``"unsloth"`` mapping). Optional knobs (``loftq_config``,
+    ``modules_to_save``, ``layers_to_transform``, ``layer_replication``) are only
+    emitted when set so PEFT/Unsloth see absence, not ``None``.
+    """
+    lora = spec.training.lora
+    assert lora is not None  # validated by UnslothJobInput
+    kwargs: dict[str, Any] = {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the LoRA invariant in schema instead of relying on runtime assertions.

Line 95 assumes training.lora is present, but the schema default allows finetuning_type="lora" with lora=None. That path will crash at runtime. Enforce/default this in TrainingSpec so backend helpers never receive an invalid shape.

Proposed fix (schema-level invariant/defaulting)

diff --git a/services/unsloth/src/nmp/unsloth/schemas.py b/services/unsloth/src/nmp/unsloth/schemas.py
@@
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
 class TrainingSpec(BaseModel):
@@
     use_gradient_checkpointing: Literal["unsloth", "true", "false"] = "unsloth"
+
+    @model_validator(mode="after")
+    def _enforce_lora_invariant(self) -> "TrainingSpec":
+        if self.finetuning_type == "lora" and self.lora is None:
+            self.lora = LoRAParams()
+        if self.finetuning_type == "all_weights" and self.lora is not None:
+            raise ValueError("training.lora must be omitted when finetuning_type='all_weights'")
+        return self
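The two rules in the proposed validator (fill in default LoRA params for LoRA runs, reject LoRA params for full fine-tuning) can be exercised without pydantic using a stdlib dataclass analogue. This is a hypothetical stand-in for illustration only: the `rank`/`alpha` fields and defaults are assumptions, and `__post_init__` plays the role of the `model_validator(mode="after")` hook.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class LoRAParams:
    # Hypothetical subset of fields; the defaults are illustrative only.
    rank: int = 16
    alpha: int = 16


@dataclass
class TrainingSpec:
    finetuning_type: Literal["lora", "all_weights"] = "lora"
    lora: Optional[LoRAParams] = None

    def __post_init__(self) -> None:
        # Rule 1: a LoRA run always carries LoRA params (defaults if omitted).
        if self.finetuning_type == "lora" and self.lora is None:
            self.lora = LoRAParams()
        # Rule 2: full fine-tuning must not carry LoRA params.
        if self.finetuning_type == "all_weights" and self.lora is not None:
            raise ValueError("training.lora must be omitted when finetuning_type='all_weights'")
```

With the invariant enforced at construction time, a helper like `build_peft_kwargs` can rely on `lora` being present for LoRA runs instead of asserting it at runtime.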


github-actions · 2026-06-17T07:19:55Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	19503/25926	75.2%	60.7%
Integration Tests	11412/24698	46.2%	20.1%

feat(customizer): expose all hyperparams supported in backend lib

3544f77

Signed-off-by: anubhutiv <anubhutiv@nvidia.com>

anubhutivyas requested review from a team as code owners June 17, 2026 07:08

github-actions Bot added the feat label Jun 17, 2026

coderabbitai Bot reviewed Jun 17, 2026

anubhutivyas wants to merge 1 commit into main from aalgo-279-hyperparams-support/anubhutiv
