
feat(customizer): expose all hyperparams supported in backend lib #371

Open
anubhutivyas wants to merge 1 commit into main from aalgo-279-hyperparams-support/anubhutiv



@anubhutivyas anubhutivyas commented Jun 17, 2026

Contributor

Summary by CodeRabbit

Release Notes

New Features

  • Added flexible optimizer configuration for training, including optimizer type selection and configurable learning rate decay scheduling.
  • Introduced attention implementation selection options for model training.
  • Expanded LoRA adapter configuration with module exclusion and kernel optimization support.
  • Added sequence packing controls to optimize training batch processing.

Documentation

  • Expanded hyperparameter reference documentation to include newly supported training configuration options.

Signed-off-by: anubhutiv <anubhutiv@nvidia.com>
@anubhutivyas anubhutivyas requested review from a team as code owners June 17, 2026 07:08
@github-actions github-actions Bot added the feat label Jun 17, 2026

coderabbitai Bot commented Jun 17, 2026


📝 Walkthrough


Two parallel expansions add optional hyperparameter fields to the automodel and unsloth services. Automodel gains LoRA module exclusions, a Triton toggle, attention backend selection, sequence-packing controls, and optimizer/LR-decay fields. Unsloth gains rope scaling, DoRA/LoRA init options, lr_scheduler_kwargs, and additional Adam/regularization parameters. All changes propagate through schemas, adapters, compilers, backends, and docs.

Changes

Automodel pass-2 hyperparameter expansion

| Layer | File(s) | Summary |
|---|---|---|
| Automodel schema contracts | plugins/nemo-automodel/src/nemo_automodel_plugin/schema.py, services/automodel/src/nmp/automodel/api/v2/jobs/schemas.py, services/automodel/src/nmp/automodel/app/jobs/training/schemas.py | Plugin LoRAParams, TrainingSpec, BatchSpec, OptimizerSpec gain new fields; API v2 LoRAParams and _TrainingBase mirror them; internal TrainingStepConfig.BatchConfig and OptimizerConfig are extended with split_across_pack, optimizer_name, and lr_decay_style. |
| Adapter and compiler field plumbing | services/automodel/src/nmp/automodel/adapter.py, services/automodel/src/nmp/automodel/app/jobs/training/compiler.py | Adapter maps new plugin fields into v2 SFTTraining; compiler threads them from API schemas into TrainingStepConfig, replacing hardcoded use_triton=True. |
| Training backend dynamic wiring | services/automodel/src/nmp/automodel/tasks/training/backends/config.py | compile_automodel_config looks up optimizer class by name, reads lr_decay_style and split_across_pack from config, and conditionally injects exclude_modules into PEFT config. |
| Adapter and compiler tests | services/automodel/tests/test_adapter.py, services/automodel/tests/test_compiler.py | Two adapter tests cover explicit pass-2 field propagation and default behavior; one compiler test verifies all pass-2 fields are carried through compile_training_step. |
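The backend step in the table above (compile_automodel_config reading optimizer_name, lr_decay_style and split_across_pack, and adding exclude_modules only when set) can be sketched without the real NeMo Automodel types. This is a minimal illustration under stated assumptions: the dataclasses, the "cosine" default, and the nested key layout (optimizer, lr_scheduler, packing, peft) are hypothetical; only the four field names and the two torch optimizer paths come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for the TrainingStepConfig sub-configs.
@dataclass
class OptimizerConfig:
    optimizer_name: str = "adam"
    learning_rate: float = 1e-4
    lr_decay_style: str = "cosine"  # assumed default, not taken from the PR


@dataclass
class BatchConfig:
    split_across_pack: bool = False


@dataclass
class LoRAConfig:
    dim: int = 8
    exclude_modules: Optional[list[str]] = None


OPTIMIZER_TARGETS = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}


def compile_config_sketch(opt: OptimizerConfig, batch: BatchConfig, lora: LoRAConfig) -> dict[str, Any]:
    """Turn typed step settings into a backend config dict (illustrative layout only)."""
    target = OPTIMIZER_TARGETS.get(opt.optimizer_name)
    if target is None:
        raise ValueError(f"Unsupported optimizer {opt.optimizer_name!r}")
    cfg: dict[str, Any] = {
        "optimizer": {"_target_": target, "lr": opt.learning_rate},
        "lr_scheduler": {"lr_decay_style": opt.lr_decay_style},
        "packing": {"split_across_pack": batch.split_across_pack},
        "peft": {"dim": lora.dim},
    }
    # Inject exclude_modules only when non-empty, leaving the backend default untouched otherwise.
    if lora.exclude_modules:
        cfg["peft"]["exclude_modules"] = list(lora.exclude_modules)
    return cfg
```

Keeping the conditional injection in one place means an unset field is absent from the generated config rather than present as None, which is the same "absence, not None" rule the unsloth helper below follows.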

Unsloth pass-2 hyperparameter expansion

| Layer | File(s) | Summary |
|---|---|---|
| Unsloth schema contracts | services/unsloth/src/nmp/unsloth/schemas.py | ModelLoadSpec gains rope_scaling; LoRAParams gains DoRA/init/module-selection fields; ScheduleSpec gains lr_scheduler_kwargs; OptimizerSpec gains Adam betas/epsilon, grad norm, label smoothing, NEFTune alpha. |
| build_peft_kwargs helper and SFTConfig wiring | services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py | build_model_load_kwargs conditionally includes rope_scaling; new build_peft_kwargs centralizes PEFT arg construction with optional fields; SFTConfig wiring adds optimizer fields and conditional neftune_noise_alpha/lr_scheduler_kwargs. |
| Unsloth kwargs builder tests | services/unsloth/tests/test_model_load_kwargs.py | _spec helper extended; new tests cover rope_scaling forwarding and build_peft_kwargs defaults and optional field emission. |
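The unsloth wiring above forwards some fields unconditionally and others (rope_scaling, neftune_noise_alpha, lr_scheduler_kwargs) only when set. A minimal sketch of that pattern, assuming hypothetical dataclass subsets of ModelLoadSpec and OptimizerSpec whose field names may not match the PR's schema; the output keys are Hugging Face TrainingArguments parameter names, which TRL's SFTConfig inherits.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelLoadSpec:
    """Hypothetical subset of the unsloth ModelLoadSpec."""
    model_name: str
    max_seq_length: int = 2048
    rope_scaling: Optional[dict[str, Any]] = None


@dataclass
class OptimizerSpec:
    """Hypothetical subset of the unsloth OptimizerSpec."""
    learning_rate: float = 2e-4
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-8
    max_grad_norm: float = 1.0
    label_smoothing_factor: float = 0.0
    neftune_noise_alpha: Optional[float] = None


def model_load_kwargs_sketch(spec: ModelLoadSpec) -> dict[str, Any]:
    kwargs: dict[str, Any] = {"model_name": spec.model_name, "max_seq_length": spec.max_seq_length}
    # Forward rope_scaling only when set, so the loader falls back to its own default otherwise.
    if spec.rope_scaling is not None:
        kwargs["rope_scaling"] = spec.rope_scaling
    return kwargs


def sft_optimizer_kwargs_sketch(
    opt: OptimizerSpec, lr_scheduler_kwargs: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    # Always-present optimizer settings, keyed by TrainingArguments names.
    kwargs: dict[str, Any] = {
        "learning_rate": opt.learning_rate,
        "adam_beta1": opt.adam_beta1,
        "adam_beta2": opt.adam_beta2,
        "adam_epsilon": opt.adam_epsilon,
        "max_grad_norm": opt.max_grad_norm,
        "label_smoothing_factor": opt.label_smoothing_factor,
    }
    # NEFTune and scheduler kwargs are opt-in: omit them entirely when unset.
    if opt.neftune_noise_alpha is not None:
        kwargs["neftune_noise_alpha"] = opt.neftune_noise_alpha
    if lr_scheduler_kwargs:
        kwargs["lr_scheduler_kwargs"] = dict(lr_scheduler_kwargs)
    return kwargs
```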

Hyperparameter reference documentation

| Layer | File(s) | Summary |
|---|---|---|
| Automodel and unsloth hyperparameter docs | plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md | JSON templates and field glossary tables updated for all newly added schema fields in both automodel and unsloth sections. |

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#346: Modifies build_model_load_kwargs and train_sft in unsloth_sft.py with full_finetuning/kwargs construction that this PR further extends with conditional rope_scaling and updated PEFT/optimizer wiring.

Suggested labels

chore

Suggested reviewers

  • albcui
  • soluwalana
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title directly describes the main change: exposing additional hyperparameters from backend libraries in the customizer. Aligns with the extensive schema additions across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/automodel/src/nmp/automodel/tasks/training/backends/config.py`:
- Around line 257-261: The optimizer_targets.get() call with a default fallback
to Adam silently masks configuration errors when an unknown optimizer_name is
provided. Instead of using the get method with a default, explicitly check if
customizer_config.optimizer.optimizer_name exists in the optimizer_targets
dictionary and raise a clear error if it does not, ensuring invalid optimizer
configurations fail immediately rather than silently defaulting to Adam.

In `@services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py`:
- Around line 85-96: The issue is that the `build_peft_kwargs` function relies
on a runtime assertion to check that `lora` is not None, but the schema allows
`finetuning_type="lora"` with `lora=None`, which will cause a crash at runtime.
Instead of relying on runtime assertions, enforce this invariant at the schema
level in `TrainingSpec` by either making `lora` required when
`finetuning_type="lora"` or providing an appropriate default value. This ensures
backend helper functions like `build_peft_kwargs` never receive invalid input
shapes without needing runtime assertions.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: af7a496d-dc75-454e-8083-359c419f6870

📥 Commits

Reviewing files that changed from the base of the PR and between d36974f and 3544f77.

📒 Files selected for processing (12)
  • plugins/nemo-automodel/src/nemo_automodel_plugin/schema.py
  • plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.md
  • services/automodel/src/nmp/automodel/adapter.py
  • services/automodel/src/nmp/automodel/api/v2/jobs/schemas.py
  • services/automodel/src/nmp/automodel/app/jobs/training/compiler.py
  • services/automodel/src/nmp/automodel/app/jobs/training/schemas.py
  • services/automodel/src/nmp/automodel/tasks/training/backends/config.py
  • services/automodel/tests/test_adapter.py
  • services/automodel/tests/test_compiler.py
  • services/unsloth/src/nmp/unsloth/schemas.py
  • services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py
  • services/unsloth/tests/test_model_load_kwargs.py

Comment on lines +257 to 261
+    # Map the optimizer choice to its torch class. Defaults to Adam
+    optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}
     cfg["optimizer"] = {
-        "_target_": "torch.optim.Adam",
+        "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"),
         "lr": customizer_config.optimizer.learning_rate,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unknown optimizer names instead of silently using Adam.

Line 260 masks bad config values by falling back to Adam, so a job can silently train with a different optimizer than the one requested.

Suggested fix
     # Map the optimizer choice to its torch class. Defaults to Adam
     optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}
+    optimizer_target = optimizer_targets.get(customizer_config.optimizer.optimizer_name)
+    if optimizer_target is None:
+        raise ValueError(
+            f"Unsupported optimizer '{customizer_config.optimizer.optimizer_name}'. "
+            f"Expected one of: {', '.join(sorted(optimizer_targets))}."
+        )
     cfg["optimizer"] = {
-        "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"),
+        "_target_": optimizer_target,
         "lr": customizer_config.optimizer.learning_rate,
         "weight_decay": customizer_config.optimizer.weight_decay,
         "betas": [customizer_config.optimizer.beta1, customizer_config.optimizer.beta2],
         "eps": customizer_config.optimizer.eps,  # Adam epsilon for numerical stability
     }
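A complementary option is to reject unknown names where the request is parsed, so the backend lookup can never miss. The sketch below uses a stdlib enum as a stand-in for a Pydantic Literal["adam", "adamw"] field; OptimizerName and resolve_optimizer_target are hypothetical names, not part of this PR.

```python
from enum import Enum


class OptimizerName(str, Enum):
    """Hypothetical API-level enum; a Pydantic Literal field rejects unknown values in a similar way."""

    ADAM = "adam"
    ADAMW = "adamw"


# Keyed by enum member; the assert catches a member added without a torch target.
OPTIMIZER_TARGETS = {
    OptimizerName.ADAM: "torch.optim.Adam",
    OptimizerName.ADAMW: "torch.optim.AdamW",
}
assert set(OPTIMIZER_TARGETS) == set(OptimizerName)


def resolve_optimizer_target(name: str) -> str:
    """Map a user-supplied optimizer name to a torch class path, rejecting unknown names."""
    try:
        member = OptimizerName(name)
    except ValueError:
        allowed = ", ".join(m.value for m in OptimizerName)
        raise ValueError(f"Unsupported optimizer {name!r}. Expected one of: {allowed}.") from None
    return OPTIMIZER_TARGETS[member]
```

Validating at the schema boundary surfaces the error when the job is submitted rather than when the training task starts; the explicit backend check in the suggested fix still guards against configs that bypass the API layer.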

Comment on lines +85 to +96
def build_peft_kwargs(spec: UnslothJobOutput, *, gradient_checkpointing: bool | str) -> dict[str, Any]:
    """Assemble ``FastLanguageModel.get_peft_model`` kwargs for a LoRA run.

    Torch-free (unit-testable). Caller resolves ``gradient_checkpointing`` from
    ``spec.training.use_gradient_checkpointing`` (the JSON literal → ``True`` /
    ``False`` / ``"unsloth"`` mapping). Optional knobs (``loftq_config``,
    ``modules_to_save``, ``layers_to_transform``, ``layer_replication``) are only
    emitted when set so PEFT/Unsloth see absence, not ``None``.
    """
    lora = spec.training.lora
    assert lora is not None  # validated by UnslothJobInput
    kwargs: dict[str, Any] = {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the LoRA invariant in schema instead of relying on runtime assertions.

Line 95 assumes training.lora is present, but the schema defaults allow finetuning_type="lora" with lora=None, so that path fails at runtime. Enforce or default the invariant in TrainingSpec so backend helpers never receive an invalid shape.

Proposed fix (schema-level invariant/defaulting)
diff --git a/services/unsloth/src/nmp/unsloth/schemas.py b/services/unsloth/src/nmp/unsloth/schemas.py
@@
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
 class TrainingSpec(BaseModel):
@@
     use_gradient_checkpointing: Literal["unsloth", "true", "false"] = "unsloth"
+
+    @model_validator(mode="after")
+    def _enforce_lora_invariant(self) -> "TrainingSpec":
+        if self.finetuning_type == "lora" and self.lora is None:
+            self.lora = LoRAParams()
+        if self.finetuning_type == "all_weights" and self.lora is not None:
+            raise ValueError("training.lora must be omitted when finetuning_type='all_weights'")
+        return self
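The same invariant can be sketched without Pydantic, with a stdlib dataclass whose __post_init__ plays the role of the after-mode validator. The two-field TrainingSpec and LoRAParams below are hypothetical subsets of the real schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class LoRAParams:
    rank: int = 16
    alpha: int = 16


@dataclass
class TrainingSpec:
    # Note: dataclasses do not enforce Literal at runtime; Pydantic would.
    finetuning_type: Literal["lora", "all_weights"] = "lora"
    lora: Optional[LoRAParams] = None

    def __post_init__(self) -> None:
        # LoRA runs always get LoRA params, so helpers never see lora=None.
        if self.finetuning_type == "lora" and self.lora is None:
            self.lora = LoRAParams()
        # Full fine-tuning rejects LoRA params that would otherwise be silently ignored.
        if self.finetuning_type == "all_weights" and self.lora is not None:
            raise ValueError("training.lora must be omitted when finetuning_type='all_weights'")
```

Defaulting rather than rejecting in the LoRA case keeps existing requests that omit the lora block valid, while the all_weights branch turns a silent misconfiguration into an error.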

@github-actions

Contributor
| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 19503/25926 | 75.2% | 60.7% |
| Integration Tests | 11412/24698 | 46.2% | 20.1% |
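Line Rate is covered lines divided by total lines, so both rows can be checked directly; Branch Rate cannot be recomputed here because branch counts are not shown. The helper name below is illustrative.

```python
def line_rate(covered: int, total: int) -> float:
    """Return covered/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * covered / total, 1)


print(line_rate(19503, 25926))  # 75.2
print(line_rate(11412, 24698))  # 46.2
```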
