feat(customizer): expose all hyperparams supported in backend lib#371
feat(customizer): expose all hyperparams supported in backend lib#371anubhutivyas wants to merge 1 commit into
Conversation
Signed-off-by: anubhutiv <anubhutiv@nvidia.com>
📝 WalkthroughWalkthroughTwo parallel expansions add optional hyperparameter fields to automodel and unsloth services. Automodel gains LoRA exclusions, Triton toggle, attention backend selection, sequence-packing controls, and optimizer/LR-decay fields. Unsloth gains rope scaling, DoRA/LoRA init options, ChangesAutomodel pass-2 hyperparameter expansion
Unsloth pass-2 hyperparameter expansion
Hyperparameter reference documentation
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/automodel/src/nmp/automodel/tasks/training/backends/config.py`:
- Around line 257-261: The optimizer_targets.get() call with a default fallback
to Adam silently masks configuration errors when an unknown optimizer_name is
provided. Instead of using the get method with a default, explicitly check if
customizer_config.optimizer.optimizer_name exists in the optimizer_targets
dictionary and raise a clear error if it does not, ensuring invalid optimizer
configurations fail immediately rather than silently defaulting to Adam.
In `@services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py`:
- Around line 85-96: The issue is that the `build_peft_kwargs` function relies
on a runtime assertion to check that `lora` is not None, but the schema allows
`finetuning_type="lora"` with `lora=None`, which will cause a crash at runtime.
Instead of relying on runtime assertions, enforce this invariant at the schema
level in `TrainingSpec` by either making `lora` required when
`finetuning_type="lora"` or providing an appropriate default value. This ensures
backend helper functions like `build_peft_kwargs` never receive invalid input
shapes without needing runtime assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: af7a496d-dc75-454e-8083-359c419f6870
📒 Files selected for processing (12)
plugins/nemo-automodel/src/nemo_automodel_plugin/schema.pyplugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/references/hyperparameters.mdservices/automodel/src/nmp/automodel/adapter.pyservices/automodel/src/nmp/automodel/api/v2/jobs/schemas.pyservices/automodel/src/nmp/automodel/app/jobs/training/compiler.pyservices/automodel/src/nmp/automodel/app/jobs/training/schemas.pyservices/automodel/src/nmp/automodel/tasks/training/backends/config.pyservices/automodel/tests/test_adapter.pyservices/automodel/tests/test_compiler.pyservices/unsloth/src/nmp/unsloth/schemas.pyservices/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.pyservices/unsloth/tests/test_model_load_kwargs.py
| # Map the optimizer choice to its torch class. Defaults to Adam | ||
| optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"} | ||
| cfg["optimizer"] = { | ||
| "_target_": "torch.optim.Adam", | ||
| "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"), | ||
| "lr": customizer_config.optimizer.learning_rate, |
There was a problem hiding this comment.
Fail fast on unknown optimizer names instead of silently using Adam.
Line 260 masks bad config values by defaulting to Adam. That can run with a different optimizer than requested without any error.
Suggested fix
# Map the optimizer choice to its torch class. Defaults to Adam
optimizer_targets = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW"}
+ optimizer_target = optimizer_targets.get(customizer_config.optimizer.optimizer_name)
+ if optimizer_target is None:
+ raise ValueError(
+ f"Unsupported optimizer '{customizer_config.optimizer.optimizer_name}'. "
+ f"Expected one of: {', '.join(sorted(optimizer_targets))}."
+ )
cfg["optimizer"] = {
- "_target_": optimizer_targets.get(customizer_config.optimizer.optimizer_name, "torch.optim.Adam"),
+ "_target_": optimizer_target,
"lr": customizer_config.optimizer.learning_rate,
"weight_decay": customizer_config.optimizer.weight_decay,
"betas": [customizer_config.optimizer.beta1, customizer_config.optimizer.beta2],
"eps": customizer_config.optimizer.eps, # Adam epsilon for numerical stability
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/automodel/src/nmp/automodel/tasks/training/backends/config.py`
around lines 257 - 261, The optimizer_targets.get() call with a default fallback
to Adam silently masks configuration errors when an unknown optimizer_name is
provided. Instead of using the get method with a default, explicitly check if
customizer_config.optimizer.optimizer_name exists in the optimizer_targets
dictionary and raise a clear error if it does not, ensuring invalid optimizer
configurations fail immediately rather than silently defaulting to Adam.
| def build_peft_kwargs(spec: UnslothJobOutput, *, gradient_checkpointing: bool | str) -> dict[str, Any]: | ||
| """Assemble ``FastLanguageModel.get_peft_model`` kwargs for a LoRA run. | ||
|
|
||
| Torch-free (unit-testable). Caller resolves ``gradient_checkpointing`` from | ||
| ``spec.training.use_gradient_checkpointing`` (the JSON literal → ``True`` / | ||
| ``False`` / ``"unsloth"`` mapping). Optional knobs (``loftq_config``, | ||
| ``modules_to_save``, ``layers_to_transform``, ``layer_replication``) are only | ||
| emitted when set so PEFT/Unsloth see absence, not ``None``. | ||
| """ | ||
| lora = spec.training.lora | ||
| assert lora is not None # validated by UnslothJobInput | ||
| kwargs: dict[str, Any] = { |
There was a problem hiding this comment.
Enforce the LoRA invariant in schema instead of relying on runtime assertions.
Line 95 assumes training.lora is present, but the schema default allows finetuning_type="lora" with lora=None. That path will crash at runtime. Enforce/default this in TrainingSpec so backend helpers never receive an invalid shape.
Proposed fix (schema-level invariant/defaulting)
diff --git a/services/unsloth/src/nmp/unsloth/schemas.py b/services/unsloth/src/nmp/unsloth/schemas.py
@@
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
class TrainingSpec(BaseModel):
@@
use_gradient_checkpointing: Literal["unsloth", "true", "false"] = "unsloth"
+
+ `@model_validator`(mode="after")
+ def _enforce_lora_invariant(self) -> TrainingSpec:
+ if self.finetuning_type == "lora" and self.lora is None:
+ self.lora = LoRAParams()
+ if self.finetuning_type == "all_weights" and self.lora is not None:
+ raise ValueError("training.lora must be omitted when finetuning_type='all_weights'")
+ return self🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/unsloth/src/nmp/unsloth/tasks/training/backends/unsloth_sft.py`
around lines 85 - 96, The issue is that the `build_peft_kwargs` function relies
on a runtime assertion to check that `lora` is not None, but the schema allows
`finetuning_type="lora"` with `lora=None`, which will cause a crash at runtime.
Instead of relying on runtime assertions, enforce this invariant at the schema
level in `TrainingSpec` by either making `lora` required when
`finetuning_type="lora"` or providing an appropriate default value. This ensures
backend helper functions like `build_peft_kwargs` never receive invalid input
shapes without needing runtime assertions.
|
Summary by CodeRabbit
Release Notes
New Features
Documentation