[fix] Preserve original AutoConfig in MegatronWorker.init_configs#1727
[fix] Preserve original AutoConfig in MegatronWorker.init_configs#1727dinhxuanvu wants to merge 3 commits into
Conversation
When a Megatron-Bridge SFT run was launched with overrides like
`++transformer_config_kwargs.mtp_num_layers=0`, the exported HF
checkpoint's `config.json` carried the runtime override
(`num_nextn_predict_layers=0`) instead of the on-disk source-of-truth
value (e.g. 4 for Nemotron-H 120B), making the saved directory
silently inconsistent with the reference repo.
Root cause: `update_model_config` mutated the input `AutoConfig`
in place via `setattr` (recursing into sub-configs the same way).
`MegatronWorker.init_configs` then assigned that already-mutated
object to `self.strategy.hf_config`, and `MegatronStrategy
.save_hf_configs` published it via `model_config.save_pretrained
(work_dir)`.
Fix:
- `skyrl/train/utils/utils.py`: `update_model_config` now returns a
`deepcopy` of the input with overrides applied. The recursive
setattr logic is preserved unchanged in a private
`_apply_overrides_in_place` helper used internally on the copy.
- `skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py`:
`init_configs` keeps the un-mutated `AutoConfig.from_pretrained
(...)` result as `hf_config_original` and assigns it to
`self.strategy.hf_config`, while the Megatron provider/runtime
path continues to use the override-applied copy returned by
`update_model_config`.
Verification:
- pre-commit hooks (ruff, black, gitleaks): pass on both files.
- Standalone unit-style smoke test (recreating the function bodies
in isolation, since the module pulls in heavy runtime deps):
asserts the original config is unchanged, the returned copy
carries top-level overrides, and nested-dict overrides also do
not leak back into the input. Includes a Bug B-style repro:
`update_model_config(hf_cfg, {"num_nextn_predict_layers": 0})`
leaves `hf_cfg.num_nextn_predict_layers` at the original value.
- No existing test in `tests/` references `update_model_config`,
`MegatronWorker.init_configs`, or `save_hf_configs`, so this
path was previously uncovered.
Backward compatibility: the only call site of `update_model_config`
is `MegatronWorker.init_configs`, which is updated in this commit
to consume the returned value. The recursion-via-setattr semantics
are identical to before — only the target object changed (a
deep-copied config instead of the caller's input).
Signed-off-by: Vu Dinh <vudinh@outlook.com>
There was a problem hiding this comment.
Code Review
This pull request updates update_model_config to return a deep copy of the configuration with overrides applied, ensuring the original configuration remains unmodified. This allows megatron_worker.py to preserve the unmutated hf_config_original for self.strategy.hf_config. The review feedback points out a potential AttributeError in _apply_overrides_in_place if nested attributes are standard dictionaries (such as rope_scaling) rather than objects, and suggests a more robust implementation that handles both dictionaries and objects while adding defensive None checks.
| def _apply_overrides_in_place(module_config, override_config_kwargs): | ||
| """Apply override kwargs to ``module_config`` in place (used for sub-configs).""" | ||
| for key, val in override_config_kwargs.items(): | ||
| if isinstance(val, dict): | ||
| update_model_config(getattr(module_config, key), val) | ||
| _apply_overrides_in_place(getattr(module_config, key), val) | ||
| else: | ||
| setattr(module_config, key, val) |
There was a problem hiding this comment.
The current implementation of _apply_overrides_in_place assumes that module_config and all its nested attributes are objects supporting getattr and setattr. However, many HuggingFace configurations contain standard dictionary attributes (e.g., rope_scaling, quantization_config). If an override contains a nested dictionary for one of these attributes, calling getattr on it returns a dict, and recursing into it will cause an AttributeError when setattr is called on the dictionary keys.
Additionally, if module_config is None or if a nested attribute does not exist (returning None), the function will crash. Enforcing defensive programming with proper None checks and handling both dictionaries and objects will make this utility much more robust.
def _apply_overrides_in_place(module_config, override_config_kwargs):
"""Apply override kwargs to ``module_config`` in place (used for sub-configs)."""
if module_config is None:
return
for key, val in override_config_kwargs.items():
is_dict = isinstance(module_config, dict)
current_val = module_config.get(key) if is_dict else getattr(module_config, key, None)
if isinstance(val, dict) and current_val is not None:
_apply_overrides_in_place(current_val, val)
else:
if is_dict:
module_config[key] = val
else:
setattr(module_config, key, val)There was a problem hiding this comment.
Real issue, but pre-existing — the recursive getattr/setattr pattern is unchanged from the prior in-place version; I only relocated it. The only caller (init_configs) passes scalar overrides, so no observable regression today.
Also, the suggested patch conflates "recurse into existing sub-config" with "assign new dict" when current_val is None. Worth a dedicated PR. Happy to file a follow-up issue.
Locks in three properties of update_model_config:
- the caller's input config is not mutated at the top level,
- the returned copy carries top-level overrides,
- nested-dict overrides do not leak back into the caller's input.
Includes the Bug B repro: update_model_config(cfg, {"num_nextn_predict_layers": 0})
leaves the input's num_nextn_predict_layers at its original value.
Runs on cpu_skyrl_train.yaml (uv run --extra skyrl-train --extra dev pytest tests/train/...).
Signed-off-by: Vu Dinh <vudinh@outlook.com>
Avoids a pytest module-name collision with tests/train/generators/test_utils.py (no __init__.py in either directory; pytest's rootdir-based collection treats both as the same top-level module name and aborts collection). The new basename is also more descriptive of what the file actually exercises. Signed-off-by: Vu Dinh <vudinh@outlook.com>
Summary
A Megatron-Bridge SFT run launched with overrides like
++transformer_config_kwargs.mtp_num_layers=0produced an exported HF checkpoint whoseconfig.jsoncarried the runtime override (num_nextn_predict_layers=0) instead of the on-disk source-of-truth value (e.g. 4 for Nemotron-H 120B). The saved directory was silently inconsistent with the reference repo. This PR makesupdate_model_confignon-mutating and ensuresMegatronStrategy.save_hf_configspublishes the un-mutated, on-diskAutoConfig.Root cause
update_model_configinskyrl/train/utils/utils.pymutated the inputAutoConfigin place viasetattr(recursing into sub-configs the same way).MegatronWorker.init_configsinskyrl/backends/skyrl_train/workers/megatron/megatron_worker.pythen assigned the already-mutated object toself.strategy.hf_config, andMegatronStrategy.save_hf_configs(inskyrl/backends/skyrl_train/distributed/strategy.py) published it viamodel_config.save_pretrained(work_dir).Fix
skyrl/train/utils/utils.py:update_model_confignow returns adeepcopyof the input with overrides applied. The recursive setattr logic is preserved unchanged in a private_apply_overrides_in_placehelper used internally on the copy.skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py:init_configskeeps the un-mutatedAutoConfig.from_pretrained(...)result ashf_config_originaland assigns it toself.strategy.hf_config, while the Megatron provider/runtime path continues to use the override-applied copy returned byupdate_model_config.Trade-off on tokenizer-id overrides
The override dict in
MegatronWorker.init_configsmixes two categories: tokenizer-id overrides (bos_token_id,eos_token_id,pad_token_id) and runtime overrides (mtp_num_layers=0, etc). Pre-PR, both leaked into the savedconfig.json; post-PR, neither does. We accept this as the right behavior because (i) the on-diskconfig.jsonshould match the source repo exactly, (ii) tokenizer ids are duplicated intokenizer_config.json/special_tokens_map.jsonwhich are written separately viatokenizer.save_pretrained, and (iii) the on-disk-source-of-truth contract is cleaner. If a future use case needs tokenizer-id overrides persisted to the saved config, splitting the override dict intotokenizer_overrides(applied to the saved copy) andmodel_overrides(runtime-only) is a localized follow-up.Verification
tests/train/utils/test_utils.pycovering the non-mutating contract: input config is unchanged at top level, returned copy carries top-level overrides, nested-dict overrides do not leak back into the input. Includes the Bug B repro (update_model_config(cfg, {"num_nextn_predict_layers": 0})leaves the input'snum_nextn_predict_layersat its original value). Runs incpu_skyrl_train.yaml.pre-commit run --files skyrl/train/utils/utils.py skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py tests/train/utils/test_utils.py(ruff, black, gitleaks) all pass.config.jsonwritten byMegatronStrategy.save_hf_configsfor a Nemotron-H 120B SFT run launched with++transformer_config_kwargs.mtp_num_layers=0matches the on-disk reference (nonum_nextn_predict_layers=0leak).Risk
Low. The only call site of
update_model_configis updated in this commit to consume the returned value. The recursion-via-setattr semantics are identical to before; only the target object changed (a deep-copied config instead of the caller's input).Notes
The
_apply_overrides_in_placehelper is the original recursive body and remains internally callable, so future call sites that explicitly want in-place behavior can opt in.