Skip to content

[fix] Preserve original AutoConfig in MegatronWorker.init_configs#1727

Open
dinhxuanvu wants to merge 3 commits into
NovaSky-AI:mainfrom
dinhxuanvu:vdinh/fix-megatron-worker-config-mutation
Open

[fix] Preserve original AutoConfig in MegatronWorker.init_configs#1727
dinhxuanvu wants to merge 3 commits into
NovaSky-AI:mainfrom
dinhxuanvu:vdinh/fix-megatron-worker-config-mutation

Conversation

@dinhxuanvu
Copy link
Copy Markdown
Contributor

@dinhxuanvu dinhxuanvu commented May 29, 2026

Summary

A Megatron-Bridge SFT run launched with overrides like ++transformer_config_kwargs.mtp_num_layers=0 produced an exported HF checkpoint whose config.json carried the runtime override (num_nextn_predict_layers=0) instead of the on-disk source-of-truth value (e.g. 4 for Nemotron-H 120B). The saved directory was silently inconsistent with the reference repo. This PR makes update_model_config non-mutating and ensures MegatronStrategy.save_hf_configs publishes the un-mutated, on-disk AutoConfig.

Root cause

update_model_config in skyrl/train/utils/utils.py mutated the input AutoConfig in place via setattr (recursing into sub-configs the same way). MegatronWorker.init_configs in skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py then assigned the already-mutated object to self.strategy.hf_config, and MegatronStrategy.save_hf_configs (in skyrl/backends/skyrl_train/distributed/strategy.py) published it via model_config.save_pretrained(work_dir).

Fix

  • skyrl/train/utils/utils.py: update_model_config now returns a deepcopy of the input with overrides applied. The recursive setattr logic is preserved unchanged in a private _apply_overrides_in_place helper used internally on the copy.
  • skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py: init_configs keeps the un-mutated AutoConfig.from_pretrained(...) result as hf_config_original and assigns it to self.strategy.hf_config, while the Megatron provider/runtime path continues to use the override-applied copy returned by update_model_config.

Trade-off on tokenizer-id overrides

The override dict in MegatronWorker.init_configs mixes two categories: tokenizer-id overrides (bos_token_id, eos_token_id, pad_token_id) and runtime overrides (mtp_num_layers=0, etc). Pre-PR, both leaked into the saved config.json; post-PR, neither does. We accept this as the right behavior because (i) the on-disk config.json should match the source repo exactly, (ii) tokenizer ids are duplicated in tokenizer_config.json / special_tokens_map.json which are written separately via tokenizer.save_pretrained, and (iii) the on-disk-source-of-truth contract is cleaner. If a future use case needs tokenizer-id overrides persisted to the saved config, splitting the override dict into tokenizer_overrides (applied to the saved copy) and model_overrides (runtime-only) is a localized follow-up.

Verification

  • CPU unit tests added at tests/train/utils/test_utils.py covering the non-mutating contract: input config is unchanged at top level, returned copy carries top-level overrides, nested-dict overrides do not leak back into the input. Includes the Bug B repro (update_model_config(cfg, {"num_nextn_predict_layers": 0}) leaves the input's num_nextn_predict_layers at its original value). Runs in cpu_skyrl_train.yaml.
  • Lint/format clean: pre-commit run --files skyrl/train/utils/utils.py skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py tests/train/utils/test_utils.py (ruff, black, gitleaks) all pass.
  • Manual repro target: after this fix, config.json written by MegatronStrategy.save_hf_configs for a Nemotron-H 120B SFT run launched with ++transformer_config_kwargs.mtp_num_layers=0 matches the on-disk reference (no num_nextn_predict_layers=0 leak).

Risk

Low. The only call site of update_model_config is updated in this commit to consume the returned value. The recursion-via-setattr semantics are identical to before; only the target object changed (a deep-copied config instead of the caller's input).

Notes

The _apply_overrides_in_place helper is the original recursive body and remains internally callable, so future call sites that explicitly want in-place behavior can opt in.

When a Megatron-Bridge SFT run was launched with overrides like
`++transformer_config_kwargs.mtp_num_layers=0`, the exported HF
checkpoint's `config.json` carried the runtime override
(`num_nextn_predict_layers=0`) instead of the on-disk source-of-truth
value (e.g. 4 for Nemotron-H 120B), making the saved directory
silently inconsistent with the reference repo.

Root cause: `update_model_config` mutated the input `AutoConfig`
in place via `setattr` (recursing into sub-configs the same way).
`MegatronWorker.init_configs` then assigned that already-mutated
object to `self.strategy.hf_config`, and `MegatronStrategy
.save_hf_configs` published it via `model_config.save_pretrained
(work_dir)`.

Fix:
- `skyrl/train/utils/utils.py`: `update_model_config` now returns a
  `deepcopy` of the input with overrides applied. The recursive
  setattr logic is preserved unchanged in a private
  `_apply_overrides_in_place` helper used internally on the copy.
- `skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py`:
  `init_configs` keeps the un-mutated `AutoConfig.from_pretrained
  (...)` result as `hf_config_original` and assigns it to
  `self.strategy.hf_config`, while the Megatron provider/runtime
  path continues to use the override-applied copy returned by
  `update_model_config`.

Verification:
- pre-commit hooks (ruff, black, gitleaks): pass on both files.
- Standalone unit-style smoke test (recreating the function bodies
  in isolation, since the module pulls in heavy runtime deps):
  asserts the original config is unchanged, the returned copy
  carries top-level overrides, and nested-dict overrides also do
  not leak back into the input. Includes a Bug B-style repro:
  `update_model_config(hf_cfg, {"num_nextn_predict_layers": 0})`
  leaves `hf_cfg.num_nextn_predict_layers` at the original value.
- No existing test in `tests/` references `update_model_config`,
  `MegatronWorker.init_configs`, or `save_hf_configs`, so this
  path was previously uncovered.

Backward compatibility: the only call site of `update_model_config`
is `MegatronWorker.init_configs`, which is updated in this commit
to consume the returned value. The recursion-via-setattr semantics
are identical to before — only the target object changed (a
deep-copied config instead of the caller's input).

Signed-off-by: Vu Dinh <vudinh@outlook.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates update_model_config to return a deep copy of the configuration with overrides applied, ensuring the original configuration remains unmodified. This allows megatron_worker.py to preserve the unmutated hf_config_original for self.strategy.hf_config. The review feedback points out a potential AttributeError in _apply_overrides_in_place if nested attributes are standard dictionaries (such as rope_scaling) rather than objects, and suggests a more robust implementation that handles both dictionaries and objects while adding defensive None checks.

Comment on lines +999 to 1005
def _apply_overrides_in_place(module_config, override_config_kwargs):
"""Apply override kwargs to ``module_config`` in place (used for sub-configs)."""
for key, val in override_config_kwargs.items():
if isinstance(val, dict):
update_model_config(getattr(module_config, key), val)
_apply_overrides_in_place(getattr(module_config, key), val)
else:
setattr(module_config, key, val)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of _apply_overrides_in_place assumes that module_config and all its nested attributes are objects supporting getattr and setattr. However, many HuggingFace configurations contain standard dictionary attributes (e.g., rope_scaling, quantization_config). If an override contains a nested dictionary for one of these attributes, calling getattr on it returns a dict, and recursing into it will cause an AttributeError when setattr is called on the dictionary keys.

Additionally, if module_config is None or if a nested attribute does not exist (returning None), the function will crash. Enforcing defensive programming with proper None checks and handling both dictionaries and objects will make this utility much more robust.

def _apply_overrides_in_place(module_config, override_config_kwargs):
    """Apply override kwargs to ``module_config`` in place (used for sub-configs)."""
    if module_config is None:
        return
    for key, val in override_config_kwargs.items():
        is_dict = isinstance(module_config, dict)
        current_val = module_config.get(key) if is_dict else getattr(module_config, key, None)

        if isinstance(val, dict) and current_val is not None:
            _apply_overrides_in_place(current_val, val)
        else:
            if is_dict:
                module_config[key] = val
            else:
                setattr(module_config, key, val)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real issue, but pre-existing — the recursive getattr/setattr pattern is unchanged from the prior in-place version; I only relocated it. The only caller (init_configs) passes scalar overrides, so no observable regression today.

Also, the suggested patch conflates "recurse into existing sub-config" with "assign new dict" when current_val is None. Worth a dedicated PR. Happy to file a follow-up issue.

Locks in three properties of update_model_config:
- the caller's input config is not mutated at the top level,
- the returned copy carries top-level overrides,
- nested-dict overrides do not leak back into the caller's input.

Includes the Bug B repro: update_model_config(cfg, {"num_nextn_predict_layers": 0})
leaves the input's num_nextn_predict_layers at its original value.

Runs on cpu_skyrl_train.yaml (uv run --extra skyrl-train --extra dev pytest tests/train/...).

Signed-off-by: Vu Dinh <vudinh@outlook.com>
Avoids a pytest module-name collision with tests/train/generators/test_utils.py
(no __init__.py in either directory; pytest's rootdir-based collection treats
both as the same top-level module name and aborts collection). The new basename
is also more descriptive of what the file actually exercises.

Signed-off-by: Vu Dinh <vudinh@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant