Describe the bug
The checkpoint loading logic contains two bugs due to incorrect function and attribute references, preventing proper resumption of training from a saved checkpoint.
- In examples/text/logic/state.py, line 62: self._data_state.test.load_state_dict(loaded_state["test_sampler"])
  (FIX: self._data_state.test.sampler.load_state_dict(loaded_state["test_sampler"]) )
  Here, load_state_dict is called on the Dataset class itself instead of on its sampler, which is where the saved "test_sampler" state belongs.
- In examples/text/main_train.py, line 27: cfg = checkpointing.load_hydra_config_from_run(cfg.load_dir)
  (FIX: cfg = checkpointing.load_cfg_from_path(cfg.load_dir) )
  Here, the function name is incorrect; the function exists under a different name in utils/checkpointing.py.
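To illustrate the first bug in isolation, here is a minimal sketch using stand-in classes. ToySampler and ToyTestData are hypothetical names made up for this example and are not the classes in the repository; the sketch only assumes that the saved state belongs to the sampler and that the object holding it does not expose load_state_dict itself.

```python
class ToySampler:
    """Hypothetical sampler that remembers how far it has advanced."""

    def __init__(self):
        self.position = 0

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


class ToyTestData:
    """Hypothetical stand-in for the object stored in _data_state.test.

    It holds a sampler but (by assumption) has no load_state_dict of its own.
    """

    def __init__(self, sampler):
        self.sampler = sampler


# Pretend a checkpoint was saved after 128 samples had been consumed.
loaded_state = {"test_sampler": {"position": 128}}
test = ToyTestData(ToySampler())

# Buggy pattern: the holder object has no load_state_dict, so this fails.
try:
    test.load_state_dict(loaded_state["test_sampler"])
    buggy_call_failed = False
except AttributeError:
    buggy_call_failed = True

# Fixed pattern: restore the state on the sampler itself.
test.sampler.load_state_dict(loaded_state["test_sampler"])
```

With these stand-ins, the first call fails with AttributeError and the second restores the sampler position, which mirrors the proposed one-attribute fix.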
To Reproduce
Set load_dir = 'path_to_ckpt_parent' in examples/text/configs/config.yaml and run examples/text/run_train.py
Expected behavior
The checkpoint gets picked up, and training resumes.