Error during teardown of trainer.test() #15851
Unanswered
Alec-Wright
asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
Did you figure this out? I am getting the same error when loading a checkpoint with trainer.fit(ckpt_path=...). If I load the checkpoint with just torch.load and load_state_dict, then it loads just fine.
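For reference, the workaround described above (loading the checkpoint manually with torch.load and load_state_dict instead of passing ckpt_path) looks roughly like this. This is a minimal sketch, not code from the thread: the tiny LSTM and the on-the-fly checkpoint file are illustrative stand-ins for the real model and checkpoint.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the real LightningModule's network (assumption, not from the post).
model = nn.LSTM(input_size=4, hidden_size=8)

# Simulate a Lightning-style checkpoint, where weights live under "state_dict".
path = os.path.join(tempfile.mkdtemp(), "demo.ckpt")
torch.save({"state_dict": model.state_dict()}, path)

# The workaround: bypass trainer.fit(ckpt_path=...) and restore weights directly.
restored = nn.LSTM(input_size=4, hidden_size=8)
ckpt = torch.load(path, map_location="cpu")
restored.load_state_dict(ckpt["state_dict"])
```

Because this path never goes through the Trainer's restore logic, it sidesteps whatever the Trainer does during checkpoint loading that triggers the error.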
I've been getting a strange error. Training completes fine, and the test step runs correctly: it computes the test loss and saves it to the logger. Then, as trainer.test() exits (at least I think that's where the error occurs), I get this error:
```
Traceback (most recent call last):
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 909, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1169, in _run
    self._teardown()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1229, in _teardown
    self.strategy.teardown()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 476, in teardown
    self.lightning_module.cpu()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 141, in cpu
    return super().cpu()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 796, in cpu
    return self._apply(lambda t: t.cpu())
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 194, in _apply
    self.flatten_parameters()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 180, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/work/wrighta2/PruningTime/Full_Train.py", line 80, in <module>
    trainer.test(model, dataloaders=test_dataloader)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 862, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 664, in _call_and_handle_interrupt
    self._teardown()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1229, in _teardown
    self.strategy.teardown()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 476, in teardown
    self.lightning_module.cpu()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 141, in cpu
    return super().cpu()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 796, in cpu
    return self._apply(lambda t: t.cpu())
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 194, in _apply
    self.flatten_parameters()
  File "/scratch/work/wrighta2/conda_envs/pruning/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 180, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed. You can make a clone to get a normal tensor before doing inplace update. See https://github.com/pytorch/rfcs/pull/17 for more details.
```

So it seems like it's trying to use an inference tensor where it shouldn't be. However, I'm not clear why this is happening, or what the trainer is trying to do when this error occurs. As far as I can tell, I'm handling the RNN hidden state in exactly the same way as during validation, and that doesn't produce this error.
Any help would be appreciated! Thanks.