Weak references in dump_allocated_tensors may crash profiler during training #236

@perctrix

Description

gc.get_objects() in dump_allocated_tensors() can return weakref proxy objects. A proxy passes isinstance(obj, torch.Tensor) because it forwards __class__ to its referent. However, if the referent is garbage-collected before .numel(), .element_size() or .shape is accessed, that access raises ReferenceError.
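
The failure mode can be reproduced without PyTorch. The following is a minimal sketch that assumes CPython reference counting. It uses a hypothetical FakeTensor class as a stand-in for torch.Tensor. While the referent is alive, the proxy passes the isinstance filter. Once the last strong reference is dropped, attribute access through the proxy raises ReferenceError.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, n: int) -> None:
        self.shape = (n,)

    def numel(self) -> int:
        return self.shape[0]

    def element_size(self) -> int:
        return 4  # pretend float32 elements


def demo_dead_proxy() -> str:
    t = FakeTensor(8)
    proxy = weakref.proxy(t)

    # While the referent is alive, the proxy forwards __class__,
    # so an isinstance-based filter treats it as a tensor.
    assert isinstance(proxy, FakeTensor)
    assert proxy.numel() * proxy.element_size() == 32

    # Drop the only strong reference; CPython frees the object immediately.
    del t

    try:
        proxy.numel()  # the referent is gone, so this access raises
    except ReferenceError as exc:
        return f"ReferenceError: {exc}"
    return "no error"


print(demo_dead_proxy())
```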

dump_allocated_tensors() is called from Profiler.record_allocated_tensors() during training (training.py:502 and 544), so an unhandled ReferenceError here aborts the running experiment.

Fix: wrap the per-object tensor check and attribute access in try/except ReferenceError, and skip any object whose referent has already been collected.

Labels: bug (Something isn't working), todo (New task or assignment)

Type: Bug