Skip to content

Fix hook memory leak in MRFI by tracking and exposing RemovableHandle references#5

Open
Ziad-Ijja wants to merge 1 commit into
fffasttime:mainfrom
Ziad-Ijja:fix-memory-leak
Open

Fix hook memory leak in MRFI by tracking and exposing RemovableHandle references#5
Ziad-Ijja wants to merge 1 commit into
fffasttime:mainfrom
Ziad-Ijja:fix-memory-leak

Conversation

@Ziad-Ijja

Copy link
Copy Markdown

This pull request fixes a memory leak that occurs when MRFI is instantiated multiple times over a fault injection campaign. The main theme is: making MRFI hooks explicitly managed so they can be properly released.

Root cause:
__add_hooks registered forward pre-hooks and forward hooks via register_forward_pre_hook / register_forward_hook, but the returned RemovableHandle objects were discarded. PyTorch keeps hooks alive as long as their handle exists, meaning every MRFI instantiation accumulated closures (including references to FI_config and internal state) that were never freed, even after the MRFI object itself was deleted. Over a long campaign this causes progressive GPU/CPU memory exhaustion.

Changes in mrfi.py:

  • Introduced _hook_handles (list) and _closed (bool) instance attributes, initialized in __init__ before __add_hooks is called.
  • Modified __add_hooks to append every RemovableHandle returned by register_forward_pre_hook and register_forward_hook to _hook_handles, instead of discarding them.
  • Added remove_hooks(): iterates _hook_handles and calls .remove() on each handle, then clears the list.
  • Added close(): calls remove_hooks(), then removes the FI_config attribute from every submodule via module.modules(), and sets _closed = True to prevent double-closing. This fully detaches MRFI instrumentation from the underlying model, allowing the same model instance to be safely re-wrapped in a subsequent MRFI instantiation.

Tests are passing with pytest --cov=mrfi

@codecov-commenter

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants