Merged
5 changes: 0 additions & 5 deletions transformer_engine/pytorch/graph.py
@@ -507,11 +507,6 @@ def hook_fn(
         else:
             grad_inputs = None
         del outputs, grad_inputs
-        # The following code is added specifically for MCore's special requirements,
-        # aimed at preventing warmup from altering the control flow.
-        for module in func.modules():
-            if hasattr(module, "is_first_microbatch"):
-                module.is_first_microbatch = True
         torch.cuda.synchronize()
Contributor
Check that MCore modules no longer rely on the `is_first_microbatch` attribute being reset after warmup. The removed code reset a module attribute, while the PR description discusses a function parameter; these are different mechanisms. Verify that removing this reset doesn't break MCore integration in cases where a module carries `is_first_microbatch` as an instance attribute that affects control flow.
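To illustrate the concern above, here is a minimal pure-Python sketch of the attribute-based mechanism the deleted loop handled. `ToyModule`, its `modules()` method, and `reset_after_warmup` are hypothetical stand-ins (not Transformer Engine or MCore code); only the attribute name and the shape of the removed loop come from the diff. The point is that a warmup pass can flip a control-flow flag on a module instance, and the removed code restored it before real execution.

```python
# Hypothetical sketch: ToyModule mimics the relevant surface of
# torch.nn.Module (an is_first_microbatch instance attribute and a
# modules() iterator). Not real Transformer Engine code.
class ToyModule:
    def __init__(self, children=()):
        self.children = list(children)
        self.is_first_microbatch = True

    def modules(self):
        # Yield self and all submodules, like torch.nn.Module.modules().
        yield self
        for child in self.children:
            yield from child.modules()

    def forward(self):
        if self.is_first_microbatch:          # control flow gated by the flag
            self.is_first_microbatch = False  # warmup mutates module state

def reset_after_warmup(func):
    # Reconstruction of the loop this PR removes: restore the flag on
    # every submodule so the first real microbatch sees the same state
    # the warmup run started from.
    for module in func.modules():
        if hasattr(module, "is_first_microbatch"):
            module.is_first_microbatch = True

root = ToyModule(children=[ToyModule()])
root.forward()                     # warmup pass flips the flag
assert root.is_first_microbatch is False
reset_after_warmup(root)           # the behavior the diff deletes
assert root.is_first_microbatch is True
```

This is distinct from passing `is_first_microbatch` as a `forward()` argument: a function parameter is supplied fresh on every call, while an instance attribute persists across calls and is therefore vulnerable to being permanently flipped by warmup replays.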


# All captures here share a mempool. To avoid replays corrupting each other's memory,