
Replace apex clip_grad_norm_ with PyTorch native in dints templates #426

Open

HeyangQin wants to merge 1 commit into Project-MONAI:main from HeyangQin:fix/drop-apex-clip-grad

Conversation

HeyangQin commented Mar 22, 2026

Summary

  • Remove apex.contrib.clip_grad.clip_grad_norm_ from dints auto3dseg templates (train.py and search.py)
  • Use torch.nn.utils.clip_grad_norm_ instead, which handles all tensor types correctly, including those with lazy/functional storage in PyTorch >=2.10
  • The previous try/except only caught ModuleNotFoundError (apex not installed), but not the RuntimeError raised when apex is installed and its multi_tensor_applier is incompatible with newer PyTorch tensor storage; see the sketch below
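
A minimal sketch of the before/after import pattern, reconstructed from the description above rather than copied from the actual diff (the surrounding template code may differ):

```python
# Before (as described): the fallback only covered apex being absent.
# If apex was installed but incompatible, the import succeeded and the
# RuntimeError surfaced later, when clip_grad_norm_ was actually called.
try:
    from apex.contrib.clip_grad import clip_grad_norm_
except ModuleNotFoundError:
    from torch.nn.utils import clip_grad_norm_

# After: unconditional PyTorch-native import, no apex involved.
from torch.nn.utils import clip_grad_norm_
```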

Root cause

On PyTorch >=2.10 (e.g., NGC 25.12), some gradient tensors use lazy/functional storage that no longer exposes a traditional data pointer. apex.contrib.clip_grad.clip_grad_norm_ calls multi_tensor_applier, which tries to access the raw data pointer, causing:

RuntimeError: Cannot access data pointer of Tensor that doesn't have storage
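
By contrast, the PyTorch-native function operates through standard tensor operations, so it does not hit the data-pointer path that apex's multi_tensor_applier relies on. A self-contained usage sketch (generic PyTorch, not taken from the templates):

```python
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(8, 2)
loss = model(torch.randn(4, 8)).sum()
loss.backward()

# Rescale gradients in place so their total L2 norm is at most 1.0;
# the pre-clipping total norm is returned.
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before clipping: {total_norm:.4f}")
```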

Test plan

  • No remaining references to apex.contrib.clip_grad in the repository (a hypothetical check is sketched below)
  • All call sites use torch.nn.utils.clip_grad_norm_ (either via an import alias or fully qualified)
  • Run auto3dseg dints training in the NGC 25.12 container to verify there is no crash
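
A hypothetical repo-wide check for the first bullet (illustrative only; the file pattern and scan root are assumptions, not part of the PR):

```python
# check_no_apex_clip_grad.py -- hypothetical helper, not part of this PR.
import pathlib

hits = [
    f"{path}:{lineno}: {line.strip()}"
    for path in pathlib.Path(".").rglob("*.py")
    for lineno, line in enumerate(
        path.read_text(errors="ignore").splitlines(), start=1
    )
    if "apex.contrib.clip_grad" in line
]
assert not hits, "\n".join(hits)
print("OK: no apex.contrib.clip_grad references remain")
```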

Fixes Project-MONAI/MONAI#8737

Summary by CodeRabbit

Release Notes

  • Chores
    • Simplified gradient clipping implementation to use PyTorch exclusively, removing optional dependency handling for external libraries.

apex.contrib.clip_grad.clip_grad_norm_ crashes on PyTorch >=2.10 with
"RuntimeError: Cannot access data pointer of Tensor that doesn't have
storage" because apex's multi_tensor_applier cannot handle tensors with
lazy/functional storage introduced in newer PyTorch versions.

The try/except only caught ModuleNotFoundError (apex not installed) but
not the runtime crash when apex is installed but incompatible.

torch.nn.utils.clip_grad_norm_ handles all tensor types correctly and
is the standard approach. The apex version offered marginal performance
gains that are not worth the compatibility breakage.

Fixes Project-MONAI/MONAI#8737
coderabbitai bot commented Mar 22, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

This change replaces conditional imports that attempted to use Apex's gradient clipping function with an unconditional import of PyTorch's implementation. The modification affects two training/search script files to ensure compatibility with PyTorch 2.10, which breaks Apex's multi-tensor operations on tensors with non-traditional storage.

Changes

Cohort: Apex Gradient Clipping Removal
Files: auto3dseg/algorithm_templates/dints/scripts/search.py, auto3dseg/algorithm_templates/dints/scripts/train.py
Summary: Removed conditional imports attempting to use apex.contrib.clip_grad.clip_grad_norm_ with a fallback to torch.nn.utils.clip_grad_norm_. Both scripts now unconditionally import and use PyTorch's implementation, avoiding crashes in PyTorch 2.10 due to Apex's incompatibility with lazy/functional tensor storage.
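
For reviewers unfamiliar with the call site, gradient clipping conventionally runs between the backward pass and the optimizer step. A generic placement sketch (not the templates' actual training loop):

```python
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):  # stand-in for the real epoch/batch loop
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).sum()
    loss.backward()
    # Clip after backward() and before step() so the update sees clipped grads.
    clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```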

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A hop and a bound, Apex we've unbound!
PyTorch gradients clip so clean,
No lazy storage crashes in between,
Version 2.10 now runs without a sound! 🎉

🚥 Pre-merge checks

✅ Passed checks (5 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately summarizes the main change: replacing apex.contrib.clip_grad.clip_grad_norm_ with the PyTorch-native implementation in the dints templates.
  • Linked Issues Check: Passed. The PR directly addresses issue #8737 by removing apex.contrib.clip_grad.clip_grad_norm_ from both search.py and train.py, replacing it with torch.nn.utils.clip_grad_norm_ to fix the PyTorch 2.10 incompatibility.
  • Out of Scope Changes Check: Passed. All changes are scoped to removing the apex clip_grad conditional logic and using PyTorch's implementation in the two specified dints template files, with no unrelated modifications.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



Development

Successfully merging this pull request may close these issues:

  • apex.contrib.clip_grad.clip_grad_norm_ crashes with PyTorch 2.10
