
Conversation


@OutisLi OutisLi commented Jan 1, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Improved numerical stability in distance calculations to prevent NaN gradients in edge cases such as padding or masked data entries.


Copilot AI review requested due to automatic review settings January 1, 2026 12:34
@github-actions github-actions bot added the Python label Jan 1, 2026
@dosubot dosubot bot added the bug label Jan 1, 2026

Copilot AI left a comment


Pull request overview

This PR fixes a potential NaN gradient issue in the PyTorch implementation of the pairtab atomic model by replacing the standard torch.linalg.norm computation with a safe norm that uses epsilon clamping.

Key Changes:

  • Modified _get_pairwise_dist method to use torch.sqrt(torch.sum(diff * diff, dim=-1, keepdim=True).clamp(min=1e-14)) instead of torch.linalg.norm
  • Added comprehensive documentation in the Notes section explaining when and why this safe norm is needed
  • Added inline comments explaining the epsilon value choice and its purpose
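The clamp-before-sqrt pattern the PR adopts can be sketched on its own. The following NumPy version mirrors the expression quoted above; the function name, shapes, and the NumPy port are illustrative, not the PR's actual PyTorch code:

```python
import numpy as np

def safe_pairwise_dist(diff, eps=1e-14):
    """Euclidean norm over the last axis with an epsilon floor on the
    squared sum, mirroring the clamp-before-sqrt pattern in the PR."""
    sq = np.sum(diff * diff, axis=-1, keepdims=True)
    return np.sqrt(np.clip(sq, eps, None)).squeeze(-1)

# A zero difference vector (e.g. a masked padding entry) now maps to
# sqrt(eps) instead of 0, keeping the sqrt gradient finite.
print(safe_pairwise_dist(np.zeros((1, 3))))            # ~1e-7
print(safe_pairwise_dist(np.array([[3.0, 4.0, 0.0]])))  # 5.0
```

Clamping the squared sum rather than the final norm is the key design point: it keeps the sqrt argument strictly positive, so its derivative never divides by zero.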


When nlist contains padding indices that have been masked to 0, the
corresponding diff vectors may become zero (if atom 0 happens to be the
center atom itself). To avoid NaN gradients during backpropagation,
Use a safe norm computation with an epsilon floor on the squared sum.

Copilot AI Jan 1, 2026


The word "Use" should be lowercase since it continues from the previous sentence. Change "Use a safe norm" to "use a safe norm".

Suggested change
- Use a safe norm computation with an epsilon floor on the squared sum.
+ use a safe norm computation with an epsilon floor on the squared sum.

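The NaN the commit message describes comes from the gradient of the Euclidean norm, d||x||/dx = x/||x||, which is 0/0 at x = 0. A small sketch of both the failure mode and the fix, with NumPy standing in for PyTorch autograd and the analytic gradient written out by hand:

```python
import numpy as np

# Analytic gradient of the Euclidean norm: d||x||/dx = x / ||x||.
# For a zero difference vector this is 0/0, which autograd engines
# such as PyTorch propagate as NaN during backpropagation.
diff = np.zeros(3)
with np.errstate(invalid="ignore"):
    grad = diff / np.linalg.norm(diff)
print(grad)  # [nan nan nan]

# With an epsilon floor on the squared sum, the denominator is
# bounded away from zero and the gradient stays finite.
eps = 1e-14
safe_grad = diff / np.sqrt(max(np.sum(diff * diff), eps))
print(safe_grad)  # [0. 0. 0.]
```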

coderabbitai bot commented Jan 1, 2026

📝 Walkthrough

Modified the distance computation in _get_pairwise_dist to use a safe norm calculation with an epsilon lower bound (1e-14) instead of torch.linalg.norm, preventing NaN gradients when difference vectors are zero in padded or masked entries.

Changes

  • Numerical stability fix — deepmd/pt/model/atomic_model/pairtab_atomic_model.py: replaced torch.linalg.norm with a sqrt(sum(diff^2)) computation and added an epsilon lower bound to prevent NaN gradients on zero vectors; the output shape is preserved.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

Python

Suggested reviewers

  • njzjz

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check — ❓ Inconclusive: the title 'fix(pt): pairtab' is vague and does not clearly describe what was fixed or the specific nature of the change. Resolution: replace it with a more descriptive title, such as 'fix(pt): improve numerical stability in pairtab distance computation with safe norm'.
✅ Passed checks (2 passed)
  • Description check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring coverage — ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pt/model/atomic_model/pairtab_atomic_model.py (1)

409-416: LGTM! Safe norm implementation correctly prevents NaN gradients.

The implementation correctly computes the Euclidean norm with numerical stability:

  • The epsilon value (1e-14) is well-chosen: small enough to not affect physical distances (atomic distances typically > 0.1 Å, squared > 0.01) yet large enough to prevent gradient issues
  • The clamp on the squared sum (before sqrt) is the right approach to prevent unbounded gradients
  • Inline comments clearly explain the rationale
Optional: Consider defining epsilon as a named constant

For improved maintainability, you could define the epsilon as a class-level constant:

class PairTabAtomicModel(BaseAtomicModel):
    # Epsilon for safe norm computation to prevent NaN gradients
    _SAFE_NORM_EPSILON = 1e-14
    ...

Then use it in the computation:

        pairwise_rr = torch.sqrt(
-            torch.sum(diff * diff, dim=-1, keepdim=True).clamp(min=1e-14)
+            torch.sum(diff * diff, dim=-1, keepdim=True).clamp(min=self._SAFE_NORM_EPSILON)
        ).squeeze(-1)

This makes it easier to adjust the epsilon value consistently if needed in the future. However, this is a minor improvement and can be deferred.
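The reviewer's claim about the epsilon choice can be checked concretely. A pure-Python sanity check (no torch required; the 0.1 Å figure is the one quoted in the bullet above):

```python
import math

eps = 1e-14  # floor used in the PR's clamp

# For a physically meaningful pair distance (~0.1 Å or larger), the
# squared sum is far above eps, so clamp(min=eps) is a no-op.
r = 0.1
assert max(r * r, eps) == r * r
print(math.sqrt(max(r * r, eps)))  # ~0.1, unchanged by the floor

# Only a degenerate zero vector hits the floor, mapping to sqrt(eps).
print(math.sqrt(max(0.0, eps)))    # ≈ 1e-07
```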

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits b98f6c5 and e6e35b8.

📒 Files selected for processing (1)
  • deepmd/pt/model/atomic_model/pairtab_atomic_model.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4144
File: source/api_cc/tests/test_deeppot_dpa_pt.cc:166-246
Timestamp: 2024-10-08T15:32:11.479Z
Learning: Refactoring between test classes `TestInferDeepPotDpaPt` and `TestInferDeepPotDpaPtNopbc` is addressed in PR #3905.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Agent
  • GitHub Check: CodeQL analysis (python)
  • GitHub Check: Test C++ (true, true, true, false)
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test Python (5, 3.13)
  • GitHub Check: Test Python (4, 3.13)
  • GitHub Check: Test Python (6, 3.10)
  • GitHub Check: Test Python (4, 3.10)
  • GitHub Check: Test Python (5, 3.10)
  • GitHub Check: Test Python (6, 3.13)
  • GitHub Check: Test Python (3, 3.13)
  • GitHub Check: Test Python (2, 3.10)
  • GitHub Check: Test Python (2, 3.13)
  • GitHub Check: Test Python (1, 3.13)
  • GitHub Check: Test Python (3, 3.10)
  • GitHub Check: Test Python (1, 3.10)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
🔇 Additional comments (1)
deepmd/pt/model/atomic_model/pairtab_atomic_model.py (1)

395-402: LGTM! Clear documentation of the numerical stability fix.

The Notes section clearly explains the rationale for the safe norm computation and when zero difference vectors can occur. This will help future maintainers understand why the epsilon clamp is necessary.


codecov bot commented Jan 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.15%. Comparing base (b98f6c5) to head (e6e35b8).

Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #5119   +/-   ##
=======================================
  Coverage   82.15%   82.15%           
=======================================
  Files         709      709           
  Lines       72468    72468           
  Branches     3616     3615    -1     
=======================================
  Hits        59535    59535           
  Misses      11769    11769           
  Partials     1164     1164           

☔ View full report in Codecov by Sentry.