Skip to content

feat(pt/pt_expt): O(N) vesin neighbor list for Python/ASE inference#5486

Closed
wanghan-iapcm wants to merge 6 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-py-vesin-nlist
Closed

feat(pt/pt_expt): O(N) vesin neighbor list for Python/ASE inference#5486
wanghan-iapcm wants to merge 6 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-py-vesin-nlist

Conversation

@wanghan-iapcm
Copy link
Copy Markdown
Collaborator

@wanghan-iapcm wanghan-iapcm commented Jun 2, 2026

Motivation

The Python / ASE inference path builds the neighbor list with the native all-pairs O(N²) algorithm (build_neighbor_list materializes a dense [nf, nloc, nall, 3] displacement tensor). For large systems this dominates end-to-end throughput — exactly the front-end bottleneck described in the DPA4 manuscript §2.4 (the ASE throughput curves bend down at large N because of native O(N²) neighbor lists).

This PR adds an opt-in, O(N) neighbor-list backend based on the vesin cell list for the Python inference interfaces (DeepPot, the ASE DP calculator), for both the pt (.pth/.pt) and pt_expt (.pte/.pt2) backends. Model results are unchanged: the neighbor search is non-differentiable, so its result is fed to the model's own forward_common_lower + communicate_extended_output, leaving the autograd graph and outputs intact.

API

New nlist_backend argument on DeepPot / DeepEval / the ASE DP calculator:

  • "auto" (default): use the O(N) vesin cell list when applicable (vesin.torch installed, non-spin, non-hessian model) and silently fall back to the native builder otherwise. Keeps the fast path by default; never breaks spin/hessian.
  • "vesin": strict — raise ValueError if vesin.torch is missing, the model is spin/hessian, or (pt_expt) an explicit ASE neighbor_list is also given.
  • "native": force the native all-pairs builder.

Design

  • torch-only, device-resident. The builder uses vesin.torch, which runs the cell list on the device of the input tensors (CPU or CUDA). On GPU the whole extended-system build stays on the GPU (no host round-trip). Lives in deepmd/pt_expt/utils/nlist.py; both backends import it. dpmodel stays vesin-free (no torch code in the array-API backend).
  • The builder converts vesin's (i, j, S) edge list into deepmd's extended format (extended coords/types, mapping, candidate nlist) and lets forward_common_lower.format_nlist re-format (distance-sort, truncate to sel, type-split) — matching forward_common exactly.
  • Works with auto_batch_size.
  • pyproject: depends on vesin[torch] so the device-capable bindings ship by default.

Verification

native vs vesin agree to floating-point round-off (energy bit-identical; forces/virials ~1e-16, the only difference being ghost-atom enumeration order):

path result
se_e2_a / se_atten energy, PBC & non-PBC, atomic, multiframe, auto_batch_size E 0.0, F/V ~1e-16
message-passing DPA2/DPA3 (the manuscript's class) E 0.0, F/V ~5e-16 (spot-checked)
non-energy (dipole) 0.0 (spot-checked)
fparam/aparam, charge_spin pass-through, identical to native

Tests

  • source/tests/pt_expt/utils/test_nlist.py — vesin.torch builder vs native (PBC/non-PBC/distinguish), input-device placement.
  • source/tests/pt/model/test_deeppot.py, source/tests/pt_expt/infer/test_deep_eval.py, test_deep_eval_spin.py, source/tests/pt/test_calculator.py — every nlist_backend dispatch branch (invalid → ValueError, native, auto fallback and strict-vesin raise for vesin-missing / spin / hessian / ASE-conflict), native-vs-vesin consistency (PBC/non-PBC/atomic/multiframe/auto_batch_size), real-spin gate-off end-to-end, and the ASE calculator threading nlist_backend for both backends.

Known limitations

  • No real-CUDA test: the device-placement test runs on CPU in CI (no GPU runner). The CUDA execution path of vesin.torch is implemented but exercised only via the CPU branch here.
  • vesin[torch] is a compiled torch-ABI extension: on a platform without a matching wheel, install could fail (runtime is guarded by is_vesin_torch_available, but pyproject is not). There is no numpy-vesin fallback by design — auto falls back to native, explicit vesin raises.
  • Not addressed (separate follow-ups): the in-graph O(N²) builder used during training / torch.export; extend_coord_with_ghosts over-generating ~27×N ghosts; the (still dead) ASE neighbor_list arg on the pt backend; a committed throughput benchmark.
  • charge_spin and .pt2-runner+vesin are covered by equivalence/.pte rather than dedicated committed tests; message-passing and dipole through vesin are spot-checked, not committed.

Summary by CodeRabbit

  • New Features

    • Added nlist_backend parameter to calculator and inference backends for selecting neighbor-list construction algorithms.
    • Supports "auto" (intelligent backend selection), "native", and "vesin" modes with validation and fallback handling.
  • Chores

    • Added vesin[torch] as a runtime dependency.

Han Wang added 6 commits June 2, 2026 17:56
…erence

The Python/ASE inference path builds the neighbor list with the native
all-pairs O(N^2) algorithm, which dominates end-to-end throughput for large
systems (e.g. via the ASE calculator). Add an opt-in `nlist_backend` option
(default "vesin") that builds the neighbor list with the O(N) vesin cell list
on the host, then feeds it to the model's own forward_common_lower +
communicate_extended_output, leaving the model graph (and results) unchanged.

- deepmd/dpmodel/utils/nlist.py: shared `build_neighbor_list_vesin` +
  `is_vesin_available`; converts vesin (i,j,S) edges to the deepmd extended
  format (extended_coord/atype, mapping, candidate nlist).
- deepmd/pt/infer/deep_eval.py: `nlist_backend` kwarg; `_eval_model_vesin`
  calls the exported forward_common_lower and reduces extended outputs to
  local. Gated to non-spin, non-hessian models; falls back to native when
  vesin is unavailable.
- deepmd/pt_expt/infer/deep_eval.py: `nlist_backend` kwarg; vesin branch in
  `_prepare_inputs` (ASE neighbor_list still takes precedence).
- deepmd/calculator.py: thread `nlist_backend` through the ASE DP calculator.
- pyproject.toml: add `vesin` dependency.

Results match the native builder bit-for-bit in energy and to float64
round-off in forces/virials (ghost ordering differs). Works with
auto_batch_size.

Tests:
- common/dpmodel/test_nlist.py::TestNeighListVesin (builder vs native)
- pt/model/test_deeppot.py::TestDeepPot (dispatch + native-vs-vesin,
  PBC/non-PBC, atomic, multiframe, auto_batch_size)
- pt_expt/infer/test_deep_eval.py::TestDeepEvalEner (vesin consistency)
Add committed tests for the pt_expt DeepPot interface mirroring the pt
backend coverage: native disables vesin, invalid backend -> ValueError,
vesin-unavailable fallback, and spin gate-off.
Replace the synthetic spin gate-off tests (which mutated the _is_spin flag
on an energy model) with end-to-end tests that load a real spin model and
run the spin eval path with nlist_backend="vesin":

- pt: TestDeepPotSpinNlistBackend builds an in-process spin model, asserts
  vesin is gated off, and checks the spin eval (e/f/v/ae/av/fm/mask_mag)
  matches nlist_backend="native".
- pt_expt: TestSpinInference.test_nlist_backend_vesin_gated_off_for_spin
  loads the real .pt2/.pte spin fixtures, asserts vesin is gated off, and
  checks the spin eval matches the reference.
Replace the silent spin/hessian/vesin-missing fallback with explicit
behavior:

- "auto" (new default): use the vesin O(N) builder when applicable (vesin
  installed, non-spin, non-hessian) and silently fall back to native
  otherwise. Preserves the fast path for energy models by default.
- "vesin": strict — raise ValueError if vesin is missing, the model is spin
  or hessian, or (pt_expt) an explicit ASE neighbor_list is also given.
- "native": force the native builder.

The ASE DP calculator default also becomes "auto". Tests updated to cover
every dispatch branch (invalid, native, auto-fallback and strict-raise for
each blocker) on real spin models for both backends.
Add committed tests that the ASE `deepmd.calculator.DP` threads `nlist_backend`
to the underlying DeepPot backend and gives backend-independent results:

- pt backend (.pth) and pt_expt backend (.pte) calculator tests compare
  native vs vesin energy/forces.
- pt_expt: an explicit ASE neighbor_list takes precedence over the auto
  vesin path (_use_vesin stays False).
vesin.torch runs the cell list on the device of the input tensors (CPU or
CUDA), and torch is a hard dependency of both the pt and pt_expt backends, so
the separate numpy builder was redundant. Consolidate on a single torch
builder:

- New deepmd/pt_expt/utils/nlist.py: build_neighbor_list_vesin_torch +
  is_vesin_torch_available. The neighbor search stays on the input device, so
  GPU inference avoids the host round-trip. The vesin call is pinned to the
  input device to ignore any ambient default-device setting.
- Remove the numpy build_neighbor_list_vesin / is_vesin_available from
  deepmd/dpmodel/utils/nlist.py (dpmodel is now vesin-free; torch code does
  not belong in the array-API backend).
- Both pt and pt_expt deep_eval import the torch builder from
  deepmd.pt_expt.utils.nlist and gate on is_vesin_torch_available().
- pyproject: depend on vesin[torch] so the device-capable bindings ship by
  default.

Tests: vesin builder test moved to source/tests/pt_expt/utils/test_nlist.py
(compares against the native builder, plus an input-device-placement check);
skip guards switched to vesin.torch.

def test_nlist_backend_vesin_unavailable(self) -> None:
# "auto" silently falls back; explicit "vesin" raises.
import deepmd.pt.infer.deep_eval as deep_eval_mod
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Jun 2, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 2, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 237fd1119c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

native all-pairs O(N^2) neighbor-list build, which dominates runtime for
large systems on the Python / ASE inference path.
"""
model = self.dp.model["Default"].to(DEVICE)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve data modifiers on the vesin path

For frozen .pth models that carry data_modifier.pth, this path now bypasses ModelWrapper by calling the bare Default model, so the modifier contribution that the native path adds in ModelWrapper.forward is silently dropped whenever nlist_backend resolves to auto/vesin. In those deployments (e.g. models with an added correction term), energies/forces/virials from the default inference path will be missing the modifier rather than matching the existing native results; either apply the modifier after forward_common_lower or keep such models on the native wrapper path.

Useful? React with 👍 / 👎.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR introduces configurable neighbor-list backend selection for inference, allowing users to opt into faster Vesin torch-based construction instead of the native implementation. It adds validation, device-resident execution, automatic fallback semantics for "auto" mode, and comprehensive test coverage across PyTorch, experimental, and ASE interfaces.

Changes

Neighbor-list backend selection

Layer / File(s) Summary
Vesin torch neighbor-list builder
deepmd/pt_expt/utils/nlist.py, source/tests/pt_expt/utils/test_nlist.py
Implements build_neighbor_list_vesin_torch and is_vesin_torch_available for batched and single-frame Vesin-based neighbor-list construction, including ghost-atom expansion, index remapping, distance filtering, and device consistency. Unit tests verify parity with native builder across PBC modes and type distinction.
pt_expt inference backend integration
deepmd/pt_expt/infer/deep_eval.py, source/tests/pt_expt/infer/test_deep_eval.py, source/tests/pt_expt/infer/test_deep_eval_spin.py
Adds _setup_nlist_backend helper validating backend choice, enforcing Vesin unavailability for spin/hessian models and explicit neighbor_list conflicts, and setting _use_vesin flag. Updates _prepare_inputs to conditionally route to Vesin tensor construction on device. Integration tests cover Vesin vs native parity, multiframe consistency, backend selection logic, and spin-model gating.
pt inference backend integration
deepmd/pt/infer/deep_eval.py, source/tests/pt/model/test_deeppot.py
Adds nlist_backend parameter, _setup_nlist_backend validation, and _eval_model_vesin method for extended evaluation with communicate_extended_output remapping. Tests verify auto/native/invalid backends, Vesin unavailability fallback, hessian compatibility, and multiframe correctness.
ASE calculator integration & dependencies
deepmd/calculator.py, source/tests/pt/test_calculator.py, pyproject.toml
Exposes nlist_backend parameter in ASE DP constructor; threads it to DeepPot. Adds vesin[torch] runtime dependency. Calculator tests verify parameter forwarding and backend-independent output consistency.

Sequence Diagram

sequenceDiagram
  participant User as User code
  participant DP as DP (ASE calculator)
  participant DeepPot as DeepPot (inference)
  participant DeepEval as DeepEval
  participant Vesin as vesin.torch
  User->>DP: DP(model, nlist_backend='auto')
  DP->>DeepPot: nlist_backend parameter
  DeepPot->>DeepEval: initialize with nlist_backend
  DeepEval->>DeepEval: _setup_nlist_backend validation
  alt vesin available & supported
    DeepEval->>Vesin: build_neighbor_list_vesin_torch
    Vesin->>DeepEval: (ext_coord, ext_atype, nlist, mapping)
  else fallback to native
    DeepEval->>DeepEval: native neighbor list
  end
  DeepEval->>DeepPot: extended/native neighbor list
  DeepPot->>User: energy/forces
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • deepmodeling/deepmd-kit#5391: Modifies deepmd/pt_expt/infer/deep_eval.py neighbor-list construction in the same area, with this PR extending that centralized work to add backend selection.

Suggested labels

new feature, Python, enhancement

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding O(N) vesin neighbor list support for Python/ASE inference in both pt and pt_expt backends.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 144-184: The vesin backend must also be blocked for models that
expose Hessian outputs; in _setup_nlist_backend, extend the unsupported check
(currently using self._is_spin) to detect Hessian output variables (e.g. inspect
self.output_variables for any with category ==
OutputVariableCategory.DERV_R_DERV_R) and set unsupported accordingly (e.g.
"hessian"), then use that unsupported flag in the existing vesin and auto
branches so vesin is rejected when Hessians are requested and self._use_vesin is
False in that case.

In `@deepmd/pt_expt/utils/nlist.py`:
- Around line 170-180: The code currently truncates candidate neighbors to keep
= min(nsel, max_nn) before calling nlist_distinguish_types, which drops
farther-type candidates; change the flow so that when distinguish_types is True
you preserve the larger candidate pool for type-based selection: build nlist
using a larger keep (e.g., keep = min(max_nn, sorted_idx.shape[1]) or otherwise
the full available candidates) and call nlist_distinguish_types(nlist[None],
extended_atype[None], sel)[0] first, then after the per-type selection crop the
resulting nlist back to nsel slots; update references to nlist, keep,
sorted_idx, sorted_valid, and nlist_distinguish_types accordingly.

In `@deepmd/pt/infer/deep_eval.py`:
- Around line 144-145: The constructor in deepmd/pt/infer/deep_eval.py currently
accepts neighbor_list and nlist_backend but never uses neighbor_list before the
vesin path is chosen (see the constructor and the vesin-enabling branch around
the vesin path), so update the constructor logic to either (A) pass the provided
neighbor_list through to wherever the backend is configured (wire the ASE
NewPrimitiveNeighborList into the vesin/other backend setup) or (B) immediately
raise a ValueError when neighbor_list is not None and nlist_backend == "vesin"
(or when neighbor_list is non-None and the chosen backend cannot consume it),
ensuring the constructor (the __init__ / class constructor that defines
neighbor_list and nlist_backend) enforces or propagates the caller-provided list
rather than silently ignoring it.

In `@pyproject.toml`:
- Around line 61-63: The pyproject currently lists 'vesin[torch]' as an
unconditional runtime dependency; move it into an optional extra and update
metadata injection so it's only requested for the Torch/ASE inference path.
Create a new optional extra name (e.g., "vesin-torch" or "torch-inference") in
pyproject.toml and remove 'vesin[torch]' from [project].dependencies; instead
add it under [project.optional-dependencies] with that extra key. Update
backend/dynamic_metadata.py (which currently exposes
get_pt_requirement/get_tf_requirement) to either add a new
get_vesin_requirement/get_torch_inference_requirement that returns the optional
extra name or ensure code that triggers the Torch/ASE path references the new
optional extra so runtime can fall back when vesin is unavailable.

In `@source/tests/pt_expt/infer/test_deep_eval.py`:
- Around line 721-731: The test
test_nlist_backend_vesin_conflicts_with_neighbor_list imports ase.neighborlist
unconditionally which causes failures when ASE is not installed; update the test
to detect ASE availability and skip if missing by attempting to import ase (or
using pytest.importorskip) and calling self.skipTest when import fails before
constructing DeepPot with nlist_backend="vesin" and
neighbor_list=ase.neighborlist.NewPrimitiveNeighborList(...); keep references to
DeepPot, nlist_backend, and neighbor_list intact so the behavior under test
remains unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e18f825f-19c5-480c-8700-3c04ffcbbc31

📥 Commits

Reviewing files that changed from the base of the PR and between 27a18b6 and 237fd11.

📒 Files selected for processing (10)
  • deepmd/calculator.py
  • deepmd/pt/infer/deep_eval.py
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/nlist.py
  • pyproject.toml
  • source/tests/pt/model/test_deeppot.py
  • source/tests/pt/test_calculator.py
  • source/tests/pt_expt/infer/test_deep_eval.py
  • source/tests/pt_expt/infer/test_deep_eval_spin.py
  • source/tests/pt_expt/utils/test_nlist.py

Comment on lines +144 to +184
def _setup_nlist_backend(self, nlist_backend: str) -> None:
"""Resolve the requested neighbor-list backend for the inference path.

``"auto"`` uses the O(N) vesin cell list when applicable and silently
falls back to native otherwise; ``"vesin"`` is strict and raises if it
cannot be honored; ``"native"`` forces the native builder. An explicitly
supplied ASE ``neighbor_list`` takes precedence over the vesin path.
"""
if nlist_backend not in ("auto", "vesin", "native"):
raise ValueError(
f"Unknown nlist_backend {nlist_backend!r}; "
"expected 'auto', 'vesin', or 'native'."
)
self.nlist_backend = nlist_backend
ase_provided = self.neighbor_list is not None
unsupported = "spin" if self._is_spin else None
if nlist_backend == "vesin":
# explicit request: fail loudly if it cannot be honored
if not is_vesin_torch_available():
raise ValueError(
"nlist_backend='vesin' was requested but the 'vesin.torch' "
"package is not installed; install it (`pip install "
"vesin[torch]`) or use nlist_backend='native' (or 'auto')."
)
if unsupported is not None:
raise ValueError(
f"nlist_backend='vesin' is not supported for {unsupported} "
"models; use nlist_backend='native' (or 'auto')."
)
if ase_provided:
raise ValueError(
"nlist_backend='vesin' conflicts with an explicitly supplied "
"neighbor_list; pass only one."
)
self._use_vesin = True
elif nlist_backend == "native":
self._use_vesin = False
else: # auto: use vesin when possible, otherwise fall back silently
self._use_vesin = (
is_vesin_torch_available() and unsupported is None and not ase_provided
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Block vesin for hessian models here too.

Line 159 only treats spin as unsupported, but this backend still exposes hessian outputs via OutputVariableCategory.DERV_R_DERV_R. A hessian archive can therefore select the vesin path here instead of failing fast, which breaks the advertised backend contract and risks incorrect second-derivative results.

Suggested fix
-        unsupported = "spin" if self._is_spin else None
+        unsupported = None
+        if self._is_spin:
+            unsupported = "spin"
+        elif any(
+            v.category == OutputVariableCategory.DERV_R_DERV_R
+            for v in self._model_output_def.def_outp.get_data().values()
+        ):
+            unsupported = "hessian"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 144 - 184, The vesin backend
must also be blocked for models that expose Hessian outputs; in
_setup_nlist_backend, extend the unsupported check (currently using
self._is_spin) to detect Hessian output variables (e.g. inspect
self.output_variables for any with category ==
OutputVariableCategory.DERV_R_DERV_R) and set unsupported accordingly (e.g.
"hessian"), then use that unsupported flag in the existing vesin and auto
branches so vesin is rejected when Hessians are requested and self._use_vesin is
False in that case.

Comment on lines +170 to +180
nlist = torch.full((nloc, nsel), -1, dtype=torch.int64, device=device)
keep = min(nsel, max_nn)
if keep > 0:
nlist[:, :keep] = torch.where(
sorted_valid[:, :keep],
sorted_idx[:, :keep],
torch.full_like(sorted_idx[:, :keep], -1),
)

if distinguish_types:
nlist = nlist_distinguish_types(nlist[None], extended_atype[None], sel)[0]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't truncate before enforcing per-type slots.

When distinguish_types=True, Lines 170-177 keep only the sum(sel) nearest neighbors before nlist_distinguish_types() runs. That drops farther candidates from underrepresented types, so crowded systems can return the wrong typed neighbor list even though valid per-type neighbors still exist inside rcut. This will silently change model inputs relative to the native builder.

Suggested fix
-    nlist = torch.full((nloc, nsel), -1, dtype=torch.int64, device=device)
-    keep = min(nsel, max_nn)
-    if keep > 0:
-        nlist[:, :keep] = torch.where(
-            sorted_valid[:, :keep],
-            sorted_idx[:, :keep],
-            torch.full_like(sorted_idx[:, :keep], -1),
-        )
-
-    if distinguish_types:
-        nlist = nlist_distinguish_types(nlist[None], extended_atype[None], sel)[0]
+    sorted_nlist = torch.where(
+        sorted_valid,
+        sorted_idx,
+        torch.full_like(sorted_idx, -1),
+    )
+
+    if distinguish_types:
+        sorted_nlist = nlist_distinguish_types(
+            sorted_nlist[None], extended_atype[None], sel
+        )[0]
+
+    nlist = torch.full((nloc, nsel), -1, dtype=torch.int64, device=device)
+    keep = min(nsel, sorted_nlist.shape[1])
+    if keep > 0:
+        nlist[:, :keep] = sorted_nlist[:, :keep]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/nlist.py` around lines 170 - 180, The code currently
truncates candidate neighbors to keep = min(nsel, max_nn) before calling
nlist_distinguish_types, which drops farther-type candidates; change the flow so
that when distinguish_types is True you preserve the larger candidate pool for
type-based selection: build nlist using a larger keep (e.g., keep = min(max_nn,
sorted_idx.shape[1]) or otherwise the full available candidates) and call
nlist_distinguish_types(nlist[None], extended_atype[None], sel)[0] first, then
after the per-type selection crop the resulting nlist back to nsel slots; update
references to nlist, keep, sorted_idx, sorted_valid, and nlist_distinguish_types
accordingly.

Comment on lines 144 to +145
neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
nlist_backend: str = "auto",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently ignore an explicit neighbor_list in the pt backend.

This constructor still exposes neighbor_list, but the value is never stored or consulted before Line 297 enables vesin. With deepmd.calculator.DP now forwarding both knobs, DP(..., neighbor_list=..., nlist_backend="vesin") will take the vesin path instead of the caller-provided list and never warn. Please either wire the ASE list through here or fail fast when neighbor_list is non-None; at minimum, "vesin" should reject that combination.

Also applies to: 260-299

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/infer/deep_eval.py` around lines 144 - 145, The constructor in
deepmd/pt/infer/deep_eval.py currently accepts neighbor_list and nlist_backend
but never uses neighbor_list before the vesin path is chosen (see the
constructor and the vesin-enabling branch around the vesin path), so update the
constructor logic to either (A) pass the provided neighbor_list through to
wherever the backend is configured (wire the ASE NewPrimitiveNeighborList into
the vesin/other backend setup) or (B) immediately raise a ValueError when
neighbor_list is not None and nlist_backend == "vesin" (or when neighbor_list is
non-None and the chosen backend cannot consume it), ensuring the constructor
(the __init__ / class constructor that defines neighbor_list and nlist_backend)
enforces or propagates the caller-provided list rather than silently ignoring
it.

Comment thread pyproject.toml
Comment on lines +61 to +63
# O(N) neighbor-list backend for the Python / ASE inference path; the
# [torch] extra adds vesin.torch for a device-resident (GPU) build.
'vesin[torch]',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect base dependency declaration:"
rg -n -C2 '^\s*dependencies\s*=\s*\[|vesin|torch' pyproject.toml

echo
echo "Inspect optional dependency definitions and dynamic metadata hooks:"
rg -n -C2 'optional-dependencies|torch|cpu|gpu|ase' pyproject.toml backend README.md

Repository: deepmodeling/deepmd-kit

Length of output: 25749


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "All occurrences of vesin:"
rg -n --hidden --no-ignore -S "vesin" pyproject.toml backend README.md

echo
echo "Inspect optional-dependencies block in pyproject.toml around dynamic metadata config:"
rg -n -C2 "optional-dependencies|tool\.deepmd_build_backend|deepmd_build_backend" pyproject.toml

echo
echo "Inspect backend dynamic/metadata helpers that may add torch-related deps:"
ls -1 backend
for f in backend/*.py; do
  echo "---- $f ----"
  rg -n "optional-dependencies|optional_dependencies|get_requires|vesin|torch\]" "$f" || true
done

Repository: deepmodeling/deepmd-kit

Length of output: 2257


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "--- pyproject.toml: core dependencies section around lines 35-90 ---"
sed -n '35,95p' pyproject.toml

echo
echo "--- pyproject.toml: tool.deepmd_build_backend.optional-dependencies full block (approx 80-160) ---"
sed -n '80,170p' pyproject.toml

echo
echo "--- pyproject.toml: find any optional extra keys referencing torch / ase ---"
rg -n --context 1 "optional-dependencies|\\base\\b|\\btorch\\b|vesin\\[" pyproject.toml -S

echo
echo "--- backend/dynamic_metadata.py ---"
sed -n '1,120p' backend/dynamic_metadata.py

echo
echo "--- backend/find_pytorch.py (search for vesin / torch extra injection) ---"
rg -n "vesin|optional|extra|ase|torch\\]" backend/find_pytorch.py backend/find_pytorch.py -S || true
sed -n '1,240p' backend/find_pytorch.py

Repository: deepmodeling/deepmd-kit

Length of output: 13414


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Search for vesin in backend:"
rg -n --hidden --no-ignore -S "vesin" backend || true

echo
echo "Search for vesin in optional-dependencies groups only (pyproject.toml):"
rg -n --hidden --no-ignore -S "vesin" pyproject.toml || true

echo
echo "Show dynamic metadata optional-dependencies logic (does it add vesin?):"
rg -n "optional-dependencies|get_pt_requirement|get_tf_requirement" backend/dynamic_metadata.py

Repository: deepmodeling/deepmd-kit

Length of output: 652


Move vesin[torch] out of core runtime dependencies (pyproject.toml: [project].dependencies, lines 61-63).

  • vesin[torch] is added unconditionally to [project].dependencies, so every install attempts it (even non-PyTorch/ASE users and platforms where the compiled extension may not load).
  • No corresponding optional-dependencies extra includes vesin, and backend/dynamic_metadata.py only injects PyTorch/TensorFlow requirements (get_pt_requirement/get_tf_requirement), not vesin.

Place vesin[torch] behind a dedicated optional extra (e.g., torch/ASE inference path) so runtime can fall back when it’s unavailable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 61 - 63, The pyproject currently lists
'vesin[torch]' as an unconditional runtime dependency; move it into an optional
extra and update metadata injection so it's only requested for the Torch/ASE
inference path. Create a new optional extra name (e.g., "vesin-torch" or
"torch-inference") in pyproject.toml and remove 'vesin[torch]' from
[project].dependencies; instead add it under [project.optional-dependencies]
with that extra key. Update backend/dynamic_metadata.py (which currently exposes
get_pt_requirement/get_tf_requirement) to either add a new
get_vesin_requirement/get_torch_inference_requirement that returns the optional
extra name or ensure code that triggers the Torch/ASE path references the new
optional extra so runtime can fall back when vesin is unavailable.

Comment on lines +721 to +731
def test_nlist_backend_vesin_conflicts_with_neighbor_list(self) -> None:
import ase.neighborlist

with self.assertRaises(ValueError):
DeepPot(
self.tmpfile.name,
nlist_backend="vesin",
neighbor_list=ase.neighborlist.NewPrimitiveNeighborList(
cutoffs=self.rcut, bothways=True
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Skip this ASE-only test when ASE is unavailable.

Lines 722-728 import ase.neighborlist unconditionally, unlike the surrounding ASE-backed tests. In an environment without ASE this test errors instead of skipping, even though the behavior under test depends on an explicit ASE neighbor list.

Suggested fix
+    `@unittest.skipUnless`(
+        importlib.util.find_spec("ase") is not None, "ase not installed"
+    )
     def test_nlist_backend_vesin_conflicts_with_neighbor_list(self) -> None:
         import ase.neighborlist
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/infer/test_deep_eval.py` around lines 721 - 731, The
test test_nlist_backend_vesin_conflicts_with_neighbor_list imports
ase.neighborlist unconditionally which causes failures when ASE is not
installed; update the test to detect ASE availability and skip if missing by
attempting to import ase (or using pytest.importorskip) and calling
self.skipTest when import fails before constructing DeepPot with
nlist_backend="vesin" and
neighbor_list=ase.neighborlist.NewPrimitiveNeighborList(...); keep references to
DeepPot, nlist_backend, and neighbor_list intact so the behavior under test
remains unchanged.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in O(N) neighbor-list construction path for Python/ASE inference by integrating vesin.torch into the PyTorch (pt) and exportable PyTorch (pt_expt) inference backends, and wires the selection through DeepPot/DeepEval and the ASE DP calculator via a new nlist_backend argument.

Changes:

  • Added nlist_backend dispatch (auto/vesin/native) to inference backends and threaded it through the ASE calculator interface.
  • Implemented a device-resident vesin.torch neighbor-list builder (deepmd/pt_expt/utils/nlist.py) and integrated it into both backends.
  • Added targeted unit tests covering backend selection, fallbacks/raises, spin gating, and native-vs-vesin numerical consistency.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
deepmd/pt/infer/deep_eval.py Adds nlist_backend selection and a vesin-powered inference evaluation path using forward_common_lower + communicate_extended_output.
deepmd/pt_expt/infer/deep_eval.py Adds nlist_backend selection and integrates vesin builder into _prepare_inputs with ASE-conflict handling.
deepmd/pt_expt/utils/nlist.py New torch-specific vesin neighbor-list builder producing extended coords/types, mapping, and candidate nlists.
deepmd/calculator.py Threads nlist_backend through the ASE calculator to DeepPot.
pyproject.toml Adds vesin[torch] to runtime dependencies.
source/tests/pt/model/test_deeppot.py Adds tests for nlist_backend behavior (auto/native/vesin, missing-vesin behavior, hessian/spin gating, consistency).
source/tests/pt/test_calculator.py Adds ASE calculator-level test ensuring nlist_backend reaches the backend and preserves results.
source/tests/pt_expt/utils/test_nlist.py New tests validating vesin builder vs native neighbor relationships and device placement.
source/tests/pt_expt/infer/test_deep_eval.py Adds pt_expt inference tests for vesin consistency, multiframe, backend selection, ASE conflict behavior, and calculator threading.
source/tests/pt_expt/infer/test_deep_eval_spin.py Adds spin-model gating test ensuring explicit vesin raises and auto falls back while matching reference outputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +693 to +700
"""Evaluate using an O(N) ``vesin`` neighbor list built on the host.

The neighbor list is built outside the model graph and fed to the
exported ``forward_common_lower`` interface; the extended-region output
is mapped back to local atoms with :func:`communicate_extended_output`,
exactly mirroring the native ``forward_common`` path. This avoids the
native all-pairs O(N^2) neighbor-list build, which dominates runtime for
large systems on the Python / ASE inference path.
Comment thread pyproject.toml
Comment on lines +61 to +63
# O(N) neighbor-list backend for the Python / ASE inference path; the
# [torch] extra adds vesin.torch for a device-resident (GPU) build.
'vesin[torch]',
Comment thread deepmd/calculator.py
Comment on lines +56 to +58
``"vesin"`` strictly requires the vesin path (raises if it is missing or
the model is spin / hessian); ``"native"`` forces the built-in
all-pairs builder. Ignored when an explicit ``neighbor_list`` is given.
@wanghan-iapcm
Copy link
Copy Markdown
Collaborator Author

I will make deepmd-kit directly consume vesin style nlist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants