add gromacs to frames node by MrJulEnergy · Pull Request #463 · zincware/IPSuite

MrJulEnergy · 2026-04-03T10:00:10Z

Summary by CodeRabbit

New Features
- Added GROMACS → ASE conversion for trajectories and topologies, including optional energies, stresses, forces and velocities per frame. Supports selecting frame ranges, matches external energy data to trajectory times (with tolerance reporting), and saves converted frames to a persistent frames file for later access. Includes a command-line/demo entry to print basic frame info.

for more information, see https://pre-commit.ci

coderabbitai · 2026-04-03T10:02:23Z

📝 Walkthrough

Walkthrough

Adds a new GROMACS→ASE converter gmx_to_ase() that builds ASE Atoms frames (positions, cell/pbc, velocities, forces), parses optional .edr energy data matched by timestamps, and a Gmx2Frames zntrack node that runs the conversion and persists frames to frames.h5.

Changes

Cohort / File(s)	Summary
GROMACS-to-ASE Conversion & ZnTrack Node `ipsuite/data_loading/add_data_gromacs.py`	New module implementing `gmx_to_ase(topology, trajectory=None, edr=None, start=None, stop=None, step=None)` and `Gmx2Frames(zntrack.Node)`. Adds atom symbol resolution (elements → types → capitalization), frame iteration, positions/velocities/forces handling, EDR parsing with timestamp nearest-match and tolerance logging, energy and stress extraction, SinglePointCalculator attachment, HDF5 persistence via `znh5md.IO.extend()`, and a `frames` property to read stored `ase.Atoms` list. Includes `__main__` demonstration block.

Sequence Diagram

sequenceDiagram
    actor User
    participant Gmx2Frames as Gmx2Frames<br/>(ZnTrack Node)
    participant gmx_to_ase as gmx_to_ase<br/>(Function)
    participant MDAnalysis as MDAnalysis<br/>(Universe/Trajectory)
    participant EDRParser as EDR Parser
    participant ASE as ASE<br/>(Atoms + SinglePointCalculator)
    participant HDF5 as HDF5<br/>(frames.h5)

    User->>Gmx2Frames: run()
    Gmx2Frames->>gmx_to_ase: gmx_to_ase(topology, trajectory, edr, start, stop, step)
    gmx_to_ase->>MDAnalysis: Load topology & trajectory
    MDAnalysis-->>gmx_to_ase: Universe & timesteps

    loop per selected timestep
        gmx_to_ase->>MDAnalysis: read frame (positions, box, velocities, forces)
        alt edr provided
            gmx_to_ase->>EDRParser: load/suppress-warnings & find nearest sample by time
            EDRParser-->>gmx_to_ase: energy, pressure terms, all EDR terms
            gmx_to_ase->>gmx_to_ase: compute 6-component stress vector (from Pres-*)
        end
        gmx_to_ase->>ASE: create Atoms, set positions/cell/pbc/velocities
        gmx_to_ase->>ASE: attach SinglePointCalculator (energy, forces, stress, results)
        ASE-->>gmx_to_ase: Atoms frame
    end

    gmx_to_ase-->>Gmx2Frames: list[Atoms]
    Gmx2Frames->>HDF5: persist frames via znh5md.IO.extend()
    HDF5-->>Gmx2Frames: confirmation

    User->>Gmx2Frames: .frames
    Gmx2Frames->>HDF5: read frames
    HDF5-->>User: list[Atoms]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through GROMACS trails one night,

found atoms, boxes, velocities in flight,
matched EDR beats by their time,
stitched energies, stresses—what a rhyme!
Frames tucked in HDF5, snug and bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'add gromacs to frames node' directly and clearly describes the main change: adding a new GROMACS-to-frames conversion node (Gmx2Frames class) to the codebase.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch addGmx2FramesNode

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)

134-134: Consider warning when EDR and trajectory times diverge significantly.

The nearest-time matching via argmin silently succeeds even when the time difference is large (e.g., different output frequencies). Consider logging a warning if the matched time differs by more than a small tolerance.

Example tolerance check

idx = np.argmin(np.abs(edr_times - ts.time))
time_diff = abs(edr_times[idx] - ts.time)
if time_diff > 0.1:  # ps, adjust as appropriate
    warnings.warn(f"EDR time {edr_times[idx]:.3f} differs from trajectory time {ts.time:.3f} by {time_diff:.3f} ps")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` at line 134, The nearest-time
matching using idx = np.argmin(np.abs(edr_times - ts.time)) can silently match
large time gaps; in the function or loop where that line appears (e.g., the
trajectory frame handling block in add_data_gromacs.py), compute time_diff =
abs(edr_times[idx] - ts.time) after selecting idx and if time_diff exceeds a
small tolerance (e.g., 0.1 ps, configurable) emit a warning (warnings.warn or
your logger.warn) indicating the EDR time, trajectory time, and the difference
so users know the outputs are misaligned; make the tolerance configurable or
documented.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 211: The run method in the class defined in add_data_gromacs.py is
missing a return type annotation; update the def run(self): signature to include
the appropriate return type (e.g., -> None or the actual return type used) so it
conforms with the project's full type annotation guideline, and update any
related docstring or callers if necessary; locate the method by the symbol run
in that file and add the correct type hint.
- Around line 202-207: The annotated types for node attributes are too narrow:
change trajectory and edr from Path to Optional[Path] and start, stop, step from
int to Optional[int] to reflect that zntrack.deps_path(None) /
zntrack.params(None) can return None; update the import to include
typing.Optional (e.g., from typing import Optional) and modify the annotations
for trajectory, edr, start, stop, and step accordingly while leaving topology as
Path.
- Around line 239-240: The example prints may raise an IndexError and shows the
wrong unit: guard access to frames[1] by checking len(frames) >= 2 before
referencing frames[1].calc.results (or else reference frame 0 consistently), and
fix the unit label by using the actual energy units returned by
frames[0].get_potential_energy() (or an energy_unit variable if present) instead
of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the
two print lines to conditionally print the second-frame EDR terms only when a
second frame exists and to display the correct units from the energy
value/source.
- Line 135: The energy read from the EDR file (assigned to variable energy in
the line energy = float(edr_data["Potential"][idx])) is in kJ/mol and must be
converted to eV before passing to ASE's SinglePointCalculator; change the code
to convert from kJ/mol to eV (using ase.units, e.g. multiply/divide by units.kJ,
units.mol and units.eV) and supply the converted value to the
SinglePointCalculator so downstream ASE methods return correct energies.

---

Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 134: The nearest-time matching using idx = np.argmin(np.abs(edr_times -
ts.time)) can silently match large time gaps; in the function or loop where that
line appears (e.g., the trajectory frame handling block in add_data_gromacs.py),
compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if
time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning
(warnings.warn or your logger.warn) indicating the EDR time, trajectory time,
and the difference so users know the outputs are misaligned; make the tolerance
configurable or documented.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d200379-0275-4ef7-a180-661cd8621105

📥 Commits

Reviewing files that changed from the base of the PR and between 236f4ac and 4584b12.

📒 Files selected for processing (1)

ipsuite/data_loading/add_data_gromacs.py

ipsuite/data_loading/add_data_gromacs.py

coderabbitai · 2026-04-03T10:07:26Z

ipsuite/data_loading/add_data_gromacs.py

+                # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
+                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])


⚠️ Potential issue | 🟠 Major

Stress units mismatch: GROMACS pressure is in bar, ASE expects eV/Å³.

The comment acknowledges the unit discrepancy, but storing stress in bar breaks ASE's stress conventions. Any code calling atoms.get_stress() will receive values in the wrong units, leading to incorrect virial/pressure calculations downstream.

Convert bar to eV/Å³: 1 bar = 1e-4 GPa = 6.2415e-7 eV/Å³ (use ase.units for precision).

Proposed fix

+from ase import units + +# bar -> eV/Å³ +BAR_TO_EV_ANG3 = 1.0 / (units.bar / (units.eV / units.Ang**3)) + ... - # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³) - stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) + # Convert GROMACS pressure (bar) to ASE stress (eV/Å³) + stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * BAR_TO_EV_ANG3

ipsuite/data_loading/add_data_gromacs.py

coderabbitai · 2026-04-03T10:07:26Z

ipsuite/data_loading/add_data_gromacs.py

+    print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol")
+    print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}")


⚠️ Potential issue | 🟡 Minor

Potential IndexError and misleading unit label in example.

Line 240 accesses frames[1] without checking if there are at least 2 frames.

Line 239 prints "kJ/mol" but if energy units are converted to eV (as recommended), this label will be incorrect.

Proposed fix

- print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol") - print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}") + print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} eV") + print(f"All EDR terms on frame 0: {list(frames[0].calc.results.keys())}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol")

print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}")

print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} eV")

print(f"All EDR terms on frame 0: {list(frames[0].calc.results.keys())}")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 239 - 240, The example prints may raise an IndexError and shows the wrong unit: guard access to frames[1] by checking len(frames) >= 2 before referencing frames[1].calc.results (or else reference frame 0 consistently), and fix the unit label by using the actual energy units returned by frames[0].get_potential_energy() (or an energy_unit variable if present) instead of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the two print lines to conditionally print the second-frame EDR terms only when a second frame exists and to display the correct units from the energy value/source.

coderabbitai · 2026-04-03T10:12:00Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #464

@MrJulEnergy

Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py`

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)
156-165: ⚠️ Potential issue | 🟠 Major

Convert the pressure tensor before assigning stress.

ASE works in eV-based units, and its pressure/stress quantities are represented in eV/Å³; passing the raw Pres-* values here keeps atoms.get_stress() in bar instead. If you want to preserve the native tensor too, keep the bar numbers only under the Pres-* keys and convert the array you pass as stress. (ase-lib.org)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 156 - 165, The code
currently assigns raw GROMACS Pres-* values (in bar) directly to stress, which
leaves ASE seeing stress in bar; instead, convert those pressure components to
ASE internal units (eV/Å³) before assigning to the stress array. Locate the
block reading edr_data["Pres-XX"/"Pres-YY"/..."] and replace the direct
assignment to stress with a conversion step (keep the raw bar values in edr_data
if you need them) — e.g., compute a conversion factor or use ASE unit helpers
(ase.units / ase.units.convert) to convert from bar to eV/Å³ and then set stress
= np.array([pxx_converted, pyy_converted, pzz_converted, pyz_converted,
pxz_converted, pxy_converted]) so atoms.get_stress() returns values in ASE
units.

🧹 Nitpick comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)

210-219: Make the node example runnable end-to-end.

Right now the example stops after construction, so it isn't testable in the same way as the other data-loading nodes. Add project.repro() and one observable result/assertion.

📝 Suggested docstring tweak

     >>> with project:
     ...     md = ips.Gmx2Frames(
     ...         topology="gromacs/system.gro",
     ...         trajectory="gromacs/production.xtc",
     ...         edr="gromacs/production.edr",
     ...         start=1,
     ...     )
+    >>> project.repro()
+    >>> len(md.frames) > 0
+    True

As per coding guidelines: "Each Node’s docstring should include a testable example using project and ips".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 210 - 219, The
docstring example for Gmx2Frames stops after construction and isn't runnable;
update it to call project.repro() and assert one observable outcome so it
executes end-to-end. Specifically, after constructing md = ips.Gmx2Frames(...),
invoke project.repro() and then check a result such as verifying md.frames (or
another public attribute/method on Gmx2Frames) has expected length/type (e.g.,
assert len(md.frames) > 0 or isinstance(md.frames, list/ndarray)) so the example
both runs and asserts behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 60-73: _match_edr_frame currently scans the whole edr_times array
per frame which is O(n×m) and silently returns an out-of-tolerance index;
replace the np.argmin approach with np.searchsorted on the monotonic edr_times
(or use a moving pointer maintained by the caller) to find the nearest candidate
on the left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
- Around line 147-179: ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but
ASE expects eV/Å; multiply the copied forces by the same conversion factor used
for energy (the (kJ / mol) factor) before passing into SinglePointCalculator.
Update the code that sets forces (the ts.forces.copy() branch) to convert units
(e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces
variable passed to SinglePointCalculator(atoms, energy=..., forces=forces,
stress=...) is in eV/Å.
- Around line 44-57: The code currently derives element symbols from
u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use
atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.

---

Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 156-165: The code currently assigns raw GROMACS Pres-* values (in
bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert
those pressure components to ASE internal units (eV/Å³) before assigning to the
stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and
replace the direct assignment to stress with a conversion step (keep the raw bar
values in edr_data if you need them) — e.g., compute a conversion factor or use
ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/Å³
and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted,
pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns
values in ASE units.

---

Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 210-219: The docstring example for Gmx2Frames stops after
construction and isn't runnable; update it to call project.repro() and assert
one observable outcome so it executes end-to-end. Specifically, after
constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a
result such as verifying md.frames (or another public attribute/method on
Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or
isinstance(md.frames, list/ndarray)) so the example both runs and asserts
behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3211bbb4-e6f9-44fd-bc99-ce5c595a7702

📥 Commits

Reviewing files that changed from the base of the PR and between 4584b12 and c4a2475.

📒 Files selected for processing (1)

ipsuite/data_loading/add_data_gromacs.py

coderabbitai · 2026-04-03T10:42:20Z

ipsuite/data_loading/add_data_gromacs.py

+    # 2. Use atom types (usually cleaner than names for CHARMM-GUI)
+    types = u.atoms.types
+    symbols = []
+    for t in types:
+        t_upper = t.upper()
+        if t_upper in _TYPE_TO_ELEMENT:
+            symbols.append(_TYPE_TO_ELEMENT[t_upper])
+        elif len(t) <= 2 and t[0].isalpha():
+            # Capitalize properly: first letter upper, rest lower
+            symbols.append(t[0].upper() + t[1:].lower() if len(t) > 1 else t.upper())
+        else:
+            # Last resort: take leading alphabetic characters from atom name
+            symbols.append(t[0].upper())
+    return symbols


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n ipsuite/data_loading/add_data_gromacs.py | head -100

Repository: zincware/IPSuite

Length of output: 3699

🏁 Script executed:

# Check the full _get_symbols function and surrounding context rg -A 30 "def _get_symbols" ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 1268

🏁 Script executed:

# Look for how _get_symbols is used in the file rg "_get_symbols" ipsuite/data_loading/add_data_gromacs.py -B 2 -A 2

Repository: zincware/IPSuite

Length of output: 335

🏁 Script executed:

# Check if there's any existing logic that uses u.atoms.names rg "u\.atoms\.(names|types)" ipsuite/data_loading/add_data_gromacs.py -B 2 -A 2

Repository: zincware/IPSuite

Length of output: 191

🏁 Script executed:

# Let's understand MDAnalysis element guessing behavior better python << 'EOF' # Check what MDAnalysis does with element guessing import sys # Try to get info about the default guesser try: from MDAnalysis.guesser import DEFAULT_GUESSER print("DEFAULT_GUESSER available") print(f"Type: {type(DEFAULT_GUESSER)}") except ImportError: print("DEFAULT_GUESSER not directly available") # Check the implementation try: from MDAnalysis.guesser.default import DefaultGuesser print("DefaultGuesser available") # Check what the guesser does import inspect print(inspect.getsourcefile(DefaultGuesser)) except ImportError as e: print(f"Could not import DefaultGuesser: {e}") EOF

Repository: zincware/IPSuite

Length of output: 160

🏁 Script executed:

# Let's search for documentation or usage of element guessing in the codebase rg "guess.*element|element.*guess" ipsuite/ -i

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Check if there are any tests or examples showing how _get_symbols should work rg -l "_get_symbols|gmx_to_ase" ipsuite/ --type py

Repository: zincware/IPSuite

Length of output: 100

🏁 Script executed:

# Look at the broader context - is there any test file for this module? fd "test.*gromacs|gromacs.*test" ipsuite/ -type f

Repository: zincware/IPSuite

Length of output: 230

🏁 Script executed:

# Let's check what MDAnalysis version is being used and look at its documentation rg "MDAnalysis" ipsuite/ setup.py pyproject.toml requirements.txt -A 1 -B 1

Repository: zincware/IPSuite

Length of output: 2179

🏁 Script executed:

# Check if there are any comments or documentation about element inference in the file rg "element|symbol" ipsuite/data_loading/add_data_gromacs.py -i

Repository: zincware/IPSuite

Length of output: 647

🏁 Script executed:

# Let's check if there are other usages of this in the codebase to understand the impact rg "gmx_to_ase|_get_symbols" ipsuite/ -l

Repository: zincware/IPSuite

Length of output: 100

🏁 Script executed:

# Check the pyproject.toml or similar for MDAnalysis version info find ipsuite -name "pyproject.toml" -o -name "setup.cfg" | head -5 | xargs cat 2>/dev/null | head -100

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Look at what ASE Atoms expects for symbols python << 'EOF' # Check what happens when you create ASE atoms with invalid symbols # This will help us understand the severity test_cases = [ "Ow", # Invalid - OW type "Ca", # Calcium "C", # Carbon "O", # Oxygen ] # Check if these are valid element symbols in ASE try: from ase.data import atomic_numbers for sym in test_cases: try: an = atomic_numbers[sym] print(f"✓ {sym:3s} -> atomic number {an}") except KeyError: print(f"✗ {sym:3s} -> INVALID/UNKNOWN") except ImportError: print("ASE not available in sandbox, but we can reason about this") print("Standard element symbols are 1-2 chars, capitalized properly") for sym in test_cases: # Standard chemistry rules if sym == "Ow": print(f"✗ {sym} -> INVALID (no such element)") elif sym == "Ca": print(f"✓ {sym} -> Calcium (valid but semantically wrong for Carbon Alpha)") elif sym in ["C", "O"]: print(f"✓ {sym} -> Valid") EOF

Repository: zincware/IPSuite

Length of output: 303

🏁 Script executed:

# Let's also verify what the comment says about MDAnalysis names vs types # Look at the gmx.py file that might have similar logic cat -n ipsuite/configuration_generation/gmx.py | grep -A 20 "def timestep_to_atoms"

Repository: zincware/IPSuite

Length of output: 842

🏁 Script executed:

# Let's check what happens after symbols are used - does ASE validate them? rg -A 5 "Atoms(symbols=symbols" ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 137

🏁 Script executed:

# Let's understand the problem better by looking at actual GROMACS atom types # Check if there are any test files or examples find . -name "*.gro" -o -name "*.tpr" -o -name "*.xtc" 2>/dev/null | head -5

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Search for any documentation on the expected behavior rg "water|OW|CA" ipsuite/data_loading/ -i

Repository: zincware/IPSuite

Length of output: 1415

🏁 Script executed:

# Let's check what the MDAnalysis documentation actually says about element guessing web_search MDAnalysis element guessing default guesser names types

Repository: zincware/IPSuite

Length of output: 159

🌐 Web query:

MDAnalysis element guessing default guesser documentation

💡 Result:

The default guesser for element guessing in MDAnalysis is DefaultGuesser from MDAnalysis.guesser.default_guesser. It is the generic, default guessing class used for most topologies, particularly following biological naming conventions (e.g., "CA" guessed as carbon, not calcium). Element guessing behavior: The guess_atom_element(atomname) method strips numbers and symbols from the atom name, looks it up in MDAnalysis.guesser.tables.atomelements (derived from CHARMM/AMBER tables), removes the last character if not found, or falls back to the first character of the name if still not found. Empty names return empty string. When guessing types, they represent the element using the same method. It is used automatically as the default context (context='default') when creating a Universe with to_guess=['types', 'elements'] or via u.guess_TopologyAttrs(to_guess=['elements']). Pre-2.8.0, standalone functions like guess_atom_element in MDAnalysis.topology.guessers were used (now deprecated). Example: import MDAnalysis as mda u = mda.Universe('topology.gro', context='default', to_guess=['elements']) # or u.guess_TopologyAttrs(to_guess=['elements']) Official documentation: https://docs.mdanalysis.org/stable/documentation_pages/guesser_modules/default_guesser.html and https://userguide.mdanalysis.org/stable/formats/guessers/default.html (MDAnalysis 2.10.0 as of latest docs).

Citations:

1: https://docs.mdanalysis.org/stable/documentation_pages/guesser_modules/default_guesser.html

2: https://userguide.mdanalysis.org/stable/formats/guessers/default.html

3: https://docs.mdanalysis.org/2.1.0/_modules/MDAnalysis/topology/guessers.html

4: https://userguide.mdanalysis.org/stable/guessing.html

🏁 Script executed:

# Let's look at the actual usage and see if invalid symbols cause problems cat -n ipsuite/data_loading/add_data_gromacs.py | sed -n '120,160p'

Repository: zincware/IPSuite

Length of output: 1823

🏁 Script executed:

# Check if there's validation happening elsewhere rg "atomic_numbers\|invalid.*symbol\|element.*valid" ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Let's examine what happens with "Ow" type - is it even a realistic scenario? # Check if there are any CHARMM-GUI or common GROMACS force field references rg "CHARMM|amber|oplsaa|gromos" ipsuite/ -i | head -10

Repository: zincware/IPSuite

Length of output: 168

🏁 Script executed:

# Let's look at the exact lines where symbols are used to create Atoms cat -n ipsuite/data_loading/add_data_gromacs.py | sed -n '136,155p'

Repository: zincware/IPSuite

Length of output: 928

Use MDAnalysis's proper element guessing from atom names, not types.

The current approach using u.atoms.types creates semantic bugs:

OW (water oxygen in CHARMM-GUI) becomes invalid Ow

CA (carbon alpha) is hardcoded to Ca (calcium) instead of being guessed as carbon

MDAnalysis's DefaultGuesser operates on atom names using domain-standard rules (stripping digits/symbols, lookups in CHARMM/AMBER tables). Replace the manual type-based logic with u.atoms.names and leverage MDAnalysis's built-in element inference, or call u.guess_TopologyAttrs(to_guess=['elements']) to populate u.atoms.elements properly.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 44 - 57, The code currently derives element symbols from u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and build the symbols list from those guessed elements (or rely on MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the block that references types, t, and symbols to use names/elements (keeping the same return of symbols) and remove the manual type-based heuristics.

coderabbitai · 2026-04-03T10:42:20Z

ipsuite/data_loading/add_data_gromacs.py

+def _match_edr_frame(
+    edr_times: np.ndarray, traj_time: float, tolerance: float = 0.1
+) -> int:
+    """Find the EDR index closest to a trajectory time, warning on large gaps."""
+    idx = int(np.argmin(np.abs(edr_times - traj_time)))
+    time_diff = abs(edr_times[idx] - traj_time)
+    if time_diff > tolerance:
+        logger.warning(
+            "EDR time %.3f ps does not match trajectory time %.3f ps (diff=%.3f ps)",
+            edr_times[idx],
+            traj_time,
+            time_diff,
+        )
+    return idx


⚠️ Potential issue | 🟠 Major

Avoid quadratic EDR matching and reject missed timestamps.

np.argmin(np.abs(edr_times - traj_time)) rescans the full EDR axis for every frame, so matching becomes O(n×m) on long runs. It also still returns a row after tolerance is exceeded, which means shifted or downsampled .edr files silently annotate frames with the wrong labels. If the time axes are monotonic, np.searchsorted or a moving pointer avoids the repeated scan and makes it easy to skip/raise on out-of-tolerance matches.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 60 - 73, _match_edr_frame currently scans the whole edr_times array per frame which is O(n×m) and silently returns an out-of-tolerance index; replace the np.argmin approach with np.searchsorted on the monotonic edr_times (or use a moving pointer maintained by the caller) to find the nearest candidate on the left/right, compute the nearest index and time_diff, and if time_diff > tolerance reject the match (raise a ValueError or return a sentinel like None/-1) instead of returning a bad index; update callers to handle the rejection and keep the function signature (_match_edr_frame(edr_times, traj_time, tolerance)) and logging behavior.

coderabbitai · 2026-04-03T10:42:20Z

ipsuite/data_loading/add_data_gromacs.py

+        forces = ts.forces.copy() if ts.has_forces else None
+        energy = None
+        stress = None
+        extra_results = {}
+
+        if edr_data is not None:
+            idx = _match_edr_frame(edr_times, ts.time)
+            energy = float(edr_data["Potential"][idx]) * (kJ / mol)  # convert to eV
+
+            # Build Voigt stress from pressure tensor if available
+            try:
+                pxx = edr_data["Pres-XX"][idx]
+                pyy = edr_data["Pres-YY"][idx]
+                pzz = edr_data["Pres-ZZ"][idx]
+                pyz = edr_data["Pres-YZ"][idx]
+                pxz = edr_data["Pres-XZ"][idx]
+                pxy = edr_data["Pres-XY"][idx]
+                # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
+                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+            except KeyError:
+                pass
+
+            # Store all EDR terms for this frame
+            for term in edr_terms:
+                extra_results[term] = float(edr_data[term][idx])
+
+        if energy is not None or forces is not None:
+            calc = SinglePointCalculator(
+                atoms,
+                energy=energy,
+                forces=forces,
+                stress=stress,
+            )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full file context cd /tmp && find . -name "add_data_gromacs.py" -type f 2>/dev/null | head -5

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Find the file in the repository find . -name "add_data_gromacs.py" -type f 2>/dev/null

Repository: zincware/IPSuite

Length of output: 102

🏁 Script executed:

# Read the relevant file section to understand context cat -n ipsuite/data_loading/add_data_gromacs.py | head -200

Repository: zincware/IPSuite

Length of output: 7743

🏁 Script executed:

# Check imports to see how kJ/mol is defined head -50 ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 1203

🌐 Web query:

MDAnalysis force units trajectory trajectories

💡 Result:

In MDAnalysis, the base unit for forces in trajectories is kJ/(mol·Å). This applies when forces are present in the trajectory file format (e.g., via Universe.atoms.forces). MDAnalysis automatically converts forces to this unit upon reading trajectories, regardless of the original MD simulation format. Other units like kcal/(mol·Å) or N can be converted using MDAnalysis.units.convert.

Citations:

1: https://userguide.mdanalysis.org/stable/units.html

2: https://mdanalysisfork.readthedocs.io/en/latest/_modules/MDAnalysis/units.html

3: https://docs.mdanalysis.org/2.3.0/documentation_pages/units.html

4: https://userguide.mdanalysis.org/stable/trajectories/trajectories.html

5: https://docs.mdanalysis.org/2.9.0/documentation_pages/units.html

6: https://docs.mdanalysis.org/stable/documentation_pages/units.html

🌐 Web query:

ASE SinglePointCalculator force units eV Angstrom

💡 Result:

In the Atomic Simulation Environment (ASE), the SinglePointCalculator stores pre-computed results (energy, forces, stress, etc.) for a specific atomic configuration from an external calculation. It does not perform new calculations but returns the stored values via methods like get_forces. ASE uses consistent internal units: energies in eV, lengths in Angstroms (Å). Consequently, forces are in eV/Å. The SinglePointCalculator preserves whatever units the input data had when created (typically from I/O readers that convert external code outputs to ASE units). When reading results from formats like CASTEP .geom or ABINIT output, forces are explicitly converted to eV/Angstrom. For example: - CASTEP .geom reader: "Note that everything in the .geom file is in atomic units, which has been conversed to commonly used unit angstrom(length) and eV (energy)." - ABINIT reader: forces from 'cartesian_forces' converted using Hartree / Bohr = eV/Å. All ASE calculators, including those populating SinglePointCalculator, follow this convention (eV for energy, eV/Å for forces), as confirmed by documentation, tutorials (e.g., EMT calculator forces in eV/Å), and community sources. Thus, forces from ASE SinglePointCalculator are in eV/Å.

Citations:

1: https://mattermodeling.stackexchange.com/questions/11625/how-to-check-the-units-from-ase-get-forces-function

2: https://narupatools.readthedocs.io/en/stable/_autosummary/narupatools.ase.calculators.onebody_potential_calculator.OneBodyPotentialCalculator.html

3: https://ase-workshop-2023.github.io/tutorial/05-in-built-calculators/index.html

4: https://github.com/PaNOSC-ViNYL/ase/blob/master/ase/calculators/singlepoint.py

5: https://ase-lib.org/_modules/ase/io/abinit.html

6: https://ase-lib.org/ase/units.html

7: https://microsoft.github.io/skala/ase.html

8: https://wiki.fysik.dtu.dk/~askhl/ase-doc/ase/units.html

9: https://ase-lib.org/development/proposals/calculators.html

10: https://ase-lib.org/_modules/ase/units.html

Convert forces to eV/Å before creating the calculator.

Line 147 forwards ts.forces unchanged, but MDAnalysis uses kJ/(mol·Å) as its base force unit while ASE expects forces in eV/Å. That leaves every stored force off by the same kJ/mol → eV factor you're already applying to Potential on line 154.

Suggested fix

- forces = ts.forces.copy() if ts.has_forces else None + forces = ts.forces.copy() * (kJ / mol) if ts.has_forces else None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 147 - 179, ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but ASE expects eV/Å; multiply the copied forces by the same conversion factor used for energy (the (kJ / mol) factor) before passing into SinglePointCalculator. Update the code that sets forces (the ts.forces.copy() branch) to convert units (e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces variable passed to SinglePointCalculator(atoms, energy=..., forces=forces, stress=...) is in eV/Å.

MrJulEnergy and others added 2 commits April 3, 2026 11:59

add gromacs to frames node

2f21136

[pre-commit.ci] auto fixes from pre-commit.com hooks

4584b12

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Apr 3, 2026

View reviewed changes

coderabbitai bot added a commit that referenced this pull request Apr 3, 2026

📝 Add docstrings to addGmx2FramesNode

9f805c0

Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py`

coderabbitai bot mentioned this pull request Apr 3, 2026

📝 Add docstrings to addGmx2FramesNode #464

Open

fix code rabbit remarks

c4a2475

coderabbitai bot reviewed Apr 3, 2026

View reviewed changes

		# GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
		stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])

		print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol")
		print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}")

Conversation

MrJulEnergy commented Apr 3, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MrJulEnergy commented Apr 3, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 3, 2026 •

edited

Loading

coderabbitai bot commented Apr 3, 2026 •

edited

Loading