Conversation
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughAdds a new GROMACS→ASE converter Changes
Sequence DiagramsequenceDiagram
actor User
participant Gmx2Frames as Gmx2Frames<br/>(ZnTrack Node)
participant gmx_to_ase as gmx_to_ase<br/>(Function)
participant MDAnalysis as MDAnalysis<br/>(Universe/Trajectory)
participant EDRParser as EDR Parser
participant ASE as ASE<br/>(Atoms + SinglePointCalculator)
participant HDF5 as HDF5<br/>(frames.h5)
User->>Gmx2Frames: run()
Gmx2Frames->>gmx_to_ase: gmx_to_ase(topology, trajectory, edr, start, stop, step)
gmx_to_ase->>MDAnalysis: Load topology & trajectory
MDAnalysis-->>gmx_to_ase: Universe & timesteps
loop per selected timestep
gmx_to_ase->>MDAnalysis: read frame (positions, box, velocities, forces)
alt edr provided
gmx_to_ase->>EDRParser: load/suppress-warnings & find nearest sample by time
EDRParser-->>gmx_to_ase: energy, pressure terms, all EDR terms
gmx_to_ase->>gmx_to_ase: compute 6-component stress vector (from Pres-*)
end
gmx_to_ase->>ASE: create Atoms, set positions/cell/pbc/velocities
gmx_to_ase->>ASE: attach SinglePointCalculator (energy, forces, stress, results)
ASE-->>gmx_to_ase: Atoms frame
end
gmx_to_ase-->>Gmx2Frames: list[Atoms]
Gmx2Frames->>HDF5: persist frames via znh5md.IO.extend()
HDF5-->>Gmx2Frames: confirmation
User->>Gmx2Frames: .frames
Gmx2Frames->>HDF5: read frames
HDF5-->>User: list[Atoms]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
134-134: Consider warning when EDR and trajectory times diverge significantly.The nearest-time matching via
argminsilently succeeds even when the time difference is large (e.g., different output frequencies). Consider logging a warning if the matched time differs by more than a small tolerance.Example tolerance check
idx = np.argmin(np.abs(edr_times - ts.time)) time_diff = abs(edr_times[idx] - ts.time) if time_diff > 0.1: # ps, adjust as appropriate warnings.warn(f"EDR time {edr_times[idx]:.3f} differs from trajectory time {ts.time:.3f} by {time_diff:.3f} ps")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` at line 134, The nearest-time matching using idx = np.argmin(np.abs(edr_times - ts.time)) can silently match large time gaps; in the function or loop where that line appears (e.g., the trajectory frame handling block in add_data_gromacs.py), compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning (warnings.warn or your logger.warn) indicating the EDR time, trajectory time, and the difference so users know the outputs are misaligned; make the tolerance configurable or documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 211: The run method in the class defined in add_data_gromacs.py is
missing a return type annotation; update the def run(self): signature to include
the appropriate return type (e.g., -> None or the actual return type used) so it
conforms with the project's full type annotation guideline, and update any
related docstring or callers if necessary; locate the method by the symbol run
in that file and add the correct type hint.
- Around line 202-207: The annotated types for node attributes are too narrow:
change trajectory and edr from Path to Optional[Path] and start, stop, step from
int to Optional[int] to reflect that zntrack.deps_path(None) /
zntrack.params(None) can return None; update the import to include
typing.Optional (e.g., from typing import Optional) and modify the annotations
for trajectory, edr, start, stop, and step accordingly while leaving topology as
Path.
- Around line 239-240: The example prints may raise an IndexError and shows the
wrong unit: guard access to frames[1] by checking len(frames) >= 2 before
referencing frames[1].calc.results (or else reference frame 0 consistently), and
fix the unit label by using the actual energy units returned by
frames[0].get_potential_energy() (or an energy_unit variable if present) instead
of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the
two print lines to conditionally print the second-frame EDR terms only when a
second frame exists and to display the correct units from the energy
value/source.
- Line 135: The energy read from the EDR file (assigned to variable energy in
the line energy = float(edr_data["Potential"][idx])) is in kJ/mol and must be
converted to eV before passing to ASE's SinglePointCalculator; change the code
to convert from kJ/mol to eV (using ase.units, e.g. multiply/divide by units.kJ,
units.mol and units.eV) and supply the converted value to the
SinglePointCalculator so downstream ASE methods return correct energies.
---
Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 134: The nearest-time matching using idx = np.argmin(np.abs(edr_times -
ts.time)) can silently match large time gaps; in the function or loop where that
line appears (e.g., the trajectory frame handling block in add_data_gromacs.py),
compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if
time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning
(warnings.warn or your logger.warn) indicating the EDR time, trajectory time,
and the difference so users know the outputs are misaligned; make the tolerance
configurable or documented.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d200379-0275-4ef7-a180-661cd8621105
📒 Files selected for processing (1)
ipsuite/data_loading/add_data_gromacs.py
| # GROMACS pressure in bar -> store as-is (not ASE native eV/ų) | ||
| stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) |
There was a problem hiding this comment.
Stress units mismatch: GROMACS pressure is in bar, ASE expects eV/ų.
The comment acknowledges the unit discrepancy, but storing stress in bar breaks ASE's stress conventions. Any code calling atoms.get_stress() will receive values in the wrong units, leading to incorrect virial/pressure calculations downstream.
Convert bar to eV/ų: 1 bar = 1e-4 GPa = 6.2415e-7 eV/ų (use ase.units for precision).
Proposed fix
+from ase import units
+
+# bar -> eV/ų
+BAR_TO_EV_ANG3 = 1.0 / (units.bar / (units.eV / units.Ang**3))
+
...
- # GROMACS pressure in bar -> store as-is (not ASE native eV/ų)
- stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+ # Convert GROMACS pressure (bar) to ASE stress (eV/ų)
+ stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * BAR_TO_EV_ANG3| print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol") | ||
| print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}") |
There was a problem hiding this comment.
Potential IndexError and misleading unit label in example.
- Line 240 accesses
frames[1]without checking if there are at least 2 frames. - Line 239 prints "kJ/mol" but if energy units are converted to eV (as recommended), this label will be incorrect.
Proposed fix
- print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol")
- print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}")
+ print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} eV")
+ print(f"All EDR terms on frame 0: {list(frames[0].calc.results.keys())}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} kJ/mol") | |
| print(f"All EDR terms on frame 0: {list(frames[1].calc.results.keys())}") | |
| print(f"Potential energy (frame 0): {frames[0].get_potential_energy()} eV") | |
| print(f"All EDR terms on frame 0: {list(frames[0].calc.results.keys())}") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 239 - 240, The example
prints may raise an IndexError and shows the wrong unit: guard access to
frames[1] by checking len(frames) >= 2 before referencing frames[1].calc.results
(or else reference frame 0 consistently), and fix the unit label by using the
actual energy units returned by frames[0].get_potential_energy() (or an
energy_unit variable if present) instead of hardcoding "kJ/mol" (use "eV" if
energies are converted to eV); update the two print lines to conditionally print
the second-frame EDR terms only when a second frame exists and to display the
correct units from the energy value/source.
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py`
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
156-165:⚠️ Potential issue | 🟠 MajorConvert the pressure tensor before assigning
stress.ASE works in eV-based units, and its pressure/stress quantities are represented in
eV/ų; passing the rawPres-*values here keepsatoms.get_stress()in bar instead. If you want to preserve the native tensor too, keep the bar numbers only under thePres-*keys and convert the array you pass asstress. (ase-lib.org)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 156 - 165, The code currently assigns raw GROMACS Pres-* values (in bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert those pressure components to ASE internal units (eV/ų) before assigning to the stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and replace the direct assignment to stress with a conversion step (keep the raw bar values in edr_data if you need them) — e.g., compute a conversion factor or use ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/ų and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted, pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns values in ASE units.
🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
210-219: Make the node example runnable end-to-end.Right now the example stops after construction, so it isn't testable in the same way as the other data-loading nodes. Add
project.repro()and one observable result/assertion.📝 Suggested docstring tweak
>>> with project: ... md = ips.Gmx2Frames( ... topology="gromacs/system.gro", ... trajectory="gromacs/production.xtc", ... edr="gromacs/production.edr", ... start=1, ... ) + >>> project.repro() + >>> len(md.frames) > 0 + TrueAs per coding guidelines: "Each Node’s docstring should include a testable example using
projectandips".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 210 - 219, The docstring example for Gmx2Frames stops after construction and isn't runnable; update it to call project.repro() and assert one observable outcome so it executes end-to-end. Specifically, after constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a result such as verifying md.frames (or another public attribute/method on Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or isinstance(md.frames, list/ndarray)) so the example both runs and asserts behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 60-73: _match_edr_frame currently scans the whole edr_times array
per frame which is O(n×m) and silently returns an out-of-tolerance index;
replace the np.argmin approach with np.searchsorted on the monotonic edr_times
(or use a moving pointer maintained by the caller) to find the nearest candidate
on the left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
- Around line 147-179: ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but
ASE expects eV/Å; multiply the copied forces by the same conversion factor used
for energy (the (kJ / mol) factor) before passing into SinglePointCalculator.
Update the code that sets forces (the ts.forces.copy() branch) to convert units
(e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces
variable passed to SinglePointCalculator(atoms, energy=..., forces=forces,
stress=...) is in eV/Å.
- Around line 44-57: The code currently derives element symbols from
u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use
atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.
---
Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 156-165: The code currently assigns raw GROMACS Pres-* values (in
bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert
those pressure components to ASE internal units (eV/ų) before assigning to the
stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and
replace the direct assignment to stress with a conversion step (keep the raw bar
values in edr_data if you need them) — e.g., compute a conversion factor or use
ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/ų
and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted,
pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns
values in ASE units.
---
Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 210-219: The docstring example for Gmx2Frames stops after
construction and isn't runnable; update it to call project.repro() and assert
one observable outcome so it executes end-to-end. Specifically, after
constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a
result such as verifying md.frames (or another public attribute/method on
Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or
isinstance(md.frames, list/ndarray)) so the example both runs and asserts
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3211bbb4-e6f9-44fd-bc99-ce5c595a7702
📒 Files selected for processing (1)
ipsuite/data_loading/add_data_gromacs.py
| # 2. Use atom types (usually cleaner than names for CHARMM-GUI) | ||
| types = u.atoms.types | ||
| symbols = [] | ||
| for t in types: | ||
| t_upper = t.upper() | ||
| if t_upper in _TYPE_TO_ELEMENT: | ||
| symbols.append(_TYPE_TO_ELEMENT[t_upper]) | ||
| elif len(t) <= 2 and t[0].isalpha(): | ||
| # Capitalize properly: first letter upper, rest lower | ||
| symbols.append(t[0].upper() + t[1:].lower() if len(t) > 1 else t.upper()) | ||
| else: | ||
| # Last resort: take leading alphabetic characters from atom name | ||
| symbols.append(t[0].upper()) | ||
| return symbols |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n ipsuite/data_loading/add_data_gromacs.py | head -100Repository: zincware/IPSuite
Length of output: 3699
🏁 Script executed:
# Check the full _get_symbols function and surrounding context
rg -A 30 "def _get_symbols" ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 1268
🏁 Script executed:
# Look for how _get_symbols is used in the file
rg "_get_symbols" ipsuite/data_loading/add_data_gromacs.py -B 2 -A 2Repository: zincware/IPSuite
Length of output: 335
🏁 Script executed:
# Check if there's any existing logic that uses u.atoms.names
rg "u\.atoms\.(names|types)" ipsuite/data_loading/add_data_gromacs.py -B 2 -A 2Repository: zincware/IPSuite
Length of output: 191
🏁 Script executed:
# Let's understand MDAnalysis element guessing behavior better
python << 'EOF'
# Check what MDAnalysis does with element guessing
import sys
# Try to get info about the default guesser
try:
from MDAnalysis.guesser import DEFAULT_GUESSER
print("DEFAULT_GUESSER available")
print(f"Type: {type(DEFAULT_GUESSER)}")
except ImportError:
print("DEFAULT_GUESSER not directly available")
# Check the implementation
try:
from MDAnalysis.guesser.default import DefaultGuesser
print("DefaultGuesser available")
# Check what the guesser does
import inspect
print(inspect.getsourcefile(DefaultGuesser))
except ImportError as e:
print(f"Could not import DefaultGuesser: {e}")
EOFRepository: zincware/IPSuite
Length of output: 160
🏁 Script executed:
# Let's search for documentation or usage of element guessing in the codebase
rg "guess.*element|element.*guess" ipsuite/ -iRepository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Check if there are any tests or examples showing how _get_symbols should work
rg -l "_get_symbols|gmx_to_ase" ipsuite/ --type pyRepository: zincware/IPSuite
Length of output: 100
🏁 Script executed:
# Look at the broader context - is there any test file for this module?
fd "test.*gromacs|gromacs.*test" ipsuite/ -type fRepository: zincware/IPSuite
Length of output: 230
🏁 Script executed:
# Let's check what MDAnalysis version is being used and look at its documentation
rg "MDAnalysis" ipsuite/ setup.py pyproject.toml requirements.txt -A 1 -B 1Repository: zincware/IPSuite
Length of output: 2179
🏁 Script executed:
# Check if there are any comments or documentation about element inference in the file
rg "element|symbol" ipsuite/data_loading/add_data_gromacs.py -iRepository: zincware/IPSuite
Length of output: 647
🏁 Script executed:
# Let's check if there are other usages of this in the codebase to understand the impact
rg "gmx_to_ase|_get_symbols" ipsuite/ -lRepository: zincware/IPSuite
Length of output: 100
🏁 Script executed:
# Check the pyproject.toml or similar for MDAnalysis version info
find ipsuite -name "pyproject.toml" -o -name "setup.cfg" | head -5 | xargs cat 2>/dev/null | head -100Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Look at what ASE Atoms expects for symbols
python << 'EOF'
# Check what happens when you create ASE atoms with invalid symbols
# This will help us understand the severity
test_cases = [
"Ow", # Invalid - OW type
"Ca", # Calcium
"C", # Carbon
"O", # Oxygen
]
# Check if these are valid element symbols in ASE
try:
from ase.data import atomic_numbers
for sym in test_cases:
try:
an = atomic_numbers[sym]
print(f"✓ {sym:3s} -> atomic number {an}")
except KeyError:
print(f"✗ {sym:3s} -> INVALID/UNKNOWN")
except ImportError:
print("ASE not available in sandbox, but we can reason about this")
print("Standard element symbols are 1-2 chars, capitalized properly")
for sym in test_cases:
# Standard chemistry rules
if sym == "Ow":
print(f"✗ {sym} -> INVALID (no such element)")
elif sym == "Ca":
print(f"✓ {sym} -> Calcium (valid but semantically wrong for Carbon Alpha)")
elif sym in ["C", "O"]:
print(f"✓ {sym} -> Valid")
EOFRepository: zincware/IPSuite
Length of output: 303
🏁 Script executed:
# Let's also verify what the comment says about MDAnalysis names vs types
# Look at the gmx.py file that might have similar logic
cat -n ipsuite/configuration_generation/gmx.py | grep -A 20 "def timestep_to_atoms"Repository: zincware/IPSuite
Length of output: 842
🏁 Script executed:
# Let's check what happens after symbols are used - does ASE validate them?
rg -A 5 "Atoms(symbols=symbols" ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 137
🏁 Script executed:
# Let's understand the problem better by looking at actual GROMACS atom types
# Check if there are any test files or examples
find . -name "*.gro" -o -name "*.tpr" -o -name "*.xtc" 2>/dev/null | head -5Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Search for any documentation on the expected behavior
rg "water|OW|CA" ipsuite/data_loading/ -iRepository: zincware/IPSuite
Length of output: 1415
🏁 Script executed:
# Let's check what the MDAnalysis documentation actually says about element guessing
web_search
MDAnalysis element guessing default guesser names typesRepository: zincware/IPSuite
Length of output: 159
🌐 Web query:
MDAnalysis element guessing default guesser documentation
💡 Result:
The default guesser for element guessing in MDAnalysis is DefaultGuesser from MDAnalysis.guesser.default_guesser. It is the generic, default guessing class used for most topologies, particularly following biological naming conventions (e.g., "CA" guessed as carbon, not calcium). Element guessing behavior: The guess_atom_element(atomname) method strips numbers and symbols from the atom name, looks it up in MDAnalysis.guesser.tables.atomelements (derived from CHARMM/AMBER tables), removes the last character if not found, or falls back to the first character of the name if still not found. Empty names return empty string. When guessing types, they represent the element using the same method. It is used automatically as the default context (context='default') when creating a Universe with to_guess=['types', 'elements'] or via u.guess_TopologyAttrs(to_guess=['elements']). Pre-2.8.0, standalone functions like guess_atom_element in MDAnalysis.topology.guessers were used (now deprecated). Example: import MDAnalysis as mda u = mda.Universe('topology.gro', context='default', to_guess=['elements']) # or u.guess_TopologyAttrs(to_guess=['elements']) Official documentation: https://docs.mdanalysis.org/stable/documentation_pages/guesser_modules/default_guesser.html and https://userguide.mdanalysis.org/stable/formats/guessers/default.html (MDAnalysis 2.10.0 as of latest docs).
Citations:
- 1: https://docs.mdanalysis.org/stable/documentation_pages/guesser_modules/default_guesser.html
- 2: https://userguide.mdanalysis.org/stable/formats/guessers/default.html
- 3: https://docs.mdanalysis.org/2.1.0/_modules/MDAnalysis/topology/guessers.html
- 4: https://userguide.mdanalysis.org/stable/guessing.html
🏁 Script executed:
# Let's look at the actual usage and see if invalid symbols cause problems
cat -n ipsuite/data_loading/add_data_gromacs.py | sed -n '120,160p'Repository: zincware/IPSuite
Length of output: 1823
🏁 Script executed:
# Check if there's validation happening elsewhere
rg "atomic_numbers\|invalid.*symbol\|element.*valid" ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Let's examine what happens with "Ow" type - is it even a realistic scenario?
# Check if there are any CHARMM-GUI or common GROMACS force field references
rg "CHARMM|amber|oplsaa|gromos" ipsuite/ -i | head -10Repository: zincware/IPSuite
Length of output: 168
🏁 Script executed:
# Let's look at the exact lines where symbols are used to create Atoms
cat -n ipsuite/data_loading/add_data_gromacs.py | sed -n '136,155p'Repository: zincware/IPSuite
Length of output: 928
Use MDAnalysis's proper element guessing from atom names, not types.
The current approach using u.atoms.types creates semantic bugs:
OW(water oxygen in CHARMM-GUI) becomes invalidOwCA(carbon alpha) is hardcoded toCa(calcium) instead of being guessed as carbon
MDAnalysis's DefaultGuesser operates on atom names using domain-standard rules (stripping digits/symbols, lookups in CHARMM/AMBER tables). Replace the manual type-based logic with u.atoms.names and leverage MDAnalysis's built-in element inference, or call u.guess_TopologyAttrs(to_guess=['elements']) to populate u.atoms.elements properly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 44 - 57, The code
currently derives element symbols from u.atoms.types which misclassifies
CHARMM/AMBER types; change the logic to use atom names and MDAnalysis's element
guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.
| def _match_edr_frame( | ||
| edr_times: np.ndarray, traj_time: float, tolerance: float = 0.1 | ||
| ) -> int: | ||
| """Find the EDR index closest to a trajectory time, warning on large gaps.""" | ||
| idx = int(np.argmin(np.abs(edr_times - traj_time))) | ||
| time_diff = abs(edr_times[idx] - traj_time) | ||
| if time_diff > tolerance: | ||
| logger.warning( | ||
| "EDR time %.3f ps does not match trajectory time %.3f ps (diff=%.3f ps)", | ||
| edr_times[idx], | ||
| traj_time, | ||
| time_diff, | ||
| ) | ||
| return idx |
There was a problem hiding this comment.
Avoid quadratic EDR matching and reject missed timestamps.
np.argmin(np.abs(edr_times - traj_time)) rescans the full EDR axis for every frame, so matching becomes O(n×m) on long runs. It also still returns a row after tolerance is exceeded, which means shifted or downsampled .edr files silently annotate frames with the wrong labels. If the time axes are monotonic, np.searchsorted or a moving pointer avoids the repeated scan and makes it easy to skip/raise on out-of-tolerance matches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 60 - 73,
_match_edr_frame currently scans the whole edr_times array per frame which is
O(n×m) and silently returns an out-of-tolerance index; replace the np.argmin
approach with np.searchsorted on the monotonic edr_times (or use a moving
pointer maintained by the caller) to find the nearest candidate on the
left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
| forces = ts.forces.copy() if ts.has_forces else None | ||
| energy = None | ||
| stress = None | ||
| extra_results = {} | ||
|
|
||
| if edr_data is not None: | ||
| idx = _match_edr_frame(edr_times, ts.time) | ||
| energy = float(edr_data["Potential"][idx]) * (kJ / mol) # convert to eV | ||
|
|
||
| # Build Voigt stress from pressure tensor if available | ||
| try: | ||
| pxx = edr_data["Pres-XX"][idx] | ||
| pyy = edr_data["Pres-YY"][idx] | ||
| pzz = edr_data["Pres-ZZ"][idx] | ||
| pyz = edr_data["Pres-YZ"][idx] | ||
| pxz = edr_data["Pres-XZ"][idx] | ||
| pxy = edr_data["Pres-XY"][idx] | ||
| # GROMACS pressure in bar -> store as-is (not ASE native eV/ų) | ||
| stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) | ||
| except KeyError: | ||
| pass | ||
|
|
||
| # Store all EDR terms for this frame | ||
| for term in edr_terms: | ||
| extra_results[term] = float(edr_data[term][idx]) | ||
|
|
||
| if energy is not None or forces is not None: | ||
| calc = SinglePointCalculator( | ||
| atoms, | ||
| energy=energy, | ||
| forces=forces, | ||
| stress=stress, | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full file context
cd /tmp && find . -name "add_data_gromacs.py" -type f 2>/dev/null | head -5Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Find the file in the repository
find . -name "add_data_gromacs.py" -type f 2>/dev/nullRepository: zincware/IPSuite
Length of output: 102
🏁 Script executed:
# Read the relevant file section to understand context
cat -n ipsuite/data_loading/add_data_gromacs.py | head -200Repository: zincware/IPSuite
Length of output: 7743
🏁 Script executed:
# Check imports to see how kJ/mol is defined
head -50 ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 1203
🌐 Web query:
MDAnalysis force units trajectory trajectories
💡 Result:
In MDAnalysis, the base unit for forces in trajectories is kJ/(mol·Å). This applies when forces are present in the trajectory file format (e.g., via Universe.atoms.forces). MDAnalysis automatically converts forces to this unit upon reading trajectories, regardless of the original MD simulation format. Other units like kcal/(mol·Å) or N can be converted using MDAnalysis.units.convert.
Citations:
- 1: https://userguide.mdanalysis.org/stable/units.html
- 2: https://mdanalysisfork.readthedocs.io/en/latest/_modules/MDAnalysis/units.html
- 3: https://docs.mdanalysis.org/2.3.0/documentation_pages/units.html
- 4: https://userguide.mdanalysis.org/stable/trajectories/trajectories.html
- 5: https://docs.mdanalysis.org/2.9.0/documentation_pages/units.html
- 6: https://docs.mdanalysis.org/stable/documentation_pages/units.html
🌐 Web query:
ASE SinglePointCalculator force units eV Angstrom
💡 Result:
In the Atomic Simulation Environment (ASE), the SinglePointCalculator stores pre-computed results (energy, forces, stress, etc.) for a specific atomic configuration from an external calculation. It does not perform new calculations but returns the stored values via methods like get_forces. ASE uses consistent internal units: energies in eV, lengths in Angstroms (Å). Consequently, forces are in eV/Å. The SinglePointCalculator preserves whatever units the input data had when created (typically from I/O readers that convert external code outputs to ASE units). When reading results from formats like CASTEP .geom or ABINIT output, forces are explicitly converted to eV/Angstrom. For example: - CASTEP .geom reader: "Note that everything in the .geom file is in atomic units, which has been conversed to commonly used unit angstrom(length) and eV (energy)." - ABINIT reader: forces from 'cartesian_forces' converted using Hartree / Bohr = eV/Å. All ASE calculators, including those populating SinglePointCalculator, follow this convention (eV for energy, eV/Å for forces), as confirmed by documentation, tutorials (e.g., EMT calculator forces in eV/Å), and community sources. Thus, forces from ASE SinglePointCalculator are in eV/Å.
Citations:
- 1: https://mattermodeling.stackexchange.com/questions/11625/how-to-check-the-units-from-ase-get-forces-function
- 2: https://narupatools.readthedocs.io/en/stable/_autosummary/narupatools.ase.calculators.onebody_potential_calculator.OneBodyPotentialCalculator.html
- 3: https://ase-workshop-2023.github.io/tutorial/05-in-built-calculators/index.html
- 4: https://github.com/PaNOSC-ViNYL/ase/blob/master/ase/calculators/singlepoint.py
- 5: https://ase-lib.org/_modules/ase/io/abinit.html
- 6: https://ase-lib.org/ase/units.html
- 7: https://microsoft.github.io/skala/ase.html
- 8: https://wiki.fysik.dtu.dk/~askhl/ase-doc/ase/units.html
- 9: https://ase-lib.org/development/proposals/calculators.html
- 10: https://ase-lib.org/_modules/ase/units.html
Convert forces to eV/Å before creating the calculator.
Line 147 forwards ts.forces unchanged, but MDAnalysis uses kJ/(mol·Å) as its base force unit while ASE expects forces in eV/Å. That leaves every stored force off by the same kJ/mol → eV factor you're already applying to Potential on line 154.
Suggested fix
- forces = ts.forces.copy() if ts.has_forces else None
+ forces = ts.forces.copy() * (kJ / mol) if ts.has_forces else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 147 - 179, ts.forces
are left in MDAnalysis units (kJ/(mol·Å)) but ASE expects eV/Å; multiply the
copied forces by the same conversion factor used for energy (the (kJ / mol)
factor) before passing into SinglePointCalculator. Update the code that sets
forces (the ts.forces.copy() branch) to convert units (e.g., forces =
ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces variable passed
to SinglePointCalculator(atoms, energy=..., forces=forces, stress=...) is in
eV/Å.
Summary by CodeRabbit