use lazy znh5md.IO with file_factory for ASEMD frames property#449
use lazy znh5md.IO with file_factory for ASEMD frames property#449
znh5md.IO with file_factory for ASEMD frames property#449Conversation
Replace eager frame loading with a lazy `znh5md.IO` handle using a context-manager-based file factory. Write all frames to a single `md.h5` file instead of per-index files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughRefactors MD node frame handling to lazy, IO-based access: Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ASEMD
participant FileFactory
participant HDF5 as "md.h5 (HDF5)"
User->>ASEMD: request frames
ASEMD->>FileFactory: _make_file_factory() (context-managed)
FileFactory-->>ASEMD: znh5md.IO handle
ASEMD->>HDF5: on-demand read/write via znh5md.IO
HDF5-->>ASEMD: atoms/chunked frames
ASEMD->>User: provide frames handle (lazy access)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 170-183: The change moved ASEMD.frames to return a znh5md.IO and
removed the old structures attribute used by downstream code (e.g., calls to
mapped_md.structures); restore a backward-compatible, deprecated shim: add a
public property or attribute named structures on the ASEMD class that internally
uses the new frames() IO to produce the same iterable/sequence that callers
expect (emit a DeprecationWarning when accessed and delegate to frames() to
build/return the structures in the original shape), and update the frames()
docstring to note the migration path.
- Line 260: The initialize_md() routine must remove any existing trajectory file
before the sim run to avoid io.extend() appending old frames; add logic in
initialize_md() to check for and delete self.frames_path / "md.h5" (or otherwise
remove that file) prior to entering the simulation loop so each run starts with
a fresh file. Ensure deletion handles missing-file quietly and raises/logs only
unexpected IO errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eba8ac4f-5dbe-4c05-b4c8-366480eb7af8
📒 Files selected for processing (1)
ipsuite/dynamics/md_nodes.py
ipsuite/dynamics/md_nodes.py
Outdated
| def frames(self) -> znh5md.IO: | ||
| @contextlib.contextmanager | ||
| def _factory() -> t.Generator[h5py.File, None, None]: | ||
| if self.state.rev is None and self.state.remote is None: | ||
| with h5py.File(self.frames_path / "md.h5", "r") as file: | ||
| yield file | ||
| else: | ||
| with self.state.fs.open( | ||
| (self.frames_path / "md.h5").as_posix(), "rb" | ||
| ) as f: | ||
| with h5py.File(f) as file: | ||
| yield file | ||
|
|
||
| return znh5md.IO(file_factory=_factory) |
There was a problem hiding this comment.
Public API compatibility break needs a migration path.
This change shifts ASEMD.frames to znh5md.IO semantics and removes structures, while downstream usage still expects the old surface (tests/integration/calculators/test_ase_md.py:70-76 reads mapped_md.structures). Please keep a temporary compatibility shim (deprecated) or update all dependent callsites/tests in the same PR to avoid runtime breakage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/dynamics/md_nodes.py` around lines 170 - 183, The change moved
ASEMD.frames to return a znh5md.IO and removed the old structures attribute used
by downstream code (e.g., calls to mapped_md.structures); restore a
backward-compatible, deprecated shim: add a public property or attribute named
structures on the ASEMD class that internally uses the new frames() IO to
produce the same iterable/sequence that callers expect (emit a
DeprecationWarning when accessed and delegate to frames() to build/return the
structures in the original shape), and update the frames() docstring to note the
migration path.
| ) | ||
| io = znh5md.IO( | ||
| self.frames_path / f"{idx}.h5", | ||
| self.frames_path / "md.h5", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -nP 'initialize_md\(|md\.h5|unlink\(|remove\(' ipsuite/dynamics/md_nodes.py -C3Repository: zincware/IPSuite
Length of output: 1748
🏁 Script executed:
cat -n ipsuite/dynamics/md_nodes.py | sed -n '250,280p'Repository: zincware/IPSuite
Length of output: 1415
🏁 Script executed:
wc -l ipsuite/dynamics/md_nodes.pyRepository: zincware/IPSuite
Length of output: 92
🏁 Script executed:
rg -A5 'znh5md\.IO\(' ipsuite/dynamics/md_nodes.pyRepository: zincware/IPSuite
Length of output: 587
🏁 Script executed:
fd -e py ipsuite/dynamics/ | xargs rg -l 'md\.h5|znh5md' | head -10Repository: zincware/IPSuite
Length of output: 759
🏁 Script executed:
cat -n ipsuite/dynamics/md_nodes.py | sed -n '255,275p'Repository: zincware/IPSuite
Length of output: 1026
🏁 Script executed:
rg 'znh5md\.IO' -B2 -A2 ipsuite/Repository: zincware/IPSuite
Length of output: 9974
🏁 Script executed:
fd . tests/ --full-path | xargs rg 'znh5md\.IO|md\.h5' -lRepository: zincware/IPSuite
Length of output: 127
🏁 Script executed:
rg 'import znh5md|from znh5md' ipsuite/dynamics/md_nodes.py -B2 -A2Repository: zincware/IPSuite
Length of output: 196
🏁 Script executed:
rg 'znh5md' pyproject.tomlRepository: zincware/IPSuite
Length of output: 80
🏁 Script executed:
rg 'create_file|\.extend\(|\.append\(' ipsuite/dynamics/md_nodes.py -B2 -A2Repository: zincware/IPSuite
Length of output: 729
🏁 Script executed:
cat -n ipsuite/dynamics/md_nodes.py | sed -n '259,310p'Repository: zincware/IPSuite
Length of output: 2603
🏁 Script executed:
rg 'io\.create_file\(\)' ipsuite/ -B3 -A3Repository: zincware/IPSuite
Length of output: 443
Delete md.h5 file in initialize_md() to prevent frame accumulation across reruns.
Without deleting the existing md.h5 file, calls to io.extend() will append frames from reruns to the previous trajectory, silently mixing old and new frames. Add file deletion before the simulation loop starts.
Proposed fix
def initialize_md(self):
self.model_outs.mkdir(parents=True, exist_ok=True)
self.frames_path.mkdir(parents=True, exist_ok=True)
+ md_file = self.frames_path / "md.h5"
+ if md_file.exists():
+ md_file.unlink()
self.laufband_path.parent.mkdir(parents=True, exist_ok=True)
(self.model_outs / "outs.txt").write_text("Lorem Ipsum")
self.rng = np.random.default_rng(self.seed)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/dynamics/md_nodes.py` at line 260, The initialize_md() routine must
remove any existing trajectory file before the sim run to avoid io.extend()
appending old frames; add logic in initialize_md() to check for and delete
self.frames_path / "md.h5" (or otherwise remove that file) prior to entering the
simulation loop so each run starts with a fresh file. Ensure deletion handles
missing-file quietly and raises/logs only unexpected IO errors.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
ipsuite/dynamics/md_nodes.py (1)
189-192: Avoid quadratic flattening in multi-file frame aggregation.
sum(list_of_lists, [])is O(n²) and becomes expensive as file count grows. Useitertools.chain.from_iterable()for O(n) performance.♻️ Proposed fix
+from itertools import chain @@ - return sum( - [znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files], - [], - ) + return list( + chain.from_iterable( + znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files + ) + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/dynamics/md_nodes.py` around lines 189 - 192, The current aggregation uses sum([...], []) which does quadratic flattening; replace it with an O(n) flatten using itertools.chain.from_iterable: import itertools and return list(itertools.chain.from_iterable(znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files)); keep the same generator expression (znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files) and ensure the function/method in md_nodes.py that contains this return uses the new import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 129-130: Update the docstring for the frames property in
md_nodes.py to reflect the actual return type (it returns either znh5md.IO or
list[ase.Atoms] depending on number of files) instead of the current single-type
declaration; locate the frames property/method and replace "frames : znh5md.IO"
with a union/explicit description like "frames : znh5md.IO | list[ase.Atoms]"
(or equivalent phrase) and apply the same correction to the other frames
docstring occurrence in the file referencing the same symbol so the docs match
runtime behavior.
- Around line 169-171: The return type of _make_file_factory is missing
parameters; update its signature to return a callable that produces a context
manager yielding an h5py.File (e.g. change def _make_file_factory(self,
file_path: str) -> t.Callable to def _make_file_factory(self, file_path: str) ->
t.Callable[[], t.ContextManager[h5py.File]]), and ensure the inner _factory
still has an appropriate annotation (e.g. -> t.Generator[h5py.File, None, None]
or use contextlib typing) so tools see the concrete context-manager return type.
---
Nitpick comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 189-192: The current aggregation uses sum([...], []) which does
quadratic flattening; replace it with an O(n) flatten using
itertools.chain.from_iterable: import itertools and return
list(itertools.chain.from_iterable(znh5md.IO(file_factory=self._make_file_factory(f))[:]
for f in files)); keep the same generator expression
(znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files) and
ensure the function/method in md_nodes.py that contains this return uses the new
import.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c5256b5e-42c7-4396-b66d-f2a1bb27f1bc
📒 Files selected for processing (1)
ipsuite/dynamics/md_nodes.py
| frames : znh5md.IO | ||
| Property that returns a lazy IO handle for trajectory frames. |
There was a problem hiding this comment.
frames attribute docstrings no longer match actual return type.
The docs state frames : znh5md.IO, but implementation returns znh5md.IO | list[ase.Atoms] depending on file count.
Also applies to: 394-395
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/dynamics/md_nodes.py` around lines 129 - 130, Update the docstring
for the frames property in md_nodes.py to reflect the actual return type (it
returns either znh5md.IO or list[ase.Atoms] depending on number of files)
instead of the current single-type declaration; locate the frames
property/method and replace "frames : znh5md.IO" with a union/explicit
description like "frames : znh5md.IO | list[ase.Atoms]" (or equivalent phrase)
and apply the same correction to the other frames docstring occurrence in the
file referencing the same symbol so the docs match runtime behavior.
| def _make_file_factory(self, file_path: str) -> t.Callable: | ||
| @contextlib.contextmanager | ||
| def _factory() -> t.Generator[h5py.File, None, None]: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the file structure and the specific lines mentioned
head -n 200 ipsuite/dynamics/md_nodes.py | tail -n 40Repository: zincware/IPSuite
Length of output: 1698
🏁 Script executed:
#!/bin/bash
# Check the imports to verify h5py, contextlib, and typing usage
head -n 30 ipsuite/dynamics/md_nodes.pyRepository: zincware/IPSuite
Length of output: 676
🏁 Script executed:
#!/bin/bash
# Look specifically at lines around 169-180 to see the full context
sed -n '165,185p' ipsuite/dynamics/md_nodes.pyRepository: zincware/IPSuite
Length of output: 945
🏁 Script executed:
#!/bin/bash
# Check for Python version requirements
cat pyproject.toml | grep -A 5 "python"Repository: zincware/IPSuite
Length of output: 478
🏁 Script executed:
#!/bin/bash
# Check for other ContextManager or Callable patterns in the codebase
rg 'ContextManager|contextmanager' ipsuite/ --type py -B 2 -A 2 | head -50Repository: zincware/IPSuite
Length of output: 435
🏁 Script executed:
#!/bin/bash
# Check all underspecified Callable annotations
rg '->\s*t\.Callable(?!\[)' ipsuite/ --type pyRepository: zincware/IPSuite
Length of output: 464
🏁 Script executed:
#!/bin/bash
# Properly escape the regex pattern for finding underspecified Callable
rg 'Callable(?!\[)' ipsuite/ --type py | grep -v '\[' | grep 'Callable'Repository: zincware/IPSuite
Length of output: 285
🏁 Script executed:
#!/bin/bash
# Check how contextlib types are used elsewhere in the project
rg 'AbstractContextManager|ContextManager' ipsuite/ --type pyRepository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check if contextlib is imported elsewhere and how context managers are typed
rg -B 5 'contextlib' ipsuite/dynamics/md_nodes.py | head -30Repository: zincware/IPSuite
Length of output: 387
🏁 Script executed:
#!/bin/bash
# Verify the typing of the decorated generator function
python3 << 'EOF'
import typing as t
import contextlib
import h5py
# Check what typing constructs are available in 3.11+
print("Available in typing module:")
print(f" ContextManager: {hasattr(t, 'ContextManager')}")
print(f" AbstractContextManager: {hasattr(t, 'AbstractContextManager')}")
# Check the correct annotation for a contextmanager-decorated function
from typing import get_type_hints
`@contextlib.contextmanager`
def example() -> t.Generator[str, None, None]:
yield "test"
# The actual type of the decorated function
print(f"\nDecorated function returns: {type(example)}")
print(f"Callable signature needed: Callable[[], {t.ContextManager if hasattr(t, 'ContextManager') else t.AbstractContextManager}[YieldType]]")
EOFRepository: zincware/IPSuite
Length of output: 176
🏁 Script executed:
#!/bin/bash
# Check what typing constructs are available in 3.11+
python3 << 'EOF'
import typing as t
import contextlib
# Check what typing constructs are available in 3.11+
print("Available in typing module:")
print(f" ContextManager: {hasattr(t, 'ContextManager')}")
print(f" AbstractContextManager: {hasattr(t, 'AbstractContextManager')}")
# The correct annotation for a contextmanager-decorated function
# A `@contextlib.contextmanager` decorated generator yields type T
# and returns a context manager that manages that type
print("\nCorrect annotation options:")
print(" Option 1: t.Callable[[], t.ContextManager[FileType]] (if available)")
print(" Option 2: t.Callable[[], t.AbstractContextManager[FileType]] (recommended)")
print(" Option 3: t.Callable[[], ContextManager[FileType]] (from contextlib)")
# Check which is the standard Python 3.11+ pattern
import sys
print(f"\nPython version: {sys.version}")
EOFRepository: zincware/IPSuite
Length of output: 459
🏁 Script executed:
#!/bin/bash
# Search for all Callable annotations in the file to find underspecified ones
rg 'Callable' ipsuite/dynamics/md_nodes.pyRepository: zincware/IPSuite
Length of output: 123
🏁 Script executed:
#!/bin/bash
# Check broader patterns for context manager typing in the codebase
rg 'Callable.*\[' ipsuite/ --type py | head -20Repository: zincware/IPSuite
Length of output: 42
Parameterize _make_file_factory return type annotation.
The return type -> t.Callable lacks type parameters, which violates the requirement to use full type annotations. The decorated _factory function should be annotated to indicate it returns a callable that produces an h5py.File context manager.
Fix
- def _make_file_factory(self, file_path: str) -> t.Callable:
+ def _make_file_factory(
+ self, file_path: str
+ ) -> t.Callable[[], t.ContextManager[h5py.File]]:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ipsuite/dynamics/md_nodes.py` around lines 169 - 171, The return type of
_make_file_factory is missing parameters; update its signature to return a
callable that produces a context manager yielding an h5py.File (e.g. change def
_make_file_factory(self, file_path: str) -> t.Callable to def
_make_file_factory(self, file_path: str) -> t.Callable[[],
t.ContextManager[h5py.File]]), and ensure the inner _factory still has an
appropriate annotation (e.g. -> t.Generator[h5py.File, None, None] or use
contextlib typing) so tools see the concrete context-manager return type.
Replace eager frame loading with a lazy
znh5md.IOhandle using a context-manager-based file factory. Write all frames to a singlemd.h5file instead of per-index files.Summary by CodeRabbit