Skip to content

use lazy znh5md.IO with file_factory for ASEMD frames property#449

Open
PythonFZ wants to merge 3 commits intomainfrom
feat/lazy-frames-io
Open

use lazy znh5md.IO with file_factory for ASEMD frames property#449
PythonFZ wants to merge 3 commits intomainfrom
feat/lazy-frames-io

Conversation

@PythonFZ
Copy link
Copy Markdown
Member

@PythonFZ PythonFZ commented Mar 5, 2026

Replace eager frame loading with a lazy znh5md.IO handle using a context-manager-based file factory. Write all frames to a single md.h5 file instead of per-index files.

Summary by CodeRabbit

  • Breaking Changes
    • The previous frames property is replaced by a method that returns a lazy IO handle for on-demand frame loading.
    • The structures property has been removed from MD node classes.
    • Frame output is consolidated into a single centralized container rather than multiple per-index files, changing how simulation frames are accessed and stored.

Replace eager frame loading with a lazy `znh5md.IO` handle using a
context-manager-based file factory. Write all frames to a single
`md.h5` file instead of per-index files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@PythonFZ PythonFZ requested a review from M-R-Schaefer March 5, 2026 14:16
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Refactors MD node frame handling to lazy, IO-based access: frames now returns a znh5md.IO handle (or list) via a context-managed file factory; per-index .h5 outputs replaced by a centralized md.h5; structures property removed from both ASEMD and ASEMDSafeSampling. (49 words)

Changes

Cohort / File(s) Summary
MD node IO & frame handling
ipsuite/dynamics/md_nodes.py
Replaced eager per-file frame loading with a context-managed, lazy znh5md.IO factory; frames changed from a property returning list[ase.Atoms] to a method returning `znh5md.IO

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ASEMD
    participant FileFactory
    participant HDF5 as "md.h5 (HDF5)"

    User->>ASEMD: request frames
    ASEMD->>FileFactory: _make_file_factory() (context-managed)
    FileFactory-->>ASEMD: znh5md.IO handle
    ASEMD->>HDF5: on-demand read/write via znh5md.IO
    HDF5-->>ASEMD: atoms/chunked frames
    ASEMD->>User: provide frames handle (lazy access)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I nibble bytes and hop on keys,
Frames no longer choke the trees,
A single md.h5 keeps the stash,
Context managers make a dash,
Lazy loads — a carrot-cache with ease 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: replacing eager frame loading with a lazy znh5md.IO handle and file_factory mechanism for ASEMD frames property.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/lazy-frames-io

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 170-183: The change moved ASEMD.frames to return a znh5md.IO and
removed the old structures attribute used by downstream code (e.g., calls to
mapped_md.structures); restore a backward-compatible, deprecated shim: add a
public property or attribute named structures on the ASEMD class that internally
uses the new frames() IO to produce the same iterable/sequence that callers
expect (emit a DeprecationWarning when accessed and delegate to frames() to
build/return the structures in the original shape), and update the frames()
docstring to note the migration path.
- Line 260: The initialize_md() routine must remove any existing trajectory file
before the sim run to avoid io.extend() appending old frames; add logic in
initialize_md() to check for and delete self.frames_path / "md.h5" (or otherwise
remove that file) prior to entering the simulation loop so each run starts with
a fresh file. Ensure deletion handles missing-file quietly and raises/logs only
unexpected IO errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eba8ac4f-5dbe-4c05-b4c8-366480eb7af8

📥 Commits

Reviewing files that changed from the base of the PR and between 02831c6 and 40c0d20.

📒 Files selected for processing (1)
  • ipsuite/dynamics/md_nodes.py

Comment on lines +170 to +183
def frames(self) -> znh5md.IO:
@contextlib.contextmanager
def _factory() -> t.Generator[h5py.File, None, None]:
if self.state.rev is None and self.state.remote is None:
with h5py.File(self.frames_path / "md.h5", "r") as file:
yield file
else:
with self.state.fs.open(
(self.frames_path / "md.h5").as_posix(), "rb"
) as f:
with h5py.File(f) as file:
yield file

return znh5md.IO(file_factory=_factory)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Public API compatibility break needs a migration path.

This change shifts ASEMD.frames to znh5md.IO semantics and removes structures, while downstream usage still expects the old surface (tests/integration/calculators/test_ase_md.py:70-76 reads mapped_md.structures). Please keep a temporary compatibility shim (deprecated) or update all dependent callsites/tests in the same PR to avoid runtime breakage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/dynamics/md_nodes.py` around lines 170 - 183, The change moved
ASEMD.frames to return a znh5md.IO and removed the old structures attribute used
by downstream code (e.g., calls to mapped_md.structures); restore a
backward-compatible, deprecated shim: add a public property or attribute named
structures on the ASEMD class that internally uses the new frames() IO to
produce the same iterable/sequence that callers expect (emit a
DeprecationWarning when accessed and delegate to frames() to build/return the
structures in the original shape), and update the frames() docstring to note the
migration path.

)
io = znh5md.IO(
self.frames_path / f"{idx}.h5",
self.frames_path / "md.h5",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -nP 'initialize_md\(|md\.h5|unlink\(|remove\(' ipsuite/dynamics/md_nodes.py -C3

Repository: zincware/IPSuite

Length of output: 1748


🏁 Script executed:

cat -n ipsuite/dynamics/md_nodes.py | sed -n '250,280p'

Repository: zincware/IPSuite

Length of output: 1415


🏁 Script executed:

wc -l ipsuite/dynamics/md_nodes.py

Repository: zincware/IPSuite

Length of output: 92


🏁 Script executed:

rg -A5 'znh5md\.IO\(' ipsuite/dynamics/md_nodes.py

Repository: zincware/IPSuite

Length of output: 587


🏁 Script executed:

fd -e py ipsuite/dynamics/ | xargs rg -l 'md\.h5|znh5md' | head -10

Repository: zincware/IPSuite

Length of output: 759


🏁 Script executed:

cat -n ipsuite/dynamics/md_nodes.py | sed -n '255,275p'

Repository: zincware/IPSuite

Length of output: 1026


🏁 Script executed:

rg 'znh5md\.IO' -B2 -A2 ipsuite/

Repository: zincware/IPSuite

Length of output: 9974


🏁 Script executed:

fd . tests/ --full-path | xargs rg 'znh5md\.IO|md\.h5' -l

Repository: zincware/IPSuite

Length of output: 127


🏁 Script executed:

rg 'import znh5md|from znh5md' ipsuite/dynamics/md_nodes.py -B2 -A2

Repository: zincware/IPSuite

Length of output: 196


🏁 Script executed:

rg 'znh5md' pyproject.toml

Repository: zincware/IPSuite

Length of output: 80


🏁 Script executed:

rg 'create_file|\.extend\(|\.append\(' ipsuite/dynamics/md_nodes.py -B2 -A2

Repository: zincware/IPSuite

Length of output: 729


🏁 Script executed:

cat -n ipsuite/dynamics/md_nodes.py | sed -n '259,310p'

Repository: zincware/IPSuite

Length of output: 2603


🏁 Script executed:

rg 'io\.create_file\(\)' ipsuite/ -B3 -A3

Repository: zincware/IPSuite

Length of output: 443


Delete md.h5 file in initialize_md() to prevent frame accumulation across reruns.

Without deleting the existing md.h5 file, calls to io.extend() will append frames from reruns to the previous trajectory, silently mixing old and new frames. Add file deletion before the simulation loop starts.

Proposed fix
 def initialize_md(self):
     self.model_outs.mkdir(parents=True, exist_ok=True)
     self.frames_path.mkdir(parents=True, exist_ok=True)
+    md_file = self.frames_path / "md.h5"
+    if md_file.exists():
+        md_file.unlink()
     self.laufband_path.parent.mkdir(parents=True, exist_ok=True)
     (self.model_outs / "outs.txt").write_text("Lorem Ipsum")
     self.rng = np.random.default_rng(self.seed)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/dynamics/md_nodes.py` at line 260, The initialize_md() routine must
remove any existing trajectory file before the sim run to avoid io.extend()
appending old frames; add logic in initialize_md() to check for and delete
self.frames_path / "md.h5" (or otherwise remove that file) prior to entering the
simulation loop so each run starts with a fresh file. Ensure deletion handles
missing-file quietly and raises/logs only unexpected IO errors.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
ipsuite/dynamics/md_nodes.py (1)

189-192: Avoid quadratic flattening in multi-file frame aggregation.

sum(list_of_lists, []) is O(n²) and becomes expensive as file count grows. Use itertools.chain.from_iterable() for O(n) performance.

♻️ Proposed fix
+from itertools import chain
@@
-        return sum(
-            [znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files],
-            [],
-        )
+        return list(
+            chain.from_iterable(
+                znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files
+            )
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/dynamics/md_nodes.py` around lines 189 - 192, The current aggregation
uses sum([...], []) which does quadratic flattening; replace it with an O(n)
flatten using itertools.chain.from_iterable: import itertools and return
list(itertools.chain.from_iterable(znh5md.IO(file_factory=self._make_file_factory(f))[:]
for f in files)); keep the same generator expression
(znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files) and
ensure the function/method in md_nodes.py that contains this return uses the new
import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 129-130: Update the docstring for the frames property in
md_nodes.py to reflect the actual return type (it returns either znh5md.IO or
list[ase.Atoms] depending on number of files) instead of the current single-type
declaration; locate the frames property/method and replace "frames : znh5md.IO"
with a union/explicit description like "frames : znh5md.IO | list[ase.Atoms]"
(or equivalent phrase) and apply the same correction to the other frames
docstring occurrence in the file referencing the same symbol so the docs match
runtime behavior.
- Around line 169-171: The return type of _make_file_factory is missing
parameters; update its signature to return a callable that produces a context
manager yielding an h5py.File (e.g. change def _make_file_factory(self,
file_path: str) -> t.Callable to def _make_file_factory(self, file_path: str) ->
t.Callable[[], t.ContextManager[h5py.File]]), and ensure the inner _factory
still has an appropriate annotation (e.g. -> t.Generator[h5py.File, None, None]
or use contextlib typing) so tools see the concrete context-manager return type.

---

Nitpick comments:
In `@ipsuite/dynamics/md_nodes.py`:
- Around line 189-192: The current aggregation uses sum([...], []) which does
quadratic flattening; replace it with an O(n) flatten using
itertools.chain.from_iterable: import itertools and return
list(itertools.chain.from_iterable(znh5md.IO(file_factory=self._make_file_factory(f))[:]
for f in files)); keep the same generator expression
(znh5md.IO(file_factory=self._make_file_factory(f))[:] for f in files) and
ensure the function/method in md_nodes.py that contains this return uses the new
import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5256b5e-42c7-4396-b66d-f2a1bb27f1bc

📥 Commits

Reviewing files that changed from the base of the PR and between 40c0d20 and 1f5f64b.

📒 Files selected for processing (1)
  • ipsuite/dynamics/md_nodes.py

Comment on lines +129 to +130
frames : znh5md.IO
Property that returns a lazy IO handle for trajectory frames.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

frames attribute docstrings no longer match actual return type.

The docs state frames : znh5md.IO, but implementation returns znh5md.IO | list[ase.Atoms] depending on file count.

Also applies to: 394-395

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/dynamics/md_nodes.py` around lines 129 - 130, Update the docstring
for the frames property in md_nodes.py to reflect the actual return type (it
returns either znh5md.IO or list[ase.Atoms] depending on number of files)
instead of the current single-type declaration; locate the frames
property/method and replace "frames : znh5md.IO" with a union/explicit
description like "frames : znh5md.IO | list[ase.Atoms]" (or equivalent phrase)
and apply the same correction to the other frames docstring occurrence in the
file referencing the same symbol so the docs match runtime behavior.

Comment on lines +169 to +171
def _make_file_factory(self, file_path: str) -> t.Callable:
@contextlib.contextmanager
def _factory() -> t.Generator[h5py.File, None, None]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's examine the file structure and the specific lines mentioned
head -n 200 ipsuite/dynamics/md_nodes.py | tail -n 40

Repository: zincware/IPSuite

Length of output: 1698


🏁 Script executed:

#!/bin/bash
# Check the imports to verify h5py, contextlib, and typing usage
head -n 30 ipsuite/dynamics/md_nodes.py

Repository: zincware/IPSuite

Length of output: 676


🏁 Script executed:

#!/bin/bash
# Look specifically at lines around 169-180 to see the full context
sed -n '165,185p' ipsuite/dynamics/md_nodes.py

Repository: zincware/IPSuite

Length of output: 945


🏁 Script executed:

#!/bin/bash
# Check for Python version requirements
cat pyproject.toml | grep -A 5 "python"

Repository: zincware/IPSuite

Length of output: 478


🏁 Script executed:

#!/bin/bash
# Check for other ContextManager or Callable patterns in the codebase
rg 'ContextManager|contextmanager' ipsuite/ --type py -B 2 -A 2 | head -50

Repository: zincware/IPSuite

Length of output: 435


🏁 Script executed:

#!/bin/bash
# Check all underspecified Callable annotations
rg '->\s*t\.Callable(?!\[)' ipsuite/ --type py

Repository: zincware/IPSuite

Length of output: 464


🏁 Script executed:

#!/bin/bash
# Properly escape the regex pattern for finding underspecified Callable
rg 'Callable(?!\[)' ipsuite/ --type py | grep -v '\[' | grep 'Callable'

Repository: zincware/IPSuite

Length of output: 285


🏁 Script executed:

#!/bin/bash
# Check how contextlib types are used elsewhere in the project
rg 'AbstractContextManager|ContextManager' ipsuite/ --type py

Repository: zincware/IPSuite

Length of output: 42


🏁 Script executed:

#!/bin/bash
# Check if contextlib is imported elsewhere and how context managers are typed
rg -B 5 'contextlib' ipsuite/dynamics/md_nodes.py | head -30

Repository: zincware/IPSuite

Length of output: 387


🏁 Script executed:

#!/bin/bash
# Verify the typing of the decorated generator function
python3 << 'EOF'
import typing as t
import contextlib
import h5py

# Check what typing constructs are available in 3.11+
print("Available in typing module:")
print(f"  ContextManager: {hasattr(t, 'ContextManager')}")
print(f"  AbstractContextManager: {hasattr(t, 'AbstractContextManager')}")

# Check the correct annotation for a contextmanager-decorated function
from typing import get_type_hints

`@contextlib.contextmanager`
def example() -> t.Generator[str, None, None]:
    yield "test"

# The actual type of the decorated function
print(f"\nDecorated function returns: {type(example)}")
print(f"Callable signature needed: Callable[[], {t.ContextManager if hasattr(t, 'ContextManager') else t.AbstractContextManager}[YieldType]]")
EOF

Repository: zincware/IPSuite

Length of output: 176


🏁 Script executed:

#!/bin/bash
# Check what typing constructs are available in 3.11+
python3 << 'EOF'
import typing as t
import contextlib

# Check what typing constructs are available in 3.11+
print("Available in typing module:")
print(f"  ContextManager: {hasattr(t, 'ContextManager')}")
print(f"  AbstractContextManager: {hasattr(t, 'AbstractContextManager')}")

# The correct annotation for a contextmanager-decorated function
# A `@contextlib.contextmanager` decorated generator yields type T
# and returns a context manager that manages that type
print("\nCorrect annotation options:")
print("  Option 1: t.Callable[[], t.ContextManager[FileType]]  (if available)")
print("  Option 2: t.Callable[[], t.AbstractContextManager[FileType]]  (recommended)")
print("  Option 3: t.Callable[[], ContextManager[FileType]]  (from contextlib)")

# Check which is the standard Python 3.11+ pattern
import sys
print(f"\nPython version: {sys.version}")
EOF

Repository: zincware/IPSuite

Length of output: 459


🏁 Script executed:

#!/bin/bash
# Search for all Callable annotations in the file to find underspecified ones
rg 'Callable' ipsuite/dynamics/md_nodes.py

Repository: zincware/IPSuite

Length of output: 123


🏁 Script executed:

#!/bin/bash
# Check broader patterns for context manager typing in the codebase
rg 'Callable.*\[' ipsuite/ --type py | head -20

Repository: zincware/IPSuite

Length of output: 42


Parameterize _make_file_factory return type annotation.

The return type -> t.Callable lacks type parameters, which violates the requirement to use full type annotations. The decorated _factory function should be annotated to indicate it returns a callable that produces an h5py.File context manager.

Fix
-    def _make_file_factory(self, file_path: str) -> t.Callable:
+    def _make_file_factory(
+        self, file_path: str
+    ) -> t.Callable[[], t.ContextManager[h5py.File]]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/dynamics/md_nodes.py` around lines 169 - 171, The return type of
_make_file_factory is missing parameters; update its signature to return a
callable that produces a context manager yielding an h5py.File (e.g. change def
_make_file_factory(self, file_path: str) -> t.Callable to def
_make_file_factory(self, file_path: str) -> t.Callable[[],
t.ContextManager[h5py.File]]), and ensure the inner _factory still has an
appropriate annotation (e.g. -> t.Generator[h5py.File, None, None] or use
contextlib typing) so tools see the concrete context-manager return type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant