FIX: Handle a non-linear FrameIterator in HydrogenBondAnalysis #5202

BradyAJohnston · 2026-01-06T14:19:29Z

Fixes #5200

Changes made in this Pull Request:

Refactor count_by_time() to handle returning values on a non-linear FrameIterator

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)
LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5202.org.readthedocs.build/en/5202/

BradyAJohnston · 2026-01-06T14:32:44Z

Oops it seems like my auto-formatter went a bit wild - despite still passing Black. Will clean up.

codecov · 2026-01-06T15:02:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.72%. Comparing base (528b512) to head (5900d3e).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #5202   +/-   ##
========================================
  Coverage    92.72%   92.72%           
========================================
  Files          180      180           
  Lines        22475    22475           
  Branches      3190     3190           
========================================
+ Hits         20840    20841    +1     
  Misses        1177     1177           
+ Partials       458      457    -1


orbeckst

Looks good!

Optionally: Look at the performance, perhaps there's a faster way to do the lookup.

package/CHANGELOG

orbeckst · 2026-01-06T21:43:07Z

package/MDAnalysis/analysis/hydrogenbonds/hbond_analysis.py

+            count_lookup = dict(zip(indices, tmp_counts))
+            return np.array([count_lookup.get(i, 0) for i in range(len(self.frames))])


Looking up each frame looks slow. Perhaps there's some numpy magic (take???) ?

The only really faster approach I could figure out would be this:

if self.start is None:
    counts = np.zeros(len(self.frames), dtype=int)
    positions = np.searchsorted(self.frames, indices)
    counts[positions] = tmp_counts
    return counts

But this assumes self.frames is sorted. Would that always be the case, given that the FrameIterator could be a non-sorted sequence of frames?

I am not sure if self.frames is sorted, possibly not when using run(frames=[2, 3, 0, 7, 6]). Maybe do a quick test?

Perhaps one could sort frames and rearrange counts in the same way and then un-sort everything again before returning?
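One way the sort / un-sort idea could look is sketched below, using the `sorter` argument of `np.searchsorted` so that the frames never have to be physically reordered. This is only an illustration and not the code that was merged: the helper name `counts_for_frames` is hypothetical, and it assumes that the frame numbers are unique and that every entry of `indices` actually occurs in `frames`.

```python
import numpy as np


def counts_for_frames(frames, indices, tmp_counts):
    """Hypothetical helper: place per-frame counts at the matching positions in `frames`.

    Assumes `frames` has no duplicates and every value in `indices` is in `frames`.
    """
    frames = np.asarray(frames)
    # order[k] is the position in `frames` of the k-th smallest frame number
    order = np.argsort(frames)
    # positions of `indices` within the sorted view of `frames`
    pos_sorted = np.searchsorted(frames, indices, sorter=order)
    # map the sorted positions back to positions in the original (unsorted) array
    positions = order[pos_sorted]
    counts = np.zeros(len(frames), dtype=int)
    counts[positions] = tmp_counts
    return counts


frames = [2, 3, 0, 7, 6]           # unsorted, as in run(frames=[2, 3, 0, 7, 6])
indices = np.array([0, 3, 7])      # frames in which hydrogen bonds were found
tmp_counts = np.array([5, 1, 4])   # number of hydrogen bonds in each of those frames
result = counts_for_frames(frames, indices, tmp_counts)
print(result)
```

Frames without any hydrogen bonds (2 and 6 here) keep a count of zero, and the output follows the order of `frames` rather than sorted order.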

Okay I just revisited it and the whole function can actually be simplified which I just pushed.

def count_by_time(self):
    """Counts the number of hydrogen bonds per timestep.

    Returns
    -------
    counts : numpy.ndarray
        Contains the total number of hydrogen bonds found at each timestep.
        Can be used along with :attr:`HydrogenBondAnalysis.times` to plot
        the number of hydrogen bonds over time.
    """
    hbond_frames = self.results.hbonds[:, 0].astype(int)
    frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)
    counts = np.zeros(max(self.frames) + 1, dtype=int)
    counts[frame_unique] = frame_counts
    return counts[self.frames]

Alternatively could ensure we only make a small-as-necessary results array:

def count_by_time(self):
    """Counts the number of hydrogen bonds per timestep.

    Returns
    -------
    counts : numpy.ndarray
        Contains the total number of hydrogen bonds found at each timestep.
        Can be used along with :attr:`HydrogenBondAnalysis.times` to plot
        the number of hydrogen bonds over time.
    """
    hbond_frames = self.results.hbonds[:, 0].astype(int)
    frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)
    frame_min = min(self.frames)
    frame_max = max(self.frames)
    counts = np.zeros(frame_max - frame_min + 1, dtype=int)
    counts[frame_unique - frame_min] = frame_counts
    return counts[self.frames - frame_min]

Ended up updating it to use the second approach, with min and max.
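To make the min/max offset approach easier to try outside of MDAnalysis, here is a standalone sketch of the same logic on plain arrays. The function name `count_by_frame` and the sample data are made up for illustration; `hbond_frames` stands in for the first column of `self.results.hbonds` and `frames` for `self.frames`.

```python
import numpy as np


def count_by_frame(hbond_frames, frames):
    """Standalone sketch: number of hydrogen bonds for each entry of `frames`."""
    frames = np.asarray(frames, dtype=int)
    hbond_frames = np.asarray(hbond_frames, dtype=int)
    # unique frame numbers that contain hydrogen bonds, and how many each contains
    frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)
    frame_min, frame_max = frames.min(), frames.max()
    # dense lookup table covering only the range frame_min..frame_max
    counts = np.zeros(frame_max - frame_min + 1, dtype=int)
    counts[frame_unique - frame_min] = frame_counts
    # pick out the entries for the analysed frames, in the order they were analysed
    return counts[frames - frame_min]


frames = np.array([2, 3, 0, 7, 6])            # non-linear, unsorted frame selection
hbond_frames = np.array([2, 2, 2, 7, 0, 0])   # one entry per hydrogen bond found
counts = count_by_frame(hbond_frames, frames)

for frame, n in zip(frames, counts):
    print(f"frame {frame}: {n} hydrogen bonds")
```

The returned array has the same length and order as `frames`, so frames without hydrogen bonds show up as zeros instead of being dropped.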

The last one looks like a good idea. It looks to me that you can do things like

plot(frames, counts)

or

zip(frames, counts)

because there's a one-to-one correspondence between the two arrays, right?

Yep that is the case. I believe it was already mostly the case for a linear range but this ensures it is the same for all frame combinations.

Might be worth looking to see if other analysis / results classes need a similar tweak


orbeckst

Very clean, ❤️ it!

orbeckst · 2026-01-08T15:45:47Z

package/MDAnalysis/analysis/hydrogenbonds/hbond_analysis.py

        """
+        hbond_frames = self.results.hbonds[:, 0].astype(int)
+        frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)
+        frame_min, frame_max = self.frames.min(), self.frames.max()


good, you used np min/max — they are much faster than applying python min/max to arrays
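A quick way to check this on your own machine is a small timing sketch like the one below. The array size and repeat count are arbitrary choices and the measured times will vary between systems, so no numbers are quoted here.

```python
import timeit

import numpy as np

# Arbitrary example data: 100,000 frame indices with a stride of 2
frames = np.arange(0, 200_000, 2)

# ndarray methods reduce over the buffer in compiled code
t_numpy = timeit.timeit(lambda: (frames.min(), frames.max()), number=5)

# the Python builtins iterate over the array element by element
t_python = timeit.timeit(lambda: (min(frames), max(frames)), number=5)

print(f"ndarray.min/max: {t_numpy:.6f} s for 5 calls")
print(f"builtin min/max: {t_python:.6f} s for 5 calls")
```

Both variants return the same values; only the time spent differs.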

fix hbond iterator

84e63d2

BradyAJohnston force-pushed the fix-hbond-iterator branch from 695a668 to 84e63d2 Compare January 6, 2026 14:39

BradyAJohnston added 2 commits January 6, 2026 14:40

fix changelog

a40ec44

improve lookup dict creation

9bee140

orbeckst approved these changes Jan 6, 2026


Update package/CHANGELOG

ae5afa7

orbeckst self-assigned this Jan 6, 2026

BradyAJohnston and others added 4 commits January 8, 2026 10:59

refactor for simplicity

eab215a

Merge branch 'fix-hbond-iterator' of github.com:BradyAJohnston/mdanal…

5329c98

…ysis into fix-hbond-iterator

Merge branch 'develop' into fix-hbond-iterator

57a07cf

only create counts array of necessary size

5900d3e

orbeckst approved these changes Jan 8, 2026


orbeckst merged commit 9cbc8ea into MDAnalysis:develop Jan 8, 2026
24 checks passed
