
Add AlchemicalArchive #687

Merged
atravitz merged 36 commits into OpenFreeEnergy:main from ianmkenney:feat/AlchemicalArchive
Feb 6, 2026

Conversation

@ianmkenney
Member

This PR introduces the AlchemicalArchive for serializing an AlchemicalNetwork along with its transformation results. Closes #323.

@codecov

codecov Bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.82%. Comparing base (25c818a) to head (fe1033c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
+ Coverage   98.79%   98.82%   +0.02%     
==========================================
  Files          40       41       +1     
  Lines        2498     2555      +57     
==========================================
+ Hits         2468     2525      +57     
  Misses         30       30              

☔ View full report in Codecov by Sentry.

@ianmkenney ianmkenney force-pushed the feat/AlchemicalArchive branch from d3f33c9 to fe155bf on December 3, 2025 14:55
@ianmkenney ianmkenney force-pushed the feat/AlchemicalArchive branch from fe155bf to 3b00b6f on December 3, 2025 15:33
@ianmkenney ianmkenney requested a review from atravitz December 8, 2025 17:10
On real data consisting of 264 ProtocolDAGResults, the dataclass
implementation was not scalable due to the lack of proper
deduplication. Serialized, the archive was 222 MiB (117 MiB when zst
compressed) and took nearly 2 minutes to produce. Using a
GufeTokenizable approach, this was reduced to 5 MiB (1 MiB compressed)
while only taking seconds to produce.
@ianmkenney
Member Author

Commit 1772aa6 replaces the use of dataclasses with subclassing GufeTokenizable. While the simplicity of dataclasses is appealing, the performance benefits of the GufeTokenizable subclass leave little room for debate.

Implementation | compression algorithm | size (MB) | to_json (ms) | from_json, string (ms)
-------------- | --------------------- | --------- | ------------ | ----------------------
dataclass      | None                  | 222       | 113000       | ~8500
dataclass      | zstandard             | 117       | --           | --
Tokenizable    | None                  | 5         | 0.0014       | 0.0013

If anyone has thoughts on this, please share!
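The deduplication win comes from content-addressing: each distinct sub-object is stored once under a hash token, and repeated occurrences become token references. Below is a minimal, stand-alone sketch of that idea; the names `token` and `dedup_results` are illustrative only, not gufe's actual API, and gufe's tokenization map differs in detail.

```python
import hashlib
import json

def token(obj) -> str:
    """Content-address an object by hashing its canonical JSON form."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def dedup_results(results):
    """Store each distinct settings payload once in a registry and
    replace repeats with their token, analogous to a tokenization map."""
    registry = {}
    slim = []
    for r in results:
        t = token(r["settings"])
        registry.setdefault(t, r["settings"])  # first occurrence wins
        slim.append({"settings": t, "dG": r["dG"]})
    return {"registry": registry, "results": slim}

# 100 results sharing one identical, sizeable settings payload:
shared = {"forcefield": "openff-2.1.0", "positions": list(range(500))}
results = [{"settings": shared, "dG": i * 0.1} for i in range(100)]

flat = json.dumps(results)                  # payload duplicated 100 times
dedup = json.dumps(dedup_results(results))  # payload stored once
print(len(dedup) < len(flat))  # True
```

The same trade-off shows up here in miniature: a 64-character token only pays off when the shared payload is larger than the token, which is why dedup matters most for big repeated objects like ProtocolDAGResult inputs.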

@dotsdl dotsdl marked this pull request as ready for review January 20, 2026 17:22
@dotsdl dotsdl changed the title [WIP] Add AlchemicalArchive Add AlchemicalArchive Jan 20, 2026
@jthorton
Contributor

Great job @ianmkenney. Is there any substantial benefit to adding compression to the Tokenizable implementation as well? If these objects are intended for long-term storage, minimising the footprint at the cost of inspectability might be okay if there is a large difference. Or what about a msgpack option?

@ianmkenney
Member Author

ianmkenney commented Jan 21, 2026

@jthorton at least for the network I've tested, compressing with zstandard reduced the size to about 1 MB. MessagePack would probably be a good option. From what I see currently implemented, compression needs to be done manually. We would want to add a compress keyword argument to to_msgpack for ease of use.

edit: I think as protocols start producing and collecting more artifacts, compression will be much more valuable and should probably be the default.
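As a rough stdlib-only illustration of why compression-by-default pays off for repetitive serialized networks: in this sketch zlib stands in for zstandard and JSON for MessagePack, and the helper names `to_packed`/`from_packed` are hypothetical, not gufe's API.

```python
import json
import zlib

def to_packed(obj, compress: bool = True) -> bytes:
    """Serialize obj and optionally compress the payload.
    zlib stands in for zstandard, JSON for MessagePack."""
    raw = json.dumps(obj).encode()
    return zlib.compress(raw, level=9) if compress else raw

def from_packed(payload: bytes, compressed: bool = True):
    """Invert to_packed: optionally decompress, then deserialize."""
    raw = zlib.decompress(payload) if compressed else payload
    return json.loads(raw)

# A repetitive structure, like many similar transformation results:
network = {"edges": [{"dG": i * 0.01} for i in range(1000)]}

packed = to_packed(network)
assert from_packed(packed) == network  # round-trips losslessly
print(len(packed) < len(to_packed(network, compress=False)))  # True
```

Structured scientific payloads are usually highly redundant, so the compressed form is dramatically smaller; the main cost is that the bytes are no longer human-inspectable without a decompression step.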

@ianmkenney
Member Author

A quick test using MessagePack, with the new compression kwarg.

from gufe.archival import AlchemicalArchive
from gufe.compression import zst_compress, zst_decompress

from sys import getsizeof

archive = AlchemicalArchive.from_json(file="archive.json")

payload = archive.to_msgpack(compress=False)
print("msgpack, uncompressed (bytes):", getsizeof(payload))

payload = archive.to_msgpack(compress=True)
print("msgpack, compressed (bytes):", getsizeof(payload))

payload = archive.to_json()
print("JSON, uncompressed (bytes):", getsizeof(payload))
print("JSON, compressed (bytes):", getsizeof(zst_compress(payload.encode())))

Output:

msgpack, uncompressed (bytes): 2494841
msgpack, compressed (bytes): 762852
JSON, uncompressed (bytes): 5544757
JSON, compressed (bytes): 1032531

Comment thread gufe/archival.py Outdated
@ianmkenney
Member Author

pre-commit.ci autofix

@ianmkenney ianmkenney force-pushed the feat/AlchemicalArchive branch from d72ca1f to ced3ece on February 3, 2026 17:10
Comment thread gufe/tokenization.py
@ianmkenney ianmkenney force-pushed the feat/AlchemicalArchive branch from 2afbd3a to 29a1c6f on February 5, 2026 21:09
Comment thread gufe/archival.py
@github-actions

github-actions Bot commented Feb 6, 2026

No API break detected ✅

@ianmkenney ianmkenney requested a review from atravitz February 6, 2026 17:41
@atravitz atravitz merged commit c490289 into OpenFreeEnergy:main Feb 6, 2026
14 checks passed
atravitz added a commit that referenced this pull request Mar 2, 2026
* Add untested implementation of AlchemicalArchive

* Address ruff check issues

* Add docstrings to from_json and to_json

* Add tests for AlchemicalArchive

* Add archival module to API autosummary

* Add news entry

* Include fake ProtocolDAGResults in test archive

* Fix error in Archive fixture

* Implement md5sum and deterministic ProtocolDAGResult ordering

* Use lists instead of tuples

* Share tokenization map with AlchemicalNetwork and ProtocolDAGResults

* Use GufeTokenizable approach over dataclass

On real data consisting of 264 ProtocolDAGResults, the dataclass
implementation was not scalable due to the lack of proper
deduplication. Serialized, the archive was 222 MiB (117 MiB when zst
compressed) and took nearly 2 minutes to produce. Using a
GufeTokenizable approach, this was reduced to 5 MiB (1 MiB compressed)
while only taking seconds to produce.

* Update news entry

* Allow zstandard compression of msgpack bytes

* Fix errors in TestArchival

* Test MessagePack roundtrip with and without compression

* Check that transformation keys correspond to network edges

* Remove use of dictionaries for storing transformation results

* Simplify transformation_results validation process

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed typo

* Test ValueError raised on duplicate Transformation

* Test ordering of input transformation_results

* Add docstrings

* Add regression test for deserializing an AlchemicalArchive

* Revert "Add regression test for deserializing an AlchemicalArchive"

This reverts commit 6c2f7ee.

* Add regression test for deserializing an AlchemicalArchive

This reflects the previously reverted commit but changes execution
order of the tests.

* don't mutate the fixture

* add immutability test

* Allow user to skip specifying metadata

* Issue warning only if difference in major or minor versions

* Test conditional issue of warning by semver differences

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Alyssa Travitz <alyssa.travitz@omsf.io>


Development

Successfully merging this pull request may close these issues.

Define AlchemicalArchive object for use as archival artifact

5 participants