Skip to content

Conversation

@atravitz
Copy link
Contributor

@atravitz atravitz commented Feb 19, 2025

resolves #1132

also resolves #1130

Checklist

  • switch from using local data to fetching from zenodo
  • support running tests offline if data already exists in the cache
  • match existing openfe fetch rbfe-tutorial-results behavior: this will require a bit more work on the cli fetching side, I'll address in a follow-up PR

Developers certificate of origin

@atravitz atravitz requested a review from mikemhenry February 19, 2025 20:12
@mikemhenry
Copy link
Contributor

Lets add an import of from .conftest import HAS_INTERNET and then a mark @pytest.mark.skipif(not HAS_INTERNET, reason="Internet seems to be unavailable") So that we don't get a test fail if someone runs this on an air gapped network

@mikemhenry
Copy link
Contributor

I think b/c of the caching pooch does, we don't need to worry about the fixture scope so these only run once.

Also I know you are still working on this PR, but I did a quick check of all the files we have checked in and once we get the stuff in openfecli/tests/data/ we should be good, some of the other data files are smaller than pdbs and pngs we have in our docs so we don't need to worry about them IMHO (but if they get put onto zenodo then that is fine too)

46M	./openfecli/tests/data/rbfe_results.tar.gz
7.0M	./openfecli/tests/data/rbfe_results_parallel.tar.gz
1004K	./docs/tutorials/assets/t4lyso.png
836K	./openfe/tests/data/cdk8/cdk8_protein.pdb
760K	./openfe/tests/data/openmm_rfe/vacuum_nocoord.nc
664K	./docs/tutorials/tyk2.png
452K	./openfe/tests/data/eg5/eg5_protein.pdb
396K	./openfecli/tests/data/rbfe_tutorial/tyk2_protein.pdb
396K	./docs/tutorials/inputs/tyk2_protein.pdb
336K	./openfe/tests/data/openmm_rfe/RHFEProtocol_json_results.gz
228K	./docs/guide/setup/img/setup_2x.png
216K	./openfe/tests/data/cdk8.zip
208K	./openfe/tests/data/181l_only.pdb
208K	./docs/tutorials/inputs/181L_mod_capped_protonated.pdb
208K	./docs/tutorials/assets/t4_lysozyme.pdb
148K	./docs/guide/protocols/img/protein_2D_RMSD.png
140K	./docs/guide/setup/img/setup_1x.png
132K	./docs/guide/protocols/img/rbfe_thermocycle.png
124K	./docs/guide/protocols/img/ligand_RMSD.png
120K	./docs/guide/protocols/img/ligand_COM_drift.png

@atravitz
Copy link
Contributor Author

my motivation for putting pooch in a fixture is that it won't get called for tests that don't require it. e.g. if you want to just run the test_format_estimate_uncertainty test, you dont want pooch downloading a test dataset.

@atravitz atravitz force-pushed the add_zenodo_test_data branch from ffab691 to ea69796 Compare February 19, 2025 22:35
@atravitz
Copy link
Contributor Author

Lets add an import of from .conftest import HAS_INTERNET and then a mark @pytest.mark.skipif(not HAS_INTERNET, reason="Internet seems to be unavailable") So that we don't get a test fail if someone runs this on an air gapped network

I'd like to support these tests in instances where the data exists in the cache, but HAS_INTERNET=False? This would cover most of us doing active development, so that we can 1) do offline development if we already have the data cached 2) test on air-gapped networks by manually uploading data into the cache.

I can brute-force this, but I'm curious if you have an elegant way of going about this?

@codecov
Copy link

codecov bot commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.88%. Comparing base (064f33d) to head (914a931).
Report is 86 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   93.52%   91.88%   -1.64%     
==========================================
  Files         142      142              
  Lines       10632    10628       -4     
==========================================
- Hits         9944     9766     -178     
- Misses        688      862     +174     
Flag Coverage Δ
fast-tests 91.88% <100.00%> (?)
slow-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mikemhenry
Copy link
Contributor

Let me think about it -- my first thought is that we check if HAS_INTERNET or OFE_USE_DATA_CACHE (which we make a new envar for) is true, since if either of those are true then we can run the tests that need pooch, but the tests that check if we can fetch tutorial data don't use pooch so we would want to keep the way HAS_INTERNET behaves for them.

Can you test this?

  1. Run a test that uses pooch
  2. Turn off your internet
  3. Run test again

And see if that actually works?

@atravitz atravitz force-pushed the add_zenodo_test_data branch from ea69796 to 9f64de4 Compare February 19, 2025 23:56
@atravitz
Copy link
Contributor Author

Let me think about it -- my first thought is that we check if HAS_INTERNET or OFE_USE_DATA_CACHE (which we make a new envar for) is true, since if either of those are true then we can run the tests that need pooch, but the tests that check if we can fetch tutorial data don't use pooch so we would want to keep the way HAS_INTERNET behaves for them.

Can you test this?

1. Run a test that uses pooch

2. Turn off your internet

3. Run test again

And see if that actually works?

Yes that works as expected!

@atravitz atravitz force-pushed the add_zenodo_test_data branch 2 times, most recently from 09beb23 to 795d2d5 Compare February 20, 2025 20:51
@atravitz atravitz force-pushed the add_zenodo_test_data branch from 2267715 to 5a0339e Compare February 20, 2025 22:11
@github-actions
Copy link

No API break detected ✅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see here: https://github.com/OpenFreeEnergy/openfe/pull/1121/files#r1964427513

same data, just correctly(?) compressed.

@atravitz atravitz marked this pull request as ready for review February 20, 2025 22:38
Copy link
Contributor

@mikemhenry mikemhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- I don't think we need a news entry since this is pretty back end stuff and functionally it should all work. The only thing I can think of is if we don't add it to the changelog, somewhere we need to document how to manually load the test data on an air gapped system.

@atravitz
Copy link
Contributor Author

LGTM -- I don't think we need a news entry since this is pretty back end stuff and functionally it should all work. The only thing I can think of is if we don't add it to the changelog, somewhere we need to document how to manually load the test data on an air gapped system.

#1156 working on it here

@atravitz atravitz merged commit b5eb3e2 into main Feb 21, 2025
13 checks passed
@atravitz atravitz deleted the add_zenodo_test_data branch February 21, 2025 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move large test datasets out of git repo fix parallel gather test dataset

3 participants