-
Notifications
You must be signed in to change notification settings - Fork 40
Speedup: Add multiprocessing support to test suite #745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
- Formatting - isort - Use Path - Incrementally added some type annotation - Use direct equality comparison (==) instead of isclose for ints
- Formatting
- Repeated test code parametrized
- Temporary directory use for each run of a test that subsequent test runs don't short circuit ("there's no work to do") on a file that already exists
|
Tests are failing on CircleCI but not on my machine... It appears that the runners are running out of memory now that they're running 32 tests at once (the number of cores they have). Assuming we don't want to pay more for a CircleCI resource class with more RAM, I can only think to find ways to reduce RAiDER's memory usage, or settle for running fewer tests in parallel. Will try halving the worker count from 32 to 16? |
… into parallelize-tests
|
Even on just 4 workers, this appears to be causing quite the amount of instability. I'm going to put this on pause and work on other things. |
Remove check/optionalness for multiprocessing: Not necessary because multiprocessing is a built-in module; it will never not be installed. Fix incorrect import Update test_scenario_1.py Update transformPoints()'s documentation Don't suppress exception content Use dataset subscript syntax Optimization: lazy-load xarray datasets: Using .open_dataset instead of .load_dataset allows you to load only the part(s) of the file you need. With the datasets in the hundreds of MB that RAiDER works with, this can save massive amounts of memory, and even enable multiple RAiDER processes to run at once on machines that would have run out of memory before. This high memory requirement may also have been what was causing instability in dbekaert#745... hm, I should check that... Formatting Fix type annotation Make __call__ an abstract method on LOS
Squashed commit of the following: commit b32655c Merge: 5727651 3fd5cf5 Author: Jeremy Maurer <maurer.jeremy@gmail.com> Date: Sat Aug 23 21:24:38 2025 -0500 Merge pull request dbekaert#762 from nate-kean/add-lock-file Add lockfiles to RAiDER commit 3fd5cf5 Author: Nate Kean <natekean65535@gmail.com> Date: Sat Aug 23 16:20:28 2025 -0500 Create regenerate-locks.py commit a999d56 Author: Nate Kean <natekean65535@gmail.com> Date: Fri Aug 22 16:15:06 2025 -0500 Update CHANGELOG.md for dbekaert#762 commit 56df113 Author: Nate Kean <natekean65535@gmail.com> Date: Fri Aug 22 16:10:52 2025 -0500 Update documentation for lockfiles commit d809892 Author: Nate Kean <natekean65535@gmail.com> Date: Sat Aug 23 16:32:32 2025 -0500 Update CircleCI config to use lockfiles commit d58b08b Author: Nate Kean <natekean65535@gmail.com> Date: Sat Aug 23 16:32:59 2025 -0500 Generate lockfiles for all other supported Python versions @ 5727651 commit a78ec5a Author: Nate Kean <natekean65535@gmail.com> Date: Fri Aug 22 16:09:05 2025 -0500 Generate lockfile for Python 3.12 @ 5727651 commit 0bb2068 Author: Nate Kean <natekean65535@gmail.com> Date: Thu Aug 21 20:36:44 2025 -0500 Fix tests: use xarray for MERRA2._fetch() commit 5727651 Merge: 784237b 6d12fd4 Author: Charlie Marshak <cmarshak@users.noreply.github.com> Date: Thu Aug 21 13:11:09 2025 -0700 Merge pull request dbekaert#760 from dbekaert/herman-smale-branch Browse Imagery Fix commit 6d12fd4 Merge: 784237b a1d5d3c Author: Charlie Marshak <cmarshak@users.noreply.github.com> Date: Thu Aug 21 11:38:18 2025 -0700 Merge pull request dbekaert#759 from jacquelynsmale/add_browse_s3_tag Add `browse` s3 tag to uploaded `.png` files commit a1d5d3c Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Wed Aug 20 16:29:46 2025 -0800 fix 135 commit 0ee91ab Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Wed Aug 20 16:09:46 2025 -0800 fix patch side effect commit c7db467 Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Wed Aug 20 15:12:35 2025 -0800 fix path in test commit 4704db3 Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Wed Aug 20 13:52:37 2025 -0800 switch to accept only Path commit 5f929ba Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Tue Aug 19 15:46:07 2025 -0800 fix changelog to reflect tag commit 1a16ac8 Author: jacquelynsmale <jacquelyn.smale@gmail.com> Date: Tue Aug 19 15:36:23 2025 -0800 add in get_tag_set to aws.py commit 784237b Merge: 9eecf23 a259c66 Author: Jeremy Maurer <maurer.jeremy@gmail.com> Date: Thu Jul 3 20:38:01 2025 -0500 Merge pull request dbekaert#746 from garlic-os/numpy-2.0-upgrade Upgrade to numpy 2.0 commit a259c66 Author: Nate Kean <14845347+garlic-os@users.noreply.github.com> Date: Wed Jul 2 17:56:17 2025 -0500 Support both numpy v1 and v2 - environment.yml: Remove requirement for numpy to be at least v2 - Add polyfill for np.trapezoid for numpy v1 commit 4a5d4ea Author: Nate Kean <14845347+garlic-os@users.noreply.github.com> Date: Mon Jun 30 21:14:40 2025 -0500 Update CHANGELOG.md for dbekaert#746 commit 46e8883 Author: Nate Kean <14845347+garlic-os@users.noreply.github.com> Date: Mon Jun 30 21:07:39 2025 -0500 Migrate np.trapz to np.trapezoid commit 7d3c98c Author: Nate Kean <14845347+garlic-os@users.noreply.github.com> Date: Mon Jun 30 21:06:01 2025 -0500 Add ruff ruleset for numpy 2.0 migration commit b5e518d Author: Nate Kean <14845347+garlic-os@users.noreply.github.com> Date: Mon Jun 30 21:05:45 2025 -0500 Set minimum numpy version to 2.0
My bad, I did not know they were different: https://stackoverflow.com/questions/69734856/whats-the-difference-between-and-for-conda-package-versioning This was causing conda-lock to fail to solve due to unnecessarily restricing the Python version to 3.x.0 (e.g., 3.9.0 whereas current is 3.9.23).
- Reduce data duplication: Pull Python versions from .circleci/config.yaml instead of hardcoding them again - Improve logging - Add comments, documentation
commit 593e717 Merge: 49d08bc b32655c Author: Nate Kean <natekean65535@gmail.com> Date: Sat Aug 23 21:49:06 2025 -0500 Merge branch 'dbekaert:dev' into test-fixes commit 49d08bc Merge: 5e285b5 df9e299 Author: Nate Kean <natekean65535@gmail.com> Date: Fri Aug 22 11:28:56 2025 -0500 Merge branch 'test-fixes' of https://github.com/nate-kean/RAiDER into test-fixes commit 5e285b5 Author: Nate Kean <natekean65535@gmail.com> Date: Fri Aug 22 11:28:07 2025 -0500 Update CHANGELOG.md for dbekaert#761 commit ef65d93 Author: Nate Kean <natekean65535@gmail.com> Date: Thu Aug 21 21:28:11 2025 -0500 Fix test_GUNW_hyp3_metadata_update: fix invalid gunw_schema.json commit 0275c85 Author: Nate Kean <natekean65535@gmail.com> Date: Thu Aug 21 20:36:44 2025 -0500 Fix test_temporal_interpolate.py: use xarray for MERRA2._fetch() commit df9e299 Author: Nate Kean <natekean65535@gmail.com> Date: Thu Aug 21 21:28:11 2025 -0500 Fix test_GUNW_hyp3_metadata_update: fix invalid gunw_schema.json commit 0e2c99d Author: Nate Kean <natekean65535@gmail.com> Date: Thu Aug 21 20:36:44 2025 -0500 Fix test_temporal_interpolate.py: use xarray for MERRA2._fetch()
parallelize-tests and test-fixes both make changes to the dependencies, so upon merging test-fixes with this branch, the lockfiles must be regenerated again to incorporate both of their changes.
Plus manual overrides for blocks that break usual formatting rules intentionally
Lint/format test_weather_model.py
|
Docker Build failure: That looks like an out-of-memory error. Looking into it I'm noticing it's actually trying to create a RAiDER environment but is not using the conda locks. This might not actually be related to the pull request. |
Description
Speeds up the testing suite 2-4x.1
This pull request does two main things:
pushdanymore, as that was causing problems from some tests changing the Python interpreter's working directory in the middle of other tests.Since this is off by default, documentation on enabling
pytest-xdistwas also added toCONTRIBUTING.In accomplishing (1), I also had to do (2); putting every test's working files in separate directories was necessary to prevent similar tests from trying to write to the same files, which would cause them to error out and fail. Making all tests in RAiDER use temporary directories was an effort that had already been going on in RAiDER piecemeal, so now we can say that is done and dusted, too.
In the process, I ended up encountering and fixing some bugs:
c7186f3(#745): certain parts of RAiDER used to disobey your configured output directory, always outputting to your current working directory instead.9db2a6a(#745): a piece of code was erroneously renaming any instance of "wet" to "tropo" within the configured output path. Sorry, that was my bug.bbbeef9(#745): AOIs were passing the look direction in isce's format when the code was expecting it to be a string.Motivation and Context
pytest-of-<username>/pytest-<run #>/<test name>.To configure your own setup to run tests in parallel, add this to your pytest arguments:
For example, in your vscode workspace settings, you can add:
This will distribute the tests across your CPU's total number of hyperthreads in vscode's pytest runner.
Note
A bug affects pytest-xdist (pytest-dev/pytest-xdist#872) that causes pytest to hang on some systems when using
-n logical. If this happens to you, try an-none less than your hyperthread count (i.e., on a system with 12 hyperthreads, set to 11).How Has This Been Tested?
All tests continue to pass when run in parallel. I ensured that no test outputs any files to the project directory (or current working directory) anymore, through this script that watches for files being created:
If you run this script and the test suite, you will find that this pull request silences all its activity.
Type of change
Checklist:
Footnotes
On my machine, slow tests enabled,
-n 11, run time was reduced from ~21 minutes to ~9 minutes on all of my CPU's 12 hyperthreads. On CircleCI, slow tests skipped,-n 2, "Run unit tests" time is reduced from ~17 minutes to ~4 minutes. This PR sets-nto just2on CircleCI to keep within the runner's RAM budget. ↩