Skip to content

Conversation

@nate-kean
Copy link
Contributor

@nate-kean nate-kean commented Jun 28, 2025

Description

Speeds up the testing suite 2-4x.1

This pull request does two main things:

  1. Refactors the test suite to allow any tests to run at the same time, and configures our CircleCI runners to run tests in parallel.
  2. Makes all tests output their files to temporary directories, separate for each test.
    • Crucially, the code was changed to do this without using pushd anymore, as that was causing problems from some tests changing the Python interpreter's working directory in the middle of other tests.

Since this is off by default, documentation on enabling pytest-xdist was also added to CONTRIBUTING.

In accomplishing (1), I also had to do (2); putting every test's working files in separate directories was necessary to prevent similar tests from trying to write to the same files, which would cause them to error out and fail. Making all tests in RAiDER use temporary directories was an effort that had already been going on in RAiDER piecemeal, so now we can say that is done and dusted, too.

In the process, I ended up encountering and fixing some bugs:

  • c7186f3 (#745): certain parts of RAiDER used to disobey your configured output directory, always outputting to your current working directory instead.
  • 9db2a6a (#745): a piece of code was erroneously renaming any instance of "wet" to "tropo" within the configured output path. Sorry, that was my bug.
  • bbbeef9 (#745): AOIs were passing the look direction in isce's format when the code was expecting it to be a string.

Motivation and Context

  1. Improves developer experience by making it considerably easier/more convenient to use the test suite to test changes as you go along.
  2. Improves developer experience in a different way by keeping the test suite from "polluting" their project directory with test output files. Before it could also be difficult to determine which tests were producing which files, especially since a single test could output many files across different places. Test output files are now grouped into folders labeled per test. In a typical Linux setup, they will be found in pytest-of-<username>/pytest-<run #>/<test name>.

To configure your own setup to run tests in parallel, add this to your pytest arguments:

-n <#|auto|logical>

For example, in your vscode workspace settings, you can add:

    "python.testing.pytestArgs": [
        "test",
        "-n", "logical"
    ],

This will distribute the tests across your CPU's total number of hyperthreads in vscode's pytest runner.

Note

A bug affects pytest-xdist (pytest-dev/pytest-xdist#872) that causes pytest to hang on some systems when using -n logical. If this happens to you, try an -n one less than your hyperthread count (i.e., on a system with 12 hyperthreads, set to 11).

How Has This Been Tested?

All tests continue to pass when run in parallel. I ensured that no test outputs any files to the project directory (or current working directory) anymore, through this script that watches for files being created:

#! /bin/bash
# (apt install inotify-tools)
inotifywait -r -m \
    --exclude ".*(__pycache__)|(\.git)|(pytest-cache)|(RAiDER\.egg-info).*" \
    -e create . \
    | (
        while read file_path file_event file_name; do \
            echo ${file_path}${file_name} event: ${file_event}; \
        done
    )

If you run this script and the test suite, you will find that this pull request silences all its activity.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • I have added an explanation of what your changes do and why you'd like us to include them.
  • I have written new tests for your core changes, as applicable.
  • I have successfully ran tests with your changes locally.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Footnotes

  1. On my machine, slow tests enabled, -n 11, run time was reduced from ~21 minutes to ~9 minutes on all of my CPU's 12 hyperthreads. On CircleCI, slow tests skipped, -n 2, "Run unit tests" time is reduced from ~17 minutes to ~4 minutes. This PR sets -n to just 2 on CircleCI to keep within the runner's RAM budget.

Nate Kean and others added 5 commits June 28, 2025 13:01
- Formatting
- isort
- Use Path
- Incrementally added some type annotation
- Use direct equality comparison (==) instead of isclose for ints
- Formatting
- Repeated test code parametrized
- Temporary directory use for each run of a test that subsequent test runs don't short circuit ("there's no work to do") on a file that already exists
@nate-kean nate-kean changed the title Misc fixes Enable tests to run in parallel (150% speedup) Jun 29, 2025
@nate-kean nate-kean changed the title Enable tests to run in parallel (150% speedup) Run tests in parallel (150% speedup) Jun 29, 2025
@nate-kean nate-kean changed the title Run tests in parallel (150% speedup) Run tests in parallel Jun 29, 2025
@nate-kean nate-kean changed the title Run tests in parallel Run tests in parallel, use temp directories for all test artifacts Jun 29, 2025
@nate-kean
Copy link
Contributor Author

nate-kean commented Jun 30, 2025

Tests are failing on CircleCI but not on my machine... It appears that the runners are running out of memory now that they're running 32 tests at once (the number of cores they have). Assuming we don't want to pay more for a CircleCI resource class with more RAM, I can only think to find ways to reduce RAiDER's memory usage, or settle for running fewer tests in parallel.

Will try halving the worker count from 32 to 16?

@nate-kean
Copy link
Contributor Author

Even on just 4 workers, this appears to be causing quite the amount of instability. I'm going to put this on pause and work on other things.

@nate-kean nate-kean marked this pull request as ready for review August 5, 2025 01:54
nate-kean added a commit to nate-kean/RAiDER that referenced this pull request Aug 24, 2025
Remove check/optionalness for multiprocessing:
Not necessary because multiprocessing is a built-in module; it will never not be installed.

Fix incorrect import

Update test_scenario_1.py

Update transformPoints()'s documentation

Don't suppress exception content
Use dataset subscript syntax

Optimization: lazy-load xarray datasets:
Using .open_dataset instead of .load_dataset allows you to load only the part(s) of the file you need. With the datasets in the hundreds of MB that RAiDER works with, this can save massive amounts of memory, and even enable multiple RAiDER processes to run at once on machines that would have run out of memory before. This high memory requirement may also have been what was causing instability in dbekaert#745... hm, I should check that...

Formatting

Fix type annotation

Make __call__ an abstract method on LOS
Squashed commit of the following:

commit b32655c
Merge: 5727651 3fd5cf5
Author: Jeremy Maurer <maurer.jeremy@gmail.com>
Date:   Sat Aug 23 21:24:38 2025 -0500

    Merge pull request dbekaert#762 from nate-kean/add-lock-file

    Add lockfiles to RAiDER

commit 3fd5cf5
Author: Nate Kean <natekean65535@gmail.com>
Date:   Sat Aug 23 16:20:28 2025 -0500

    Create regenerate-locks.py

commit a999d56
Author: Nate Kean <natekean65535@gmail.com>
Date:   Fri Aug 22 16:15:06 2025 -0500

    Update CHANGELOG.md for dbekaert#762

commit 56df113
Author: Nate Kean <natekean65535@gmail.com>
Date:   Fri Aug 22 16:10:52 2025 -0500

    Update documentation for lockfiles

commit d809892
Author: Nate Kean <natekean65535@gmail.com>
Date:   Sat Aug 23 16:32:32 2025 -0500

    Update CircleCI config to use lockfiles

commit d58b08b
Author: Nate Kean <natekean65535@gmail.com>
Date:   Sat Aug 23 16:32:59 2025 -0500

    Generate lockfiles for all other supported Python versions @ 5727651

commit a78ec5a
Author: Nate Kean <natekean65535@gmail.com>
Date:   Fri Aug 22 16:09:05 2025 -0500

    Generate lockfile for Python 3.12 @ 5727651

commit 0bb2068
Author: Nate Kean <natekean65535@gmail.com>
Date:   Thu Aug 21 20:36:44 2025 -0500

    Fix tests: use xarray for MERRA2._fetch()

commit 5727651
Merge: 784237b 6d12fd4
Author: Charlie Marshak <cmarshak@users.noreply.github.com>
Date:   Thu Aug 21 13:11:09 2025 -0700

    Merge pull request dbekaert#760 from dbekaert/herman-smale-branch

    Browse Imagery Fix

commit 6d12fd4
Merge: 784237b a1d5d3c
Author: Charlie Marshak <cmarshak@users.noreply.github.com>
Date:   Thu Aug 21 11:38:18 2025 -0700

    Merge pull request dbekaert#759 from jacquelynsmale/add_browse_s3_tag

    Add `browse` s3 tag to  uploaded `.png` files

commit a1d5d3c
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Wed Aug 20 16:29:46 2025 -0800

    fix 135

commit 0ee91ab
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Wed Aug 20 16:09:46 2025 -0800

    fix patch side effect

commit c7db467
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Wed Aug 20 15:12:35 2025 -0800

    fix path in test

commit 4704db3
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Wed Aug 20 13:52:37 2025 -0800

    switch to accept only Path

commit 5f929ba
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Tue Aug 19 15:46:07 2025 -0800

    fix changelog to reflect tag

commit 1a16ac8
Author: jacquelynsmale <jacquelyn.smale@gmail.com>
Date:   Tue Aug 19 15:36:23 2025 -0800

    add in get_tag_set to aws.py

commit 784237b
Merge: 9eecf23 a259c66
Author: Jeremy Maurer <maurer.jeremy@gmail.com>
Date:   Thu Jul 3 20:38:01 2025 -0500

    Merge pull request dbekaert#746 from garlic-os/numpy-2.0-upgrade

    Upgrade to numpy 2.0

commit a259c66
Author: Nate Kean <14845347+garlic-os@users.noreply.github.com>
Date:   Wed Jul 2 17:56:17 2025 -0500

    Support both numpy v1 and v2

    - environment.yml: Remove requirement for numpy to be at least v2
    - Add polyfill for np.trapezoid for numpy v1

commit 4a5d4ea
Author: Nate Kean <14845347+garlic-os@users.noreply.github.com>
Date:   Mon Jun 30 21:14:40 2025 -0500

    Update CHANGELOG.md for dbekaert#746

commit 46e8883
Author: Nate Kean <14845347+garlic-os@users.noreply.github.com>
Date:   Mon Jun 30 21:07:39 2025 -0500

    Migrate np.trapz to np.trapezoid

commit 7d3c98c
Author: Nate Kean <14845347+garlic-os@users.noreply.github.com>
Date:   Mon Jun 30 21:06:01 2025 -0500

    Add ruff ruleset for numpy 2.0 migration

commit b5e518d
Author: Nate Kean <14845347+garlic-os@users.noreply.github.com>
Date:   Mon Jun 30 21:05:45 2025 -0500

    Set minimum numpy version to 2.0
@nate-kean nate-kean changed the title Run tests in parallel, use temp directories for all test artifacts Speedup: Run tests in parallel, use temp dirs for tests Aug 24, 2025
@nate-kean nate-kean marked this pull request as draft August 25, 2025 05:03
My bad, I did not know they were different: https://stackoverflow.com/questions/69734856/whats-the-difference-between-and-for-conda-package-versioning

This was causing conda-lock to fail to solve due to unnecessarily restricing the Python version to 3.x.0 (e.g., 3.9.0 whereas current is 3.9.23).
- Reduce data duplication: Pull Python versions from .circleci/config.yaml instead of hardcoding them again
- Improve logging
- Add comments, documentation
commit 593e717
Merge: 49d08bc b32655c
Author: Nate Kean <natekean65535@gmail.com>
Date:   Sat Aug 23 21:49:06 2025 -0500

    Merge branch 'dbekaert:dev' into test-fixes

commit 49d08bc
Merge: 5e285b5 df9e299
Author: Nate Kean <natekean65535@gmail.com>
Date:   Fri Aug 22 11:28:56 2025 -0500

    Merge branch 'test-fixes' of https://github.com/nate-kean/RAiDER into test-fixes

commit 5e285b5
Author: Nate Kean <natekean65535@gmail.com>
Date:   Fri Aug 22 11:28:07 2025 -0500

    Update CHANGELOG.md for dbekaert#761

commit ef65d93
Author: Nate Kean <natekean65535@gmail.com>
Date:   Thu Aug 21 21:28:11 2025 -0500

    Fix test_GUNW_hyp3_metadata_update: fix invalid gunw_schema.json

commit 0275c85
Author: Nate Kean <natekean65535@gmail.com>
Date:   Thu Aug 21 20:36:44 2025 -0500

    Fix test_temporal_interpolate.py: use xarray for MERRA2._fetch()

commit df9e299
Author: Nate Kean <natekean65535@gmail.com>
Date:   Thu Aug 21 21:28:11 2025 -0500

    Fix test_GUNW_hyp3_metadata_update: fix invalid gunw_schema.json

commit 0e2c99d
Author: Nate Kean <natekean65535@gmail.com>
Date:   Thu Aug 21 20:36:44 2025 -0500

    Fix test_temporal_interpolate.py: use xarray for MERRA2._fetch()
parallelize-tests and test-fixes both make changes to the dependencies, so upon merging test-fixes with this branch, the lockfiles must be regenerated again to incorporate both of their changes.
@nate-kean nate-kean changed the title Speedup: Run tests in parallel, use temp dirs for tests Speedup: Add multiprocessing support to test suite Aug 28, 2025
@nate-kean nate-kean marked this pull request as ready for review August 28, 2025 20:09
nate-kean and others added 7 commits September 9, 2025 09:42
- Sort imports
- import datetime as dt (reduce ambiguity between datetime and datetime.datetime)
- Linting: add closing punctuation to first line of docstrings
- Remove superfluous `assert True` statements
Plus manual overrides for blocks that break usual formatting rules intentionally
@nate-kean
Copy link
Contributor Author

Docker Build failure: That looks like an out-of-memory error. Looking into it I'm noticing it's actually trying to create a RAiDER environment but is not using the conda locks. This might not actually be related to the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants