r.watershed: Can now optionally use already existing maps to optimize performance #6953

Open
sumitchintanwar wants to merge 8 commits into OSGeo:main from sumitchintanwar:feat/r-watershed-reuse-flow

Conversation

@sumitchintanwar
Contributor

@sumitchintanwar sumitchintanwar commented Jan 25, 2026

Overview
This PR introduces a "reuse" feature to r.watershed, allowing users to skip the computationally expensive flow accumulation and drainage direction steps if these maps already exist. This is particularly useful for:

  • Iterative testing of different basin threshold values.
  • Batch processing where the underlying elevation/flow model remains constant.

Key Changes

  • Core Logic: Modified main.c and RAM/SEG modules to accept existing accumulation and drainage maps as inputs.
  • Documentation: Added raster/r.watershed/REUSE_FEATURE.md covering workflows, limitations, and "Do's and Don'ts".
  • Benchmarks: Added raster/r.watershed/testsuite/BENCHMARKS.md using hyperfine.
  • Testing: Added new tests in testsuite/ to verify correctness.

Performance
Benchmarks performed using hyperfine on the NC SPM dataset show significant speedups when iterating on thresholds.
[Image: Screenshot from 2026-01-27 11-20-29]
Additional hyperfine benchmark runs show similarly significant improvements.

Limitations

  • Input Compatibility: As noted in the documentation, input maps must be generated by r.watershed. Maps from r.stream.* or other tools may have different drainage conventions and are not supported (explicitly warned in docs).
  • Basin Delineation Only: The reuse mode focuses on recalculating basins/streams based on thresholds; it does not re-compute flow physics.

Addressed Feedback

  • Replaced time with hyperfine for statistically significant benchmarking.
  • Added safety checks for flags (G_option_collective).
  • Added comprehensive documentation and tests.

Closes #6720

@echoix
Member

echoix commented Jan 25, 2026

Did you ever try using hyperfine to actually compute the stats? It does enough runs for the result to be statistically significant, handles outliers (often caused by other programs running), and removes the shell startup time.
With the timings you've shown, using time may have its limits since the runs weren't really long. The "~4 times" is on user-mode CPU time only, but I'm a little sceptical that it isn't just variation between runs.

But any improvement is an improvement!
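
For reference, a hyperfine run along the lines described above could be assembled as follows. This is a minimal sketch, not the benchmark script from this PR: the helper name `hyperfine_argv`, the run counts, and the output map name are assumptions made for illustration. It only builds and prints the command line, so it does not require hyperfine or GRASS to be installed.

```python
import shlex


def hyperfine_argv(commands, warmup=3, runs=20, export_markdown=None):
    """Build an argument list for a hyperfine comparison (hypothetical helper)."""
    argv = [
        "hyperfine",
        "--warmup", str(warmup),   # untimed runs to warm caches
        "--runs", str(runs),       # number of timed runs per command
        "--shell=none",            # run commands directly, without shell startup cost
    ]
    if export_markdown:
        argv += ["--export-markdown", export_markdown]
    argv += list(commands)
    return argv


# Placeholder command: the map names are assumptions, not taken from the PR.
full_run = "r.watershed elevation=elevation threshold=10000 basin=basin_full --overwrite"
argv = hyperfine_argv([full_run], warmup=2, runs=10)
print(shlex.join(argv))
```

The resulting command would have to be run inside a GRASS session so that `r.watershed` is on the path.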

@petrasovaa
Contributor

petrasovaa commented Jan 25, 2026

Thanks! Please check our contributing guidelines for code style (use pre-commit).

Please add a test. Include a quick test (on a smaller part of the "elevation" map and maybe on an artificial surface) and perhaps one that runs longer with a larger area; deactivate that one so it doesn't slow down the CI, but I can then at least run it locally.

I am not an expert on this tool, but one of my concerns is that if a user tries to pass a flow accumulation/drainage raster computed with a different tool (e.g. r.stream.* tools can handle that), the algorithm may fail (incorrect results, segfaults). This can of course be discouraged in the documentation, but maybe there are other ways, e.g. detect the drainage conventions and check that they match.
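
One way such a convention check could look is sketched below. It assumes, based on my reading of the r.watershed manual, that the drainage output holds integers whose absolute value is a direction from 1 to 8 (multiples of 45 degrees), with 0 for depressions and negative values for flow leaving the region. The function name and the use of `None` for null cells are hypothetical. A range check like this is necessary but not sufficient: another tool may use the same value range with different semantics.

```python
def looks_like_r_watershed_drainage(rows):
    """Return True if every non-null cell is an integer in [-8, 8].

    `rows` is a list of lists; None stands in for a null cell (assumption).
    """
    for row in rows:
        for value in row:
            if value is None:
                continue
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not -8 <= value <= 8:
                return False
    return True


# Values within the assumed r.watershed range.
compatible = [[1, 2, -3], [0, None, 8]]
# A power-of-two D8 encoding, as used by some other GIS software.
power_of_two = [[1, 2, 4], [16, 64, 128]]

print(looks_like_r_watershed_drainage(compatible))
print(looks_like_r_watershed_drainage(power_of_two))
```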

@sumitchintanwar
Contributor Author

> Did you ever try using hyperfine to actually compute the stats? It does enough runs for the result to be statistically significant, handles outliers (often caused by other programs running), and removes the shell startup time. With the timings you've shown, using time may have its limits since the runs weren't really long. The "~4 times" is on user-mode CPU time only, but I'm a little sceptical that it isn't just variation between runs.
>
> But any improvement is an improvement!

I haven't tried hyperfine. Thank you for the suggestion. I will try that for the tests. I think the actual speedup will be more modest once measured correctly with hyperfine. But hey, it's still a worthwhile improvement.

@sumitchintanwar
Contributor Author

> Thanks! Please check our contributing guidelines for code style (use pre-commit).
>
> Please add a test. Include a quick test (on a smaller part of the "elevation" map and maybe on an artificial surface) and perhaps one that runs longer with a larger area; deactivate that one so it doesn't slow down the CI, but I can then at least run it locally.
>
> I am not an expert on this tool, but one of my concerns is that if a user tries to pass a flow accumulation/drainage raster computed with a different tool (e.g. r.stream.* tools can handle that), the algorithm may fail (incorrect results, segfaults). This can of course be discouraged in the documentation, but maybe there are other ways, e.g. detect the drainage conventions and check that they match.

@petrasovaa, Thanks for the feedback and the code review. I'll check the guidelines and get back to you with proper tests.

@sumitchintanwar
Contributor Author

Hey @petrasovaa, @echoix, I have made the requested changes and also added benchmarks using hyperfine along with proper documentation, tests and constraints of this feature. Ready for review when you have a chance!

@sumitchintanwar
Contributor Author

On a side note, I have to admit, I'm still climbing the GRASS learning curve a bit, but I'm really enjoying the challenge! It's super interesting digging into how this all works. Let me know what you think of the updates!

@github-actions github-actions bot added the Python, HTML, docs, markdown, and tests labels Jan 27, 2026
@sumitchintanwar
Contributor Author

Some of the checks are failing because hyperfine is not found. Is there anything specific I should do to resolve this?

"raster/r.watershed/testsuite/benchmark_reuse.sh: line 15: hyperfine: command not found"

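A common way to avoid this kind of failure is to check for the tool before invoking it and skip the benchmark when it is missing. The sketch below is only an illustration of that idea, not what the PR's `benchmark_reuse.sh` does; the function name `run_if_available` is hypothetical.

```python
import shutil
import subprocess


def run_if_available(argv):
    """Run argv only if its executable is on PATH; return True if it ran."""
    if shutil.which(argv[0]) is None:
        print(f"Skipping: {argv[0]} not found on PATH")
        return False
    subprocess.run(argv, check=True)
    return True


# Harmless probe: prints the version if hyperfine is installed, skips otherwise.
ran = run_if_available(["hyperfine", "--version"])
```

The return value lets a calling script decide whether a skipped benchmark should count as a failure or simply be reported.
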
@sumitchintanwar sumitchintanwar force-pushed the feat/r-watershed-reuse-flow branch from e2cc694 to 1449daf January 28, 2026 04:14
@petrasovaa
Contributor

Given the significant changes in the code, I think we need to step back and write proper tests of the current code in a separate PR. I don't think the existing tests are comprehensive enough. Once those tests are merged, we can verify this PR is not breaking anything. Limit the region so that the tests run fast, but include a test with a larger area that we can verify locally but skip in the CI.
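
The "fast test in CI, larger test only locally" split could be structured roughly as below. This is a sketch using the standard library `unittest` as a stand-in; actual GRASS tests would normally build on `grass.gunittest`, and the class name, test names, and placeholder bodies here are assumptions. It relies on the `CI` environment variable, which GitHub Actions sets to `true`.

```python
import os
import unittest

# GitHub Actions sets CI=true; locally the variable is usually unset.
RUNNING_IN_CI = os.environ.get("CI", "").lower() == "true"


class TestWatershedRegression(unittest.TestCase):
    """Hypothetical layout for r.watershed regression tests."""

    def test_small_region(self):
        # Placeholder: run the tool on a reduced region and compare
        # outputs against stored reference values.
        pass

    @unittest.skipIf(RUNNING_IN_CI, "large-area test is too slow for CI; run locally")
    def test_large_region(self):
        # Placeholder: same comparison on the full-size area.
        pass
```

Saved as a test module, this can be run locally with `python -m unittest`, where both tests execute; in CI the large-area test is reported as skipped.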

@sumitchintanwar
Contributor Author

sumitchintanwar commented Jan 28, 2026

> Given the significant changes in the code, I think we need to step back and write proper tests of the current code in a separate PR. I don't think the existing tests are comprehensive enough. Once those tests are merged, we can verify this PR is not breaking anything. Limit the region so that the tests run fast, but include a test with a larger area that we can verify locally but skip in the CI.

Okay, I understand, @petrasovaa. I will make a separate PR with more comprehensive tests. I would like some clarity on which additional tests I should add.

I have skipped basin delineation and TCI/SPI calculation, since those would require significant algorithm changes.
I'm thinking we'll do that after the current feature is approved.

@sumitchintanwar
Contributor Author

sumitchintanwar commented Jan 28, 2026

The tests for the reuse functionality are added in PR #6992. I'd appreciate a review.

Member

I don’t think benchmarking is appropriate to run in tests, even less so in CI, where the actual hardware is variable and we won’t even do anything with the results. It’s nice to have as a reference; maybe place it in the PR, or ask whether it should be left as a separate file that is not picked up by the tests.

Contributor Author

Thanks for the review. I have skipped it in CI. Should I remove it from the PR entirely? The results are in the PR description. I just added it as something to run locally.

@petrasovaa
Contributor

> > Given the significant changes in the code, I think we need to step back and write proper tests of the current code in a separate PR. I don't think the existing tests are comprehensive enough. Once those tests are merged, we can verify this PR is not breaking anything. Limit the region so that the tests run fast, but include a test with a larger area that we can verify locally but skip in the CI.
>
> Okay, I understand, @petrasovaa. I will make a separate PR with more comprehensive tests. I would like some clarity on which additional tests I should add.

That's a misunderstanding... Please reread my comment. To move on with this PR, we need to have tests in place that would catch any regressions of the existing functionality caused by this PR. This PR may potentially break existing functionality and we need to be able to catch that. The existing tests are not comprehensive enough.

> I have skipped basin delineation and TCI/SPI calculation, since those would require significant algorithm changes. I'm thinking we'll do that after the current feature is approved.

Could you explain this a little bit more?

@sumitchintanwar
Contributor Author

sumitchintanwar commented Jan 30, 2026

> That's a misunderstanding... Please reread my comment. To move on with this PR, we need to have tests in place that would catch any regressions of the existing functionality caused by this PR. This PR may potentially break existing functionality and we need to be able to catch that. The existing tests are not comprehensive enough.

I am sorry for the confusion; I misunderstood. I'll add tests for the current code in a separate PR to ensure that the existing functionality isn't broken.

@sumitchintanwar
Contributor Author

sumitchintanwar commented Jan 30, 2026

> > I have skipped basin delineation and TCI/SPI calculation, since those would require significant algorithm changes. I'm thinking we'll do that after the current feature is approved.
>
> Could you explain this a little bit more?

@petrasovaa
Basically, basin outputs depend on the threshold: different thresholds create different basin boundaries from the same flow data. This threshold-based processing is not yet implemented in reuse mode and can be planned for later.

TCI/SPI require slope computation from the elevation model, which is skipped when reusing flow maps.

To keep this change focused, reuse mode is currently limited to outputs that can be derived directly from reused flow maps.
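
To illustrate why threshold-dependent outputs cannot simply be copied from a previous run, here is a deliberately simplified sketch. It is not the r.watershed algorithm: it just marks cells whose absolute accumulation reaches a threshold, using a made-up 3x3 grid and a hypothetical `stream_cells` helper, to show that the same flow data yields different results for different thresholds.

```python
def stream_cells(accumulation, threshold):
    """Return the (row, col) cells whose |accumulation| >= threshold.

    Simplified stand-in for threshold-based stream initiation;
    None marks a null cell (assumption).
    """
    return {
        (r, c)
        for r, row in enumerate(accumulation)
        for c, value in enumerate(row)
        if value is not None and abs(value) >= threshold
    }


# Made-up accumulation values, for illustration only.
accumulation = [
    [1, 1, 1],
    [2, 5, 1],
    [1, 9, 1],
]

for threshold in (2, 5, 9):
    print(threshold, sorted(stream_cells(accumulation, threshold)))
```

Raising the threshold shrinks the set of selected cells, so anything delineated from those cells has to be recomputed whenever the threshold changes.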

@sumitchintanwar
Contributor Author

@petrasovaa The regression tests are added in PR #7029. I’d really appreciate a quick review.

Labels

C, docs, HTML, markdown, module, Python, raster, tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

r.watershed: optionally use already existing maps of flow-accumulation or drainage direction

3 participants