[GSoC 2026 Draft POC] r.proj: OpenMP parallelization via RAM-resident buffer #7185
krcoder123 wants to merge 3 commits into OSGeo:main
Conversation
```c
static char *make_ipol_list(void);
static char *make_ipol_desc(void);
```

[pre-commit] reported by reviewdog 🐶 (formatting suggestion for the new `interpolate_ram` declaration):

```c
void interpolate_ram(void *full_map, void *obufptr, int cell_type,
                     double col_idx, double row_idx, struct Cell_head *incellhd)
```
```c
    /* Direct memory access - Thread Safe for Reads */
    unsigned char *src = (unsigned char *)full_map +
    memcpy(obufptr, src, cell_size);
}
```

[pre-commit] reported by reviewdog 🐶 (line-wrapping suggestion for the `src` pointer initialization)
```c
G_important_message(_("Projecting..."));
for (row = 0; row < outcellhd.rows; row++) {
    /* obufptr = obuffer */;

    /* row index in input matrix */
    double row_idx = (incellhd.north - ycoord1) / incellhd.ns_res;
    if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) < 0) {
```

[pre-commit] reported by reviewdog 🐶 (suggested wrapping of the `GPJ_transform` condition):

```c
    if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) <
        0) {
```
```c
        Rast_set_null_value(obufptr, 1, cell_type);
    } else {
        /* replaces: interpolate(ibuffer, obufptr, cell_type, col_idx,
                                 row_idx, &incellhd); */
        /* CALL OUR LOCK-FREE RAM INTERPOLATOR */
        interpolate_ram(full_map_array, obufptr, cell_type, c_idx, r_idx, &incellhd);
    }
```

[pre-commit] reported by reviewdog 🐶 (style suggestions: put `else` on its own line, wrap the `interpolate_ram` call):

```c
    }
    else {
        interpolate_ram(full_map_array, obufptr, cell_type, c_idx,
                        r_idx, &incellhd);
```
```c
    xcoord2 = outcellhd.west + (outcellhd.ew_res / 2);
    ycoord2 -= outcellhd.ns_res;
#pragma omp critical
```

[pre-commit] reported by reviewdog 🐶 (formatting suggestion for the `#pragma omp critical` line)
```c
    return buf;
}
```

[pre-commit] reported by reviewdog 🐶
Thanks! Reading the whole raster into memory goes against how GRASS handles processing, so while I understand this is a proof of concept, I would like to see a plan for how to handle that.
Hi Anna, thank you for the feedback. You're right that loading the full raster into RAM conflicts with GRASS's tile-based memory model. This POC was mainly a first step to prove the projection math is parallelizable before tackling the harder cache problem. For the full GSoC proposal, my plan is a two-path approach: if the map fits within the user's memory limit, use the fast lock-free RAM buffer; if it doesn't, fall back to thread-local tile caches, where each thread maintains its own independent cache, removing lock contention while preserving GRASS's ability to process maps larger than RAM. Does that direction align with what you'd want to see in the proposal?
Yes, just more detailed analysis.
Hi Anna, I've completed a full draft of my proposal using your feedback on the memory model. The two-path architecture is documented in detail here: https://docs.google.com/document/d/1IUOA93u2iqIp0Qix-ymChkvmn3wKTUgFkOBhWCMudrA/edit?tab=t.0 Thanks!
@HuidaeCho would you mind having a look at the proposal?

Sure, reviewing it.
@krcoder123, see my suggestions for minor edits and comments in the proposal.
Related, maybe helpful: the GDAL warper has a NUM_THREADS option.
Thank you for the detailed review. I've updated the proposal to address your feedback. I clarified the motivation for the two-path design, added the user-controlled memory threshold with megabyte and percentage options, and added benchmarking both paths as an early task in the timeline. I also removed all the "GIS" references and fixed the grammar suggestions. Please let me know if anything else needs addressing.
Thank you for the pointer, I found this pretty helpful. Seeing how GDAL's NUM_THREADS handles thread-safe data access through independent chunks showed that the horizontal row-strip approach I'm going with for Path B is the right direction. The tricky part in GRASS is that `readcell` was never designed for this, so Path B needs each thread to have its own cache instance instead of sharing one. I've added the GDAL warper as a reference in the proposal.
Hi @HuidaeCho, I've updated the proposal and expanded the scope beyond just r.proj. After digging into the r.fill.stats source code, I found that row-level parallelism is blocked by a sliding ring buffer that has to advance sequentially. However, the column loop inside interpolate_row() is completely independent per cell, which makes it a clean parallelization target. r.fill.stats is now confirmed as the secondary module. I'll pick an additional module during the bonding period after auditing the remaining candidates. Please let me know if there are any changes I should make before the deadline. Thank you! https://docs.google.com/document/d/1IUOA93u2iqIp0Qix-ymChkvmn3wKTUgFkOBhWCMudrA/edit?tab=t.0
Hi @HuidaeCho, I've updated the proposal significantly since your last review. After further source audits I found that the previously proposed secondary module had sequential dependencies that would limit speedup, so I switched to r.param.scale instead. I audited process.c, identified the sequential sliding buffer as the core blocker, and applied the same RAM-preload pattern from r.proj. Draft PR #7236 shows a 1.7x speedup on a 100M-cell raster. r.geomorphon is planned as the third module, with a source audit during the bonding period. Would you be able to take a quick look at the updated proposal before the deadline? Thanks,
This draft PR demonstrates a proof-of-concept parallelization of the r.proj projection loop using OpenMP, submitted as part of my GSoC 2026 proposal for "Parallelization of existing tools."

Benchmark Results

UTM Zone 16 → Web Mercator, 100M cells, Apple M4, 8 cores — 3 runs each:

2.5x wall-time speedup across two independent projection pipelines.

Technical Approach
The core blocker for parallelizing r.proj is that the legacy `readcell` tile cache is not thread-safe. Rather than wrapping it in mutexes (which I prototyped first and confirmed serializes all threads with no speedup), this POC bypasses the cache by loading the input map into a flat RAM array before the parallel loop. Reads from a flat array are naturally thread-safe, eliminating all lock contention.
Known Limitations

- The macOS build currently needs manual OpenMP flags (`-Xclang -fopenmp`). Full implementation will wire `HAVE_OPENMP` into the configure system for Linux/Windows portability.
- `Rast_put_row` requires a `#pragma omp critical` for sequential writes. Final implementation will use indexed row buffers.
- The preload step reads the input with `Rast_get_row` directly.
- `GPJ_transform` shares a single `tproj` context across threads. Per-thread `PJ_CONTEXT` initialization is the next step.

This PR is not intended for merge. I made this PR solely to demonstrate that the projection math is parallelizable and the performance gain is real.