[GSoC 2026 Draft POC] r.proj: OpenMP parallelization via RAM-resident buffer#7185

Draft
krcoder123 wants to merge 3 commits into OSGeo:main from krcoder123:gsoc-parallel-rproj

Conversation

@krcoder123

This draft PR demonstrates a proof of concept parallelization of the r.proj projection loop using OpenMP, submitted as part of my GSoC 2026 proposal for "Parallelization of existing tools."

Benchmark Results

UTM Zone 16 → WebMercator, 100M cells, Apple M4, 8 cores — 3 runs each:

| Method | Wall Time | CPU Usage |
|---|---|---|
| Serial (original) | ~20.3 s | 83% (1 core) |
| Parallel (this PR) | ~8.1 s | 515% (5–6 cores) |
| UTM → Lat/Lon (EPSG:4326) | 9.1 s | 572% |

2.5x wall-time speedup across two independent projection pipelines.

Technical Approach

The core blocker for parallelizing r.proj is that the legacy readcell tile cache is not thread-safe. Rather than wrapping it in mutexes (which I prototyped first and confirmed serializes all threads with no speedup), this POC bypasses the cache by loading the input map into a flat RAM array before the parallel loop. Reads from a flat array are naturally thread-safe, eliminating all lock contention.

Known Limitations

  • OpenMP detection uses macOS-specific flags (-Xclang -fopenmp). Full implementation will wire HAVE_OPENMP into the configure system for Linux/Windows portability.
  • Rast_put_row requires a #pragma omp critical for sequential writes. Final implementation will use indexed row buffers.
  • RAM loading loop uses a workaround — final implementation will use Rast_get_row directly.
  • No fallback for systems where the map exceeds available RAM.
  • GPJ_transform shares a single tproj context across threads. Per-thread PJ_CONTEXT initialization is the next step.

This PR is not intended for merge; it exists solely to demonstrate that the projection math is parallelizable and that the performance gain is real.

@github-actions github-actions bot added raster Related to raster data processing C Related code is in C module labels Mar 16, 2026
static char *make_ipol_list(void);
static char *make_ipol_desc(void);



[pre-commit] reported by reviewdog 🐶

Suggested change

Comment on lines +89 to +90
void interpolate_ram(void *full_map, void *obufptr, int cell_type,
double col_idx, double row_idx, struct Cell_head *incellhd)

[pre-commit] reported by reviewdog 🐶

Suggested change
void interpolate_ram(void *full_map, void *obufptr, int cell_type,
                     double col_idx, double row_idx, struct Cell_head *incellhd)

}

/* Direct memory access - Thread Safe for Reads */
unsigned char *src = (unsigned char *)full_map +

[pre-commit] reported by reviewdog 🐶

Suggested change
unsigned char *src = (unsigned char *)full_map +

memcpy(obufptr, src, cell_size);
}



[pre-commit] reported by reviewdog 🐶

Suggested change

G_important_message(_("Projecting..."));
for (row = 0; row < outcellhd.rows; row++) {
/* obufptr = obuffer */;


[pre-commit] reported by reviewdog 🐶

Suggested change


/* row index in input matrix */
double row_idx = (incellhd.north - ycoord1) / incellhd.ns_res;
if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) < 0) {

[pre-commit] reported by reviewdog 🐶

Suggested change
- if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) < 0) {
+ if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) <
+     0) {

double row_idx = (incellhd.north - ycoord1) / incellhd.ns_res;
if (GPJ_transform(&oproj, &iproj, &tproj, PJ_FWD, &x1, &y1, NULL) < 0) {
Rast_set_null_value(obufptr, 1, cell_type);
} else {

[pre-commit] reported by reviewdog 🐶

Suggested change
- } else {
+ }
+ else {

interpolate(ibuffer, obufptr, cell_type, col_idx, row_idx,
&incellhd);
/* CALL OUR LOCK-FREE RAM INTERPOLATOR */
interpolate_ram(full_map_array, obufptr, cell_type, c_idx, r_idx, &incellhd);

[pre-commit] reported by reviewdog 🐶

Suggested change
- interpolate_ram(full_map_array, obufptr, cell_type, c_idx, r_idx, &incellhd);
+ interpolate_ram(full_map_array, obufptr, cell_type, c_idx,
+                 r_idx, &incellhd);


xcoord2 = outcellhd.west + (outcellhd.ew_res / 2);
ycoord2 -= outcellhd.ns_res;
#pragma omp critical

[pre-commit] reported by reviewdog 🐶

Suggested change
#pragma omp critical


return buf;
}


[pre-commit] reported by reviewdog 🐶

Suggested change

@petrasovaa
Contributor

Thanks! Reading the whole raster into memory goes against how GRASS handles processing, so while I understand this is a proof of concept, I would like to see a plan for how to handle that.

@krcoder123 krcoder123 closed this Mar 17, 2026
@krcoder123
Author

Hi Anna, thank you for the feedback. You're right that loading the full raster into RAM conflicts with GRASS's tile-based memory model. This POC was mainly a first step to prove the projection math is parallelizable before tackling the harder cache problem.

For the full GSoC proposal, my plan is a two-path approach: if the map fits within the user's memory limit, use the fast lock-free RAM buffer. If it doesn't, fall back to thread-local tile caches, where each thread maintains its own independent cache, removing lock contention while preserving GRASS's ability to process maps larger than RAM.

Does that direction align with what you'd want to see in the proposal?

@krcoder123 krcoder123 reopened this Mar 17, 2026
@petrasovaa
Contributor

> Hi Anna, thank you for the feedback. You're right that loading the full raster into RAM conflicts with GRASS's tile-based memory model. This POC was mainly a first step to prove the projection math is parallelizable before tackling the harder cache problem.
>
> For the full GSoC proposal, my plan is a two-path approach: if the map fits within the user's memory limit, use the fast lock-free RAM buffer. If it doesn't, fall back to thread-local tile caches, where each thread maintains its own independent cache, removing lock contention while preserving GRASS's ability to process maps larger than RAM.
>
> Does that direction align with what you'd want to see in the proposal?

Yes, just more detailed analysis.

@krcoder123
Author

krcoder123 commented Mar 20, 2026

> Yes, just more detailed analysis.

Hi Anna,

I've completed a full draft of my proposal using your feedback on the memory model. The two-path architecture is documented in detail here: https://docs.google.com/document/d/1IUOA93u2iqIp0Qix-ymChkvmn3wKTUgFkOBhWCMudrA/edit?tab=t.0
Would appreciate any additional feedback before the deadline.

Thanks!

@petrasovaa
Contributor

@HuidaeCho would you mind having a look at the proposal?

@HuidaeCho
Member

> @HuidaeCho would you mind having a look at the proposal?

Sure, reviewing it.

@HuidaeCho
Member

@krcoder123, see my suggestions for minor edits and comments in the proposal.

@metzm
Contributor

metzm commented Mar 22, 2026

Related, maybe helpful: the GDAL warper has a NUM_THREADS option available in GDALWarpOptions. Even though GDAL uses a different caching method for input data, its implementation of multi-threaded raster reprojection could provide some ideas about how to implement multi-threading in GRASS r.proj.

@krcoder123
Author

> @krcoder123, see my suggestions for minor edits and comments in the proposal.

Thank you for the detailed review. I've updated the proposal to address your feedback. I clarified the motivation for the two-path design, added the user-controlled memory threshold with megabyte and percentage options, and added benchmarking both paths as an early task in the timeline. I also removed all the "GIS" references and fixed the grammar suggestions. Please let me know if anything else needs addressing.

@krcoder123
Author

> Related, maybe helpful: the GDAL warper has a NUM_THREADS option available in GDALWarpOptions. Even though GDAL uses a different caching method for input data, its implementation of multi-threaded raster reprojection could provide some ideas about how to implement multi-threading in GRASS r.proj.

Thank you for the pointer, I found this pretty helpful. Seeing how GDAL's NUM_THREADS handles thread-safe data access through independent chunks showed that the horizontal row-strip approach I'm going with for Path B is the right direction. The tricky part in GRASS is that readcell was never designed for this, so Path B needs each thread to have its own cache instance instead of sharing one. I've added the GDAL warper as a reference in the proposal.

@krcoder123
Author

Hi @HuidaeCho

I've updated the proposal and expanded the scope beyond just r.proj. After digging into the r.fill.stats source code, I found that row-level parallelism is blocked by a sliding ring buffer that has to advance sequentially. However, the column loop inside interpolate_row() is completely independent per cell, which makes it a clean parallelization target. r.fill.stats is now confirmed as the secondary module. I'll pick an additional module during the bonding period after auditing the remaining candidates.

Please let me know if there are any changes I should make before the deadline. Thank you!

https://docs.google.com/document/d/1IUOA93u2iqIp0Qix-ymChkvmn3wKTUgFkOBhWCMudrA/edit?tab=t.0

@krcoder123
Author

> @krcoder123, see my suggestions for minor edits and comments in the proposal.

Hi @HuidaeCho,

I've updated the proposal significantly since your last review. After further source audits I found that the previously proposed secondary module had sequential dependencies that would limit speedup, so I changed to r.param.scale instead. I audited process.c, identified the sequential sliding buffer as the core blocker, and applied the same RAM preload pattern from r.proj. Draft PR #7236 shows 1.7x speedup on a 100M cell raster. r.geomorphon is planned as the third module with a source audit during the bonding period. Would you be able to take a quick look at the updated proposal before the deadline?
https://docs.google.com/document/d/1IUOA93u2iqIp0Qix-ymChkvmn3wKTUgFkOBhWCMudrA/edit?tab=t.0

Thanks,
Kaushik
