quad_utils_mod search: replace 3D scratch arrays with compressed sparse row format by hkershaw-brown · Pull Request #1102 · NCAR/DART

hkershaw-brown · 2026-04-27T16:26:09Z

Description:

Quad_utils_mod box search for fully irregular grids:

Determine the search storage requirement at run time from the grid. Currently there are 3 hardcoded cases, leading to people adding "big" numbers to the module to get their code running vs erroring out. The hardcoded cases I believe are based on the POP workhorse and high res grids ~2015 (looking at POP/model_mod.f90 blame)
Replaces the quad_utils_mod search storage (3D arrays) for fully irregular grids with a compressed sparse row format.

The runtime calculation of storage requirements is needed in particular for regional CESM where people are creating their own regional grids, but is applicable to any model using quad_utils.
The per-code memory is quite high for the box storage (see #979 & https://github.com/hkershaw-brown/quad_search_perf/blob/main/README.md). This pull request does not address distributing memory for the search storage (may be needed later), but does reduce the per core memory.

Compressed Sparse Row format:
The GRID_QUAD_FULLY_IRREGULAR coarse-index search-info array setup previously allocated reg_list_lon/lat(nrx, nry, max_reg_list_num) as a 3D temporary, reaching ~5.5 GB at nrx=nry=900. It also errored out if any coarse box accumulated more than max_reg_list_num=800 candidate quads. Noted in issue #979

Replace with a two-pass compressed sparse row build:

Pass 1: count overlaps into grid_num; peak memory ~11 MB
Prefix sum: derive grid_start and exact total entry count
Pass 2: fill flat lists using a cursor array

Replace hardcoded num_reg with calculation based on grid
num_reg = max(10, min(nx_or_ny, int(sqrt(nx*ny / TARGET_CANDIDATES))))
where TARGET_CANDIDATES=8, giving ~8 candidate quads per coarse box regardless of grid resolution.

Remove no longer used max_reg_list_num from quad_irreg_grid_coords

Fixes issue

Fixes #979

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Documentation changes needed?

My change requires a change to the documentation.
- I have updated the documentation accordingly.

Tests

Please describe any tests you ran to verify your changes.
Looking at brute force vs. current method vs new method: https://github.com/hkershaw-brown/quad_search_perf
I've only created global test grids.
Quad_util_developer tests bitwise, but this are fairly limited.

Currently running ROMS_rutgers case to compare memory (still in queue)

Checklist for merging

Updated changelog entry
Documentation updated
Update conf.py

Checklist for release

Merge into main
Create release from the main branch with appropriate tag
Delete feature-branch

Testing Datasets

Dataset needed for testing available upon request
Dataset download instructions included
No dataset needed

See #979 The GRID_QUAD_FULLY_IRREGULAR coarse-index build previously allocated reg_list_lon/lat(nrx, nry, max_reg_list_num) as a 3D temporary, reaching ~5.5 GB at nrx=nry=900. It also errored out if any coarse box accumulated more than max_reg_list_num=800 candidate quads. Replace with a two-pass compressed sparse row build: - Pass 1: count overlaps into grid_num; peak memory ~11 MB - Prefix sum: derive grid_start and exact total entry count - Pass 2: fill flat lists using a cursor array; no cap, no abort Replace hardcoded num_reg with calculation based on grid num_reg = max(10, min(nx_or_ny, int(sqrt(nx*ny / TARGET_CANDIDATES)))) where TARGET_CANDIDATES=8, giving ~8 candidate quads per coarse box regardless of grid resolution. Remove no longer used max_reg_list_num from quad_irreg_grid_coords

hkershaw-brown · 2026-04-27T20:50:00Z

Q. global vs. regional.
if you initialize the interpolation as global, but the grid is regional how does this affect things?

wrap where there is no wrap. (mad wrapping of region around globe)
global, sparse -> slow search
global, not sparse, too many points in a box. -> error out or slow search

How bad is it to have mad-wrap? <-- -->

DART/models/utilities/quad_utils_mod.f90

Lines 764 to 769 in acf0a14

    
           ! for a global grid, the initial values have already been set in 
        
           ! the derived type.  otherwise, find the min/max of lons and lats. 
        
           if (.not. h%opt%global_grid) then 
        
              h%ii%min_lon = minval(h%ii%lons_2d) 
        
              h%ii%max_lon = maxval(h%ii%lons_2d) 
        
              h%ii%lon_width = h%ii%max_lon - h%ii%min_lon  ! FIXME: wrap?

hkershaw-brown · 2026-04-28T14:00:30Z

or regional refinements in global grids:

hkershaw-brown · 2026-04-28T16:41:19Z

@mgharamti this is the quad search I am playing with. I think the main concerns are calculating the course grid from the actual grid rather than the hardcoded 3 cases, and the search structure size. But open to any input.

DART/models/utilities/quad_utils_mod.f90

Lines 730 to 751 in acf0a14

    
           !> Build the data structure for interpolation for an irregular quad grid 
        
           subroutine init_irreg_interp(h) 
        
           type(quad_interp_handle), intent(inout) :: h 
        
           character(len=*), parameter :: routine = 'init_irreg_interp' 
        
           ! Need a temporary data structure to build this. 
        
           ! These arrays keep a list of the x and y indices of dipole quads 
        
           ! that potentially overlap the regular boxes. 
        
           integer, allocatable :: reg_list_lon(:,:,:) 
        
           integer, allocatable :: reg_list_lat(:,:,:) 
        
           real(r8) :: u_c_lons(4), u_c_lats(4), pole_row_lon 
        
           integer  :: i, j, k, pindex, nx, ny, nrx, nry, istatus 
        
           integer  :: reg_lon_ind(2), reg_lat_ind(2), u_total, u_index 
        
           logical  :: cyclic, pole 
        
           integer  :: xlim 
        
           allocate(reg_list_lon(h%ii%num_reg_x, h%ii%num_reg_y, h%ii%max_reg_list_num)) 
        
           allocate(reg_list_lat(h%ii%num_reg_x, h%ii%num_reg_y, h%ii%max_reg_list_num))

mgharamti

@hkershaw-brown, thanks for the work on this. I was always on the edge with the previous style of 3 hard-coded cases but didn't know how to go around that. I think I understand your design and strategy. CSR clearly avoids working with huge 3D arrays.

My first lazy question is why the 8 targets on average? I guess how did you choose it? It seems to me that search cost per obs is ~O(num_grids) and number of bins/boxes is ~O(N/TARGET_CANDIDATES) and so it's a performance tuning parameter. At least we need to explain it in the docs if you decide to go with it.

Second, could we add a post-fill consistency check that verifies fill_pos(i,j) == grid_start(i,j) + grid_num(i,j) for every coarse bin? This would catch any future divergence between the counting pass and filling pass, especially around edge cases such as longitude wrapping, masked/invalid quads, or boundary-touching cells.

mgharamti · 2026-05-01T15:40:26Z

-endif
-
-deallocate(reg_list_lon, reg_list_lat)
+deallocate(fill_pos)


The new new approach is based on the assumption (fact?) that what we counted in the first pass is equal to what we filled in the second pass. I'd be more comfortable if we added a sanity check after filling (similar to the check in the old code). You could do something like this:

do i = 1, nrx do j = 1, nry if (fill_pos(i,j) /= h%ii%grid_start(i,j) + h%ii%grid_num(i,j)) then call error_handler(E_ERR, 'init_irreg_interp', & 'count/fill mismatch in irregular grid interpolation lists') endif enddo enddo

this is a great idea, & the check is in init so not a problem runtime wise to add.

mgharamti · 2026-05-01T15:55:59Z

         call get_quad_corners(h%ii%lons_2d, i, j, cyclic, pole, nx, ny, u_c_lons, istatus)
         if (istatus /= 0) print *, 'get_quad_corners for lons returns failure'
-
         call get_quad_corners(h%ii%lats_2d, i, j, cyclic, pole, nx, ny, u_c_lats, istatus)
         if (istatus /= 0) print *, 'get_quad_corners for lats returns failure'


Do we need to cycle here rather than just print an error? If a failure produces bad corners, both passes may count/fill bad bins or one failure path could behave differently depending on the state (right?)

I will take a closer look - I think I am missing the logic in the original code.

The loop limits are the x,y bounds of the lon and lat arrays.
Is the print just checking if the loop limits are wrong?

If you can go off the end of the arrays,
it seems like both should fail, or both should pass.

If one fails and one passes I think this should be a catastrophic error rather than a print, since lon,lat are for the same point.

hkershaw-brown · 2026-05-01T16:58:23Z

My first lazy question is why the 8 targets on average? I guess how did you choose it? It seems to me that search cost per obs is ~O(num_grids) and number of bins/boxes is ~O(N/TARGET_CANDIDATES) and so it's a performance tuning parameter. At least we need to explain it in the docs if you decide to go with it.

My lazy choice: Dr. Google for what was a good choice for this kind of algorithm.
I was looking at brute force (best memory, but the worst case timing ~50,000 slower) vs. current hard coded method (max memory, target is to go as fast as that).
#979 (comment).

I think the equivalent average target (nx*ny/num_reg*num_reg)is ~10ish for the hard coded versions for the POP grids:

grid	nx	ny	num_reg	average quads per coarse box
gx1 (3deg?)	320	384	90	15.17037037
tx0.5 0.5deg	720	480	180	10.66666667
tx0.1 0.1 deg	3600	2400	900	10.66666667

The grids are not uniform so 'average' is probably a disgusting assumption.

I think it is well worth doing some runs with the test code and different (and variable) grid resolutions to count the actual number of quads per box with the new code (and counting the per course box with the 3 hardcoded POP cases). Maybe we have 799 quads in one box, 3 in another. Currently the 8 is just an ok choice where I was finding comparable speeds to the current version, but it is definitely a magic number that needs justifying.

hkershaw-brown · 2026-05-01T17:01:09Z

+h%ii%grid_num = 0
+
 do i = 1, xlim
-   ! There's no wraparound in y, one box less than grid boundaries


this is a good comment, put back in @hkershaw-brown

actually can you use quad_utils with a periodic y domain? e.g. wrf idealized. Not at the moment I guess.

hkershaw-brown marked this pull request as draft April 28, 2026 12:41

hkershaw-brown requested a review from mgharamti April 28, 2026 16:37

mgharamti requested changes May 1, 2026

View reviewed changes

hkershaw-brown commented May 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quad_utils_mod search: replace 3D scratch arrays with compressed sparse row format#1102

quad_utils_mod search: replace 3D scratch arrays with compressed sparse row format#1102
hkershaw-brown wants to merge 1 commit into
mainfrom
quad_search

hkershaw-brown commented Apr 27, 2026 •

edited

Loading

Uh oh!

hkershaw-brown commented Apr 27, 2026

Uh oh!

hkershaw-brown commented Apr 28, 2026

Uh oh!

hkershaw-brown commented Apr 28, 2026

Uh oh!

mgharamti left a comment

Uh oh!

mgharamti May 1, 2026

Uh oh!

hkershaw-brown May 1, 2026

Uh oh!

mgharamti May 1, 2026

Uh oh!

hkershaw-brown May 1, 2026

Uh oh!

hkershaw-brown commented May 1, 2026

Uh oh!

hkershaw-brown May 1, 2026

Uh oh!

hkershaw-brown May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hkershaw-brown commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description:

Fixes issue

Types of changes

Documentation changes needed?

Tests

Checklist for merging

Checklist for release

Testing Datasets

Uh oh!

hkershaw-brown commented Apr 27, 2026

Uh oh!

hkershaw-brown commented Apr 28, 2026

Uh oh!

hkershaw-brown commented Apr 28, 2026

Uh oh!

mgharamti left a comment

Choose a reason for hiding this comment

Uh oh!

mgharamti May 1, 2026

Choose a reason for hiding this comment

Uh oh!

hkershaw-brown May 1, 2026

Choose a reason for hiding this comment

Uh oh!

mgharamti May 1, 2026

Choose a reason for hiding this comment

Uh oh!

hkershaw-brown May 1, 2026

Choose a reason for hiding this comment

Uh oh!

hkershaw-brown commented May 1, 2026

Uh oh!

hkershaw-brown May 1, 2026

Choose a reason for hiding this comment

Uh oh!

hkershaw-brown May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hkershaw-brown commented Apr 27, 2026 •

edited

Loading