Conversation

@LonelyCat124
Collaborator

No description provided.

@LonelyCat124 LonelyCat124 changed the title Initial implementation of maximal parallel region trans. Tests TODO (Closes #3157) Initial implementation of maximal parallel region trans. Oct 31, 2025
@codecov

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 93.67089% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.90%. Comparing base (1144737) to head (d76e988).

Files with missing lines Patch % Lines
...r/transformations/maximal_parallel_region_trans.py 92.30% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3205      +/-   ##
==========================================
- Coverage   99.91%   99.90%   -0.01%     
==========================================
  Files         375      377       +2     
  Lines       53439    53518      +79     
==========================================
+ Hits        53394    53468      +74     
- Misses         45       50       +5     


@LonelyCat124
Collaborator Author

@MetBenjaminWent Chris said you had some cases worth trying this with as functionality tests?

@LonelyCat124
Collaborator Author

Made a bit more progress with this now - there was definitely some missing logic for bdy_impl3 to work.

One thing that is apparent is that applying things in this way means we end up with barriers that live outside parallel regions; these should be purged by OMPMinimiseSyncTrans, but currently aren't.

@sergisiso @arporter Am I ok to make that change as part of this PR?
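(For context on why purging such barriers is safe: an OpenMP barrier encountered outside any parallel region binds to an implicit team of one thread, so it synchronises nothing and can be dropped. A minimal sketch of the idea, assuming a hypothetical flat list of directive names rather than PSyclone's PSyIR tree:)

```python
def purge_orphan_barriers(directives):
    """Drop barriers that sit outside any parallel region.

    Outside a parallel region a barrier binds to an implicit team of
    one thread, so it is a no-op and safe to remove.  ``directives``
    is a hypothetical flat list of directive names; the real
    transformation would walk PSyclone's PSyIR tree instead.
    """
    depth = 0
    result = []
    for d in directives:
        if d == "parallel begin":
            depth += 1
        elif d == "parallel end":
            depth -= 1
        elif d == "barrier" and depth == 0:
            continue  # orphan barrier: no team to synchronise
        result.append(d)
    return result
```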

@LonelyCat124
Collaborator Author

I fixed the previous issue, and this has raised some other challenges with the cases I received from MO.

  1. The OpenMP sentinel isn't kept, and PSyclone doesn't seem to have an option to keep statements guarded by the OpenMP sentinel, which is a problem for some pre-existing files that use the sentinel for conditional compilation. @hiker I think you've mentioned this previously - do we have any branch of PSyclone that handles this at all? I know it's in fparser, so the frontend can presumably do something with it in PSyclone.
  2. Assignments are currently always excluded from parallel regions, but this is not a good idea. For example, we could have a statement such as x = y / omp_get_thread_num(), which would have to be inside a parallel region. I was wondering about allowing only scalar assignments inside parallel regions, but this is not so straightforward and I don't have a good answer for how to do it - perhaps scalar assignments to local variables are allowed and others aren't? I'm sure there are still some edge cases here though.
  3. We can't handle some loop structures, e.g. we don't parallelise over jj here:
omp_block = tdims%j_end
!$ omp_block = ceiling(tdims%j_end/real(omp_get_num_threads()))

!$OMP do SCHEDULE(STATIC)
do jj = tdims%j_start, tdims%j_end, omp_block
  do k = blm1, 2, -1
    l = 0
    do j = jj, min(jj+omp_block-1, tdims%j_end)
      do i = tdims%i_start, tdims%i_end
        r_sq = r_rho_levels(i,j,k)*r_rho_levels(i,j,k)
        rr_sq = r_rho_levels(i,j,k+1)*r_rho_levels(i,j,k+1)
        dqw(i,j,k) = (-dtrdz_charney_grid(i,j,k) * (rr_sq * fqw(i,j,k + 1) - r_sq * fqw(i,j,k)) + dqw_nt(i,j,k)) * gamma2(i,j)
...

I assume we just need to use force=True for this loop, but in theory we could evaluate the i,j,k indices and find that they are guaranteed non-overlapping (I think? since jj steps by omp_block, the j ranges are independent), but it's not straightforward. I can't remember if we already have an issue about this kind of analysis? @hiker @sergisiso
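The non-overlap argument is at least checkable: jj advances by omp_block and the inner j loop covers jj to min(jj+omp_block-1, j_end), so different jj iterations write disjoint j-slabs of dqw. A small Python sketch of the property a dependence analysis would need to establish (helper names are hypothetical; step and width are kept as separate parameters to show that the guarantee only holds when they agree, as they do in the Fortran above):

```python
def block_ranges(j_start, j_end, step, width):
    """j-ranges of the inner loop for each jj iteration of:
        do jj = j_start, j_end, step
          do j = jj, min(jj + width - 1, j_end)
    Fortran do-loop bounds are inclusive, hence the +1 for range().
    In the code under discussion step == width == omp_block."""
    return [
        range(jj, min(jj + width - 1, j_end) + 1)
        for jj in range(j_start, j_end + 1, step)
    ]

def writes_disjoint(j_start, j_end, step, width):
    """True iff no j index is visited by two different jj iterations,
    i.e. each jj iteration writes a distinct slab of dqw(:, j, :)."""
    covered = [j for block in block_ranges(j_start, j_end, step, width)
               for j in block]
    return len(covered) == len(set(covered))
```

With step == width (as here) the slabs are disjoint, so parallelising over jj is safe; a smaller step would make them overlap and the check would fail.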

@sergisiso
Collaborator

Regarding 3: this is what @mn416's #3213 could solve; it may be worth trying with his dep_analysis.

@LonelyCat124
Collaborator Author

Yeah, I'd just been looking at that PR earlier today after I looked at this - I guess it's still a while until we'd have it ready, but if it could handle cases like this it would definitely help (assuming the rest is otherwise OK).

@mn416
Collaborator

mn416 commented Nov 26, 2025

I copied this chunk of code into a minimally compiling module:

module example_module
  implicit none

  type :: Dims
     integer :: j_start, j_end, i_start, i_end
  end type

contains

  subroutine sub(tdims, r_rho_levels, dtrdz_charney_grid, fqw, dqw, dqw_nt, gamma2, blm1, omp_block)
    type(Dims), intent(inout) :: tdims
    integer, intent(inout) :: r_rho_levels(:,:,:)
    integer, intent(inout) :: dtrdz_charney_grid(:,:,:)
    integer, intent(inout) :: fqw(:,:,:)
    integer, intent(inout) :: dqw(:,:,:)
    integer, intent(inout) :: dqw_nt(:,:,:)
    integer, intent(inout) :: gamma2(:,:)
    integer, intent(inout) :: blm1, omp_block
    integer :: jj, k, r_sq, rr_sq, l, i, j

    do jj = tdims%j_start, tdims%j_end, omp_block
      do k = blm1, 2, -1
        l = 0
        do j = jj, min(jj+omp_block-1, tdims%j_end)
          do i = tdims%i_start, tdims%i_end
            r_sq = r_rho_levels(i,j,k)*r_rho_levels(i,j,k)
            rr_sq = r_rho_levels(i,j,k+1)*r_rho_levels(i,j,k+1)
            dqw(i,j,k) = (-dtrdz_charney_grid(i,j,k) * (rr_sq * fqw(i,j,k + 1) - r_sq * fqw(i,j,k)) + dqw_nt(i,j,k)) * gamma2(i,j)
          end do
        end do
      end do
    end do

  end subroutine
end module

The analysis gives the following output:

Routine sub: 
  Loop jj: conflict free
  Loop k: conflict free
  Loop j: conflict free
  Loop i: conflict free

Interestingly, it does take a few seconds to prove that the jj loop is conflict free.

@LonelyCat124
Collaborator Author

I used force=True for the jj loops in the bdy_impl3.F90 file, and I get only 3 parallel regions.

There are 2 things left to potentially look at.

  1. How to determine whether an Assignment is safe to go inside a parallel region - or whether we should worry about that right now (and whether we can safely determine both that it's safe and that the assigned variable is private). There are some slightly naive rules I could make (e.g. scalar assignments to local variables are fine), but it's difficult to determine whether this is actually true. I can't even rely on them being read-only, as a value could need to be read at some location that ends up outside the parallel region, and since the parallel region is built up manually I'm not sure how to solve this. Any ideas? @sergisiso @hiker
  2. The common pattern of omp end do nowait followed by omp barrier. I end up (after barrier reduction) with some of these, and I think it might be slightly neater to convert this pattern into a plain omp end do, but it shouldn't move the needle w.r.t. performance, so I'm happy to leave this until later.
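For point 2, the collapse is valid because a worksharing-loop construct without nowait already has an implicit barrier at its end, so end do nowait immediately followed by barrier is equivalent to a plain end do. A minimal sketch, assuming directives are modelled as a flat list of strings (the real transformation would pattern-match on the PSyIR tree instead):

```python
def collapse_nowait_barrier(directives):
    """Replace each '!$omp end do nowait' that is immediately followed
    by '!$omp barrier' with a plain '!$omp end do', which carries the
    same implicit barrier (and the same implied flush)."""
    result = []
    i = 0
    while i < len(directives):
        if (directives[i] == "!$omp end do nowait"
                and i + 1 < len(directives)
                and directives[i + 1] == "!$omp barrier"):
            result.append("!$omp end do")
            i += 2  # consume both the nowait end-do and the barrier
        else:
            result.append(directives[i])
            i += 1
    return result
```

Note the barrier must be immediately adjacent: if any statement sits between the two directives, the pair is left untouched.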

