Poisson_100 and poisson_110 with Poisson Boundary Condition Tests#266

Merged
kaanolgu merged 34 commits into main from ko/poisson-100-11x
Feb 20, 2026

@kaanolgu
Collaborator

@kaanolgu kaanolgu commented Jan 16, 2026

Description

This PR adds support for the poisson_100 (non-periodic in X, periodic in Y/Z) and poisson_110 (non-periodic in X and Y, periodic in Z) boundary condition cases. It also adds a 3D test that uses the analytical solutions cos(nπx), cos(nπx)cos(nπy), and cos(nπx)cos(nπy)cos(nπz), with n = 2, 3, to validate both the Poisson solve and the div(grad()) operator.
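
For reference, the test functions named above are eigenfunctions of the Laplacian, which is what makes them convenient for checking both the Poisson solve and div(grad()). The identities below are standard calculus. How test_poisson_bc.f90 uses them (for example, which field serves as the right-hand side) is not shown in this thread and is left as an assumption.

$$
\begin{aligned}
u_1 &= \cos(n\pi x), & \nabla^2 u_1 &= -n^2\pi^2\,u_1,\\
u_2 &= \cos(n\pi x)\cos(n\pi y), & \nabla^2 u_2 &= -2n^2\pi^2\,u_2,\\
u_3 &= \cos(n\pi x)\cos(n\pi y)\cos(n\pi z), & \nabla^2 u_3 &= -3n^2\pi^2\,u_3.
\end{aligned}
$$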


What's Added

  • poisson_100 via transpose reuse of poisson_010: Rather than implementing a new spectral post-processing path, the 100 case transposes the input field from (nx, ny, nz) to (ny, nx, nz) on the GPU before the FFT, runs it through the existing poisson_010 kernels (process_spectral_010_fw/bw, enforce/undo_periodicity_y), then transposes back. This avoids duplicating post-processing logic entirely.

  • cuFFTMp R2C setup for poisson_100: The R2C plan is configured over the transposed domain so cuFFT sees the Dirichlet direction as the leading dimension. The output buffer is sized (ny+2, nx, nz) using standard Hermitian padding. After the transform, spectral post-processing and Poisson solve happen in-place, followed by C2R and the back-transpose.

  • New GPU kernels in spectral_processing.f90:

    • memcpy3D_with_transpose / memcpy3D_with_transpose_back — lightweight device kernels to swap X/Y layout for the 100 case.
    • process_spectral_110 — full Poisson pipeline for Dirichlet X+Y, periodic Z (normalisation, forward/backward Z post-processing, forward/backward Y post-processing, eigenvalue division).
    • enforce_periodicity_x / undo_periodicity_x — even-extension and its reversal along X, mirroring the existing Y-direction equivalents.
  • OpenMP backend: Stub error messages added for 100 and 110 cases indicating these are not yet implemented on CPU.

  • Test suite: New unified test_poisson_bc.f90 replacing the old case structure, covering 000, 010, 100, and 110 with analytical reference checks. CMakeLists.txt updated with new CTest targets and input_000/input_110 namelists added.
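
The transpose-reuse idea behind poisson_100 can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions: `transpose_xy`, `running_sum_y`, `running_sum_x_direct`, and `running_sum_x_via_transpose` are hypothetical names, and a running sum stands in for the real poisson_010 pipeline. It shows that an operator written only for the Y direction can act along X by transposing, applying it, and transposing back.

```python
def transpose_xy(f):
    """Swap the first two axes of a nested-list field: (nx, ny, nz) -> (ny, nx, nz)."""
    nx, ny = len(f), len(f[0])
    return [[list(f[i][j]) for i in range(nx)] for j in range(ny)]


def running_sum_y(f):
    """Stand-in for a Y-only operator: cumulative sum along the second axis."""
    out = []
    for plane in f:                      # plane has shape (ny, nz)
        acc = [0] * len(plane[0])
        rows = []
        for row in plane:
            acc = [a + v for a, v in zip(acc, row)]
            rows.append(acc)
        out.append(rows)
    return out


def running_sum_x_direct(f):
    """Reference: cumulative sum along the first axis, written directly."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    out = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                prev = out[i - 1][j][k] if i > 0 else 0
                out[i][j][k] = prev + f[i][j][k]
    return out


def running_sum_x_via_transpose(f):
    """Reuse the Y-only operator for X: transpose, apply, transpose back."""
    return transpose_xy(running_sum_y(transpose_xy(f)))


# Small non-cubic field so that a wrong axis order would change the shape.
nx, ny, nz = 3, 4, 2
field = [[[100 * i + 10 * j + k for k in range(nz)] for j in range(ny)]
         for i in range(nx)]
assert running_sum_x_via_transpose(field) == running_sum_x_direct(field)
```

The cost of this design is two extra passes over the field per solve, in exchange for not maintaining a second copy of the post-processing logic.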


Notes

  • x_sp_st is passed to process_spectral_110 but currently unused — flagged for cleanup.
  • A VERBOSE compile flag controls debug output in the test.

@kaanolgu kaanolgu requested a review from ia267 January 16, 2026 15:23
@kaanolgu kaanolgu changed the title Poisson_100 with cos2pix 1D test case poisson_100 and poisson_110 with cos2pix test case Feb 9, 2026
@kaanolgu kaanolgu changed the title poisson_100 and poisson_110 with cos2pix test case Poisson_100 and poisson_110 with cos2pix 1D test case Feb 9, 2026
@kaanolgu kaanolgu changed the title Poisson_100 and poisson_110 with cos2pix 1D test case Poisson_100 and poisson_110 with cos2pix 3D test case Feb 9, 2026
@ia267 ia267 added the core Issue affecting core mechanisms of the software label Feb 9, 2026
@ia267 ia267 added this to the Non-periodic boundary conditions milestone Feb 9, 2026
Collaborator

@ia267 ia267 left a comment

  • Most comments are for me to understand what's happening
  • There are some places with incorrect comments that need fixing
  • The test structure is more suited to the tests directory than to case - this will need refactoring
  • This PR implements core functionality and contains a lot of changes. It will be useful for others to know the details of what has been implemented, so please provide a detailed description in this PR

@kaanolgu
Collaborator Author

$ mpirun -n 1 ./build-gpu/tests/bin/test_poisson_bc_cuda_1
 Parallel run with            1 ranks

  === Config 000 (all periodic)
    Grid: 64 x 64 x 64
    BC: x=[periodic,periodic] y=[periodic,periodic] z=[periodic
    ,periodic]
 Domain decomposition by x3d2 (generic)
    Running: COS_X  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   8.2521E-17  PASS
    Running: COS_Y  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   5.4572E-17  PASS
    Running: COS_XY  n = 2
      Poisson L2:   3.2582E-14  PASS
      DivGrad L2:   5.1350E-17  PASS
    Running: COS_XYZ  n = 2
      Poisson L2:   3.5844E-14  PASS
      DivGrad L2:   3.0281E-17  PASS
    Running: COS_X  n = 3
      Poisson L2:   2.5073E-05  FAIL
      DivGrad L2:   5.2368E-17  PASS
    Running: COS_Y  n = 3
      Poisson L2:   2.5073E-05  FAIL
      DivGrad L2:   5.6475E-17  PASS
    Running: COS_XY  n = 3
      Poisson L2:   1.1285E-05  FAIL
      DivGrad L2:   4.7943E-07  FAIL
    Running: COS_XYZ  n = 3
      Poisson L2:   7.2923E-06  FAIL
      DivGrad L2:   5.8709E-07  FAIL

  === Config 010 (y-dirichlet)
    Grid: 64 x 65 x 64
    BC: x=[periodic,periodic] y=[dirichlet,dirichlet] z=[periodic
    ,periodic]
 Domain decomposition by x3d2 (generic)
    Running: COS_X  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   8.1900E-17  PASS
    Running: COS_Y  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   1.2465E-16  PASS
    Running: COS_XY  n = 2
      Poisson L2:   3.2582E-14  PASS
      DivGrad L2:   5.4966E-17  PASS
    Running: COS_XYZ  n = 2
      Poisson L2:   3.5844E-14  PASS
      DivGrad L2:   3.4459E-17  PASS
    Running: COS_X  n = 3
      Poisson L2:   2.5073E-05  FAIL
      DivGrad L2:   5.2074E-17  PASS
    Running: COS_Y  n = 3
      Poisson L2:   9.3438E-14  PASS
      DivGrad L2:   4.5506E-17  PASS
    Running: COS_XY  n = 3
      Poisson L2:   1.0480E-05  FAIL
      DivGrad L2:   2.9174E-17  PASS
    Running: COS_XYZ  n = 3
      Poisson L2:   7.1984E-06  FAIL
      DivGrad L2:   3.3901E-07  FAIL

  === Config 100 (x-dirichlet)
    Grid: 65 x 64 x 64
    BC: x=[dirichlet,dirichlet] y=[periodic,periodic] z=[periodic
    ,periodic]
 Domain decomposition by x3d2 (generic)
    Running: COS_X  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   1.3330E-16  PASS
    Running: COS_Y  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   6.4270E-17  PASS
    Running: COS_XY  n = 2
      Poisson L2:   3.2582E-14  PASS
      DivGrad L2:   5.5889E-17  PASS
    Running: COS_XYZ  n = 2
      Poisson L2:   3.5844E-14  PASS
      DivGrad L2:   3.4497E-17  PASS
    Running: COS_X  n = 3
      Poisson L2:   9.3438E-14  PASS
      DivGrad L2:   4.8336E-17  PASS
    Running: COS_Y  n = 3
      Poisson L2:   2.5073E-05  FAIL
      DivGrad L2:   4.5540E-17  PASS
    Running: COS_XY  n = 3
      Poisson L2:   1.0480E-05  FAIL
      DivGrad L2:   2.9176E-17  PASS
    Running: COS_XYZ  n = 3
      Poisson L2:   7.1984E-06  FAIL
      DivGrad L2:   3.3901E-07  FAIL

  === Config 110 (x,y-dirichlet)
    Grid: 65 x 65 x 64
    BC: x=[dirichlet,dirichlet] y=[dirichlet,dirichlet] z=[periodic
    ,periodic]
 Domain decomposition by x3d2 (generic)
    Running: COS_X  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   1.2223E-16  PASS
    Running: COS_Y  n = 2
      Poisson L2:   1.8463E-14  PASS
      DivGrad L2:   1.2473E-16  PASS
    Running: COS_XY  n = 2
      Poisson L2:   3.2582E-14  PASS
      DivGrad L2:   6.8235E-17  PASS
    Running: COS_XYZ  n = 2
      Poisson L2:   3.5844E-14  PASS
      DivGrad L2:   4.0814E-17  PASS
    Running: COS_X  n = 3
      Poisson L2:   9.3438E-14  PASS
      DivGrad L2:   1.8311E-16  PASS
    Running: COS_Y  n = 3
      Poisson L2:   9.3438E-14  PASS
      DivGrad L2:   4.5527E-17  PASS
    Running: COS_XY  n = 3
      Poisson L2:   1.6476E-13  PASS
      DivGrad L2:   3.2933E-17  PASS
    Running: COS_XYZ  n = 3
      Poisson L2:   6.6308E-06  FAIL
      DivGrad L2:   1.7309E-17  PASS

  =======================================================
                     GRAND SUMMARY
  =======================================================

   Conf  Type        n   Poisson L2    DivGrad L2    Result  Expected  

    000  COS_X        2    1.8463E-14    8.2521E-17  PASS    PASS    OK
    000  COS_Y        2    1.8463E-14    5.4572E-17  PASS    PASS    OK
    000  COS_XY       2    3.2582E-14    5.1350E-17  PASS    PASS    OK
    000  COS_XYZ      2    3.5844E-14    3.0281E-17  PASS    PASS    OK
    000  COS_X        3    2.5073E-05    5.2368E-17  FAIL    FAIL    OK
    000  COS_Y        3    2.5073E-05    5.6475E-17  FAIL    FAIL    OK
    000  COS_XY       3    1.1285E-05    4.7943E-07  FAIL    FAIL    OK
    000  COS_XYZ      3    7.2923E-06    5.8709E-07  FAIL    FAIL    OK

    010  COS_X        2    1.8463E-14    8.1900E-17  PASS    PASS    OK
    010  COS_Y        2    1.8463E-14    1.2465E-16  PASS    PASS    OK
    010  COS_XY       2    3.2582E-14    5.4966E-17  PASS    PASS    OK
    010  COS_XYZ      2    3.5844E-14    3.4459E-17  PASS    PASS    OK
    010  COS_X        3    2.5073E-05    5.2074E-17  FAIL    FAIL    OK
    010  COS_Y        3    9.3438E-14    4.5506E-17  PASS    PASS    OK
    010  COS_XY       3    1.0480E-05    2.9174E-17  FAIL    FAIL    OK
    010  COS_XYZ      3    7.1984E-06    3.3901E-07  FAIL    FAIL    OK

    100  COS_X        2    1.8463E-14    1.3330E-16  PASS    PASS    OK
    100  COS_Y        2    1.8463E-14    6.4270E-17  PASS    PASS    OK
    100  COS_XY       2    3.2582E-14    5.5889E-17  PASS    PASS    OK
    100  COS_XYZ      2    3.5844E-14    3.4497E-17  PASS    PASS    OK
    100  COS_X        3    9.3438E-14    4.8336E-17  PASS    PASS    OK
    100  COS_Y        3    2.5073E-05    4.5540E-17  FAIL    FAIL    OK
    100  COS_XY       3    1.0480E-05    2.9176E-17  FAIL    FAIL    OK
    100  COS_XYZ      3    7.1984E-06    3.3901E-07  FAIL    FAIL    OK

    110  COS_X        2    1.8463E-14    1.2223E-16  PASS    PASS    OK
    110  COS_Y        2    1.8463E-14    1.2473E-16  PASS    PASS    OK
    110  COS_XY       2    3.2582E-14    6.8235E-17  PASS    PASS    OK
    110  COS_XYZ      2    3.5844E-14    4.0814E-17  PASS    PASS    OK
    110  COS_X        3    9.3438E-14    1.8311E-16  PASS    PASS    OK
    110  COS_Y        3    9.3438E-14    4.5527E-17  PASS    PASS    OK
    110  COS_XY       3    1.6476E-13    3.2933E-17  PASS    PASS    OK
    110  COS_XYZ      3    6.6308E-06    1.7309E-17  FAIL    FAIL    OK

ALL TESTS PASSED SUCCESSFULLY.
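
The Expected column in the summary follows a simple pattern that can be reproduced in a few lines of Python. The sketch assumes a unit-length domain in each direction (the thread does not state the domain size), in which case cos(nπx) takes the same value at both ends only when n is even. Under that assumption, a case is expected to pass unless the function varies along a periodic direction with odd n. The names `ends_match` and `expected_pass` and the string encodings are hypothetical, not taken from test_poisson_bc.f90.

```python
import math


def ends_match(n, length=1.0):
    """True if cos(n*pi*x) has equal values at x = 0 and x = length."""
    return math.isclose(math.cos(0.0), math.cos(n * math.pi * length), abs_tol=1e-12)


def expected_pass(config, case, n):
    """config: '000', '010', ... where '1' marks a non-periodic direction (x, y, z).
    case: 'COS_X', 'COS_Y', 'COS_XY' or 'COS_XYZ', naming the directions of variation."""
    active = case.split("_")[1]              # e.g. 'XY'
    for axis, flag in zip("XYZ", config):
        periodic = flag == "0"
        if axis in active and periodic and not ends_match(n):
            return False
    return True


# Expected results for n = 3, copied from the summary table above.
n3_table = {
    "000": {"COS_X": False, "COS_Y": False, "COS_XY": False, "COS_XYZ": False},
    "010": {"COS_X": False, "COS_Y": True, "COS_XY": False, "COS_XYZ": False},
    "100": {"COS_X": True, "COS_Y": False, "COS_XY": False, "COS_XYZ": False},
    "110": {"COS_X": True, "COS_Y": True, "COS_XY": True, "COS_XYZ": False},
}
for config, cases in n3_table.items():
    for case, expected in cases.items():
        assert expected_pass(config, case, 3) == expected
        assert expected_pass(config, case, 2)    # every n = 2 row is PASS
```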

@ia267 ia267 force-pushed the ko/poisson-100-11x branch from 2220850 to dc2e994 Compare February 19, 2026 11:07
kaanolgu and others added 3 commits February 19, 2026 13:59
- use a separate device array for real input and complex output
- odd dimension handling in the periodicity kernels
- zero padded_dev and f_out_dev for cufft paths
@ia267
Collaborator

ia267 commented Feb 20, 2026

  • made changes in a1876f4 to fix cufft path which was causing test_poisson_bc failures

  • separate r_dev and c_dev device arrays - now each buffer has exactly the right size and type

  • using 2*(nx_loc/2+1) instead of nx_loc + 2 - important for odd nx_loc. For even nx the two expressions are identical: nx+2 = 2*(nx/2+1). For odd nx, e.g. nx = 65, nx_loc+2 gives 67 but the correct value is 2*(32+1) = 66. Using 67 over-indexes the xtdesc allocation and could read or write one element past the end of the buffer

  • odd dimension handling included in periodicity kernels - for enforce_periodicity_x, the old approach with nx=65 tries to read f_in(i=66..68, ..), i.e. out of bounds. For undo_periodicity_x, position nx=65 was never written. The new kernel, with a mod(nx, 2) == 1 check, handles both cases.

  • For the cuFFT path, both padded_dev and f_out_dev are now zeroed - this seemed to be necessary. There would be a performance hit, but it is limited to cuFFT. This is not ideal, and we will need fully deterministic kernels
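
The odd-dimension issue described above can be illustrated with a hypothetical 1D analogue. The actual index maps in enforce_periodicity_x and undo_periodicity_x are not shown in this thread, so the even/odd interleaving below is an assumption chosen for illustration only, and `reorder` and `undo_reorder` are made-up names. It shows how a loop written for even n leaves one position unwritten when n is odd, and how an `n % 2 == 1` branch (the Python counterpart of `mod(nx, 2) == 1`) repairs the round trip.

```python
def reorder(f, handle_odd=True):
    """Even-indexed entries go to the front, odd-indexed ones to the back in reverse."""
    n = len(f)
    out = [None] * n
    for j in range(n // 2):
        out[j] = f[2 * j]
        out[n - 1 - j] = f[2 * j + 1]
    if handle_odd and n % 2 == 1:
        out[n // 2] = f[n - 1]       # the last entry has no odd-indexed partner
    return out


def undo_reorder(g):
    """Inverse of reorder (with handle_odd=True)."""
    n = len(g)
    out = [None] * n
    for j in range(n // 2):
        out[2 * j] = g[j]
        out[2 * j + 1] = g[n - 1 - j]
    if n % 2 == 1:
        out[n - 1] = g[n // 2]
    return out


even, odd = list(range(64)), list(range(65))
assert undo_reorder(reorder(even)) == even
assert undo_reorder(reorder(odd)) == odd
# Without the odd branch, the middle slot is never written for n = 65.
assert reorder(odd, handle_odd=False)[65 // 2] is None
```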

@ia267 ia267 self-requested a review February 20, 2026 01:32
Collaborator

@ia267 ia267 left a comment

Implementation of non-periodic BCs looks OK

The cuFFT path gave issues for the test_poisson_bc.f90 test case. These have been resolved, but some of the changes made specifically for the cuFFT path might reduce performance (e.g. zeroing f_out_dev costs a full-field GPU zero per Poisson solve call).

@kaanolgu kaanolgu changed the title Poisson_100 and poisson_110 with cos2pix 3D test case Poisson_100 and poisson_110 with Poisson Boundary Condition Tests Feb 20, 2026
@kaanolgu
Collaborator Author

kaanolgu commented Feb 20, 2026

LGTM (all tests pass for both cuFFT and cuFFTMp). The change to the nx_loc calculation formula changed the Poisson L2 norm check result from 1.066806E-14 to 1.8463E-14

@kaanolgu kaanolgu merged commit f2489f2 into main Feb 20, 2026
2 checks passed
@kaanolgu kaanolgu deleted the ko/poisson-100-11x branch February 20, 2026 11:48
@ia267
Collaborator

ia267 commented Feb 20, 2026

Note that this PR (in a1876f4) fixes #279 - it now uses the correct leading dimension 2*(self%ny_loc/2+1)
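
The leading-dimension formula can be checked with simple integer arithmetic. The sketch below is plain Python rather than the project's Fortran, and `r2c_padded_len` is a hypothetical helper name. It relies on the standard fact that a real-to-complex transform of length n produces n/2 + 1 complex values (integer division), which occupy 2*(n/2 + 1) real slots when stored in place.

```python
def r2c_padded_len(n):
    """Real slots needed to hold the n//2 + 1 complex outputs of a length-n R2C transform."""
    return 2 * (n // 2 + 1)


# Even n: both expressions agree. Odd n: n + 2 is one slot too many.
assert r2c_padded_len(64) == 64 + 2 == 66
assert r2c_padded_len(65) == 66
assert 65 + 2 == 67
```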

@CFD-Xing
Contributor

This causes a failure on CUDA: https://gitlab.ae.ic.ac.uk/xcompact3d/x3d2/-/pipelines

@kaanolgu
Collaborator Author

kaanolgu commented Feb 20, 2026

This causes a failure on CUDA: https://gitlab.ae.ic.ac.uk/xcompact3d/x3d2/-/pipelines

25/33 Test #25: test_poisson_bc_cuda_1 ...........***Failed    3.52 sec
--------------------------------------------------------------------------
WARNING: Open MPI tried to bind a process but failed.  This is a
warning only; your job will continue, though performance may
be degraded.
  Local host:        runner-ctfebq3kk-project-11-concurrent-0
  Application name:  bin/test_poisson_bc_cuda_1
  Error message:     failed to bind memory
  Location:          ../../../../../orte/mca/rtc/hwloc/rtc_hwloc.c:447
--------------------------------------------------------------------------
 Parallel run with            1 ranks
  === Config 000 (all periodic)
    Grid: 64 x 64 x 64
    BC: x=[periodic,periodic] y=[periodic,periodic] z=[periodic
    ,periodic]
 Domain decomposition by x3d2 (generic)
WARN: NCCL library not found...
WARN: init failed for remote transport: ibrc
 Using cuFFTMp for FFT
    Running: COS_X  n = 2
0: Null pointer for tmp$r (/builds/xcompact3d/x3d2/src/backend/cuda/backend.f90: 491)

This null pointer for tmp looks like the same problem that @ia267 is also experiencing with the Debug build, which @ia267 has reported to NVIDIA on their forums. We might need to temporarily disable the Debug build

Edit: According to another topic on the official forums, the flag is going to be removed anyway

@ia267
Collaborator

ia267 commented Feb 20, 2026

I have seen this issue sometimes when running in debug build.

This is my forum post: https://forums.developer.nvidia.com/t/cuda-fortran-null-device-pointer-after-intrinsic-assignment-of-derived-type/360540/2

I haven't been able to create a small example that reproduces this. It could potentially be an issue with the -Mchkptr compiler flag (there are a couple of forum questions on this)
