Perf: parallelize count_pw_st with OpenMP collapse(2) #7438

Open
MiniYuanBot wants to merge 1 commit into deepmodeling:develop from mystic-qaq:feat/openmp-collapse-for-loop

Conversation

MiniYuanBot commented Jun 5, 2026

What's changed?

  • source/source_basis/module_pw/pw_distributeg.cpp (count_pw_st):

    • Added OpenMP parallel for collapse(2) to the (ix, iy) double loop for plane-wave stick enumeration
    • Added reduction(+: npwtot_local, nstot_local) for accumulation of total plane-wave and stick counts
    • Added reduction(min/max: ...) for boundary coordinate tracking (lix, rix, liy, riy)
    • This change accelerates the PW initialization stage, which becomes a bottleneck for large FFT grids
  • Performance impact (tested on Intel Core i7, GCC 13.3.0, -O3 -fopenmp, grid=256×256×256, repeats=10):

    • Benchmark focuses on the modified count_pw_st function, which is the hotspot in PW initialization for large grids.
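| Threads | Total (ms) | Avg (ms) | Speedup | Efficiency |
|---------|------------|----------|---------|------------|
| 1       | 12363.61   | 1236.36  | 1.00    | 100.0%     |
| 2       | 6111.27    | 611.13   | 2.02    | 101.2%     |
| 4       | 3234.23    | 323.42   | 3.82    | 95.6%      |
| 8       | 2105.93    | 210.59   | 5.87    | 73.4%      |
| 12      | 1851.56    | 185.16   | 6.68    | 55.6%      |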
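
The Speedup and Efficiency columns are consistent with the usual strong-scaling definitions, where $T_p$ denotes the total time measured with $p$ threads (the symbols are introduced here for illustration and do not appear in the benchmark output):

$$
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
$$

For the 8-thread row, for example:

$$
S(8) = \frac{12363.61}{2105.93} \approx 5.87, \qquad E(8) = \frac{5.87}{8} \approx 73.4\%
$$
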
  • Near-linear scaling up to 4 threads (efficiency >95%)

  • 8-thread efficiency drops to ~73%, likely due to memory bandwidth saturation

  • 12-thread marginal gain diminishes, consistent with SMT overhead on consumer-grade platforms

  • Behavior changes: None. The serial code path is preserved when _OPENMP is undefined. All existing MODULE_PW_* unit tests (12/12) continue to pass.
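
For readers who have not opened the diff, the pattern described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in pw_distributeg.cpp: the `StickCount` struct, the `count_sticks_sketch` function, the cubic index range `[-n, n]`, and the integer cutoff `cutoff2` are all hypothetical stand-ins chosen to keep the example self-contained.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical result type for this sketch; not the ABACUS data layout.
struct StickCount {
    long npw_total;  // grid points inside the cutoff sphere
    long nst_total;  // non-empty sticks, i.e. (ix, iy) columns with >= 1 point
    int lix, rix;    // min / max ix over non-empty sticks
    int liy, riy;    // min / max iy over non-empty sticks
};

// Counts integer grid points with ix^2 + iy^2 + iz^2 <= cutoff2 inside the
// cube [-n, n]^3, grouped into sticks along z. Assumed simplification of the
// real enumeration, which works on the reciprocal lattice.
StickCount count_sticks_sketch(int n, long cutoff2) {
    long npw = 0, nst = 0;
    int lix = INT_MAX, rix = INT_MIN;
    int liy = INT_MAX, riy = INT_MIN;

    // Without -fopenmp, _OPENMP is undefined and the loop runs serially.
#ifdef _OPENMP
#pragma omp parallel for collapse(2) \
    reduction(+ : npw, nst) \
    reduction(min : lix, liy) reduction(max : rix, riy)
#endif
    for (int ix = -n; ix <= n; ++ix) {
        for (int iy = -n; iy <= n; ++iy) {
            // Each (ix, iy) pair is one independent stick along z.
            long in_stick = 0;
            for (int iz = -n; iz <= n; ++iz) {
                const long g2 = 1L * ix * ix + 1L * iy * iy + 1L * iz * iz;
                if (g2 <= cutoff2) {
                    ++in_stick;
                }
            }
            if (in_stick > 0) {
                npw += in_stick;            // sum reduction
                ++nst;                      // sum reduction
                lix = std::min(lix, ix);    // min reduction
                rix = std::max(rix, ix);    // max reduction
                liy = std::min(liy, iy);    // min reduction
                riy = std::max(riy, iy);    // max reduction
            }
        }
    }

    StickCount result;
    result.npw_total = npw;
    result.nst_total = nst;
    result.lix = lix;
    result.rix = rix;
    result.liy = liy;
    result.riy = riy;
    return result;
}
```

Because every (ix, iy) iteration only writes to reduction variables, the result does not depend on how iterations are distributed across threads, which is why the serial and parallel builds can be expected to agree. Note that `collapse(2)` requires the two outer loops to be perfectly nested, and `min`/`max` reductions in C++ need OpenMP 3.1 or later.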

MiniYuanBot commented Jun 5, 2026

\label project_learning
This is Problem 1 of assignment01 on the plane-wave module.
Thanks for the review :)
