Skip to content

Cleanup of "Set gpu tbp"#767

Merged
TysonRayJones merged 41 commits into
set_gpu_tpbfrom
gpu-tpb-cleanup
Jun 1, 2026
Merged

Cleanup of "Set gpu tbp"#767
TysonRayJones merged 41 commits into
set_gpu_tpbfrom
gpu-tpb-cleanup

Conversation

@TysonRayJones
Copy link
Copy Markdown
Member

No description provided.

These exemptions for communicator NULL-ness enable an error message to reach the user even then the user has called MPI_Init themselves but then triggered a validation error before the communicator could be set
for consistency with e.g. env.isMultithreaded. The user arg to e.g. initQuESTEnv() is kept as "userOwnsMpi" for a very superficial consistency with e.g. "useMultithreaded" :^)
by just using extern. This is terrible and inadvisable, but offers an easier understanding of the software architecture (and so is easier to fix correctly) than the previous "macros change which signatures this header exposes" design.

Also, we removed the unnecessary avoiding of defining comm_setMpiComm when SUBCOMM was not defined, which made the architecture even more confusing. Now, SUBCOMM only influences the contents of subcommunicator.cpp and subcommunicator.hpp. Simple!
although I suspect this is a poor choice of name. The logic should always be considered "compiled" when MPI is known, and we choose instead whether to "expose" the MPI signature to the users
without triggering an internal error
replacing the original internal error. Note that all of the other MPI functions between comm_config.cpp and comm_subroutines.cpp are unguarded; we should create a macro around them
- added comm_isActive to indicate whether QuEST is using MPI (which is distinct from whether MPI itself is initialised)
- renamed comm_isInit to comm_isMpiInit, since it queries MPI directly/globally, and when true, does not indicate whether QuEST is actually using MPI
- record isMpiUserOwned within comm_config.cpp, since failed-validation must not kill user-owned MPI, and it must know user-ownership before QuESTEnv succeeds/records it (because validation can fail DURING QuESTEnv initialisation)
- explicitly divided (through doc) comm_config.cpp into things which query MPI globally, and thinks which query only QuEST's MPI env/communicator
-
as found by Codex! All hail our new overlords
@TysonRayJones TysonRayJones marked this pull request as draft May 31, 2026 22:34
@TysonRayJones
Copy link
Copy Markdown
Member Author

Above has no changes; just merging (with conflict resolution) of #765, since it contains the new experimental.h

Comment on lines +304 to +308
// why the loops below are explicitly compile-time unrolled. Beware that when
// numThreadsPerBlock is increased from 128, this kernel will still behave
// correctly, but privateCache below will spill over into local memory at a
// performance penalty for NumTargs <= 5, with spillage occurring for fewer
// NumTargs as numThreadsPerBlock increases.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@otbrown just flagging this!

(existing checks were internal errors, and incomplete)
@TysonRayJones TysonRayJones mentioned this pull request Jun 1, 2026
7 tasks
Comment thread quest/include/environment.h Outdated
Woof that's a lot of boilerplate - but at least we have the safest environment variables in the business! 😅
since it should be added in a separate PR with the other intendedly programmatically-accessible fields. I know in my heart of hearts that if I left isHipCompiled attached, the other fields would never follow hehe
@TysonRayJones TysonRayJones marked this pull request as ready for review June 1, 2026 06:24
@TysonRayJones TysonRayJones requested a review from otbrown June 1, 2026 06:24
@TysonRayJones
Copy link
Copy Markdown
Member Author

TysonRayJones commented Jun 1, 2026

@otbrown Note this diff is polluted when a merge from #765, so that I could make use of experimental.h. I wouldn't bother looking at the diff anyway, it's a doozy! Instead, I flag the main changes:

  • moved setQuESTNumGpuThreadsPerBlock() (and the getter) out of environment.cpp, and into experimental.cpp (in the future, they're appropriate to move to debug.cpp)
  • replaced the cmake QUEST_DEFAULT_NUM_THREADS_PER_BLOCK variable with an environment variable QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK (to be consistent with e.g. QUEST_DEFAULT_VALIDATION_EPSILON)
  • added user-input validation
  • documented setQuESTNumGpuThreadsPerBlock (and the new env-var)
  • replaced the unit tests

Let me know if that's all fine, and I'll do the merging (since I'll need to merge upstream first then solve conflicts)

@otbrown
Copy link
Copy Markdown
Collaborator

otbrown commented Jun 1, 2026

Thanks @TysonRayJones!

Sounds good to me, except the ability to "bake" this in at compile time is important and should not be lost. This value is architecture dependent, and therefore a facility maintaining a central install of QuEST may wish to set it based on some profiling that they do on their specific hardware. This can then be made available to all users by compiling it in to the library.

The ability to set it at run time was really intended purely as a convenience for the person doing the profiling (cough @JPRichings) could iterate through values while doing that profiling without having to recompile every time.

I think the environment variable is a reasonable way to do the runtime setting, but we do still need the compile time setting.

For balance I should note that we could achieve the same with just the environment variable -- we would need to either a) document to users that they should always export the env var before running QuEST, or b) export it for them in the module load. Option a) has obvious problems, but b) would be fine I think. Still, I think the preprocessor variable seems like the better place semantically to set a value for everyone.

Final note is I prefer perfTune.cpp to debug.cpp for this value and others like it since it's really about tuning for the hardware, not finding or fixing a bug! Not relevant now however.

which is now the lowest-priority default, overridden at executable launch via the environment variable, in-turn overridden at runtime using the setter
given it can now be a result of the cmake var, the env var, or the runtime setter argument
@TysonRayJones TysonRayJones merged commit 640e09a into set_gpu_tpb Jun 1, 2026
@TysonRayJones TysonRayJones deleted the gpu-tpb-cleanup branch June 1, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants