Add support for mixing Clang and GCC (as NVCC host). #2297

Merged
pcanal merged 3 commits into celeritas-project:develop from pcanal:clang-vs-gcc
Mar 6, 2026

Contributor

@pcanal pcanal commented Mar 4, 2026

Clang and GCC have made incompatible choices in how they handle an alias template used as a template template argument (the standard is silent on the subject and both interpretations are 'valid', although the GCC choice seems more 'natural').

The problem arises if Celeritas is built with Clang but uses an instance of NVCC that has GCC as its host compiler.

The symptom is a failing dynamic cast on what is seemingly a compatible/correct type. However, it turns out that Clang makes a valid but surprising interpretation of the C++ standard. Namely, depending on whether a class template or an alias to that same class template is used as a template argument, Clang chooses to make the resulting types/instances distinct.

Clang failing example: https://godbolt.org/z/xW8T1PcxW
gcc is different/works: https://godbolt.org/z/Tj1PfqWeT
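
To make the symptom concrete, here is a minimal sketch of the pattern (this is not the code behind the Compiler Explorer links above, and every name in it is hypothetical). It spells the same specialization once directly and once through an alias template, then checks whether a `dynamic_cast` between the two spellings succeeds. The value of `alias_is_same_type` is expected to differ between compilers, per the description above.

```cpp
#include <type_traits>

// All names below are hypothetical and only illustrate the pattern.
struct Base
{
    virtual ~Base() = default;
};

// An ordinary class template...
template<class T>
struct Storage
{
    T value{};
};

// ...and an alias template that names the same specialization for every T.
template<class T>
using StorageAlias = Storage<T>;

// A class template with a template template parameter.
template<template<class> class S>
struct Params : Base
{
    S<int> data;
};

// Whether the compiler treats the two spellings as one type. This value is
// compiler dependent: it is the point on which GCC and Clang disagree.
constexpr bool alias_is_same_type
    = std::is_same<Params<Storage>, Params<StorageAlias>>::value;

// Create the object through the alias, then cast to the direct spelling.
// The cast can only succeed when both spellings name the same type.
inline bool cast_via_alias_succeeds()
{
    Params<StorageAlias> obj;
    Base* base = &obj;
    return dynamic_cast<Params<Storage>*>(base) != nullptr;
}
```

Within a single compiler the two results always agree, which is why a Clang-only build is internally consistent. The mismatch can only show up when the object is created by code from one compiler and cast by code from the other.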

The problem is further limited in scope by the fact that the Celeritas code is internally consistent (and works fully when compiled with Clang for CPU only); only mixing Clang and NVCC (with GCC underneath) leads to the inconsistency.

The solution here consists of moving any code that directly involves the class template instance with the class template alias as a template parameter to the `.cc` side. This results in the need for seemingly 'simple' trampoline function templates that are explicitly instantiated in the `.cc` file.
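
The trampoline approach could look roughly like the single-file sketch below. It is an illustration under stated assumptions, not the code from this PR: `MemSpace`, `AuxBase`, `Buffer`, `BufferAlias`, `State`, `make_state`, and `buffer_count` are all hypothetical names, and the "header" and ".cc" parts are only separated by comments here.

```cpp
#include <memory>
#include <stdexcept>

// All names are hypothetical; this is not the Celeritas API.
enum class MemSpace { host, device };

struct AuxBase
{
    virtual ~AuxBase() = default;
};

template<MemSpace M>
struct Buffer
{
    int count{0};
};

// Alias template used as a template template argument below.
template<MemSpace M>
using BufferAlias = Buffer<M>;

template<template<MemSpace> class B, MemSpace M>
struct State final : AuxBase
{
    B<M> data;
};

// ---- "header" part: visible to every translation unit (.cc and .cu) ----
// Declarations only, so no caller spells State<BufferAlias, M> itself.
template<MemSpace M>
std::unique_ptr<AuxBase> make_state();

template<MemSpace M>
int& buffer_count(AuxBase& aux);

// ---- ".cc" part: compiled by a single host compiler ----
template<MemSpace M>
std::unique_ptr<AuxBase> make_state()
{
    return std::make_unique<State<BufferAlias, M>>();
}

// Trampoline: the dynamic_cast lives next to the code that created the
// object, so both sides agree on what State<BufferAlias, M> means.
template<MemSpace M>
int& buffer_count(AuxBase& aux)
{
    auto* state = dynamic_cast<State<BufferAlias, M>*>(&aux);
    if (!state)
    {
        throw std::runtime_error("unexpected auxiliary state type");
    }
    return state->data.count;
}

// Explicit instantiations, so other translation units only need to link.
template std::unique_ptr<AuxBase> make_state<MemSpace::host>();
template std::unique_ptr<AuxBase> make_state<MemSpace::device>();
template int& buffer_count<MemSpace::host>(AuxBase&);
template int& buffer_count<MemSpace::device>(AuxBase&);
```

A caller on the `.cu` side would then write something like `buffer_count<MemSpace::host>(*state)` and never name the alias-based specialization, so the compiler-dependent type identity stays confined to one translation unit.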

Related notes:

https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1286
https://stackoverflow.com/questions/78778437/gcc-and-clang-disagree-on-using-alias-templates-as-template-template-argument

llvm/llvm-project#72377
llvm/llvm-project#33002

@pcanal pcanal requested review from amandalund and sethrj March 4, 2026 18:13
@pcanal pcanal self-assigned this Mar 4, 2026
@pcanal pcanal added the bug Something isn't working label Mar 4, 2026

github-actions bot commented Mar 4, 2026

Test summary

 5 990 files   9 609 suites   19m 28s ⏱️
 2 194 tests  2 165 ✅  29 💤 0 ❌
32 902 runs  32 761 ✅ 141 💤 0 ❌

Results for commit b75c498.

♻️ This comment has been updated with latest results.


codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.24%. Comparing base (271fc06) to head (f2dca85).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
src/celeritas/optical/gen/OffloadGatherAction.cc 0.00% 2 Missing ⚠️

@@             Coverage Diff             @@
##           develop    #2297      +/-   ##
===========================================
- Coverage    87.24%   87.24%   -0.01%     
===========================================
  Files         1353     1353              
  Lines        43142    43148       +6     
  Branches     13196    13196              
===========================================
+ Hits         37638    37643       +5     
- Misses        4284     4286       +2     
+ Partials      1220     1219       -1     
Files with missing lines Coverage Δ
src/celeritas/optical/gen/OffloadAction.cc 90.00% <100.00%> (+0.71%) ⬆️
src/celeritas/optical/gen/OffloadAction.hh 55.55% <ø> (ø)
src/celeritas/optical/gen/OffloadGatherAction.hh 83.33% <ø> (ø)
src/celeritas/optical/gen/OffloadGatherAction.cc 80.76% <0.00%> (-6.74%) ⬇️

... and 3 files with indirect coverage changes


Member

@sethrj sethrj left a comment


A suggestion to consider, good if not. Thanks for tracking this down! Did you encounter the compiler mismatch when disabling VecGeom? root-project/veccore#28 may help if so.

Comment thread src/celeritas/optical/gen/OffloadGatherAction.hh Outdated
Contributor Author

pcanal commented Mar 4, 2026

Did you encounter the compiler mismatch when disabling VecGeom?

There is zero conceptual reason why that would help (this is an NVCC/GCC vs. Clang issue, not an RDC issue). Nonetheless, I verified that, as expected, it also fails with ORANGE.

Member

sethrj commented Mar 4, 2026

@pcanal the compiler detection is influenced by veccore's use of check_language(CUDA), so I was asking less about the binary incompatibility and more about the autodetection of the host compiler.

Contributor Author

pcanal commented Mar 4, 2026

the compiler detection is influenced by veccore's use of check_language(CUDA), so I was asking less about the binary incompatibility and more about the autodetection of the host compiler.

In my case, the host detection (from check_language) sort of fails, but that turns out to be irrelevant. The NVCC I have uses GCC and fails miserably (crash and burn on invocation of NVCC) if I try to make it use Clang as the 'host' compiler.

If I had not been able to make the mixed mode work, I would have pursued fixing the host detection (i.e., tried the VecCore patch) and made CMake fail in this mixed case. But since I could make it work, this is moot for now.

@pcanal pcanal enabled auto-merge (squash) March 6, 2026 02:54
@pcanal pcanal merged commit d6352fc into celeritas-project:develop Mar 6, 2026
40 checks passed
@pcanal pcanal deleted the clang-vs-gcc branch March 6, 2026 13:04