Merged
Changes from all commits (98 commits)
22a0ac0
[CODEGEN] Added GPU abstraction to CODEGEN
Jooorgen Jul 13, 2023
81cf765
[jthip] change % to %% in CODEGEN cudacpp.mk
valassi Jul 13, 2023
b83f8c9
[jthip] clang-format GpuAbstraction.h both in CODEGEN and in ggttgg.mad
valassi Jul 13, 2023
1afbafc
[jthip] clang-format GpuRuntime.h both in CODEGEN and in ggttgg.mad
valassi Jul 13, 2023
d1f5c5b
Made the codegenerated files same as the templated files in gg_ttgg
Jooorgen Jul 17, 2023
1b5c0fd
[jthip] backport to CODEGEN from ggttgg.mad
valassi Jul 18, 2023
0f1b811
[jthip] complete backport to CODEGEN from ggttgg.mad, including a few…
valassi Jul 18, 2023
71ff5e2
[jthip] in CODEGEN, remove the copying to src of GpuRuntime.h and Gpu…
valassi Jul 18, 2023
a37fb41
[jthip] In CODEGEN, acknowledge Joergen in each file and in COPYRIGHT…
valassi Jul 18, 2023
428aa50
[CODEGEN] Added HIP runtime include in mgOnGpuConfig.h in codegen
Jooorgen Jul 20, 2023
24fbbb6
[jthip/namespace] backport latest changes from ggttgg.mad to CODEGEN
valassi Jul 25, 2023
10df703
[jthip] in CODEGEN, backport also cudacpp_src.mk using GPUCC instead …
valassi Jul 25, 2023
43e0c64
[CODEGEN] Added changes from gg_ttgg.mad to code generator
Jooorgen Aug 10, 2023
e99a2b8
[CODEGEN] Added export of GPUCC and GPUFLAGS to codegen
Jooorgen Aug 10, 2023
4adb62f
Fixed warning and changed HIPARCHFLAGS export so it exports to cudacp…
Jooorgen Aug 10, 2023
e18c882
[CODEGEN] Fixed error in runTest.cc and reverted changes in cudacpp_s…
Jooorgen Aug 10, 2023
f588cd4
Merge branch 'madgraph5:master' into gpu_abstraction_only
Jooorgen Sep 28, 2023
597de73
[CODEGEN] Added GPU abstraction to CODEGEN
Jooorgen Jul 13, 2023
1a6496a
Updated first name in Author list
Jooorgen Sep 29, 2023
d2e2f47
[jt774] (before merging upstream/master) improve logic of "if CUDA el…
valassi Jan 24, 2024
8e9120c
[jt774] (before merging usptream/master) remove CODEGEN #cudacpp.mk#
valassi Jan 24, 2024
4ec244a
Merge remote-tracking branch 'upstream/master' into jt774
valassi Jan 24, 2024
cf8875b
[jt774] (after merging upstream/master) fix CODEGEN cudacpp.mk: repla…
valassi Jan 24, 2024
e32bc4e
[jt774] (after merging upstream/master) fix clang formatting in CODEG…
valassi Jan 24, 2024
def02b5
[jt774] regenerate gg_tt.mad - the build fails
valassi Jan 24, 2024
d4200cf
Revert "[jt774] regenerate gg_tt.mad - the build fails"
valassi Jan 24, 2024
ce629c2
[jt774] remove '#include "GpuAbstraction.h"' from CODEGEN mgOnGpuVect…
valassi Jan 24, 2024
7363e1f
[jthip] in CODEGEN, remove the copying to src of GpuRuntime.h and Gpu…
valassi Jul 18, 2023
47e2b8f
[jt774] in CODEGEN mgOnGpuFptypes.h, replace one more __CUDACC__ by M…
valassi Jan 24, 2024
721652e
[jt774] cherry-pick commit 1b5c0fdff ([jthip] backport to CODEGEN fro…
valassi Jul 18, 2023
71a9ece
[jthip24] (after merging upstream/master) fix clang formatting in COD…
valassi Jan 24, 2024
8090013
[jt774] copy CODEGEN check_sa.cc from jthip24 as-is (currently f44a9c…
valassi Jan 25, 2024
9a8c86c
[jt774] add one empty line in CODEGEN MemoryAccessMomenta.h as in jth…
valassi Jan 25, 2024
0a069c7
[jthip24] Remove hip_runtime.h from CODEGEN mgOnGpuConfig.h and add i…
valassi Jan 25, 2024
89170af
[jt774] copy CODEGEN cpp_model_parameters_h.inc from jthip24 as-is (c…
valassi Jan 25, 2024
69d5ed6
[jt774] fix clang formatting in CODEGEN model_handling.py after the p…
valassi Jan 25, 2024
d43cfeb
[jt774] copy CODEGEN cudacpp.mk from jthip24 as-is (currently 4ba2133…
valassi Jan 25, 2024
5e424fe
[jt774] regenerate ggtt.mad - add add the previously absent GpuAbstra…
valassi Jan 25, 2024
464703b
[jt774] *** COMPLETE SYNC OF JTHIP24 AND JT774 *** regenerate all pro…
valassi Jan 25, 2024
fa56bf8
[jt774] in gg_tt.sa cudacpp.mk, fix autodiscovery of HIP_HOME from 'w…
valassi Jan 25, 2024
41f08d7
[jt774] in gg_tt.sa cudacpp.mk, remove trailing '/' from CUDA_HOME an…
valassi Jan 25, 2024
9809728
[jt774] in gg_tt.sa, replace filesystem by experimental/filesystem fo…
valassi Jan 25, 2024
50211c7
[jt774] in gg_tt.sa check_sa.cc, replace some __CUDACC__ by MGONGPUCP…
valassi Jan 25, 2024
034822e
[jt774] in gg_tt.sa, move back hip_runtime.h from GpuAsbtraction.h to…
valassi Jan 25, 2024
0b2060a
[jt774] in gg_tt.sa cudacpp.mk, add -lstdc++fs when linking the gpu l…
valassi Jan 25, 2024
cca2658
[jt744] rerun all 78 tput tests, all ok
valassi Jan 26, 2024
39c6372
[jt744] rerun 18 tmad tests, all ok
valassi Jan 26, 2024
8698762
[jt774] in gg_tt.sa check_sa.cc, disable the printout of the gcc tool…
valassi Jan 26, 2024
3e473af
[jt774] in tput/throughputX.sh, add support for AMD GPUs (still assum…
valassi Jan 26, 2024
d13493f
[jt774] in gg_tt.sa, bypass std::filesystem completely to ease portab…
valassi Jan 27, 2024
9f3e3b5
[jt774] first tput tee test on LUMI with AMD GPU (using ggtt.sa): her…
valassi Jan 27, 2024
1185c97
[jt774] in gg_tt.sa cudacpp.mk, use FC to link fgcheck.exe, gcheck.ex…
valassi Jan 27, 2024
0943ae1
[jt774] in gg_tt.sa cudacpp.mk, improve resolution of path to libamdh…
valassi Jan 27, 2024
cbf3d36
[jt774] second (first successful) tput tee test on LUMI with AMD GPU …
valassi Jan 27, 2024
5c27ed6
[jt774] backport AMD HIP changes for PR #801 from gg_tt.sa to CODEGEN
valassi Jan 27, 2024
d8f71cb
[jt774] fix clang formatting in CODEGEN backport
valassi Jan 27, 2024
a80e630
[jt774] regenerate gg_tt.sa (all ok no change) and gg_tt.mad (will no…
valassi Jan 27, 2024
2bdc440
[jt774] in gg_tt.mad cudacpp.mk, add -lpthread when linking runTest.e…
valassi Jan 27, 2024
ee5e389
[jt774] first (successful) tput tee test on LUMI with AMD GPU using g…
valassi Jan 27, 2024
3862f10
[jt774] first tmad test on LUMI - it fails for Fortran vs C++ (no HI…
valassi Jan 27, 2024
c5a8bdc
[jt774] confirm the tmad ggtt test failure even with HIP_HOME=none on…
valassi Jan 27, 2024
083982d
[jt774] well, tmad tests for ggtt fail also on itscrd90 now
valassi Jan 27, 2024
4f0f277
[jt774] fix CODEGEN mgOnGpuConfig.h where I had accidentally hardcode…
valassi Jan 27, 2024
2fc0d87
[jt774] in CODEGEN cudacpp.mk, backport form gg_tt.mad, add -lpthread…
valassi Jan 27, 2024
d1368d0
[jt774] regenerate gg_tt.mad with the latest fixes
valassi Jan 27, 2024
8bceeae
[jt774] rerun ggtt tmad on itscrd90, now all ok
valassi Jan 27, 2024
39472bd
[jt774] in tmad/madX.sh, add support for AMD GPUs as done in tput (st…
valassi Jan 27, 2024
a6643d8
[jt774] first successful tmad test for ggtt on LUMI/HIP (note, no 512…
valassi Jan 27, 2024
4105509
[jt774] regenerate all processes
valassi Jan 27, 2024
3615bb1
[jt774] in tput/allTees.sh, add -makeonly to only build all tests ins…
valassi Jan 27, 2024
fc09144
[jt774] Fix Jorgen's name in CODEGEN COPYRIGHT
valassi Jan 27, 2024
bf4a53f
[jt744] rerun all 78 tput tests on itscrd90, all ok
valassi Jan 28, 2024
f724e69
[jt744] rerun 18 tmad tests on itscrd90, all ok
valassi Jan 28, 2024
134d126
[jt774] first execution of all 78 tput tests on LUMI (on CPUs and AMD…
valassi Jan 28, 2024
e8ea9a0
[jt774] first execution of all 18 tmad tests on LUMI (on CPUs and AMD…
valassi Jan 28, 2024
55a4f76
[jt774] in gq_ttq.mad cudacpp.mk, allow multi-word CXX (e.g. for the …
valassi Jan 29, 2024
68b589d
[jt774] in CODEGEN (backport gq_ttq.mad) cudacpp.mk, allow multi-word…
valassi Jan 29, 2024
707507f
[jt774] manually fix cudacpp.mk in all processes for #807 and #505
valassi Jan 29, 2024
67a18a5
[jt744] in gq_ttq.mad, remove -G from HIP debug builds on LUMI (#808)
valassi Jan 29, 2024
6572c5d
[jt744] in CODEGEN (backport gq_ttq.mad), remove -G from HIP debug bu…
valassi Jan 29, 2024
05f5ecf
[jt774] regenerate gq_ttq.mad to check all is ok - actually Jorgen's …
valassi Jan 29, 2024
fcb359a
[jt774] manually fix all processes for cudacpp '-G' option in HIP #80…
valassi Jan 29, 2024
9ff021b
[jt774] in gg_tt.mad, fix mgOnGpuCxtypes.h so that cucomplex and thru…
valassi Jan 29, 2024
5296b40
[jt774] in gg_tt.mad, fix complex type #ifdef's and fix "-x hip" comp…
valassi Jan 29, 2024
35a70d3
[jt774] in CODEGEN (backport from gg_tt.mad), fix mgOnGpuCxtypes.h so…
valassi Jan 30, 2024
d7fd274
[jt774] regenerate gg_tt.mad, all ok (including code formatting fixes)
valassi Jan 30, 2024
b28dff0
[jt774] regenerate all processes
valassi Jan 30, 2024
7c4823d
[jt774] in tput scripts, add -rorhst and -hip flags
valassi Jan 30, 2024
7735bb1
[jt774] in CODEGEN, fix cudacpp_src.mk for non-SM models
valassi Jan 30, 2024
a5a7661
[jt774] regenerate gg_tt.mad, all ok no change
valassi Jan 30, 2024
31cb663
[jt774] regenerate heft process, ok it changed as expected
valassi Jan 30, 2024
9ed3aaf
fixing typo in the allowed option
oliviermattelaer Jan 31, 2024
33322e0
[jt774] rerun 78 tput tests on itscrd90, all ok [NB using code genera…
valassi Jan 30, 2024
6c42fc2
[jt774] rerun 18 tmad tests on itscrd90, all ok [NB using code genera…
valassi Jan 30, 2024
fc19f84
[jt774] regenerate all code with Olivier's patch - only launch_plugin…
valassi Jan 31, 2024
eeee118
[jt774] rerun all 78 tput tests on LUMI - same issues as before for g…
valassi Jan 31, 2024
227ac86
[jt774] rerun all 18 tmad tests on LUMI - same issues as before for g…
valassi Jan 31, 2024
60299b7
[jt774] ** COMPLETE JT774 ** go back to tput and tmad logs from itscr…
valassi Jan 31, 2024
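The commits above converge on a single abstraction layer, GpuAbstraction.h, so that one source tree compiles for both NVIDIA (CUDA) and AMD (HIP) GPUs. The diffs below rely on names such as MGONGPUCPP_GPUIMPL, gpuMemcpy, gpuMemcpyHostToDevice and gpuLaunchKernel. The actual contents of GpuAbstraction.h are not shown in this diff, so the following is only a minimal sketch, under the assumption that the header maps those names onto the two runtimes roughly like this (the real header may, for instance, also fold in the error checking previously done by checkCuda):

// GpuAbstraction.h, illustrative sketch only (not the actual file in this PR)
#if defined __CUDACC__ || defined __HIPCC__
#define MGONGPUCPP_GPUIMPL 1 // common "GPU build" guard replacing bare __CUDACC__
#endif
#ifdef __CUDACC__ // CUDA build
#include <cuda_runtime.h>
#define gpuMemcpy cudaMemcpy // the real header may add error checking here
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuLaunchKernel( kernel, blocks, threads, ... ) \
  kernel<<<( blocks ), ( threads )>>>( __VA_ARGS__ )
#elif defined __HIPCC__ // HIP build
#include <hip/hip_runtime.h>
#define gpuMemcpy hipMemcpy
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuLaunchKernel( kernel, blocks, threads, ... ) \
  hipLaunchKernelGGL( kernel, dim3( blocks ), dim3( threads ), 0, 0, __VA_ARGS__ )
#endif

With such a layer in place, the call gpuLaunchKernel( dev_transposeMomentaF2C, m_gpublocks * thrPerEvt, m_gputhreads, ... ) seen in the Bridge.h hunks below expands to a <<<...>>> launch under nvcc and to hipLaunchKernelGGL under hipcc.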
@@ -103,7 +103,7 @@ def default_setup(self):
fct_mod=(self.reset_makeopts,(),{}),
allowed=['auto', 'none', 'sse4', 'avx2','512y','512z'])
self.add_param('cudacpp_backend', 'CPP', include=False, hidden=False,
allowed=['Fortan', 'CPP', 'CUDA'])
allowed=['Fortran', 'CPP', 'CUDA'])
self['vector_size'] = 16 # already setup in default class (just change value)
self['aloha_flag'] = '--fast-math'
self['matrix_flag'] = '-O3'
@@ -10,6 +10,7 @@ generates includes the following authors:
Stephan Hageboeck (CERN)
Olivier Mattelaer (Universite Catholique de Louvain, original author)
Stefan Roiser (CERN, original author)
Jorgen Teig (CERN)
Andrea Valassi (CERN, original author)
Zenny Wettersten (CERN)

@@ -28,5 +29,4 @@ acknowledged collaboration with the following collaborators:
Taran Singhania (PES University Bangalore)
David Smith (CERN)
Carl Vuosalo (University of Wisconsin-Madison)
Joergen Teig (CERN)

@@ -15,6 +15,7 @@ The full development team currently includes the following authors :
Stephan Hageboeck (CERN)
Olivier Mattelaer (Universite Catholique de Louvain, original author)
Stefan Roiser (CERN, original author)
Jorgen Teig (CERN)
Andrea Valassi (CERN, original author)
Zenny Wettersten (CERN)
See https://github.com/madgraph5/madgraph4gpu for more details. For the full
@@ -4,7 +4,7 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Modified by: A. Valassi (Sep 2021) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.
//==========================================================================
// This file has been automatically generated for CUDA/C++ standalone by
%(info_lines)s
@@ -15,7 +15,7 @@
#include <iomanip>
#include <iostream>

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
using namespace mg5amcGpu;
#else
using namespace mg5amcCpu;
@@ -25,7 +25,7 @@
#include "read_slha.h"

// NB: namespaces mg5amcGpu and mg5amcCpu includes types which are defined in different ways for CPU and GPU builds (see #318 and #725)
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -85,7 +85,7 @@ namespace mg5amcCpu
#include <limits>

// NB: namespaces mg5amcGpu and mg5amcCpu includes types which are defined in different ways for CPU and GPU builds (see #318 and #725)
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -155,7 +155,7 @@ namespace mg5amcCpu
//==========================================================================

// NB: namespaces mg5amcGpu and mg5amcCpu includes types which are defined in different ways for CPU and GPU builds (see #318 and #725)
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -172,7 +172,7 @@ namespace mg5amcCpu
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable" // e.g. <<warning: unused variable ‘mdl_G__exp__2’ [-Wunused-variable]>>
#pragma GCC diagnostic ignored "-Wunused-parameter" // e.g. <<warning: unused parameter ‘G’ [-Wunused-parameter]>>
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
#pragma nv_diagnostic push
#pragma nv_diag_suppress 177 // e.g. <<warning #177-D: variable "mdl_G__exp__2" was declared but never referenced>>
#endif
@@ -196,7 +196,7 @@ namespace mg5amcCpu
%(eftspecial2)s
return out;
}
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
#pragma GCC diagnostic pop
#pragma nv_diagnostic pop
#endif
@@ -1,7 +1,7 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: S. Roiser (Nov 2021) for the MG5aMC CUDACPP plugin.
// Further modified by: S. Roiser, A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: S. Roiser, J. Teig, A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.

#ifndef BRIDGE_H
#define BRIDGE_H 1
@@ -14,16 +14,23 @@
#include "MemoryAccessMomenta.h" // for MemoryAccessMomenta::neppM
#include "MemoryBuffers.h" // for HostBufferMomenta, DeviceBufferMomenta etc

//#ifdef __HIPCC__
//#include <experimental/filesystem> // see https://rocm.docs.amd.com/en/docs-5.4.3/CHANGELOG.html#id79
//#else
//#include <filesystem> // bypass this completely to ease portability on LUMI #803
//#endif

#include <sys/stat.h> // bypass std::filesystem #803

#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <memory>
#include <type_traits>

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -83,7 +90,7 @@ namespace mg5amcCpu
Bridge& operator=( const Bridge& ) = delete;
Bridge& operator=( Bridge&& ) = delete;

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
/**
* Set the gpublocks and gputhreads for the gpusequence - throws if evnt != gpublocks*gputhreads
* (this is needed for BridgeKernel tests rather than for actual production use in Fortran)
@@ -150,7 +157,7 @@ namespace mg5amcCpu
unsigned int m_nevt; // number of events
int m_nGoodHel; // the number of good helicities (-1 initially when they have not yet been calculated)

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
int m_gputhreads; // number of gpu threads (default set from number of events, can be modified)
int m_gpublocks; // number of gpu blocks (default set from number of events, can be modified)
DeviceBuffer<FORTRANFPTYPE, sizePerEventMomenta> m_devMomentaF;
@@ -187,12 +194,12 @@ namespace mg5amcCpu
// Forward declare transposition methods
//

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL

template<typename Tin, typename Tout>
__global__ void dev_transposeMomentaF2C( const Tin* in, Tout* out, const unsigned int nevt );

#endif // __CUDACC__
#endif // MGONGPUCPP_GPUIMPL

template<typename Tin, typename Tout>
void hst_transposeMomentaF2C( const Tin* in, Tout* out, const unsigned int nevt );
@@ -209,7 +216,7 @@ namespace mg5amcCpu
Bridge<FORTRANFPTYPE>::Bridge( unsigned int nevtF, unsigned int nparF, unsigned int np4F )
: m_nevt( nevtF )
, m_nGoodHel( -1 )
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
, m_gputhreads( 256 ) // default number of gpu threads
, m_gpublocks( m_nevt / m_gputhreads ) // this ensures m_nevt <= m_gpublocks*m_gputhreads
, m_devMomentaF( m_nevt )
@@ -233,7 +240,7 @@ namespace mg5amcCpu
{
if( nparF != CPPProcess::npar ) throw std::runtime_error( "Bridge constructor: npar mismatch" );
if( np4F != CPPProcess::np4 ) throw std::runtime_error( "Bridge constructor: np4 mismatch" );
#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
if( ( m_nevt < s_gputhreadsmin ) || ( m_nevt % s_gputhreadsmin != 0 ) )
throw std::runtime_error( "Bridge constructor: nevt should be a multiple of " + std::to_string( s_gputhreadsmin ) );
while( m_nevt != m_gpublocks * m_gputhreads )
@@ -249,20 +256,28 @@ namespace mg5amcCpu
#else
std::cout << "WARNING! Instantiate host Bridge (nevt=" << m_nevt << ")" << std::endl;
m_pmek.reset( new MatrixElementKernelHost( m_hstMomentaC, m_hstGs, m_hstRndHel, m_hstRndCol, m_hstMEs, m_hstSelHel, m_hstSelCol, m_nevt ) );
#endif // __CUDACC__
#endif // MGONGPUCPP_GPUIMPL
// Create a process object, read param card and set parameters
// FIXME: the process instance can happily go out of scope because it is only needed to read parameters?
// FIXME: the CPPProcess should really be a singleton? what if fbridgecreate is called from several Fortran threads?
CPPProcess process( /*verbose=*/false );
std::string paramCard = "../../Cards/param_card.dat";
if( !std::filesystem::exists( paramCard ) )
{
paramCard = "../" + paramCard;
}
/*
#ifdef __HIPCC__
if( !std::experimental::filesystem::exists( paramCard ) ) paramCard = "../" + paramCard;
#else
if( !std::filesystem::exists( paramCard ) ) paramCard = "../" + paramCard;
#endif
*/
//struct stat dummybuffer; // bypass std::filesystem #803
//if( !( stat( paramCard.c_str(), &dummyBuffer ) == 0 ) ) paramCard = "../" + paramCard; //
auto fileExists = []( std::string& fileName )
{ struct stat buffer; return stat( fileName.c_str(), &buffer ) == 0; };
if( !fileExists( paramCard ) ) paramCard = "../" + paramCard; // bypass std::filesystem #803
process.initProc( paramCard );
}

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
template<typename FORTRANFPTYPE>
void Bridge<FORTRANFPTYPE>::set_gpugrid( const int gpublocks, const int gputhreads )
{
@@ -276,7 +291,7 @@ namespace mg5amcCpu
}
#endif

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
template<typename FORTRANFPTYPE>
void Bridge<FORTRANFPTYPE>::gpu_sequence( const FORTRANFPTYPE* momenta,
const FORTRANFPTYPE* gs,
Expand All @@ -291,14 +306,14 @@ namespace mg5amcCpu
constexpr int neppM = MemoryAccessMomenta::neppM;
if constexpr( neppM == 1 && std::is_same_v<FORTRANFPTYPE, fptype> )
{
checkCuda( cudaMemcpy( m_devMomentaC.data(), momenta, m_devMomentaC.bytes(), cudaMemcpyHostToDevice ) );
gpuMemcpy( m_devMomentaC.data(), momenta, m_devMomentaC.bytes(), gpuMemcpyHostToDevice );
}
else
{
checkCuda( cudaMemcpy( m_devMomentaF.data(), momenta, m_devMomentaF.bytes(), cudaMemcpyHostToDevice ) );
gpuMemcpy( m_devMomentaF.data(), momenta, m_devMomentaF.bytes(), gpuMemcpyHostToDevice );
const int thrPerEvt = CPPProcess::npar * CPPProcess::np4; // AV: transpose alg does 1 element per thread (NOT 1 event per thread)
//const int thrPerEvt = 1; // AV: try new alg with 1 event per thread... this seems slower
dev_transposeMomentaF2C<<<m_gpublocks * thrPerEvt, m_gputhreads>>>( m_devMomentaF.data(), m_devMomentaC.data(), m_nevt );
gpuLaunchKernel( dev_transposeMomentaF2C, m_gpublocks * thrPerEvt, m_gputhreads, m_devMomentaF.data(), m_devMomentaC.data(), m_nevt );
}
if constexpr( std::is_same_v<FORTRANFPTYPE, fptype> )
{
@@ -341,7 +356,7 @@ namespace mg5amcCpu
}
#endif

#ifndef __CUDACC__
#ifndef MGONGPUCPP_GPUIMPL
template<typename FORTRANFPTYPE>
void Bridge<FORTRANFPTYPE>::cpu_sequence( const FORTRANFPTYPE* momenta,
const FORTRANFPTYPE* gs,
@@ -396,7 +411,7 @@ namespace mg5amcCpu
// - C++ array: momenta[npagM][npar][np4][neppM] with nevt=npagM*neppM (AOSOA)
//

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
template<typename Tin, typename Tout>
__global__ void dev_transposeMomentaF2C( const Tin* in, Tout* out, const unsigned int nevt )
{
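The Bridge.h hunks above also replace std::filesystem, which proved fragile with the ROCm toolchain on LUMI (#803), with a plain POSIX stat() existence check. A self-contained sketch of that pattern, where the main() driver is added here only for illustration and is not part of the PR:

// Sketch of the stat()-based existence check adopted in Bridge.h (#803)
#include <sys/stat.h> // bypass std::filesystem
#include <iostream>
#include <string>

int main()
{
  // same lambda as in the Bridge constructor: true if the file exists
  auto fileExists = []( const std::string& fileName )
  { struct stat buffer; return stat( fileName.c_str(), &buffer ) == 0; };
  std::string paramCard = "../../Cards/param_card.dat";
  if( !fileExists( paramCard ) ) paramCard = "../" + paramCard; // one-level fallback, as in Bridge.h
  std::cout << "Using param card: " << paramCard << std::endl;
  return 0;
}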
@@ -1,17 +1,18 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Jan 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.

#include "BridgeKernels.h"

#include "GpuAbstraction.h"
#include "MemoryAccessMomenta.h"

#include <sstream>

//============================================================================

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -45,7 +46,7 @@ namespace mg5amcCpu

//============================================================================

#ifndef __CUDACC__
#ifndef MGONGPUCPP_GPUIMPL
namespace mg5amcCpu
{

@@ -96,7 +97,7 @@ namespace mg5amcCpu

//============================================================================

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
{

@@ -1,7 +1,7 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Jan 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.

#ifndef BRIDGEKERNELS_H
#define BRIDGEKERNELS_H 1
@@ -12,7 +12,7 @@
#include "MatrixElementKernels.h"
#include "MemoryBuffers.h"

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -49,7 +49,7 @@

//--------------------------------------------------------------------------

#ifndef __CUDACC__
#ifndef MGONGPUCPP_GPUIMPL
// A Bridge wrapper class encapsulating matrix element calculations on a CPU host
class BridgeKernelHost final : public BridgeKernelBase
{
@@ -89,7 +89,7 @@

//--------------------------------------------------------------------------

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
// A Bridge wrapper class encapsulating matrix element calculations on a GPU device
class BridgeKernelDevice : public BridgeKernelBase
{
@@ -1,15 +1,16 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Dec 2021) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2021-2023) for the MG5aMC CUDACPP plugin.

#include "CommonRandomNumbers.h"
#include "GpuAbstraction.h"
#include "MemoryBuffers.h"
#include "RandomNumberKernels.h"

#include <cassert>

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -1,10 +1,11 @@
// Copyright (C) 2020-2023 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Jan 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2022-2023) for the MG5aMC CUDACPP plugin.

#include "CrossSectionKernels.h"

#include "GpuAbstraction.h"
#include "MemoryAccessMatrixElements.h"
#include "MemoryAccessWeights.h"
#include "MemoryBuffers.h"
@@ -77,7 +78,7 @@ debug_me_is_abnormal( const fptype& me, size_t ievtALL )

//============================================================================

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
@@ -185,7 +186,7 @@ namespace mg5amcCpu

//============================================================================

#ifdef __CUDACC__
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
{

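A pattern that recurs throughout this diff is the switch of the namespace guard from __CUDACC__ to MGONGPUCPP_GPUIMPL. As the comments quoted in the hunks above explain (see #318 and #725), the same type names are defined differently for CPU and GPU builds, so each build lives in its own namespace; the new guard simply extends this selection from CUDA alone to both CUDA and HIP. A minimal sketch of the pattern:

// Illustrative sketch of the dual-namespace selection used across these files
#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu // GPU (CUDA or HIP) build
#else
namespace mg5amcCpu // CPU build
#endif
{
  struct Example { int value; }; // stand-in for the real per-build types
}
#ifdef MGONGPUCPP_GPUIMPL
using namespace mg5amcGpu;
#else
using namespace mg5amcCpu;
#endif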