[core] Test GPGPU (OpenCL/CUDA) backend with AMGCL solver #3831
Conversation
Regarding increased memory consumption during compilation: I think we could keep just the declarations of …

dc926f4 seems to work. @RiccardoRossi, can you check whether this reduces the compilation memory requirements? I suspect this may also reduce the overall compile time of KratosCore, since all amgcl headers are now moved to a separate compilation unit.
(force-pushed from 2fd4aab to 7519c45)
7519c45 allows choosing between the OpenCL and CUDA backends with a CMake option.
When compiling with cotire and the flag -DUSE_COTIRE=ON, the following warnings are issued (however, the code compiles and runs): …

Correction: when running the compilation with those flags, the following error is issued: …

I don't think the error is related to the changes here?
Re cotire: sakra/cotire#135

Yes, it looks like the cotire error is the one you indicate... however, I did not really understand how the patch should be applied.

Hmm, this one is also relevant, although it still does not give a solution (I think you are doing what they suggest): …

I did not do anything to solve the cotire issue; I think this should be solved on the cotire side first.
(force-pushed from c0d09cf to 2632dfa)
Rebased onto the current master and squashed the commits.
(force-pushed from 2632dfa to 8324752)
I have been trying things with cotire (including trying out the newest version). Unfortunately, cotire is needed, since we use it in our CI. I verified that the latest master does compile with cotire without any problem, so the problem comes from the local modifications of the CMakeLists.txt. I am 95% sure that the problem comes from … Or can't we use "add_define" instead and define it globally for all of KratosCore?
Modifying global state is considered extremely bad practice in CMake, but since this is how all of Kratos is currently built anyway, I think we can do that.

@RiccardoRossi, is c702ec1 what you had in mind? Does that work?
25048ef applies the cotire patch from sakra/cotire#155. With this, I am able to compile with both …
(force-pushed from cc1ccc1 to 25048ef)
Enable the vexcl backend for the amgcl solver, which makes it possible to use a GPGPU (either CUDA or OpenCL) in order to accelerate the solution.

* CMake option AMGCL_GPGPU (default: OFF) controls whether to compile GPGPU support.
* CMake option AMGCL_GPGPU_BACKEND (default: OpenCL) selects the vexcl backend (OpenCL/CUDA).
* A new setting in the linear solver parameters, `use_gpgpu`, enables GPGPU at runtime.
* The environment variable OCL_DEVICE may be used to select a particular compute device.
(force-pushed from 25048ef to a034674)
I've rebased the PR onto the current master. With this, I am able to compile the Kratos core with …
(force-pushed from 962ef32 to a034674)
Since we are instantiating the templates explicitly, nothing stops us from converting the templates to plain functions.
(force-pushed from bcbe915 to 22f0b38)
Approving... in case AppVeyor builds... :D

It does!

This looks cool, @ddemidov!
You can choose either OpenCL as the GPGPU backend:

    cmake -DAMGCL_GPGPU=ON -DAMGCL_GPGPU_BACKEND=OpenCL ...

or CUDA:

    cmake -DAMGCL_GPGPU=ON -DAMGCL_GPGPU_BACKEND=CUDA ...

For OpenCL to work you need libOpenCL, an OpenCL ICD, and the OpenCL headers. The library and the ICD usually come with the graphics drivers, and the OpenCL headers may be installed as part of an OpenCL SDK or the opencl-headers package. For CUDA you need to install the Nvidia CUDA Toolkit.

The solver uses the vex::Filter::Env device filter, which means you can control which device to use with an environment variable:

    $ clinfo | grep 'Device Name'
    Device Name    Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
    Device Name    Tesla K40c
    Device Name    GeForce GT 610
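As a concrete sketch of the vex::Filter::Env mechanism described above (the device-name substring is taken from the example clinfo output; the run command is illustrative, not part of this PR):

```shell
# Pin the solver to the Tesla K40c from the listing above;
# OCL_DEVICE matches devices by a substring of their name.
export OCL_DEVICE=Tesla

# Then run the Kratos case as usual, e.g. (script name is illustrative):
# python MainKratos.py
```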
I think @RiccardoRossi did some experiments regarding the performance of the GPGPU solver in Kratos.
Yes, I did: I have a GTX 1070, and it is about 3-3.5 times faster, everything included, than my i7-8700 (once again, kudos @ddemidov). One cool thing is that by using OpenCL instead of CUDA you can also use AMD GPUs, if you have one.
BTW, we need to add the usage comments to the wiki.

Ping: it would be nice if you could make a small entry in the Wiki.
Hi, this is very good news and I am trying it out currently. Here are my initial experiences. Mind you, I am not an expert on compiling or on how to exploit the advantages of GPU usage, but I would try to support this endeavor by at least having a go at it.

As @ddemidov mentions, the changes regarding setting it up and using it with Kratos are the following. In the configure files (or the OpenCL version):

    -DAMGCL_GPGPU=ON
    -DAMGCL_GPGPU_BACKEND=CUDA

To be able to use it, the AMGCL solvers are selected with the specific flag in the project parameters:

    "use_gpgpu" : true

In case the Wiki part will be written/updated, one should not forget the changes needed for boost, as it needs to be bootstrapped and installed:

    ./bootstrap.sh
    ./b2 install

as well as properly exported. I needed the following settings:

    export CPP_INCLUDE_PATH=~/<path_to_folder>/boost_1_69_0/include:$CPP_INCLUDE_PATH
    export LD_LIBRARY_PATH=~/<path_to_folder>/boost_1_69_0/lib:$LD_LIBRARY_PATH

As far as my recent experience goes, that is the easier part. I tried to set it up on 2 desktops today: …

It is also not straightforward to know when every component version is correct and installed properly (at least not for me). After the following checks it seems to be ready to use:

    ~$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Nov__3_21:07:56_CDT_2017
    Cuda compilation tools, release 9.1, V9.1.85

    ~$ nvidia-smi
    Thu Aug  8 17:25:58 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro P400        Off   | 00000000:65:00.0  On |                  N/A |
    | 34%   43C    P8    N/A /  N/A |    548MiB /  1997MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

where the NVIDIA driver number and CUDA version are displayed and are correct. It took some iterations to remove incorrect versions and set everything up.

Compiling Kratos with that one running configuration (hardware + software) works with both the CUDA and OpenCL flags. I did some test runs with the 2D and 3D cylinder cases from Kratos Fluid using monolithic VMS. No extensive performance testing, as I am not aware how this could be done objectively on a local machine. Maybe some hints? @ddemidov

I tried it out in a rather primitive way using … Interestingly, the 3D case (with 140k nodes and 820k elements) went through with … Can it happen that the same hardware manages to handle a case with one type of compilation but not with another? @ddemidov It seems that on my hardware, compiled with the …

Does one still need to take care of the environment variable …?

Specs for NVIDIA Quadro P400: …

Running on a Fujitsu desktop with: …
Right, although I prefer to use the system version of the boost libraries (installed with something like …
I have the same experience. Nvidia has in recent years made it hard for owners of 'old' hardware: they drop driver support for what they consider 'old' very quickly, and the latest CUDA toolkit versions require the latest driver versions. It is much easier to use their OpenCL, though: it does not seem to have this problem. So I would consider using …
I think the most objective way is just to measure the wallclock time needed to complete the whole run; I am not sure how to do this exactly in Kratos, though. Also, if you are using the OpenCL backend, make sure you are not using your CPU instead of the GPU (or the CPU together with the GPU). You can use environment variables to control the compute device choice; see the complete list here: https://vexcl.readthedocs.io/en/latest/initialize.html#common-filters.
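For the wallclock measurement suggested above, a minimal approach on Linux could look like the following (the script name is hypothetical; OCL_TYPE is one of the Env-filter variables listed in the VexCL documentation linked above):

```shell
# Baseline run, with "use_gpgpu": false in the solver settings:
time python MainKratos.py

# GPU run: restrict the OpenCL device selection to GPUs only,
# so the CPU OpenCL platform is never picked by accident.
export OCL_TYPE=GPU
time python MainKratos.py
```

Comparing the two `real` times gives an end-to-end speedup figure that includes setup and transfer costs.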
Not sure what happens here. VexCL uses sparse matrices from the (closed-source) CUSPARSE library with the CUDA backend, so it is possible the matrices require more memory, either during construction or permanently.
This does not look like a very fast GPU (judging by the memory bandwidth). @RiccardoRossi ran his tests on a GTX 1070, which has a bandwidth of 256 GB/s. The performance of the solver should be roughly proportional to the available memory bandwidth, so your GPU should be about 8x slower than Riccardo's.
@ddemidov thanks for the hints! I will give it a try with the environment variables. With respect to memory requirements and consumption, should I understand that GPU computing tries to push the whole computation to the GPU? Or am I still misunderstanding it? @adityaghantasala @AndreasWinterstein it would be worth trying and testing on our more capable desktop machine. @RiccardoRossi any hints and experiences with respect to the problem type and size to run? Have you had similar issues, like not having enough memory?
In my experience it was approximately 3 times faster with the GPU than with the CPU. I have 8 GB of video RAM, so I did not reach that limit, neither with OpenCL nor with CUDA, but admittedly the case only had approx. 1M elements.
AMGCL constructs the AMG hierarchy (the set of coarser and coarser system matrices, together with the inter-level transfer operators) on the CPU, and then transfers the complete hierarchy to the GPU. The whole computation is then done on the GPU.
@ddemidov, I think you should mention that you did implement the whole computation on the GPU, but that after testing, the CPU was faster for the "preparation phase", even including the transfer time.
I don't have an option to do the setup on the GPU, but we did compare performance with the cusp library, which does the complete setup GPU-side. It appeared our approach was faster (and more memory-efficient). See the benchmarks here: https://amgcl.readthedocs.io/en/latest/benchmarks.html#d-poisson-problem.
Enable the vexcl backend for the amgcl solver, which makes it possible to use a GPGPU (either CUDA or OpenCL) in order to accelerate the solution.

* AMGCL_GPGPU (default: OFF) controls whether to compile GPGPU support.
* AMGCL_GPGPU_BACKEND (default: OpenCL) selects the vexcl backend (OpenCL/CUDA).
* use_gpgpu (default: false) enables GPGPU at runtime.
* OCL_DEVICE may be used to select a particular compute device.

VexCL sources are in external_libraries. It is enough to clone and configure VexCL anywhere for cmake to pick it up.
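For an out-of-tree VexCL checkout as described above, the steps might look like this (the clone location is arbitrary and illustrative; the upstream repository is ddemidov/vexcl):

```shell
# VexCL sources also ship in external_libraries; alternatively,
# clone VexCL anywhere and configure it so cmake can pick it up:
git clone https://github.com/ddemidov/vexcl.git ~/src/vexcl

# Then configure Kratos with GPGPU support enabled:
cmake -DAMGCL_GPGPU=ON -DAMGCL_GPGPU_BACKEND=OpenCL ...
```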