

@ddemidov
Member

@ddemidov ddemidov commented Jan 15, 2019

Enable the VexCL backend for the AMGCL solver, which makes it possible to use
GPGPU (either CUDA or OpenCL) to accelerate the solution.

  • CMake option AMGCL_GPGPU (default: OFF) controls whether to compile
    GPGPU support.
  • CMake option AMGCL_GPGPU_BACKEND (default: OpenCL) selects vexcl
    backend (OpenCL/CUDA)
  • New setting in linear solver parameters: use_gpgpu (default: false) enables GPGPU at
    runtime.
  • Environment variable OCL_DEVICE may be used to select a particular
    compute device.
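The sketch below shows one way the compile-time option and the runtime setting could interact. It is a minimal illustration under stated assumptions and not Kratos' actual code. It assumes the CMake option defines an AMGCL_GPGPU preprocessor macro, as the target_compile_definitions(KratosCore PUBLIC AMGCL_GPGPU) snippet quoted later in this thread suggests. The function name SelectAmgclBackend is hypothetical. Falling back to AMGCL's CPU ("builtin") backend when GPGPU support was not compiled is also an assumption:

```cpp
#include <string>

// Hypothetical sketch, not Kratos' actual code.
// AMGCL_GPGPU is assumed to be defined by the build when -DAMGCL_GPGPU=ON;
// use_gpgpu mirrors the runtime linear solver setting of the same name.
inline std::string SelectAmgclBackend(bool use_gpgpu)
{
#ifdef AMGCL_GPGPU
    if (use_gpgpu) {
        return "vexcl";  // GPGPU support compiled in and requested at runtime
    }
#else
    (void)use_gpgpu;     // GPGPU support not compiled: the flag cannot take effect
#endif
    return "builtin";    // assumed fallback: AMGCL's CPU (builtin) backend
}
```

Both switches are therefore needed: the CMake option makes the GPU path exist at all, and use_gpgpu picks it per solver.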

It is enough to clone and configure VexCL anywhere for cmake to pick it up.
VexCL sources are in external_libraries.

@ddemidov
Member Author

Regarding increased memory consumption during compilation:

I think we could keep just the declarations of AMGCLScalarSolve and AMGCLBlockSolve function templates in amgcl_solver.h, and move the actual definitions (and explicit instantiations for all possible combinations of TSparseSpaceType and TBlockSize) into a cpp file. This file would be the only one to actually include vexcl headers. This way compilation of the rest of Kratos should not be affected.
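The split proposed above can be sketched in a single file. BlockKernelSketch is a hypothetical stand-in for AMGCLBlockSolve, and its body is a placeholder for the heavy AMGCL/VexCL code. In the real layout, the declaration would live in amgcl_solver.h, and the definition plus the explicit instantiations would live in the .cpp file:

```cpp
#include <cstddef>
#include <vector>

// ---- header part (e.g. amgcl_solver.h): declaration only ----
// Hypothetical signature; code including this sees no VexCL/AMGCL headers.
template <class TValue, std::size_t TBlockSize>
TValue BlockKernelSketch(const std::vector<TValue>& rhs);

// ---- source part (e.g. a .cpp file): the only place with heavy includes ----
template <class TValue, std::size_t TBlockSize>
TValue BlockKernelSketch(const std::vector<TValue>& rhs)
{
    // Placeholder for the expensive solver code: sum the entries and scale
    // by the block size so each instantiation gives a distinct result.
    TValue sum = TValue(0);
    for (const TValue& v : rhs) {
        sum += v;
    }
    return sum * static_cast<TValue>(TBlockSize);
}

// Explicit instantiations for every supported combination. Other translation
// units that only see the declaration link against these symbols.
template double BlockKernelSketch<double, 1>(const std::vector<double>&);
template double BlockKernelSketch<double, 2>(const std::vector<double>&);
template double BlockKernelSketch<double, 3>(const std::vector<double>&);
```

The trade-off of this layout shows up at link time. A combination that was not instantiated still compiles in other translation units, but it fails with an undefined reference.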

@ddemidov
Member Author

dc926f4 seems to work. @RiccardoRossi, can you check if this reduces compilation memory requirements? I suspect this may also reduce the overall compile time of KratosCore, since all amgcl headers are now moved to a separate compilation unit.

@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from 2fd4aab to 7519c45 Compare January 17, 2019 12:47
@ddemidov
Member Author

7519c45 makes it possible to choose between the OpenCL and CUDA backends with a CMake option.
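For reference, VexCL itself selects its backend at compile time through preprocessor macros: VEXCL_BACKEND_OPENCL, VEXCL_BACKEND_CUDA, VEXCL_BACKEND_COMPUTE and VEXCL_BACKEND_JIT, with OpenCL as the default. As far as we know, VexCL's CMake targets such as VexCL::OpenCL and VexCL::CUDA set these macros. The sketch below reports which backend a translation unit was compiled for, assuming these macro names. How AMGCL_GPGPU_BACKEND maps onto them in Kratos' CMakeLists.txt is not shown here:

```cpp
#include <string>

// Reports the VexCL backend selected at compile time. Assumes VexCL's
// VEXCL_BACKEND_* macro names; when none is defined, VexCL defaults to
// OpenCL, and this sketch mirrors that.
inline std::string CompiledVexclBackend()
{
#if defined(VEXCL_BACKEND_CUDA)
    return "CUDA";
#elif defined(VEXCL_BACKEND_COMPUTE)
    return "Boost.Compute";
#elif defined(VEXCL_BACKEND_JIT)
    return "JIT";
#else
    return "OpenCL";
#endif
}
```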

@RiccardoRossi
Member

When compiling with cotire and the following flags:

-DUSE_COTIRE=ON
-DMPI_NEEDED=ON
-DAMGCL_GPGPU=ON
-DAMGCL_GPGPU_BACKEND=CUDA \

the following warnings are issued (however, the code compiles and runs):

-- CXX target mpipython cotired without precompiled header. Too few applicable sources.
--------------------------------  standard install dir /home/riccardo/Kratos
installed blas = /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
installed lapack = /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1/usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
-- Configuring done
CMake Error at cmake_modules/cotire.cmake:2254 (file):
  Error evaluating generator expression:

    $<CXX_COMPILER_ID:MSVC>

  $<CXX_COMPILER_ID> may only be used with binary targets.  It may not be
  used with add_custom_command or add_custom_target.
Call Stack (most recent call first):
  cmake_modules/cotire.cmake:2848 (cotire_generate_target_script)
  cmake_modules/cotire.cmake:3254 (cotire_process_target_language)
  cmake_modules/cotire.cmake:3431 (cotire_target)
  kratos/CMakeLists.txt:184 (cotire)


CMake Error at cmake_modules/cotire.cmake:2254 (file):
  Error evaluating generator expression:

    $<CXX_COMPILER_ID:MSVC>

  $<CXX_COMPILER_ID> may only be used with binary targets.  It may not be
  used with add_custom_command or add_custom_target.
Call Stack (most recent call first):
  cmake_modules/cotire.cmake:2848 (cotire_generate_target_script)
  cmake_modules/cotire.cmake:3254 (cotire_process_target_language)
  cmake_modules/cotire.cmake:3431 (cotire_target)
  kratos/CMakeLists.txt:197 (cotire)


-- Generating done

@RiccardoRossi
Member

RiccardoRossi commented Jan 18, 2019

correction:

when running a case compiled with the flags

-DUSE_COTIRE=ON
-DMPI_NEEDED=ON
-DAMGCL_GPGPU=ON
-DAMGCL_GPGPU_BACKEND=CUDA \

the following error is issued

 riccardo  ~/.../AMGCLissue/test_cylinder_mpi.gid  mpirun --np 4 python3 MainKratos.py 
Traceback (most recent call last):
  File "MainKratos.py", line 3, in <module>
    import KratosMultiphysics
  File "/home/riccardo/Kratos/KratosMultiphysics/__init__.py", line 13, in <module>
    from Kratos import *
ImportError: dynamic module does not define module export function (PyInit_Kratos)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[54811,1],1]
  Exit code:    1

@ddemidov
Member Author

I don't think the error is related to the changes here?

@ddemidov
Member Author

Re cotire: sakra/cotire#135

@RiccardoRossi
Member

Yes, it looks like the cotire error is the one you indicate... however, I did not really understand how the patch should be applied.

@RiccardoRossi
Member

mmm also this one is relevant

sakra/cotire#94

although it still does not give a solution (I think you are doing what they suggest)

@ddemidov
Copy link
Member Author

I did not do anything to solve the cotire issue; I think this should be solved on the cotire side first.

@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from c0d09cf to 2632dfa Compare January 22, 2019 19:07
@ddemidov
Member Author

Rebased onto the current master and squashed the commits.

@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from 2632dfa to 8324752 Compare January 22, 2019 19:10
@ddemidov ddemidov changed the title Test opencl backend with amgcl solver [core] Test GPGPU (OpenCL/CUDA) backend with AMGCL solver Jan 22, 2019
@RiccardoRossi
Member

I have been trying stuff with cotire (including trying out the newest version). Unfortunately cotire is needed since we use it in our CI.

I verified that the latest master does compile with cotire without any problem, so the problem comes from the local modifications to CMakeLists.txt.

I am 95% sure that the problem comes from

  set_property(CACHE AMGCL_GPGPU_BACKEND PROPERTY STRINGS "OpenCL" "CUDA")

or

  target_compile_definitions(KratosCore PUBLIC AMGCL_GPGPU)

can't we use "add_definitions" instead and define it globally for all of KratosCore?

@ddemidov
Member Author

Modifying global state is considered extremely bad practice in cmake, but since this is how all of Kratos is currently built anyway, I think we can do that.

@ddemidov
Member Author

@RiccardoRossi , is c702ec1 what you had in mind? Does that work?

@ddemidov
Member Author

ddemidov commented Jan 23, 2019

25048ef applies the cotire patch from sakra/cotire#155. With this, I am able to compile with both AMGCL_GPGPU=ON and USE_COTIRE=ON.

@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from 25048ef to a034674 Compare February 8, 2019 13:37
@ddemidov
Member Author

ddemidov commented Feb 8, 2019

I've rebased the PR onto the current master.
271886c adds the VexCL sources (MIT licensed, around 60K lines, https://github.com/ddemidov/vexcl) to external_libraries, and a034674 patches the local VexCL CMake files to resolve the cotire issue (sakra/cotire#94).

With this, I am able to compile Kratos core with -DUSE_COTIRE=ON and no warnings.

@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from 962ef32 to a034674 Compare February 9, 2019 19:12
Since we are instantiating the templates explicitly,
nothing stops us from converting the templates to plain functions.
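Below is one possible shape for such a conversion. This is an assumption and not necessarily what the commit does. The template is kept as an internal implementation detail, and the public API becomes a plain function that dispatches on the block size at runtime. ScaledSum and ScaledSumImpl are hypothetical placeholders for the solver functions:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace {

// Internal template: only visible inside this translation unit.
template <std::size_t TBlockSize>
double ScaledSumImpl(const std::vector<double>& rhs)
{
    double sum = 0.0;
    for (double v : rhs) {
        sum += v;
    }
    return sum * static_cast<double>(TBlockSize);
}

} // namespace

// Plain (non-template) function for the public header: the block size becomes
// a runtime argument, dispatched to the fixed set of supported sizes.
double ScaledSum(std::size_t block_size, const std::vector<double>& rhs)
{
    switch (block_size) {
        case 1: return ScaledSumImpl<1>(rhs);
        case 2: return ScaledSumImpl<2>(rhs);
        case 3: return ScaledSumImpl<3>(rhs);
        case 4: return ScaledSumImpl<4>(rhs);
        default: throw std::invalid_argument("unsupported block size");
    }
}
```

With this shape, an unsupported block size is reported at runtime instead of as a missing symbol at link time.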
@ddemidov ddemidov force-pushed the core/amgcl-test-opencl branch from bcbe915 to 22f0b38 Compare February 13, 2019 11:55
@RiccardoRossi
Member

approving...in case appveyor builds... :D

@ddemidov ddemidov merged commit e01fbd1 into master Feb 13, 2019
@ddemidov
Member Author

It does!

@ddemidov ddemidov deleted the core/amgcl-test-opencl branch February 13, 2019 19:00
@philbucher
Member

This looks cool, @ddemidov!
I read through this PR, and in order to use the GPU I should

  • enable it in the configure file
  • set use_gpgpu in the solver settings

That should be it, right? Then amgcl will automatically make use of the available hardware?

@ddemidov
Member Author

ddemidov commented Feb 14, 2019

  • Enable the GPGPU support in cmake:

You can choose as GPGPU backend either OpenCL:

cmake -DAMGCL_GPGPU=ON -DAMGCL_GPGPU_BACKEND=OpenCL ...

or CUDA:

cmake -DAMGCL_GPGPU=ON -DAMGCL_GPGPU_BACKEND=CUDA ...

For OpenCL to work you need libOpenCL, an OpenCL ICD, and the OpenCL headers. The library and ICD usually come with the graphics drivers, and the OpenCL headers may be installed as part of an OpenCL SDK or with the opencl-headers package.

For CUDA you need to install the Nvidia CUDA Toolkit.

  • Then you can use use_gpgpu in the solver settings.

The solver uses the vex::Filter::Env device filter, which means you can control which device to use with the environment variable OCL_DEVICE=<part of device name>. For example, use OCL_DEVICE=Intel to work on an Intel CPU with OpenCL support, or OCL_DEVICE=Tesla to choose an Nvidia Tesla GPU. With OpenCL, you can install the clinfo package (or it may come with the graphics drivers) and use it to list the available devices and their names. Here is its output on my machine:

$ clinfo | grep 'Device Name'
  Device Name    Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
  Device Name    Tesla K40c
  Device Name    GeForce GT 610
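The idea behind OCL_DEVICE can be illustrated without VexCL: keep every device whose name contains the given substring. This is a hypothetical re-implementation for illustration only. FilterByName and FilterByEnv are not VexCL functions, and vex::Filter::Env operates on real device objects. The usage below relies on the device names from the clinfo output above:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical illustration of OCL_DEVICE-style filtering: keep every device
// whose name contains `pattern`. Not VexCL code.
inline std::vector<std::string> FilterByName(const std::vector<std::string>& devices,
                                             const std::string& pattern)
{
    std::vector<std::string> selected;
    for (const auto& name : devices) {
        if (name.find(pattern) != std::string::npos) {
            selected.push_back(name);
        }
    }
    return selected;
}

// Applies OCL_DEVICE if it is set; otherwise every device is kept.
inline std::vector<std::string> FilterByEnv(const std::vector<std::string>& devices)
{
    if (const char* pattern = std::getenv("OCL_DEVICE")) {
        return FilterByName(devices, pattern);
    }
    return devices;
}
```

With the three devices listed above, "Tesla" keeps only the Tesla K40c, and "Intel" keeps only the CPU.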

@ddemidov
Member Author

I think @RiccardoRossi did some experiments regarding performance of GPGPU solver in Kratos.

@RiccardoRossi
Member

Yes, I did:

I have a GTX 1070; everything included, it is about 3-3.5 times faster than my i7-8700.

(once again, kudos @ddemidov )

One cool thing is that by using OpenCL instead of CUDA, you can also use AMD GPUs if you have one.
This is pretty unique...

@RiccardoRossi
Member

BTW, we need to add the usage comments to the wiki

@philbucher
Member

Ping: it would be nice if you could make a small entry in the Wiki.
FYI @AndreasWinterstein @adityaghantasala

@mpentek
Member

mpentek commented Aug 8, 2019

Hi, this is very good news and I am trying it out currently. Here are my initial experiences. Mind you, I am not an expert on compiling and how to exploit the advantage of GPU-usage, but would try to support this endeavor by at least having a go at it.

As @ddemidov mentions, the changes needed to set it up and use it with Kratos are the following:

in the configure file (or with OpenCL instead of CUDA for the OpenCL version)

-DAMGCL_GPGPU=ON
-DAMGCL_GPGPU_BACKEND=CUDA

so that, when an AMGCL solver is selected, the GPU can be enabled with the specific flag in the project parameters

"use_gpgpu" : true

When the Wiki part is written/updated, one should not forget the steps needed for Boost, which needs to be bootstrapped and installed

./bootstrap.sh
./b2 install

as well as properly exported. I needed the following settings:

export CPLUS_INCLUDE_PATH=~/<path_to_folder>/boost_1_69_0/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=~/<path_to_folder>/boost_1_69_0/lib:$LD_LIBRARY_PATH

As far as my recent experience goes, that is the easier part. I tried to set it up on 2 desktops today:

  1. Ubuntu 18.04 LTS x64 - Nvidia Quadro P400 - Nvidia Driver Version 430.26 - CUDA Version 10.2 -> setup, compiling, running ==>> WORKS

  2. Ubuntu 18.04 LTS x64 - Nvidia NVS 315 - Nvidia Driver Version 390.116 - tried CUDA Versions 9.1 and higher -> problem because: the hardware seems to require driver 390, which in turn seems to limit it to CUDA 9.1, which is not available (at least not out of the box) for Ubuntu 18.04 ==>> DOES NOT WORK

It is also not straightforward to know when every component version is correct and installed properly (at least not for me). After the following checks, it seems to be ready to use:

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
~$ nvidia-smi
Thu Aug  8 17:25:58 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P400         Off  | 00000000:65:00.0  On |                  N/A |
| 34%   43C    P8    N/A /  N/A |    548MiB /  1997MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

where the NVIDIA driver and CUDA versions are displayed and are correct. It took some iterations to remove incorrect versions and set everything up.

Compiling Kratos with that one working configuration (hardware + software) works with both the CUDA and OpenCL backends. I did some test runs with the 2D and 3D cylinder cases from Kratos Fluid using monolithic VMS. No extensive performance testing, as I am not aware how this could be done objectively on a local machine.

Maybe some hints? @ddemidov I tried it out in a rather primitive way, using taskset to pin to threads and timing it. Maybe that is already killing performance, as I pin it to processor threads? Also, regarding the next point: does running with GPU support mean that the computation exploits additional resources from the GPU (as much as it can get), or does it try to push the whole job to the GPU?

Interestingly, the 3D case (with 140k nodes and 820k elements) ran through with OpenCL, using circa 800 MB of the 2 GB available on my GPU, whereas with CUDA it threw CUDA Driver API Error (2 - CUDA_ERROR_OUT_OF_MEMORY).

Can it happen that the same hardware manages with one type of compilation to handle a case but not with another? @ddemidov

It seems that on my hardware, compiled with the CUDA flag, I can run cases of at most 600-700k elements. Could it be that, although it works with my GPU, the hardware is just not powerful enough to exploit this possibility? @RiccardoRossi what cases did you try it out with? The GPU you mention seems to have 8 GB available.

Does one still need to take care of the environment variable OCL_DEVICE and the compile flag -DUSE_COTIRE, or are those no longer relevant at the current state?

Specs for NVIDIA Quadro P400:
GPU Memory: 2 GB GDDR5
Memory Interface: 64-bit
Memory Bandwidth: Up to 32 GB/s
NVIDIA CUDA® Cores: 256
Graphics APIs: OpenGL 4.5, DirectX 12.0, Vulkan 1.0
Compute APIs: CUDA, DirectCompute, OpenCL

Running on Fujitsu desktop with:
Processor: Intel Xeon(R) W-2145 CPU @ 3.70GHz × 16
Memory: 62.5 GiB (DIMM DDR4 Synchronous 2666 MHz (0.4 ns) 16 GB x 4)

@ddemidov
Member Author

ddemidov commented Aug 8, 2019

one should not forget the changes needed for boost, as it needs to be bootstrapped and installed

Right, although I prefer to use the system version of the Boost libraries (installed with something like apt install libboost-all-dev on Ubuntu).

Ubuntu 18.04 LTS x64 - Nvidia NVS 315 - Nvidia Driver Version 390.116 - tried CUDA Versions 9.1 and higher -> problem because: the hardware seems to require driver 390, which in turn seems to limit it to CUDA 9.1, which is not available (at least not out of the box) for Ubuntu 18.04 ==>> DOES NOT WORK

I have the same experience. In recent years Nvidia has made it hard for owners of 'old' hardware: they drop driver support for what they consider 'old' very quickly, and the latest CUDA toolkit versions require the latest driver versions. Their OpenCL implementation is much easier to use, though: it does not seem to have this problem. So I would consider using -DAMGCL_GPGPU_BACKEND=OpenCL.

Compiling Kratos with that one working configuration (hardware + software) works with both the CUDA and OpenCL backends. I did some test runs with the 2D and 3D cylinder cases from Kratos Fluid using monolithic VMS. No extensive performance testing, as I am not aware how this could be done objectively on a local machine.

Maybe some hints? @ddemidov I tried it out in a rather primitive way, using taskset to pin to threads and timing it. Maybe that is already killing performance, as I pin it to processor threads? Also, regarding the next point: does running with GPU support mean that the computation exploits additional resources from the GPU (as much as it can get), or does it try to push the whole job to the GPU?

I think the most objective way is just to measure the wallclock time needed to complete the whole run. Not sure how to do this exactly in Kratos, though. Also, if you are using the OpenCL backend, make sure you are not using your CPU instead of the GPU (or the CPU together with the GPU). You can use environment variables to control the compute device choice:

OCL_DEVICE=GeForce # Choose device by name
OCL_PLATFORM=NVIDIA # Choose device by platform name
OCL_TYPE=GPU # Choose device by type

See the complete list here: https://vexcl.readthedocs.io/en/latest/initialize.html#common-filters.
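The sketch below shows how several such variables could narrow the choice together. It assumes that each variable that is set must match (logical AND), that names and platforms are matched by substring, and that the type is matched exactly. These semantics are assumptions; see the VexCL documentation linked above for the actual behaviour. DeviceInfo, DeviceCriteria, Matches, CriteriaFromEnv and SelectDevices are hypothetical, and so are the platform names used in the usage below:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical device description (what clinfo would report).
struct DeviceInfo {
    std::string name;      // e.g. "GeForce GT 610"
    std::string platform;  // OpenCL platform name
    std::string type;      // e.g. "GPU" or "CPU"
};

// Hypothetical criteria; an empty string means "no constraint".
struct DeviceCriteria {
    std::string name;      // from OCL_DEVICE
    std::string platform;  // from OCL_PLATFORM
    std::string type;      // from OCL_TYPE
};

// Assumed semantics: every non-empty criterion must match (logical AND).
inline bool Matches(const DeviceInfo& d, const DeviceCriteria& c)
{
    const auto contains = [](const std::string& s, const std::string& sub) {
        return sub.empty() || s.find(sub) != std::string::npos;
    };
    return contains(d.name, c.name) && contains(d.platform, c.platform) &&
           (c.type.empty() || d.type == c.type);
}

// Reads the three environment variables; unset ones impose no constraint.
inline DeviceCriteria CriteriaFromEnv()
{
    const auto get = [](const char* var) {
        const char* value = std::getenv(var);
        return std::string(value ? value : "");
    };
    return DeviceCriteria{get("OCL_DEVICE"), get("OCL_PLATFORM"), get("OCL_TYPE")};
}

// Keeps the devices that satisfy all given criteria.
inline std::vector<DeviceInfo> SelectDevices(const std::vector<DeviceInfo>& devices,
                                             const DeviceCriteria& criteria)
{
    std::vector<DeviceInfo> selected;
    for (const auto& d : devices) {
        if (Matches(d, criteria)) {
            selected.push_back(d);
        }
    }
    return selected;
}
```

Take the three devices from the clinfo listing earlier in the thread, with hypothetical platform names such as "NVIDIA CUDA" for the two Nvidia cards. Setting OCL_PLATFORM=NVIDIA and OCL_TYPE=GPU would then keep only those two cards. Adding OCL_DEVICE=GeForce would narrow the selection to the GT 610.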

Interestingly, the 3D case (with 140k nodes and 820k elements) ran through with OpenCL, using circa 800 MB of the 2 GB available on my GPU, whereas with CUDA it threw CUDA Driver API Error (2 - CUDA_ERROR_OUT_OF_MEMORY).

Not sure what happens here. VexCL uses sparse matrices from the (closed-source) CUSPARSE library with the CUDA backend, so it is possible that the matrices require more memory, either during construction or permanently.

Specs for NVIDIA Quadro P400:
GPU Memory: 2 GB GDDR5
Memory Interface: 64-bit
Memory Bandwidth: Up to 32 GB/s

This does not look like a very fast GPU (looking at memory bandwidth). @RiccardoRossi ran his tests on GTX 1070 which has the bandwidth of 256 GB/s. The performance of the solver should be roughly proportional to the available memory bandwidth, so your GPU should be about 8x slower than Riccardo's.
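The estimate above is the ratio of the two memory bandwidths $B$, assuming that the solver run time scales inversely with bandwidth:

```math
\frac{t_{\text{P400}}}{t_{\text{GTX 1070}}} \approx \frac{B_{\text{GTX 1070}}}{B_{\text{P400}}} = \frac{256\ \text{GB/s}}{32\ \text{GB/s}} = 8
```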

@mpentek
Member

mpentek commented Aug 8, 2019

@ddemidov thanks for the hints! I will give it a try with the environment variables.

With respect to memory requirements and consumption, I should understand that GPU computing tries to push the whole computation to the GPU, right? Or am I still misunderstanding it?

@adityaghantasala @AndreasWinterstein it would be worth trying and testing on our more capable desktop machine.

@RiccardoRossi any hints and experiences with respect to problem type and size to run? Have you had similar issues, like having not enough memory?

@RiccardoRossi
Member

RiccardoRossi commented Aug 8, 2019 via email

@ddemidov
Member Author

ddemidov commented Aug 8, 2019

With respect to memory requirements and consumption, I should understand that GPU computing tries to push the whole computation to the GPU, right? Or am I still misunderstanding it?

AMGCL constructs the AMG hierarchy (the set of coarser and coarser system matrices together with inter-level transfer operators) on the CPU, and then transfers the complete hierarchy to the GPU. The whole computation is then done on the GPU.
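Since the complete hierarchy ends up on the GPU, a rough way to judge whether a case fits is to add up the storage of every matrix in it. This is a back-of-the-envelope sketch. It assumes CSR storage with 8-byte values and 4-byte indices. It ignores the Krylov solver's work vectors and any backend-specific overhead, for example from CUSPARSE with the CUDA backend. CsrSize, CsrBytes and HierarchyBytes are hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Dimensions of one sparse matrix stored in CSR form.
struct CsrSize {
    std::size_t rows;
    std::size_t nnz;
};

// Bytes for one CSR matrix: values + column indices + row pointers.
// Assumed widths: 8-byte values, 4-byte indices (the real layout depends
// on the backend).
constexpr std::size_t CsrBytes(CsrSize m, std::size_t value_bytes = 8,
                               std::size_t index_bytes = 4)
{
    return m.nnz * (value_bytes + index_bytes) + (m.rows + 1) * index_bytes;
}

// Sum over every matrix in the hierarchy: the system matrix on each level
// plus the prolongation/restriction operators between levels.
inline std::size_t HierarchyBytes(const std::vector<CsrSize>& matrices)
{
    std::size_t total = 0;
    for (const auto& m : matrices) {
        total += CsrBytes(m);
    }
    return total;
}
```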

@RiccardoRossi
Member

RiccardoRossi commented Aug 9, 2019 via email

@ddemidov
Member Author

ddemidov commented Aug 9, 2019

I don't have an option to do the setup on the GPU, but we did compare performance with the cusplibrary, which does the complete setup GPU-side. It turned out that our approach was faster (and more memory-efficient). See the benchmarks here: https://amgcl.readthedocs.io/en/latest/benchmarks.html#d-poisson-problem.


5 participants