
Device configuration

A GPU's configuration can be determined by reading the fields of a cudaDeviceProp structure. The CUDA runtime fills in this structure on a system with an NVIDIA GPU.

The source code is here.

Instruction

To run the source code:

nvcc -o cuda-device-profile cuda-device-profile.cu

This should generate an executable:

cuda-device-profile

in the same directory.

Now run the executable:

./cuda-device-profile

and the output should be similar to this:

Device count: 1
Name: NVIDIA GeForce RTX 4060 Laptop GPU
Compute capability: 8.9
##### MEMORY INFO FOR DEVICE 0 ---
Total global memory: 8328511488
Total constant memory: 65536
##### MP INFORMATION FOR DEVICE 0 ---
Multiprocessor count: 24
Max threads per block: 1024
Max threads dimensions: (1024, 1024, 64)
Max grid dimensions: (2147483647, 65535, 65535)

The operating system is:

casablanca@casablanca-xps:~/Documents/projects/cuda/github/CUDA-examples$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

CUDA Compiler

The command to determine the compiler used for the example is nvcc --version.

The following is my output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
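The same compiler version can also be read from inside a program. The sketch below is not part of the original example; it assumes the file is saved with a .cu extension and compiled with nvcc, which defines the `__CUDACC__`, `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__` and `__CUDACC_VER_BUILD__` macros.

```cuda
#include <cstdio>

int main() {
#ifdef __CUDACC__
    // These macros are defined by nvcc when it compiles a CUDA source file.
    std::printf("nvcc version: %d.%d.%d\n",
                __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, __CUDACC_VER_BUILD__);
#else
    // Reached only if the file is built by a plain host compiler.
    std::printf("Not compiled with nvcc\n");
#endif
    return 0;
}
```

The printed numbers should correspond to the "release" and "V" fields reported by nvcc --version.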

Notes

A cudaDeviceProp object is defined by:

cudaDeviceProp prop;

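A comprehensive list of the attributes exposed by this object is given in the CUDA Runtime API documentation for cudaDeviceProp.

Some of the important attributes shown in this example are:

Device info: device count (returned by cudaGetDeviceCount rather than stored in cudaDeviceProp) and name.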
Max number of threads per block: maxThreadsPerBlock.
Max thread dimensions: maxThreadsDim.
Max grid dimensions: maxGridSize.
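The attributes listed above could be read along the following lines. This is a minimal sketch and not necessarily identical to the repository's cuda-device-profile.cu; the `bytesToGiB` helper is a hypothetical addition for readability, and the output labels simply mirror the sample output shown earlier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original source): convert bytes to GiB.
static double bytesToGiB(size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device count: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);  // fills prop for device `dev`
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        std::printf("Name: %s\n", prop.name);
        std::printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        std::printf("Total global memory: %zu bytes (%.2f GiB)\n",
                    prop.totalGlobalMem, bytesToGiB(prop.totalGlobalMem));
        std::printf("Total constant memory: %zu\n", prop.totalConstMem);
        std::printf("Multiprocessor count: %d\n", prop.multiProcessorCount);
        std::printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
        std::printf("Max threads dimensions: (%d, %d, %d)\n",
                    prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        std::printf("Max grid dimensions: (%d, %d, %d)\n",
                    prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}
```

Checking the return value of each runtime call makes a missing driver or an unsupported device visible immediately, rather than printing uninitialized fields.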

Blocks and threads info

Information about blocks and threads is important for deciding how a function will be parallelized on the GPU.

When a kernel (a function that runs on the GPU) is launched, we will see this notation:

<<<P, Q>>>

where
P denotes the number of blocks in the grid, which may run in parallel;
Q denotes the number of threads in each block.
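As an illustration of the notation, the sketch below launches a kernel with P blocks of Q threads. The kernel `addOne`, the helper `blocksFor`, and the sizes chosen are all hypothetical and not taken from the repository; Q is set to 256 on the assumption that it stays within the device's maxThreadsPerBlock (1024 in the sample output).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one element of the array.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] += 1;                        // guard the last, partly used block
}

// Hypothetical helper: blocks needed to cover n elements (ceiling division).
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1000;
    const int Q = 256;               // threads per block
    const int P = blocksFor(n, Q);   // number of blocks

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(int));

    addOne<<<P, Q>>>(d, n);
    cudaError_t err = cudaGetLastError();            // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }

    int host[n];
    cudaMemcpy(host, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Every element started at 0 and should have been incremented exactly once.
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (host[i] == 1);
    std::printf("P = %d, Q = %d, result %s\n", P, Q, ok ? "correct" : "incorrect");
    return ok ? 0 : 1;
}
```

Because P * Q can exceed n, the bounds check inside the kernel keeps the surplus threads in the last block from writing past the end of the array.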

The system in this example is a laptop equipped with an NVIDIA GeForce RTX 4060 Laptop GPU. The compute capability of each GPU series is published at https://developer.nvidia.com/cuda-gpus. You may compare it with the output for your own GPU.
