A GPU's configuration can be determined by reading the fields of a cudaDeviceProp structure. The CUDA runtime fills in this structure (through cudaGetDeviceProperties) on a system with an NVIDIA GPU.
The source code is here.
To compile the source code:
nvcc -o cuda-device-profile cuda-device-profile.cu
This should generate an executable:
cuda-device-profile
in the same directory.
Now run the executable:
./cuda-device-profile
and the output should be similar to this:
Device count: 1
Name: NVIDIA GeForce RTX 4060 Laptop GPU
Compute capability: 8.9
##### MEMORY INFO FOR DEVICE 0 ---
Total global memory: 8328511488
Total constant memory: 65536
##### MP INFORMATION FOR DEVICE 0 ---
Multiprocessor count: 24
Max threads per block: 1024
Max threads dimensions: (1024, 1024, 64)
Max grid dimensions: (2147483647, 65535, 65535)
The operating system is:
casablanca@casablanca-xps:~/Documents/projects/cuda/github/CUDA-examples$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
The command to determine the compiler version used for the example is nvcc --version.
The following is my output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
A cudaDeviceProp object is defined by:
cudaDeviceProp prop;
A comprehensive list of the attributes of this structure is given in the CUDA Runtime API documentation for cudaDeviceProp.
Some of the important attributes are shown in this example:
- Device info: count (reported by cudaGetDeviceCount rather than stored in cudaDeviceProp), name.
- Max number of threads per block: maxThreadsPerBlock.
- Max thread dimensions: maxThreadsDim.
- Max grid dimensions: maxGridSize.
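The attributes listed above can be read with a few runtime calls. The following is a minimal sketch, not necessarily identical to cuda-device-profile.cu: it assumes the standard runtime functions cudaGetDeviceCount and cudaGetDeviceProperties, and the output labels are copied from the sample output shown earlier. The values printed depend on the GPU installed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // The device count comes from the runtime, not from cudaDeviceProp.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device count: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        // Fill one cudaDeviceProp structure per device.
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }

        std::printf("Name: %s\n", prop.name);
        std::printf("Compute capability: %d.%d\n", prop.major, prop.minor);

        // Memory sizes are reported in bytes (size_t).
        std::printf("Total global memory: %zu\n", prop.totalGlobalMem);
        std::printf("Total constant memory: %zu\n", prop.totalConstMem);

        // Limits that constrain how a kernel launch can be configured.
        std::printf("Multiprocessor count: %d\n", prop.multiProcessorCount);
        std::printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
        std::printf("Max threads dimensions: (%d, %d, %d)\n",
                    prop.maxThreadsDim[0], prop.maxThreadsDim[1],
                    prop.maxThreadsDim[2]);
        std::printf("Max grid dimensions: (%d, %d, %d)\n",
                    prop.maxGridSize[0], prop.maxGridSize[1],
                    prop.maxGridSize[2]);
    }
    return 0;
}
```

Checking the return value of each runtime call is a design choice of this sketch; it makes a missing driver or GPU show up as a readable error message instead of uninitialized output.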
Information about blocks and threads is important for setting up how a function will be parallelized on the GPU.
When a kernel is launched, we will see this notation:
<<<P, Q>>>
where
P denotes the number of blocks in the grid, which the GPU schedules across its multiprocessors;
Q denotes the number of threads in each block.
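To make the notation concrete, here is a minimal sketch of a launch with explicit P and Q. The kernel addOne, the helper blocksFor, and the sizes n = 1000 and Q = 256 are hypothetical names and values chosen for illustration; they are not taken from the example source. Q must not exceed maxThreadsPerBlock, and P must not exceed the first entry of maxGridSize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 1.0f to one array element.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last, partial block
        data[i] += 1.0f;
    }
}

// Number of blocks needed to cover n elements with q threads per block
// (integer ceiling division).
int blocksFor(int n, int q) {
    return (n + q - 1) / q;
}

int main() {
    const int n = 1000;             // number of elements (illustrative)
    const int Q = 256;              // threads per block
    const int P = blocksFor(n, Q);  // 4 blocks for n = 1000, Q = 256

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));  // all-zero bytes are 0.0f

    addOne<<<P, Q>>>(d_data, n);               // P blocks, Q threads each

    // A bad launch configuration is reported by cudaGetLastError;
    // errors during execution are reported by cudaDeviceSynchronize.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    float first = 0.0f;
    cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("Launched %d blocks of %d threads; data[0] = %.1f\n",
                P, Q, first);

    cudaFree(d_data);
    return 0;
}
```

Because P * Q (1024) is larger than n (1000), the bounds check inside the kernel keeps the extra threads from writing past the end of the array.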
The system in this example is a laptop equipped with an NVIDIA GeForce RTX 4060 Laptop GPU. The compute capability of each GPU series is published at https://developer.nvidia.com/cuda-gpus. You may compare it with the compute capability reported for your own GPU.