Back to Table of Content | Previous: Introduction | Next: Increment Array Example
GPUs act as side devices attached to a CPU, which plays the role of the host. The host-device programming model assigns work from the host to the device in the following fashion:
- Serial code executes on the host (CPU) thread.
- Parallel code executes on the Device (GPU) threads across multiple processing elements.
- A kernel is a function that runs on the GPU (Device), executed in parallel by many threads. Each thread runs the same code (the kernel) but operates on different pieces of data, leveraging the massive parallelism of the GPU architecture.
- Kernels are launched from the host (CPU) and run on the device (GPU), allowing the separation of serial (CPU) and parallel (GPU) workloads.
- Kernels are executed by thread blocks, which are groups of threads. Thread blocks are assigned to SMs (review: Two-Level Parallelism in GPUs). Within each block, threads are grouped into warps of 32 threads. The threads in a warp execute in a SIMT (Single Instruction, Multiple Threads) fashion, meaning all threads in the warp execute the same instruction at the same time, but on different data.
- The GPU schedules and manages thousands of threads across multiple thread blocks and warps, which enables massive parallel computation.
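The ideas above can be sketched in a minimal CUDA program (illustrative names such as `square`; error checking omitted for brevity). The kernel is the same code for every thread; each thread uses its block and thread indices to select a different element, and the launch configuration on the host chooses how many thread blocks and threads per block execute it:

```cuda
// Kernel: runs on the device (GPU). Every thread executes this same code
// (SIMT) but computes its own global index to work on a different element.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n)                                      // guard: n may not be a multiple of blockDim
        data[i] = data[i] * data[i];
}

int main() {
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));         // allocate device memory

    // Launched from the host: 4 thread blocks x 256 threads = 1024 threads.
    // Each block of 256 threads is executed as 8 warps of 32 threads.
    square<<<4, 256>>>(d_data, n);
    cudaDeviceSynchronize();                        // host thread waits for the device

    cudaFree(d_data);
    return 0;
}
```

The guard `if (i < n)` is the idiomatic way to launch more threads than elements without writing out of bounds.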
The diagram below illustrates how the Host-Device model works. The CPU's sequential code runs on a single CPU thread until it reaches the kernel launch. Once the GPU completes the kernel execution, the results are returned to the CPU thread.
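That flow can also be written out in code: serial setup on the CPU thread, a copy to the device, a kernel launch, and a copy of the results back to the host. A minimal sketch (illustrative kernel name `scale_by_two`; error checking omitted):

```cuda
#include <cstdio>

// Trivial kernel for illustration: each thread doubles one element.
__global__ void scale_by_two(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 256;
    float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;  // serial code on the CPU thread

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_by_two<<<1, n>>>(d_data, n);                 // parallel code on the GPU

    // This device-to-host copy synchronizes with the kernel, so the
    // results are complete when it returns control to the CPU thread.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    printf("h_data[3] = %.1f\n", h_data[3]);           // expect 6.0
    return 0;
}
```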
