This repository contains learning materials and exercises for NVIDIA Nsight Tools. The goal is to learn how to profile your application with NVIDIA Nsight Systems, NVIDIA Nsight Compute, and NVTX API calls to find performance limiters and bottlenecks, and to apply incremental parallelization strategies. The content was tested on NVIDIA driver 515.65.
- Introduction: Overview of profiling tools and Mini Weather application
- Lab 1: Profile the serial application to find hotspots using NVIDIA Nsight Systems
- Lab 2: Parallelise the serial application using OpenACC compute directives
- Lab 3: Optimizing loops
- Lab 4: Apply incremental parallelization strategies and use the profiler's report to guide the next step
- Lab 5: Nsight Compute Kernel Level Analysis
- [Optional]
- Lab 6: Performance analysis of an application using Nsight Systems and Nsight Compute (CUDA example)
- Advanced: Multiprocess profiling
The target audience for this lab is researchers, graduate students, and developers who are interested in getting hands-on experience with NVIDIA Nsight Systems by profiling a real-life parallel application.
While Labs 1-5 do not assume any CUDA experience, basic knowledge of OpenACC programming (e.g., compute constructs), GPU architecture, and programming experience with C/C++ is desirable.
The optional Lab 6 requires basic knowledge of CUDA programming, GPU architecture, and programming experience with C/C++.
The lab material will be presented in a 2.5-hour session. A link to download the material is available at the end of each lab.
- Please go through the list of existing bugs/issues or file a new issue on GitHub.