Create an example that runs one workflow on the default CUDA stream and another on a custom (non-default) stream. Show how to identify in Nsight Systems which stream a kernel ran on. Show that operations issued to a single stream execute on the GPU in issue order, while operations in different streams can run asynchronously with respect to each other.
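A minimal sketch of what such an example might look like follows. The kernel names (`add_one`, `times_two`), the helper `enqueue_workflow`, the function `run_workflows`, and the problem size are all hypothetical choices made for illustration. It assumes the custom stream is created with `cudaStreamNonBlocking`, so that it does not implicitly synchronize with the legacy default stream; compiling with `nvcc --default-stream per-thread` would be another way to get similar behavior.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical kernels: the second depends on the result of the first.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void times_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Enqueue copy-in, two dependent kernels, and copy-out on ONE stream.
// Within that stream the four operations run in the order issued here.
void enqueue_workflow(float* host, float* dev, int n, cudaStream_t stream) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    CUDA_CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream));
    add_one<<<blocks, threads, 0, stream>>>(dev, n);
    times_two<<<blocks, threads, 0, stream>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream));
}

// Runs both workflows and returns the number of incorrect output elements.
int run_workflows() {
    const int n = 1 << 20;  // assumed problem size
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *hA, *hB, *dA, *dB;
    // Pinned host memory lets the async copies overlap with other work.
    CUDA_CHECK(cudaMallocHost(&hA, bytes));
    CUDA_CHECK(cudaMallocHost(&hB, bytes));
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 1.0f; }

    cudaStream_t custom;
    CUDA_CHECK(cudaStreamCreateWithFlags(&custom, cudaStreamNonBlocking));

    enqueue_workflow(hA, dA, n, 0);       // workflow 1: default stream
    enqueue_workflow(hB, dB, n, custom);  // workflow 2: custom stream

    // Wait for all work on all streams before reading results on the host.
    CUDA_CHECK(cudaDeviceSynchronize());

    // In-order execution per stream means every element is (1 + 1) * 2 = 4.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hA[i] != 4.0f) ++mismatches;
        if (hB[i] != 4.0f) ++mismatches;
    }

    CUDA_CHECK(cudaStreamDestroy(custom));
    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFreeHost(hA));
    CUDA_CHECK(cudaFreeHost(hB));
    return mismatches;
}

int main() {
    int mismatches = run_workflows();
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Assuming the file is saved as `streams.cu`, it could be built with `nvcc -o streams streams.cu` and profiled with `nsys profile -o streams ./streams`. In the Nsight Systems timeline, the CUDA hardware section for the GPU should list kernels and memory copies in separate rows per stream, and selecting a kernel should show its stream ID in the details. Running `nsys stats --report cuda_gpu_trace` on the resulting report is another option, since that trace includes a stream column, though the report name may differ between Nsight Systems versions. Whether the two workflows visibly overlap depends on the GPU and on how many resources each kernel occupies; what holds regardless is that each stream's own operations never reorder.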