Add NVTX marker instrumentation #2345
edopao
left a comment
Definitely useful, in my opinion. Adding @egparedes because this PR helps when tracing asynchronous execution, since it makes the program call visible in the trace.
        dace.Memlet(f"{output}[0]"),
    )

    if (config.COLLECT_METRICS_LEVEL == metrics.GPU_TX_MARKERS) and gpu:
Your current check is at compile time. The check in the instrumentation above is at runtime. I propose always adding the NVTX instrumentation, because the config.COLLECT_METRICS_LEVEL value is meant to be a runtime setting. It is still possible to skip SDFG instrumentation at compile time by setting use_metrics=False on the dace backend.
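To make the compile-time versus runtime distinction concrete, here is a minimal sketch in plain Python. It is not gt4py or DaCe code: `Config`, `build_compile_time`, and `build_runtime` are hypothetical names, and a list of strings stands in for real NVTX range calls.

```python
from typing import Callable, List


class Config:
    """Hypothetical stand-in for a global settings module."""

    collect_markers: bool = False


def build_compile_time(body: Callable[[], None], log: List[str]) -> Callable[[], None]:
    """Read the flag once, while building the program."""
    if not Config.collect_markers:
        return body  # markers are absent from the built program

    def program() -> None:
        log.append("range_push")
        body()
        log.append("range_pop")

    return program


def build_runtime(body: Callable[[], None], log: List[str]) -> Callable[[], None]:
    """Always build the marker code; read the flag on every call."""

    def program() -> None:
        enabled = Config.collect_markers
        if enabled:
            log.append("range_push")
        body()
        if enabled:
            log.append("range_pop")

    return program


# Build both variants while the flag is off, then turn it on.
compile_time_log: List[str] = []
runtime_log: List[str] = []
Config.collect_markers = False
compiled = build_compile_time(lambda: None, compile_time_log)
dynamic = build_runtime(lambda: None, runtime_log)

Config.collect_markers = True
compiled()
dynamic()
print(compile_time_log)  # no markers: the flag was off at build time
print(runtime_log)       # markers recorded: the flag is on at call time
```

With the compile-time variant, changing the setting after the build has no effect until the program is rebuilt. The runtime variant reacts to the change but pays for one flag check per call.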
I am not sure I would want to enable NVTX ranges by default, since they add some (really minimal) overhead on the CPU side. If use_metrics=False can be set by default then it would be fine, but it would add some complexity for production and profiling runs.
Thanks @iomaganaris for this PR and @edopao for pointing me here. I would like to make this PR larger to add NVTX markers in the whole program/operator hot path, not only in the DaCe backend. I didn't have time today but I propose to discuss offline the scope and design of this together, since @edopao has also done some work with this.
LGTM but Enrique had some ideas to extend this PR.
egparedes
left a comment
After taking another look at this, I think it's orthogonal to the ongoing work to add wider support for GPU markers in gt4py, so we can just merge it without waiting any longer.
This PR does not add any new feature, it only changes the way to enable an existing feature introduced in #2345. The `COLLECT_METRICS_LEVEL` setting is supposed to be a runtime configuration that enables or disables the collection of metrics. Besides, the current GPU trace does not produce any metric. The NVTX/ROC-TX traces require introducing some calls in the generated code. Therefore we need a separate configuration variable, checked at lowering/compile time, that lets the user introduce the GPU TX markers in the generated code.
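As an illustration of a configuration variable checked at lowering time, the sketch below uses a toy code generator that emits the marker calls only when a flag is set. The environment variable name `EXAMPLE_ADD_GPU_TX_MARKERS` and the helpers `env_flag` and `lower_kernel_call` are assumptions made for this example, not the names used in gt4py. The emitted `nvtxRangePushA`/`nvtxRangePop` calls are the NVTX C API, and the kernel launch line is only a placeholder.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def lower_kernel_call(kernel_name: str, add_gpu_tx_markers: bool) -> str:
    """Emit toy source for one kernel call, optionally wrapped in an NVTX range."""
    lines = []
    if add_gpu_tx_markers:
        lines.append(f'nvtxRangePushA("{kernel_name}");')
    lines.append(f"{kernel_name}<<<grid, block>>>(args);")  # placeholder launch
    if add_gpu_tx_markers:
        lines.append("nvtxRangePop();")
    return "\n".join(lines)


# The flag is read once, when the code is generated (lowering time).
markers_enabled = env_flag("EXAMPLE_ADD_GPU_TX_MARKERS")
print(lower_kernel_call("my_stencil", markers_enabled))
```

Because the flag is read during lowering, changing it afterwards only takes effect once the code is regenerated, which is why it is kept separate from a runtime setting.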
Not sure if this is useful but I'm opening a PR to avoid losing these changes.
It generates the following NVTX ranges:

Requirements