Add NVTX marker instrumentation #2345

Merged
iomaganaris merged 6 commits into GridTools:main from
iomaganaris:nvtx_markers
Jan 13, 2026

Conversation

@iomaganaris
Contributor

Not sure if this is useful but I'm opening a PR to avoid losing these changes.

It generates the following NVTX ranges:
[image: the generated NVTX ranges]

Requirements

Contributor

@edopao edopao left a comment

Definitely useful, in my opinion. Adding @egparedes because this PR is useful in the tracing of asynchronous execution to visualize the program call.

Comment thread src/gt4py/next/program_processors/runners/dace/workflow/translation.py Outdated
dace.Memlet(f"{output}[0]"),
)

if (config.COLLECT_METRICS_LEVEL == metrics.GPU_TX_MARKERS) and gpu:
Contributor

Your current check happens at compile time, whereas the check in the instrumentation above happens at runtime. I propose always adding the NVTX instrumentation, because the config.COLLECT_METRICS_LEVEL value is meant to be a runtime setting. It is still possible to skip SDFG instrumentation at compile time by setting use_metrics=False on the dace backend.
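The difference between the two kinds of check can be sketched in plain Python. This is only an illustration under stated assumptions: `RUNTIME_CONFIG`, `range_push`, `range_pop`, `build_compile_time`, `build_runtime` and the numeric value of `GPU_TX_MARKERS` are hypothetical stand-ins, not the gt4py or DaCe API, and the marker calls just record events in a list.

```python
# Hypothetical stand-ins; none of these names are the gt4py or DaCe API.
GPU_TX_MARKERS = 2  # assumed numeric level, for illustration only
RUNTIME_CONFIG = {"collect_metrics_level": 0}

events = []  # records marker calls so the effect is observable


def range_push(name):
    events.append(("push", name))


def range_pop():
    events.append(("pop",))


def build_compile_time(name, body, level_at_build):
    """Decide once, while building: marker calls are either baked in or absent."""
    if level_at_build != GPU_TX_MARKERS:
        return body  # no instrumentation; later config changes have no effect

    def program(*args):
        range_push(name)
        try:
            return body(*args)
        finally:
            range_pop()

    return program


def build_runtime(name, body):
    """Always instrument; the configured level is read on every call."""

    def program(*args):
        enabled = RUNTIME_CONFIG["collect_metrics_level"] == GPU_TX_MARKERS
        if enabled:
            range_push(name)
        try:
            return body(*args)
        finally:
            if enabled:
                range_pop()

    return program
```

With the compile-time variant, changing the level after the program has been built does nothing until it is rebuilt. The runtime variant reacts immediately, at the cost of one extra check per call.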

Contributor Author

I am not sure I would be in favor of enabling NVTX ranges by default, since they add some (really minimal) overhead on the CPU side. If use_metrics=False can be set by default then that would be fine, but it would add some complexity for production and profiling runs.
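One way to put a number on that CPU-side overhead is a small micro-benchmark. The sketch below assumes the optional `nvtx` Python package and uses its `nvtx.annotate` context manager. If the package is not installed it falls back to a no-op context, in which case it only measures the cost of the `with` statement itself. `plain_call`, `marked_call` and `per_call_seconds` are illustrative names, not part of this PR.

```python
import contextlib
import timeit

try:
    import nvtx  # optional: NVIDIA's `nvtx` Python package

    def marked_range(name):
        # nvtx.annotate can be used as a context manager around a code region
        return nvtx.annotate(name)

except ImportError:

    def marked_range(name):
        # Fallback so the sketch still runs without the package installed
        return contextlib.nullcontext()


def plain_call():
    return sum(range(100))


def marked_call():
    with marked_range("program_call"):
        return sum(range(100))


def per_call_seconds(fn, number=10_000):
    """Average wall-clock time of one call to `fn`, in seconds."""
    return timeit.timeit(fn, number=number) / number


if __name__ == "__main__":
    base = per_call_seconds(plain_call)
    marked = per_call_seconds(marked_call)
    print(f"plain:  {base * 1e9:.0f} ns/call")
    print(f"marked: {marked * 1e9:.0f} ns/call")
```

The measured difference depends on the machine and on whether a profiler is attached, so it is best compared on the actual target system.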

@havogt havogt requested a review from egparedes October 27, 2025 09:58
@egparedes
Contributor

Thanks @iomaganaris for this PR and @edopao for pointing me here. I would like to make this PR larger to add NVTX markers in the whole program/operator hot path, not only in the DaCe backend. I didn't have time today, but I propose that we discuss the scope and design of this offline together, since @edopao has also done some work on this.

edopao
edopao previously approved these changes Dec 4, 2025
@edopao edopao dismissed their stale review December 4, 2025 11:32

LGTM but Enrique had some ideas to extend this PR.

egparedes
egparedes approved these changes Jan 13, 2026
Contributor

@egparedes egparedes left a comment

After taking another look at this, I think it's orthogonal to the ongoing work to add wider support for GPU markers in gt4py, so we can just merge it without waiting further.

@iomaganaris iomaganaris merged commit 422e064 into GridTools:main Jan 13, 2026
23 checks passed
edopao added a commit that referenced this pull request Jan 19, 2026
)

This PR does not add any new feature; it only changes how an existing
feature, introduced in #2345, is enabled.

The `COLLECT_METRICS_LEVEL` is supposed to be a runtime configuration
that enables or disables the collection of metrics. Besides, the current
GPU trace does not produce any metric.

The NVTX/ROC-TX traces require introducing some calls in the generated
code. Therefore, we need a separate configuration variable, checked at
lowering/compile time, that lets the user introduce the GPU TX markers
in the generated code.
3 participants