Description
I'm seeing a crash in unitrace when I enable collection only after the warmup kernels. I'm using unitrace to profile an OpenCL kernel, and I would like to use --start-paused so that only the main trials are profiled.
This is the command I used:
PTI_ENABLE_COLLECTION=0 unitrace --opencl --chrome-kernel-logging --chrome-itt-logging --start-paused --metric-sampling --output-dir-path ./profiler_output -o profiler_output/trace python task.py
Working version:
It works fine if I enable profiling for all trials:
os.environ["PTI_ENABLE_COLLECTION"] = "1"
<warmup trials>
<real trials>
os.environ["PTI_ENABLE_COLLECTION"] = "0"
Version with error:
However, I would like to start profiling only after the warmup trials:
<warmup trials>
os.environ["PTI_ENABLE_COLLECTION"] = "1"
<real trials>
os.environ["PTI_ENABLE_COLLECTION"] = "0"
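For anyone trying to reproduce this, the failing ordering can be sketched roughly as below. This is only a minimal sketch, not the actual contents of task.py: `run_trial`, `benchmark`, and `collection_enabled` are hypothetical names used for illustration, and `run_trial` stands in for one OpenCL kernel launch. The only thing taken from my setup is the `PTI_ENABLE_COLLECTION` toggle via `os.environ`.

```python
import os
from contextlib import contextmanager

ENV_VAR = "PTI_ENABLE_COLLECTION"  # variable name as used in this report


@contextmanager
def collection_enabled():
    """Set PTI_ENABLE_COLLECTION to "1" inside the block and back to "0" after."""
    os.environ[ENV_VAR] = "1"
    try:
        yield
    finally:
        # Pause collection again even if a trial raises.
        os.environ[ENV_VAR] = "0"


def run_trial(index):
    # Hypothetical placeholder for one OpenCL kernel launch in task.py.
    return index


def benchmark(n_warmup=3, n_trials=5):
    # Warmup trials run while collection is still paused (--start-paused).
    for i in range(n_warmup):
        run_trial(i)
    # Only the real trials run inside the enabled window.
    with collection_enabled():
        results = [run_trial(i) for i in range(n_trials)]
    return results
```

The ordering is the same as in the failing version above: the warmup kernels are enqueued first, and collection is switched on only afterwards.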
This results in the following error:
Error:
task.py::TestVectorAddOCL::test_benchmark1 python: /home/nwiedema/pti-gpu/tools/unitrace/src/opencl/cl_collector.h:1322: static void ClCollector::OnExitEnqueueKernel(cl_callback_data*, ClCollector*, uint64_t*) [with T = _cl_params_clEnqueueNDRangeKernel; cl_callback_data = _cl_callback_data; uint64_t = long unsigned int]: Assertion `*(params->event) != nullptr' failed.
Fatal Python error: Aborted
I only see this issue with OpenCL; with SYCL (without the --opencl flag) it works fine. I also tried Temporal and Out-of-Application Control (with --session), but got the same error.
Environment:
latest unitrace version (built from main)
GPU: ptl (but can reproduce on lnl)
OS: Linux
Thanks in advance!