How to analyze etdump results for QNN backend? #16285
Replies: 3 comments
For QNN profiling, I would mainly look at two aspects:
It describes some related debugging and profiling workflows around the Qualcomm backend, though you may already be familiar with it.
DELEGATE_CALL runs inside the Method::execute call. If the two times are similar, the delegate call (the execution time on the HTP) is dominant.
DELEGATE_CALL covers everything that runs on the HTP, while an individual operator entry (like aten_bmm) means that operator fell back to the CPU.
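As a rough illustration of that reading, here is a minimal Python sketch that splits profiling rows into the delegate call and standalone `aten_` operators, then reports each group's share of `Method::execute`. It works on plain dicts so it runs without ExecuTorch installed. The row layout (`event_name`, `avg_ms`), the helper name `summarize`, and the sample values are all hypothetical; you would have to adapt them to whatever columns your Inspector table actually exposes.

```python
from collections import defaultdict

# Hypothetical rows shaped like an exported profiling table.
# The timing values are made up purely for illustration.
SAMPLE_ROWS = [
    {"event_name": "Method::execute", "avg_ms": 10.0},
    {"event_name": "DELEGATE_CALL", "avg_ms": 9.0},
    {"event_name": "aten_bmm", "avg_ms": 0.5},
    {"event_name": "aten_add", "avg_ms": 0.25},
]


def summarize(rows, total_event="Method::execute"):
    """Return the share of total execute time spent in the delegate call
    and in standalone aten_ operators (treated here as CPU fallback)."""
    totals = defaultdict(float)
    for row in rows:
        name = row["event_name"]
        if name == total_event:
            totals["total"] += row["avg_ms"]
        elif name == "DELEGATE_CALL":
            totals["delegate"] += row["avg_ms"]
        elif name.startswith("aten_"):
            # Assumption from the thread: aten_ entries are CPU fallbacks.
            totals["cpu_fallback"] += row["avg_ms"]

    total = totals["total"]
    if total <= 0:
        raise ValueError(f"no positive time recorded for {total_event!r}")

    return {
        "delegate_share": totals["delegate"] / total,
        "cpu_fallback_share": totals["cpu_fallback"] / total,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_ROWS).items():
        print(f"{key}: {value:.1%}")
```

A delegate share close to 1 matches the "delegate call is dominant" case described above, while a large fallback share suggests checking which `aten_` operators were not lowered to the backend.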
We can use optrace or the debugger (https://github.com/pytorch/executorch/tree/main/backends/qualcomm/debugger#qairt-visualizer), as shared by @yujiaoliang.
@haowhsu-quic @shewu-quic @winskuo-quic @DannyYuyang-quic do we have guidance on this?
Hi @yujiaoliang
Yes, you can dump QHAS and optrace to observe the NPU utilization and TCM usage.
I have obtained etdump results from a QNN-LLM execution and used the Inspector API to generate the profiling table. However, I'm not sure how to interpret this table. Specifically, I have the following questions:

1. The table shows Accelerator (execute) time (cycles), Method::execute, and DELEGATE_CALL at the top. Their latencies are quite similar. Are these essentially representing the same thing?
2. There are also entries prefixed with aten_, which I assume correspond to different operators. How can I determine which of these are running on the CPU due to fallback, and which are actually running on the NPU?

Any guidance or documentation pointers would be greatly appreciated. Thank you!