Description:
When disassembling the CUDA binary with both nvdisasm and cuobjdump, I noticed that the load instructions inside the branch targets are completely omitted by the NVCC compiler, except for those marked with the .volatile modifier. As a result, the timing measured for these loads does not reflect the true L1 cache access latency.
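I changed the function uncached_access_timing_device to store temp to global memory at the end, so that the loaded values are actually used and the loads can no longer be optimized away. With this change, I get consistent latency across the different load modifiers.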
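To illustrate the fix, here is a minimal sketch that assumes a pointer-chasing measurement in the style of common GPU cache microbenchmarks. The names chase_kernel and average_load_cycles are hypothetical; they are not the repository's uncached_access_timing_device. The key line is the final store of the chased value to global memory, which keeps every timed load live.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical pointer-chasing kernel (not uncached_access_timing_device).
// Each load's address depends on the previous load's value, so the loads
// are serialized and cycles / iters approximates per-load latency.
__global__ void chase_kernel(const uint32_t *chain, int iters,
                             unsigned long long *cycles_out,
                             uint32_t *sink) {
    // Warm-up pass to bring the chain into cache. Its result is also
    // stored below; otherwise this loop would be dead code as well.
    uint32_t w = 0;
    for (int i = 0; i < iters; ++i) w = chain[w];

    uint32_t j = 0;
    unsigned long long start = clock64();
    for (int i = 0; i < iters; ++i) j = chain[j];
    unsigned long long stop = clock64();

    *cycles_out = stop - start;
    // Writing the loaded values to global memory keeps the loads live.
    // Without this store, nvcc may eliminate them as dead code.
    *sink = j ^ w;
}

// Hypothetical host helper: builds a cyclic chain with the given stride
// (in elements), runs the kernel on one thread, and returns the average
// cycles per load, or -1.0 if a CUDA call fails.
double average_load_cycles(int n_elems, int stride, int iters) {
    uint32_t *h_chain = new uint32_t[n_elems];
    for (int i = 0; i < n_elems; ++i)
        h_chain[i] = static_cast<uint32_t>((i + stride) % n_elems);

    uint32_t *d_chain = nullptr, *d_sink = nullptr;
    unsigned long long *d_cycles = nullptr;
    unsigned long long cycles = 0;
    bool ok =
        cudaMalloc(&d_chain, n_elems * sizeof(uint32_t)) == cudaSuccess &&
        cudaMalloc(&d_sink, sizeof(uint32_t)) == cudaSuccess &&
        cudaMalloc(&d_cycles, sizeof(unsigned long long)) == cudaSuccess &&
        cudaMemcpy(d_chain, h_chain, n_elems * sizeof(uint32_t),
                   cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        chase_kernel<<<1, 1>>>(d_chain, iters, d_cycles, d_sink);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&cycles, d_cycles, sizeof(cycles),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_chain);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    delete[] h_chain;
    return ok ? static_cast<double>(cycles) / iters : -1.0;
}
```

For example, average_load_cycles(1024, 1, 4096) chases a 4 KB chain that should fit in L1. Deleting the *sink store and inspecting the SASS with cuobjdump -sass should show the global load instructions disappear, which matches the behaviour described above. Real microbenchmarks often add further safeguards against compiler reordering around clock64().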
Thank you for your impressive work on the GPU Rowhammer attack! It has helped me a lot.
ShaopengLin and sushidoggg