Findings related to PTX ld modifiers (#3)

**Description:**
When I disassembled the CUDA binary with both nvdisasm and cuobjdump, I noticed that the load instructions inside branch targets are completely eliminated by the NVCC compiler, except for those marked with the `.volatile` modifier. Because the eliminated loads never execute, any timing measured around them does not reflect true L1-cache access latency.

I changed the function `uncached_access_timing_device` to store `temp` to global memory at the end, and I now get consistent latency across the different load modifiers.
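
To illustrate the idea, here is a minimal sketch of the kind of change I mean. It is not the repository's code: the kernel `timed_load_kernel`, the `sink` buffer, and the host-side `main` are hypothetical names I made up for this example, and only the variable name `temp` comes from the original function. The point is that writing `temp` to global memory makes the loaded value observable, so the load can no longer be treated as dead code regardless of the `ld` modifier.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical timing kernel (illustrative only, not the repository's code).
__global__ void timed_load_kernel(const uint64_t *addr,
                                  uint64_t *sink,
                                  uint64_t *cycles)
{
    uint64_t temp;

    long long start = clock64();
    // Load with the .ca cache operator and without the PTX .volatile qualifier.
    asm volatile("ld.global.ca.u64 %0, [%1];"
                 : "=l"(temp)
                 : "l"(addr)
                 : "memory");
    long long end = clock64();

    // Store the loaded value to global memory so that the load has a
    // visible side effect and cannot be removed as dead code.
    *sink = temp;
    *cycles = static_cast<uint64_t>(end - start);
}

int main()
{
    uint64_t *d_data = nullptr, *d_sink = nullptr, *d_cycles = nullptr;
    cudaMalloc(&d_data, sizeof(uint64_t));
    cudaMalloc(&d_sink, sizeof(uint64_t));
    cudaMalloc(&d_cycles, sizeof(uint64_t));

    // Give the load a known value to read.
    uint64_t h_value = 0x1234;
    cudaMemcpy(d_data, &h_value, sizeof(h_value), cudaMemcpyHostToDevice);

    // A single thread is enough for this sketch.
    timed_load_kernel<<<1, 1>>>(d_data, d_sink, d_cycles);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    uint64_t h_sink = 0, h_cycles = 0;
    cudaMemcpy(&h_sink, d_sink, sizeof(h_sink), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    printf("loaded value: 0x%llx, measured cycles: %llu\n",
           (unsigned long long)h_sink, (unsigned long long)h_cycles);

    cudaFree(d_data);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    return 0;
}
```

This sketch only shows how to keep the load alive. It does not try to guarantee that the second `clock64()` read waits for the load to complete, so the printed cycle count should be treated as indicative. Checking the result with `cuobjdump -sass` is a quick way to confirm that the load instruction is actually present in the binary.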

Thank you for your impressive work on the GPU Rowhammer attack! It has helped me a lot.
