Findings related to PTX ld modifiers. #3

@Repeater9

Description:
When disassembling the CUDA binary with both nvdisasm and cuobjdump, I noticed that NVCC completely omits the load instructions inside branch targets, except those marked with the .volatile modifier. As a result, the timings measured around these loads do not reflect the true L1-cache access latency.

I changed the function uncached_access_timing_device to store temp to global memory at the end, and I now get consistent latency across the different load modifiers.
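A minimal sketch of the kind of change I mean, under stated assumptions: this is not the repository's uncached_access_timing_device, whose exact signature I am not reproducing here. The kernel name timed_load_kernel and the parameters sink and cycles are hypothetical, and the .ca modifier stands in for whichever modifier is under test. The only point is that the loaded value ends up in a store to global memory, so the compiler cannot treat the load as dead code.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only (names are assumptions).
__global__ void timed_load_kernel(const uint64_t *addr,
                                  uint64_t *sink,
                                  long long *cycles)
{
    uint64_t temp = 0;
    uint64_t value;

    long long start = clock64();
    // Non-volatile inline PTX load with a cache modifier. If its output
    // were never used, the compiler would be free to delete it.
    asm("ld.global.ca.u64 %0, [%1];" : "=l"(value) : "l"(addr));
    temp += value;
    long long stop = clock64();

    *cycles = stop - start;
    // Storing temp to global memory makes the loaded value observable,
    // which keeps the load in the generated code.
    *sink = temp;
}

int main()
{
    uint64_t *d_data = nullptr, *d_sink = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_data, sizeof(uint64_t));
    cudaMalloc(&d_sink, sizeof(uint64_t));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMemset(d_data, 0, sizeof(uint64_t));

    timed_load_kernel<<<1, 1>>>(d_data, d_sink, d_cycles);

    long long cycles = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);
    printf("measured cycles: %lld\n", cycles);

    cudaFree(d_data);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    return 0;
}
```

The store only prevents the load from being eliminated. It does not by itself guarantee that the load stays between the two clock64() reads, so it is still worth confirming with cuobjdump -sass (or nvdisasm on the cubin) that the load instruction is present and sits where expected.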

Thank you for your impressive work on the GPU Rowhammer attack! It has helped me a lot.
