simpleCopyKernel fails to advance the pointers. #64

@Artem-B

nvbandwidth/kernels.cu

Lines 20 to 28 in 4a49bda

__global__ void simpleCopyKernel(unsigned long long loopCount, uint4 *dst, uint4 *src) {
    for (unsigned int i = 0; i < loopCount; i++) {
        const int idx = blockIdx.x * blockDim.x + threadIdx.x;
        size_t offset = idx * sizeof(uint4);
        uint4* dst_uint4 = reinterpret_cast<uint4*>((char*)dst + offset);
        uint4* src_uint4 = reinterpret_cast<uint4*>((char*)src + offset);
        __stcg(dst_uint4, __ldcg(src_uint4));
    }
}

It appears that neither `offset` nor the pointers advance between loop iterations: `idx` depends only on the block and thread indices, not on `i`, so every iteration loads from and stores to the same location. As a result, the kernel copies less data than it is intended to.

https://godbolt.org/z/rM3dP5GsE
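
One possible way to make the pointers advance is sketched below. This is only an illustration, not the project's actual fix: it assumes the intent is for each iteration to copy a fresh slice of the buffer, and that both buffers hold at least `loopCount * gridDim.x * blockDim.x` elements. The names `stridedCopyKernel` and `checkStridedCopy` are hypothetical and do not exist in nvbandwidth.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant of simpleCopyKernel: each iteration moves forward by
// one grid-wide stride, so iteration i copies element i * stride + idx.
// Assumes dst and src each hold at least loopCount * gridDim.x * blockDim.x uint4s.
__global__ void stridedCopyKernel(unsigned long long loopCount, uint4 *dst, const uint4 *src) {
    const size_t idx = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (unsigned long long i = 0; i < loopCount; i++) {
        const size_t element = i * stride + idx;  // depends on i, unlike the original
        __stcg(dst + element, __ldcg(src + element));
    }
}

// Hypothetical host-side check: fills src with a known pattern, runs the
// kernel, and returns true only if every element of dst matches src.
bool checkStridedCopy(unsigned long long loopCount, unsigned int blocks, unsigned int threadsPerBlock) {
    const size_t count = static_cast<size_t>(loopCount) * blocks * threadsPerBlock;
    const size_t bytes = count * sizeof(uint4);
    std::vector<uint4> hostSrc(count), hostDst(count);
    for (size_t i = 0; i < count; i++) {
        const unsigned int v = static_cast<unsigned int>(i);
        hostSrc[i] = make_uint4(v, v + 1, v + 2, v + 3);
    }

    uint4 *src = nullptr;
    uint4 *dst = nullptr;
    if (cudaMalloc(&src, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dst, bytes) != cudaSuccess) { cudaFree(src); return false; }

    bool ok = cudaMemcpy(src, hostSrc.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemset(dst, 0xFF, bytes) == cudaSuccess;  // poison dst so stale data is detected
    if (ok) {
        stridedCopyKernel<<<blocks, threadsPerBlock>>>(loopCount, dst, src);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(hostDst.data(), dst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(src);
    cudaFree(dst);

    for (size_t i = 0; ok && i < count; i++) {
        ok = hostDst[i].x == hostSrc[i].x && hostDst[i].y == hostSrc[i].y &&
             hostDst[i].z == hostSrc[i].z && hostDst[i].w == hostSrc[i].w;
    }
    return ok;
}
```

If the intent is instead to re-copy the same region `loopCount` times, the indexing in the original may be deliberate, and the remaining question would be whether the compiler is allowed to collapse the repeated iterations.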
