simpleCopyKernel fails to advance the pointers.

https://github.com/NVIDIA/nvbandwidth/blob/4a49bdae82d69e69330a88746cb12b449796ee8b/kernels.cu#L20-L28

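It appears that neither `offset` nor the pointers advance between loop iterations. `offset` is computed only from the thread index `idx`, never from the loop counter `i`, so every iteration loads from and stores to the same location. As a result, the kernel copies much less data than it is intended to.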

https://godbolt.org/z/rM3dP5GsE


	__global__ void simpleCopyKernel(unsigned long long loopCount, uint4 *dst, uint4 *src) {
	    for (unsigned int i = 0; i < loopCount; i++) {
	        const int idx = blockIdx.x * blockDim.x + threadIdx.x;
	        size_t offset = idx * sizeof(uint4);
	        uint4* dst_uint4 = reinterpret_cast<uint4*>((char*)dst + offset);
	        uint4* src_uint4 = reinterpret_cast<uint4*>((char*)src + offset);
	        __stcg(dst_uint4, __ldcg(src_uint4));
	    }
	}
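To make the effect concrete, here is a small host-side C++ model of the kernel's address arithmetic. It is a sketch, not nvbandwidth code. It replays every thread and iteration of a launch and counts the distinct 16-byte locations touched, comparing the offset as written with a hypothetical advancing offset, `(idx + i * totalThreads) * sizeof(uint4)`. That fix is an assumption: it supposes each iteration is meant to copy the next grid-sized chunk and that the buffers hold at least `loopCount * totalThreads` elements. The helper names (`offsetAsWritten`, `offsetAdvancing`, `distinctLocations`) are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Stand-in for sizeof(uint4): four 32-bit words, so the model builds without CUDA headers.
constexpr std::size_t kUint4Bytes = 16;

// Byte offset as computed by simpleCopyKernel: depends only on the
// thread index idx, so the loop counter i has no effect.
std::size_t offsetAsWritten(std::size_t idx, std::size_t /*i*/, std::size_t /*totalThreads*/) {
    return idx * kUint4Bytes;
}

// Hypothetical fix (assumed intended layout): iteration i moves to the next
// grid-sized chunk, i.e. (idx + i * gridDim.x * blockDim.x) * sizeof(uint4)
// in the CUDA kernel.
std::size_t offsetAdvancing(std::size_t idx, std::size_t i, std::size_t totalThreads) {
    return (idx + i * totalThreads) * kUint4Bytes;
}

// Replays every (block, thread, iteration) of a launch on the host and counts
// how many distinct 16-byte locations would be loaded from / stored to.
template <typename OffsetFn>
std::size_t distinctLocations(std::size_t numBlocks, std::size_t threadsPerBlock,
                              std::size_t loopCount, OffsetFn offsetOf) {
    assert(numBlocks > 0 && threadsPerBlock > 0);
    const std::size_t totalThreads = numBlocks * threadsPerBlock;
    std::set<std::size_t> touched;
    for (std::size_t block = 0; block < numBlocks; ++block) {
        for (std::size_t thread = 0; thread < threadsPerBlock; ++thread) {
            // Mirrors blockIdx.x * blockDim.x + threadIdx.x.
            const std::size_t idx = block * threadsPerBlock + thread;
            for (std::size_t i = 0; i < loopCount; ++i) {
                touched.insert(offsetOf(idx, i, totalThreads));
            }
        }
    }
    return touched.size();
}
```

For example, with 4 blocks of 32 threads and `loopCount = 8`, `distinctLocations(4, 32, 8, offsetAsWritten)` returns 128 while `distinctLocations(4, 32, 8, offsetAdvancing)` returns 1024. As written, the extra iterations keep rewriting the same 128 locations instead of copying new data.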

simpleCopyKernel fails to advance the pointers. #64

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

simpleCopyKernel fails to advance the pointers. #64

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions