Magnetic upwind kernel optimization #1

The cudaMagW kernels are partially optimized, but at least a few register variables can still be eliminated. The Y and Z kernels use only 2 of the 3 components of dims, so not carrying the third could free one more register.
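
As a hypothetical illustration of why a kernel may not need every component of dims: assume the fields are stored contiguously with x varying fastest, then y, then z (the actual cudaMagW layout is not stated here). The linear offset of a cell then depends only on nx and ny; nz enters only a bounds check, which a kernel can take from its launch grid instead. The helper name linearIndex is invented for this sketch.

```cpp
#include <cstddef>

// Hypothetical helper (not taken from cudaMagW): offset of cell (x, y, z)
// in an nx * ny * nz array stored with x varying fastest, then y, then z.
// Only nx and ny are needed; nz would matter only for a bounds check,
// which a kernel can instead derive from its grid dimensions.
inline std::size_t linearIndex(int x, int y, int z, int nx, int ny) {
    const std::size_t sx = static_cast<std::size_t>(x);
    const std::size_t sy = static_cast<std::size_t>(y);
    const std::size_t sz = static_cast<std::size_t>(z);
    return sx + static_cast<std::size_t>(nx) * (sy + static_cast<std::size_t>(ny) * sz);
}
```

Whether dropping an unused component actually saves a register depends on what nvcc already does with the kernel; compiling with `-Xptxas -v` reports per-kernel register usage and is the way to confirm.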

If a third shared-memory tile does not cost too much occupancy (check with NVIDIA's CUDA Occupancy Calculator spreadsheet), at least one __syncthreads() can be eliminated from the Y and Z kernels. Attempts to rewrite the algorithm to replace conditionals with exact arithmetic are encouraged: for finite x, x+0, x-x and x*1 are all exact in IEEE 754.
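
A minimal sketch of replacing a conditional with exact arithmetic, assuming a generic upwind choice between a left state and a right state based on the sign of a velocity v (the function names and the selection rule are hypothetical, not the cudaMagW algorithm). With a selector s that is exactly 0.0f or 1.0f, one product is x*1, which is exact, and the other is x*0, which is a signed zero for finite x; adding a zero to a finite value returns that value. The result is therefore the chosen operand exactly (apart from the sign of a zero result), and this still holds if the compiler contracts the expression into an FMA. Infinite or NaN inputs break it, because 0*inf is NaN.

```cpp
// Hypothetical sketch: branch-free upwind selection using exact IEEE 754 ops.
// Valid for finite inputs only: 0 * inf and 0 * NaN produce NaN.

// Returns 1.0f when the flow velocity v is positive, else 0.0f (no if/else).
inline float upwindSelector(float v) {
    return static_cast<float>(v > 0.0f);
}

// Returns a when s == 1.0f and b when s == 0.0f.
// One term is x * 1 (exact) and the other is x * 0 (a signed zero), and
// adding a zero to a finite value leaves it unchanged, so no rounding occurs.
inline float selectExact(float s, float a, float b) {
    return s * a + (1.0f - s) * b;
}

// Upwind value: take the left state for v > 0, otherwise the right state.
inline float upwindValue(float v, float left, float right) {
    return selectExact(upwindSelector(v), left, right);
}
```

Whether this beats a branch on the GPU depends on divergence and instruction count in the real kernel, so it should be profiled rather than assumed.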

