Magnetic upwind kernel optimization #1

@ekeever1

The cudaMagW kernels are partially optimized, but at least a few register variables can still be eliminated. The Y and Z kernels use only 2 of the 3 components present in dims, so dropping the unused component could free one more register.

If a third tile of shared memory does not cost too much occupancy (check with NVIDIA's occupancy calculator spreadsheet), at least one __syncthreads() can be eliminated from the Y and Z kernels. Attempts to rewrite the algorithm to eliminate conditionals using exact math ops (x+0, x-x and x*1 evaluate exactly in IEEE 754 for finite x) are encouraged.
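
As a rough illustration of the exact-math idea, here is a minimal sketch of a branch-free upwind select. It is not taken from the cudaMagW source: the names upwindSelect and upwindBranch, the argument names, and the convention that a positive velocity picks the left value are all assumptions made for this example. It is written to build either as plain C++ or under nvcc as a host/device function.

```cpp
// Compiles as plain C++ or, under nvcc, as a host/device function.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not from the cudaMagW source): return `left` when the
// velocity is positive and `right` otherwise, with no data-dependent branch.
HOST_DEVICE inline double upwindSelect(double velocity, double left, double right)
{
    // The comparison converts to exactly 1.0 or 0.0.
    double w = static_cast<double>(velocity > 0.0);

    // For finite inputs one product is x*1 (exact), the other is x*0
    // (a zero), and the final sum is x+0 (exact), so the selected
    // value comes back unrounded.
    return w * left + (1.0 - w) * right;
}

// Reference version with an explicit conditional, for comparison only.
HOST_DEVICE inline double upwindBranch(double velocity, double left, double right)
{
    return (velocity > 0.0) ? left : right;
}
```

Two caveats apply to this sketch. The unselected operand must be finite, since 0*Inf and 0*NaN both give NaN and would poison the result. The sign of a zero result can also differ from the branching version, because the sum involves a zero product; the two values still compare equal. Whether this beats the compiler's own predication in the real kernels would need profiling.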
