Optimization of matrix operations for different hardware paradigms:

1. Hierarchical memory (registers, cache, virtual memory)
2. Instruction-level parallelism
3. Multicore processors
4. Shared-memory parallelism
5. GPU (CUDA)
6. Distributed-memory parallelism (MPI)

Code developed as part of the AMATH 583 course at the University of Washington.
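As an illustration of the first technique (hierarchical memory), a common optimization is cache blocking of matrix multiplication: the loops are tiled so that sub-blocks of the operands stay resident in cache and are reused many times before eviction. The sketch below is hypothetical and not taken from this repository; the function name, block size, and row-major layout are assumptions for the example.

```cpp
// Hypothetical sketch: cache-blocked matrix multiply (not code from this repo).
// Computes C += A * B for n x n row-major matrices, tiled by block size `bs`
// so each tile of A, B, and C fits in cache and is reused repeatedly.
#include <algorithm>
#include <cstddef>
#include <vector>

void blocked_matmul(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C,
                    std::size_t n, std::size_t bs) {
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                // Mini-multiply on one tile; std::min handles ragged edges
                // when n is not a multiple of bs.
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
                        double a = A[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The loop order (i-k-j innermost) keeps the inner loop streaming over contiguous rows of B and C, which also helps the hardware prefetcher; the block size `bs` would typically be tuned to the target cache size.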