- Try using [taco](https://github.com/tensor-compiler/taco) (the Tensor Algebra Compiler) to generate optimized, compiled kernels for matrix operations, e.g., the sandwich and categorical-matrix operations that we currently implement in C++.
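For reference, here is a plain NumPy sketch of the kinds of computations taco would be asked to compile. It assumes that "sandwich" means the product `X^T diag(d) X` and that a categorical matrix is a one-hot matrix stored as one column index per row. The function names are hypothetical and this is not the existing C++ code. The `einsum` subscripts follow the same index-notation style that taco takes as input, so they could serve as a correctness baseline for any generated kernels.

```python
import numpy as np


def sandwich(X: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Compute X^T diag(d) X without materializing diag(d).

    Index form: out[i, j] = sum_k X[k, i] * d[k] * X[k, j]
    """
    return np.einsum("ki,k,kj->ij", X, d, X)


def categorical_matvec(indices: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compute X @ v for a one-hot matrix X.

    Row r of X has a single 1 in column indices[r], so the product
    reduces to a gather of v.
    """
    return v[indices]


def categorical_rmatvec(indices: np.ndarray, u: np.ndarray, n_categories: int) -> np.ndarray:
    """Compute X^T @ u for the same one-hot matrix X.

    Each output entry sums u over the rows that belong to that category.
    """
    return np.bincount(indices, weights=u, minlength=n_categories)
```

The gather and `bincount` formulations avoid touching the zeros of the one-hot matrix, which is the kind of structure-aware code a compiler like taco would need to reproduce to be competitive.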