Compilation and installation can be performed with:
pip install .
See the unit tests for examples of how to jit and call CUTLASS kernels.
Set JAX_CUTLASS_FFI_CUDA_ARCHITECTURE=N to target a specific SM architecture, e.g. N=100 or N=103.
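As a minimal sketch, the variable can also be set from Python before anything that might read it is run. The helper name set_sm_architecture below is hypothetical and not part of this project. This README does not say whether the variable is consulted at build time or when kernels are compiled, so the sketch sets it as early as possible in the process.

```python
import os

# Name of the environment variable documented in this README.
ARCH_ENV_VAR = "JAX_CUTLASS_FFI_CUDA_ARCHITECTURE"


def set_sm_architecture(arch: int) -> str:
    """Hypothetical helper: export the target SM architecture.

    Assumption: the project reads this variable from the process
    environment, so it must be set before the project is built or used.
    """
    if arch <= 0:
        raise ValueError(f"SM architecture must be positive, got {arch}")
    os.environ[ARCH_ENV_VAR] = str(arch)
    return os.environ[ARCH_ENV_VAR]


# Example: target SM 100 for everything that runs later in this process.
set_sm_architecture(100)
```

Child processes started afterwards (for example a pip install . launched via subprocess) inherit the value, since they receive a copy of the parent environment by default.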
This project uses experimental features in CuTeDSL and may need updates as APIs stabilize.