I saw Flash Attention on the TODO list, so I wanted to point you to the announcement here.
Two packages were announced there:
1] Loading model weights saved in the PyTorch or safetensors format (including support for HuggingFace's sharded checkpoints)
2] Flash Attention - self explanatory :)