Flash Attention and native PyTorch weights

I saw Flash Attention on the TODO list, so I wanted to point you to the announcement [here](https://github.com/dotnet/TorchSharp/discussions/1231).

Two packages were announced there:

1] Loading model weights saved in the PyTorch format or the safetensors format (including handling for HuggingFace's sharded checkpoints)

2] Flash Attention - self-explanatory :)
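
In case it helps to picture what the sharding handling in item 1 involves: as far as I know, a sharded HuggingFace checkpoint ships an index file (e.g. `model.safetensors.index.json`) whose `weight_map` maps each tensor name to the shard file that holds it. Below is a minimal Python sketch of reading that map. It is not the announced package's API; `group_tensors_by_shard` is a hypothetical helper, and the tensor and file names are made up for illustration.

```python
import json
from collections import defaultdict


def group_tensors_by_shard(index_json: str) -> dict[str, list[str]]:
    """Map each shard file name to the tensor names stored in it."""
    index = json.loads(index_json)
    shards = defaultdict(list)
    # "weight_map" maps tensor name -> shard file name.
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Hypothetical index contents (assumption: names invented for this example).
example_index = json.dumps({
    "metadata": {"total_size": 1024},
    "weight_map": {
        "embed.weight": "model-00001-of-00002.safetensors",
        "layer0.weight": "model-00001-of-00002.safetensors",
        "head.weight": "model-00002-of-00002.safetensors",
    },
})

shards = group_tensors_by_shard(example_index)
for shard_file, names in shards.items():
    print(shard_file, names)
```

Grouping by shard first means a loader can open each shard file once and pull all of its tensors, rather than reopening files per tensor.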