# pytorch-extension

Here are 16 public repositories matching this topic...

⚡ LLM-Speed: high-performance CUDA kernels for LLM inference, including FlashAttention with O(N) memory usage, Tensor Core GEMM reaching 95% of cuBLAS performance, and seamless PyTorch integration. Supports Volta through Hopper GPUs.

  • Updated Apr 21, 2026
  • Python
