# pytorch-extension

Here are 16 public repositories matching this topic...

⚡ LLM-Speed: high-performance CUDA kernels for LLM inference, including FlashAttention with O(N) memory usage, Tensor Core GEMM reaching 95% of cuBLAS performance, and seamless PyTorch integration. Supports Volta through Hopper GPUs.

  • Updated Apr 21, 2026
  • Python
