This repository documents the process of finding the optimal learning rate for deep neural networks.
Updated Jun 3, 2025
High-performance Triton kernels for NVIDIA H100. Implements fused FP8 LayerNorm, tiled FlashAttention, and SRAM-optimized memory primitives for Hopper architecture.