torch-compile

Here are 8 public repositories matching this topic...

sayakpaul / diffusers-torchao

End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).

flux torch text-to-image diffusion-models torch-compile torchao architecture-optimization

Updated Jan 8, 2026
Python

huggingface / lora-fast

Minimal repository to demonstrate fast LoRA inference with Flux family of models.

flux torch lora diffusion peft diffusers torch-compile img-gen

Updated Jul 23, 2025
Python

ProfineAI / profine-cli

Profine automatically profiles and optimizes PyTorch training jobs on real GPUs, delivering measurable speedups and lower GPU costs before teams waste days tuning configs by hand.

Updated May 20, 2026
Python

RBLN-SW / torch-rbln

PyTorch extension for Rebellions NPU

python machine-learning deep-learning neural-network inference pytorch npu hardware-backend ai-accelerator torch-compile rbln rebellions-ai rebellions eager-mode

Updated May 31, 2026
Python

AbstractEyes / geofractal

Wide-model collective ensemble system with fractal, geometric, and heavy compilation optimizations.

Updated Apr 2, 2026
Jupyter Notebook

shreyansh26 / Accelerating-Cross-Encoder-Inference

Leveraging torch.compile to accelerate cross-encoder inference

inference-optimization mlsys jina cross-encoder torch-compile

Updated Mar 3, 2025
Python

JonSnow1807 / pytorch-autotune

🚀 2-4x faster PyTorch training with one line of code. Beats torch.compile by 79%. Zero config, automatic hardware optimization for T4/V100/A100/H100 GPUs.

performance deep-learning optimization pytorch gpu-acceleration auto-ml mixed-precision torch-compile training-speedup

Updated Aug 10, 2025
Python

D3velop-llc / csm-rtx5090

Optimized CSM-1B TTS pipeline for RTX 5090 (Blackwell sm_120). CUDA graph replay via patched HF Transformers. ~0.46x RTF.

text-to-speech streaming pytorch tts sesame csm huggingface blackwell torch-compile rtx-5090 sm-120 cuda-graphs

Updated Apr 5, 2026
Python
