Deep-unlearning/smol-audio

Smol Audio 🔊

Practical notebooks for shrinking, optimizing, and customizing audio AI models with the Hugging Face ecosystem.

Latest examples

  • Inference with Perception Encoder for Audio-Video (PE-AV)
  • Fine-tune Audio Flamingo 3
  • Granite Speech 4.0 1b ASR

Note

GitHub doesn't always render notebooks well. If you have trouble viewing them, try opening them in Colab using the links below.

| Category | Notebook | Description |
|---|---|---|
| ASR Fine-tuning | Fine-tune Whisper | Fine-tune Whisper on a custom language/domain using transformers + datasets |
| ASR Fine-tuning | Fine-tune Granite Speech Italian | Fine-tune IBM Granite Speech for Italian ASR with the YODAS-Granary dataset |
| Audio Captioning | Fine-tune Audio Flamingo 3 | Fine-tune Audio Flamingo 3 for audio captioning (full + LoRA) |
| ASR Fine-tuning | Fine-tune Parakeet | Fine-tune NVIDIA Parakeet CTC for speech recognition (full + LoRA) |
| ASR Fine-tuning | Fine-tune Voxtral ASR | Fine-tune Voxtral for ASR with prompt masking (full + LoRA) |
| Multimodal | Inference with PE-AV-Base | Zero-shot video classification and audio↔text retrieval (AudioCaps) with Meta's Perception Encoder for Audio-Video |
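
For readers unfamiliar with the "prompt masking" mentioned in the Voxtral entry, here is a minimal sketch of the general idea. It is not taken from the notebook: the helper name `mask_prompt_labels` and the token ids are hypothetical. The only thing assumed from the ecosystem is the common convention of using `-100` as the label value that the loss ignores (the default `ignore_index` of PyTorch's `CrossEntropyLoss`), so that only the transcript tokens contribute to training.

```python
# Hypothetical helper for illustration; the notebook may build labels differently.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_prompt_labels(prompt_ids, target_ids):
    """Return (input_ids, labels) with prompt positions excluded from the loss."""
    input_ids = list(prompt_ids) + list(target_ids)
    # Prompt positions get IGNORE_INDEX; target positions keep their token ids.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids)
    return input_ids, labels


# Toy token ids, made up for this example.
input_ids, labels = mask_prompt_labels([101, 7, 8], [42, 43, 2])
print(input_ids)  # [101, 7, 8, 42, 43, 2]
print(labels)     # [-100, -100, -100, 42, 43, 2]
```

Without this masking the model would also be trained to reproduce the instruction prompt, which is usually not what you want for ASR fine-tuning.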

About

Practical, Colab-friendly notebooks for fine-tuning and running audio AI models
