Smol Audio 🔊

Practical notebooks for shrinking, optimizing, and customizing audio AI models with the Hugging Face ecosystem.

Note: GitHub doesn't always render notebooks well. If you have trouble viewing one, try opening it in Colab using the links below.

| Category | Notebook | Description |
| --- | --- | --- |
| ASR Fine-tuning | Fine-tune Whisper | Fine-tune Whisper on a custom language/domain using transformers + datasets |
| ASR Fine-tuning | Fine-tune Granite Speech Italian | Fine-tune IBM Granite Speech for Italian ASR with the YODAS-Granary dataset |
| Audio Captioning | Fine-tune Audio Flamingo 3 | Fine-tune Audio Flamingo 3 for audio captioning (full + LoRA) |
| ASR Fine-tuning | Fine-tune Parakeet | Fine-tune NVIDIA Parakeet CTC for speech recognition (full + LoRA) |
| ASR Fine-tuning | Fine-tune Voxtral ASR | Fine-tune Voxtral for ASR with prompt masking (full + LoRA) |
| Multimodal | Inference with PE-AV-Base | Zero-shot video classification and audio↔text retrieval (AudioCaps) with Meta's Perception Encoder for Audio-Video |
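
If a Colab link is missing or broken, a link can usually be rebuilt by hand, since Colab opens notebooks from public GitHub repositories through a URL of the form `colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. The sketch below is only an illustration: `OWNER` and `REPO` are hypothetical placeholders for this repository's actual GitHub owner and name, and the `main` branch is an assumption.

```python
from urllib.parse import quote

COLAB_BASE = "https://colab.research.google.com/github"

# Notebook files at the root of this repository.
NOTEBOOKS = [
    "Fine_tune_Whisper.ipynb",
    "Fine_tune_Granite_Speech_Italian.ipynb",
    "Fine_tune_Audio_Flamingo_3.ipynb",
    "Fine_tune_Parakeet.ipynb",
    "Fine_tune_Voxtral_ASR.ipynb",
    "Inference_PE_AV_Base.ipynb",
]


def colab_url(owner: str, repo: str, notebook: str, branch: str = "main") -> str:
    """Build a Colab URL for a notebook stored in a public GitHub repo.

    `branch` defaults to "main", which is an assumption about the repo.
    """
    parts = [quote(owner), quote(repo), "blob", quote(branch), quote(notebook)]
    return f"{COLAB_BASE}/" + "/".join(parts)


if __name__ == "__main__":
    # "OWNER" and "REPO" are placeholders; substitute the real values.
    for nb in NOTEBOOKS:
        print(colab_url("OWNER", "REPO", nb))
```

Replace the placeholders with the real owner and repository name before using the printed links.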
