Machine Learning Methods for Audio and Video Analysis

A project-driven proseminar course focusing on applying machine learning techniques to audio and visual data. Topics include optimization, template matching, face detection, audio feature extraction, CNNs, and end-to-end multimedia analysis projects.

📘 Course Description

This course provided hands-on experience with modern ML workflows for analyzing audio and video data. Key technical components included:

Data acquisition & preprocessing (images, audio, video)
Optimization methods for model training
Template matching and feature extraction
Computer vision (OpenCV, face detection/recognition, CNNs)
Audio analysis (spectrograms, MFCCs, librosa)
Deep learning (PyTorch Lightning, pre-trained models)

🗓️ Course Structure

Foundations

Weeks 1-4: Data pipelines, optimization fundamentals, text classification
Weeks 5-8: Image processing (OpenCV), template matching, face detection/recognition
Weeks 9-12: Audio analysis (librosa), spectrogram features, neural networks for audio
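
As an illustration of the template matching topic from Weeks 5-8, here is a minimal sketch that slides a template over a grayscale image and keeps the position with the lowest sum of squared differences (SSD). It uses only NumPy and is not the course's implementation; the function name `match_template_ssd` and the synthetic test image are assumptions made for this example. OpenCV's `cv2.matchTemplate` provides optimized versions of the same idea.

```python
import numpy as np

def match_template_ssd(image, template):
    """Return ((row, col), score) of the best match under sum of squared differences.

    Hypothetical helper for illustration; brute-force and slow on large images.
    """
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            score = np.sum((patch - template) ** 2)  # 0 means a perfect match
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, float(best_score)

# Synthetic data (an assumption, not a course dataset): cut a patch out of a
# random image and check that the search finds where it came from.
rng = np.random.default_rng(1)
image = rng.random((40, 50))
template = image[12:20, 30:41].copy()
pos, score = match_template_ssd(image, template)
print(pos, score)
```

SSD is sensitive to brightness changes between the template and the image; normalized correlation scores are a common alternative when lighting varies.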

Advanced Topics

Convolutional Neural Networks (CNNs) for image classification
Transfer learning with pre-trained models (e.g., ResNet)
Image captioning and multimodal analysis
Audio classification pipelines
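
To make the CNN topic concrete, the sketch below runs the forward pass of one tiny convolutional stage (convolution, ReLU, 2x2 max pooling) in plain NumPy. It is a teaching sketch under stated assumptions, not the course's code: the helper names, the hand-picked edge kernel, and the toy image are all hypothetical, and a real model would learn its kernels and use a framework such as PyTorch or TensorFlow.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Cross-correlate a 2D image with a kernel, without padding ('valid' mode)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Zero out negative activations."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block (extra rows/cols dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy image (assumption): dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel (assumption): responds to a dark-to-bright horizontal step.
kernel = np.array([[-1.0, 1.0]])

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

The pooled feature map is nonzero only in the column that covers the vertical edge, which is the kind of localized response a learned filter produces in the first layers of an image classifier.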

🛠️ Technical Skills Developed

Machine Learning

Model optimization (loss functions, gradient descent)
Feature engineering for images/audio
CNN architectures & transfer learning
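
The loss function and gradient descent item can be sketched as follows: batch gradient descent minimizing mean squared error for a linear model. This is a minimal example under stated assumptions, not taken from the course notebooks; the function name, learning rate, step count, and synthetic data are all chosen for illustration.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Minimize mean squared error of X @ w against y; returns (w, loss history)."""
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(steps):
        residual = X @ w - y
        losses.append(float(np.mean(residual ** 2)))  # loss before this update
        grad = 2.0 * X.T @ residual / len(y)          # gradient of the MSE w.r.t. w
        w -= lr * grad
    return w, losses

# Synthetic, noise-free data (assumption): y = 0.5 - 2.0 * x
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(-1, 1, 200)])
true_w = np.array([0.5, -2.0])
y = X @ true_w

w, losses = gradient_descent(X, y)
print(w, losses[0], losses[-1])
```

On this well-conditioned problem the recovered weights approach `true_w`; a learning rate that is too large would make the loss diverge instead of shrink.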

Tools

Python (NumPy, pandas, scikit-learn)
TensorFlow/Keras, PyTorch Lightning
OpenCV, librosa, ffmpeg
Google Colab, Jupyter Notebooks

Data Types

Image datasets (MNIST, facial recognition datasets)
Audio waveforms & spectrograms
Video processing fundamentals
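
To show how a waveform becomes a spectrogram, here is a short-time Fourier transform sketch written with NumPy only. It is an illustration under stated assumptions rather than the course's pipeline: the function name, frame length, hop size, and the synthetic 1 kHz tone are hypothetical choices. librosa, listed under Tools, provides its own STFT and MFCC routines for real audio.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return |STFT| with shape (frame_len // 2 + 1, n_frames), using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window  # windowed, overlapping frames
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic waveform (assumption): one second of a 1 kHz sine sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = magnitude_spectrogram(tone)
freqs = np.fft.rfftfreq(256, d=1 / sr)        # center frequency of each row, in Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]   # row with the most energy on average
print(spec.shape, peak_hz)
```

Each column of `spec` describes the frequency content of one short frame, so a steady tone shows up as a single bright horizontal band at its frequency.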

🚀 Projects & Assignments

Project	Description	Technologies
Face Detection	Implemented face detection with Haar cascades and face recognition with deep learning models	OpenCV, CNN
Audio Feature Extraction	Analyzed MFCCs & spectral features for audio classification	librosa, scipy
Image Classification	Built CNN pipelines for digit recognition	TensorFlow, Keras

💡 Competencies Gained

Data-Centric ML: Preprocessing pipelines for unstructured data
Model Deployment: Optimized inference for real-world constraints
Multimodal Analysis: Combined audio/visual features in joint systems

Repository Files

Assignment_1.ipynb
Assignment_2_Image_analysis.ipynb
Assignment_3_Audio_analysis.ipynb
README.md
Screen Shot 2022-11-17 at 10.04.36 PM.png
hw1_getting_data.ipynb
hw2_optimization.ipynb
hw3_Template_Matching.ipynb
hw4_face_detection.ipynb
hw6_Audio_Features.ipynb
