This repository contains coursework and projects from the Deep Learning class. The work focuses on modern deep architectures (MLP, CNN, Transformer) and practical applications using images captured from the SuperTuxKart simulator.
- Convolutional Neural Networks (CNNs)
- Transformer architectures and self-attention
- Attention mechanisms and positional encodings
- Pooling, convolutions, and feature extraction
- Training deep networks end-to-end for perception tasks
-
Training deep networks and MLPs to classify images collected from SuperTuxKart. Models include simple MLP baselines and deeper convolutional models.
-
Building Convolutional Networks to perform core vision tasks on SuperTuxKart imagery:
- Classification (what is in the frame)
- Segmentation (which pixels belong to road, obstacles, etc.)
- Detection (bounding boxes for objects)
-
Final autonomous driving project: combining CNNs, Transformers, and MLP modules to predict controls and make the kart drive autonomously inside the simulator.
All images and data used in the experiments were collected from SuperTuxKart. See the homework folders for the raw data and any preprocessing code.
The recorded runs for the final project are in Homework 4/videos/.
CNN planner
MLP planner
Transformer planner
The final project demonstrates autonomous control using three modeling approaches (CNN, MLP, Transformer). Video links above show recorded runs on the Lighthouse track.
The CNN planner uses a convolutional architecture to process the input images and predict steering commands. The MLP planner applies a multi-layer perceptron to the same task, while the Transformer planner leverages self-attention mechanisms for decision-making.
The results show that all three planners can navigate the track, with varying degrees of success. The CNN planner generally performs well, effectively recognizing road features and obstacles. The MLP planner is simpler but still manages to drive reasonably well. The Transformer planner, while more complex, shows promise in handling longer-term dependencies in the driving task.


