This repository holds the final Jupyter Notebook for a project completed in Fall 2018 for the Machine Learning at Scale course in UC Berkeley's Master of Information and Data Science (MIDS) program. The original development code is kept in a private repository because the project may be reassigned in future iterations of the course.
We predicted online ad click-through rate using a MapReduce algorithm, written from scratch in Spark, that fits a logistic regression model by gradient descent. Unit testing was performed in Jupyter Notebook; the final implementation was executed on distributed cloud infrastructure via Google Dataproc.
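For readers unfamiliar with the approach, here is a minimal sketch of one way gradient descent for logistic regression can be phrased as a map step and a reduce step. It is not the project's code: it uses plain Python's `map` and `functools.reduce` in place of Spark, and every name in it (`point_gradient`, `add_vectors`, `gd_step`, `log_loss`, `toy_data`) as well as the toy data and learning rate are assumptions made for illustration. In Spark, the same pattern would typically be written with an RDD's `map` and `reduce` methods.

```python
import math
from functools import reduce


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def point_gradient(weights, point):
    # Map step: gradient of the log loss for a single (features, label) record.
    features, label = point
    x = [1.0] + list(features)  # prepend a constant 1.0 for the bias term
    prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    return [(prediction - label) * xi for xi in x]


def add_vectors(a, b):
    # Reduce step: element-wise sum of two gradient vectors.
    return [ai + bi for ai, bi in zip(a, b)]


def gd_step(data, weights, learning_rate):
    # One full-batch gradient descent update: map, reduce, then average.
    gradients = map(lambda p: point_gradient(weights, p), data)
    total = reduce(add_vectors, gradients)
    n = len(data)
    return [w - learning_rate * g / n for w, g in zip(weights, total)]


def log_loss(data, weights):
    # Mean log loss over the data, with clipping to avoid log(0).
    eps = 1e-12
    total = 0.0
    for features, label in data:
        x = [1.0] + list(features)
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, eps), 1.0 - eps)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(data)


# Hypothetical toy data: one feature, label 1 means "clicked".
toy_data = [((0.0,), 0), ((1.0,), 0), ((2.0,), 1), ((3.0,), 1)]

weights = [0.0, 0.0]  # [bias, coefficient]
initial_loss = log_loss(toy_data, weights)
for _ in range(200):
    weights = gd_step(toy_data, weights, learning_rate=0.5)
final_loss = log_loss(toy_data, weights)
```

Because the per-record gradients are simply summed, the reduce function is associative and commutative, which is what allows the work to be split across partitions and combined in any order.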