In this group research project, I built a CNN+LSTM neural network based on the top-down approach and, with the rest of the research group, experimented with two encoder-decoder variants: Merge and Inject. We trained our models on 90,000 image-caption pairs from Google's Conceptual Captions dataset, and our model's performance approaches state-of-the-art results for image captioning. The project report with a complete performance analysis can be found in the 'report' folder of this repository. Our team's repository for collaboration is here.
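The Merge and Inject variants differ in where the CNN's image features enter the caption decoder: Inject feeds the image embedding into the RNN alongside the word sequence, while Merge keeps the RNN purely linguistic and combines its output with the image features just before prediction. The following is a minimal NumPy sketch of that data-flow difference only; the layer sizes, the toy `rnn_summary` stand-in for an LSTM, and all variable names are illustrative assumptions, not the project's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

embed_dim, hidden = 128, 256
seq = rng.normal(size=(10, embed_dim))   # embedded partial caption (10 words)
img = rng.normal(size=(512,))            # CNN feature vector for the image

# Projects the image into the word-embedding space (needed for Inject).
W_img = rng.normal(size=(512, embed_dim)) * 0.01

def rnn_summary(inputs, hidden_size):
    # Toy stand-in for an LSTM: mean-pool over time, project to hidden size.
    W = np.ones((inputs.shape[1], hidden_size)) / inputs.shape[1]
    return np.tanh(inputs.mean(axis=0) @ W)

# Inject: the projected image acts like an extra token, so the RNN sees it.
inject_in = np.vstack([img @ W_img, seq])      # (11, embed_dim)
inject_state = rnn_summary(inject_in, hidden)  # (256,)

# Merge: the RNN sees only words; the image joins just before prediction.
merge_state = np.concatenate([rnn_summary(seq, hidden), img])  # (768,)

print(inject_state.shape, merge_state.shape)
```

In both cases the resulting vector would feed a dense softmax layer over the vocabulary to predict the next caption word; the Merge state is wider because the raw image features are concatenated in at the end.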
- Created a data sourcing and preprocessing script - Madhavan
- Performed EDA on the preprocessed image data - Mike
- Built and trained 6 variations of the encoder-decoder CNN+LSTM model - Mike, Malavika, Madhavan
- Analyzed the performance of each model using BLEU-4, METEOR and ROUGE-L scoring, and discussed the final performance results for each deep learning model - Mike, Madhavan, Malavika
- Wrote a project report capturing the project's motivation, goals, methods and results - Mike, Madhavan, Malavika
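BLEU-4, the first of the scoring metrics above, measures the overlap of 1- to 4-grams between a generated caption and the reference captions, combined via a geometric mean and a brevity penalty. The following is a minimal pure-Python sketch of that computation; it is a simplification (library implementations such as NLTK's `sentence_bleu` add smoothing and use the reference length closest to the candidate's), and the example sentences are made up for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU with uniform weights and a brevity
    penalty; no smoothing, so any zero n-gram precision yields 0.0."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each n-gram count by its maximum count across the references.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Simplification: shortest reference length (standard BLEU uses the
    # reference length closest to the candidate length).
    ref_len = min(len(r) for r in references)
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * geo_mean

cand = "a dog runs across the green field".split()
refs = ["a dog is running across a green field".split()]
print(round(bleu(cand, refs), 3))
```

A caption identical to a reference scores 1.0, and a caption sharing no 4-gram-level overlap scores 0.0, which is why smoothed variants are preferred for short sentences.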
Programming Language: Python 3.7
Libraries: Keras, TensorFlow, NLTK, OpenCV, NumPy, Matplotlib, Requests, concurrent.futures (standard library)
Collaborators: Madhavan Seshadri and Malavika Srikanth
Supervised by Professor Daniel Bauer