My fascination with the field of natural language processing led me to pursue a personal project: a recommendation engine that, given free-text input from the user, displays images relevant to the query's meaning.
- Image Captioning Algorithm
- NoSQL database implementation
- NLP Querying using BERT
- Methods to improve the current model
The architecture consists of two branches: a CNN branch and an LSTM branch. The training set comprises 6,000 images from the Flickr8k dataset, each with multiple reference captions. Given the wide variety of images, this is a rather small training set for a model of this capacity. Training took approximately 4 to 5 hours on this dataset. Prediction accuracy would likely improve by training on a larger dataset such as COCO, which has ~100,000 images; however, that would take approximately 3 days and was not attempted due to time constraints.
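The two-branch design above can be sketched as a standard "merge" captioning model in Keras. This is a minimal illustration, not the exact project code: the 2048-dimensional image feature (as produced by a pretrained CNN encoder such as InceptionV3), the `vocab_size`, `max_len`, and the 256-unit layer widths are all assumptions for the sketch.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumption: size of the caption vocabulary
max_len = 34       # assumption: maximum caption length in tokens

# CNN branch: takes a precomputed image feature vector from a pretrained encoder
img_in = Input(shape=(2048,))
x1 = Dropout(0.5)(img_in)
x1 = Dense(256, activation="relu")(x1)

# LSTM branch: takes the partial caption as a sequence of token ids
cap_in = Input(shape=(max_len,))
x2 = Embedding(vocab_size, 256, mask_zero=True)(cap_in)
x2 = Dropout(0.5)(x2)
x2 = LSTM(256)(x2)

# Merge both branches and predict the next word of the caption
merged = add([x1, x2])
out = Dense(256, activation="relu")(merged)
out = Dense(vocab_size, activation="softmax")(out)

model = Model(inputs=[img_in, cap_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time, captions are generated word by word: the model is fed the image feature plus the caption generated so far, and the highest-probability next word is appended until an end token or `max_len` is reached.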
I took 200 images from the test dataset; these were converted into equivalent strings and stored in the NoSQL database (MongoDB), along with the caption the trained model generated for each image. New images can be added to the database continuously without retraining the model.
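One simple way to store an image as a string alongside its caption is base64 encoding. The sketch below shows a possible document schema; the helper name, field names, and connection details are hypothetical, not taken from the project.

```python
import base64

def make_image_doc(image_id, image_bytes, caption):
    """Build a MongoDB document holding a base64-encoded image and its generated caption."""
    return {
        "_id": image_id,
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "caption": caption,
    }

doc = make_image_doc("img_001", b"\x89PNG...", "a dog runs through the grass")

# With a live MongoDB server, insertion is a single call (hypothetical names):
# from pymongo import MongoClient
# MongoClient("mongodb://localhost:27017")["captions_db"]["images"].insert_one(doc)
```

Because the caption is stored next to the image, the query stage only ever reads captions from the database; the model is not needed at query time.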
Next, the input query was passed through the BERT model, which embeds a sentence into a vector space where semantically related tokens lie close together. This way, matching against the captions is oriented towards the intent of the query rather than plain word comparison: BERT can match sentences with similar semantic meaning even when they use different words. Each caption from the database was passed through BERT and its cosine similarity with the query was measured; if the similarity crossed a threshold of 0.6, the corresponding image was displayed. A web interface that connects to the database on the server and accepts user queries is currently a work in progress.
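The retrieval step can be sketched as follows. Cosine similarity is shown in plain NumPy with toy 2-D vectors; in the project, the query and caption vectors would be BERT sentence embeddings, and the 0.6 threshold is the one mentioned above.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_caption_indices(query_vec, caption_vecs, threshold=0.6):
    """Return indices of captions whose embedding clears the similarity threshold."""
    return [i for i, v in enumerate(caption_vecs)
            if cosine_similarity(query_vec, v) >= threshold]

# Toy example: the first caption vector points nearly the same way as the query,
# the second is orthogonal, so only index 0 is returned.
query = np.array([1.0, 0.0])
captions = [np.array([0.9, 0.1]), np.array([0.0, 1.0])]
hits = matching_caption_indices(query, captions)  # -> [0]
```

The images whose captions appear in `hits` are then fetched from MongoDB and displayed to the user.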
- Train it on a larger dataset such as COCO, so it can learn a wider variety of tokens and generate much more accurate captions.
- Use attention-based models to focus on important aspects of the image.
- Modify the existing architecture, perhaps by increasing the number of parameters or adding regularization.