Skip to content

beubax/Text_Based_Image_Recommendation_Engine

Repository files navigation

Text Based Image Recommendation Engine

My fascination for the field of natural language processing led me to pursue a personal project. I constructed a recommendation engine which, based on free text input by the user, displays relevant images to the lexical meaning.

Table of Contents


Image Captioning Algorithm

The architecture consists of two branches - a CNN one and an LSTM one. The training set is made of 6000 images from the flickr8k dataset. This set had multiple captions for each image. Considering the wide variety of images, this training data size is rather small for a neural network model of this capability. It took the model approximately 4-5 hrs to train on this data set. To get a higher accuracy of prediction, it would be good to train the model on a dataset like Coco which has ~100,000 images. This would however take approximately 3 days and was not attempted due to paucity of time.

NoSQL database implementation

I took 200 images from the test dataset, and these were converted into equivalent strings and stored in the NoSQL database. The captions generated by the trained model for each of the images were also stored in MongoDB. One can continuously add new images to the database without having to train the model.

NLP Querying using BERT

Next, the input query was passed through the Bert model. The Bert model is amazing at embedding any sentence into a vector space where each token is related to one another. This was done so that the match with the captions would be more oriented towards the intent of the question rather than just plain word comparison. The BERT model is capable of matching sentences with similar semantic meaning but different usage of words. Each caption from the database was passed through Bert and the ‘Cosine Similarity’ was measured with the query. If it crossed a threshold of 0.6, it would display the image. Currently work in progress to create the web interface that will connect to the database in the server and accept user queries.

Methods to improve the current model

  • Train it on a large dataset like Coco. It will be able to learn a variety of different tokens and generate much more accurate captions.
  • Use attention-based models to focus on important aspects of the image.
  • Make modifications to the existing architecture by perhaps, increasing the number of parameters or implementing regularization.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors