A Generative AI project that automatically describes the content of an image using Deep Learning. It combines Computer Vision (InceptionV3) and Natural Language Processing (LSTM) to generate accurate, human-like captions.
The model is deployed and running live! You can test it with your own images here: 👉 Click here to try the Live App on Hugging Face
This project uses an Encoder-Decoder architecture:
- Image Encoder (InceptionV3):
  - We use a pre-trained InceptionV3 model (trained on ImageNet) to extract high-level visual features from images.
  - The last classification layer is removed, leaving us with a feature vector of shape (2048,).
- Sequence Decoder (LSTM):
  - The extracted image features are passed to an LSTM (Long Short-Term Memory) network.
  - The LSTM learns to generate a sequence of words (the caption) based on the image features and the previously generated words.
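The encoder step described above can be sketched roughly as follows. This is a minimal illustration using the Keras InceptionV3 application, not necessarily the exact code in main.ipynb: the function names `build_encoder` and `extract_features` are hypothetical, and taking the output of the second-to-last layer is one common way to drop the classification layer.

```python
IMAGE_SIZE = (299, 299)  # InceptionV3's default input resolution
FEATURE_DIM = 2048       # size of the pooled feature vector


def build_encoder(weights="imagenet"):
    """Return InceptionV3 without its final classification layer (hypothetical helper)."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow.keras.applications.inception_v3 import InceptionV3
    from tensorflow.keras.models import Model

    base = InceptionV3(weights=weights)
    # layers[-1] is the softmax classifier; layers[-2] is global average pooling,
    # whose output has shape (batch, 2048).
    return Model(inputs=base.input, outputs=base.layers[-2].output)


def extract_features(encoder, image_path):
    """Load one image from disk and return its (2048,) feature vector."""
    import numpy as np
    from tensorflow.keras.applications.inception_v3 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    image = load_img(image_path, target_size=IMAGE_SIZE)
    batch = np.expand_dims(img_to_array(image), axis=0)  # shape (1, 299, 299, 3)
    batch = preprocess_input(batch)                      # scale pixels to [-1, 1]
    return encoder.predict(batch, verbose=0)[0]
```

Calling `build_encoder()` once and reusing the returned model for every image avoids reloading the ImageNet weights; `extract_features(encoder, "example.jpg")` (with any image path of your own) would then return a single feature vector.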
Model Pipeline:
Input Image ➡️ InceptionV3 ➡️ Feature Vector ➡️ LSTM ➡️ Predicted Caption
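To make the last stage of the pipeline concrete, here is a small sketch of greedy decoding, where the caption is built one word at a time from the image features and the words generated so far. It is an assumption that the app decodes greedily; the `startseq`/`endseq` markers, the `greedy_caption` function, and the toy predictor standing in for the trained LSTM are all hypothetical names used only for illustration.

```python
START, END = "startseq", "endseq"  # hypothetical caption boundary markers


def greedy_caption(features, predict_next, word_to_id, id_to_word, max_len=20):
    """Build a caption by repeatedly picking the most probable next word."""
    tokens = [word_to_id[START]]
    for _ in range(max_len):
        # predict_next returns one probability per vocabulary entry.
        probs = predict_next(features, tokens)
        next_id = max(range(len(probs)), key=probs.__getitem__)
        if id_to_word[next_id] == END:
            break
        tokens.append(next_id)
    # Drop the start marker before joining the words.
    return " ".join(id_to_word[t] for t in tokens[1:])


# Toy stand-in for the trained decoder: it ignores the image features and
# always predicts the vocabulary entry that follows the last token.
VOCAB = [START, "a", "dog", "runs", END]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
ID_TO_WORD = dict(enumerate(VOCAB))


def toy_predict_next(features, tokens):
    next_id = min(tokens[-1] + 1, len(VOCAB) - 1)
    return [1.0 if i == next_id else 0.0 for i in range(len(VOCAB))]


caption = greedy_caption([0.0] * 2048, toy_predict_next, WORD_TO_ID, ID_TO_WORD)
print(caption)  # a dog runs
```

With a real model, `predict_next` would wrap a call to the trained decoder on the feature vector and the padded token sequence; the loop itself stays the same.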
The model was trained on the Flickr8k Dataset, which consists of:
- 8,000 images (6,000 training, 1,000 validation, 1,000 test).
- 5 captions per image (40,000 captions in total).
> Note: Due to size constraints, the raw dataset is not included in this repository. You can download it from Kaggle and place it in the src/ folder.
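Once the data is downloaded, the captions need to be grouped by image. The sketch below assumes each line of Flickr8k.token.txt has the form `image_name#index<TAB>caption`, which is the usual layout of that file but worth checking against your copy. The `parse_captions` function and the sample file names are hypothetical.

```python
from collections import defaultdict


def parse_captions(lines):
    """Group captions by image name from 'name.jpg#i<TAB>caption' lines."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_tag, _, caption = line.partition("\t")
        if not caption:
            continue  # skip malformed lines that have no tab-separated caption
        image_name = image_tag.split("#")[0]  # drop the '#0'..'#4' caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format (not real dataset entries).
sample = [
    "img_001.jpg#0\tA dog runs across a field .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
    "",
]
captions_by_image = parse_captions(sample)
print(len(captions_by_image["img_001.jpg"]))  # 2
```

To parse the real file, pass an open file handle instead of the sample list, for example `parse_captions(open("src/Flickr8k.token.txt", encoding="utf-8"))`, adjusting the path to wherever you placed the data.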
To run this project locally on your machine:
- Clone the repository:
      git clone https://github.com/Marshal-GG/Advanced-Image-Captioning-System.git
      cd Advanced-Image-Captioning-System
- Install dependencies:
      pip install -r requirements.txt
- Download the Data:
  - Download Flickr8k images and Flickr8k.token.txt.
  - Place them in the src/ folder (or update paths in the notebook).
- Run the Training Notebook:
  - Open main.ipynb to see the data preprocessing, model training, and evaluation steps.
- Run the App:
      python app.py
- Metric: The model's effectiveness is evaluated qualitatively, by visually inspecting the captions it generates.
- Sample Output:
If you have any questions about this project or want to discuss Generative AI, feel free to connect!
- LinkedIn: https://www.linkedin.com/in/rupam-g/
- Email: marshalgcom@gmail.com
