A repo for image-text retrieval using OpenAI's CLIP model. For information about how CLIP works, see my article on Medium.
This project uses the Flickr 8K dataset from Kaggle, which contains a variety of images, each paired
with 5 different captions. A few example images and captions can be found in the `sample_data` folder.
- Install all the dependencies with poetry using `poetry install`. It is recommended to create the virtual environment inside the project with the `poetry config virtualenvs.prefer-active-python true` command.
- The first step is to pre-compute the image embeddings with the `image_text_retrieval/scripts/pre_compute_embeddings.py` script. This script can be run with the `pre_compute_embeddings` command. It uses dvc to store the script parameters, which can be changed in the `params.yaml` file. It expects a directory of images (`data/images` by default) and saves out the embeddings and a mapping file as two separate files.
- To run the backend API, use the `image_text_retrieval_api` command and go to `http://0.0.0.0:8000/docs` in the browser to get an interactive Swagger UI:
- To run the Streamlit UI, run `streamlit run image_text_retrieval/ui/app_ui.py` and it should open in the browser. You should see a page which looks like this:
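The pre-compute step's output can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the real file names and on-disk format are defined by `pre_compute_embeddings.py` and `params.yaml`, and the function names below are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def save_embeddings(embeddings: dict[str, np.ndarray], out_dir: str) -> None:
    """Save image embeddings as one array plus a filename -> row-index mapping.

    Hypothetical sketch: one .npy file holds the stacked embedding matrix,
    one .json file maps each image filename to its row in that matrix.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    filenames = sorted(embeddings)
    matrix = np.stack([embeddings[name] for name in filenames])
    np.save(out / "embeddings.npy", matrix)
    mapping = {name: i for i, name in enumerate(filenames)}
    (out / "mapping.json").write_text(json.dumps(mapping))


def load_embeddings(out_dir: str) -> tuple[np.ndarray, dict[str, int]]:
    """Load the embedding matrix and the filename -> row-index mapping."""
    out = Path(out_dir)
    matrix = np.load(out / "embeddings.npy")
    mapping = json.loads((out / "mapping.json").read_text())
    return matrix, mapping
```

Keeping the embeddings and the mapping in separate files means the large array can be memory-mapped while the small mapping stays cheap to read.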


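At query time, CLIP-style retrieval reduces to ranking the pre-computed image embeddings by cosine similarity against a text embedding. A minimal sketch of that ranking step (assuming embeddings are plain NumPy vectors; the repo's API may differ):

```python
import numpy as np


def top_k_images(text_embedding: np.ndarray,
                 image_embeddings: np.ndarray,
                 filenames: list[str],
                 k: int = 5) -> list[str]:
    """Return the k image filenames most similar to the text embedding.

    Rows of image_embeddings correspond to entries of filenames.
    Both sides are L2-normalised so the dot product equals cosine similarity.
    """
    text = text_embedding / np.linalg.norm(text_embedding)
    images = image_embeddings / np.linalg.norm(
        image_embeddings, axis=1, keepdims=True
    )
    scores = images @ text
    order = np.argsort(scores)[::-1][:k]
    return [filenames[i] for i in order]
```

The same function works in the other direction (text retrieval from an image) by swapping which side is the query.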