This repository demonstrates how to integrate ColPali embeddings for multi-modal retrieval-augmented generation (RAG). A PDF is indexed for retrieval, and a Llama 3.2 vision-language model generates answers from the retrieved pages.
We use the ColPali embedding model from Hugging Face, specifically vidore/colpali-v1.2, which embeds both text and page images into a shared multi-vector space. Indexing and retrieval are handled through the RAGMultiModalModel class.
Install Requirements
pip install byaldi
sudo apt-get install -y poppler-utils
pip install huggingface_hub
pip install -q together
Log in to Hugging Face
Provide your HF_TOKEN to authenticate with Hugging Face.
from huggingface_hub import login
login(token="HF_TOKEN")
Initialize the Model
from byaldi import RAGMultiModalModel
model = RAGMultiModalModel.from_pretrained('vidore/colpali-v1.2')
The PDF file colpali.pdf is downloaded, then passed to model.index, which creates an index for retrieval. The index_name argument is set to 'colpali'.
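The indexing call can be sketched as below. The argument names (input_path, index_name, store_collection_with_index, overwrite) follow byaldi's RAGMultiModalModel.index API; the specific values are assumptions about this repository's setup, not code taken from it:

```python
# Arguments for building the page index over the downloaded PDF (assumed values).
index_args = dict(
    input_path="colpali.pdf",           # local path to the PDF to index
    index_name="colpali",               # name under which the index is stored
    store_collection_with_index=False,  # keep the index light; pages stay on disk
    overwrite=True,                     # replace an existing index of this name
)

# With byaldi installed and the model loaded as above, the index is built with:
# model.index(**index_args)
```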
After generating the index, we run a query such as:
query = "What is ColPali's late-interaction baseline score on DocQ and InfoQ?"
results = model.search(query, k=2)
The top retrieved pages are then used to produce the best possible answer from the colpali.pdf content.
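The generation step can be sketched as follows. This is an illustrative assumption rather than code from the repository: build_vlm_messages is a hypothetical helper, and the base64 data-URI message shape and model name follow the OpenAI-style chat-completions conventions used by Together's vision endpoints.

```python
import base64

def build_vlm_messages(question: str, page_image_bytes: bytes) -> list:
    """Package a text question plus a retrieved PDF page image into the
    OpenAI-style multimodal chat message format used by VLM endpoints."""
    image_b64 = base64.b64encode(page_image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ]

messages = build_vlm_messages(
    "What is ColPali's late-interaction baseline score on DocQ and InfoQ?",
    b"\x89PNG...",  # in practice: the top retrieved page, rendered to PNG bytes
)

# With a TOGETHER_API_KEY set, the payload would be sent to Llama 3.2 Vision, e.g.:
# from together import Together
# client = Together()
# reply = client.chat.completions.create(
#     model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
#     messages=messages,
# )
```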
- User Query
- Text + Vision Embedding via ColPali
- Index -> Retrieve relevant pages
- Llama 3.2 VLM processes both text query and retrieved PDF content
- Generated Answer
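The retrieval step in the pipeline above relies on ColPali's late interaction (MaxSim) scoring: each query-token embedding is matched against every page-patch embedding, the best match per query token is kept, and the maxima are summed. A minimal sketch with toy 2-D vectors (the real model uses higher-dimensional multi-vector embeddings; the numbers here are purely illustrative):

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction (MaxSim) score: for each query-token vector, take
    its best dot-product match among the page's patch vectors, then sum
    those maxima over all query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# Toy example: a two-token query against two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.1, 0.9]]  # matches both query tokens well
page_b = [[0.5, 0.5], [0.4, 0.4]]  # matches neither token strongly

# Ranking pages by MaxSim puts page_a first.
scores = {"a": maxsim_score(query, page_a), "b": maxsim_score(query, page_b)}
```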