Advanced Multimodal RAG

Multimodal Retrieval-Augmented Generation (RAG) for complex PDF documents using LangChain, ChromaDB, Unstructured, Gradio, and Ollama.

Disclaimer: The models run locally, so your machine needs at least 5 GB of available system memory.

Features

PDF Ingestion: Extracts text, tables, and images from PDF files.
Confidentiality: Models are run locally, maintaining the confidentiality of sensitive documents.
Summarization: Generates concise summaries of text and tables, and detailed descriptions of images.
Multimodal Retrieval: Stores and retrieves information using vector embeddings and a document store.
Question Answering: Answers user queries based strictly on the provided PDF context.
Gradio Interface: User-friendly chat interface supporting file uploads and multimodal queries.
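
The "vector embeddings and a document store" pairing in Multimodal Retrieval can be pictured as follows: summaries are embedded and searched, while the original text, table, or image is kept in a separate store under the same ID. The sketch below is only an illustration of that idea in plain Python. The `MultiVectorIndex` class, the toy bag-of-words `embed` function, and the sample data are all hypothetical stand-ins; the project itself uses ChromaDB and a real embedding model, and its actual code in vectorstore.py may be organized differently.

```python
import math
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: a real setup calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MultiVectorIndex:
    """Searches over summaries and returns the original elements they describe."""

    def __init__(self) -> None:
        self.vectors: list[tuple[str, Counter]] = []  # (doc_id, summary embedding)
        self.docstore: dict[str, dict] = {}           # doc_id -> original element

    def add(self, summary: str, original: str, kind: str) -> str:
        """Index the summary and keep the original under the same ID."""
        doc_id = str(uuid.uuid4())
        self.vectors.append((doc_id, embed(summary)))
        self.docstore[doc_id] = {"kind": kind, "content": original}
        return doc_id

    def search(self, query: str, k: int = 2) -> list[dict]:
        """Rank summaries by similarity to the query, then look up the originals."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[1]), reverse=True)
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


# Hypothetical sample data standing in for elements extracted from a PDF.
index = MultiVectorIndex()
index.add("quarterly revenue table by region", "Region | Revenue", kind="table")
index.add("photo of the assembly line robot arm", "base64-encoded image", kind="image")
index.add("introduction describing company history", "Founded in 1990", kind="text")

top = index.search("revenue by region", k=1)
print(top[0]["kind"])
```

Searching over summaries rather than raw content lets tables and images be matched by a text query, while the model still receives the original element as context.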

Installation

Clone the repository:

git clone https://github.com/yourusername/rag-multimod.git
cd rag-multimod

Install dependencies:
```
pip install -r requirements.txt
```
Or use the dependencies listed in pyproject.toml.

Extra system dependencies:

To run locally, Unstructured requires tesseract-ocr and poppler-utils to be installed on your local machine.

See the official Unstructured docs for more information: https://docs.unstructured.io/open-source/installation/full-installation
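
Before running the app, it can help to confirm that both system packages are actually on your PATH. The snippet below is a small sketch, not part of this repository. It assumes tesseract-ocr installs an executable named `tesseract` and poppler-utils installs one named `pdftoppm`, which is the usual case but may differ on your platform.

```python
import shutil

# Assumed executable names: tesseract-ocr usually provides `tesseract`,
# and poppler-utils usually provides `pdftoppm`.
REQUIRED_TOOLS = {
    "tesseract": "tesseract-ocr",
    "pdftoppm": "poppler-utils",
}


def missing_system_packages(tools: dict[str, str] = REQUIRED_TOOLS) -> list[str]:
    """Return the package names whose executable is not found on PATH."""
    return sorted({pkg for exe, pkg in tools.items() if shutil.which(exe) is None})


if __name__ == "__main__":
    missing = missing_system_packages()
    if missing:
        print("Missing system packages:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```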

Usage

Run the application:

python main.py

Open the Gradio interface in your browser, upload PDF files, and ask questions about their content.

Project Structure

main.py: Entry point, Gradio chat interface.
utils.py: PDF processing, summarization, prompt building.
vectorstore.py: Vector storage and retrieval logic.
pyproject.toml: Python dependencies and project metadata.

Requirements

Notes

Only answers questions based on the uploaded PDF context.
For unsupported queries, responds with: I can not process the request.
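
One way to get this behavior is to put the restriction and the fallback sentence directly into the prompt, and to skip the model call when retrieval returns nothing. The sketch below is an assumption about how that could look; `build_prompt`, `answer`, and the prompt wording are hypothetical and not taken from utils.py. The `llm` argument is any callable that maps a prompt string to a response string.

```python
FALLBACK = "I can not process the request."


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks if chunk.strip())
    return (
        "Answer the question using only the context below.\n"
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(question: str, context_chunks: list[str], llm) -> str:
    """Return the fallback without calling the model when no context was retrieved."""
    if not any(chunk.strip() for chunk in context_chunks):
        return FALLBACK
    return llm(build_prompt(question, context_chunks))
```

A prompt instruction alone does not guarantee the model complies, so the empty-context check gives a deterministic fallback for at least that case.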

Collaboration

I welcome contributions from the community! If you would like to collaborate:

Fork the repository and create your branch.
Open issues for bugs, feature requests, or questions.
Submit pull requests with clear descriptions of your changes.
For major changes, please open an issue first to discuss what you would like to change.

Feel free to reach out via GitHub Discussions or directly by e-mail at gauravyadav199808@gmail.com.
