GauravASY/Mutltimodal_RAG_v1

Advanced Multimodal RAG

Multimodal Retrieval-Augmented Generation (RAG) for complex PDF documents using LangChain, ChromaDB, Unstructured, Gradio, and Ollama.

Disclaimer: Because the models run locally, your machine must have at least 5 GB of available system memory.
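As a rough pre-flight check for the memory requirement above, the following sketch reads the available physical memory through `os.sysconf`. It is not part of this repository: the function names are hypothetical, 5 GB is treated as 5 * 1024**3 bytes (an assumption), and the lookup only works on platforms that expose `SC_AVPHYS_PAGES` (typically Linux). Elsewhere it reports the value as unknown.

```python
import os

# Assumption: "5 GB" is interpreted as 5 * 1024**3 bytes.
REQUIRED_BYTES = 5 * 1024**3


def available_memory_bytes():
    """Return available physical memory in bytes, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the key is unsupported (e.g. macOS).
        return None
    if pages < 0 or page_size < 0:
        return None
    return pages * page_size


def has_enough_memory(available, required=REQUIRED_BYTES):
    """Return True or False, or None when the available memory is unknown."""
    if available is None:
        return None
    return available >= required


if __name__ == "__main__":
    verdict = has_enough_memory(available_memory_bytes())
    print({True: "Enough memory.", False: "Less than 5 GB available.", None: "Could not determine memory."}[verdict])
```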

Features

  • PDF Ingestion: Extracts text, tables, and images from PDF files.
  • Confidentiality: Models are run locally, maintaining the confidentiality of sensitive documents.
  • Summarization: Generates concise summaries of text and tables, and detailed descriptions of images.
  • Multimodal Retrieval: Stores and retrieves information using vector embeddings and a document store.
  • Question Answering: Answers user queries based strictly on the provided PDF context.
  • Gradio Interface: User-friendly chat interface supporting file uploads and multimodal queries.
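The pairing of vector embeddings with a document store can be illustrated with a small, dependency-free sketch. It assumes the common pattern in which the summary of each element is embedded for search while the original text, table, or image is kept in a separate store under the same ID. Whether this repository wires things exactly this way is an assumption. The bag-of-words `embed` function and the `SummaryIndex` class are hypothetical stand-ins for a real embedding model and for ChromaDB.

```python
import math
import uuid
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SummaryIndex:
    """Hypothetical stand-in for a vector store plus a document store."""

    def __init__(self):
        self.vectors = []   # list of (doc_id, summary embedding), searched at query time
        self.docstore = {}  # doc_id -> original element (text, table, or image reference)

    def add(self, summary, original):
        doc_id = str(uuid.uuid4())
        self.vectors.append((doc_id, embed(summary)))
        self.docstore[doc_id] = original
        return doc_id

    def retrieve(self, query, k=2):
        """Rank summaries by similarity to the query and return the linked originals."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[1]), reverse=True)
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


index = SummaryIndex()
index.add("table of quarterly revenue by region", {"type": "table", "ref": "table-1"})
index.add("photo of a wind turbine at sunset", {"type": "image", "ref": "image-1"})
index.add("paragraph describing the hiring policy", {"type": "text", "ref": "text-1"})

top = index.retrieve("quarterly revenue table", k=1)
```

Searching over summaries keeps the index compact, while returning the original elements gives the answering model the full content rather than a lossy summary.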

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/rag-multimod.git
    cd rag-multimod
  2. Install dependencies:

    pip install -r requirements.txt

    Or use the dependencies listed in pyproject.toml.

  3. Extra system dependencies:

    To run locally, Unstructured requires tesseract-ocr and poppler-utils to be installed on your local machine.

    See the official docs for more information: https://docs.unstructured.io/open-source/installation/full-installation
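Before running the application, it can help to confirm that the system tools are on your `PATH`. The sketch below is not part of this repository. It assumes the usual executable names: `tesseract` from tesseract-ocr, and `pdftoppm` and `pdfinfo` from poppler-utils. The `missing_tools` helper is hypothetical.

```python
import shutil

# Assumed executable names: `tesseract` ships with tesseract-ocr,
# `pdftoppm` and `pdfinfo` ship with poppler-utils.
REQUIRED_TOOLS = ("tesseract", "pdftoppm", "pdfinfo")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```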

Usage

Run the application:

python main.py

Open the Gradio interface in your browser, upload PDF files, and ask questions about their content.

Project Structure

Requirements

Notes

  • Only answers questions based on the uploaded PDF context.
  • For unsupported queries, responds with: I can not process the request.

Collaboration

I welcome contributions from the community! If you would like to collaborate:

  • Fork the repository and create your branch.
  • Open issues for bugs, feature requests, or questions.
  • Submit pull requests with clear descriptions of your changes.
  • For major changes, please open an issue first to discuss what you would like to change.

Feel free to reach out via GitHub Discussions or by e-mail at gauravyadav199808@gmail.com.

About

This Advanced Multimodal RAG system enables secure, offline querying of complex PDF documents. It uses Unstructured to ingest text, tables, and images, generates summaries with local LLMs (Ollama/Llama 3), and retrieves context from ChromaDB to answer user queries through a Gradio chat interface.
