
🧠 DocuMind LLM: Intelligent Document Q&A Assistant



DocuMind LLM is a generative-AI document assistant built with Hugging Face Transformers, FAISS, and LangChain.
It lets users upload PDF files, index their contents for semantic search, and ask natural-language questions about them.


🚀 Features

  • 📄 PDF Upload & Parsing: Extracts text from the PDF and splits it into chunks for semantic retrieval.
  • 🤖 LLM-powered Q&A: Uses a Transformer model (e.g., mistralai/Mixtral or google/flan-t5) to answer questions.
  • ⚑ FAISS-based Vector Search: Runs fast similarity search over the document chunks to retrieve the passages most relevant to a question.
  • 💬 Conversational Memory: Keeps track of your recent queries for context-aware responses.
  • 🧩 Modular Architecture: Easy to extend with other models, vector stores, or APIs.
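The chunking step behind PDF Upload & Parsing can be pictured as a sliding window over the extracted text. The sketch below is illustrative only: the function name `chunk_text` and its default sizes are assumptions, not taken from `utils/pdf_loader.py`. A LangChain-based build would more likely use a splitter such as `RecursiveCharacterTextSplitter`, which takes comparable `chunk_size` and `chunk_overlap` settings.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper: the sizes are illustrative defaults, not DocuMind's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end so no tail fragment is repeated.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, at the cost of a little duplicated text in the index.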

🧰 Tech Stack

| Component | Technology |
|---|---|
| Embeddings | Hugging Face Sentence Transformers |
| Vector Store | FAISS |
| LLM | Hugging Face Transformers |
| Interface | Streamlit / Flask |
| Backend | Python 3.10+ |
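For orientation, the retrieval that FAISS performs here amounts to nearest-neighbor search over embedding vectors. The pure-Python stand-in below mimics what an exact flat inner-product index (FAISS's `IndexFlatIP`) computes when vectors are L2-normalized, so that the inner product equals cosine similarity. The class name `FlatIPIndex` and the tiny 2-D vectors are hypothetical; real Sentence Transformers embeddings have hundreds of dimensions, FAISS runs the same search in optimized native code, and with FAISS the normalization is done by the caller (for example with `faiss.normalize_L2`).

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIPIndex:
    """Hypothetical brute-force index: scores every stored vector against the query."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
            # Normalize on insert (FAISS leaves this step to the caller).
            self.vectors.append(l2_normalize(vec))

    def search(self, query: list[float], k: int) -> tuple[list[float], list[int]]:
        """Return the top-k (scores, ids), highest cosine similarity first."""
        q = l2_normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), idx)
            for idx, vec in enumerate(self.vectors)
        ]
        # Sort by descending score; ties keep insertion order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        top = scored[:k]
        return [score for score, _ in top], [idx for _, idx in top]


if __name__ == "__main__":
    index = FlatIPIndex(dim=2)
    index.add([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # one vector per chunk
    scores, ids = index.search([1.0, 0.1], k=2)
    print(ids)
```

Exhaustive search is exact, and for the modest number of chunks a single PDF produces it is typically fast enough; FAISS's approximate index types matter mainly at much larger scales.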

βš™οΈ Installation

git clone https://github.com/ramarav/DocuMind-LLM.git
cd DocuMind-LLM
pip install -r requirements.txt

🧠 Usage

1️⃣ Start the App

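python app.py

If app.py is built on Streamlit (one of the interface options listed in the Tech Stack), launch it with streamlit run app.py instead.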

2️⃣ Upload a PDF file

Choose any .pdf document you want to query.

3️⃣ Ask Questions

Type natural language questions like:

“What are the main topics covered in this document?”
“Summarize section 3.”
“What are the key takeaways?”
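Behind each question, a retrieval-augmented pipeline of this kind typically combines the top-ranked chunks and a few recent exchanges into a single prompt for the LLM. The sketch below illustrates that step under stated assumptions: `ConversationMemory`, `build_prompt`, the prompt wording and the three-turn window are all hypothetical and not taken from `qa_engine.py`.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical short-term memory that keeps only the most recent Q&A turns."""

    def __init__(self, max_turns: int = 3) -> None:
        # A bounded deque drops the oldest turn automatically once it is full.
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))


def build_prompt(question: str, chunks: list[str], memory: ConversationMemory) -> str:
    """Combine retrieved chunks, recent history and the new question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in memory.turns)
    parts = [
        "Answer the question using only the context below.",
        f"Context:\n{context}",
    ]
    if history:  # omit the section entirely on the first question
        parts.append(f"Recent conversation:\n{history}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    memory = ConversationMemory(max_turns=2)
    memory.add("What are the main topics covered in this document?", "Retrieval and indexing.")
    print(build_prompt("Summarize section 3.", ["Section 3 describes the FAISS index."], memory))
```

Capping the history bounds the prompt length, which makes it less likely that context plus conversation overflow the model's input window.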


📚 Example Use Cases

  • Research paper summarization
  • Legal contract question answering
  • Technical documentation assistant
  • Corporate report analysis
  • AI-based knowledge discovery

🧑‍💻 Folder Structure

DocuMind-LLM/
│
├── app.py                # Main entry point
├── utils/                # Helper scripts
│   ├── pdf_loader.py
│   ├── embedder.py
│   ├── vector_store.py
│   └── qa_engine.py
├── sample.pdf            # Example document
├── requirements.txt
└── README.md
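The folder layout suggests a pipeline in which each utils/ module owns one stage. One way to keep those stages swappable, in the spirit of the Modular Architecture feature, is to write the QA engine against small interfaces rather than concrete classes. The sketch below uses `typing.Protocol` for this; the `Embedder`, `VectorStore` and `QAEngine` names and methods, and the toy `KeywordEmbedder` and `ListStore` classes, are hypothetical and do not describe the actual contents of `embedder.py`, `vector_store.py` or `qa_engine.py`.

```python
from typing import Callable, Protocol


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    def add(self, vectors: list[list[float]], texts: list[str]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...


class QAEngine:
    """Wires any embedder, store and text generator together (hypothetical design)."""

    def __init__(self, embedder: Embedder, store: VectorStore,
                 generate: Callable[[str], str]) -> None:
        self.embedder = embedder
        self.store = store
        self.generate = generate  # e.g. a wrapper around a Transformers model

    def index(self, chunks: list[str]) -> None:
        self.store.add(self.embedder.embed(chunks), chunks)

    def ask(self, question: str, k: int = 3) -> str:
        query_vec = self.embedder.embed([question])[0]
        context = "\n".join(self.store.search(query_vec, k))
        return self.generate(f"Context:\n{context}\n\nQuestion: {question}")


class KeywordEmbedder:
    """Toy stand-in for Sentence Transformers: counts words from a fixed vocabulary."""

    def __init__(self, vocab: list[str]) -> None:
        self.vocab = vocab

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(t.lower().split().count(w)) for w in self.vocab] for t in texts]


class ListStore:
    """Toy stand-in for FAISS: ranks stored texts by dot product with the query."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, vectors: list[list[float]], texts: list[str]) -> None:
        self.items.extend(zip(vectors, texts))

    def search(self, vector: list[float], k: int) -> list[str]:
        ranked = sorted(self.items,
                        key=lambda item: -sum(a * b for a, b in zip(vector, item[0])))
        return [text for _, text in ranked[:k]]


if __name__ == "__main__":
    # An echo "generator" stands in for the LLM so the wiring can be inspected.
    engine = QAEngine(KeywordEmbedder(["revenue", "risk"]), ListStore(), generate=lambda p: p)
    engine.index(["revenue grew in q3", "risk factors include supply"])
    print(engine.ask("what about revenue", k=1))
```

Swapping in a different embedding model or vector store then only means passing another object that provides the same methods; QAEngine itself does not change.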

🪄 Future Enhancements

  • Add chat history memory using LangChain.
  • Integrate the OpenAI API for model comparison.
  • Enable multi-file document search.
  • Add semantic summarization features.

🏆 Credits

Developed by Mekala Ramarao
💼 AMD India | AI/ML Engineer
🔗 LinkedIn • GitHub

📜 License

This project is licensed under the MIT License.