Upload any legal contract or research paper, ask questions in plain English, and get answers with exact page citations. Answers are restricted to the document's content to guard against hallucination.
👉 Try DocuMind Live — docu-mind03.streamlit.app
⚡ No installation needed — upload any PDF and start asking questions instantly!
FastAPI backend is also deployed separately on Render:
- Base URL: https://documind-api-y0hc.onrender.com
- Swagger Docs: https://documind-api-y0hc.onrender.com/docs
Note: On Render's free tier, the first request after a period of inactivity may take 30-60 seconds while the service cold-starts.
Reading a 100-page legal contract or research paper to find one specific clause is time-consuming and error-prone.
DocuMind lets you:
- Upload any PDF document
- Ask questions in plain English
- Get precise answers with page number citations
- Know exactly which part of the document was used
- Never get hallucinated answers — strict document-only responses
| Feature | Description |
|---|---|
| 📄 PDF Upload | Upload any legal or research PDF |
| 🧩 Smart Chunking | 1000-token chunks with 200-token overlap |
| 🔍 Semantic Search | HuggingFace embeddings + ChromaDB vector search |
| 🤖 LLM Answer | Groq LLaMA3-70b generates accurate answers |
| 📌 Source Citations | Every answer shows exact page numbers |
| 📋 Auto Summary | Document summary generated on upload |
| 🟢🔴 Confidence Score | Green = found in doc, Red = not found |
| 🚫 No Hallucination | Strict prompting — only answers from document |
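The "strict prompting" and green/red confidence rows can be illustrated with a small prompt builder. This is a minimal sketch under stated assumptions, not DocuMind's actual prompt: the `Chunk` type, the `NOT_FOUND` phrase, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage plus the page it came from (hypothetical type)."""
    text: str
    page: int


# Hypothetical refusal phrase; a UI could map it to the red indicator.
NOT_FOUND = "Not found in the document."


def build_strict_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a prompt that restricts the model to the supplied excerpts."""
    context = "\n\n".join(f"[Page {c.page}] {c.text}" for c in chunks)
    return (
        "Answer using ONLY the excerpts below and cite page numbers.\n"
        f"If the answer is not in the excerpts, reply exactly: {NOT_FOUND}\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


def is_found(answer: str) -> bool:
    """Green/red signal: False when the model returned the refusal phrase."""
    return NOT_FOUND.lower() not in answer.lower()
```

Tagging each excerpt with its page number is what lets the model cite pages, and a fixed refusal phrase gives the interface something simple to check when deciding between green and red.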
User uploads PDF
↓
PDF → Pages → Chunks (1000 tokens, 200 overlap)
↓
HuggingFace Embeddings (all-MiniLM-L6-v2)
↓
ChromaDB Vector Store
↓
User asks question → Similarity Search → Top 3 chunks
↓
Groq LLaMA3-70b → Answer with page citations
↓
Streamlit UI displays answer + confidence + sources
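The chunking and retrieval steps in the flow above can be sketched in plain Python. The real pipeline uses LangChain, HuggingFace embeddings, and ChromaDB; the functions below (`chunk_tokens`, `cosine`, `top_k`) are hypothetical stand-ins that only illustrate overlapping windows and top-3 similarity ranking.

```python
import math


def chunk_tokens(tokens: list[str], size: int = 1000, overlap: int = 200) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

With a 1000-token window and a 200-token overlap, each new chunk starts 800 tokens after the previous one, so a clause that straddles a boundary still appears whole in at least one chunk.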
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Backend | FastAPI + Uvicorn |
| RAG Pipeline | LangChain |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Vector DB | ChromaDB (Persistent) |
| LLM | Groq API — LLaMA3-70b-versatile |
| PDF Processing | PyPDF |
DocuMind/
├── backend/
│ ├── main.py # FastAPI server — 3 endpoints
│ ├── rag_pipeline.py # Embeddings + ChromaDB + Groq LLM
│ ├── summarizer.py # Auto document summary
│ └── utils.py # PDF loader + chunker
├── frontend/
│ └── app.py # Streamlit UI
├── uploads/ # Uploaded PDFs stored here
├── chroma_db/ # Vector embeddings stored here
├── .env.example # Environment variables template
├── .gitignore
└── requirements.txt
git clone https://github.com/bhatt-aditya03/DocuMind.git
cd DocuMind
python3 -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
pip install -r requirements.txt
cp .env.example .env
# Add your Groq API key in .env
streamlit run frontend/app.py
Then open http://localhost:8501 in your browser.
Create a .env file in the root directory:
GROQ_API_KEY=your_groq_api_key_here
Get your free Groq API key at console.groq.com
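A missing or placeholder key is easier to diagnose if it fails early with a clear message. The sketch below assumes the variables from .env have already been loaded into the process environment (for example by python-dotenv; whether this project uses it is an assumption). The helper name `get_groq_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the template value from .env.example as "not configured".
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    return key
```

Passing a mapping explicitly makes the check easy to test without touching the real environment.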
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API health check |
| POST | `/upload` | Upload and process PDF |
| POST | `/ask` | Ask question about document |
| GET | `/summary/{doc_id}` | Get document summary |
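The `/ask` endpoint can also be called from Python using only the standard library. This is a sketch, not an official client: the payload fields `question` and `doc_id` are the ones used in this README's curl example, the base URL assumes a local server on port 8000, and the response schema is not documented here, so it is returned as a plain dict. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local backend address


def build_ask_request(question: str, doc_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /ask endpoint."""
    payload = json.dumps({"question": question, "doc_id": doc_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, doc_id: str, base_url: str = BASE_URL, timeout: float = 90.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    request = build_ask_request(question, doc_id, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The generous default timeout is a design choice for the hosted backend, where a cold start can delay the first response.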
# Upload PDF
curl -X POST "http://localhost:8000/upload" \
-F "file=@document.pdf"
# Ask question
curl -X POST "http://localhost:8000/ask" \
-H "Content-Type: application/json" \
-d '{"question": "What are the key clauses?", "doc_id": "document"}'

- 📜 Legal Contracts: Find specific clauses instantly
- 📚 Research Papers: Extract key findings and methodology
- 📋 Policy Documents: Understand complex policies quickly
- 🎓 Academic Syllabi: Navigate course content easily
- 📑 Business Reports: Quick insights from long reports
| Day | What was built |
|---|---|
| Day 1 | Project setup, dependencies, environment |
| Day 2 | RAG pipeline — PDF chunking, ChromaDB, Groq LLM |
| Day 3 | FastAPI backend — upload, ask, summary endpoints |
| Day 4 | Streamlit UI — chat interface, confidence indicator |
| Day 5 | README, deployment, live on Streamlit Cloud |
Post-launch refactor:
`@lru_cache` for the embedding model, specific exception handling, a shared `sanitize_doc_id()` utility, temp file cleanup, and English docstrings throughout.
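Two of these refactors are easy to picture in code. The sketch below is illustrative only: the loader body is a placeholder rather than a real HuggingFace call, and the sanitization rules are assumptions, since the project's actual `sanitize_doc_id()` is not shown here.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_model(name: str = "all-MiniLM-L6-v2") -> dict:
    """Load the model once; later calls return the same cached object.

    Placeholder body: a real loader would construct the embedding model
    here, which is slow enough to be worth caching.
    """
    return {"name": name}


def sanitize_doc_id(raw: str) -> str:
    """Reduce an arbitrary filename to a safe identifier (assumed rules)."""
    stem = raw.replace("\\", "/").rsplit("/", 1)[-1]  # drop directory parts
    stem = re.sub(r"\.pdf$", "", stem, flags=re.IGNORECASE)  # drop extension
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    if not cleaned:
        raise ValueError("doc_id is empty after sanitization")
    return cleaned
```

Caching the loader means repeated requests reuse a single model instance, and routing every identifier through one shared function keeps upload, ask, and summary paths consistent about what a valid `doc_id` looks like.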
Aditya Bhatt
MIT License — free to use, modify and distribute!
