# DocuMed

An intelligent medical document analysis system that uses AI to help users understand and query their medical PDFs through natural language conversations.
## Table of Contents

- Introduction
- Features
- Tech Stack
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- API Documentation
- Project Structure
- Contributing
- Troubleshooting
- License
- Acknowledgments
## Introduction

DocuMed is an AI-powered medical document assistant that helps users interact with their medical PDFs through natural language. Built with technologies like LangChain, Pinecone, and Groq LLM, DocuMed provides accurate, cited answers to medical questions based on uploaded documents.
- **Simplify Medical Documents**: Convert complex medical PDFs into easy-to-understand insights
- **AI-Powered**: Leverages advanced language models for accurate responses
- **Privacy-Focused**: Your documents stay secure in your vector database
- **Conversational**: Natural language interface for easy interaction
- **Source Citations**: Every answer includes references to source documents
## Features

- **PDF Upload & Processing**
  - Multi-file upload support
  - Automatic text extraction and chunking
  - Vector embedding generation
  - Pinecone vector database storage
- **Intelligent Q&A**
  - Natural language question answering
  - Context-aware responses
  - Source citations for transparency
  - Conversation history tracking
- **Modern UI**
  - Clean, dark-themed interface
  - Real-time chat experience
  - File management sidebar
  - Responsive design
- **Export Capabilities**
  - Download conversation history
  - Timestamped exports
  - Formatted text output
- **Backend**
  - RESTful API with FastAPI
  - Vector similarity search
  - LLM-powered response generation
  - Session management
  - Error handling and logging
  - CORS support for web clients
## Tech Stack

**Backend**

| Technology | Purpose | Version |
|---|---|---|
| FastAPI | Web framework | 0.104+ |
| Python | Core language | 3.8+ |
| LangChain | LLM orchestration | Latest |
| Pinecone | Vector database | Latest |
| Groq | LLM provider | Latest |
| Google Generative AI | Embeddings | Latest |
| PyPDF | PDF processing | Latest |
| Uvicorn | ASGI server | Latest |

**Frontend**

| Technology | Purpose | Version |
|---|---|---|
| Streamlit | Web UI framework | 1.28+ |
| Requests | HTTP client | Latest |
**AI Components**

- LangChain: Chain orchestration and document processing
- Groq LLM: `llama-3.3-70b-versatile` for response generation
- Google Embeddings: `text-embedding-004` for vector embeddings
- Pinecone: Serverless vector database for semantic search
## Architecture

```
┌─────────────────┐
│  Streamlit UI   │
│   (Frontend)    │
└────────┬────────┘
         │ HTTP
         ▼
┌─────────────────┐
│     FastAPI     │
│    (Backend)    │
└────────┬────────┘
         │
    ┌────┴────┬──────────┬───────────┐
    ▼         ▼          ▼           ▼
┌────────┐ ┌──────┐ ┌──────────┐ ┌─────────┐
│Pinecone│ │ Groq │ │  Google  │ │   PDF   │
│ Vector │ │ LLM  │ │Embeddings│ │Processor│
│   DB   │ │      │ │          │ │         │
└────────┘ └──────┘ └──────────┘ └─────────┘
```
**Data Flow**

- **Upload**: User uploads PDF → backend processes it → text is chunked → embeddings are generated → vectors are stored in Pinecone
- **Query**: User asks a question → query is embedded → Pinecone is searched → relevant context is retrieved → LLM generates an answer → response is returned with sources
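The chunking step in the upload flow can be pictured as a sliding window over the extracted text. This is only an illustrative sketch, not the project's actual code (which uses LangChain's text splitters); the `chunk_size` and `overlap` defaults here are hypothetical:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks before embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, which helps retrieval quality.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded individually, so the overlap means a sentence near a boundary appears in two vectors and can be retrieved from either.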
## Prerequisites

Before you begin, ensure you have the following installed:

- Python 3.8 or newer
- uv (used for environment and dependency management in the steps below)
- Git

**API Keys**

You'll need to obtain the following API keys:

- **Groq API Key**: from the Groq console
- **Pinecone API Key**: from the Pinecone console
- **Google API Key**: from Google AI Studio
## Installation

**1. Clone the repository**

```bash
git clone https://github.com/Adiaparmar/DocuMed-Medical_AI_Assistant.git
cd DocuMed-Medical_AI_Assistant
```

**2. Set up the backend**

```bash
cd server

# Create virtual environment
uv venv

# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt
```

**3. Set up the frontend**

```bash
cd ../client

# Install dependencies
uv pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the `server` directory:

```bash
cd server
touch .env  # On Windows: type nul > .env
```

Edit `server/.env` and add your API keys:
```env
# LLM Configuration
GROQ_API_KEY=your_groq_api_key_here

# Vector Database
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_INDEX_NAME=medical-docs

# Embeddings
GOOGLE_API_KEY=your_google_api_key_here
```

Then set up the vector database:

- Go to the Pinecone Console
- Create a new index with:
  - Name: `medical-docs`
  - Dimensions: `768` (for Google `text-embedding-004`)
  - Metric: `cosine`
  - Cloud: choose your preferred region
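The `.env` format above is plain `KEY=VALUE` lines with `#` comments. The server presumably loads it with a library such as python-dotenv; purely to illustrate the format, a minimal stdlib-only parser might look like this (`load_env` is a hypothetical helper, not part of the project):

```python
from pathlib import Path

def load_env(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

Note that stray whitespace around keys or values is stripped here, which is exactly the kind of issue the troubleshooting section warns about for real `.env` files.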
## Usage

**Start the backend**

```bash
cd server
uv run uvicorn main:app --reload --host 127.0.0.1 --port 8001
```

The API will be available at http://127.0.0.1:8001.

**Start the frontend** (in a separate terminal)

```bash
cd client
uv run streamlit run app.py
```

The UI will open automatically at http://localhost:8501.
**Using the app**

1. **Upload Documents**
   - Click "Browse files" in the sidebar
   - Select one or more PDF files
   - Click "Upload to Database"
   - Wait for confirmation
2. **Ask Questions**
   - Type your medical question in the chat input
   - Press Enter
   - View the AI-generated response with source citations
3. **Export History**
   - Click "Download" to save your conversation
   - The file includes timestamps and source references
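The exported transcript is plain text with timestamps and per-answer source references. The project's actual formatting lives in `client/components/history_download.py`; the function below is only a hypothetical sketch of such an export step, and the message schema (`role`/`content`/`sources` dicts) is assumed, not taken from the codebase:

```python
from datetime import datetime

def export_history(messages: list[dict], exported_at: datetime) -> str:
    """Render a chat history as timestamped plain text with source references."""
    lines = [f"Conversation export - {exported_at:%Y-%m-%d %H:%M}"]
    for msg in messages:
        lines.append(f"[{msg['role']}] {msg['content']}")
        if msg.get("sources"):
            lines.append("  Sources: " + ", ".join(msg["sources"]))
    return "\n".join(lines)
```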
## API Documentation

Base URL: `http://127.0.0.1:8001`

### `POST /upload_pdfs/`

Content-Type: `multipart/form-data`

Parameters:

- `files`: `List[UploadFile]` - PDF files to upload

Response:

```json
{
  "message": "Files uploaded and processed successfully",
  "files_processed": 2
}
```

### `POST /ask/`

Content-Type: `multipart/form-data`

Parameters:

- `question`: `str` - User's medical question

Response:

```json
{
  "response": "AI-generated answer",
  "sources": ["source1.pdf", "source2.pdf"]
}
```

### `GET /docs`

Interactive Swagger UI for API testing.
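As a quick way to exercise `/ask/` outside Swagger, here is a client sketch using only the Python standard library. It relies on FastAPI `Form` fields accepting a URL-encoded body as well as multipart; `format_answer` is a hypothetical helper for display, not part of the project:

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://127.0.0.1:8001"

def ask(question: str, base_url: str = API_URL) -> dict:
    """POST a question to /ask/ and return the parsed JSON response."""
    body = urllib.parse.urlencode({"question": question}).encode()
    req = urllib.request.Request(f"{base_url}/ask/", data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())

def format_answer(payload: dict) -> str:
    """Render an /ask/ response together with its source citations."""
    sources = ", ".join(payload.get("sources", []))
    answer = payload["response"]
    return f"{answer}\n\nSources: {sources}" if sources else answer
```

With the backend running, `print(format_answer(ask("What does my report say about HbA1c?")))` would show the answer followed by the cited PDFs.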
## Project Structure

```
DocuMed-Medical_AI_Assistant/
│
├── server/                      # Backend FastAPI application
│   ├── main.py                  # FastAPI app entry point
│   ├── logger.py                # Logging configuration
│   ├── requirements.txt         # Backend dependencies
│   ├── .env                     # Environment variables (not in repo)
│   │
│   ├── middlewares/             # Middleware components
│   │   ├── __init__.py
│   │   └── exception_handlers.py
│   │
│   ├── routes/                  # API route handlers
│   │   ├── __init__.py
│   │   ├── upload_pdf.py        # PDF upload endpoint
│   │   └── ask_question.py      # Q&A endpoint
│   │
│   └── modules/                 # Core business logic
│       ├── __init__.py
│       ├── llm.py               # LLM chain setup
│       ├── load_vectorstore.py  # Vector DB operations
│       ├── pdf_handlers.py      # PDF processing
│       └── query_handlers.py    # Query processing
│
├── client/                      # Frontend Streamlit application
│   ├── app.py                   # Main Streamlit app
│   ├── config.py                # API configuration
│   ├── requirements.txt         # Frontend dependencies
│   │
│   ├── .streamlit/              # Streamlit configuration
│   │   └── config.toml          # Theme settings
│   │
│   ├── components/              # UI components
│   │   ├── __init__.py
│   │   ├── upload.py            # Upload component
│   │   ├── chatUI.py            # Chat interface
│   │   └── history_download.py  # Export component
│   │
│   └── utils/                   # Utility functions
│       ├── __init__.py
│       └── api.py               # API client functions
│
├── .gitignore                   # Git ignore rules
└── README.md                    # This file
```
## Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create a feature branch
   ```bash
   git checkout -b feature/AmazingFeature
   ```
3. Make your changes
4. Commit your changes
   ```bash
   git commit -m 'Add some AmazingFeature'
   ```
5. Push to the branch
   ```bash
   git push origin feature/AmazingFeature
   ```
6. Open a Pull Request

**Guidelines**

- Follow the PEP 8 style guide for Python code
- Write clear, descriptive commit messages
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting a PR
**Areas for contribution**

- Bug fixes
- New features
- Documentation improvements
- UI/UX enhancements
- Test coverage
- Internationalization
## Troubleshooting

**Problem**: `ModuleNotFoundError: No module named 'xyz'`

**Solution**: Reinstall the dependencies inside the active virtual environment:

```bash
uv pip install -r requirements.txt
```

**Problem**: `401 Unauthorized` or `Invalid API Key`

**Solution**:

- Verify your `.env` file has the correct API keys
- Ensure there are no extra spaces in the `.env` file
- Check the keys' validity on the respective platforms

**Problem**: `PineconeException: Index not found`

**Solution**:

- Verify the index name in `.env` matches the Pinecone console
- Ensure the index dimensions are set to 768
- Check that the Pinecone API key is valid

**Problem**: `Address already in use`

**Solution**: Find and kill the process holding port 8001:

```bash
# Windows
netstat -ano | findstr :8001
taskkill /PID <PID> /F

# macOS/Linux
lsof -ti:8001 | xargs kill -9
```

**Problem**: Streamlit shows the white theme

**Solution**:

- Ensure `.streamlit/config.toml` exists
- Restart Streamlit completely
- Clear the browser cache
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- LangChain - For the amazing LLM orchestration framework
- Groq - For providing fast LLM inference
- Pinecone - For the vector database infrastructure
- Google - For the embedding models
- Streamlit - For the intuitive UI framework
- FastAPI - For the modern web framework
- Author: Adiaparmar
- GitHub: @Adiaparmar
- Project Link: DocuMed Repository
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Search existing issues
- Create a new issue with detailed information
**Disclaimer**: DocuMed is for informational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition.
Made with ❤️ for better healthcare accessibility

⭐ Star this repo if you find it helpful!