🩺 DocuMed - AI-Powered Medical Document Assistant

An intelligent medical document analysis system that uses AI to help users understand and query their medical PDFs through natural language conversations.

📋 Table of Contents

Introduction
Features
Tech Stack
Architecture
Prerequisites
Installation
Configuration
Usage
API Documentation
Project Structure
Contributing
Troubleshooting
License
Acknowledgments

🎯 Introduction

DocuMed is an AI-powered medical document assistant that helps users interact with their medical PDFs through natural language. Built with cutting-edge technologies like LangChain, Pinecone, and Groq LLM, DocuMed provides accurate, cited answers to medical questions based on uploaded documents.

Why DocuMed?

📄 Simplify Medical Documents: Convert complex medical PDFs into easy-to-understand insights
🤖 AI-Powered: Leverages advanced language models for accurate responses
🔒 Privacy-Focused: Your documents stay secure in your vector database
💬 Conversational: Natural language interface for easy interaction
📚 Source Citations: Every answer includes references to source documents

✨ Features

Core Features

📁 PDF Upload & Processing
- Multi-file upload support
- Automatic text extraction and chunking
- Vector embedding generation
- Pinecone vector database storage
💬 Intelligent Q&A
- Natural language question answering
- Context-aware responses
- Source citation for transparency
- Conversation history tracking
🎨 Modern UI
- Clean, dark-themed interface
- Real-time chat experience
- File management sidebar
- Responsive design
📥 Export Capabilities
- Download conversation history
- Timestamped exports
- Formatted text output

Technical Features

RESTful API with FastAPI
Vector similarity search
LLM-powered response generation
Session management
Error handling and logging
CORS support for web clients

🛠 Tech Stack

Backend

Technology	Purpose	Version
FastAPI	Web framework	0.104+
Python	Core language	3.8+
LangChain	LLM orchestration	Latest
Pinecone	Vector database	Latest
Groq	LLM provider	Latest
Google Generative AI	Embeddings	Latest
PyPDF	PDF processing	Latest
Uvicorn	ASGI server	Latest

Frontend

Technology	Purpose	Version
Streamlit	Web UI framework	1.28+
Requests	HTTP client	Latest

AI/ML

LangChain: Chain orchestration and document processing
Groq LLM: llama-3.3-70b-versatile for response generation
Google Embeddings: text-embedding-004 for vector embeddings
Pinecone: Serverless vector database for semantic search

🏗 Architecture

┌─────────────────┐
│   Streamlit UI  │
│   (Frontend)    │
└────────┬────────┘
         │ HTTP
         ▼
┌─────────────────┐
│   FastAPI       │
│   (Backend)     │
└────────┬────────┘
         │
    ┌────┴────┬────────────┬──────────┐
    ▼         ▼            ▼          ▼
┌────────┐ ┌──────┐ ┌──────────┐ ┌────────┐
│Pinecone│ │ Groq │ │  Google  │ │  PDF   │
│Vector  │ │ LLM  │ │Embeddings│ │Processor│
│  DB    │ │      │ │          │ │        │
└────────┘ └──────┘ └──────────┘ └────────┘

Data Flow

Upload: User uploads PDF → Backend processes → Chunks text → Generates embeddings → Stores in Pinecone
Query: User asks question → Embeds query → Searches Pinecone → Retrieves context → LLM generates answer → Returns with sources

📦 Prerequisites

Before you begin, ensure you have the following installed:

Python 3.8+ (Download)
uv (Python package manager) - pip install uv
Git (Download)

API Keys Required

You'll need to obtain the following API keys:

Groq API Key: Get it here
Pinecone API Key: Get it here
Google API Key: Get it here

🚀 Installation

1. Clone the Repository

git clone https://github.com/Adiaparmar/DocuMed-Medical_AI_Assistant.git
cd DocuMed-Medical_AI_Assistant

2. Set Up Backend

cd server

# Create virtual environment
uv venv

# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt

3. Set Up Frontend

cd ../client

# Install dependencies
uv pip install -r requirements.txt

⚙️ Configuration

1. Create Environment File

Create a .env file in the server directory:

cd server
touch .env  # On Windows: type nul > .env

2. Add API Keys

Edit server/.env and add your API keys:

# LLM Configuration
GROQ_API_KEY=your_groq_api_key_here

# Vector Database
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_INDEX_NAME=medical-docs

# Embeddings
GOOGLE_API_KEY=your_google_api_key_here

3. Create Pinecone Index

Go to Pinecone Console
Create a new index:
- Name: medical-docs
- Dimensions: 768 (for Google text-embedding-004)
- Metric: cosine
- Cloud: Choose your preferred region

🎮 Usage

Starting the Backend Server

cd server
uv run uvicorn main:app --reload --host 127.0.0.1 --port 8001

The API will be available at: http://127.0.0.1:8001

Starting the Frontend

cd client
uv run streamlit run app.py

The UI will open automatically at: http://localhost:8501

Using the Application

Upload Documents
- Click "Browse files" in the sidebar
- Select one or more PDF files
- Click "🚀 Upload to Database"
- Wait for confirmation
Ask Questions
- Type your medical question in the chat input
- Press Enter
- View the AI-generated response with source citations
Export History
- Click "📥 Download" to save your conversation
- File includes timestamps and source references

📚 API Documentation

Base URL

http://127.0.0.1:8001

Endpoints

1. Upload PDFs

POST /upload_pdfs/
Content-Type: multipart/form-data

Parameters:
  files: List[UploadFile] - PDF files to upload

Response:
  {
    "message": "Files uploaded and processed successfully",
    "files_processed": 2
  }

2. Ask Question

POST /ask/
Content-Type: multipart/form-data

Parameters:
  question: str - User's medical question

Response:
  {
    "response": "AI-generated answer",
    "sources": ["source1.pdf", "source2.pdf"]
  }

3. API Documentation

GET /docs

Interactive Swagger UI for API testing.

📁 Project Structure

DocuMed-Medical_AI_Assistant/
│
├── server/                      # Backend FastAPI application
│   ├── main.py                 # FastAPI app entry point
│   ├── logger.py               # Logging configuration
│   ├── requirements.txt        # Backend dependencies
│   ├── .env                    # Environment variables (not in repo)
│   │
│   ├── middlewares/            # Middleware components
│   │   ├── __init__.py
│   │   └── exception_handlers.py
│   │
│   ├── routes/                 # API route handlers
│   │   ├── __init__.py
│   │   ├── upload_pdf.py       # PDF upload endpoint
│   │   └── ask_question.py     # Q&A endpoint
│   │
│   └── modules/                # Core business logic
│       ├── __init__.py
│       ├── llm.py              # LLM chain setup
│       ├── load_vectorstore.py # Vector DB operations
│       ├── pdf_handlers.py     # PDF processing
│       └── query_handlers.py   # Query processing
│
├── client/                      # Frontend Streamlit application
│   ├── app.py                  # Main Streamlit app
│   ├── config.py               # API configuration
│   ├── requirements.txt        # Frontend dependencies
│   │
│   ├── .streamlit/             # Streamlit configuration
│   │   └── config.toml         # Theme settings
│   │
│   ├── components/             # UI components
│   │   ├── __init__.py
│   │   ├── upload.py           # Upload component
│   │   ├── chatUI.py           # Chat interface
│   │   └── history_download.py # Export component
│   │
│   └── utils/                  # Utility functions
│       ├── __init__.py
│       └── api.py              # API client functions
│
├── .gitignore                  # Git ignore rules
└── README.md                   # This file

🤝 Contributing

We welcome contributions! Here's how you can help:

Getting Started

Fork the repository
Create a feature branch
```
git checkout -b feature/AmazingFeature
```
Make your changes
Commit your changes
```
git commit -m 'Add some AmazingFeature'
```
Push to the branch
```
git push origin feature/AmazingFeature
```
Open a Pull Request

Contribution Guidelines

Follow PEP 8 style guide for Python code
Write clear, descriptive commit messages
Add tests for new features
Update documentation as needed
Ensure all tests pass before submitting PR

Areas for Contribution

🐛 Bug fixes
✨ New features
📝 Documentation improvements
🎨 UI/UX enhancements
🧪 Test coverage
🌐 Internationalization

🔧 Troubleshooting

Common Issues

1. ModuleNotFoundError

Problem: ModuleNotFoundError: No module named 'xyz'

Solution:

uv pip install -r requirements.txt

2. API Key Errors

Problem: 401 Unauthorized or Invalid API Key

Solution:

Verify your .env file has correct API keys
Ensure no extra spaces in the .env file
Check API key validity on respective platforms

3. Pinecone Connection Issues

Problem: PineconeException: Index not found

Solution:

Verify index name in .env matches Pinecone console
Ensure index dimensions are set to 768
Check Pinecone API key is valid

4. Port Already in Use

Problem: Address already in use

Solution:

# Windows
netstat -ano | findstr :8001
taskkill /PID <PID> /F

# macOS/Linux
lsof -ti:8001 | xargs kill -9

5. White Theme Instead of Dark

Problem: Streamlit shows white theme

Solution:

Ensure .streamlit/config.toml exists
Restart Streamlit completely
Clear browser cache

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

LangChain - For the amazing LLM orchestration framework
Groq - For providing fast LLM inference
Pinecone - For the vector database infrastructure
Google - For the embedding models
Streamlit - For the intuitive UI framework
FastAPI - For the modern web framework

📞 Contact & Support

Author: Adiaparmar
GitHub: @Adiaparmar
Project Link: DocuMed Repository

Support

If you encounter any issues or have questions:

Check the Troubleshooting section
Search existing issues
Create a new issue with detailed information

⚠️ Disclaimer

DocuMed is for informational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition.

Made with ❤️ for better healthcare accessibility

⭐ Star this repo if you find it helpful!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
client		client
server		server
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

🩺 DocuMed - AI-Powered Medical Document Assistant

📋 Table of Contents

🎯 Introduction

Why DocuMed?

✨ Features

Core Features

Technical Features

🛠 Tech Stack

Backend

Frontend

AI/ML

🏗 Architecture

Data Flow

📦 Prerequisites

API Keys Required

🚀 Installation

1. Clone the Repository

2. Set Up Backend

3. Set Up Frontend

⚙️ Configuration

1. Create Environment File

2. Add API Keys

3. Create Pinecone Index

🎮 Usage

Starting the Backend Server

Starting the Frontend

Using the Application

📚 API Documentation

Base URL

Endpoints

1. Upload PDFs

2. Ask Question

3. API Documentation

📁 Project Structure

🤝 Contributing

Getting Started

Contribution Guidelines

Areas for Contribution

🔧 Troubleshooting

Common Issues

1. ModuleNotFoundError

2. API Key Errors

3. Pinecone Connection Issues

4. Port Already in Use

5. White Theme Instead of Dark

📄 License

🙏 Acknowledgments

📞 Contact & Support

Support

⚠️ Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages