Skip to content

Adiaparmar/DocuMed-Medical_AI_Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿฉบ DocuMed - AI-Powered Medical Document Assistant

Python FastAPI Streamlit License

An intelligent medical document analysis system that uses AI to help users understand and query their medical PDFs through natural language conversations.

DocuMed Banner


๐Ÿ“‹ Table of Contents


๐ŸŽฏ Introduction

DocuMed is an AI-powered medical document assistant that helps users interact with their medical PDFs through natural language. Built with cutting-edge technologies like LangChain, Pinecone, and Groq LLM, DocuMed provides accurate, cited answers to medical questions based on uploaded documents.

Why DocuMed?

  • ๐Ÿ“„ Simplify Medical Documents: Convert complex medical PDFs into easy-to-understand insights
  • ๐Ÿค– AI-Powered: Leverages advanced language models for accurate responses
  • ๐Ÿ”’ Privacy-Focused: Your documents stay secure in your vector database
  • ๐Ÿ’ฌ Conversational: Natural language interface for easy interaction
  • ๐Ÿ“š Source Citations: Every answer includes references to source documents

โœจ Features

Core Features

  • ๐Ÿ“ PDF Upload & Processing

    • Multi-file upload support
    • Automatic text extraction and chunking
    • Vector embedding generation
    • Pinecone vector database storage
  • ๐Ÿ’ฌ Intelligent Q&A

    • Natural language question answering
    • Context-aware responses
    • Source citation for transparency
    • Conversation history tracking
  • ๐ŸŽจ Modern UI

    • Clean, dark-themed interface
    • Real-time chat experience
    • File management sidebar
    • Responsive design
  • ๐Ÿ“ฅ Export Capabilities

    • Download conversation history
    • Timestamped exports
    • Formatted text output

Technical Features

  • RESTful API with FastAPI
  • Vector similarity search
  • LLM-powered response generation
  • Session management
  • Error handling and logging
  • CORS support for web clients

๐Ÿ›  Tech Stack

Backend

Technology Purpose Version
FastAPI Web framework 0.104+
Python Core language 3.8+
LangChain LLM orchestration Latest
Pinecone Vector database Latest
Groq LLM provider Latest
Google Generative AI Embeddings Latest
PyPDF PDF processing Latest
Uvicorn ASGI server Latest

Frontend

Technology Purpose Version
Streamlit Web UI framework 1.28+
Requests HTTP client Latest

AI/ML

  • LangChain: Chain orchestration and document processing
  • Groq LLM: llama-3.3-70b-versatile for response generation
  • Google Embeddings: text-embedding-004 for vector embeddings
  • Pinecone: Serverless vector database for semantic search

๐Ÿ— Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Streamlit UI  โ”‚
โ”‚   (Frontend)    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚ HTTP
         โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   FastAPI       โ”‚
โ”‚   (Backend)     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ–ผ         โ–ผ            โ–ผ          โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Pineconeโ”‚ โ”‚ Groq โ”‚ โ”‚  Google  โ”‚ โ”‚  PDF   โ”‚
โ”‚Vector  โ”‚ โ”‚ LLM  โ”‚ โ”‚Embeddingsโ”‚ โ”‚Processorโ”‚
โ”‚  DB    โ”‚ โ”‚      โ”‚ โ”‚          โ”‚ โ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Data Flow

  1. Upload: User uploads PDF โ†’ Backend processes โ†’ Chunks text โ†’ Generates embeddings โ†’ Stores in Pinecone
  2. Query: User asks question โ†’ Embeds query โ†’ Searches Pinecone โ†’ Retrieves context โ†’ LLM generates answer โ†’ Returns with sources

๐Ÿ“ฆ Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.8+ (Download)
  • uv (Python package manager) - pip install uv
  • Git (Download)

API Keys Required

You'll need to obtain the following API keys:

  1. Groq API Key: Get it here
  2. Pinecone API Key: Get it here
  3. Google API Key: Get it here

๐Ÿš€ Installation

1. Clone the Repository

git clone https://github.com/Adiaparmar/DocuMed-Medical_AI_Assistant.git
cd DocuMed-Medical_AI_Assistant

2. Set Up Backend

cd server

# Create virtual environment
uv venv

# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt

3. Set Up Frontend

cd ../client

# Install dependencies
uv pip install -r requirements.txt

โš™๏ธ Configuration

1. Create Environment File

Create a .env file in the server directory:

cd server
touch .env  # On Windows: type nul > .env

2. Add API Keys

Edit server/.env and add your API keys:

# LLM Configuration
GROQ_API_KEY=your_groq_api_key_here

# Vector Database
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_INDEX_NAME=medical-docs

# Embeddings
GOOGLE_API_KEY=your_google_api_key_here

3. Create Pinecone Index

  1. Go to Pinecone Console
  2. Create a new index:
    • Name: medical-docs
    • Dimensions: 768 (for Google text-embedding-004)
    • Metric: cosine
    • Cloud: Choose your preferred region

๐ŸŽฎ Usage

Starting the Backend Server

cd server
uv run uvicorn main:app --reload --host 127.0.0.1 --port 8001

The API will be available at: http://127.0.0.1:8001

Starting the Frontend

cd client
uv run streamlit run app.py

The UI will open automatically at: http://localhost:8501

Using the Application

  1. Upload Documents

    • Click "Browse files" in the sidebar
    • Select one or more PDF files
    • Click "๐Ÿš€ Upload to Database"
    • Wait for confirmation
  2. Ask Questions

    • Type your medical question in the chat input
    • Press Enter
    • View the AI-generated response with source citations
  3. Export History

    • Click "๐Ÿ“ฅ Download" to save your conversation
    • File includes timestamps and source references

๐Ÿ“š API Documentation

Base URL

http://127.0.0.1:8001

Endpoints

1. Upload PDFs

POST /upload_pdfs/
Content-Type: multipart/form-data

Parameters:
  files: List[UploadFile] - PDF files to upload

Response:
  {
    "message": "Files uploaded and processed successfully",
    "files_processed": 2
  }

2. Ask Question

POST /ask/
Content-Type: multipart/form-data

Parameters:
  question: str - User's medical question

Response:
  {
    "response": "AI-generated answer",
    "sources": ["source1.pdf", "source2.pdf"]
  }

3. API Documentation

GET /docs

Interactive Swagger UI for API testing.


๐Ÿ“ Project Structure

DocuMed-Medical_AI_Assistant/
โ”‚
โ”œโ”€โ”€ server/                      # Backend FastAPI application
โ”‚   โ”œโ”€โ”€ main.py                 # FastAPI app entry point
โ”‚   โ”œโ”€โ”€ logger.py               # Logging configuration
โ”‚   โ”œโ”€โ”€ requirements.txt        # Backend dependencies
โ”‚   โ”œโ”€โ”€ .env                    # Environment variables (not in repo)
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ middlewares/            # Middleware components
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ””โ”€โ”€ exception_handlers.py
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ routes/                 # API route handlers
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ”œโ”€โ”€ upload_pdf.py       # PDF upload endpoint
โ”‚   โ”‚   โ””โ”€โ”€ ask_question.py     # Q&A endpoint
โ”‚   โ”‚
โ”‚   โ””โ”€โ”€ modules/                # Core business logic
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ llm.py              # LLM chain setup
โ”‚       โ”œโ”€โ”€ load_vectorstore.py # Vector DB operations
โ”‚       โ”œโ”€โ”€ pdf_handlers.py     # PDF processing
โ”‚       โ””โ”€โ”€ query_handlers.py   # Query processing
โ”‚
โ”œโ”€โ”€ client/                      # Frontend Streamlit application
โ”‚   โ”œโ”€โ”€ app.py                  # Main Streamlit app
โ”‚   โ”œโ”€โ”€ config.py               # API configuration
โ”‚   โ”œโ”€โ”€ requirements.txt        # Frontend dependencies
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ .streamlit/             # Streamlit configuration
โ”‚   โ”‚   โ””โ”€โ”€ config.toml         # Theme settings
โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ components/             # UI components
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ”œโ”€โ”€ upload.py           # Upload component
โ”‚   โ”‚   โ”œโ”€โ”€ chatUI.py           # Chat interface
โ”‚   โ”‚   โ””โ”€โ”€ history_download.py # Export component
โ”‚   โ”‚
โ”‚   โ””โ”€โ”€ utils/                  # Utility functions
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ””โ”€โ”€ api.py              # API client functions
โ”‚
โ”œโ”€โ”€ .gitignore                  # Git ignore rules
โ””โ”€โ”€ README.md                   # This file

๐Ÿค Contributing

We welcome contributions! Here's how you can help:

Getting Started

  1. Fork the repository
  2. Create a feature branch
    git checkout -b feature/AmazingFeature
  3. Make your changes
  4. Commit your changes
    git commit -m 'Add some AmazingFeature'
  5. Push to the branch
    git push origin feature/AmazingFeature
  6. Open a Pull Request

Contribution Guidelines

  • Follow PEP 8 style guide for Python code
  • Write clear, descriptive commit messages
  • Add tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting PR

Areas for Contribution

  • ๐Ÿ› Bug fixes
  • โœจ New features
  • ๐Ÿ“ Documentation improvements
  • ๐ŸŽจ UI/UX enhancements
  • ๐Ÿงช Test coverage
  • ๐ŸŒ Internationalization

๐Ÿ”ง Troubleshooting

Common Issues

1. ModuleNotFoundError

Problem: ModuleNotFoundError: No module named 'xyz'

Solution:

uv pip install -r requirements.txt

2. API Key Errors

Problem: 401 Unauthorized or Invalid API Key

Solution:

  • Verify your .env file has correct API keys
  • Ensure no extra spaces in the .env file
  • Check API key validity on respective platforms

3. Pinecone Connection Issues

Problem: PineconeException: Index not found

Solution:

  • Verify index name in .env matches Pinecone console
  • Ensure index dimensions are set to 768
  • Check Pinecone API key is valid

4. Port Already in Use

Problem: Address already in use

Solution:

# Windows
netstat -ano | findstr :8001
taskkill /PID <PID> /F

# macOS/Linux
lsof -ti:8001 | xargs kill -9

5. White Theme Instead of Dark

Problem: Streamlit shows white theme

Solution:

  • Ensure .streamlit/config.toml exists
  • Restart Streamlit completely
  • Clear browser cache

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿ™ Acknowledgments

  • LangChain - For the amazing LLM orchestration framework
  • Groq - For providing fast LLM inference
  • Pinecone - For the vector database infrastructure
  • Google - For the embedding models
  • Streamlit - For the intuitive UI framework
  • FastAPI - For the modern web framework

๐Ÿ“ž Contact & Support

Support

If you encounter any issues or have questions:

  1. Check the Troubleshooting section
  2. Search existing issues
  3. Create a new issue with detailed information

โš ๏ธ Disclaimer

DocuMed is for informational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition.


Made with โค๏ธ for better healthcare accessibility

โญ Star this repo if you find it helpful!

About

An intelligent medical document analysis system that uses AI to help users understand and query their medical PDFs through natural language conversations.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages