Healthcare Prescription Decoding

The objective is to build a system that combines computer vision and Natural Language Processing (NLP) to take raw photos of medical prescriptions as input and decode them into structured, usable text.

This tool is designed to reduce manual data entry errors and streamline medication management for healthcare providers.

My Approach: A 2-Stage Pipeline

Because handwriting is hard to analyze and medical data leaves little room for error, I implemented a 2-stage pipeline that separates visual processing from semantic understanding.

Stage 1: Image Preprocessing (Computer Vision)

Tool: OpenCV

Process: Raw prescription images often contain noise, shadows, or poor lighting.

Thresholding & Enhancement: Used OpenCV to apply adaptive thresholding and related image preprocessing techniques.
Goal: Improve text visibility and separate the handwritten text from background noise, producing a clean input for the extraction phase.

Stage 2: Entity Extraction (NLP)

Tool: spaCy & Scikit-learn

Process: Once the text is visible, the system treats it as a sequence of tokens to be classified.

Tokenization: Used spaCy to split the processed text into individual tokens.
Named Entity Recognition (NER): Used classification models to recognize and extract critical medical entities. The model specifically targets:
Drug Names: (e.g., "Paracetamol")
Dosages: (e.g., "500mg")
Instructions: (e.g., "Twice a day after meals")
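
One way to combine spaCy tokenization with a Scikit-learn classifier is to describe each token by a few simple features and predict a label per token. The sketch below is an assumption about how that could look, not the code in extraction.py: the feature set, the DRUG/DOSAGE/INSTRUCTION label names, the helper names, and the three toy training sentences are all hypothetical, and a real model would need far more labeled data.

```python
import spacy
from spacy.tokens import Doc
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A blank English pipeline provides spaCy's tokenizer without a model download.
nlp = spacy.blank("en")

# Toy, hand-labeled examples for illustration only (hypothetical data).
TRAIN = [
    [("Paracetamol", "DRUG"), ("500", "DOSAGE"), ("mg", "DOSAGE"),
     ("twice", "INSTRUCTION"), ("a", "INSTRUCTION"), ("day", "INSTRUCTION")],
    [("Amoxicillin", "DRUG"), ("250", "DOSAGE"), ("mg", "DOSAGE"),
     ("after", "INSTRUCTION"), ("meals", "INSTRUCTION")],
    [("Ibuprofen", "DRUG"), ("400", "DOSAGE"), ("mg", "DOSAGE"),
     ("once", "INSTRUCTION"), ("daily", "INSTRUCTION")],
]


def token_features(doc, i):
    """Describe token i by its text, shape, and immediate neighbors."""
    tok = doc[i]
    return {
        "lower": tok.lower_,
        "shape": tok.shape_,
        "has_digit": int(any(ch.isdigit() for ch in tok.text)),
        "prev": doc[i - 1].lower_ if i > 0 else "<s>",
        "next": doc[i + 1].lower_ if i < len(doc) - 1 else "</s>",
    }


def train_tagger(examples):
    """Fit a per-token classifier on pre-tokenized, labeled sentences."""
    features, labels = [], []
    for sentence in examples:
        # Build the Doc from the labeled words so tokens and labels stay aligned.
        doc = Doc(nlp.vocab, words=[word for word, _ in sentence])
        for i, (_, label) in enumerate(sentence):
            features.append(token_features(doc, i))
            labels.append(label)
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(features, labels)
    return model


def tag(model, text):
    """Tokenize text with spaCy and return (token, predicted label) pairs."""
    doc = nlp(text)
    if len(doc) == 0:
        return []
    predicted = model.predict([token_features(doc, i) for i in range(len(doc))])
    return [(tok.text, str(label)) for tok, label in zip(doc, predicted)]


model = train_tagger(TRAIN)
print(tag(model, "Paracetamol 500mg twice a day after meals"))
```

Classifying tokens independently keeps the example short; adjacent tokens with the same label would still need to be merged into spans such as "Twice a day after meals".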

Impact

Streamlined Workflow: Automates the information extraction process, so doctors and pharmacists can digitize records without retyping them.
Enhanced Safety: Reduces the risk of human error in interpreting handwriting, helping patients receive the correct medication details.

Installation

This project was built in a standard Python environment.

Clone the repository:

git clone [Your-GitHub-Repo-URL]
cd [Your-Project-Folder]

Install Python dependencies:

pip install -r requirements.txt

If you don't have a requirements.txt, install the key libraries directly:

pip install opencv-python spacy scikit-learn numpy

Download spaCy Model: The pipeline needs spaCy's English language model:

python -m spacy download en_core_web_sm

How to Run

The core logic is contained in the main script (e.g., main.py or notebook.ipynb).

Place Your Data: Ensure your raw prescription images are in the input_images/ directory.
Run the Script:

python main.py

View Results: The structured text (JSON or CSV) will be saved to the output/ directory.
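
For the JSON case, writing one result file per image could look like the sketch below. The helper name save_result and the record keys (source_image, drug_name, dosage, instructions) are assumptions made for illustration; the actual output schema is whatever main.py produces.

```python
import json
from pathlib import Path


def save_result(image_name, entities, output_dir="output"):
    """Write the extracted entities for one image to <output_dir>/<stem>.json.

    Hypothetical helper; the record layout is an illustrative assumption.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the source file name next to the extracted fields for traceability.
    record = {"source_image": image_name, **entities}
    out_path = out_dir / (Path(image_name).stem + ".json")
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path
```

For example, save_result("rx_001.jpg", {"drug_name": "Paracetamol", "dosage": "500mg", "instructions": "Twice a day after meals"}) would create output/rx_001.json, where rx_001.jpg is a made-up file name.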

Project Structure

├── input_images/               # Folder for raw prescription photos
├── output/                     # Folder for decoded structured text results
├── main.py                     # Main script for the extraction pipeline
├── preprocessing.py            # OpenCV functions for image thresholding
├── extraction.py               # spaCy logic for NER and tokenization
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Technologies Used

Python: Core programming language.
OpenCV: For image preprocessing and thresholding.
spaCy: For NLP, tokenization, and entity extraction.
Scikit-learn: For classification tasks.
