Aadhaar OCR Microservice

A FastAPI-based microservice for extracting information from Aadhaar cards using Optical Character Recognition (OCR).

Project Structure

aadhaar_ocr_fastapi/
├── app/
│   ├── __init__.py          # Package initialization
│   ├── main.py              # FastAPI application entry point
│   ├── routes/              # API route definitions
│   │   └── ocr.py          # OCR API endpoints
│   └── services/            # Business logic services
│       └── ocr_service.py  # OCR processing service
├── requirements.txt         # Project dependencies
README.md               # Project documentation

Quick Start

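Install dependencies from the aadhaar_ocr_fastapi/ directory, which contains requirements.txt and the app package. PyTesseract only wraps the tesseract command-line program, so the engine itself must be installed separately:

pip install -r requirements.txt
brew install tesseract                # Install the Tesseract OCR engine (macOS, Homebrew)
sudo apt-get install tesseract-ocr    # Debian/Ubuntu alternative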

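Run the service from the same directory (uvicorn listens on http://127.0.0.1:8000 by default; --reload restarts the server on code changes and is meant for development only):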

uvicorn app.main:app --reload

API Usage

Extract Aadhaar Info

POST /api/ocr/aadhaar

Upload an Aadhaar card image (PNG/JPEG) to extract:

Name
Aadhaar Number
DOB
Gender
Address

Example Response:

{
    "status": "success",
    "data": {
        "name": "Suresh Kumar",
        "aadhaar_number": "XXXX XXXX XXXX",
        "dob": "1992-06-15",
        "gender": "Male",
        "address": "Full extracted address"
    }
}
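To call the endpoint from Python, a client sends the image as multipart/form-data and unwraps the `data` object from the envelope shown above. The sketch below uses only the standard library. It assumes the service runs on uvicorn's default address and that the upload form field is named `file`; this README does not state the field name, so check the parameter in `app/routes/ocr.py`. `build_upload_request` and `parse_response` are hypothetical helper names.

```python
import json
import urllib.request
import uuid

# Assumption: uvicorn's default host and port; adjust for your deployment.
API_URL = "http://127.0.0.1:8000/api/ocr/aadhaar"


def build_upload_request(image_bytes: bytes, filename: str,
                         content_type: str = "image/jpeg",
                         url: str = API_URL,
                         field: str = "file") -> urllib.request.Request:
    """Build a POST request carrying the image as a multipart/form-data file part.

    `field` is a hypothetical form field name; match it to the route's parameter.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def parse_response(raw: bytes) -> dict:
    """Return the `data` object from a success envelope, or raise on any other status."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(f"OCR request failed: {payload}")
    return payload["data"]


# Usage against a running service (not executed here):
# with open("card.jpg", "rb") as fh:
#     req = build_upload_request(fh.read(), "card.jpg")
# with urllib.request.urlopen(req) as resp:
#     print(parse_response(resp.read()))
```

Building the body by hand keeps the sketch dependency-free; with a third-party HTTP client the multipart encoding would normally be handled for you.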

Project Flow

Image Upload
- The user uploads an Aadhaar card image through the API endpoint
- The image is received as multipart/form-data
OCR Processing
- The image is passed to the Tesseract OCR engine through PyTesseract
- Text is extracted from the image
Information Extraction
- The extracted text is parsed to find specific patterns:
  - Name: Extracted from the line above the DOB, or from the text between the DOB and gender
  - Aadhaar Number: Matched with a regex pattern (4 digits, space, 4 digits, space, 4 digits)
  - DOB: Matched with a regex pattern (DD/MM/YYYY)
  - Gender: Detected by keywords such as 'Male', 'Female', 'M', 'F'
  - Address: Extracted by locating address keywords and parsing that text block
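The extraction rules above can be sketched with Python's `re` module. This is an illustrative reconstruction from the descriptions in this section, not the code in `app/services/ocr_service.py`: the name rule covers only the "line above DOB" case, and the address rule assumes the block starts with an `Address` label and ends at the line holding the Aadhaar number. `extract_fields`, the pattern constants and the keyword list are hypothetical.

```python
import re

# Patterns follow this README's description; the service's exact regexes may differ.
AADHAAR_RE = re.compile(r"\b\d{4} \d{4} \d{4}\b")   # 4 digits, space, 4 digits, space, 4 digits
DOB_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")       # DD/MM/YYYY
GENDER_WORD_RE = re.compile(r"\b(female|male)\b", re.IGNORECASE)
GENDER_LETTER_RE = re.compile(r"\b([MF])\b")        # fallback when only M/F is printed
NAME_LINE_RE = re.compile(r"[A-Za-z][A-Za-z .]*")   # letters, spaces and dots only
ADDRESS_KEYWORDS = ("address",)                     # hypothetical keyword list


def extract_fields(text: str) -> dict:
    """Pull the five documented fields out of raw OCR text (None when not found)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = dict.fromkeys(["name", "aadhaar_number", "dob", "gender", "address"])

    if m := AADHAAR_RE.search(text):
        fields["aadhaar_number"] = m.group()

    for i, line in enumerate(lines):
        if m := DOB_RE.search(line):
            fields["dob"] = m.group()
            # Name: the nearest letters-only line above the DOB line.
            for prev in reversed(lines[:i]):
                if NAME_LINE_RE.fullmatch(prev):
                    fields["name"] = prev
                    break
            break

    # Whole words first, so "Female" is never read as "Male" and a stray letter is a last resort.
    if m := GENDER_WORD_RE.search(text):
        fields["gender"] = m.group(1).capitalize()
    elif m := GENDER_LETTER_RE.search(text):
        fields["gender"] = {"M": "Male", "F": "Female"}[m.group(1)]

    # Address: the keyword line plus following lines, stopping at the Aadhaar number.
    for i, line in enumerate(lines):
        if line.lower().startswith(ADDRESS_KEYWORDS):
            block = [re.sub(r"(?i)^address\s*:?\s*", "", line)]
            for nxt in lines[i + 1:]:
                if AADHAAR_RE.search(nxt):
                    break
                block.append(nxt)
            fields["address"] = ", ".join(part for part in block if part)
            break

    return fields
```

The DOB is returned exactly as matched here; converting it to the format in the example response belongs to the cleaning step.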
Data Cleaning
- Text is cleaned to remove special characters
- Address formatting is standardized
- Personal information is validated against regex patterns
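The cleaning and validation step might look like the following standard-library sketch. The allowed-character set, the comma normalization and the DD/MM/YYYY to YYYY-MM-DD conversion (inferred from the `dob` value in the example response) are assumptions, and all function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

AADHAAR_FORMAT = re.compile(r"\d{4} \d{4} \d{4}")


def clean_text(value: str) -> str:
    """Replace characters outside an assumed ASCII whitelist with spaces, then collapse whitespace.

    Note: this also drops non-Latin text (for example Devanagari) printed on the card.
    """
    value = re.sub(r"[^A-Za-z0-9 ,./:\-]", " ", value)
    return re.sub(r"\s+", " ", value).strip()


def normalize_address(address: str) -> str:
    """Standardize separators: exactly one ", " between parts and none at either end."""
    address = re.sub(r"\s*,[\s,]*", ", ", clean_text(address))
    return address.strip(" ,")


def normalize_dob(dob: str) -> Optional[str]:
    """Parse a DD/MM/YYYY date (rejecting impossible dates) and return it as YYYY-MM-DD."""
    try:
        return datetime.strptime(dob, "%d/%m/%Y").date().isoformat()
    except ValueError:
        return None


def is_valid_aadhaar(number: str) -> bool:
    """Check the 4-4-4 digit layout only; this does not verify that the number is genuine."""
    return AADHAAR_FORMAT.fullmatch(number) is not None
```

Returning `None` for an unparseable date lets the caller report a field-level failure instead of rejecting the whole card.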

Technology Stack

FastAPI: Modern, fast web framework for building APIs
PyTesseract: Python wrapper for Google's Tesseract-OCR
Pillow: Python Imaging Library for image processing
Python-Multipart: For handling file uploads
Uvicorn: ASGI server implementation for running the application

Key Features

OCR processing using the Tesseract engine
Regex-based pattern matching for each extracted field
Separation of API routes (app/routes/) from processing logic (app/services/)
FastAPI's built-in request validation and error handling
Easy to extend with additional OCR capabilities

Error Handling

The API handles the following error cases:

Invalid image formats
Missing image uploads
OCR processing failures
Pattern matching failures
Invalid data formats
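Two of these cases, a missing upload and an unsupported format, can be caught before OCR runs by checking the file's leading bytes instead of trusting the client-supplied content type. A minimal sketch follows; `InvalidImageError` and `detect_image_type` are hypothetical names. In a FastAPI route, such an error would typically become an HTTP 400 response by raising `fastapi.HTTPException(status_code=400, detail=str(exc))`.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_SIGNATURE = b"\xff\xd8\xff"       # JPEG SOI marker followed by the next marker's 0xFF byte


class InvalidImageError(ValueError):
    """Raised when the upload is empty or is not a PNG/JPEG image."""


def detect_image_type(data: bytes) -> str:
    """Return the MIME type implied by the file signature, or raise InvalidImageError."""
    if not data:
        raise InvalidImageError("No image uploaded")
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    raise InvalidImageError("Unsupported image format: expected PNG or JPEG")
```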

Security Considerations

All Aadhaar numbers are masked in the response (XXXX XXXX XXXX)
The service doesn't store any Aadhaar images or extracted data
Input validation is performed at all levels
Rate limiting can be added for production use
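The masking rule can be expressed as a small helper applied to the extracted number before it goes into the response. This is a sketch; `mask_aadhaar` and its `visible_digits` parameter are hypothetical. The default (`visible_digits=0`) produces the fully masked XXXX XXXX XXXX form described above; a non-zero value would leave trailing digits visible, which the service as documented does not do.

```python
import re


def mask_aadhaar(number: str, visible_digits: int = 0) -> str:
    """Mask a 12-digit Aadhaar number as 'XXXX XXXX XXXX', optionally keeping trailing digits."""
    digits = re.sub(r"\D", "", number)  # tolerate spaces or other separators left by OCR
    if len(digits) != 12:
        raise ValueError("expected a 12-digit Aadhaar number")
    if not 0 <= visible_digits <= 12:
        raise ValueError("visible_digits must be between 0 and 12")
    masked = "X" * (12 - visible_digits) + digits[12 - visible_digits:]
    return " ".join(masked[i:i + 4] for i in range(0, 12, 4))
```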

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request
