hasff/python-handwritten-ocr-document-generator


Python Handwritten OCR Document Generator

Extract structured data from handwritten images using Google Gemini and export it automatically to PDF, Excel and Word documents.

This project is a production-style example of how to digitise handwritten records for:

  • maintenance and inspection checklists
  • field data collection
  • warehouse and inventory logs
  • healthcare and clinical notes
  • any paper-based workflow that needs to go digital

How it works

Most OCR tutorials work on clean printed text. This example works on a real photo taken with a mobile phone — handwritten, imperfect, and rotated.

The pipeline handles orientation automatically, extracts the table structure, and generates three ready-to-use output formats.

Step 1 — Input: a real handwritten photo

Handwritten input

Step 2 — OCR: Google Gemini reads the table

The image is sent to the Gemini API, which returns the table as a structured list — headers and rows, ready to process.
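As a rough illustration of that hand-off, the sketch below turns a model reply into a list of rows. The helper name `parse_table_response` and the sample reply are assumptions made for this example, not code taken from program.py. It also assumes the reply may arrive wrapped in a Markdown code fence, which is common for model output but not guaranteed.

```python
import json

# Markdown code-fence marker, built from single backticks.
FENCE = "`" * 3


def parse_table_response(text: str) -> list[list[str]]:
    """Turn a model reply into a list of rows (hypothetical helper)."""
    cleaned = text.strip()
    # Drop a surrounding Markdown fence if the model added one.
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        # Skip an optional language tag such as "json" on the first line.
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    table = json.loads(cleaned)
    if not isinstance(table, list) or not all(isinstance(r, list) for r in table):
        raise ValueError("expected a JSON array of arrays")
    # Normalise every cell to a string so the exporters get uniform input.
    return [[str(cell) for cell in row] for row in table]


# Made-up reply, for illustration only.
reply = FENCE + 'json\n[["Item", "Qty"], ["Bolt", 4]]\n' + FENCE
table = parse_table_response(reply)
```

Converting every cell to a string keeps the later export steps simple, at the cost of losing the distinction between numbers and text.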

Step 3 — Output: PDF, Excel and Word generated automatically


PDF — printable report with title and footnote

Excel — formatted spreadsheet with frozen header

Word — editable document ready to share
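Because all three files are built from the same extracted table, one way to keep them visually aligned is to derive the row styling once and let each exporter consume it. The sketch below is only an assumption about how that could be organised. `THEME`, `styled_rows` and the colour values are hypothetical and not taken from the project.

```python
# Hypothetical shared theme; the hex colours are placeholders.
THEME = {
    "header_fill": "1F4E79",
    "header_text": "FFFFFF",
    "row_fills": ("FFFFFF", "EAF1F8"),  # alternating body-row backgrounds
    "row_text": "000000",
}


def styled_rows(headers, rows, theme=THEME):
    """Yield (cells, fill_colour, text_colour) for the header and each data row."""
    yield headers, theme["header_fill"], theme["header_text"]
    for index, row in enumerate(rows):
        # Even-indexed rows take the first fill, odd-indexed rows the second.
        yield row, theme["row_fills"][index % 2], theme["row_text"]


styled = list(styled_rows(["A"], [["1"], ["2"], ["3"]]))
```

Each exporter would then translate the colour strings into its own library's style objects.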

What gets extracted

| Field     | Description                                     |
|-----------|-------------------------------------------------|
| Headers   | First row of the handwritten table              |
| Data rows | Each subsequent row, preserving original values |

The structure is detected dynamically, with no hardcoded column names and no templates, so the same pipeline can be applied to any handwritten table.
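A minimal sketch of that split, assuming the extracted table is a list of rows whose first entry is the header row. The function name `split_table` is hypothetical. Padding short rows with empty strings is a design choice made here so that no original value is dropped. The project may handle ragged rows differently.

```python
def split_table(table):
    """Split a list of rows into (headers, data_rows), padded to equal width."""
    if not table:
        return [], []
    # Use the widest row so that no original value is discarded.
    width = max(len(row) for row in table)
    padded = [list(row) + [""] * (width - len(row)) for row in table]
    headers, *rows = padded
    return headers, rows


# Made-up ragged table, for illustration only.
headers, rows = split_table([["A", "B"], ["1"], ["2", "3", "4"]])
```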


Quick Start

git clone https://github.com/hasff/python-handwritten-ocr-document-generator.git
cd python-handwritten-ocr-document-generator
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

Get a free Gemini API key at Google AI Studio — no billing required.

Create a .env file in the project root:

GEMINI_API_KEY=your_key_here
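For reference, this is roughly what reading that file involves. Projects often use the python-dotenv package for this. Whether program.py does so is not stated here, so the sketch below sticks to the standard library. `load_env_file` and `get_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=value lines from a .env file into os.environ (simplified)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key():
    """Return the Gemini key, failing early if it is missing or a placeholder."""
    load_env_file()
    key = os.environ.get("GEMINI_API_KEY")
    if not key or key == "your_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing; add it to .env")
    return key
```

Failing early with a clear message is usually friendlier than letting the first API call fail with an authentication error.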

Place your image in input/handwrite.jpeg, then run:

python program.py

The script will generate:

output/report.pdf
output/report.xlsx
output/report.docx

Note: By default DEBUG = True in program.py, which uses mock data without consuming API quota. Set DEBUG = False to run the full pipeline with a real image.
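The switch described in the note could look something like the sketch below. Only the `DEBUG` flag itself comes from the project. The function `extract_table`, its `ocr` parameter and the mock rows are invented for illustration, and the real mock data in program.py may differ.

```python
DEBUG = True  # mirrors the flag described in the note

# Placeholder rows invented for this sketch.
MOCK_TABLE = [
    ["Item", "Quantity", "Checked"],
    ["Filter", "2", "Yes"],
    ["Valve", "5", "No"],
]


def extract_table(image_path, ocr=None, debug=DEBUG):
    """Return mock rows in debug mode, otherwise delegate to the OCR callable."""
    if debug:
        # No API call is made, so no quota is consumed.
        return MOCK_TABLE
    if ocr is None:
        raise ValueError("an OCR callable is required when debug is False")
    return ocr(image_path)


table = extract_table("input/handwrite.jpeg")
```

Passing the OCR step in as a callable keeps the export code testable without network access.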


Technical approach

Rather than using traditional OCR libraries, this project uses Google Gemini's vision capabilities to understand handwritten content:

  • EXIF orientation is detected and corrected automatically before processing
  • The image is sent to Gemini with a structured prompt requesting a JSON array of arrays
  • The response is parsed and passed to three independent export functions
  • Each format (PDF, Excel, Word) is generated with consistent styling — matching colors, alternating rows, and a shared visual theme

This approach handles messy handwriting, rotated images, and irregular layouts without any manual preprocessing or template configuration; the only automatic step before the API call is the orientation fix.
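To make the orientation step concrete, the sketch below applies the correction implied by the EXIF Orientation tag to a tiny grid of "pixels". It is a teaching model, not the project's code. It covers only the unmirrored values (1, 3, 6, 8) and uses nested lists instead of a real image. With Pillow, `ImageOps.exif_transpose(image)` performs the full correction in one call. Whether program.py uses that function is an assumption.

```python
def rotate_cw(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def rotate_180(grid):
    """Rotate a 2D grid 180 degrees."""
    return [row[::-1] for row in grid[::-1]]


def rotate_ccw(grid):
    """Rotate a 2D grid 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*grid)][::-1]


# EXIF Orientation value -> correction needed to display the image upright.
CORRECTIONS = {
    1: lambda grid: [row[:] for row in grid],  # already upright
    3: rotate_180,
    6: rotate_cw,
    8: rotate_ccw,
}


def correct_orientation(grid, orientation):
    """Apply the correction for an unmirrored EXIF Orientation value."""
    fix = CORRECTIONS.get(orientation)
    if fix is None:
        raise ValueError(f"unsupported orientation: {orientation}")
    return fix(grid)


stored = [[1, 2], [3, 4]]
upright = correct_orientation(stored, 6)
```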


Why not EasyOCR or Tesseract?

Traditional OCR libraries struggle with cursive handwriting. During development, EasyOCR was tested on the same image and returned fragments like "2", "J", "s" with confidence scores below 10%.

Gemini reads the same image correctly on the first attempt.


Need custom document automation?

I help companies automate document processing pipelines:

  • digitisation of handwritten forms and checklists
  • batch processing of images and scanned documents
  • export to Excel, PDF, Word, CSV or JSON
  • integration with databases, APIs and ERP systems
  • OCR for printed and handwritten content

📩 Contact: hugoferro.business(at)gmail.com

🌐 Courses and professional tools: https://hasff.github.io/site/
