Extract structured data from handwritten images using Google Gemini and export it automatically to PDF, Excel and Word documents.
This project is a production-style example of how to digitise handwritten records for:
- maintenance and inspection checklists
- field data collection
- warehouse and inventory logs
- healthcare and clinical notes
- any paper-based workflow that needs to go digital
Most OCR tutorials work on clean printed text. This example works on a real photo taken with a mobile phone — handwritten, imperfect, and rotated.
The pipeline handles orientation automatically, extracts the table structure, and generates three ready-to-use output formats.
Step 1 — Input: a real handwritten photo
Step 2 — OCR: Google Gemini reads the table
The image is sent to the Gemini API, which returns the table as a structured list — headers and rows, ready to process.
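The parsing half of this step can be sketched with the standard library alone. The prompt wording and the name `parse_table_response` below are assumptions for illustration; the README does not show the project's actual prompt or parser. The sketch also allows for a reply wrapped in a Markdown code fence, which language models sometimes add.

```python
import json

# Assumed prompt wording; the project's real prompt is not shown in this README.
PROMPT = (
    "Read the handwritten table in this image. Return only a JSON array of "
    "arrays: the first inner array holds the column headers, and each "
    "following array holds one row."
)

FENCE = "`" * 3  # Markdown code-fence marker


def parse_table_response(text):
    """Turn the model's text reply into a list of rows (lists of strings)."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with optional language tag)...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ...and everything from the closing fence onwards.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    if not isinstance(data, list) or not all(isinstance(r, list) for r in data):
        raise ValueError("expected a JSON array of arrays")
    # Normalise every cell to a string so the exporters see one type.
    return [["" if cell is None else str(cell) for cell in row] for row in data]
```

Converting every cell to a string keeps numbers such as `4` and text such as `"4"` consistent across the three exporters.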
Step 3 — Output: PDF, Excel and Word generated automatically
- PDF: printable report with title and footnote
- Excel: formatted spreadsheet with frozen header
- Word: editable document ready to share
| Field | Description |
|---|---|
| Headers | First row of the handwritten table |
| Data rows | Each subsequent row, preserving original values |
The structure is detected dynamically, with no hardcoded column names and no templates, so the same code applies to any handwritten table.
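As a minimal sketch of what detecting the structure at run time can look like, the function below splits a list of rows into headers and data rows without knowing the column names in advance. The name `split_table` and the `Column N` placeholder for missing headers are hypothetical, not taken from the project.

```python
def split_table(table):
    """Split rows into (headers, data_rows), padding everything to one width."""
    if not table:
        return [], []
    # Use the widest row so no handwritten value is dropped.
    width = max(len(row) for row in table)
    headers = [str(h).strip() for h in table[0]]
    # Hypothetical placeholder names for columns that have no header cell.
    headers += [f"Column {i + 1}" for i in range(len(headers), width)]
    rows = [
        [str(cell) for cell in row] + [""] * (width - len(row))
        for row in table[1:]
    ]
    return headers, rows


def rows_as_records(headers, rows):
    """Pair each row with the headers, e.g. for JSON or database export."""
    return [dict(zip(headers, row)) for row in rows]
```

Padding short rows rather than rejecting them suits handwritten input, where a cell is often simply left blank.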
git clone https://github.com/hasff/python-handwritten-ocr-document-generator.git
cd python-handwritten-ocr-document-generator
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

Get a free Gemini API key at Google AI Studio; no billing is required.
Create a .env file in the project root:
GEMINI_API_KEY=your_key_here
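For illustration, a .env file like the one above can be read with a few lines of standard-library Python. This is a simplified stand-in, not the project's loader: the README does not say how program.py reads the file, and `load_env_file` and `get_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=value lines into os.environ; already-set variables win."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def get_api_key():
    """Return the Gemini key, failing early if the placeholder is still there."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "your_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early on the placeholder value gives a clearer error than a rejected API request later in the run.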
Place your image in input/handwrite.jpeg, then run:
python program.py

The script will generate:
output/report.pdf
output/report.xlsx
output/report.docx
Note: By default, DEBUG = True is set in program.py, which uses mock data without consuming API quota. Set DEBUG = False to run the full pipeline with a real image.
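A debug switch of this kind can be sketched as follows. The mock rows and the `get_table` helper are hypothetical; the sample data in the real program.py may differ.

```python
DEBUG = True  # mirrors the flag described in the note above

# Hypothetical mock table; not the project's actual sample data.
MOCK_TABLE = [
    ["Item", "Quantity", "Checked"],
    ["Filter", "2", "Yes"],
    ["Valve", "5", "No"],
]


def get_table(image_path, extract=None, debug=DEBUG):
    """Return mock rows in debug mode, otherwise call the real extractor."""
    if debug:
        # Copy the rows so callers cannot modify the shared mock data.
        return [row[:] for row in MOCK_TABLE]
    if extract is None:
        raise ValueError("an extract callable is required when debug is False")
    return extract(image_path)
```

Passing the extractor in as a callable keeps the export code testable without an API key or network access.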
Rather than using traditional OCR libraries, this project uses Google Gemini's vision capabilities to understand handwritten content:
- EXIF orientation is detected and corrected automatically before processing
- The image is sent to Gemini with a structured prompt requesting a JSON array of arrays
- The response is parsed and passed to three independent export functions
- Each format (PDF, Excel, Word) is generated with consistent styling — matching colors, alternating rows, and a shared visual theme
This approach handles messy handwriting, rotated images, and irregular layouts without manual preprocessing or template configuration.
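To make the orientation step concrete, the sketch below applies the EXIF Orientation tag to a small grid of pixel values in pure Python. It covers only the four rotation values (1, 3, 6, 8) and ignores the mirrored ones (2, 4, 5, 7). On real image files, Pillow's `ImageOps.exif_transpose` performs this correction; whether the project uses that call is an assumption, and `correct_orientation` is a hypothetical name.

```python
def correct_orientation(pixels, orientation):
    """Return the pixel grid rotated upright for EXIF orientation 1, 3, 6 or 8."""
    if orientation == 3:
        # Stored upside down: rotate 180 degrees.
        return [row[::-1] for row in pixels[::-1]]
    if orientation == 6:
        # Rotate 90 degrees clockwise.
        return [list(col) for col in zip(*pixels[::-1])]
    if orientation == 8:
        # Rotate 90 degrees counter-clockwise.
        return [list(col) for col in zip(*pixels)][::-1]
    # Orientation 1, or a value this sketch does not handle: return a copy.
    return [list(row) for row in pixels]


# A 2x2 "image" stored with orientation 6 (camera held in portrait).
stored = [["b", "d"], ["a", "c"]]
upright = correct_orientation(stored, 6)
```

Here `upright` is `[["a", "b"], ["c", "d"]]`: the first stored row, which held the right-hand edge of the scene, becomes the right-hand column again.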
Traditional OCR libraries struggle with cursive handwriting. During development, EasyOCR was tested on the same image and returned fragments like "2", "J", "s" with confidence scores below 10%.
Gemini reads the same image correctly on the first attempt.
I help companies automate document processing pipelines:
- digitisation of handwritten forms and checklists
- batch processing of images and scanned documents
- export to Excel, PDF, Word, CSV or JSON
- integration with databases, APIs and ERP systems
- OCR for printed and handwritten content
📩 Contact: hugoferro.business(at)gmail.com
🌐 Courses and professional tools: https://hasff.github.io/site/



