htmelvis/recipe-parser

Recipe Parser

A Python-based OCR and LLM classification system for parsing handwritten recipe images into structured data for a searchable web application.

Overview

This project processes directories of handwritten recipe images (notecards, recipe pages) and extracts structured information including:

  • Recipe name
  • Recipe type and subtype classification
  • Ingredients list
  • Instructions
  • Cooking time (when present)
  • Required utensils (e.g., oven, 9x13 pan)

Directory Structure

The input recipe directory should be organized as:

Recipe Directory/
├── Recipe Type Dir/
│   ├── Recipe Name/
│   │   ├── image.jpg (single image)
│   │   └── OR
│   │   ├── recipe-name (A).jpg (front of notecard)
│   │   └── recipe-name (B).jpg (back of notecard)
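
A minimal sketch of how this layout could be walked, assuming only the structure shown above. The function name `discover_recipes`, the accepted extensions, and the shape of the returned dictionaries are hypothetical and not taken from the project's code. Sorting the filenames is assumed to be enough to place an (A) image before its (B) counterpart.

```python
from pathlib import Path

# Assumed set of accepted image extensions; the README only shows .jpg.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def discover_recipes(root: Path) -> list[dict]:
    """Return one entry per recipe folder found at root/<type>/<name>/."""
    recipes = []
    for type_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for recipe_dir in sorted(p for p in type_dir.iterdir() if p.is_dir()):
            # Sorted order puts "name (A).jpg" ahead of "name (B).jpg".
            images = sorted(
                f for f in recipe_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            if images:  # skip folders that contain no images
                recipes.append({
                    "type": type_dir.name,
                    "name": recipe_dir.name,
                    "images": images,
                })
    return recipes
```

Each returned entry carries the type folder name, the recipe folder name, and the ordered image paths, which is the information a later OCR step would need for front/back notecards.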

Project Structure

recipe-parser/
├── src/
│   ├── ocr/           # OCR processing modules
│   ├── llm/           # LLM classification and parsing
│   ├── pipeline/      # End-to-end processing pipeline
│   └── utils/         # Helper utilities
├── output/            # Processed recipe data (JSON)
├── tests/             # Unit tests
├── scripts/           # Executable scripts
├── requirements.txt   # Python dependencies
└── .env.example       # Environment variable template

Setup

Prerequisites

  • Python 3.10+
  • Tesseract OCR installed on your system

Installation

  1. Install Tesseract OCR (macOS with Homebrew shown; on Debian/Ubuntu, use sudo apt install tesseract-ocr):
brew install tesseract
  2. Install Python dependencies:
pip install -r requirements.txt
  3. Configure environment variables:
cp .env.example .env
# Edit .env with your API keys
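
A quick way to confirm the prerequisites are in place before running anything is a small check like the one below. This is a sketch using only the standard library; the function name `check_prerequisites` is hypothetical and is not part of the project.

```python
import shutil
import sys


def check_prerequisites(binary: str = "tesseract") -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The README lists Python 3.10+ as a prerequisite.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(binary) is None:
        problems.append(f"'{binary}' was not found on PATH")
    return problems
```

Calling `check_prerequisites()` and printing the result gives a short list of anything still missing.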

Usage

Process a single recipe directory

python scripts/process_recipes.py --input /path/to/recipe/dir --output ./output

Process an entire recipe collection

python scripts/process_recipes.py --input /path/to/main/recipe/dir --recursive --output ./output
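
For orientation, the flags used in the two commands above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of `scripts/process_recipes.py`: the function name `build_parser`, the help strings, the choice to make `--input` required, and the `./output` default are all assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --input, --output, and --recursive flags."""
    parser = argparse.ArgumentParser(description="Parse handwritten recipe images.")
    parser.add_argument("--input", type=Path, required=True,
                        help="Recipe directory to process")
    parser.add_argument("--output", type=Path, default=Path("./output"),
                        help="Directory that receives the JSON files")
    # store_true makes --recursive a boolean switch that defaults to False.
    parser.add_argument("--recursive", action="store_true",
                        help="Process every recipe found below --input")
    return parser
```

With this shape, `build_parser().parse_args(["--input", "/path/to/recipe/dir"])` yields `recursive=False`, matching the single-directory command.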

Output Format

Each recipe is exported as a JSON file with the following structure:

{
  "name": "Chocolate Chip Cookies",
  "type": "Dessert",
  "subtype": "Cookies",
  "ingredients": [...],
  "instructions": [...],
  "cooking_time": "25 minutes",
  "utensils": ["oven", "baking sheet", "mixing bowl"],
  "source_images": ["path/to/image.jpg"]
}
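
The same structure can be modeled in Python as a dataclass, which makes the optional field explicit. This is an illustrative sketch: the class name `Recipe` and the `to_json` method are hypothetical, and treating `ingredients` and `instructions` as lists of strings is an assumption, since the example above elides their contents.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Recipe:
    name: str
    type: str
    subtype: str
    ingredients: list[str] = field(default_factory=list)   # element type assumed
    instructions: list[str] = field(default_factory=list)  # element type assumed
    cooking_time: Optional[str] = None  # only set when the card states a time
    utensils: list[str] = field(default_factory=list)
    source_images: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with the same key names and order as the format above."""
        return json.dumps(asdict(self), indent=2)
```

A recipe with no stated cooking time serializes `cooking_time` as `null` in this sketch; whether the project emits `null` or omits the key is not specified in the README.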

Development

Run tests

pytest tests/

Format and lint code

black src/ tests/ scripts/
ruff check src/ tests/ scripts/

License

MIT
