# Recipe Parser

A Python-based OCR and LLM classification system for parsing handwritten recipe images into structured data for a searchable web application.
This project processes directories of handwritten recipe images (notecards, recipe pages) and extracts structured information including:
- Recipe name
- Recipe type and subtype classification
- Ingredients list
- Instructions
- Cooking time (when present)
- Required utensils (e.g., oven, 9x13 pan, etc.)
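The extracted fields above can be modeled as a small record type. The sketch below is illustrative only — the field names mirror the list above, but the class name and defaults are assumptions, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Recipe:
    """Structured data extracted from one handwritten recipe (illustrative sketch)."""
    name: str
    type: str                                   # e.g. "Dessert"
    subtype: str                                # e.g. "Cookies"
    ingredients: list[str] = field(default_factory=list)
    instructions: list[str] = field(default_factory=list)
    cooking_time: Optional[str] = None          # only present on some cards
    utensils: list[str] = field(default_factory=list)
    source_images: list[str] = field(default_factory=list)

# Example record with only the required fields filled in
recipe = Recipe(name="Chocolate Chip Cookies", type="Dessert", subtype="Cookies")
```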
## Input Structure

The input recipe directory should be organized as:

```
Recipe Directory/
├── Recipe Type Dir/
│   ├── Recipe Name/
│   │   ├── image.jpg              (single image)
│   │   └── OR
│   │   ├── recipe-name (A).jpg    (front of notecard)
│   │   └── recipe-name (B).jpg    (back of notecard)
```

## Project Structure

```
recipe-parser/
├── src/
│   ├── ocr/              # OCR processing modules
│   ├── llm/              # LLM classification and parsing
│   ├── pipeline/         # End-to-end processing pipeline
│   └── utils/            # Helper utilities
├── output/               # Processed recipe data (JSON)
├── tests/                # Unit tests
├── scripts/              # Executable scripts
├── requirements.txt      # Python dependencies
└── .env.example          # Environment variable template
```

## Prerequisites

- Python 3.10+
- Tesseract OCR installed on your system
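Before running the pipeline, you can confirm the Tesseract binary is actually on your `PATH`. A minimal standard-library check (the helper name is illustrative):

```python
import shutil
import subprocess

def tesseract_available() -> bool:
    """Return True if the `tesseract` binary can be found on PATH."""
    return shutil.which("tesseract") is not None

if tesseract_available():
    # Some Tesseract builds print the version to stderr rather than stdout,
    # so check both streams.
    out = subprocess.run(["tesseract", "--version"],
                         capture_output=True, text=True)
    lines = (out.stdout or out.stderr).splitlines()
    print(lines[0] if lines else "tesseract found")
else:
    print("Tesseract not found - install it first (e.g. `brew install tesseract`)")
```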
## Installation

1. Install Tesseract OCR:

   ```bash
   brew install tesseract
   ```

2. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   ```

   Then edit `.env` with your API keys.

## Usage

Process a single recipe directory:

```bash
python scripts/process_recipes.py --input /path/to/recipe/dir --output ./output
```

Process a nested recipe directory recursively:

```bash
python scripts/process_recipes.py --input /path/to/main/recipe/dir --recursive --output ./output
```

## Output

Each recipe is exported as a JSON file with the following structure:
```json
{
  "name": "Chocolate Chip Cookies",
  "type": "Dessert",
  "subtype": "Cookies",
  "ingredients": [...],
  "instructions": [...],
  "cooking_time": "25 minutes",
  "utensils": ["oven", "baking sheet", "mixing bowl"],
  "source_images": ["path/to/image.jpg"]
}
```

## Development

Run the test suite:

```bash
pytest tests/
```

Format and lint:

```bash
black src/ tests/ scripts/
ruff check src/ tests/ scripts/
```

## License

MIT