
jaydogtwotime/Sefaria


Sefaria Translation Pipeline

Fetches Hebrew and Aramaic text from the Sefaria API, translates it using large language models, and generates formatted Word documents.

Works with any text available on Sefaria.

Setup

pip install -r requirements.txt
export ANTHROPIC_API_KEY='your-key-here'

Usage

# Basic usage — fetch, translate, clean, and generate a Word doc
python pipeline.py --text "Kessef Mishneh on Mishneh Torah, Rebels" --chapters 7

# With a human-readable display name and translator credit
python pipeline.py \
  --text "Kessef Mishneh on Mishneh Torah, Rebels" \
  --chapters 7 \
  --display-name "Hilchot Mamrim" \
  --translator "Your Name"

# Use a different model
python pipeline.py --text "Rashi on Genesis" --chapters 50 --model claude-sonnet-4-20250514

# Only fetch and translate, skip Word doc generation
python pipeline.py --text "Mishnah Sanhedrin" --chapters 11 --no-docx

# Re-run cleaning and doc generation on already-translated text
python pipeline.py --text "Kessef Mishneh on Mishneh Torah, Rebels" --chapters 7 \
  --skip-fetch --skip-translate

How It Works

  1. Fetch — Downloads Hebrew text chapter-by-chapter from Sefaria's API v3
  2. Translate — Sends each chapter for scholarly translation, with review flags for uncertain terms
  3. Clean — Replaces Hebrew transliterations with English equivalents, removes markdown artifacts
  4. Generate — Produces a formatted Word document and an editorial review sheet

All intermediate data is cached in data/, so you can re-run any stage without repeating earlier work.
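The caching behavior described above can be sketched as a small "load or compute" helper. This is a minimal illustration and is not taken from pipeline.py: the function name `load_or_compute`, the JSON file format, and the per-chapter file layout are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_compute(cache_file: Path, compute: Callable[[], dict]) -> dict:
    """Return the JSON stored at cache_file; compute and save it if absent.

    Hypothetical helper: the real pipeline may organize its cache differently.
    """
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Hebrew characters readable in the cache file.
    cache_file.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

With a helper like this, a stage only does expensive work (a network fetch or a model call) the first time a chapter is requested. Later runs read the file from data/ instead.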

Options

Flag               Description
--text             Sefaria text reference (required)
--chapters         Number of chapters (required)
--display-name     Human-readable name (default: the --text value)
--translator       Translator name for the title page
--output-dir       Output directory (default: ./outputs)
--data-dir         Cache directory (default: ./data)
--model            Claude model (default: claude-opus-4-6)
--workers          Parallel translation threads (default: 4)
--skip-fetch       Skip the Hebrew fetch stage
--skip-translate   Skip the translation stage
--skip-clean       Skip the cleaning stage
--no-docx          Skip Word document generation
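For readers who want to see how these flags and defaults fit together, here is a rough argparse sketch that mirrors the options listed above. It is an illustration only, not the actual parser in pipeline.py. The function names `build_parser` and `parse_args` are hypothetical, and the assumption that --chapters is an integer comes from the usage examples.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--text", required=True, help="Sefaria text reference")
    parser.add_argument("--chapters", type=int, required=True, help="Number of chapters")
    parser.add_argument("--display-name", default=None, help="Human-readable name")
    parser.add_argument("--translator", default=None, help="Translator name for title page")
    parser.add_argument("--output-dir", default="./outputs")
    parser.add_argument("--data-dir", default="./data")
    parser.add_argument("--model", default="claude-opus-4-6")
    parser.add_argument("--workers", type=int, default=4)
    # The stage-skipping flags are plain on/off switches.
    for flag in ("--skip-fetch", "--skip-translate", "--skip-clean", "--no-docx"):
        parser.add_argument(flag, action="store_true")
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # --display-name falls back to the --text value when omitted.
    if args.display_name is None:
        args.display_name = args.text
    return args
```

Calling `parse_args(["--text", "Rashi on Genesis", "--chapters", "50"])` would yield a namespace with `workers` set to 4 and `display_name` set to "Rashi on Genesis".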

Standalone Tools

# Clean translations directly
python clean.py data/translation_cache
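
As a rough idea of what a cleaning pass of this kind might do, the sketch below strips common markdown artifacts and swaps transliterated terms for English equivalents. It is not the code in clean.py: the function name `clean_translation`, the regular expressions, and the two-entry `TRANSLITERATIONS` mapping are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict

# Hypothetical mapping; the real module presumably uses its own, larger list.
TRANSLITERATIONS: Dict[str, str] = {
    "halacha": "law",
    "beit din": "court",
}


def clean_translation(text: str, replacements: Dict[str, str] = TRANSLITERATIONS) -> str:
    """Remove markdown artifacts and replace transliterated terms."""
    # Drop heading markers such as "## " at the start of a line.
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Drop emphasis and inline-code markers, keeping the wrapped words.
    text = re.sub(r"(\*\*|__|\*|`)", "", text)
    # Replace whole-word transliterations, ignoring case.
    for term, english in replacements.items():
        text = re.sub(rf"\b{re.escape(term)}\b", english, text, flags=re.IGNORECASE)
    return text
```

A replacement at the start of a sentence comes out lowercase in this sketch, so a real cleaner would likely need to preserve capitalization.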

Project Structure

pipeline.py          # Main CLI entry point
clean.py             # Translation cleaning module
docx_generator.py    # Word document generation
requirements.txt     # Python dependencies
data/                # Cached Hebrew text and translations
  hebrew_cache/
  translation_cache/
outputs/             # Generated Word documents

Credits

Original pipeline and translations by Jacob Goldman.

Finding Sefaria Text References

Browse https://www.sefaria.org to find the text you want. The --text argument should match Sefaria's title for the text, which you can find in the text's page URL or through the API.

For example:

  • "Rashi on Genesis" for Rashi's Torah commentary
  • "Mishnah Sanhedrin" for Mishnah Sanhedrin
  • "Kessef Mishneh on Mishneh Torah, Rebels" for Kesef Mishneh on Hilchot Mamrim
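
To show how a title and chapter number could map onto an API request, here is a small URL-building sketch. It rests on assumptions not confirmed by this README: that chapters are requested from the v3 texts endpoint under `https://www.sefaria.org/api/v3/texts`, and that references use underscores for spaces with a `.chapter` suffix, as Sefaria page URLs do. The function name `chapter_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed base path for Sefaria's v3 texts endpoint.
SEFARIA_API_BASE = "https://www.sefaria.org/api/v3/texts"


def chapter_url(title: str, chapter: int) -> str:
    """Build a request URL for one chapter of a Sefaria text (sketch only)."""
    # Sefaria-style references replace spaces with underscores.
    ref = f"{title.replace(' ', '_')}.{chapter}"
    # quote() percent-encodes characters such as commas in the title.
    return f"{SEFARIA_API_BASE}/{quote(ref)}"
```

For example, `chapter_url("Rashi on Genesis", 1)` gives `https://www.sefaria.org/api/v3/texts/Rashi_on_Genesis.1`.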

About

Open-source pipeline for translating Hebrew texts from Sefaria using Claude. Fetch, translate, clean, and generate formatted Word documents.
