Sefaria Translation Pipeline

Fetches Hebrew and Aramaic text from the Sefaria API, translates it using large language models, and generates formatted Word documents.

Works with any text available on Sefaria.

Setup

pip install -r requirements.txt
export ANTHROPIC_API_KEY='your-key-here'

Usage

# Basic usage — fetch, translate, clean, and generate a Word doc
python pipeline.py --text "Kessef Mishneh on Mishneh Torah, Rebels" --chapters 7

# With a human-readable display name and translator credit
python pipeline.py \
  --text "Kessef Mishneh on Mishneh Torah, Rebels" \
  --chapters 7 \
  --display-name "Hilchot Mamrim" \
  --translator "Your Name"

# Use a different model
python pipeline.py --text "Rashi on Genesis" --chapters 50 --model claude-sonnet-4-20250514

# Only fetch and translate, skip Word doc generation
python pipeline.py --text "Mishnah Sanhedrin" --chapters 11 --no-docx

# Re-run cleaning and doc generation on already-translated text
python pipeline.py --text "Kessef Mishneh on Mishneh Torah, Rebels" --chapters 7 \
  --skip-fetch --skip-translate

How It Works

1. Fetch: downloads Hebrew text chapter by chapter from Sefaria's API v3.
2. Translate: sends each chapter for scholarly translation, with review flags for uncertain terms.
3. Clean: replaces Hebrew transliterations with English equivalents and removes markdown artifacts.
4. Generate: produces a formatted Word document and an editorial review sheet.
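
As a rough illustration of the fetch stage, the sketch below builds a request URL for one chapter and downloads it. It is not the code in pipeline.py: the endpoint path (/api/v3/texts/<ref>), the version=hebrew query parameter, and the function names chapter_url and fetch_chapter are assumptions made for this example, so check Sefaria's API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

SEFARIA_BASE = "https://www.sefaria.org"  # assumed host for API requests


def chapter_url(text: str, chapter: int) -> str:
    """Build the (assumed) v3 texts URL for one chapter of a text."""
    ref = f"{text} {chapter}"
    # Percent-encode spaces and commas so the reference is safe in a URL path.
    path = urllib.parse.quote(ref, safe="")
    return f"{SEFARIA_BASE}/api/v3/texts/{path}?version=hebrew"


def fetch_chapter(text: str, chapter: int, timeout: float = 30.0) -> dict:
    """Download one chapter and return the decoded JSON response."""
    with urllib.request.urlopen(chapter_url(text, chapter), timeout=timeout) as resp:
        return json.load(resp)


print(chapter_url("Mishnah Sanhedrin", 1))
```

Keeping URL construction separate from the network call makes the reference encoding easy to inspect without contacting the server.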

All intermediate data is cached in data/, so you can re-run any stage without repeating earlier work.
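
The caching behaviour described above can be pictured with a small helper like the one below. This is a minimal sketch, assuming one JSON file per chapter; the helper name cached and the file layout are hypothetical and may differ from what the pipeline actually writes under data/.

```python
import json
from pathlib import Path
from typing import Callable


def cached(path: Path, compute: Callable[[], dict]) -> dict:
    """Return the JSON stored at `path`, computing and saving it if absent."""
    if path.exists():
        # Cache hit: reuse the earlier result instead of repeating the work.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Hebrew readable in the cache file.
    path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

A stage would wrap its expensive call, for example cached(Path("data/hebrew_cache/ch1.json"), lambda: {"he": "..."}), so a second run reads the file rather than calling the API again.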

Options

Flag	Description
`--text`	Sefaria text reference, required
`--chapters`	Number of chapters, required
`--display-name`	Human-readable name, defaults to `--text` value
`--translator`	Translator name for title page
`--output-dir`	Output directory, default `./outputs`
`--data-dir`	Cache directory, default `./data`
`--model`	Claude model, default `claude-opus-4-6`
`--workers`	Parallel translation threads, default 4
`--skip-fetch`	Skip Hebrew fetch stage
`--skip-translate`	Skip translation stage
`--skip-clean`	Skip cleaning stage
`--no-docx`	Skip Word doc generation
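
For readers who want to see how these flags fit together, here is one way the table could be expressed with argparse. The flag names and defaults come from the table; the function names build_parser and parse_args, and the way --display-name falls back to --text, are assumptions for this sketch rather than the actual contents of pipeline.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in the options table."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--text", required=True, help="Sefaria text reference")
    parser.add_argument("--chapters", type=int, required=True, help="Number of chapters")
    parser.add_argument("--display-name", help="Human-readable name (falls back to --text)")
    parser.add_argument("--translator", help="Translator name for title page")
    parser.add_argument("--output-dir", default="./outputs", help="Output directory")
    parser.add_argument("--data-dir", default="./data", help="Cache directory")
    parser.add_argument("--model", default="claude-opus-4-6", help="Claude model")
    parser.add_argument("--workers", type=int, default=4, help="Parallel translation threads")
    # One boolean switch per skippable stage.
    for stage in ("fetch", "translate", "clean"):
        parser.add_argument(f"--skip-{stage}", action="store_true")
    parser.add_argument("--no-docx", action="store_true", help="Skip Word doc generation")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and apply the --display-name fallback."""
    args = build_parser().parse_args(argv)
    if args.display_name is None:
        args.display_name = args.text
    return args
```

Calling parse_args(["--text", "Rashi on Genesis", "--chapters", "50"]) yields a namespace whose display_name equals the --text value and whose workers value is 4.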

Standalone Tools

# Clean translations directly
python clean.py data/translation_cache
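
To give a sense of what a cleaning pass of this kind might do, the sketch below replaces a few transliterated terms and strips common markdown artifacts. It does not reproduce clean.py: the GLOSSARY entries, the function name clean_translation, and the specific regular expressions are all hypothetical.

```python
import re

# Hypothetical glossary: transliterated term -> English equivalent.
GLOSSARY = {
    "beit din": "court",
    "halakhah": "law",
}


def clean_translation(text: str, glossary: dict = GLOSSARY) -> str:
    """Apply glossary replacements, then remove markdown markers."""
    # Replace whole-word transliterations, case-insensitively.
    for term, english in glossary.items():
        text = re.sub(rf"\b{re.escape(term)}\b", english, text, flags=re.IGNORECASE)
    # Drop markdown heading markers at the start of a line.
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Unwrap bold/italic markers such as **word** or *word*.
    text = re.sub(r"(\*{1,2})(.+?)\1", r"\2", text)
    return text
```

A real glossary would need care with terms that are better left transliterated, which is presumably where the review flags from the translation stage come in.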

Project Structure

pipeline.py          # Main CLI entry point
clean.py             # Translation cleaning module
docx_generator.py    # Word document generation
requirements.txt     # Python dependencies
data/                # Cached Hebrew text and translations
  hebrew_cache/
  translation_cache/
outputs/             # Generated Word documents

Credits

Original pipeline and translations by Jacob Goldman.

Finding Sefaria Text References

Browse https://www.sefaria.org to find the text you want. The --text argument must match Sefaria's English title for that text, which you can read from the text's page URL or from an API response.

For example:

"Rashi on Genesis" for Rashi's Torah commentary
"Mishnah Sanhedrin" for Mishnah Sanhedrin
"Kessef Mishneh on Mishneh Torah, Rebels" for Kessef Mishneh on Hilchot Mamrim
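
If you are starting from a page URL, a helper along these lines can suggest a --text value. It is a sketch under an assumption about URL shape: that the first path segment is the title with underscores for spaces, optionally followed by section numbers such as .1.1. The function name ref_from_url is hypothetical, and the result should be checked against the title Sefaria actually reports.

```python
import re
from urllib.parse import unquote, urlparse


def ref_from_url(url: str) -> str:
    """Derive a candidate --text value from a Sefaria page URL."""
    # Take the first path segment and decode any percent-escapes.
    segment = unquote(urlparse(url).path).strip("/").split("/")[0]
    # Drop trailing section numbers such as ".1" or ".1.3".
    segment = re.sub(r"(\.\d+)+$", "", segment)
    # Assumed convention: underscores in the URL stand for spaces in the title.
    return segment.replace("_", " ")


print(ref_from_url("https://www.sefaria.org/Rashi_on_Genesis.1.1?lang=bi"))
```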
