System Architecture

This document describes the architecture of a state-machine-controlled agentic workflow for translating Latin patristic texts. For the patterns behind this design, see PATTERNS.md.

Whitepaper: DOI 10.5281/zenodo.18002473

System Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                        TRANSLATION PIPELINE                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│  │ SELECTING│ →  │RESEARCHING│ →  │TRANSLATING│ →  │VALIDATING│          │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘          │
│       ↑              │                │               │                  │
│       │         [auto-trigger]        │          [quality gate]          │
│       │                               │               │                  │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│  │  REVIEW  │ ←  │PUBLISHING │ ←  │DISTRIBUTING│←  │GENERATING│          │
│  └──────────┘    └──────────┘    └──────────┘    │  _AUDIO  │          │
│       │         [PRIVATE-first]       ↑          └──────────┘          │
│       │                               │               │                  │
│       ↓                          ┌──────────┐         │                  │
│  ┌──────────┐                    │AWAITING_ │         │                  │
│  │ COMPLETE │                    │  VIDEO   │         │                  │
│  └──────────┘                    └──────────┘         │                  │
│                                       ↑               │                  │
│                                  [session break]      │                  │
│                                  ┌──────────┐         │                  │
│                                  │GENERATING│ ←───────┘                  │
│                                  │  _VIDEO  │                            │
│                                  └──────────┘                            │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

State Machine Design

Why a State Machine?

The pipeline involves several long-running operations (API calls, video encoding) that span multiple sessions. A state machine provides:

Resumability — Work continues where it left off after session breaks
Auditability — Clear record of what happened and when
Safety gates — Required checkpoints prevent incomplete work from publishing
Human oversight — REVIEW state ensures nothing goes public without approval

State Definitions

State	Description	Session Behavior
SELECTING	Browsing sources, extracting text	Interactive
RESEARCHING	Deep research API call	Auto-triggered on entry
TRANSLATING	Translation API call	Agent runs script
VALIDATING	Quality checks	Agent runs validation
GENERATING_AUDIO	TTS synthesis	Agent runs scripts
GENERATING_VIDEO	FFmpeg encoding	STOP SESSION
AWAITING_VIDEO	Waiting for video file	Check and resume
DISTRIBUTING	Archive.org, GitHub, blog	Agent runs uploads
PUBLISHING	YouTube upload	Agent runs upload
REVIEW	Human approval	AWAIT COMMANDS
COMPLETE	Finished	Archive and report
CANCELLED	Abandoned	Terminal state

Transition Requirements

Each transition has explicit requirements that must be satisfied:

From → To	Requirements
SELECTING → RESEARCHING	`source.title_latin` set
RESEARCHING → TRANSLATING	`research.completed_at` and `research.file_path` set
TRANSLATING → VALIDATING	`translation.file_path` set
VALIDATING → GENERATING_AUDIO	`translation.validation.passed = true`
GENERATING_AUDIO → GENERATING_VIDEO	`audio.completed_at` and `audio.file_path` set
GENERATING_VIDEO → AWAITING_VIDEO	`video.job_started_at` set
AWAITING_VIDEO → DISTRIBUTING	`video.completed_at` and `video.file_path` set
DISTRIBUTING → PUBLISHING	`archive_org.url`, `github_url`, and `blog_url` set
PUBLISHING → REVIEW	`youtube.video_id` set and `youtube.thumbnail_set = true`
REVIEW → COMPLETE	`review.approved_at` set

The critical insight: translation.validation.passed = true can only be set by the validation script returning exit code 0. The agent cannot self-certify.
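
The transition table can be read as a guard function. The following is a minimal sketch, assuming the state file has already been loaded into a Python dict. The names `TRANSITION_REQUIREMENTS`, `MUST_BE_TRUE`, `lookup`, and `missing_requirements` are illustrative and do not come from the project's code. Fields the table marks `= true` must be exactly `True`; every other field only needs to be non-null.

```python
from typing import Any

# Field paths copied from the transition table. The dict layout and all
# function names are illustrative assumptions, not the project's real code.
TRANSITION_REQUIREMENTS: dict[tuple[str, str], list[str]] = {
    ("SELECTING", "RESEARCHING"): ["source.title_latin"],
    ("RESEARCHING", "TRANSLATING"): ["research.completed_at", "research.file_path"],
    ("TRANSLATING", "VALIDATING"): ["translation.file_path"],
    ("VALIDATING", "GENERATING_AUDIO"): ["translation.validation.passed"],
    ("GENERATING_AUDIO", "GENERATING_VIDEO"): ["audio.completed_at", "audio.file_path"],
    ("GENERATING_VIDEO", "AWAITING_VIDEO"): ["video.job_started_at"],
    ("AWAITING_VIDEO", "DISTRIBUTING"): ["video.completed_at", "video.file_path"],
    ("DISTRIBUTING", "PUBLISHING"): ["archive_org.url", "github_url", "blog_url"],
    ("PUBLISHING", "REVIEW"): ["youtube.video_id", "youtube.thumbnail_set"],
    ("REVIEW", "COMPLETE"): ["review.approved_at"],
}

# Requirements written as "= true" in the table.
MUST_BE_TRUE = {"translation.validation.passed", "youtube.thumbnail_set"}


def lookup(state: dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node: Any = state
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_requirements(state: dict[str, Any], target: str) -> list[str]:
    """Return the unmet field paths for the transition state["state"] -> target."""
    key = (state["state"], target)
    if key not in TRANSITION_REQUIREMENTS:
        raise ValueError(f"no such transition: {key[0]} -> {key[1]}")
    missing = []
    for path in TRANSITION_REQUIREMENTS[key]:
        value = lookup(state, path)
        satisfied = value is True if path in MUST_BE_TRUE else value is not None
        if not satisfied:
            missing.append(path)
    return missing
```

An empty list means the transition may proceed. Returning the unmet paths instead of a bare boolean lets the agent report exactly which field is blocking progress.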

Key Design Decisions

AWAITING_VIDEO: The Parking State

Video encoding takes 30-60 minutes. Rather than have the agent wait:

Start encoding and transition to GENERATING_VIDEO
Immediately transition to AWAITING_VIDEO
End the session
Next session: check if video is stable (size unchanged for 60 seconds)
If stable, proceed to DISTRIBUTING

This design prevents wasted context and API costs from idle waiting.
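
The stability check in the steps above (size unchanged for 60 seconds) could look roughly like the sketch below. The function name, the polling interval, and the rule that an empty file never counts as stable are assumptions made for illustration.

```python
import os
import time


def is_file_stable(path: str, window_seconds: float = 60.0,
                   poll_seconds: float = 5.0) -> bool:
    """Return True if the file exists, is non-empty, and keeps the same size
    for the whole observation window. Hypothetical helper, not project code."""
    try:
        initial_size = os.path.getsize(path)
    except OSError:
        return False  # missing or unreadable file: encoding has not produced output
    if initial_size == 0:
        return False  # assumption: an empty file means the encoder has just started

    deadline = time.monotonic() + window_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True  # size never changed during the window
        time.sleep(min(poll_seconds, remaining))
        try:
            if os.path.getsize(path) != initial_size:
                return False  # still being written
        except OSError:
            return False  # file disappeared mid-check
```

If the check returns `False`, the session simply ends again and the state stays at AWAITING_VIDEO, which keeps the wait outside the agent's context.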

PRIVATE-First Publishing

All YouTube uploads are PRIVATE by default. This prevents:

Publishing incomplete or incorrect content
SEO damage from deleted/re-uploaded videos
Errors visible to subscribers before review

The REVIEW state allows fixes before making content public:

Fix title, description, thumbnail
Re-run research if needed
Abort and delete if unsalvageable

Only after explicit human approval does content become public.
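
One way to make PRIVATE-first structural, not merely a convention, is to give the upload helper no visibility parameter at all. The sketch below only builds a request body in the shape of the YouTube Data API v3 video resource (`snippet` and `status.privacyStatus`). The function name is hypothetical and other fields a real upload may need are omitted.

```python
from typing import Any


def build_upload_body(title: str, description: str) -> dict[str, Any]:
    """Build a videos.insert request body that is always private.

    Hypothetical helper: there is deliberately no visibility argument, so a
    caller cannot request a public upload through this code path.
    """
    return {
        "snippet": {"title": title, "description": description},
        "status": {"privacyStatus": "private"},
    }
```

Because the function signature does not accept a visibility value, passing one fails with a `TypeError` before any API call is made.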

Quality Gates

Translation Validation (Critical Gate)

The validate_translation.py script prevents the most catastrophic error: publishing an incomplete translation.

What it checks:

Last translated chunk appears near end of source text
No significant untranslated content remains
No truncation indicators (sentences ending mid-thought)
Metadata consistency with project state

This gate exists because a previous incident published a text missing Section VI and half of Section V. The translation looked complete but had been silently truncated.
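
The first check in the list (the last translated chunk appears near the end of the source text) can be sketched as follows. This is not the actual validate_translation.py. It assumes each chunk's `latin` field holds the full source passage, that whitespace and case differences can be normalized away, and that a threshold of 0.95 is an acceptable definition of "near the end".

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def coverage_fraction(source_text: str, chunks: list[dict]) -> float:
    """Fraction of the source reached by the end of the last chunk's Latin.

    Returns 0.0 when the last chunk cannot be located in the source at all.
    """
    source = normalize(source_text)
    if not source or not chunks:
        return 0.0
    last_latin = normalize(chunks[-1].get("latin", ""))
    if not last_latin:
        return 0.0
    position = source.rfind(last_latin)
    if position < 0:
        return 0.0
    return (position + len(last_latin)) / len(source)


def translation_is_complete(source_text: str, chunks: list[dict],
                            threshold: float = 0.95) -> bool:
    """Assumed pass rule: the last chunk must end within the final 5% of the source."""
    return coverage_fraction(source_text, chunks) >= threshold
```

A translation that stops at Section V of a six-section text would yield a fraction well below the threshold, even though every output file exists.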

Research-to-Blog Validation

The validate_blog.py script ensures the blog contains actual research content, not AI-generated summaries.

What it checks:

Citations from research.md appear in blog
Distinctive phrases from research appear in blog
No generic AI-filler phrases
Word count matches research depth

This gate exists because the agent once "wrote a blog post about" a topic instead of extracting the actual research content. The result had zero citations from the research file.
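
The citation check can be illustrated with a small sketch. The source does not say how citations are formatted in research.md, so this example assumes they are URLs; the function names and the pass rule are likewise assumptions, not the logic of validate_blog.py.

```python
import re

# Assumption: a "citation" is any http(s) URL in the research file.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> set[str]:
    """Collect URLs, dropping punctuation that trails a URL at the end of a sentence."""
    return {match.rstrip(".,;") for match in URL_PATTERN.findall(text)}


def citation_overlap(research_text: str, blog_text: str) -> float:
    """Fraction of research citations that also appear in the blog post.

    Returns 0.0 when the research file has no citations, so fabricated or
    empty research fails the check instead of passing vacuously.
    """
    cited = extract_urls(research_text)
    if not cited:
        return 0.0
    return len(cited & extract_urls(blog_text)) / len(cited)
```

A blog post written from the agent's own knowledge would score 0.0 here, which is the situation this gate is meant to catch.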

Data Model

Project State

Each project has a single canonical state file:

{
  "project_id": "de-trinitate-20251214",
  "state": "TRANSLATING",
  "created_at": "2025-12-14T10:00:00Z",
  "updated_at": "2025-12-14T12:30:00Z",

  "source": {
    "volume": "042",
    "title_latin": "De Trinitate",
    "title_english": "On the Trinity",
    "author": "Augustine of Hippo",
    "century": 5,
    "estimated_duration_minutes": 45
  },

  "research": {
    "completed_at": "2025-12-14T10:30:00Z",
    "word_count": 2500,
    "citations_count": 12
  },

  "translation": {
    "completed_at": null,
    "chunk_count": null,
    "validation": {
      "passed": null,
      "checked_at": null
    }
  },

  "audio": { ... },
  "video": { ... },

  "youtube": {
    "video_id": null,
    "visibility": null,
    "thumbnail_set": false
  },

  "review": {
    "approved_at": null,
    "approved_by": null
  },

  "costs": {
    "research_usd": 1.23,
    "translation_usd": 0,
    "total_usd": 1.23
  },

  "notes": [
    {"timestamp": "...", "note": "State transition: SELECTING → RESEARCHING"},
    {"timestamp": "...", "note": "Deep research completed: 2500 words, 12 citations"}
  ]
}

The state file is the single source of truth. The agent reads it to know what to do. Scripts update it to record progress. Validation scripts check it to enforce requirements.
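
Because every script reads and writes the same file, updates should not leave it half-written if a session dies mid-write. The following is a minimal sketch of an atomic update that also appends an entry to `notes`. The function name is hypothetical, and the merge is shallow (top-level keys only), which is an assumption of this example.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Any


def update_state(path: str, changes: dict[str, Any], note: str) -> dict[str, Any]:
    """Apply top-level changes, stamp updated_at, append a note, write atomically."""
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)

    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    state.update(changes)  # shallow merge: nested objects are replaced whole
    state["updated_at"] = now
    state.setdefault("notes", []).append({"timestamp": now, "note": note})

    # Write to a temp file in the same directory, then rename over the original.
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, ensure_ascii=False)
    os.replace(temp_path, path)  # atomic replacement on the same filesystem
    return state
```

A reader that opens the file at any moment sees either the old state or the new one, never a truncated mix.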

Translation Output Format

Translations are structured JSON for audio processing:

{
  "metadata": {
    "title": "On the Trinity",
    "latin_title": "De Trinitate",
    "author": "Augustine of Hippo",
    "century": "5th century",
    "total_chunks": 150,
    "estimated_duration_minutes": 45
  },
  "chunks": [
    {
      "chunk_id": 1,
      "section_type": "chapter_heading",
      "speaker": "announcer",
      "chapter_title": "Book One: The Unity of the Trinity",
      "latin": "LIBER PRIMUS",
      "english": "Book One"
    },
    {
      "chunk_id": 2,
      "section_type": "body",
      "speaker": "narrator",
      "latin": "Lecturus haec quae de Trinitate...",
      "english": "The reader of these reflections on the Trinity..."
    }
  ]
}

Speaker mapping for TTS:

"announcer" → "echo" voice (titles, headings)
"narrator" → "onyx" voice (body text)
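
In code, the mapping is a small lookup table. The sketch below uses hypothetical names and chooses to fail loudly on an unknown speaker instead of falling back to a default voice; that choice is an assumption of this example.

```python
# Speaker-to-voice mapping as listed above.
SPEAKER_VOICES = {
    "announcer": "echo",  # titles, headings
    "narrator": "onyx",   # body text
}


def voice_for_chunk(chunk: dict) -> str:
    """Return the TTS voice for a translation chunk; reject unknown speakers."""
    speaker = chunk["speaker"]
    if speaker not in SPEAKER_VOICES:
        raise ValueError(f"unknown speaker: {speaker!r}")
    return SPEAKER_VOICES[speaker]
```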

Failures That Shaped This Architecture

The Truncated Translation

What happened: The agent translated 60% of a document, generated audio, composed video, and uploaded it to YouTube. It reported "complete" because it had performed each pipeline step.

Root cause: No validation that translation covered the source text. The agent checked that files existed, not that they were correct.

Fix: Added validate_translation.py that checks if the last translated Latin appears near the end of the source. This became the gate between VALIDATING and GENERATING_AUDIO.
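
The gate works only if `translation.validation.passed` is derived from the script's exit code and never from the agent's own judgment. Below is a minimal sketch of such a wrapper. The function name is hypothetical and the validation command is passed in by the caller, since the script's arguments are not documented here.

```python
import subprocess
from datetime import datetime, timezone
from typing import Any


def run_validation_gate(state: dict[str, Any], command: list[str]) -> bool:
    """Run the validation command and record the outcome in the state dict.

    `passed` is computed solely from the process exit code (0 means pass),
    so no caller can set it to True without the script succeeding.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    validation = state.setdefault("translation", {}).setdefault("validation", {})
    validation["passed"] = result.returncode == 0
    validation["checked_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return validation["passed"]
```

A call might look like `run_validation_gate(state, ["python", "validate_translation.py", ...])`, with the elided arguments depending on the project layout.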

The Fabricated Research

What happened: The research API failed. The agent, wanting to be helpful, wrote a "research.md" file summarizing its own knowledge. The blog post looked scholarly but contained zero citations from actual research.

Root cause: No validation that blog content came from research file. Format looked correct, but provenance was fabricated.

Fix: Added validate_blog.py that checks for distinctive phrases and citations from research.md. Added "No Bypass Policy" to agent instructions explicitly forbidding manual file creation.

The Premature Publication

What happened: The agent uploaded a video as public with the wrong author attribution. Google indexed it within hours.

Root cause: The agent could upload as public. There was no architectural barrier.

Fix: Changed the upload script to support only PRIVATE visibility. Added a REVIEW state with explicit fix commands. Only the human-invoked review.py publish command can change visibility.

The Deleted Credentials

What happened: YouTube token showed "expired." The agent, trying to fix the problem, deleted the token file. This required manual browser re-authentication that couldn't be done headlessly.

Root cause: Agent interpreted "fixing" broadly. Deleting and regenerating works for many files, but not for OAuth tokens.

Fix: Added "Protected Files" section to agent instructions listing files that must never be deleted, with explanations of why.
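
The fix described here lives in the agent's instructions. A script-level guard is a different, complementary technique, sketched below on the assumption that cleanup goes through a shared helper. The file name in `PROTECTED_NAMES` is a placeholder, since the source does not name the protected files.

```python
from pathlib import Path

# Placeholder entry: the real list of protected files is not given in this document.
PROTECTED_NAMES = frozenset({"youtube_token.json"})


def safe_delete(path: str, protected: frozenset[str] = PROTECTED_NAMES) -> None:
    """Delete a file unless its name is on the protected list."""
    target = Path(path)
    if target.name in protected:
        raise PermissionError(f"refusing to delete protected file: {target.name}")
    target.unlink()
```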

The Infinite Wait

What happened: Video encoding started. The agent waited. And waited. Context grew. API costs accumulated. Eventually the session timed out. On restart, the agent had no memory of the encoding job and started a new one.

Root cause: No concept of "operations that take longer than a session."

Fix: Created AWAITING_VIDEO parking state. Agent exits after starting encoding. Next session checks if video is complete. If not, exits again. No waiting, no context accumulation.

Cost Model

Monthly budget: ~$50

Per-project costs:

Deep research: $0.50-2.00 (depends on response length)
Translation: $1-5 per 10 minutes of audio
TTS audio: $0.50-2.00 per 10 minutes
DALL-E cover: $0.04
Total: $3-10 for a typical 10-30 minute text

Budget enforcement:

Pre-flight estimate before translation
Hard stop if cost > $15 (requires explicit approval)
Cost tracking in project state and aggregate file
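
The pre-flight estimate and hard stop can be sketched with the translation rate and the $15 limit listed above. The function and exception names are hypothetical, and gating on the upper end of the estimate is an assumption of this example.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an estimate is over the hard stop and no approval was given."""


TRANSLATION_USD_PER_10_MIN = (1.00, 5.00)  # low and high rates from the cost list
HARD_STOP_USD = 15.00


def estimate_translation_usd(duration_minutes: float) -> tuple[float, float]:
    """Return a (low, high) cost range for translating a text of the given length."""
    units = duration_minutes / 10
    low_rate, high_rate = TRANSLATION_USD_PER_10_MIN
    return (units * low_rate, units * high_rate)


def check_budget(estimated_usd: float, approved: bool = False) -> None:
    """Stop before spending when the estimate is above the limit without approval."""
    if estimated_usd > HARD_STOP_USD and not approved:
        raise BudgetExceeded(
            f"estimate ${estimated_usd:.2f} exceeds ${HARD_STOP_USD:.2f}; "
            "explicit approval required"
        )
```

For a 45-minute text, the range is $4.50 to $22.50, so a check against the upper bound would halt and ask for approval before translation starts.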

Technology Stack

Component	Technology
Orchestration	Claude Code (CLI)
State management	JSON files
Research	OpenRouter API (deep research models)
Translation	Claude API (via OpenRouter)
Audio synthesis	OpenAI TTS
Video composition	FFmpeg
Publishing	YouTube Data API v3, Archive.org S3 API
Blog	Jekyll (GitHub Pages)

The interesting part isn't the API integrations—it's how they're orchestrated through the state machine.

For the patterns behind these design decisions, see PATTERNS.md.
