Prompt orchestration tool for building agent-traversable knowledge codices. The prompts (prompts/00-06) are the core product — Claude Code reads and executes them sequentially in target projects.
This repo is the meta-tool, not a codex itself. It doesn't get deployed — it gets pointed at.
Phases run sequentially: 00-init → 01-gather → 02-plan → 03-transcribe → 04-verify → 05-map → 06-review. Each phase reads outputs from prior phases and produces inputs for the next.
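The ordering constraint can be sketched in a few lines of Python. This is only an illustration: `next_phase` and the idea of passing in a set of completed phase names are assumptions, not something the prompts define.

```python
# Phase names as listed above, in execution order.
PHASES = [
    "00-init", "01-gather", "02-plan", "03-transcribe",
    "04-verify", "05-map", "06-review",
]


def next_phase(completed):
    """Return the first phase not yet completed, or None when all are done.

    Hypothetical helper: raises ValueError if a later phase is marked done
    while an earlier one is not, because each phase needs prior outputs.
    """
    completed = set(completed)
    unknown = completed - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    pending = [p for p in PHASES if p not in completed]
    if not pending:
        return None
    first = pending[0]
    # Anything completed after the first pending phase skipped a dependency.
    skipped_over = [p for p in PHASES[PHASES.index(first):] if p in completed]
    if skipped_over:
        raise ValueError(f"{skipped_over[0]} is done but {first} is not")
    return first
```

For example, `next_phase(["00-init", "01-gather"])` returns `"02-plan"`, while `next_phase(["01-gather"])` raises because 00-init has not run.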
Key files in a target project (not this repo):
- codex-config.yaml: single source of truth for all settings
- codex-state.yaml: resumable progress tracker (per-document status)
- manifest.yaml: section-level index with line numbers for agent retrieval
- docs/{corpus}/: transcribed markdown with YAML frontmatter
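As a rough illustration of why the manifest stores line numbers, an agent can pull one section out of a transcribed document without reading the whole file. The entry fields below (`section`, `path`, `start_line`, `end_line`) and the path are hypothetical; the real field names are whatever manifest.yaml defines.

```python
def read_section(doc_text, start_line, end_line):
    """Return lines start_line..end_line (1-indexed, inclusive) of doc_text."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("invalid line range")
    lines = doc_text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


# Hypothetical manifest entry, shown as the dict a YAML loader would produce.
entry = {
    "section": "Setup",
    "path": "docs/manual/setup.md",
    "start_line": 5,
    "end_line": 6,
}

# Stand-in for the contents of the file at entry["path"].
doc = "---\ntitle: Manual\n---\n# Intro\n# Setup\nInstall the app.\n# Usage\n"
section = read_section(doc, entry["start_line"], entry["end_line"])
```

Here `section` holds only the "Setup" heading and its one body line.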
- YAML everywhere — configs, state, manifest, chunking plans, frontmatter
- Frontmatter is required on all transcribed docs; fields are defined in codex-config.yaml#frontmatter_schema
- Topics evolve organically: agents tag freely during transcription (phase 03), and vocabulary is normalized in phase 05
- validate.sh is dependency-free: bash/awk/grep only, no Python or jq
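The frontmatter requirement can be illustrated with a small check. This is a Python sketch for explanation only, not how validate.sh works (that script is bash/awk/grep). The function names are hypothetical, and the required field list is assumed to come from codex-config.yaml#frontmatter_schema.

```python
def frontmatter_keys(markdown_text):
    """Return the top-level keys of a leading YAML frontmatter block.

    Returns None if the document has no block or the block is never closed.
    Simplification: only unindented "key: value" lines are recognised.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if line and not line[0].isspace() and ":" in line and not line.startswith("#"):
            keys.add(line.split(":", 1)[0].strip())
    return None


def missing_fields(markdown_text, required):
    """List required fields absent from the frontmatter, sorted by name."""
    keys = frontmatter_keys(markdown_text)
    if keys is None:
        return sorted(required)
    return sorted(set(required) - keys)
```

A document with no frontmatter block at all is reported as missing every required field.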
Prompts reference each other's outputs. Changing a field name or output format in one phase can break downstream phases. Check the full chain before modifying.
Each prompt follows a consistent structure: Purpose → Prerequisites → Instructions (steps) → State Update → Completion Criteria → Error Handling.
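A quick way to check a prompt against this structure is sketched below. It assumes each section is a markdown heading whose text begins with the section name; that heading format is an assumption about the prompt files, and `check_structure` is a hypothetical helper.

```python
SECTIONS = [
    "Purpose", "Prerequisites", "Instructions",
    "State Update", "Completion Criteria", "Error Handling",
]


def check_structure(prompt_text):
    """Return a list of problems; an empty list means all sections are in order.

    Simplification: any line starting with '#' counts as a heading, so
    comment lines inside fenced code blocks are treated as headings too.
    """
    headings = [
        line.lstrip("#").strip().lower()
        for line in prompt_text.splitlines()
        if line.startswith("#")
    ]
    problems = []
    last_index = -1
    for name in SECTIONS:
        index = next(
            (i for i, h in enumerate(headings) if h.startswith(name.lower())),
            None,
        )
        if index is None:
            problems.append(f"missing section: {name}")
        elif index < last_index:
            problems.append(f"out of order: {name}")
        else:
            last_index = index
    return problems
```

Running this over every file in prompts/ before committing would catch a dropped or reordered section early.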
The helper scripts require Python 3.10+ with PyMuPDF and pyyaml. They support phases 02 and 04:
- pdf-toc-extract.py: extracts PDF bookmark trees for structure planning
- verify-transcription.py: automated accuracy checks against source PDFs
examples/rekordbox-codex-config.yaml is a real config. Keep it consistent with the template schema when modifying templates/codex-config.yaml.
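One way to spot drift between the example and the template is to compare their key paths. The sketch below works on plain dicts, as a YAML loader such as pyyaml's `yaml.safe_load` would return. The function names and the `corpus` keys in the usage example are hypothetical and not taken from the actual template.

```python
def key_paths(mapping, prefix=""):
    """Return the set of dotted key paths in a nested dict."""
    paths = set()
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
    return paths


def schema_drift(template, example):
    """Report keys present in one config but not the other."""
    template_keys = key_paths(template)
    example_keys = key_paths(example)
    return {
        "missing_in_example": sorted(template_keys - example_keys),
        "unknown_in_example": sorted(example_keys - template_keys),
    }


# Hypothetical configs, already loaded into dicts.
template = {"corpus": {"name": "", "source_dir": ""}, "frontmatter_schema": {}}
example = {"corpus": {"name": "rekordbox"}, "extra": 1}
drift = schema_drift(template, example)
```

With these inputs, `drift` flags `corpus.source_dir` and `frontmatter_schema` as missing from the example and `extra` as unknown to the template. Sections that are meant to hold user-defined keys would need to be excluded from such a comparison.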