voice-os

Six-axis voice scoring. Dual-persona routing. QA gates that enforce stylistic fidelity before a draft ever reaches a human. Built on Claude. Calibrated on 6.9M+ words of executive communication.

This is the engine behind the Voice DNA RAG pipeline I shipped at Google xGE — a system that functions as a digital twin for VP-level communications. It cut drafting latency by 90% and holds 99% stylistic fidelity across production volume.

What it does

voice-os ingests a voice corpus, builds a scored representation across six stylistic axes, and routes each draft through two personas: a generative persona that writes and an adversarial persona that stress-tests fidelity. A QA gate then decides whether the output clears or cycles back for revision. The result is drafts that sound like the person they are supposed to sound like, not like a generic LLM.

Why it matters

Most "voice matching" is prompt engineering with a few examples. This is a calibrated scoring system. The six axes catch what vibes-based prompting misses: rhetorical pace, risk tolerance, sentence rhythm, escalation pattern, hedging behavior, and editorial register. The kill list — a curated set of rejected drafts — teaches the system what the voice refuses to do, not just what it does.
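To make the scoring idea concrete, here is a minimal sketch of how six axis scores and a kill-list check could be represented. The axis names come from the paragraph above; everything else is an assumption for illustration, not the production scorer. That includes the `AxisScores` class, the 0-to-1 score range, the mean-absolute-gap `fidelity` metric, and the treatment of kill-list entries as literal phrases rather than whole rejected drafts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AxisScores:
    """One score per stylistic axis; the [0.0, 1.0] range is an assumption."""
    rhetorical_pace: float
    risk_tolerance: float
    sentence_rhythm: float
    escalation_pattern: float
    hedging_behavior: float
    editorial_register: float


def fidelity(draft: AxisScores, baseline: AxisScores) -> float:
    """Hypothetical metric: 1.0 minus the mean absolute per-axis gap."""
    gaps = [
        abs(getattr(draft, f.name) - getattr(baseline, f.name))
        for f in fields(AxisScores)
    ]
    return 1.0 - sum(gaps) / len(gaps)


def kill_list_hits(draft_text: str, kill_list: list[str]) -> list[str]:
    """Return kill-list phrases found in the draft (case-insensitive substring match)."""
    lowered = draft_text.lower()
    return [phrase for phrase in kill_list if phrase.lower() in lowered]
```

A draft that matches the baseline on every axis scores 1.0 under this metric, and any kill-list hit can be treated as a hard failure regardless of the axis scores.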

Quick Start

# Clone the repo
git clone https://github.com/mitwilli-create/voice-os.git
cd voice-os

# Install dependencies
pip install -r requirements.txt

# Set your API key
export ANTHROPIC_API_KEY=your_key_here

# Run a scoring pass against the sample corpus
python score.py --corpus data/sample_corpus.txt --draft data/sample_draft.txt

# Run the full dual-persona pipeline with QA gate
python pipeline.py \
  --corpus data/sample_corpus.txt \
  --kill-list data/kill_list.txt \
  --draft data/sample_draft.txt \
  --output output/scored_draft.json

Output includes axis scores, persona deltas, QA gate decision (pass / cycle), and a revision trace.
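As a rough illustration of consuming that output, the sketch below parses a made-up result and reports the gate decision and the weakest axis. The JSON shape, the key names (`axis_scores`, `qa_gate`, `revision_trace`), and all values are hypothetical; the real schema of `output/scored_draft.json` is not documented here, and persona deltas are left out because their shape is unknown.

```python
import json

# Hypothetical example payload; keys and values are invented for illustration.
EXAMPLE = """
{
  "axis_scores": {
    "rhetorical_pace": 0.91,
    "risk_tolerance": 0.88,
    "sentence_rhythm": 0.93,
    "escalation_pattern": 0.85,
    "hedging_behavior": 0.61,
    "editorial_register": 0.90
  },
  "qa_gate": {"decision": "cycle"},
  "revision_trace": ["cycle 1: hedging too heavy"]
}
"""


def summarize(raw: str) -> str:
    """Report the gate decision and the lowest-scoring axis."""
    result = json.loads(raw)
    scores = result["axis_scores"]
    weakest = min(scores, key=scores.get)  # axis name with the smallest score
    decision = result["qa_gate"]["decision"]
    return f"{decision}: weakest axis is {weakest} ({scores[weakest]:.2f})"


print(summarize(EXAMPLE))
```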

Architecture

Layer	Function
Corpus ingestion	Chunks and embeds voice corpus; builds axis score baseline
Six-axis scorer	Evaluates drafts against baseline across six stylistic dimensions
Dual-persona router	Generative persona drafts; adversarial persona stress-tests fidelity
QA gate	Blocks output below threshold; returns structured revision signal
Kill list enforcement	Flags patterns the voice explicitly rejects
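The layers above can be sketched as a revise-and-gate loop. This is a simplified illustration under stated assumptions, not the shipped pipeline: the `revise` and `critique` callables stand in for the generative and adversarial personas (in practice these would be model calls), and `GateResult`, the 0.9 threshold, and the three-cycle cap are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GateResult:
    decision: str  # "pass" or "cycle"
    score: float   # score of the draft that was returned
    trace: list[str] = field(default_factory=list)


def run_pipeline(
    draft: str,
    revise: Callable[[str, str], str],             # (draft, feedback) -> new draft
    critique: Callable[[str], tuple[float, str]],  # draft -> (score, feedback)
    threshold: float = 0.9,                        # hypothetical gate threshold
    max_cycles: int = 3,                           # hypothetical cap on revisions
) -> tuple[str, GateResult]:
    """Score the draft, revise on failure, and stop on a pass or at the cycle cap."""
    trace: list[str] = []
    score = 0.0
    for cycle in range(1, max_cycles + 1):
        score, feedback = critique(draft)
        trace.append(f"cycle {cycle}: score={score:.2f}")
        if score >= threshold:
            return draft, GateResult("pass", score, trace)
        if cycle < max_cycles:
            # Only revise when another scoring pass will follow,
            # so the returned score always describes the returned draft.
            draft = revise(draft, feedback)
    return draft, GateResult("cycle", score, trace)
```

Passing the personas in as plain callables keeps the gate logic testable without network access; a draft that never clears the threshold comes back with a "cycle" decision and its full trace instead of being released.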

What this demonstrates

Production RAG design — not a demo, a system that ran at VP scale inside Google
Evaluation rigor — quantified fidelity scoring, not vibes
Agentic architecture — multi-step pipeline with conditional routing and gate logic
Domain depth — a decade in newsrooms and eight years at Google built the editorial judgment that makes the scoring axes meaningful

Status

Pipeline architecture and scoring logic are documented here. The core corpus and VP-identity data are proprietary and are not included. Sample data is synthetic but structurally representative.

CI/evaluation harness: in progress.

Built with

Claude (Anthropic) — generation and adversarial persona
Python
Custom embedding + scoring layer

Mitchell Williams · LinkedIn · GitHub · thestorytellermitch.com
