# Voice Robustness Lab

This repository contains a small, personal proof-of-concept for a Voice Robustness Lab.
The goal is to experiment with a helper that can:
- Take voice utterance transcripts (and light metadata)
- Exercise prompt + response patterns under varied conditions (noise, phrasing, intent)
- Summarize robustness issues (e.g., ambiguous intents, brittle prompts, mis-routes)
- Append a minimal evidence record for each test run
This is a personal R&D prototype, not a production voice/IVR system.
## Requirements

- Python 3.8+
- pip

## Installation

```bash
pip install -e ".[dev]"
```

Or install dependencies directly:

```bash
pip install click pyyaml pytest
```

## Usage

```bash
# Text report (default)
python -m src.cli.main run --test-set fixtures/balance_inquiry.yaml

# JSON report
python -m src.cli.main run --test-set fixtures/balance_inquiry.yaml --format json

# Use the JSON fixture
python -m src.cli.main run --test-set fixtures/balance_inquiry.json --format json

# With a custom evidence log path and note
python -m src.cli.main run \
  --test-set fixtures/support_request.yaml \
  --log runs/my-evidence.jsonl \
  --note "experiment-1"

# Use the LLM classifier (requires LLM_API_KEY env var; falls back to ambiguous)
python -m src.cli.main run \
  --test-set fixtures/balance_inquiry.yaml \
  --classifier llm

# Verbose mode (debug logging to stderr)
python -m src.cli.main -v run --test-set fixtures/balance_inquiry.yaml
```

```
Usage: python -m src.cli.main run [OPTIONS]

Options:
  --test-set PATH          Path to a voice test set file (YAML or JSON).  [required]
  --classifier [rule|llm]  Classifier backend to use.  [default: rule]
  --log PATH               Path to the evidence log file (JSONL).  [default: runs/evidence.log.jsonl]
  --format [text|json]     Report output format.  [default: text]
  --note TEXT              Optional note to attach to the evidence log entry.
  --help                   Show this message and exit.
```
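Each run appends one JSON line to the evidence log given by `--log`. A minimal sketch of such a writer, assuming illustrative field names (`timestamp`, `test_set_id`, `counts`, `note`) rather than the repo's actual schema:

```python
import json
import os
from datetime import datetime, timezone
from typing import Dict, Optional


def append_evidence(
    log_path: str,
    test_set_id: str,
    counts: Dict[str, int],
    note: Optional[str] = None,
) -> dict:
    """Append one evidence record as a JSON line and return it.

    Field names here are illustrative, not the repo's actual schema.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "test_set_id": test_set_id,
        # e.g. {"total": 5, "pass": 4, "fail": 1, "ambiguous": 0}
        "counts": counts,
    }
    if note is not None:
        entry["note"] = note
    # Create the runs/ directory (or any custom parent) on first use
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Appending (rather than overwriting) keeps one line per run, which is what makes JSONL convenient for tracking pass rates over time.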
## Running tests

```bash
python -m pytest tests/ -v
```

## Project structure

```
src/
  models.py        - Data models (Utterance, VoiceTestSet, ClassificationResult, etc.)
  classifier/
    base.py        - Classifier protocol + outcome determination
    rule_based.py  - Keyword-matching classifier stub
    llm_adapter.py - LLM classifier scaffold (requires API key)
  runner/
    loader.py      - YAML/JSON test set loader with validation
    runner.py      - Test set runner (iterates utterances through classifier)
  report/
    reporter.py    - Report aggregation + JSON/text rendering
  evidence/
    logger.py      - JSONL evidence log writer
  cli/
    main.py        - Click CLI entry point
fixtures/          - Sample voice test case files
tests/             - pytest test suite
runs/              - Evidence log output directory
```
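The classifier pieces under `src/classifier/` can be sketched roughly as follows. The keyword-scoring rule, the `unknown` fallback label, and the 0.5 confidence threshold are assumptions for illustration, not the repo's actual logic:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassificationResult:
    intent: str
    confidence: float


class RuleBasedClassifier:
    """Keyword matcher in the spirit of rule_based.py; details are illustrative."""

    def __init__(self, keywords: Dict[str, List[str]]):
        # keywords maps an intent label to the keywords that suggest it
        self.keywords = keywords

    def classify(self, text: str) -> ClassificationResult:
        lowered = text.lower()
        best_intent, best_score = "unknown", 0.0
        for intent, words in self.keywords.items():
            hits = sum(1 for w in words if w in lowered)
            score = hits / len(words) if words else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
        return ClassificationResult(best_intent, best_score)


def determine_outcome(
    expected: str, result: ClassificationResult, threshold: float = 0.5
) -> str:
    """Map a classification to pass/fail/ambiguous; the threshold is illustrative."""
    if result.confidence < threshold:
        return "ambiguous"
    return "pass" if result.intent == expected else "fail"
```

Keeping outcome determination separate from the classifier is what lets the `rule` and `llm` backends share the same pass/fail/ambiguous logic.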
## What it does

- Load a voice test set (YAML or JSON) describing utterances and an expected intent label.
- Classify each utterance using a pluggable classifier (rule-based keyword matching by default).
- Determine an outcome per utterance: `pass` (correct label), `fail` (wrong label), or `ambiguous` (low confidence / no match).
- Generate a report with aggregate stats (total, passes, fails, ambiguous, pass rate) and highlighted failures.
- Append an evidence entry (JSONL) with timestamp, test set ID, and counts for tracking over time.
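A test set fixture in this spirit might look like the following (hypothetical field names; see the files under `fixtures/` for the actual schema):

```yaml
# Hypothetical shape of a voice test set; not the repo's actual schema.
id: balance_inquiry
description: Balance questions under varied phrasing and noise
utterances:
  - text: "what's my checking balance"
    expected_intent: balance_inquiry
  - text: "uh, how much money do I have right now"
    expected_intent: balance_inquiry
    metadata:
      noise: background_chatter
```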
## Out of scope

- End-to-end telephony, ASR, or TTS integration
- Real-time audio capture or streaming
- Full NLU engine or production routing logic
## Status

- Initial specification (SPEC.md)
- Minimal flow: voice test set -> prompts/queries -> robustness report
- Evidence log of test runs
- Basic CLI
- Run instructions in README
See SPEC.md for the full specification.