feat: TTS end-to-end intelligibility harness #75
Add ovoscope/tts_intelligibility.py: synthesise speech with a TTS plugin under test, transcribe the rendered audio back with a reference STT (faster-whisper tiny), and score the round-trip with WER/CER.

- score_tts_intelligibility() + TTSIntelligibilityHarness context manager returning an IntelligibilityReport (per-utterance UtteranceScore, mean WER/CER, to_dict/to_markdown_row).
- mode="playback" drives the full ovos-audio stack and captures the rendered WAV via a play_audio side_effect; mode="direct" calls tts.get_tts directly.
- Extend PlaybackServiceHarness with a tts= arg (default MockTTS, backward compatible) and a captured_wavs list.
- Add [tts] optional extra; graceful optional import in __init__.
- Unit tests (MockTTS + MockSTT, no model download) cover WER/CER math, report aggregation, playback wav capture, and graceful import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 Walkthrough
Adds a new
Changes
TTS Intelligibility Scoring
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant TTSIntelligibilityHarness
participant PlaybackServiceHarness
participant TTS
participant ReferenceSTT
Caller->>TTSIntelligibilityHarness: score(utterances)
rect rgba(100, 149, 237, 0.5)
Note over TTSIntelligibilityHarness,PlaybackServiceHarness: playback mode
TTSIntelligibilityHarness->>PlaybackServiceHarness: __enter__
TTSIntelligibilityHarness->>TTS: speak(utterance)
PlaybackServiceHarness-->>TTSIntelligibilityHarness: captured_wavs[0]
end
rect rgba(60, 179, 113, 0.5)
Note over TTSIntelligibilityHarness,TTS: direct mode
TTSIntelligibilityHarness->>TTS: get_tts(utterance, tmp_wav_path)
TTS-->>TTSIntelligibilityHarness: wav bytes
end
TTSIntelligibilityHarness->>ReferenceSTT: execute(wav_bytes, lang)
ReferenceSTT-->>TTSIntelligibilityHarness: transcript
TTSIntelligibilityHarness->>TTSIntelligibilityHarness: _normalize(reference) + _normalize(transcript)
TTSIntelligibilityHarness->>TTSIntelligibilityHarness: _score_pair → UtteranceScore
TTSIntelligibilityHarness-->>Caller: IntelligibilityReport
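In the playback-mode branch of the diagram, the WAV comes from a `play_audio` side_effect that copies the rendered file out before the cache prunes it. Below is a minimal sketch of that pattern using `unittest.mock`. It assumes a backend object whose `play_audio(path)` is called once per utterance. `FakeAudioBackend`, `speak_via_backend` and `capture_rendered_wavs` are hypothetical stand-ins, not ovos-audio or ovoscope APIs.

```python
import os
import shutil
import tempfile
from typing import List
from unittest import mock


class FakeAudioBackend:
    """Hypothetical stand-in for the backend the playback service calls."""

    def play_audio(self, wav_path: str) -> None:
        raise RuntimeError("real playback is not available in tests")


def speak_via_backend(backend: FakeAudioBackend, cache_dir: str, text: str) -> None:
    """Hypothetical speak path: render into a cache, play, then prune the cache."""
    wav_path = os.path.join(cache_dir, "utterance.wav")
    with open(wav_path, "wb") as f:
        f.write(text.encode("utf-8"))  # placeholder bytes, not real audio
    backend.play_audio(wav_path)
    os.remove(wav_path)  # the cache prunes the file right after playback


def capture_rendered_wavs(text: str) -> List[str]:
    """Return copies of every WAV handed to play_audio while speaking `text`."""
    captured: List[str] = []
    capture_dir = tempfile.mkdtemp(prefix="captured-wavs-")
    cache_dir = tempfile.mkdtemp(prefix="tts-cache-")

    def _capture(wav_path: str) -> None:
        # Runs in place of play_audio, i.e. before the caller deletes the file.
        dst = os.path.join(capture_dir, f"{len(captured)}.wav")
        shutil.copyfile(wav_path, dst)
        captured.append(dst)

    backend = FakeAudioBackend()
    try:
        with mock.patch.object(backend, "play_audio", side_effect=_capture):
            speak_via_backend(backend, cache_dir, text)
    finally:
        shutil.rmtree(cache_dir, ignore_errors=True)
    return captured
```

The copy has to happen inside the side_effect itself. Inspecting `call_args` after `speak` returns would only yield a path that has already been deleted.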
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ovoscope/tts_intelligibility.py`:
- Around lines 307-311: The WAV file path generation in the direct mode TTS
method uses a hash-based filename that can suffer from collisions and cause the
same path to be reused for repeated utterances, overwriting previous audio files
and corrupting earlier UtteranceScore.wav_path references. Replace the
hash-based filename (the f-string that constructs the filename using
abs(hash(utterance)) & 0xffffffff) with a unique identifier per call, such as a
UUID generated using Python's uuid module, to ensure each utterance gets a
distinct and non-colliding output file path.
- Around lines 273-279: The __enter__ method creates a temporary directory via
tempfile.mkdtemp() but if the subsequent PlaybackServiceHarness initialization
or its __enter__ call raises an exception, the __exit__ method is never invoked
and the temp directory is left behind. Wrap the playback harness setup code (the
PlaybackServiceHarness instantiation and __enter__ call) in a try-except block,
and in the except handler, clean up the temporary directory using
shutil.rmtree(self._tmpdir) before re-raising the exception to ensure the temp
directory is always removed on failed context entry.
In `@test/unittests/test_tts_intelligibility.py`:
- Around lines 164-177: The test's simulation of the missing [tts] extra is
incomplete. First, add the module ovos_audio to the BLOCKED set (which currently
contains jiwer, ovos_utterance_normalizer, ovos_stt_plugin_fasterwhisper, and
faster_whisper) since ovos_audio is part of the [tts] extra. Second, strengthen
the assertion that validates the harness is absent by checking not just for the
TTSIntelligibilityHarness symbol but for multiple symbols from the TTS export
surface to ensure the entire optional surface is properly blocked when the [tts]
extra is unavailable.
- Around lines 180-183: Add a timeout parameter to the subprocess.run() call in
the test to prevent indefinite blocking if import resolution stalls. Include the
timeout argument alongside the existing capture_output and text parameters in
the subprocess.run() invocation to ensure the test has a defined execution
boundary and won't hang CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12b23a44-e14b-4333-b799-221afef2af3c
📒 Files selected for processing (5)
- ovoscope/__init__.py
- ovoscope/audio.py
- ovoscope/tts_intelligibility.py
- pyproject.toml
- test/unittests/test_tts_intelligibility.py
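The graceful optional import in `ovoscope/__init__.py` follows the usual try/except pattern: the TTS surface is exported only when its module (and therefore the `[tts]` extra) imports cleanly. Below is a generic sketch of that pattern. `export_optional` is a hypothetical helper, not part of ovoscope, and the module names in the check are placeholders.

```python
import importlib
from typing import Dict, Iterable, List


def export_optional(namespace: Dict[str, object], all_names: List[str],
                    module_name: str, names: Iterable[str]) -> bool:
    """Copy `names` from `module_name` into `namespace` if the module imports.

    Returns False, exporting nothing, when the module or one of its
    dependencies is missing (the "extra not installed" case).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # Resolve every symbol first so a missing attribute exports nothing.
    values = {name: getattr(module, name) for name in names}
    namespace.update(values)
    all_names.extend(values)
    return True
```

In a package `__init__.py`, this would be called as `export_optional(globals(), __all__, "ovoscope.tts_intelligibility", [...])`, assuming `__all__` is a list. One design trade-off: catching `ImportError` also hides genuine import-time bugs in the optional module. Catching only `ModuleNotFoundError` lets those surface while still treating a missing dependency as "extra not installed".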
def __enter__(self) -> "TTSIntelligibilityHarness":
    self._tmpdir = tempfile.mkdtemp(prefix="ovoscope-tts-")
    if self.mode == "playback":
        from ovoscope.audio import PlaybackServiceHarness
        self._playback = PlaybackServiceHarness(tts=self.tts)
        self._playback.__enter__()
    return self
Ensure temp-directory cleanup when context entry fails.
If playback harness setup raises during __enter__, the temp directory is left behind because __exit__ is never reached on failed context entry.
Proposed fix
@@
 def __enter__(self) -> "TTSIntelligibilityHarness":
     self._tmpdir = tempfile.mkdtemp(prefix="ovoscope-tts-")
-    if self.mode == "playback":
-        from ovoscope.audio import PlaybackServiceHarness
-        self._playback = PlaybackServiceHarness(tts=self.tts)
-        self._playback.__enter__()
-    return self
+    try:
+        if self.mode == "playback":
+            from ovoscope.audio import PlaybackServiceHarness
+            self._playback = PlaybackServiceHarness(tts=self.tts)
+            self._playback.__enter__()
+        return self
+    except Exception:
+        if self._tmpdir and os.path.isdir(self._tmpdir):
+            shutil.rmtree(self._tmpdir, ignore_errors=True)
+        self._tmpdir = None
+        raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def __enter__(self) -> "TTSIntelligibilityHarness":
    self._tmpdir = tempfile.mkdtemp(prefix="ovoscope-tts-")
    try:
        if self.mode == "playback":
            from ovoscope.audio import PlaybackServiceHarness
            self._playback = PlaybackServiceHarness(tts=self.tts)
            self._playback.__enter__()
        return self
    except Exception:
        if self._tmpdir and os.path.isdir(self._tmpdir):
            shutil.rmtree(self._tmpdir, ignore_errors=True)
        self._tmpdir = None
        raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ovoscope/tts_intelligibility.py` around lines 273 - 279, The __enter__ method
creates a temporary directory via tempfile.mkdtemp() but if the subsequent
PlaybackServiceHarness initialization or its __enter__ call raises an exception,
the __exit__ method is never invoked and the temp directory is left behind. Wrap
the playback harness setup code (the PlaybackServiceHarness instantiation and
__enter__ call) in a try-except block, and in the except handler, clean up the
temporary directory using shutil.rmtree(self._tmpdir) before re-raising the
exception to ensure the temp directory is always removed on failed context
entry.
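An alternative to the try/except in the proposed fix is `contextlib.ExitStack`. It registers each cleanup as the corresponding resource is acquired and, via `pop_all()`, hands the callbacks over to the harness only once setup succeeds. Below is a minimal sketch in which `ScratchHarness` and `FailingResource` are hypothetical stand-ins for `TTSIntelligibilityHarness` and `PlaybackServiceHarness`.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Optional


class FailingResource:
    """Hypothetical stand-in for a playback harness whose __enter__ may fail."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail

    def __enter__(self) -> "FailingResource":
        if self.fail:
            raise RuntimeError("playback harness failed to start")
        return self

    def __exit__(self, *exc) -> None:
        return None


class ScratchHarness:
    """Owns a temp dir plus an inner resource; nothing leaks if __enter__ fails."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.tmpdir: Optional[str] = None
        self._stack: Optional[contextlib.ExitStack] = None

    def __enter__(self) -> "ScratchHarness":
        with contextlib.ExitStack() as stack:
            self.tmpdir = tempfile.mkdtemp(prefix="scratch-")
            # If anything below raises, leaving the `with` runs this callback.
            stack.callback(shutil.rmtree, self.tmpdir, ignore_errors=True)
            stack.enter_context(FailingResource(self.fail))
            # Setup succeeded: move the callbacks out so they outlive this block.
            self._stack = stack.pop_all()
        return self

    def __exit__(self, *exc) -> None:
        if self._stack is not None:
            self._stack.close()  # exits the inner resource, then removes tmpdir (LIFO)
            self._stack = None
```

Because the with statement never calls `__exit__` when `__enter__` raises, all cleanup for a failed entry has to live inside `__enter__`. `ExitStack` keeps that cleanup next to each acquisition instead of in a separate except branch.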
wav_path = os.path.join(
    self._tmpdir, f"direct_{abs(hash(utterance)) & 0xffffffff}.wav"
)
self.tts.get_tts(utterance, wav_path, lang=self.lang, voice=self.voice)
return wav_path if os.path.isfile(wav_path) else None
Use a unique output WAV path per utterance in direct mode.
The current hash-based filename can reuse the same path for repeated utterances (and collide across different utterances), so earlier UtteranceScore.wav_path entries can point to overwritten audio artifacts.
Proposed fix
@@
 import dataclasses
 import os
 import re
 import shutil
 import string
 import tempfile
 import threading
+import uuid
@@
 def _render_direct(self, utterance: str) -> Optional[str]:
     """Synthesise via ``tts.get_tts`` directly; return the WAV path."""
     wav_path = os.path.join(
-        self._tmpdir, f"direct_{abs(hash(utterance)) & 0xffffffff}.wav"
+        self._tmpdir, f"direct_{uuid.uuid4().hex}.wav"
     )
     self.tts.get_tts(utterance, wav_path, lang=self.lang, voice=self.voice)
     return wav_path if os.path.isfile(wav_path) else None
📝 Committable suggestion
wav_path = os.path.join(
    self._tmpdir, f"direct_{uuid.uuid4().hex}.wav"
)
self.tts.get_tts(utterance, wav_path, lang=self.lang, voice=self.voice)
return wav_path if os.path.isfile(wav_path) else None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ovoscope/tts_intelligibility.py` around lines 307 - 311, The WAV file path
generation in the direct mode TTS method uses a hash-based filename that can
suffer from collisions and cause the same path to be reused for repeated
utterances, overwriting previous audio files and corrupting earlier
UtteranceScore.wav_path references. Replace the hash-based filename (the
f-string that constructs the filename using abs(hash(utterance)) & 0xffffffff)
with a unique identifier per call, such as a UUID generated using Python's uuid
module, to ensure each utterance gets a distinct and non-colliding output file
path.
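The collision is easy to reproduce: the hash-based name depends only on the utterance text, so two identical utterances map to one file, and the second render overwrites the first. Python also salts `str` hashes per process (see `PYTHONHASHSEED`), so these names are not even reproducible across runs. Below is a minimal sketch comparing both patterns. The helper names `hashed_wav_path` and `unique_wav_path` are hypothetical.

```python
import os
import tempfile
import uuid


def hashed_wav_path(tmpdir: str, utterance: str) -> str:
    # Pattern flagged in the review: the name depends only on the text,
    # so repeated utterances resolve to the same file.
    return os.path.join(tmpdir, f"direct_{abs(hash(utterance)) & 0xffffffff}.wav")


def unique_wav_path(tmpdir: str) -> str:
    # Pattern from the proposed fix: a fresh random 128-bit name per call.
    return os.path.join(tmpdir, f"direct_{uuid.uuid4().hex}.wav")


tmpdir = tempfile.mkdtemp(prefix="wav-paths-")
utterances = ["what time is it", "what time is it"]
hashed = [hashed_wav_path(tmpdir, u) for u in utterances]
unique = [unique_wav_path(tmpdir) for _ in utterances]
```

`tempfile.mkstemp(suffix=".wav", dir=tmpdir)` would also guarantee a unique name, but it creates the (empty) file up front. That would defeat the `os.path.isfile(wav_path)` success check in `_render_direct`, so a name-only approach such as `uuid4` fits this code better.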
"BLOCKED = {'jiwer', 'ovos_utterance_normalizer', "
"'ovos_stt_plugin_fasterwhisper', 'faster_whisper'}\n"
"class _Block(importlib.abc.MetaPathFinder):\n"
"    def find_spec(self, name, path, target=None):\n"
"        if name.split('.')[0] in BLOCKED:\n"
"            raise ModuleNotFoundError(name=name.split('.')[0])\n"
"        return None\n"
"sys.meta_path.insert(0, _Block())\n"
"for m in list(sys.modules):\n"
"    if m.split('.')[0] in BLOCKED:\n"
"        del sys.modules[m]\n"
"import ovoscope\n"
"assert not hasattr(ovoscope, 'TTSIntelligibilityHarness'), "
"'harness should be absent without the tts extra'\n"
Strengthen the “no [tts] extra” simulation and export-surface assertion.
Line 164 omits ovos_audio from BLOCKED even though [tts] includes it, and Line 176 validates only one symbol. This can pass while still leaking part of the optional surface.
Suggested test hardening
-"BLOCKED = {'jiwer', 'ovos_utterance_normalizer', "
-"'ovos_stt_plugin_fasterwhisper', 'faster_whisper'}\n"
+"BLOCKED = {'jiwer', 'ovos_audio', 'ovos_utterance_normalizer', "
+"'ovos_stt_plugin_fasterwhisper', 'faster_whisper'}\n"
@@
-"assert not hasattr(ovoscope, 'TTSIntelligibilityHarness'), "
-"'harness should be absent without the tts extra'\n"
+"for sym in ('TTSIntelligibilityHarness', 'score_tts_intelligibility', "
+"'IntelligibilityReport', 'UtteranceScore'):\n"
+"    assert not hasattr(ovoscope, sym), "
+"'tts symbols should be absent without the tts extra'\n"
📝 Committable suggestion
"BLOCKED = {'jiwer', 'ovos_audio', 'ovos_utterance_normalizer', "
"'ovos_stt_plugin_fasterwhisper', 'faster_whisper'}\n"
"class _Block(importlib.abc.MetaPathFinder):\n"
"    def find_spec(self, name, path, target=None):\n"
"        if name.split('.')[0] in BLOCKED:\n"
"            raise ModuleNotFoundError(name=name.split('.')[0])\n"
"        return None\n"
"sys.meta_path.insert(0, _Block())\n"
"for m in list(sys.modules):\n"
"    if m.split('.')[0] in BLOCKED:\n"
"        del sys.modules[m]\n"
"import ovoscope\n"
"for sym in ('TTSIntelligibilityHarness', 'score_tts_intelligibility', "
"'IntelligibilityReport', 'UtteranceScore'):\n"
"    assert not hasattr(ovoscope, sym), "
"'tts symbols should be absent without the tts extra'\n"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/unittests/test_tts_intelligibility.py` around lines 164 - 177, The
test's simulation of the missing [tts] extra is incomplete. First, add the
module ovos_audio to the BLOCKED set (which currently contains jiwer,
ovos_utterance_normalizer, ovos_stt_plugin_fasterwhisper, and faster_whisper)
since ovos_audio is part of the [tts] extra. Second, strengthen the assertion
that validates the harness is absent by checking not just for the
TTSIntelligibilityHarness symbol but for multiple symbols from the TTS export
surface to ensure the entire optional surface is properly blocked when the [tts]
extra is unavailable.
result = subprocess.run(
    [sys.executable, "-c", code],
    capture_output=True, text=True,
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether subprocess.run calls in this test file include an explicit timeout.
python - <<'PY'
import ast
from pathlib import Path
path = Path("test/unittests/test_tts_intelligibility.py")
tree = ast.parse(path.read_text())
for node in ast.walk(tree):
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        if isinstance(node.func.value, ast.Name) and node.func.value.id == "subprocess" and node.func.attr == "run":
            has_timeout = any(k.arg == "timeout" for k in node.keywords if k.arg is not None)
            print(f"Line {node.lineno}: subprocess.run timeout={has_timeout}")
PY
Repository: OpenVoiceOS/ovoscope
Length of output: 103
Add a timeout to the subprocess invocation to prevent test hanging.
The subprocess.run() call at line 180 lacks a timeout parameter. If import resolution stalls, this test can block CI indefinitely.
Minimal fix
 result = subprocess.run(
     [sys.executable, "-c", code],
-    capture_output=True, text=True,
+    capture_output=True, text=True, timeout=30,
 )
📝 Committable suggestion
result = subprocess.run(
    [sys.executable, "-c", code],
    capture_output=True, text=True, timeout=30,
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/unittests/test_tts_intelligibility.py` around lines 180 - 183, Add a
timeout parameter to the subprocess.run() call in the test to prevent indefinite
blocking if import resolution stalls. Include the timeout argument alongside the
existing capture_output and text parameters in the subprocess.run() invocation
to ensure the test has a defined execution boundary and won't hang CI.
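Passing `timeout` to `subprocess.run` bounds the test. When it expires, the child is killed and waited for, and then `subprocess.TimeoutExpired` is raised in the parent. Below is a minimal sketch of a helper for the isolated-import check. `run_isolated` is hypothetical, and the 30-second default mirrors the proposed fix rather than any measured import time.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter and return the completed process.

    Raises subprocess.TimeoutExpired (after killing the child) if it runs
    longer than `timeout` seconds, so a stalled import cannot hang CI.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A test would then assert `result.returncode == 0` and include `result.stderr` in the failure message, so that a blocked-import assertion in the child is visible in the CI log.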
Source: Linters/SAST tools
I've gathered some intelligence on your latest changes. 🕵️‍♀️
I've aggregated the results of the automated checks for this PR below.

🔍 Lint
Beep boop! Standard processing sub-routine complete. 🦾
❌ ruff: issues found (see job log)

🔒 Security (pip-audit)
Looking for any Trojan horses in the dependencies. 🐎
✅ No known vulnerabilities found (72 packages scanned).

📋 Repo Health
I've checked the repo's social skills (aka issue response time). 🗣️
✅ All required files present.
Latest Version: ✅

📊 Coverage
Measuring the safety net for your changes. 🥅
❌ 50.8% total coverage
Files below 80% coverage (16 files)
Full report: download the

🏷️ Release Preview
Here's what the next release might look like! 🚀
Current:
✅ PR title follows conventional commit format.

🚀 Release Channel Compatibility
Predicted next version:

⚖️ License Check
Ensuring we're respecting the rights of others. 🤝
✅ No license violations found.
Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests
Structural analysis of your contribution is complete. 🔬
✅ All versions pass

Code quality is our top priority ✨
Summary
Adds an end-to-end TTS intelligibility harness to ovoscope: synthesise speech with a TTS plugin under test, transcribe the rendered audio back with a reference STT (faster-whisper tiny), and score the round-trip with WER/CER. This catches regressions that file-existence unit tests miss (garbled audio, wrong sample rate, broken transforms, silent output) and gives every TTS plugin a comparable intelligibility number.
What's included
- `ovoscope/tts_intelligibility.py`: `score_tts_intelligibility(tts, utterances, *, lang, voice, reference_stt, mode)` + `TTSIntelligibilityHarness` context manager, returning `IntelligibilityReport` (per-utterance `UtteranceScore`, `mean_wer`/`mean_cer`, `to_dict()`, `to_markdown_row()`).
- `mode="playback"` (default) drives the full ovos-audio stack (`speak` → PlaybackService → `get_tts` → `tts_transform` → `play_audio`) and captures the rendered WAV via a `play_audio` side_effect, copying it out before the cache prunes it.
- `mode="direct"` calls `tts.get_tts()` directly (no bus), for engines that hang under the playback thread or when `ovos_audio` is absent.
- Scoring: `AudioFile(wav).read()` → `AudioData` → `reference_stt.execute()` → normalise both sides via `ovos-utterance-normalizer` → `jiwer.wer`/`cer`.
- Reference STT: `FasterWhisperSTT({"model":"tiny","compute_type":"int8","beam_size":1})`.
- `PlaybackServiceHarness` extended with a `tts=` arg (default `MockTTS()`, backward compatible) and a `captured_wavs` list.
- `[tts]` optional extra (`ovos-audio`, `jiwer`, `ovos-utterance-normalizer`, `ovos-stt-plugin-fasterwhisper`); the same packages are added to `dev`. Graceful optional import in `__init__.py`.
Reporting-only at launch
Intended to be wired report-only (per-plugin `TTS_MAX_WER` defaults to 1.0).
Tests
`test/unittests/test_tts_intelligibility.py`: MockTTS + a fixed-transcript MockSTT (no model download) covering WER/CER math, normalisation, report aggregation/serialisation, both modes, playback wav capture, and a graceful-import test (core `import ovoscope` works without the `tts` extra). All 11 pass; the existing audio-harness suite (38) is still green. The real faster-whisper round-trip was sanity-checked locally (kept out of the committed suite to keep CI light).
🤖 Generated with Claude Code
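Report-only wiring with a per-plugin `TTS_MAX_WER` amounts to comparing a report's mean WER against a threshold and recording whether it would have been flagged. The dataclasses below are hypothetical simplifications of `UtteranceScore`/`IntelligibilityReport`. Their field names and the markdown column order are assumptions, not the PR's, and `within_threshold` is likewise illustrative. Because WER is not capped at 1.0 (insertions count as errors), a 1.0 threshold can still flag a transcript padded with hallucinated words.

```python
import dataclasses
import math
from typing import List


@dataclasses.dataclass
class UtteranceScore:
    reference: str
    transcript: str
    wer: float
    cer: float


@dataclasses.dataclass
class IntelligibilityReport:
    plugin: str
    scores: List[UtteranceScore] = dataclasses.field(default_factory=list)

    @staticmethod
    def _mean(values: List[float]) -> float:
        # NaN for an empty report, so it can never silently pass a threshold.
        return sum(values) / len(values) if values else math.nan

    @property
    def mean_wer(self) -> float:
        return self._mean([s.wer for s in self.scores])

    @property
    def mean_cer(self) -> float:
        return self._mean([s.cer for s in self.scores])

    def to_dict(self) -> dict:
        return {
            "plugin": self.plugin,
            "mean_wer": self.mean_wer,
            "mean_cer": self.mean_cer,
            "utterances": [dataclasses.asdict(s) for s in self.scores],
        }

    def to_markdown_row(self) -> str:
        # Columns: plugin | utterance count | mean WER | mean CER
        return f"| {self.plugin} | {len(self.scores)} | {self.mean_wer:.3f} | {self.mean_cer:.3f} |"


def within_threshold(report: IntelligibilityReport, max_wer: float = 1.0) -> bool:
    # NaN <= max_wer is False, so an empty report is never "within" the threshold.
    return report.mean_wer <= max_wer
```

In report-only mode, the boolean would be logged next to `to_markdown_row()` rather than used to fail the job.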
Summary by CodeRabbit
Release Notes
New Features
Chores
`tts` dependency extra for intelligibility scoring functionality
Tests