Release 1.3.0a1 #109
Open
github-actions[bot] wants to merge 79 commits into
…elines feat: add NEBULENTO_PIPELINE and PALAVREADO_PIPELINE stage groups
…ms (#55)

ConfidenceMatcherPipeline plugins (nebulento, palavreado, padacioso, padatious, adapt, …) all reproduce roughly the same end-to-end test boilerplate: spin up MiniCroft pinned to one pipeline, mutate Configuration()["intents"][config_key], emit utterances, capture either the dispatched intent Message or complete_intent_failure, then restore. This change extracts that shape into ovoscope so a plugin author can focus on engine-specific behaviour.

New module ovoscope/e2e.py:
- E2EPipelineHarness: unittest.TestCase base. Subclasses declare PIPELINE_ID, CONFIG_KEY, PLUGIN_CONFIG, SKILL_ID and inherit send_and_capture / expect_no_match / make_utterance, plus Configuration save+restore and per-test skill detach.
- Standalone bus helpers (no MiniCroft required): make_session, make_utterance_message, wait_for_match, wait_for_failure.
- Engine-family registration shims:
    register_padatious_intent / register_padatious_entity (padatious, padacioso, nebulento, ...)
    register_adapt_vocab / register_adapt_intent (adapt, palavreado, ...)
    detach_intent / detach_skill (generic).

All names are re-exported from the top-level `ovoscope` package. A focused unit-test module exercises every helper against a FakeBus in well under a second; no MiniCroft startup is required.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#58)

Adds ovoscope.intent_cases: skill authors describe expected intent routing in plain-text files under

    test/end2end/cases/<lang>/<IntentName>.intent.test   (positive)
    test/end2end/cases/<lang>/no_match.test              (negative)

One utterance per line, '#' comments / blank lines ignored. Adding a phrase, intent, or whole new language is a pure text edit; no Python.

API:
- load_intent_cases(cases_dir) -> [IntentCase]
- assert_intent_case(minicroft, skill_id, handlers, case, pipeline)
- register_intent_case_tests(globals(), skill_id=..., handlers=..., cases_dir=...): one call in a test module generates TestCase classes for Padatious / Padacioso / M2V / DefaultPipeline, each with one method per (lang, utterance). A test passes if any tier of the pipeline family routes the utterance correctly, matching production cascade behaviour.

Pytest plugin adds an intent-case accuracy reporter:

    --ovoscope-accuracy-report=PATH     write JSON pivot
    --ovoscope-accuracy-min=RATIO       fail session if overall <
    --ovoscope-accuracy-baseline=PATH   fail session if accuracy drops vs a previous report
    --ovoscope-accuracy-tolerant        downgrade individual case fails to xfail; only the aggregate gate can block the run.

The terminal summary prints a per-(pipeline, lang, intent) pivot, easy to wire into CI as a regression gate that blocks PRs lowering routing accuracy.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
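The case-file format described above (one utterance per line, blank lines and '#' comments ignored) can be illustrated with a small reader. This is only a sketch of the file format, not ovoscope's loader: `read_utterances` is a hypothetical name, and it assumes '#' marks a comment only when it starts the line.

```python
from pathlib import Path
from typing import List


def read_utterances(path: Path) -> List[str]:
    """Return the utterances in one case file.

    Hypothetical helper: skips blank lines and lines whose first
    non-space character is '#'. Inline comments are NOT stripped,
    since the format description does not mention them.
    """
    utterances = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        utterances.append(line)
    return utterances
```

A file such as `cases/en-US/WhoAreYou.intent.test` (path chosen for illustration) would then yield one test input per returned string.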
… deterministic m2v warmup (#60)

* feat(intent-cases): markdown reporter, baseline diff, auto-discovery

Three follow-up improvements to the intent-case test framework:

1. **Markdown report (--ovoscope-accuracy-md=PATH)**
   Render the per-(pipeline, lang, intent) pivot as Markdown with collapsible sections. Drops into the gh-automations PR-comment workflow as a new '🎯 Intent-Case Accuracy' section alongside the existing skill-tests and bus-coverage panels. Also surfaces a 'Hardest utterances' table (top-N by cross-pipeline pass rate) so reviewers can see which phrasings need locale tuning.

2. **Structural baseline diff**
   Replace the scalar pass-rate baseline gate with a full diff: identifies which (pipeline, lang, intent, utterance) cases regressed (was-pass -> now-fail) vs recovered. The PR comment now lists the regressed cases verbatim, and the session fails if any regression is detected. JSON output includes a baseline_diff block for downstream tooling.

3. **Auto-discovery via conftest**
   Skills can now opt in to intent-case tests by declaring a single dict in test/end2end/conftest.py:

       ovoscope_intent_cases = dict(
           skill_id='my-skill.author',
           handlers={...},
       )

   The pytest plugin walks loaded conftest modules at configure time and calls register_intent_case_tests() automatically. The explicit API stays supported and unchanged.

Plus a deterministic m2v warm-up: instead of time.sleep(10), wait for the burst of padatious:register_intent events to settle (quiet-window heuristic) then pad for the 3 s in-plugin debounce. Falls back to a sleep if the bus introspection fails for any reason.

7 new unit tests cover the loader, summary, baseline diff, and markdown emitter; no live MiniCroft required, runs in <1 s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(intent-cases): auto-discovery via pytest_pycollect_makemodule

The previous auto-discovery used pytest_configure to walk sys.modules for conftest.py files, but pytest doesn't collect tests from conftest.py, and conftests aren't loaded yet at pytest_configure time either.

Switched to a pytest_pycollect_makemodule hookwrapper that fires once per candidate test module: if the module declares a top-level 'ovoscope_intent_cases' dict, the helper injects the generated TestCase classes into its namespace before pytest's standard Python-class collector walks it.

Result: a skill's complete intent-case wiring is now this 3-line file:

    # test/end2end/test_intents.py
    ovoscope_intent_cases = dict(
        skill_id='my-skill.author',
        handlers={'WhoAreYou.intent': 'MySkill.handle_who', ...},
    )

Verified end-to-end:
- 12 tests collected from a 3-line shim in 0.14s
- TestM2V slice ran live (1 XPASS, 2 XFAIL matching the known m2v-misroutes-'who are you' divergence canary) in 36 s
- Markdown / JSON / accuracy gate all fired correctly

Explicit register_intent_case_tests(globals(), ...) continues to work unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
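The structural baseline diff described above (was-pass -> now-fail counts as regressed, the reverse as recovered) might look roughly like the following. This is a sketch under assumptions: `diff_baseline` and the dict-of-booleans report shape are hypothetical, not the plugin's actual JSON schema.

```python
from typing import Dict, List, Tuple

# Assumed key shape, taken from the commit text:
# (pipeline, lang, intent, utterance)
CaseKey = Tuple[str, str, str, str]


def diff_baseline(
    baseline: Dict[CaseKey, bool],
    current: Dict[CaseKey, bool],
) -> Dict[str, List[CaseKey]]:
    """Classify cases present in both reports by how their result changed."""
    regressed = sorted(
        key for key, passed in current.items()
        if baseline.get(key) is True and not passed
    )
    recovered = sorted(
        key for key, passed in current.items()
        if baseline.get(key) is False and passed
    )
    return {"regressed": regressed, "recovered": recovered}
```

A gate in this style would fail the session whenever `regressed` is non-empty, regardless of the overall pass rate. Cases that are new in the current report are ignored here; how the real reporter treats them is not stated in the commit message.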
* fix(pipeline-harness): defer _SinkSkill bus subscription via property

PipelineHarness.__enter__ constructs _SinkSkill(bus=None) and assigns the real bus only after MiniCroft is created. _SinkSkill.__init__ was unconditionally calling bus.on(...) on the None bus, crashing with AttributeError before MiniCroft could be built, so PipelineHarness was unusable in any context.

Move the subscription into a bus property setter so:
- _SinkSkill(bus=None) is safe
- assigning a real bus after construction registers handlers
- rebinding to a new bus detaches the old subscriptions first

Adds regression tests covering all four paths.

* refactor(pipeline-harness): default _SinkSkill bus to FakeBus, forbid None

Per review feedback: rather than special-casing bus=None, always have a real bus. _SinkSkill now constructs a FakeBus by default when no bus is supplied; setting bus=None after construction raises ValueError. PipelineHarness drops the explicit bus=None / late-rebind dance: it constructs _SinkSkill() with the default FakeBus and rebinds to MiniCroft's real bus in __enter__.
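The end state of these two commits (subscription moved into a property setter, a default bus instead of None, and detach-on-rebind) can be sketched as below. `TinyBus`, `SinkSkill` and the topic name are hypothetical stand-ins for FakeBus and _SinkSkill; the real classes have a richer interface.

```python
class TinyBus:
    """Hypothetical stand-in for a message bus; only on/remove are modelled."""

    def __init__(self):
        self.handlers = {}

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def remove(self, topic, handler):
        self.handlers[topic].remove(handler)


class SinkSkill:
    """Sketch of a skill whose subscriptions follow its ``bus`` attribute."""

    TOPIC = "sink.example"  # illustrative topic, not a real OVOS topic

    def __init__(self, bus=None):
        self._bus = None
        # Always end up with a real bus: fall back to a fresh TinyBus.
        self.bus = bus if bus is not None else TinyBus()

    @property
    def bus(self):
        return self._bus

    @bus.setter
    def bus(self, new_bus):
        if new_bus is None:
            raise ValueError("bus must not be None")
        if self._bus is not None:
            # Rebinding: detach from the old bus before attaching to the new one.
            self._bus.remove(self.TOPIC, self._handle)
        self._bus = new_bus
        new_bus.on(self.TOPIC, self._handle)

    def _handle(self, message):
        pass
```

Keeping the subscribe/unsubscribe logic in the setter means construction and late rebinding share one code path, which is the property the regression tests in the commit appear to guard.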
* feat(phal): add plugin_factories to MiniPHAL and PHALTest Plugins that register bus handlers in __init__ must be constructed with the harness FakeBus, not with a pre-existing bus. The new plugin_factories parameter accepts callables (bus) -> plugin that are invoked during __enter__, ensuring the plugin is always wired to the MiniPHAL bus. Also fixes the deprecated ovos_utils.messagebus import to use ovos_bus_client.message.Message directly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs: NLnet/NGI0 funding attribution --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
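The plugin_factories idea (callables taking the bus and returning a plugin, invoked during __enter__) can be shown with a toy harness. Everything here is a hypothetical stand-in for MiniPHAL and FakeBus; only the (bus) -> plugin shape and the __enter__ timing come from the commit message.

```python
class TinyBus:
    """Hypothetical minimal bus used only for this sketch."""

    def __init__(self):
        self.handlers = {}

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def emit(self, topic, data=None):
        for handler in list(self.handlers.get(topic, [])):
            handler(data)


class EchoPlugin:
    """Registers its bus handler in __init__, like the plugins described above."""

    def __init__(self, bus):
        self.received = []
        bus.on("echo.example", self.received.append)


class MiniHarness:
    """Sketch: builds its own bus on entry, then calls each factory with it."""

    def __init__(self, plugin_factories=None):
        self.plugin_factories = list(plugin_factories or [])
        self.bus = None
        self.plugins = []

    def __enter__(self):
        self.bus = TinyBus()
        # Factories run only now, so every plugin is wired to the harness bus.
        self.plugins = [factory(self.bus) for factory in self.plugin_factories]
        return self

    def __exit__(self, exc_type, exc, tb):
        self.plugins.clear()
        return False
```

Passing the class itself (`plugin_factories=[EchoPlugin]`) works in this sketch because a class whose constructor takes the bus is already a (bus) -> plugin callable; a lambda would be needed for extra constructor arguments.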
…67) Add file-driven, in-process harnesses for the three OVOS listener services, each wiring the real service to a FakeBus with mock mic/VAD/STT/wake-word plugins and capturing the recognizer_loop:* bus sequence: - MiniVoiceLoop (ovos-dinkum-listener): feed_chunks drives _detect_ww for the wake-word / verifier-chain gate (closes #64); feed_file runs the full DinkumVoiceLoop.run() state machine over an audio file. - MiniSimpleListener (ovos-simple-listener): drives the SimpleListener loop over an audio file with the canonical bus callbacks. - MiniClassicListener (mycroft-classic-listener): RecognizerLoop event->FakeBus bridge plus a best-effort file-driven harness. Shared ListenerHarness base provides bus capture and the assert_record_begin / assert_wakeword_detected / assert_wakeword_suppressed / assert_utterance helpers. MockFileMicrophone and MockStreamingSTT are shared across backends. Adds tests (gated on each optional listener dependency) and docs/voice-loop.md. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use the canonical funding block (developer + funder + correct NGI0 Commons Fund banner) across the repo, replacing divergent ad-hoc credit notes.
…ytest>=8 compat) (#73) * fix: pytest_pycollect_makemodule hook signature for pytest>=8 (drop removed 'path' arg) * fix: declare pytest>=8 as a core dependency ovoscope registers a pytest11 plugin, so pytest is a runtime dependency, not just a test extra. Pin >=8: the pytest_pycollect_makemodule hook dropped the 'path' arg in pytest 8 (the bug this PR fixes).
… from_message skill_ids) (#85)
…rs (#86) * feat: stream audio frames through MiniListener for multi-frame decoders Add ``MiniListener.feed_audio_stream`` which feeds a sequence of audio frames in order and aggregates every message emitted across the whole stream, instead of clearing the capture buffer per call. This is required to test transformers whose decoder only fires after accumulating many frames (e.g. ggwave data-over-sound). - ``ListenerTest`` gains ``feed_method="feed_audio_stream"`` + ``chunk_size`` - document the real-ggwave streaming pattern in docs/listener.md - add unit tests using a stub accumulating transformer (no native deps) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: install ovos-dinkum-listener so streaming tests run test_listener_stream exercises the MiniListener.feed_audio_stream plugin_instances path, which needs the dinkum AudioTransformersService. Add a [listener] extra (pinned >=0.7.2a1 — the first release that allows ovos-bus-client 2.x, older pins cap it <2.0.0 and conflict with ovos-core) and install it in the build-tests and coverage workflows so the tests run instead of erroring on the missing dependency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
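Why a streaming feed is needed for multi-frame decoders can be seen in a small model: a decoder that emits nothing until it has accumulated enough frames only produces output when results are aggregated over the whole stream. The names below (`chunk_frames`, `AccumulatingDecoder`, `feed_stream`) are illustrative assumptions and do not mirror MiniListener's real interface.

```python
from typing import Iterable, List


def chunk_frames(audio: bytes, chunk_size: int) -> List[bytes]:
    """Split raw audio into fixed-size frames (the last one may be shorter)."""
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


class AccumulatingDecoder:
    """Stub transformer: emits one message after ``frames_needed`` frames."""

    def __init__(self, frames_needed: int):
        self.frames_needed = frames_needed
        self._buffer: List[bytes] = []

    def feed(self, frame: bytes) -> List[str]:
        self._buffer.append(frame)
        if len(self._buffer) < self.frames_needed:
            return []  # still accumulating, nothing to report yet
        payload = b"".join(self._buffer)
        self._buffer.clear()
        return [f"decoded:{len(payload)}"]


def feed_stream(decoder: AccumulatingDecoder, frames: Iterable[bytes]) -> List[str]:
    """Feed frames in order and aggregate every message across the stream."""
    captured: List[str] = []
    for frame in frames:
        captured.extend(decoder.feed(frame))
    return captured
```

Feeding a single frame to this stub returns an empty list, which is the situation the commit describes for per-call capture with accumulating decoders.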
…#88) The pytest11 plugin used the legacy hookwrapper=True / outcome.get_result() protocol. That style is deprecated and slated for removal; pytest 9 standardizes on the wrapper=True return-style. Adopt it so the auto-loaded plugin keeps importing cleanly under pytest 8 and 9 without consumers needing -p no:ovoscope. Behavior is unchanged: the downstream collector now arrives directly from yield and is returned unchanged after intent-case auto-discovery runs. Closes #87
* feat!: audio harness on OVOS spec bus namespace Migrate the AudioServiceHarness / PlaybackServiceHarness / AudioCaptureSession to the ovos.* spec topics via SpecMessage, matching ovos-audio's spec-bus migration (PIPELINE-1 §9.6): - emit speak as SpecMessage.SPEAK (ovos.utterance.speak) - subscribe/capture SpecMessage.AUDIO_OUTPUT_STARTED / AUDIO_OUTPUT_ENDED - subscribe SpecMessage.MIC_LISTEN The harness runs on a plain FakeBus (no modernize bridging), so it must emit and observe the spec topics directly. Bump the audio/tts extras to the spec-migrated ovos-audio and add an explicit ovos-spec-tools dependency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: bump ovos-audio dev floor to 1.3.0a1 for bus-client 2.x Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: lower ovos-audio audio/tts extra floors to 1.3.0a1 (highest on PyPI) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: test BOTH legacy and ovos.* bus namespaces via bridging FakeBus The audio harness migrated to the ovos.* spec topics (PR #92) while ovos-audio still emits the legacy topics; on the old non-bridging FakeBus a legacy producer never reached the spec-subscribed harness handler, so several test_audio_harness assertions (ducking, speak lifecycle, capture sequence) failed. ovos-utils #381 makes FakeBus mirror MessageBusClient's legacy<->ovos.* migration, which reconnects them — these were never genuine harness bugs. 
- pin ovos-utils>=0.12.0a1 (first FakeBus with namespace migration; #381) - thread modernize=/emit_legacy= through MiniCroft, AudioServiceHarness and PlaybackServiceHarness so harness users can exercise either namespace, both, or a single isolated namespace - AudioCaptureSession captures BOTH the legacy and ovos.* audio topics: it observes the raw "message" wire stream (which carries the producer's ORIGINAL topic only — the bridge re-dispatches the counterpart as a typed event, not a second "message"), so it must list both namespaces to record either producer - add test/unittests/test_namespace_bridging.py: legacy->spec, spec->legacy, dual-subscribe dedup, two-genuine-events, and no-bridging isolation - add TestAudioHarnessNamespaceBridging to test_audio_harness.py: ducking via the bridge, ducking via the spec topic natively, single-namespace isolation, and the speak lifecycle through the bridge Stacked on ovos-utils#381 and ovos-spec-tools#26 (NamespaceTranslator); CI stays red until both publish. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: dual-namespace bus tests across all core-service harnesses Extends the audio-harness namespace work (#92) to every core service so each proves its migrated bus topics travel on BOTH the legacy and the ovos.* spec namespace via FakeBus bridging. Each harness gains modernize=/emit_legacy= kwargs (threaded to FakeBus or to MiniCroft) so callers can exercise either namespace, both, or a single isolated one. 
Per service (harness + Test<Service>NamespaceBridging): - listener / voice_loop / simple_listener: recognizer_loop:utterance -> ovos.utterance.handle, record_begin/end -> ovos.listener.record.started/ended - classic_listener: + mycroft.awoken -> ovos.listener.awoken - e2e MiniCroft / pipeline / ocp: recognizer_loop:utterance -> ovos.utterance.handle - media (OCPPlayer): cork/duck via record + audio.output topics, spec->legacy reaches the legacy-subscribed handlers via emit_legacy - phal: no migrated topics — verifies the harness bus itself bridges Each class covers: legacy->spec bridged, spec->legacy bridged, spec native, and single-namespace isolation with both flags off. Handlers subscribe on both topics (the bridge re-dispatches the counterpart as a typed event, not a second wire 'message'). * fix: raise audio extra ovos-spec-tools floor to 0.10.0a1 The FakeBus namespace bridging (ovos-utils >=0.12.0a1) unconditionally imports NamespaceTranslator, which first ships in ovos-spec-tools 0.10.0a1. The previous >=0.9.0a1 floor could resolve to a version without it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
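The bridging behaviour these tests exercise can be approximated with a toy bus. This is a loose sketch, not the ovos-utils FakeBus: the class, the exact meaning given to the `modernize` and `emit_legacy` flags (legacy -> spec and spec -> legacy respectively), and the `recognizer_loop:` prefix on the record topics are assumptions. It also does not model the dual-subscribe dedup mentioned above.

```python
# Topic pairs named in the commit message (record prefix assumed).
LEGACY_TO_SPEC = {
    "recognizer_loop:utterance": "ovos.utterance.handle",
    "recognizer_loop:record_begin": "ovos.listener.record.started",
    "recognizer_loop:record_end": "ovos.listener.record.ended",
}
SPEC_TO_LEGACY = {spec: legacy for legacy, spec in LEGACY_TO_SPEC.items()}


class BridgingBus:
    """Toy bus that re-dispatches a message under its counterpart topic."""

    def __init__(self, modernize=True, emit_legacy=True):
        self.modernize = modernize      # assumed: legacy emit also reaches spec handlers
        self.emit_legacy = emit_legacy  # assumed: spec emit also reaches legacy handlers
        self.handlers = {}
        self.wire = []  # raw "message" stream: the producer's original topic only

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def _dispatch(self, topic, data):
        for handler in list(self.handlers.get(topic, [])):
            handler(topic, data)

    def emit(self, topic, data=None):
        self.wire.append(topic)  # the counterpart never appears on the wire
        self._dispatch(topic, data)
        if self.modernize and topic in LEGACY_TO_SPEC:
            self._dispatch(LEGACY_TO_SPEC[topic], data)
        if self.emit_legacy and topic in SPEC_TO_LEGACY:
            self._dispatch(SPEC_TO_LEGACY[topic], data)
```

Because only the original topic lands in `wire`, an observer of the raw stream has to list both namespaces to record either producer, which matches the reasoning given for AudioCaptureSession.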
Under the 9.x session shape Session.blacklisted_skills and blacklisted_intents can be None rather than an empty list, which made the final-session equality check raise TypeError: 'NoneType' object is not iterable. Coalesce to [] before set() so the assertion compares cleanly. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
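The failure and the fix are easy to reproduce in isolation: `set(None)` raises TypeError, while `set(value or [])` treats None as empty. The helper name below is hypothetical; the real check lives inside the harness's final-session assertion.

```python
from typing import Iterable, Optional


def same_blacklist(expected: Optional[Iterable[str]],
                   actual: Optional[Iterable[str]]) -> bool:
    """Compare two blacklists as sets, treating None as an empty list."""
    return set(expected or []) == set(actual or [])
```

With this coalescing, a session carrying `blacklisted_skills=None` compares equal to one carrying `[]`, which appears to be the intent of the fix.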
TTS.playback is a class-level attribute shared by every TTS instance in the process. The inherited TTS.__del__ chains into TTS.stop() -> TTS.playback.stop(), so when an earlier PlaybackServiceHarness's MockTTS is garbage-collected its destructor terminated whatever PlaybackThread was *currently* registered there — which by then belongs to a later, still-running harness. The victim thread had _terminated set and exited its loop, so its queued speak never played and ovos.audio.output.ended was never emitted, hanging the next speak() until timeout. GC timing made this a flaky TimeoutError that surfaced only after several harness create/destroy cycles (e.g. mid-file in a consumer's test/end2end suite). Override MockTTS.__del__ as a no-op: the harness already manages playback-thread lifecycle explicitly via PlaybackService.shutdown() on context exit, so a mock instance must never tear down the shared thread on collection. Add regression tests: a deterministic guard that fires a stale mock's destructor while a later harness owns TTS.playback and asserts the live thread is neither terminated nor unable to keep speaking, plus a many-sequential-harnesses smoke test. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
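The hazard comes from a destructor reaching through a class-level attribute. The following model uses stand-in classes (`BaseTTS`, `PlaybackThread` and this `MockTTS` are simplified assumptions, not the OVOS classes) to show how a no-op `__del__` keeps a stale mock from stopping a thread it no longer owns.

```python
class PlaybackThread:
    """Stand-in for the shared playback thread; only the stop flag is modelled."""

    def __init__(self):
        self.terminated = False

    def stop(self):
        self.terminated = True


class BaseTTS:
    # Class-level attribute: shared by every instance in the process.
    playback = None

    def stop(self):
        if self.playback is not None:
            self.playback.stop()

    def __del__(self):
        # Inherited destructor stops whatever thread is registered *now*.
        self.stop()


class MockTTS(BaseTTS):
    def __del__(self):
        # No-op: the harness owns the playback thread's lifecycle, so a
        # collected mock must not tear down the shared thread.
        pass
```

In this model, invoking the destructor of an old `MockTTS` after a newer owner has installed its own `PlaybackThread` leaves that thread running, whereas the inherited destructor would terminate it.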
…True) (#102) * fix: MockTTS destructor must not stop the shared playback thread TTS.playback is a class-level attribute shared by every TTS instance in the process. The inherited TTS.__del__ chains into TTS.stop() -> TTS.playback.stop(), so when an earlier PlaybackServiceHarness's MockTTS is garbage-collected its destructor terminated whatever PlaybackThread was *currently* registered there — which by then belongs to a later, still-running harness. The victim thread had _terminated set and exited its loop, so its queued speak never played and ovos.audio.output.ended was never emitted, hanging the next speak() until timeout. GC timing made this a flaky TimeoutError that surfaced only after several harness create/destroy cycles (e.g. mid-file in a consumer's test/end2end suite). Override MockTTS.__del__ as a no-op: the harness already manages playback-thread lifecycle explicitly via PlaybackService.shutdown() on context exit, so a mock instance must never tear down the shared thread on collection. Add regression tests: a deterministic guard that fires a stale mock's destructor while a later harness owns TTS.playback and asserts the live thread is neither terminated nor unable to keep speaking, plus a many-sequential-harnesses smoke test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: MockTTS — emit audio_output_end on delay for speak_dialog(wait=True) Skills calling speak_dialog(..., wait=True) block on recognizer_loop:audio_output_end via SessionManager.wait_while_speaking. Without a real TTS the handler thread blocks for 15+s, tripping the §8.3 10s handler backstop and spurious handler.error. MockTTS schedules audio_output_end on a 0.1s Timer from the speak handler. Uses bus.ee.emit (not bus.emit) to bypass FakeBus namespace-migration and on_message side effects so the synthetic event is invisible to test captures. 
* chore: drop agent scratch (AGENTS.md, TODO.md) from the PR Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… session reset) (#104) Expands the comment to explain that audio_output_end is a synthetic hardware event, so it must bypass FakeBus.on_message's session rebuild/SessionManager.update (which mirrors the real MessageBusClient and would clobber transient is_speaking/active_skills). Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…106) recognizer_loop:audio_output_end is a real bus message in production (emitted by the audio service), so the harness now publishes it the same way — plain bus.emit through on_message — instead of bus.ee.emit, which bypassed namespace migration and the capture path. This is now safe and correct because ovos-bus-client's SessionManager keeps one live Session per id (mutates in place, no wholesale replace), so routing through on_message no longer clobbers transient session state; handle_audio_output_end flips is_speaking=False on the shared singleton. Capture position is faithful to a real deployment: with speak(wait=False) the handler emits its end-marker first, so audio_output_end lands after the EOF and is not captured; with speak(wait=True) the handler blocks in wait_while_speaking until it arrives, so it deterministically precedes the end-marker and is captured — as any real bus observer would record it. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…audio_output_end (#108) * feat: MockTTS publishes audio_output_end via the full bus (faithful) recognizer_loop:audio_output_end is a real bus message in production (emitted by the audio service), so the harness now publishes it the same way — plain bus.emit through on_message — instead of bus.ee.emit, which bypassed namespace migration and the capture path. This is now safe and correct because ovos-bus-client's SessionManager keeps one live Session per id (mutates in place, no wholesale replace), so routing through on_message no longer clobbers transient session state; handle_audio_output_end flips is_speaking=False on the shared singleton. Capture position is faithful to a real deployment: with speak(wait=False) the handler emits its end-marker first, so audio_output_end lands after the EOF and is not captured; with speak(wait=True) the handler blocks in wait_while_speaking until it arrives, so it deterministically precedes the end-marker and is captured — as any real bus observer would record it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: emit audio_output_start in _mock_tts alongside audio_output_end The TTS mock previously only emitted recognizer_loop:audio_output_end (unduck) after a delay, missing the recognizer_loop:audio_output_start (duck) that a real TTS playback emits when speech begins. Now _mock_tts emits audio_output_start synchronously on speak (TTS begins) and audio_output_end after 100ms (TTS finishes), properly simulating the full duck/unduck lifecycle. * Refactor TTS playback message handling * fix: update ovoscope unit tests to ignore audio_output_start/end signals The _mock_tts TTS mock now emits both recognizer_loop:audio_output_start (synchronous duck) and recognizer_loop:audio_output_end (delayed unduck) for every speak. Add both topics to HANDLER_LIFECYCLE ignore lists so unit tests that count messages and test routing don't see them. 
Also propagate source/destination from the original speak message into the mock's emitted context so routing assertions (source/dst matching) still pass. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
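The duck/unduck lifecycle described above (audio_output_start emitted synchronously on speak, audio_output_end after a short delay, both carrying the original source/destination) can be sketched with `threading.Timer`. The bus class and `attach_mock_tts` are hypothetical stand-ins for the harness internals; only the topic names and the 0.1 s delay come from the commit messages.

```python
import threading


class TinyBus:
    """Hypothetical minimal bus that logs every emitted (topic, context)."""

    def __init__(self):
        self.handlers = {}
        self.log = []
        self._lock = threading.Lock()

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def emit(self, topic, context=None):
        with self._lock:
            self.log.append((topic, dict(context or {})))
        for handler in list(self.handlers.get(topic, [])):
            handler(topic, context or {})


def attach_mock_tts(bus, delay=0.1):
    """Emit start immediately and end after ``delay`` for every speak."""

    def on_speak(topic, context):
        # Carry routing fields over so source/destination assertions still hold.
        routed = {k: context[k] for k in ("source", "destination") if k in context}
        bus.emit("recognizer_loop:audio_output_start", routed)
        timer = threading.Timer(
            delay, bus.emit, args=("recognizer_loop:audio_output_end", routed)
        )
        timer.daemon = True  # never keep the test process alive
        timer.start()

    bus.on("speak", on_speak)
```

Because both signals go through the ordinary `emit` path they show up in the captured log, which is why the commit adds them to the ignore lists of tests that count messages.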
Human review requested!