Release 1.1.0a2 #105

Open
github-actions[bot] wants to merge 73 commits into master from release-1.1.0a2

Conversation

@github-actions

Human review requested!

JarbasAl and others added 30 commits March 14, 2026 01:12
…elines

feat: add NEBULENTO_PIPELINE and PALAVREADO_PIPELINE stage groups
…ms (#55)

ConfidenceMatcherPipeline plugins (nebulento, palavreado, padacioso,
padatious, adapt, …) all reproduce roughly the same end-to-end test
boilerplate: spin up MiniCroft pinned to one pipeline, mutate
Configuration()["intents"][config_key], emit utterances, capture either
the dispatched intent Message or complete_intent_failure, then restore.

This change extracts that shape into ovoscope so a plugin author can
focus on engine-specific behaviour.

New module ovoscope/e2e.py:

- E2EPipelineHarness: unittest.TestCase base. Subclasses declare
  PIPELINE_ID, CONFIG_KEY, PLUGIN_CONFIG, SKILL_ID and inherit
  send_and_capture / expect_no_match / make_utterance, plus
  Configuration save+restore and per-test skill detach.
- Standalone bus helpers (no MiniCroft required):
  make_session, make_utterance_message, wait_for_match, wait_for_failure.
- Engine-family registration shims:
  register_padatious_intent / register_padatious_entity
    (padatious, padacioso, nebulento, ...)
  register_adapt_vocab / register_adapt_intent
    (adapt, palavreado, ...)
  detach_intent / detach_skill (generic).

All names are re-exported from the top-level `ovoscope` package.
A focused unit-test module exercises every helper against a FakeBus
in well under a second; no MiniCroft startup is required.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#58)

Adds ovoscope.intent_cases: skill authors describe expected intent
routing in plain-text files under

    test/end2end/cases/<lang>/<IntentName>.intent.test    (positive)
    test/end2end/cases/<lang>/no_match.test               (negative)

One utterance per line, '#' comments / blank lines ignored. Adding a
phrase, intent, or whole new language is a pure text edit; no Python.
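
A minimal sketch of the line format just described, assuming comments occupy a whole line (a `#` appearing after an utterance is not treated specially here). `parse_case_lines` is a hypothetical helper, not the ovoscope loader.

```python
def parse_case_lines(text):
    """Return utterances: one per line, skipping blank lines and '#' comments."""
    utterances = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        utterances.append(line)
    return utterances


sample = """
# greetings
who are you

what is your name
"""
print(parse_case_lines(sample))
```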

API:
  - load_intent_cases(cases_dir) -> [IntentCase]
  - assert_intent_case(minicroft, skill_id, handlers, case, pipeline)
  - register_intent_case_tests(globals(), skill_id=..., handlers=...,
                               cases_dir=...) — one call in a test
    module generates TestCase classes for Padatious / Padacioso / M2V /
    DefaultPipeline, each with one method per (lang, utterance).
    A test passes if any tier of the pipeline family routes the
    utterance correctly, matching production cascade behaviour.
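
The "one method per (lang, utterance)" generation can be sketched with plain `unittest` and `type()`. Everything below is an assumption for illustration: `make_case_class`, `register_cases`, and `toy_route` are hypothetical names, and the toy router stands in for a real pipeline match. It is not how `register_intent_case_tests` is implemented.

```python
import unittest


def make_case_class(name, cases, route):
    """Build a TestCase with one test method per (lang, utterance, expected)."""
    methods = {}
    for i, (lang, utterance, expected) in enumerate(cases):
        # Bind loop variables as defaults so each method keeps its own case.
        def test(self, lang=lang, utterance=utterance, expected=expected):
            self.assertEqual(route(lang, utterance), expected)
        methods[f"test_{lang.replace('-', '_')}_{i}"] = test
    return type(name, (unittest.TestCase,), methods)


def register_cases(namespace, cases, route):
    """Inject the generated class into a module namespace such as globals()."""
    cls = make_case_class("TestGeneratedIntents", cases, route)
    namespace[cls.__name__] = cls
    return cls


def toy_route(lang, utterance):
    """Toy router standing in for a real pipeline match."""
    return "WhoAreYou.intent" if "who" in utterance else None


CASES = [
    ("en-US", "who are you", "WhoAreYou.intent"),
    ("en-US", "play some jazz", None),
]
namespace = {}
register_cases(namespace, CASES, toy_route)
```

Passing `globals()` as the namespace is what lets a normal test collector discover the generated class as if it had been written by hand.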

Pytest plugin adds an intent-case accuracy reporter:
  --ovoscope-accuracy-report=PATH        write JSON pivot
  --ovoscope-accuracy-min=RATIO          fail session if overall accuracy < RATIO
  --ovoscope-accuracy-baseline=PATH      fail session if accuracy drops
                                         vs a previous report
  --ovoscope-accuracy-tolerant           downgrade individual case fails
                                         to xfail; only the aggregate
                                         gate can block the run.

The terminal summary prints a per-(pipeline, lang, intent) pivot — easy
to wire into CI as a regression gate that blocks PRs lowering routing
accuracy.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
… deterministic m2v warmup (#60)

* feat(intent-cases): markdown reporter, baseline diff, auto-discovery

Three follow-up improvements to the intent-case test framework:

1. **Markdown report (--ovoscope-accuracy-md=PATH)**
   Render the per-(pipeline, lang, intent) pivot as Markdown with
   collapsible sections. Drops into the gh-automations PR-comment
   workflow as a new '🎯 Intent-Case Accuracy' section alongside the
   existing skill-tests and bus-coverage panels. Also surfaces a
   'Hardest utterances' table (the N utterances with the lowest
   cross-pipeline pass rate) so reviewers can see which phrasings need
   locale tuning.

2. **Structural baseline diff**
   Replace the scalar pass-rate baseline gate with a full diff:
   identifies which (pipeline, lang, intent, utterance) cases
   regressed (was-pass -> now-fail) vs recovered. The PR comment
   now lists the regressed cases verbatim, and the session fails
   if any regression is detected. JSON output includes a
   baseline_diff block for downstream tooling.

3. **Auto-discovery via conftest**
   Skills can now opt in to intent-case tests by declaring a single
   dict in test/end2end/conftest.py:

       ovoscope_intent_cases = dict(
           skill_id='my-skill.author',
           handlers={...},
       )

   The pytest plugin walks loaded conftest modules at configure
   time and calls register_intent_case_tests() automatically.
   The explicit API stays supported and unchanged.
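
The structural baseline diff in item 2 can be illustrated with a small function over two pass/fail mappings. This is a sketch under assumptions: the `{case_key: passed}` shape and the name `diff_against_baseline` are hypothetical, and cases present on only one side are ignored here.

```python
def diff_against_baseline(baseline, current):
    """Compare two {case_key: passed} mappings.

    case_key is a (pipeline, lang, intent, utterance) tuple.
    Cases missing from either side are ignored in this sketch.
    """
    regressed, recovered = [], []
    for key in sorted(baseline.keys() & current.keys()):
        if baseline[key] and not current[key]:
            regressed.append(key)      # was-pass -> now-fail
        elif not baseline[key] and current[key]:
            recovered.append(key)      # was-fail -> now-pass
    return {"regressed": regressed, "recovered": recovered}


baseline = {
    ("padatious", "en-US", "WhoAreYou.intent", "who are you"): True,
    ("m2v", "en-US", "WhoAreYou.intent", "who are you"): False,
    ("padacioso", "en-US", "WhoAreYou.intent", "what are you"): True,
}
current = {
    ("padatious", "en-US", "WhoAreYou.intent", "who are you"): False,
    ("m2v", "en-US", "WhoAreYou.intent", "who are you"): True,
    ("padacioso", "en-US", "WhoAreYou.intent", "what are you"): True,
}
diff = diff_against_baseline(baseline, current)
session_should_fail = bool(diff["regressed"])
```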

Plus a deterministic m2v warm-up: instead of time.sleep(10), wait for
the burst of padatious:register_intent events to settle (quiet-window
heuristic) then pad for the 3 s in-plugin debounce. Falls back to a
sleep if the bus introspection fails for any reason.
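
A quiet-window wait of the kind described can be sketched as follows. The class name `QuietWindow` and its parameters are hypothetical; in real use `on_event` would be subscribed to the registration events on the bus, and the caller would add the debounce padding after `wait()` returns.

```python
import threading
import time


class QuietWindow:
    """Tracks the time of the last observed event."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = None
        self.count = 0

    def on_event(self, *_args):
        # Called once per observed event (e.g. from a bus handler).
        with self._lock:
            self._last = time.monotonic()
            self.count += 1

    def wait(self, quiet=0.2, timeout=5.0, poll=0.01):
        """Return True once an event was seen and none followed for `quiet` seconds.

        Returns False if the timeout expires first, so the caller can fall
        back to a plain sleep.
        """
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                last = self._last
            if last is not None and time.monotonic() - last >= quiet:
                return True
            time.sleep(poll)
        return False
```

Compared with a fixed sleep, this returns as soon as the burst settles and still has a bounded worst case through `timeout`.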

7 new unit tests cover the loader, summary, baseline diff, and
markdown emitter — no live MiniCroft required, runs in <1 s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(intent-cases): auto-discovery via pytest_pycollect_makemodule

The previous auto-discovery used pytest_configure to walk sys.modules
for conftest.py files — but pytest doesn't collect tests from
conftest.py, and conftests aren't loaded yet at pytest_configure time
either. Switched to a pytest_pycollect_makemodule hookwrapper that
fires once per candidate test module: if the module declares a
top-level 'ovoscope_intent_cases' dict, the helper injects the
generated TestCase classes into its namespace before pytest's standard
Python-class collector walks it.

Result: a skill's complete intent-case wiring is now this 3-line file:

    # test/end2end/test_intents.py
    ovoscope_intent_cases = dict(
        skill_id='my-skill.author',
        handlers={'WhoAreYou.intent': 'MySkill.handle_who', ...},
    )

Verified end-to-end:
  - 12 tests collected from a 3-line shim in 0.14s
  - TestM2V slice ran live (1 XPASS, 2 XFAIL matching the known
    m2v-misroutes-'who are you' divergence canary) in 36 s
  - Markdown / JSON / accuracy gate all fired correctly

Explicit register_intent_case_tests(globals(), ...) continues to work
unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(pipeline-harness): defer _SinkSkill bus subscription via property

PipelineHarness.__enter__ constructs _SinkSkill(bus=None) and assigns
the real bus only after MiniCroft is created. _SinkSkill.__init__ was
unconditionally calling bus.on(...) on the None bus, crashing with
AttributeError before MiniCroft could be built — so PipelineHarness
was unusable in any context.

Move the subscription into a bus property setter so:
- _SinkSkill(bus=None) is safe
- assigning a real bus after construction registers handlers
- rebinding to a new bus detaches the old subscriptions first

Adds regression tests covering all four paths.
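
The property-setter approach can be sketched like this. `ToyBus` and `SinkSkill` are hypothetical stand-ins written for this example (the topic name is invented); they only show how subscriptions can follow the bus when it is assigned late or rebound.

```python
class ToyBus:
    """Minimal stand-in for a message bus with on()/remove()."""

    def __init__(self):
        self.handlers = {}

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def remove(self, topic, handler):
        self.handlers.get(topic, []).remove(handler)


class SinkSkill:
    """Hypothetical sketch: handlers follow the bus through a property setter."""

    TOPIC = "sink:utterance"  # illustrative topic name

    def __init__(self, bus=None):
        self._bus = None
        self.bus = bus  # goes through the setter; None registers nothing

    @property
    def bus(self):
        return self._bus

    @bus.setter
    def bus(self, new_bus):
        if self._bus is not None:
            # Detach from the old bus before rebinding.
            self._bus.remove(self.TOPIC, self._handle)
        self._bus = new_bus
        if new_bus is not None:
            new_bus.on(self.TOPIC, self._handle)

    def _handle(self, message):
        pass
```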

* refactor(pipeline-harness): default _SinkSkill bus to FakeBus, forbid None

Per review feedback: rather than special-casing bus=None, always have a
real bus. _SinkSkill now constructs a FakeBus by default when no bus is
supplied; setting bus=None after construction raises ValueError.
PipelineHarness drops the explicit bus=None / late-rebind dance — it
constructs _SinkSkill() with the default FakeBus and rebinds to
MiniCroft's real bus in __enter__.
* feat(phal): add plugin_factories to MiniPHAL and PHALTest

Plugins that register bus handlers in __init__ must be constructed with
the harness FakeBus, not with a pre-existing bus. The new plugin_factories
parameter accepts callables (bus) -> plugin that are invoked during
__enter__, ensuring the plugin is always wired to the MiniPHAL bus.
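
The factory idea can be sketched without any OVOS imports. `ToyBus`, `ToyPlugin`, and `ToyPHALHarness` are hypothetical names for this example; the point is only that construction is deferred until the harness owns a bus.

```python
class ToyBus:
    """Minimal stand-in for the harness bus."""

    def __init__(self):
        self.handlers = {}

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)


class ToyPlugin:
    """Registers its handler in __init__, like the plugins described above."""

    def __init__(self, bus):
        self.bus = bus
        bus.on("toy.ping", self.handle_ping)

    def handle_ping(self, message):
        pass


class ToyPHALHarness:
    """Hypothetical harness: builds plugins from factories inside __enter__."""

    def __init__(self, plugin_factories=()):
        self.plugin_factories = list(plugin_factories)
        self.bus = None
        self.plugins = []

    def __enter__(self):
        self.bus = ToyBus()  # the harness owns the bus
        # Each factory is a callable (bus) -> plugin.
        self.plugins = [factory(self.bus) for factory in self.plugin_factories]
        return self

    def __exit__(self, exc_type, exc, tb):
        self.plugins.clear()
        return False
```

A class whose constructor takes the bus as its only argument already satisfies the `(bus) -> plugin` shape, so `plugin_factories=[ToyPlugin]` works; a `lambda bus: ToyPlugin(bus)` would be used when extra arguments are needed.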

Also fixes the deprecated ovos_utils.messagebus import to use
ovos_bus_client.message.Message directly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: NLnet/NGI0 funding attribution

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
…67)

Add file-driven, in-process harnesses for the three OVOS listener services,
each wiring the real service to a FakeBus with mock mic/VAD/STT/wake-word
plugins and capturing the recognizer_loop:* bus sequence:

- MiniVoiceLoop (ovos-dinkum-listener): feed_chunks drives _detect_ww for the
  wake-word / verifier-chain gate (closes #64); feed_file runs the full
  DinkumVoiceLoop.run() state machine over an audio file.
- MiniSimpleListener (ovos-simple-listener): drives the SimpleListener loop over
  an audio file with the canonical bus callbacks.
- MiniClassicListener (mycroft-classic-listener): RecognizerLoop event->FakeBus
  bridge plus a best-effort file-driven harness.

Shared ListenerHarness base provides bus capture and the assert_record_begin /
assert_wakeword_detected / assert_wakeword_suppressed / assert_utterance helpers.
MockFileMicrophone and MockStreamingSTT are shared across backends. Adds tests
(gated on each optional listener dependency) and docs/voice-loop.md.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use the canonical funding block (developer + funder + correct NGI0 Commons
Fund banner) across the repo, replacing divergent ad-hoc credit notes.
…ytest>=8 compat) (#73)

* fix: pytest_pycollect_makemodule hook signature for pytest>=8 (drop removed 'path' arg)

* fix: declare pytest>=8 as a core dependency

ovoscope registers a pytest11 plugin, so pytest is a runtime dependency, not just
a test extra. Pin >=8: the pytest_pycollect_makemodule hook dropped the 'path' arg
in pytest 8 (the bug this PR fixes).
JarbasAl and others added 30 commits June 24, 2026 18:32
* feat: add GUICaptureSession.assert_template_shown for SYSTEM_* templates

Ergonomic GUI assertion for the template-based GUI: asserts a built-in
SYSTEM_* template was shown (prefix optional) plus its accompanying
gui.value.set session data, in one call. Backend-agnostic. Adds unit tests
and a docs section.

* fix: normalize single source_message robustly across Message classes

End2EndTest normalized source_message to a list via an isinstance check against
Message, but the Message class can come from ovos_bus_client / ovos_spec_tools /
ovos_utils.fakebus depending on installed versions; a cross-class isinstance check
returns False, leaving a single non-iterable Message that broke later iteration
(TypeError: 'Message' object is not iterable on newer stacks). Normalize on
'not a list' instead. Fixes the test_remote_recorder build failure.
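
A minimal sketch of the "normalize on not-a-list" idea, using two unrelated stand-in classes in place of the different Message implementations. `normalize_source_messages`, `MessageA`, and `MessageB` are hypothetical names.

```python
def normalize_source_messages(source_message):
    """Wrap a single message in a list; pass lists through unchanged."""
    if not isinstance(source_message, list):
        return [source_message]
    return source_message


class MessageA:
    """Stand-in for a Message class from one package."""


class MessageB:
    """Stand-in for an unrelated Message class from another package."""


single = MessageB()
print(isinstance(single, MessageA))            # cross-class check: False
print(normalize_source_messages(single) == [single])
```

The check no longer depends on which package the Message class was imported from, only on whether the caller already passed a list.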
…tra (#89)

* feat: export ovos-media OCP harness from the package + add [media] extra

ovoscope.media (OCPPlayerHarness, OCPCaptureSession, MockOCPBackend) was importable
only via the submodule path, while the ovos-audio AudioServiceHarness is re-exported
from the top-level package. Mirror that: re-export the media harness from
ovoscope/__init__.py (guarded like the audio block) and add a [media] extra declaring
ovos-media, so OCP/ovos-media backends get a first-class 'from ovoscope import
OCPPlayerHarness' entry point alongside the legacy audio one.

ovos-media is imported lazily inside the harness, so the export needs no extra at
import time; the [media] extra is required only to run the harness.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: MediaProviderHarness + injectable backend for OCPPlayerHarness

Add ovos-media test-harness support for the two e2e shapes that previously had to
be hand-rolled by consumers:

- MediaProviderHarness (ovoscope/media_provider.py): a dependency-free, duck-typed
  harness for opm.media.provider catalog/search plugins. from_entrypoint/from_class
  constructors, drivers mirroring the pipeline path (serves -> search_safe), and
  assert helpers. Imports neither mediavocab nor opm's MediaProvider, so it needs
  no extra and is re-exported unconditionally.
- OCPPlayerHarness backend_factory: optional bus->AudioBackend callable; when given,
  the harness wires a real AudioService (no autoload) with the injected backend so
  the player's play->load_track->LOADED_MEDIA->backend.play() path actually drives a
  real OCP backend (e.g. a Music Assistant audio backend). Default stays MockOCPBackend.

Tests + docs included; CI build-tests/coverage extras gain 'media' so the new
harness tests run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…rs (#86)

* feat: stream audio frames through MiniListener for multi-frame decoders

Add ``MiniListener.feed_audio_stream`` which feeds a sequence of audio
frames in order and aggregates every message emitted across the whole
stream, instead of clearing the capture buffer per call. This is required
to test transformers whose decoder only fires after accumulating many
frames (e.g. ggwave data-over-sound).

- ``ListenerTest`` gains ``feed_method="feed_audio_stream"`` + ``chunk_size``
- document the real-ggwave streaming pattern in docs/listener.md
- add unit tests using a stub accumulating transformer (no native deps)
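
The streaming behaviour can be illustrated with a stub decoder that only fires after enough bytes accumulate. All names here (`chunk_audio`, `AccumulatingDecoder`, `feed_stream`, the `toy.decoded` message type) are hypothetical; this is not the `MiniListener` implementation.

```python
def chunk_audio(audio, chunk_size):
    """Split raw audio bytes into consecutive frames of at most chunk_size bytes."""
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


class AccumulatingDecoder:
    """Stub transformer: emits one message only after `needed` bytes arrive."""

    def __init__(self, needed):
        self.needed = needed
        self.buffer = b""

    def feed(self, frame):
        self.buffer += frame
        messages = []
        while len(self.buffer) >= self.needed:
            payload = self.buffer[:self.needed]
            self.buffer = self.buffer[self.needed:]
            messages.append({"type": "toy.decoded", "size": len(payload)})
        return messages


def feed_stream(decoder, frames):
    """Feed frames in order and aggregate every message across the whole stream."""
    captured = []
    for frame in frames:
        captured.extend(decoder.feed(frame))
    return captured


frames = chunk_audio(bytes(1000), 160)
captured = feed_stream(AccumulatingDecoder(needed=480), frames)
```

Feeding any single 160-byte frame on its own yields no message, which is why a per-call capture buffer cannot observe this kind of decoder.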

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: install ovos-dinkum-listener so streaming tests run

test_listener_stream exercises the MiniListener.feed_audio_stream
plugin_instances path, which needs the dinkum AudioTransformersService.
Add a [listener] extra (pinned >=0.7.2a1, the first release that allows
ovos-bus-client 2.x; older releases cap it at <2.0.0 and conflict with ovos-core)
and install it in the build-tests and coverage workflows so the tests run
instead of erroring on the missing dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#88)

The pytest11 plugin used the legacy hookwrapper=True / outcome.get_result()
protocol. That style is deprecated and slated for removal; pytest 9 standardizes
on the wrapper=True return-style. Adopt it so the auto-loaded plugin keeps
importing cleanly under pytest 8 and 9 without consumers needing -p no:ovoscope.

Behavior is unchanged: the downstream collector now arrives directly from yield
and is returned unchanged after intent-case auto-discovery runs.

Closes #87
* feat!: audio harness on OVOS spec bus namespace

Migrate the AudioServiceHarness / PlaybackServiceHarness / AudioCaptureSession
to the ovos.* spec topics via SpecMessage, matching ovos-audio's spec-bus
migration (PIPELINE-1 §9.6):

- emit speak as SpecMessage.SPEAK (ovos.utterance.speak)
- subscribe/capture SpecMessage.AUDIO_OUTPUT_STARTED / AUDIO_OUTPUT_ENDED
- subscribe SpecMessage.MIC_LISTEN

The harness runs on a plain FakeBus (no modernize bridging), so it must emit
and observe the spec topics directly. Bump the audio/tts extras to the
spec-migrated ovos-audio and add an explicit ovos-spec-tools dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: bump ovos-audio dev floor to 1.3.0a1 for bus-client 2.x

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: lower ovos-audio audio/tts extra floors to 1.3.0a1 (highest on PyPI)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: test BOTH legacy and ovos.* bus namespaces via bridging FakeBus

The audio harness migrated to the ovos.* spec topics (PR #92) while ovos-audio
still emits the legacy topics; on the old non-bridging FakeBus a legacy producer
never reached the spec-subscribed harness handler, so several test_audio_harness
assertions (ducking, speak lifecycle, capture sequence) failed. ovos-utils #381
makes FakeBus mirror MessageBusClient's legacy<->ovos.* migration, which
reconnects them — these were never genuine harness bugs.

- pin ovos-utils>=0.12.0a1 (first FakeBus with namespace migration; #381)
- thread modernize=/emit_legacy= through MiniCroft, AudioServiceHarness and
  PlaybackServiceHarness so harness users can exercise either namespace, both,
  or a single isolated namespace
- AudioCaptureSession captures BOTH the legacy and ovos.* audio topics: it
  observes the raw "message" wire stream (which carries the producer's ORIGINAL
  topic only — the bridge re-dispatches the counterpart as a typed event, not a
  second "message"), so it must list both namespaces to record either producer
- add test/unittests/test_namespace_bridging.py: legacy->spec, spec->legacy,
  dual-subscribe dedup, two-genuine-events, and no-bridging isolation
- add TestAudioHarnessNamespaceBridging to test_audio_harness.py: ducking via
  the bridge, ducking via the spec topic natively, single-namespace isolation,
  and the speak lifecycle through the bridge

Stacked on ovos-utils#381 and ovos-spec-tools#26 (NamespaceTranslator);
CI stays red until both publish.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: dual-namespace bus tests across all core-service harnesses

Extends the audio-harness namespace work (#92) to every core service so each
proves its migrated bus topics travel on BOTH the legacy and the ovos.* spec
namespace via FakeBus bridging. Each harness gains modernize=/emit_legacy=
kwargs (threaded to FakeBus or to MiniCroft) so callers can exercise either
namespace, both, or a single isolated one.

Per service (harness + Test<Service>NamespaceBridging):
- listener / voice_loop / simple_listener: recognizer_loop:utterance ->
  ovos.utterance.handle, record_begin/end -> ovos.listener.record.started/ended
- classic_listener: + mycroft.awoken -> ovos.listener.awoken
- e2e MiniCroft / pipeline / ocp: recognizer_loop:utterance -> ovos.utterance.handle
- media (OCPPlayer): cork/duck via record + audio.output topics, spec->legacy
  reaches the legacy-subscribed handlers via emit_legacy
- phal: no migrated topics — verifies the harness bus itself bridges

Each class covers: legacy->spec bridged, spec->legacy bridged, spec native, and
single-namespace isolation with both flags off. Handlers subscribe on both
topics (the bridge re-dispatches the counterpart as a typed event, not a second
wire 'message').
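
The bridging behaviour described in these commits can be modelled with a toy bus. This is a sketch under assumptions: `ToyBridgingBus` is hypothetical, the flag semantics (`modernize` for legacy -> ovos.*, `emit_legacy` for ovos.* -> legacy) are inferred from the text above, only two of the listed topic pairs are included, and dual-subscriber dedup is not modelled.

```python
# Topic pairs taken from the commit text above.
LEGACY_TO_SPEC = {
    "recognizer_loop:utterance": "ovos.utterance.handle",
    "mycroft.awoken": "ovos.listener.awoken",
}
SPEC_TO_LEGACY = {spec: legacy for legacy, spec in LEGACY_TO_SPEC.items()}


class ToyBridgingBus:
    """Toy bus: records the wire stream and re-dispatches the counterpart topic."""

    def __init__(self, modernize=True, emit_legacy=True):
        self.modernize = modernize      # assumed: legacy -> ovos.* direction
        self.emit_legacy = emit_legacy  # assumed: ovos.* -> legacy direction
        self.handlers = {}
        self.wire = []                  # original topics only

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def _dispatch(self, topic, data):
        for handler in self.handlers.get(topic, []):
            handler(topic, data)

    def emit(self, topic, data=None):
        self.wire.append(topic)         # the counterpart never hits the wire
        self._dispatch(topic, data)
        if self.modernize and topic in LEGACY_TO_SPEC:
            self._dispatch(LEGACY_TO_SPEC[topic], data)
        elif self.emit_legacy and topic in SPEC_TO_LEGACY:
            self._dispatch(SPEC_TO_LEGACY[topic], data)
```

Because `wire` only ever holds the producer's original topic, a capture session that watches the wire has to list both namespaces, while a typed handler subscribed to either topic is reached through the bridge.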

* fix: raise audio extra ovos-spec-tools floor to 0.10.0a1

The FakeBus namespace bridging (ovos-utils >=0.12.0a1) unconditionally
imports NamespaceTranslator, which first ships in ovos-spec-tools 0.10.0a1.
The previous >=0.9.0a1 floor could resolve to a version without it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Under the 9.x session shape Session.blacklisted_skills and
blacklisted_intents can be None rather than an empty list, which made the
final-session equality check raise TypeError: 'NoneType' object is not
iterable. Coalesce to [] before set() so the assertion compares cleanly.
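
The coalescing step amounts to the following. `ToySession`, `blacklist_set`, and `same_blacklists` are hypothetical names for this sketch; only the two attribute names come from the commit text.

```python
def blacklist_set(values):
    """Treat None the same as an empty list before building a set."""
    return set(values or [])


class ToySession:
    """Hypothetical stand-in for the real Session object."""

    def __init__(self, blacklisted_skills=None, blacklisted_intents=None):
        self.blacklisted_skills = blacklisted_skills
        self.blacklisted_intents = blacklisted_intents


def same_blacklists(a, b):
    """Order-insensitive comparison that tolerates None on either side."""
    return (
        blacklist_set(a.blacklisted_skills) == blacklist_set(b.blacklisted_skills)
        and blacklist_set(a.blacklisted_intents) == blacklist_set(b.blacklisted_intents)
    )
```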

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
TTS.playback is a class-level attribute shared by every TTS instance in the
process. The inherited TTS.__del__ chains into TTS.stop() -> TTS.playback.stop(),
so when an earlier PlaybackServiceHarness's MockTTS was garbage-collected, its
destructor terminated whatever PlaybackThread was *currently* registered there,
which by then belonged to a later, still-running harness. The victim thread
had _terminated set and exited its loop, so its queued speak never played and
ovos.audio.output.ended was never emitted, hanging the next speak() until
timeout. GC timing made this a flaky TimeoutError that surfaced only after
several harness create/destroy cycles (e.g. mid-file in a consumer's
test/end2end suite).

Override MockTTS.__del__ as a no-op: the harness already manages playback-thread
lifecycle explicitly via PlaybackService.shutdown() on context exit, so a mock
instance must never tear down the shared thread on collection.
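
The failure mode and the fix can be reproduced in miniature. The classes below are hypothetical toys, not the real TTS or PlaybackThread; they only show how a destructor acting on a class-level attribute can hit an object owned by someone else, and how a no-op override avoids it. The immediate `__del__` call on `del` relies on CPython reference counting.

```python
import gc


class ToyPlaybackThread:
    def __init__(self, owner):
        self.owner = owner
        self.terminated = False

    def stop(self):
        self.terminated = True


class ToyTTS:
    playback = None  # class-level: shared by every instance in the process

    def stop(self):
        if ToyTTS.playback is not None:
            ToyTTS.playback.stop()

    def __del__(self):
        self.stop()  # stops whatever thread is registered *now*


class ToyMockTTS(ToyTTS):
    def __del__(self):
        # No-op: the harness shuts the playback thread down explicitly.
        pass


# A stale base-class instance kills the thread of a later owner.
ToyTTS.playback = ToyPlaybackThread("harness-1")
stale = ToyTTS()
ToyTTS.playback = ToyPlaybackThread("harness-2")
victim = ToyTTS.playback
del stale
gc.collect()

# With the no-op destructor the live thread survives collection.
ToyTTS.playback = ToyPlaybackThread("harness-3")
survivor = ToyTTS.playback
mock = ToyMockTTS()
del mock
gc.collect()
```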

Add regression tests: a deterministic guard that fires a stale mock's
destructor while a later harness owns TTS.playback and asserts the live
thread is not terminated and can still speak, plus a
many-sequential-harnesses smoke test.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…True) (#102)

* fix: MockTTS destructor must not stop the shared playback thread

TTS.playback is a class-level attribute shared by every TTS instance in the
process. The inherited TTS.__del__ chains into TTS.stop() -> TTS.playback.stop(),
so when an earlier PlaybackServiceHarness's MockTTS was garbage-collected, its
destructor terminated whatever PlaybackThread was *currently* registered there,
which by then belonged to a later, still-running harness. The victim thread
had _terminated set and exited its loop, so its queued speak never played and
ovos.audio.output.ended was never emitted, hanging the next speak() until
timeout. GC timing made this a flaky TimeoutError that surfaced only after
several harness create/destroy cycles (e.g. mid-file in a consumer's
test/end2end suite).

Override MockTTS.__del__ as a no-op: the harness already manages playback-thread
lifecycle explicitly via PlaybackService.shutdown() on context exit, so a mock
instance must never tear down the shared thread on collection.

Add regression tests: a deterministic guard that fires a stale mock's
destructor while a later harness owns TTS.playback and asserts the live
thread is not terminated and can still speak, plus a
many-sequential-harnesses smoke test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: MockTTS — emit audio_output_end on delay for speak_dialog(wait=True)

Skills calling speak_dialog(..., wait=True) block on
recognizer_loop:audio_output_end via SessionManager.wait_while_speaking.
Without a real TTS the handler thread blocks for 15+s, tripping the §8.3
10s handler backstop and emitting a spurious handler.error.

MockTTS schedules audio_output_end on a 0.1s Timer from the speak handler.
Uses bus.ee.emit (not bus.emit) to bypass FakeBus namespace-migration and
on_message side effects so the synthetic event is invisible to test captures.
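
The timer-scheduled end event can be sketched with the standard library. `ToyEmitter`, `ToyMockTTS`, and `speak_and_wait` are hypothetical names; the sketch models only the delayed emit that unblocks a waiting caller, not the FakeBus bypass or session handling described above.

```python
import threading


class ToyEmitter:
    """Stand-in for an event emitter with on()/emit()."""

    def __init__(self):
        self.handlers = {}

    def on(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def emit(self, topic, *args):
        for handler in list(self.handlers.get(topic, [])):
            handler(*args)


class ToyMockTTS:
    END_TOPIC = "recognizer_loop:audio_output_end"

    def __init__(self, emitter, delay=0.1):
        self.emitter = emitter
        self.delay = delay

    def handle_speak(self, utterance):
        # Schedule the synthetic "playback finished" event shortly after speak.
        timer = threading.Timer(self.delay, self.emitter.emit, args=(self.END_TOPIC,))
        timer.daemon = True
        timer.start()
        return timer


def speak_and_wait(tts, emitter, utterance, timeout=2.0):
    """Block, like speak_dialog(wait=True), until the end event arrives."""
    done = threading.Event()
    emitter.on(tts.END_TOPIC, done.set)
    tts.handle_speak(utterance)
    return done.wait(timeout)
```

Without the timer, `done.wait` would run to its full timeout, which corresponds to the blocked handler thread described in the commit.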

* chore: drop agent scratch (AGENTS.md, TODO.md) from the PR

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… session reset) (#104)

Expands the comment to explain that audio_output_end is a synthetic hardware event,
so it must bypass FakeBus.on_message's session rebuild/SessionManager.update (which
mirrors the real MessageBusClient and would clobber transient is_speaking/active_skills).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>