diff --git a/APPENDIX.md b/APPENDIX.md index 0f9fc71..e608da9 100644 --- a/APPENDIX.md +++ b/APPENDIX.md @@ -3,588 +3,1485 @@ **Non-normative.** This document is a companion to the OVOS formal specifications. It records design rationale, comparisons with other systems, the catalogue of *deliberate* divergences from current OVOS -code, and topics worth discussing that do not belong in a normative -specification. Nothing here is binding — OVOS-INTENT-1, OVOS-INTENT-2, -OVOS-INTENT-3, and OVOS-MSG-1 are the only normative documents. This -appendix exists so the specs themselves can stay terse and -requirement-focused. +code, and implementer-facing reference material that does not belong +in a normative specification body. Nothing here is binding — the +normative documents are OVOS-INTENT-1, OVOS-INTENT-2, +OVOS-INTENT-3, OVOS-INTENT-4, OVOS-MSG-1, OVOS-SESSION-1, +OVOS-PIPELINE-1, OVOS-CONTEXT-1, and OVOS-TRANSFORM-1. + +Pointers to specific OVOS code (file paths, class names, function +names) and to specific real projects (HiveMind, Adapt, +padatious, ovos-audio, ovos-workshop, …) are deliberately kept +*out* of the spec bodies and collected here, because implementation +code moves and specifications must not. --- -## 1. These specifications formalize an existing system - -The OVOS stack — the engines (padatious, Adapt), the skill ecosystem, -the resource file formats, the pipeline, the bus, the session model — -already exists and runs in production. These specifications were -written **after** the system they describe. They are a *formalization -pass*: they document an existing design implementation-agnostically, -tighten under-defined corners, and remove accidental inconsistencies, -so the contracts can be implemented by new engines, new hosts, and -adopted by other assistants. - -This matters for how to read them. 
They are **prescriptive** — each -spec states a clean target, and where it diverges from current OVOS -behaviour the divergence is a deliberate cleanup (catalogued in §6) — -but they are not speculative. The target is a lightly-cleaned version -of a working system, not a greenfield design. `padacioso`, -`ovos-workshop`, and `ovos-bus-client` are the closest existing -implementations; none yet fully conforms, and bringing them into -conformance is planned work. OVOS-MSG-1 is the closest to current code -of all the specs — it is largely a verbatim formalization of what -`ovos-bus-client` already does. +## 1. About the OVOS specifications + +### 1.0 The voice operating system concept + +The term *voice operating system* is precise, not marketing. The +distinction matters because OVOS is routinely conflated with two +things it is not: + +**It is not a voice assistant product.** A voice assistant is a +closed, vertically-integrated product — a single vendor controls +the NLU, the dialogue policy, the skill ecosystem, and the output +layer. It answers questions. A voice operating system is a +*platform*: it defines contracts that arbitrary third-party +components implement independently, and the platform's job is to +arbitrate between them. The analogy to a general-purpose OS is +direct. The pipeline is a scheduler: it has a priority order, a +first-match-wins dispatch policy, and a circuit-breaker for failing +components. The bus is IPC: broadcast delivery, no central +authority, no guaranteed ordering beyond the single-flip routing +model. The session carrier is shared memory: it propagates opaquely +through every message and every component reads and writes its +owned slice. The handler-lifecycle trio is process supervision: the +orchestrator wraps every handler invocation with start/complete/error +events regardless of what the handler does. 
Pipeline plugins and +transformer plugins are loadable modules: swapped, replaced, and +composed at deployment time with no changes to the ABI. + +**It is not an LLM wrapper.** A language model fits the voice OS +model as a first-class plugin — and in multiple roles. As a +*pipeline plugin*, it implements `match(utterances, lang, session) +→ Match`, returning a match immediately and deferring generation to +its handler (PIPELINE-1 §4.4). As an *utterance transformer*, it +paraphrases, normalizes, or expands the inbound candidate list +before matching (TRANSFORM-1 §3.2). As a *dialog transformer*, it +rewrites the handler's natural-language response before delivery +(TRANSFORM-1 §3.5). As a *metadata transformer*, it enriches the +utterance with detected intent signals before the pipeline sees it +(TRANSFORM-1 §3.3). In each role, the model is one implementation +of a defined plugin contract — swappable, composable, and neutral +to the platform. Whether any LLM is loaded at all, and in which +roles and at what priority, is a deployment decision. An +architecture organized around a single model call is not a voice OS; +it is one possible single-plugin deployment of one. + +The consequence of the OS framing: a skill written against the +intent stack runs on any conformant orchestrator, under any pipeline +configuration, with any combination of NLU backends, in any language +the deployment supports. The platform's only invariant is the ABI — +the wire contracts these specifications define. + +### 1.1 Formalization of an existing system + +The OVOS stack — the engines (padatious, Adapt), the skill +ecosystem, the resource file formats, the pipeline, the bus, the +session model — already exists and runs in production. The +specifications were written **after** the system they describe. 
+They are a *formalization pass*: they document an existing design +implementation-agnostically, tighten under-defined corners, and +remove accidental inconsistencies, so the contracts can be +implemented by new engines, new hosts, and adopted by other +assistants. + +This matters for how to read them. They are **prescriptive** — +each spec states a clean target, and where it diverges from +current OVOS behaviour the divergence is a deliberate cleanup +(catalogued in §5) — but they are not speculative. The target is +a lightly-cleaned version of a working system, not a greenfield +design. `padacioso`, `ovos-workshop`, and `ovos-bus-client` are +the closest existing implementations; none yet fully conforms, +and bringing them into conformance is planned work. OVOS-MSG-1 +is the closest to current code of all the specs — it is largely +a verbatim formalization of what `ovos-bus-client` already does. + +### 1.2 The spec set, in three stacks + +The specifications are built bottom-up in three stacks: + +- **The intent stack**, in dependency order: OVOS-INTENT-1 + (template grammar) → OVOS-INTENT-2 (resource files) → + OVOS-INTENT-3 (the intent concept) → OVOS-INTENT-4 (the + registration wire format on the bus). +- **The bus stack**: OVOS-MSG-1 formalizes the envelope, routing, + session carrier, and `forward`/`reply`/`response` derivations. + OVOS-SESSION-1 formalizes the wire shape of the session + carrier and the field-registry mechanism by which other specs + claim session fields. +- **The orchestrator stack**: OVOS-PIPELINE-1 defines the + orchestrator, the pipeline-plugin abstraction, the utterance + lifecycle, and the handler-lifecycle trio. OVOS-CONTEXT-1 + defines per-session intent-context state (the **declarative** + continuous-dialog primitive). 
OVOS-CONVERSE-1 defines the + active-handler recency stack, the converse plugin role, and + the interactive response-collection mechanism (the + **imperative** continuous-dialog primitive, complementary to + CONTEXT-1 — its §7 fixes the evaluation order between the two + surfaces). OVOS-TRANSFORM-1 defines the six injection-point + transformer chains. OVOS-SESSION-2 defines the session + lifecycle and state-ownership model (stateless orchestrator + for named sessions, orchestrator-owned default session, + SHOULD-project pathway for cross-utterance state with + MAY-internal as the alternative for state too large or + externally coupled to project). The orchestrator stack sits on top + of the bus stack (uses MSG-1's envelope and routing, + SESSION-1's session carrier with SESSION-2's lifecycle) and + around the intent stack (intent registrations are one kind + of input pipeline plugins consume). + +### 1.3 Compatibility levels ---- +Each specification carries its own integer `Version`, bumped per +PR per the contributing rules in the README. + +For the **intent stack**, a single integer identifies a coherent +grammar / resources / intent-definition snapshot checked by +`ovos-spec-lint`. The ladder: + +- **V0** — undocumented pre-spec baseline; no `.blacklist`, no `` references. +- **V1** — INTENT-1, -2, -3 at v1; headline addition is the `.blacklist` role. +- **V2** — V1 plus inline vocabulary references (``); a V2 template cannot be expanded by a V1 tool. + +The bus and orchestrator stacks are versioned **individually** +and not placed on a unified ladder — a tool targeting them cites +per-spec versions ("MSG-1 v2, PIPELINE-1 v2"). + +### 1.4 Reference implementations and ecosystem tooling + +The **reference implementation for the intent stack** is +**`ovos-spec-tools`** — expander, resource loader, dialog +renderer, language matching, locale linter — in one +dependency-light Python package. 
New tools that consume locale +folders or expand templates should depend on it rather than +reimplementing. + +The bus and orchestrator stacks do not yet have a comparable +ground-up reference implementation; `ovos-bus-client` is the +closest match for OVOS-MSG-1 and `ovos-core` is the closest +match for OVOS-PIPELINE-1 + OVOS-INTENT-4, but both predate the +specs. + +**`ovos-localize`** is the i18n-operation layer atop the intent +stack: a GitHub-native localization platform for OVOS skills, +built specifically around the resource roles of OVOS-INTENT-2. +It scans skill repositories for locale files; analyzes each +skill's Python source (via AST) to recover the **handler +context** of a resource — which function uses a file, what its +slots mean, what dialog it triggers, which is exactly the +intent↔handler binding of OVOS-INTENT-3 §1; validates +translations against a rule set (slot preservation, expansion +validity, variant counts); and lets translators browse, edit, +preview, and submit translations as pull requests. It is the +OVOS counterpart to Home Assistant's managed `intents` +repository. -## 2. Comparison with Home Assistant and Rhasspy - -OVOS, Home Assistant (HA), and Rhasspy share a common lineage. The -bracket-expansion grammar of OVOS-INTENT-1 — `(a|b)` alternatives, `[optional]` -segments, `{slot}` placeholders — is the same family as HA's `hassil` sentence -templates and Rhasspy's `sentences.ini`. The *syntax* is not novel. What is -distinctive about the OVOS approach is everything around the grammar. - -### 2.1 What the OVOS design does differently - -- **An implementation-agnostic spec at all.** HA and Rhasspy have no - format-level specification independent of their implementation — the code is - the contract. OVOS now has one, which is what lets multiple engines (and - other assistants) implement the same contract. 
-- **Engine-agnostic matching.** OVOS-INTENT-1 §4 treats templates as *training - data* and leaves matching, scoring, and generalization to the engine. HA's - core matching is `hassil`, a deterministic template matcher; Rhasspy compiles - templates into a closed ASR grammar. The OVOS contract accommodates a - deterministic matcher, a neural classifier, or an LLM behind one interface. -- **Templates are training data, not a closed grammar.** A capable OVOS engine - generalizes beyond the authored samples. Rhasspy's closed-grammar model is - deterministic and offline-guaranteed but brittle — an utterance not derivable - from `sentences.ini` cannot be recognized at all. -- **A multi-stage pipeline** (see §3). Intent engines are two stage kinds among - many. Neither HA nor Rhasspy exposes an intent layer this structured. -- **An intent is bound to one handler, owned by one skill** (OVOS-INTENT-3 §1). - See §2.3 — this follows necessarily from the open skill ecosystem. -- **A bus substrate that is openable to layer-2 systems** (OVOS-MSG-1 - §3.4, §4.4). The `source`/`destination` boundary pair plus - `session.session_id` give third parties everything they need to - layer authentication, routing, and remote participation on top of - OVOS without modifying it. HiveMind is the canonical example. - Neither HA nor Rhasspy exposes their bus this openly. See §5. - -### 2.2 What Home Assistant and Rhasspy do better - -- **Reusable template fragments.** `hassil` has `expansion_rules` and Rhasspy - has `` references — named, reusable sub-templates that let authors share - common fragments (politeness prefixes, articles, recurring phrasings). The - version-1 OVOS grammar had no equivalent. **OVOS-INTENT-1 version 2 closes - this** with the `` inline vocabulary reference (issue #1), which expands - a named `.voc` in place — reusing the existing slot-free format rather than - adding a new construct (see §4). 
-- **i18n corpus maturity.** HA's community `intents` repository is a large, - managed, professionally-translated corpus covering many languages. OVOS has - the tooling counterpart in **ovos-localize** (§8) — a GitHub-native - localization platform built around the OVOS-INTENT-2 resource roles — so the - gap here is the *scale and maturity* of the corpus, not the absence of - tooling. -- **Concrete, testable completeness.** HA and Rhasspy ship systems where the - hard parts — matching, number and range handling, slot typing — are solved - concretely. The OVOS specs deliberately defer some of these (slot typing to a - future normalization spec; matching to the engine). That deferral is - intellectually consistent but means the specs' value depends on the engines - and tooling that fill the gaps. - -### 2.3 Closed domain vs open ecosystem - -The sharpest difference is not technical but structural. **Home Assistant is a -curated, closed domain**: home automation, with a vendor-managed intent -vocabulary. HA can treat an intent such as `HassTurnOn` as a *shared contract* -honoured uniformly across hundreds of integrations and many languages, because -HA controls and curates that vocabulary. - -**OVOS is an open ecosystem.** Skills are arbitrary third-party Python packages, -installed by pip, developed independently, running as arbitrary code in -process. A skill can do anything; OVOS voice-enables anything. In that setting a -shared global intent vocabulary is not a missing feature — it is incoherent. -When skills are unbounded, an intent *must* be private to the skill that -defines it and bound directly to that skill's handler. OVOS-INTENT-3's "an -intent is not an event" stance is therefore the correct model for an open -ecosystem, just as HA's shared-vocabulary model is correct for a curated one. -The two models are right for different platforms; neither is universally -better. 
- -### 2.4 Summary - -OVOS is not out-designed by HA or Rhasspy at the architecture level — at the -pipeline layer (§3) it is ahead of both, and its intent-as-handler-binding -model is the correct consequence of being an open platform. HA's real advantage -is the maturity and scale of its translation corpus — an ecosystem investment, -not an architectural one, and one OVOS now has tooling for in ovos-localize -(§8). The grammar itself is a commodity shared by all three; the OVOS bet is the -engine-agnostic contract and the pipeline. --- -## 3. The pipeline — what these specs do not cover - -The intent specs (OVOS-INTENT-1/2/3) formalize **intent definition**: -the grammar, the resource files, what an intent is, the intent-engine -contract. OVOS-MSG-1 formalizes the bus that carries the result. -The piece that sits *around* both — the multi-stage **pipeline** that -decides which intent engine even gets a turn, interleaves -confidence tiers, runs `converse` / `fallback` / `common_query` / -`ocp` / `persona` stages, and produces the universal -`ovos.utterance.handled` end-marker — is not formalized by any spec -in this repository yet. - -That gap is what makes OVOS structurally distinctive (HA and Rhasspy -have no equivalent layer), and what most reviewers ask about -first. The natural next formalization is a pipeline / utterance- -lifecycle specification; see §7 known gaps. - -One observation worth flagging here: **the engine-agnostic intent -contract is already realized**, not hypothetical. `ovos-persona` plugs -into the pipeline as a first-class LLM stage (`persona-high`, -`persona-low`) — the OVOS-INTENT-3 §6.2 non-normative note about -LLM-backed engines describes something that ships today. The -ordered confidence-tier chain (deterministic Adapt before fuzzy -Padatious before an LLM persona last) is also how the system -*bounds* engine generalization in practice: generalization is not -unconstrained, it is bounded by where an engine sits. +## 2. 
Comparison with other voice-assistant systems + +The OVOS specifications occupy territory adjacent to several +existing voice-assistant systems. This section locates the +design choices against each comparator. The summary in §2.5 +records where the voice OS leads architecturally, where it +follows, and where it makes a deliberately different choice. + +### 2.1 Home Assistant and Rhasspy — shared grammar lineage + +OVOS, Home Assistant (HA), and Rhasspy share a common lineage. +The bracket-expansion grammar of OVOS-INTENT-1 — `(a|b)` +alternatives, `[optional]` segments, `{slot}` placeholders — is +the same family as HA's `hassil` sentence templates and +Rhasspy's `sentences.ini`. The *syntax* is not novel. What is +distinctive about the OVOS approach is everything around the +grammar. + +**What OVOS does differently:** + +- **An implementation-agnostic spec at all.** HA and Rhasspy + have no format-level specification independent of their + implementation — the code is the contract. OVOS now has one, + which is what lets multiple engines (and other assistants) + implement the same contract. +- **Engine-agnostic matching.** OVOS-INTENT-1 §4 treats + templates as *training data* and leaves matching, scoring, + and generalization to the engine. HA's core matching is + `hassil`, a deterministic template matcher; Rhasspy compiles + templates into a closed ASR grammar. The OVOS contract + accommodates a deterministic matcher, a neural classifier, + or an LLM behind one interface. +- **Templates are training data, not a closed grammar.** A + capable OVOS engine generalizes beyond the authored samples. + Rhasspy's closed-grammar model is deterministic and + offline-guaranteed but brittle — an utterance not derivable + from `sentences.ini` cannot be recognized at all. +- **A multi-stage pipeline** (§3.2). Intent engines are two + stage kinds among many. Neither HA nor Rhasspy exposes an + intent layer this structured. 
+- **An intent is bound to one handler, owned by one skill** + (OVOS-INTENT-3 §1). See §2.2 — this follows necessarily from + the open skill ecosystem. +- **A bus substrate openable to layer-2 systems** (§3.1). + Neither HA nor Rhasspy exposes their bus this openly. + +**What HA and Rhasspy do better:** + +- **Reusable template fragments.** `hassil` has + `expansion_rules` and Rhasspy has `` references — + named, reusable sub-templates that let authors share common + fragments (politeness prefixes, articles, recurring + phrasings). OVOS-INTENT-1 version 2 closes this with the + `` inline vocabulary reference, which expands a named + `.voc` in place — reusing the existing slot-free format + rather than adding a new construct. +- **i18n corpus maturity.** HA's community `intents` + repository is a large, managed, professionally-translated + corpus covering many languages. OVOS has the tooling + counterpart in `ovos-localize` (§1.4) — so the gap here is + the *scale and maturity* of the corpus, not the absence of + tooling. +- **Concrete, testable completeness.** HA and Rhasspy ship + systems where the hard parts — matching, number and range + handling, slot typing — are solved concretely. The OVOS + specs deliberately defer some of these (slot typing to a + future normalization spec; matching to the engine). That + deferral is intellectually consistent but means the specs' + value depends on the engines and tooling that fill the gaps. + +### 2.2 Closed domain vs open ecosystem + +The sharpest difference between OVOS and HA is not technical +but structural. **Home Assistant is a curated, closed domain**: +home automation, with a vendor-managed intent vocabulary. HA +can treat an intent such as `HassTurnOn` as a *shared contract* +honoured uniformly across hundreds of integrations and many +languages, because HA controls and curates that vocabulary. 
+ +**OVOS is an open ecosystem.** Skills are arbitrary third-party +Python packages, installed by pip, developed independently, +running as arbitrary code in process. A skill can do anything; +OVOS voice-enables anything. In that setting a shared global +intent vocabulary is not a missing feature — it is incoherent. +When skills are unbounded, an intent *must* be private to the +skill that defines it and bound directly to that skill's +handler. OVOS-INTENT-3's "an intent is not an event" stance is +therefore the correct model for an open ecosystem, just as HA's +shared-vocabulary model is correct for a curated one. The two +models are right for different platforms; neither is +universally better. + +### 2.3 Rasa — closest comparator for intent context + +Rasa's "active forms" and slot mappings perform context-aware +matching, but they are baked into the policy engine; you +cannot run a Rasa NLU pipeline without Rasa policies. +OVOS-CONTEXT-1 separates **gating** (`requires_context` / +`excludes_context`, §6 / §6.1 of that spec) from **match-time +capture** (the context-supplied capture rule, §7) from **engine +matching hints** (engine-internal use of values, §6), so every +intent engine that consumes OVOS-INTENT-3 registrations can +gate uniformly without buying into a particular dialog policy. + +Rasa wins on conversation-level evaluation infrastructure — +story-based testing, end-to-end success metrics — for which +the OVOS specs have no analogue yet (§7 catalogues this as a +known gap). + +Rasa's NLU pipeline is also the closest analogue to +OVOS-TRANSFORM-1's utterance / metadata / intent chains, but +it is a single sequence per language model and the +policy/preference split (TRANSFORM-1 §5.3) does not exist. +TRANSFORM-1's six-injection-point model is genuinely more +expressive. + +### 2.4 Amazon ASK / Alexa Skills Kit, Google Dialogflow + +Both are closed-domain centrally-trained stacks. 
Their +built-in entity-type systems (`AMAZON.DATE`, +`@sys.date-time`) are what OVOS-TRANSFORM-1 §3.4 replicates as +an *injectable, deployer-replaceable, engine-agnostic* +contract — at the spec level OVOS is strictly more flexible, +though OVOS defers the **typed value formats themselves** +(date encoding, number representation, duration units) to a +future text-normalization spec (§7), while ASK and Dialogflow +ship them as built-ins. + +Neither ASK nor Dialogflow has a `session.pipeline`-equivalent +(the assistant picks one matcher per skill); neither has +anything like the layer-2 substrate of OVOS-MSG-1 §3.4. ASK +has built-in intents (`AMAZON.HelpIntent`) but they are +handled inside the skill; Dialogflow has fallback intents but +they do not have first-class dispatch identity. OVOS-PIPELINE-1's dispatch polymorphism +(`skill_id == pipeline_id` for plugin-bundled handlers) lets a +non-skill component advertise its own intent identity on the bus, +indistinguishable from a skill — original to this architecture. + +### 2.5 Summary — where the voice OS leads, follows, and differs + +**OVOS leads architecturally** in three places: + +- **The pipeline-plugin model with first-class dispatch + polymorphism.** No comparator lets a non-skill component + (LLM persona, chatbot, fallback) be a first-class handler + owner on the same dispatch surface. +- **The six-injection-point transformer chain with per-session + preference/policy separation.** Nothing in HA, Rhasspy, + Rasa, ASK, or Dialogflow has a comparable lifecycle-uniform + extensibility surface. +- **Negative gating (`excludes_context` "match if absent") + in CONTEXT-1.** ASK/Dialogflow contexts are purely + positive; Rasa forms are not engine-agnostic; HA has no + context model. The fire-once and modal-suppression patterns + fall out of negative gating. + +**OVOS follows** where ecosystem investment matters more than +architecture: + +- HA's translation corpus scale (the `intents` repository). 
+- ASK / Dialogflow's typed entity systems. +- Rasa's conversation-level evaluation infrastructure. + +**OVOS makes a deliberately different choice** in two places: + +- *Engine-agnostic templates as training data* (OVOS-INTENT-1 + §4) rather than Rhasspy-style closed grammars. The trade-off: + generalization beyond authored samples vs. offline-deterministic + recognition guarantees. +- *Open skill ecosystem with skill-private intents* + (OVOS-INTENT-3 §1) rather than HA-style curated vocabulary. + The trade-off: skill author freedom vs. cross-integration + vocabulary sharing. --- -## 4. Design rationale - -Short notes on *why* the specifications make the choices they do — the -reasoning, not the requirement. - -### Intent grammar and resources (INTENT-1, -2, -3) - -- **ASR-normalized input, no escaping** (OVOS-INTENT-1 §2). The grammar targets - voice input. By contract, text reaching an engine is already lowercased, - punctuation-stripped, single-spaced. Bracket metacharacters therefore cannot - occur as literal input, so no escape mechanism is needed. This is a - simplification *bought* by scoping the grammar to voice. -- **Templates are training data** (OVOS-INTENT-1 §4). Enumerating every - phrasing is futile for natural speech. A template describes the *shape* of - the training data; the engine generalizes. This is why expansion is defined - precisely but matching is not. -- **An intent is not an event** (OVOS-INTENT-3 §1). See §2.3 — necessary for an - open skill ecosystem. -- **Two non-interoperable methods** (OVOS-INTENT-3 §2). Keyword and template - intents describe a command in fundamentally different shapes. Rather than - forcing one model, the spec keeps both and makes engines declare which they - accept. The cost is that a developer must choose per intent and know which - engines an installation runs. -- **Slot typing is deferred** (OVOS-INTENT-1 §5.3). 
Interpreting a slot value - as a number or date is inseparable from how ASR output is normalized — and - normalization is not yet specified. Specifying typing first would be - incoherent, so a value is, for now, an opaque sequence of words. -- **`.blacklist` vs `excluded`** (OVOS-INTENT-3 §4.2, §5.4). The template - grammar is purely generative — it cannot express "not this". Template intents - therefore need a separate `.blacklist` artifact for suppression. Keyword - intents express the same idea natively with the `excluded` constraint role. - The asymmetry follows from the grammar, not from inconsistency. -- **No regular expressions** (OVOS-INTENT-3 §4.4). Free-form structured text is - a slot — use a template intent and the slot extractor. Regexes are also - notoriously hard to localize, which conflicts with the per-language model. -- **Inline vocabulary references reuse `.voc`** (OVOS-INTENT-1 §3.7). A - reusable template fragment and a keyword vocabulary are the same thing — a - named, slot-free phrase set — so `` resolves to a `.voc` rather than - introducing a new file role. The change is one grammar token plus an - expander step. - -### Bus, session, and routing (MSG-1) - -- **One spec, not two.** Envelope + routing + session + derivations - are tightly coupled — every routing key lives in `context`, every - derivation manipulates routing or session, and all of them - formalize *existing* OVOS code. Splitting them was tried; the split - did not survive the derivations (which can only meaningfully be - defined where the routing keys are), so they were merged into a - single bus-message spec. -- **`context` is extensible by design.** Only the keys other systems - already key behaviour off (`source`, `destination`, `session`) are - given normative meaning. Everything else — GUI routing, tracing, - security — is layered by other specs without touching the - envelope. -- **`source`/`destination` are informational, not authorization** - (MSG-1 §3.3). 
The bus is not a security boundary. Layer-2 systems - (HiveMind) build authentication and routing enforcement on top of - the pair without OVOS itself learning about peers. -- **The boundary is user ↔ assistant, not core ↔ handler.** The - `(source, destination)` pair marks who is currently talking to whom - across one boundary only: the external participant (user, chat UI, - satellite client, test harness) on one side, the assistant — OVOS - core *and* every skill handler — on the other. Skills are not on the - other side of this boundary from OVOS core; from the user's - perspective the assistant is one thing. The flip happens **once** - per conversational turn (§5.1), not on every internal hop. -- **`session_id == "default"` is the only normative-magic value** - (MSG-1 §4.1). It marks "originated by the device itself" and is the - hook `ovos-audio` already uses to decide whether to play TTS - locally. One reserved string, one well-defined consequence — enough - for layer-2 routing without specifying a full session model. -- **Absent `session` equals `session_id: "default"`** (MSG-1 §4.3). - Code paths that never set a session shouldn't accidentally get - treated as untrusted; the rule makes the substrate forgiving for - in-process subsystems while keeping the policy hook intact. -- **No central correlation, no central state** (MSG-1 §5.4). The bus - is fully asynchronous. There is no per-message ID, no - in-reply-to chain, no host-managed request/response index, and no - spec-level state tracking of any kind. Components that need to - correlate or remember things do it themselves, keyed on - `session.session_id` (the interaction-channel identifier — §5.2 - below). Multi-turn conversation, intent context, cross-skill - state, and similar concerns are deferred to future specifications; - see §5.2 for the model and §7 for the list of planned work. +## 3. Architectural patterns ---- +Two patterns recur across the spec family and are worth a +dedicated treatment. 
-## 5. The OVOS bus as a substrate +### 3.1 The bus as a substrate -Under MSG-1's `source` / `destination` / `session` model, the bus is -not just an internal transport — it is the **substrate higher-level -systems plug into without modifying OVOS**. Two mechanics make that -work: **single-flip routing** (§5.1), which keeps the routing pair -correct end-to-end without per-component effort; and **no central -state or correlation** (§5.2), which makes layer-2 systems -composable. HiveMind is the canonical example of what both -together enable (§5.3). +Under OVOS-MSG-1's `source` / `destination` / `session` model, +the bus is not just an internal transport — it is the +**substrate higher-level systems plug into without modifying +the assistant core**. Two mechanics make that work: +**single-flip routing** (§3.1.1), which keeps the routing pair +correct end-to-end without per-component effort; and **no +central state or correlation** (§3.1.2), which makes layer-2 +systems composable. HiveMind is the canonical example of what +both together enable (§3.1.3). -### 5.1 The single-flip routing model +#### 3.1.1 The single-flip routing model -The most important bus invariant in OVOS, and the one most often -reinvented incorrectly. The routing pair (`source`, `destination`) -flips **exactly once per conversational turn**, performed by -ovos-core, before the intent dispatch is emitted. From that point -on, every handler-side emission is *already* addressed back at the -user. +The most important bus invariant in OVOS, and the one most +often reinvented incorrectly. The routing pair (`source`, +`destination`) flips **exactly once per conversational turn**, +performed by ovos-core, before the intent dispatch is emitted. +From that point on, every handler-side emission is *already* +addressed back at the user. Three steps: -1. 
**The user side emits.** An external component — microphone - service, chat UI, satellite client, test harness — emits an - utterance with `source` set to itself: +1. **The user side emits.** An external component — + microphone service, chat UI, satellite client, test harness + — emits an utterance with `source` set to itself: context: { source: "audio", destination: null, session: {...} } -2. **ovos-core flips, then dispatches.** When the intent service - matches an intent it derives the dispatch via - `Message.reply(match_type, data)` (`ovos-core/.../service.py:340`). - The `.reply` rule of MSG-1 §5.2 swaps the routing pair: +2. **ovos-core flips, then dispatches.** When the intent + service matches an intent it derives the dispatch via + `Message.reply(match_type, data)` + (`ovos-core/.../service.py:340`). The `.reply` rule of + MSG-1 §5.2 swaps the routing pair: context: { source: "ovos-core", destination: "audio", session: {...} } The dispatch goes out on the per-intent topic - `:`. The flip has already classified the - message as *going back at the user*, even though a skill handler - is what actually runs. + `:`. The flip has already classified + the message as *going back at the user*, even though a + skill handler is what actually runs. -3. **The handler `.forward`s.** Every message the skill emits in - response — `speak`, the handler lifecycle trio, GUI events — - uses `Message.forward(...)` (`ovos-workshop/.../ovos.py:1461, - 1472, …`). `.forward` preserves `context` unchanged, so every - handler emission is already addressed back at the original - user-side component. +3. **The handler `.forward`s.** Every message the skill emits + in response — `speak`, the handler lifecycle trio, GUI + events — uses `Message.forward(...)` + (`ovos-workshop/.../ovos.py:1461, 1472, …`). `.forward` + preserves `context` unchanged, so every handler emission is + already addressed back at the original user-side component. 
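The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the `ovos-bus-client` implementation: the `Message` class is a toy stand-in, the `sender` argument is a hypothetical way to fill `source` when the incoming `destination` is null, and the dispatch topic `demo.skill:hello` is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Toy stand-in for a bus message (NOT the ovos-bus-client class)."""
    msg_type: str
    data: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)

    def reply(self, msg_type, data=None, sender="ovos-core"):
        # The single flip: the old source becomes the new destination.
        # `sender` is a hypothetical fallback for a null destination.
        ctx = dict(self.context)
        ctx["destination"] = self.context.get("source")
        ctx["source"] = self.context.get("destination") or sender
        return Message(msg_type, data or {}, ctx)

    def forward(self, msg_type, data=None):
        # No flip: the routing pair and session are copied unchanged.
        return Message(msg_type, data or {}, dict(self.context))


# Step 1: the user side emits.
utterance = Message(
    "ovos.utterance.handle",
    {"utterances": ["hello"]},
    {"source": "audio", "destination": None,
     "session": {"session_id": "default"}},
)
# Step 2: the core flips once while deriving the dispatch.
dispatch = utterance.reply("demo.skill:hello")
# Step 3: the handler forwards; the emission already points at the user side.
speak = dispatch.forward("speak", {"utterance": "hi there"})
```

Calling `dispatch.reply(...)` in step 3 instead of `.forward` would flip the pair a second time and address the message to `ovos-core` rather than to `audio`.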
Two consequences fall out: -- **The boundary is user ↔ assistant, not core ↔ handler.** Skill - handlers are on OVOS's side of the boundary; from outside, OVOS - is one thing. The user doesn't know or care which skill answered - them. +- **The boundary is user ↔ assistant, not core ↔ handler.** + Skill handlers are on OVOS's side of the boundary; from + outside, OVOS is one thing. The user doesn't know or care + which skill answered them. - **Handler authors never write addressing code.** Because - `.forward` preserves the flipped pair, no skill anywhere needs - to understand `source` / `destination`. Get the inversion right - once in ovos-core, and every downstream skill is automatically - correct. - -What this rules out: no per-hop addressing (handlers don't pick -their own `destination`); no second flip (handlers `.forward`, -they don't `.reply` to the dispatch); the dispatch topic -`:` selects the handler, not `destination` -(the destination belongs to the user). Implementers using `.reply` -where `.forward` is appropriate produce mis-routed messages that -work in local tests but silently break layer-2 routing. - -### 5.2 No central correlation, no central state + `.forward` preserves the flipped pair, no skill anywhere + needs to understand `source` / `destination`. Get the + inversion right once in ovos-core, and every downstream + skill is automatically correct. + +What this rules out: no per-hop addressing (handlers don't +pick their own `destination`); no second flip (handlers +`.forward`, they don't `.reply` to the dispatch); the dispatch +topic `:` selects the handler, not +`destination` (the destination belongs to the user). +Implementers using `.reply` where `.forward` is appropriate +produce mis-routed messages that work in local tests but +silently break layer-2 routing. + +#### 3.1.2 No central correlation, no central state The bus is **fully asynchronous**. 
OVOS does not centrally -correlate request/response chains, and does not centrally track -per-conversation state. There is no per-message identifier, no -in-reply-to field, no host-side index mapping a `.response` back to -its request, no shared "current conversation" record. +correlate request/response chains, and does not centrally +track per-conversation state. There is no per-message +identifier, no in-reply-to field, no host-side index mapping +a `.response` back to its request, no shared "current +conversation" record. `session.session_id` identifies an **interaction channel** — -nothing more. Two messages sharing a `session_id` are on the same -channel, but the spec guarantees nothing about ordering, state -continuity, or pending requests. +nothing more. Two messages sharing a `session_id` are on the +same channel, but the spec guarantees nothing about ordering, +state continuity, or pending requests. Every component — skills, pipeline plugins, external clients, layer-2 systems — owns whatever state it needs. An asker that wants `.response` correlation keeps its own outstanding-request table; a skill that wants conversational memory keeps its own per-session store; a layer-2 system that wants per-peer state -keys on `session_id`. Whatever a later consumer needs is **in the -Message** (`data` / `context` / `session`) or **out of band** — -never recovered from a hidden host-side index. - -This is what lets layer-2 systems plug in cleanly: if OVOS kept a -central correlation index or a central conversation state, every -layer-2 system would have to replicate it, hook into it, or work -around it. Because OVOS keeps neither, they compose without -contention. 
- -Several real concerns are deferred by this stance and are listed -under §7 known gaps: multi-turn conversation, intent context -(adapt's `add_context`/`remove_context`), the other session knobs -current OVOS carries beyond `session_id` and `lang` (`pipeline`, -`site_id`, `persona_id`, `time_format`, `date_format`, -`system_unit`, `tts_preferences`, …), and the eventual shape of -conversational state. The async-by-default model means those -future specs only need to define *what* the state is, not *how* -it travels. - -### 5.3 Why HiveMind works - -HiveMind is the canonical layer-2 system this design enables. A -HiveMind satellite is just another user-side emitter — it sets -`source` to its peer ID, populates `session` with a per-peer -session, and emits a Message. Inside OVOS: - -- ovos-core runs the same `.reply` flip (§5.1 step 2) — - `destination` becomes the satellite's peer ID instead of the - local microphone. -- Skills `.forward` as usual — `destination` stays the satellite - ID through every handler emission. -- HiveMind, watching the bus, sees each message addressed to its - peer and routes it back over the HiveMind transport. - -The pre-existing `session_id == "default"` rule keeps device-local -TTS on the device's speakers (per `ovos-audio/utils.py`'s -`require_default_session`), because remote HiveMind sessions -carry their own `session_id` and never `"default"`. - -None of this required HiveMind to modify OVOS core. The mechanism -that makes it work — single-flip routing + opaque per-session -identifiers + no central state — was already in -`ovos-bus-client/message.py:194-198`; MSG-1 just names and -formalizes it. +keys on `session_id`. Whatever a later consumer needs is **in +the Message** (`data` / `context` / `session`) or **out of +band** — never recovered from a hidden host-side index. 
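A component-owned request table might look like the following. It is a sketch under assumptions: the `request_token` data field and the `Asker` class are hypothetical and defined by no OVOS spec, and the responder is assumed to echo the token back in its `.response` data.

```python
import uuid


class Asker:
    """Keeps its own outstanding-request table; the bus keeps none."""

    def __init__(self, emit):
        self._emit = emit    # callable(topic, data, context)
        self._pending = {}   # token -> (session_id, callback)

    def ask(self, topic, data, session_id, callback):
        token = uuid.uuid4().hex
        self._pending[token] = (session_id, callback)
        # The token travels in the Message so the responder can echo it.
        payload = dict(data, request_token=token)  # hypothetical field name
        self._emit(topic, payload, {"session": {"session_id": session_id}})
        return token

    def on_response(self, data):
        # Look the token up in our own table; nothing is host-managed.
        entry = self._pending.pop(data.get("request_token"), None)
        if entry is None:
            return False  # not ours, or already answered
        _session_id, callback = entry
        callback(data)
        return True
```

Everything needed to match a response to its request is either in the Message (the echoed token) or in the asker's private table; the host tracks neither.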
+ +This is what lets layer-2 systems plug in cleanly: if OVOS +kept a central correlation index or a central conversation +state, every layer-2 system would have to replicate it, hook +into it, or work around it. Because OVOS keeps neither, they +compose without contention. + +Several real concerns are deferred by this stance and are +listed under §7 (Known gaps): multi-turn conversation, the +other session knobs current OVOS carries beyond `session_id` +and `lang` (`persona_id`, `time_format`, `date_format`, +`system_unit`, `tts_preferences`, …), and the eventual shape +of conversational state. The async-by-default model means +those future specs only need to define *what* the state is, +not *how* it travels. + +#### 3.1.3 Layer-2 substrates + +The single-flip routing model and the no-central-state +design make layer-2 federation composable without modifying +the assistant core. A remote peer is just another user-side +emitter: it sets `source` to its peer ID, populates `session` +with its own named session, and emits a Message. The +orchestrator runs the same `.reply` flip; response messages +carry `destination == peer ID`; the bridge (watching the bus) +routes them back over the transport. The +`session_id == "default"` rule keeps device-local TTS on the +device's speakers because remote sessions carry their own +`session_id` and never `"default"`. + +Layer-2 bridges also inherit the session-field +**preference/policy split** without extra mechanism: client +sessions populate the preference fields +(`pipeline`, `_transformers`) to request behaviour; +the bridge populates the policy fields +(`blacklisted_pipelines`, `blacklisted__transformers`) +from the peer's grant. PIPELINE-1 §5.5 and TRANSFORM-1 §5.3 +compose them deterministically at the orchestrator. 
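The preference/policy split can be illustrated with the three-stage composition for the pipeline (preference, then availability, then policy). This is a minimal sketch, not the orchestrator's code: the function name and the `loaded_plugins` argument are assumptions, and only the `pipeline` and `blacklisted_pipelines` session fields come from the text above.

```python
def compose_pipeline(session, default_pipeline, loaded_plugins):
    """Return the ordered list of pipeline ids that will actually run."""
    # 1. Preference: the session's own list, else the deployment default.
    preferred = session.get("pipeline", default_pipeline)
    # 2. Availability: drop plugins that are not loaded.
    available = [p for p in preferred if p in loaded_plugins]
    # 3. Policy: drop anything named in the denylist.
    denied = set(session.get("blacklisted_pipelines", ()))
    return [p for p in available if p not in denied]
```

A client that requests `pipeline: ["a", "b", "c"]` through a bridge that stamps `blacklisted_pipelines: ["b"]`, on a host that has only `a` and `b` loaded, ends up with `["a"]`. The client's order is respected, and the bridge's grant can only remove stages.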
+ +### 3.2 The pipeline-plugin model + +The piece that sits *around* the intent and bus stacks — the +multi-stage orchestrator that decides which engine even gets +a turn, runs `converse` / `fallback` / `common_query` / `ocp` / +`persona` stages, and produces the universal +`ovos.utterance.handled` end-marker — is what makes OVOS +structurally distinctive (HA and Rhasspy have no equivalent +layer). + +The plugin abstraction is **already in current code**: +`OVOSPipelineFactory` loads pipeline plugins by id at startup, +the orchestrator holds them in a `pipeline_plugins` dict +keyed on `pipeline_id`, and the default `Session.pipeline` is +an ordered list of plugin identifiers (with a migration map +translating legacy `padatious_high`-style names into modern +`ovos-padatious-pipeline-plugin-high`-style ones). The +official `ovos-padatious-pipeline-plugin`, +`ovos-adapt-pipeline-plugin`, `ovos-converse-pipeline-plugin`, +`ovos-fallback-pipeline-plugin`, +`ovos-common-query-pipeline-plugin`, +`ovos-ocp-pipeline-plugin`, and the persona plugins all +already conform to this model. + +OVOS-PIPELINE-1's contribution is therefore a **prescriptive +refinement**, not a wholesale new abstraction. It: + +- formalizes the plugin contract (the `match` shape, the + `Match` result, the side-effect-free discipline); +- defines `:` **dispatch + polymorphism** so a plugin can bundle its own handler (a + language-model persona, a chatbot) as a first-class + participant alongside skill-owned handlers; +- prescribes the **universal `ovos.utterance.handled` + end-marker** on every terminal path; +- renames the `mycroft.skill.handler.*` trio → + `ovos.intent.handler.*`. + +The current high/medium/low confidence-tier convention is +**compatible** with PIPELINE-1 and out of scope for the spec. +From the bus's perspective each tier is already a distinct +`pipeline_id` in the session's pipeline list (e.g. 
+`padatious_high`, `padatious_medium`, `padatious_low`), which +is exactly what the spec prescribes. How a Python plugin +class internally serves multiple `pipeline_id`s — one class +with `match_high` / `match_medium` / `match_low` methods, +three separate plugin instances, an orchestrator-side +suffix-decoding helper — is an implementation choice the spec +does not constrain. + +Three properties make the resulting model unusually +expressive: + +- **All plugins are equivalent.** There is no spec-level distinction + between intent engines, converse handlers, fallbacks, + language-model personas, classic chatbots, or anything else. + They all expose the same `match` contract. +- **Skills and plugin-bundled handlers are indistinguishable + as handler owners.** From outside, the assistant + responded — the user does not know or care whether a skill + matched against a registered intent or a language-model + plugin generated the response on the fly. +- **The engine-agnostic intent contract is already + realized**, not hypothetical. OVOS persona plugins + (`ovos-persona`, `ovos-persona-server`, + `ovos-claude-plugin`, `ovos-openai-plugin`, etc.) plug into + the pipeline as first-class language-model stages. The + ordered chain (deterministic keyword engines first, then fuzzy + template engines, with language-model fallbacks last) is + also how the system *bounds* generalization in practice. + +What OVOS-PIPELINE-1 deliberately leaves out: **per-plugin +behavioural contracts**. A `converse` plugin, a `fallback` +plugin, a persona plugin: each defines itself. PIPELINE-1 +only defines the contract every plugin conforms to and the +universal utterance lifecycle around the iteration. + +### 3.3 Interoperability with external protocols + +The spec family does not define new transport protocols and +does not aim to replace existing ones.
Where an external +voice-assistant protocol — Wyoming, OpenAI Chat Completions, +MCP tool calls, hassil templates, MQTT-based stacks — already +exists and serves a population, the spec family is designed to +**interoperate** with it through three well-defined injection +points. An adapter that plugs an external protocol into the +right injection point is a third-party implementation concern; +the spec family makes the integration shape predictable. + +**1. Pipeline plugins (OVOS-PIPELINE-1 §3) — the dispatch-layer +adapter.** A pipeline plugin wraps an external matcher, +consumes the utterance, and returns a `Match` with the +plugin's own `pipeline_id` as `skill_id`. The external +protocol becomes a first-class participant in the dispatch +surface, indistinguishable from a skill from the bus's +perspective. This is how language-model APIs, deterministic +template matchers, and external intent classifiers attach. + +**2. Transformer chains (OVOS-TRANSFORM-1 §3) — the +artifact-pipeline adapter.** A transformer wraps an external +protocol that operates on an audio, text, or rendered-output +artifact but does not claim intents. Examples: a +bidirectional-translation service at the utterance and dialog +chains; an external STT-confidence validator at the utterance +chain; a content-policy filter at the dialog or TTS chain; an +acoustic-event detector at the audio chain. + +**3. Bus boundary (OVOS-MSG-1 §3.4) — the wire-level +adapter.** A bridge component subscribes to the bus, translates +to and from an external transport, and either operates entirely +external (Wyoming-style audio / STT / TTS services talking +over TCP to a bridge that proxies the OVOS bus) or remotes the +whole bus (HiveMind-style layer-2 substrates). The +single-flip routing of §3.1.1 and the no-central-state stance +of §3.1.2 are what make the bus-boundary adapter feasible +without modifying the assistant core. 
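Injection point 1 is the easiest to show in code. The sketch below wraps a made-up external matcher in a plugin that bundles its own handler, so the returned match carries the plugin's own `pipeline_id` as `skill_id`. The `Match` fields other than `skill_id` and `slots`, the `fake_external` function, and the `first_match` helper are assumptions for illustration and are not taken from OVOS-PIPELINE-1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Match:
    """Toy match result; `intent_name` is a hypothetical field."""
    skill_id: str
    intent_name: str
    slots: dict = field(default_factory=dict)


class ExternalMatcherPlugin:
    """Wraps an external matcher and acts as the handler owner itself."""

    def __init__(self, pipeline_id, external_match):
        self.pipeline_id = pipeline_id
        self._external_match = external_match  # callable(text, lang) -> dict | None

    def match(self, utterances, lang, session) -> Optional[Match]:
        # Side-effect-free: ask the external matcher and report the result.
        for utterance in utterances:
            result = self._external_match(utterance, lang)
            if result is not None:
                return Match(self.pipeline_id, result["intent"],
                             result.get("slots", {}))
        return None


def first_match(plugins, pipeline, utterances, lang, session):
    """Walk the pipeline in order; the first plugin to match wins."""
    for pipeline_id in pipeline:
        plugin = plugins.get(pipeline_id)
        if plugin is None:
            continue  # plugin not loaded
        match = plugin.match(utterances, lang, session)
        if match is not None:
            return match
    return None


def fake_external(text, lang):
    # Hypothetical stand-in for a remote classifier or template matcher.
    if "light" in text:
        return {"intent": "lights_on", "slots": {"room": "kitchen"}}
    return None
```

From the bus's point of view the result is addressed like any skill-owned intent; only the plugin knows that an external service produced it.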
+ +#### Per-protocol notes + +- **Wyoming** (the component protocol used by Home Assistant + Voice and its ecosystem) operates at the audio-input / STT / + intent / TTS service boundary. A Wyoming bridge sits at the + bus boundary (§3.1, injection point 3 above): translate + Wyoming's `transcript` event into an `ovos.utterance.handle` + emission and translate the assistant's `speak` Messages + into Wyoming's `synthesize` event. Pipeline plugins are + unaffected; Wyoming components plug in *under* the + utterance lifecycle, not into it. +- **OpenAI Chat Completions and compatible APIs** (the + de-facto LLM interface). A persona-style pipeline plugin + wraps an OpenAI-compatible client (§3 of PIPELINE-1, + injection point 1 above). The plugin emits `Match` with + `skill_id = ` and bundles its own handler + using the dispatch polymorphism of OVOS-PIPELINE-1 §7. The + user sees a normal response; the LLM is a first-class + intent owner. +- **MCP (Model Context Protocol) and similar agent-tool + protocols.** A pipeline plugin can expose OVOS intents to + an MCP client (the OVOS-INTENT-4 §10 introspection topics + enumerate available intents) or call out to MCP tools from + within a plugin-bundled handler. Either direction sits at + injection point 1. +- **hassil templates and the Home Assistant `intents` + corpus.** A pipeline plugin can wrap hassil as a + deterministic template matcher (injection point 1). + Separately, the OVOS-INTENT-1 / hassil grammar lineage is + close enough that a **translation tool** between + OVOS-INTENT-2 locale resources and HA's `intents` YAML is + mostly mechanical — both formats are template-and-vocabulary + YAML at the same level of abstraction. Such a tool would + let the HA `intents` corpus and the OVOS locale corpus + cross-pollinate without either project changing its + format. This is concrete planned tooling, not just an + architectural possibility (§7). +- **MQTT-based stacks** (Rhasspy 2.x, miscellaneous IoT + voice systems). 
Bridge at the bus boundary (injection + point 3), same shape as Wyoming. +- **A2A and other agent-bus protocols.** Same shape as MCP; + pipeline-plugin wrapper or bus-boundary bridge depending + on whether the protocol participates in intent dispatch + or in cross-process bus routing. + +The three injection points are not exhaustive of where +adapters *could* go — a determined integrator can hook +almost anywhere — but they are the points the spec family +deliberately designs to keep clean. Any new protocol that +needs deeper integration than the three points permit is a +signal that the protocol genuinely overlaps the assistant's +own architecture rather than complementing it, at which +point the integration is a co-architecture decision rather +than an adapter. --- -## 6. Where the specs differ from current OVOS code +## 4. Design rationale, per specification + +Short notes on *why* the specifications make the choices they +do — the reasoning, not the requirement. Cross-reference into +the normative sections. + +### 4.1 Intent grammar and resources (INTENT-1, -2, -3) + +- **ASR-normalized input, no escaping** (INTENT-1 §2). The + grammar targets voice input. By contract, text reaching an + engine is already lowercased, punctuation-stripped, + single-spaced. Bracket metacharacters therefore cannot + occur as literal input, so no escape mechanism is needed. + A simplification *bought* by scoping the grammar to voice. +- **Templates are training data** (INTENT-1 §4). Enumerating + every phrasing is futile for natural speech. A template + describes the *shape* of the training data; the engine + generalizes. This is why expansion is defined precisely + but matching is not. +- **An intent is not an event** (INTENT-3 §1). Necessary for + an open skill ecosystem — see §2.2. +- **Two non-interoperable methods** (INTENT-3 §2). Keyword + and template intents describe a command in fundamentally + different shapes. 
Rather than forcing one model, the spec + keeps both and makes engines declare which they accept. + The cost is that a developer must choose per intent and + know which engines an installation runs. +- **Slot typing is deferred** (INTENT-1 §5.3). Interpreting + a slot value as a number or date is inseparable from how + ASR output is normalized — and normalization is not yet + specified. Specifying typing first would be incoherent, so + a value is, for now, an opaque sequence of words. +- **`.blacklist` vs `excluded`** (INTENT-3 §4.2, §5.4). The + template grammar is purely generative — it cannot express + "not this". Template intents therefore need a separate + `.blacklist` artifact for suppression. Keyword intents + express the same idea natively with the `excluded` + constraint role. The asymmetry follows from the grammar, + not from inconsistency. +- **No regular expressions** (INTENT-3 §4.4). Free-form + structured text is a slot — use a template intent and the + slot extractor. Regexes are also notoriously hard to + localize, which conflicts with the per-language model. +- **Inline vocabulary references reuse `.voc`** (INTENT-1 + §3.7). A reusable template fragment and a keyword + vocabulary are the same thing — a named, slot-free phrase + set — so `` resolves to a `.voc` rather than + introducing a new file role. The change is one grammar + token plus an expander step. + +### 4.2 Bus message envelope (MSG-1) + +- **One spec, not two.** Envelope + routing + derivations + are tightly coupled — every routing key lives in + `context`, every derivation manipulates routing, and all + of them formalize *existing* OVOS code. Splitting them + was tried; the split did not survive the derivations + (which can only meaningfully be defined where the routing + keys are), so they were merged into a single bus-message + spec. The session carrier, by contrast, did split out + cleanly into OVOS-SESSION-1. 
+- **`context` is extensible by design.** Only the keys + other systems already key behaviour off (`source`, + `destination`, `session`) are given normative meaning. + Everything else — GUI routing, tracing, security — is + layered by other specs without touching the envelope. +- **`source`/`destination` are informational, not + authorization** (MSG-1 §3.3). The bus is not a security + boundary. Layer-2 systems (HiveMind) build authentication + and routing enforcement on top of the pair without OVOS + itself learning about peers. +- **The boundary is user ↔ assistant, not core ↔ handler.** + The `(source, destination)` pair marks who is currently + talking to whom across one boundary only: the external + participant on one side, the assistant — core and every + skill handler — on the other. The flip happens **once** + per conversational turn (§3.1.1), not on every internal + hop. +- **No central correlation, no central state** (MSG-1 §5.4, + §3.1.2 above). The bus is fully asynchronous. Components + that need correlation or state own it themselves, keyed + on `session.session_id`. Multi-turn conversation, intent + context, cross-skill state, and similar concerns are + deferred to other specifications. +- **Topic naming conventions** (MSG-1 v2 §2.1.2). The + conventions other specs in the family already follow are + now codified as SHOULD-rules: dot-separated hierarchy + with `:` reserved for component-pair shapes; stable + ecosystem-identifying root; verb-tense pattern for the + trailing segment; request/terminal pairs sharing a root + verb (`handle` ↔ `handled`); `.response` suffix for + response derivations; per-instance + `...` form. + +### 4.3 Session carrier (SESSION-1) + +- **Why a separate session spec.** `Message.context.session` + is a load-bearing carrier claimed by multiple specs + (PIPELINE-1, CONTEXT-1, TRANSFORM-1) — without a single + owner, its wire contract drifts. 
SESSION-1 consolidates + the wire shape and fixes a **registry mechanism** so + future specs claim fields without amending SESSION-1 + itself. +- **Prescriptive, not descriptive.** Only the fields + normatively claimed by other specs are recognized. + Implementations carrying extra per-session state + (current OVOS Session has `persona_id`, `system_unit`, + `time_format`, `date_format`, `location`, `is_speaking`, + `is_recording`, …) are non-normative under v1 — they + ride through as opaque pass-through and can be claimed + by future per-domain specs. +- **Omission means "let the orchestrator decide".** Single + deferral mechanism: omitted single field, empty + `session: {}`, absent `session`, explicit + `session_id: "default"` — all equivalent on the wire, + all resolve at consumption to deployment defaults filled + by each consumer. No `null`, no sentinels. +- **Language signals.** Six BCP-47 fields with normative + meanings but stage-dependent consolidation: `lang` (user + preference, base), `secondary_langs` (additional + understood languages, constrains lang-detect predictions + and fallback selection), `output_lang` (renderer's + preferred output language; simplifies the + bidirectional-translation transformer to a fallback role), + `stt_lang` / `request_lang` / `detected_lang` + (per-utterance signals from STT, emitter, and lang-detect + respectively). `request_lang` is an emitter-reported hint + (per-wakeword language assignment in multi-wakeword + setups), not an override. + +### 4.4 Intent registration broadcast (INTENT-4) + +- **Registrations are broadcast — already how OVOS works.** + Skills emit registration messages on the bus; plugins + that care about a particular registration kind subscribe + to the corresponding topic. There has never been a + central routing party in OVOS; INTENT-4 just gives this + existing model normative topic names. The legacy bus + topics (`padatious:register_intent`, `register_vocab`, + etc.) 
are renamed into the `ovos.intent.*` namespace — + see §5.7 for the mapping. Migration is mostly a string + replacement. +- **No "no plugin claimed" error.** Following from the + broadcast model: a registration that no plugin consumes + is silently dropped. The producer gets no signal — the + introspection topics (`ovos.intent.list` / + `ovos.intent.describe`) are the supported way to verify + what the orchestrator's passive index recorded. +- **The orchestrator passively indexes; it does not + gate.** The introspection topics serve from a passive + registration index built by listening to broadcasts + (this *is* new — current OVOS has no central index). The + index reflects what skills *declared*, not what plugins + actually match against — observability-only. +- **Skill self-identification on every emission** + (INTENT-4 §3.1). Every Message a skill emits or + modifies in place carries `Message.context["skill_id"]`. + Enforcement is structural on the dispatch path: the + orchestrator stamps `context.skill_id` from the + `:` dispatch topic prefix + (PIPELINE-1 §7.1), and skill emissions via + `forward`/`reply` inherit automatically. + +### 4.5 Pipeline and lifecycle (PIPELINE-1) + +- **The plugin model is already in place; PIPELINE-1 + refines it** (§3.2). The current orchestrator already + loads plugins by id through `OVOSPipelineFactory` and + iterates `Session.pipeline`. PIPELINE-1 tightens the + contract rather than introducing the abstraction. +- **Orchestrator and plugin contracts live in one spec**, + since the orchestrator's job *is* iterating plugins and + translating their matches into bus events. Splitting + them would leave neither coherent. +- **Plugin contract is minimal.** `match(utterances, lang, + session) → Match | None`. Side-effect-free during + `match`; everything else (state, registrations, + language-model calls, response generation) is + plugin-internal black box. The smaller the contract, the + wider the set of plugins it accommodates. 
+- **`lang` parameter is propagation-only.** The + orchestrator passes `lang` through from + `Message.data.lang`; it **MUST NOT** synthesize a value + from `session.lang` or any per-utterance signal field + when `data.lang` is absent. Absence is a faithful + "unknown" signal; consumer-side fallback policy is the + consumer's. +- **Tier conventions are out of scope.** The current + high / medium / low suffix is implementation strategy: + from the bus, each tier is already a distinct + `pipeline_id` in `Session.pipeline`. The current + convention is compatible with PIPELINE-1 unchanged. +- **Skills and plugins are equivalent handler owners.** + The dispatch topic `:` is uniform: + for a pure-matcher plugin the `skill_id` is the matched + skill's id; for a plugin that bundles its own handler + (e.g. a language-model persona) `skill_id == pipeline_id`. + Both are addressed the same way. +- **Universal `ovos.utterance.handled` end-marker on every + terminal path.** One reserved invariant lets observers + count turns, route fallbacks, and know "the assistant + is idle now" without per-stage knowledge. +- **Three-stage composition** (PIPELINE-1 §5.5) — + preference (from `session.pipeline` or default-session + pipeline) → availability (drop unloaded plugins) → + policy (drop denylisted). Mirrors TRANSFORM-1 §5.3 + exactly. The same shape supports the + client-requests/layer-2-enforces split (§3.1). + +### 4.6 Intent context (CONTEXT-1) + +- **Lifts intent context out of Adapt.** The Adapt-specific + `add_context` / `remove_context` mechanism, and the + legacy `mycroft.skill.set_cross_context` / + `remove_cross_context` fan-out for cross-skill use, are + Adapt-only at the matcher level — Padatious and other + engines ignore them. CONTEXT-1 generalizes the mechanism + into a session-bound, decaying flat key/value store + consumed by every intent engine uniformly via + `requires_context` and `excludes_context` declarations. 
+- **Two explicit scopes encoded in the key shape.** + `private` (orchestrator auto-prefixes with + `:`) and `shared` (flat, cross-skill). The + current OVOS code models the same distinction informally + (`MycroftSkill.set_context` auto-prefixes with + `alphanumeric_skill_id`; `set_cross_skill_context` fans + out via a bus event); CONTEXT-1 names the scopes + explicitly and routes both through one bus surface. +- **Why private is the default.** A skill that calls + `ovos.context.set` without specifying `scope` gets a + private entry. This optimises for the safer case: a + cross-skill leak from an accidentally-shared entry is + harder to debug than a cross-skill miss from an + accidentally-private entry. The current Adapt + `set_context` pattern is effectively skill-private; the + default preserves migration fidelity. Cross-skill + coordination is a conscious decision that deserves an + explicit `scope: "shared"`. +- **Prior art for the negative gate.** Three in-tree + intent engines under `/plugins-pipeline/` — + [jurebes](https://github.com/OpenJarbas/jurebes), + [nebulento](https://github.com/OpenJarbas/nebulento), + and [palavreado](https://github.com/OpenJarbas/palavreado) + — independently implement `exclude_context` as a + first-class negative gate. CONTEXT-1's `excludes_context` + adopts the same primitive at the spec level, addressing + patterns ("fire once", "modal suppression") that + positive gating alone cannot express. +- **Engine-side mutation as a sanctioned non-bus + pathway.** The Adapt pipeline plugin auto-injects matched + entities into context *inside* `match()`, which conflicts + with PIPELINE-1 §4.2's side-effect-free `match` rule. + CONTEXT-1 §5.3 carves an explicit window between + match-accept and dispatch-emit for engine-side session + mutation, with the orchestrator (not the bus) carrying + the write. This both legitimizes the established + practice and resolves the PIPELINE-1 contradiction. 
+- **Eight-level lifecycle-position owner precedence** + (CONTEXT-1 §5.2). When a Message carries multiple + component-identity keys (skill_id, pipeline_id, the six + `_transformer_ids`) from a derivation chain that + crossed component boundaries, the orchestrator picks the + owner by lifecycle position: the latest stage to run is + the most specific. + +### 4.7 Transformer plugins (TRANSFORM-1) + +- **Spec'd as an architectural pattern, not a feature + list.** An orchestrator MAY implement chains at any + subset of six injection points (audio, utterance, + metadata, intent, dialog, TTS); a null-implementation is + conformant. For each chain it does implement, the + per-type contract binds. Each injection point's + existence is justified by what the lifecycle holds at + that exact moment — what's possible there that isn't + possible elsewhere. +- **Intent transformers as the system-typing home.** + INTENT-1 §5.3 defers slot value typing pending a text + normalization specification. TRANSFORM-1 §3.4 is the + spec'd injection home for typing: a deployer ships + date / number / duration parsing once, and every skill + receives typed values in `Match.slots` regardless of + which engine matched. The OVOS analogue of ASK's + `AMAZON.DATE` and Dialogflow's `@sys.date-time`, but as + an injected enrichment rather than a built-in engine + feature. 
+- **Concrete in-tree plugins as prior art.** Nine plugins + live under `/plugins-transformer/` today, covering five + of the six injection points: utterance transformers + (`ovos-utterance-normalizer`, + `ovos-utterance-corrections-plugin`, + `ovos-transcription-validator-plugin`, + `ovos-utterance-plugin-cancel`, + `ovos-bidirectional-translation-plugin`); dialog + transformers (`ovos-dialog-normalizer-plugin`, + `ovos-bidirectional-translation-plugin`, + `ovos-dialog-transformer-openai-plugin`); audio + transformers + (`ovos-audio-transformer-plugin-speechbrain-langdetect`, + `ovos-audio-transformer-plugin-ggwave`, + `ovos-audio-transformer-redis-publish`); intent + transformers (`ovos-keyword-template-matcher`, + `ovos-ahocorasick-ner-plugin`). The + `bidirectional-translation` plugin exercises the + cross-chain coordination via `Message.context` that + TRANSFORM-1 §7 formalizes. +- **Ascending priority.** TRANSFORM-1 §4 specifies + ascending priority (lower = earlier, default 50). + Current OVOS sorts transformer chains **descending** + (`ovos_core/transformers.py:53,117,205`, `reverse=True`); + the spec aligns with the **ascending** convention + already used by fallback skills (`fallback_service.py:49`, + default 101 = run last) and the natural "stages count + up" reading. Bringing current plugins into conformance + only requires flipping relative priorities, not + rewriting. +- **Cancellation aligned with prior plugin convention.** + Two existing utterance transformers + (`ovos-utterance-plugin-cancel`, + `ovos-transcription-validator-plugin`) already signal + the lifecycle should abort by returning empty utterance + lists with `{canceled: true, cancel_word: }` + context keys. TRANSFORM-1 §8 keeps the convention, + renaming `cancel_word` to `cancel_reason` (the structured + concept the field encodes) and adding orchestrator-stamped + `cancel_by: `. 
The spec's + `ovos.utterance.cancelled` terminal event sits alongside + `ovos.intent.unmatched`, keeping cancellation and + failure observably distinct on the bus. +- **`lang` parameter is bidirectional** (TRANSFORM-1 §3.0). + Four of the six per-type contracts (audio, utterance, + dialog, TTS) take `lang` as input and return it as + output. A bidirectional-translation transformer that + takes Spanish in and produces English out returns the + destination language; the orchestrator writes the + chain's final `lang` back into `Message.data.lang` for + downstream stages. Language-detector and clearing cases + fall out of the same channel. +- **Per-type self-identification keys, list-valued.** + TRANSFORM-1 §1.3 claims six `Message.context` keys + (one per transformer type) rather than a single generic + key. Role matters: a Message may have been touched by + multiple types in sequence, and a multi-type plugin + (e.g., both utterance and dialog) would be ambiguous + in a single-key model. Keys are lists because + transformers chain, so the full per-type chain is + preserved in order. +- **Per-type denylists complete the policy surface.** + TRANSFORM-1 §5.2 claims six + `blacklisted_<type>_transformers` session fields, + paralleling the six `<type>_transformers` chain-ordering + fields of §5.1 and the + `pipeline` / `blacklisted_pipelines` pair of PIPELINE-1 + §5. Three-stage composition (preference → availability + → policy) in §5.3 mirrors PIPELINE-1 §5.5 exactly. +- **The per-type "explosion" is deliberate.** The surface is twelve flat + session fields (six chain-orderings + six denylists) plus + six `Message.context` attribution keys. A prefix-encoded + single namespace would require prefix parsing at every + lookup; the per-type partition matches the existing + registry and chain-ordering structure. Under + SESSION-1 §3.4's SHOULD-omit rule the common case carries + zero of these on the wire.
+- **Language signals live in SESSION-1.** Language signals + (`stt_lang`, `request_lang`, `detected_lang`, alongside + `lang`, `secondary_langs`, `output_lang`) are + session-scoped fields with normative meanings but a + non-binding consolidation order — the right priority is + stage-dependent. TRANSFORM-1 §7.1 names which + transformer types are natural producers of which + signals; consolidation is the consumer's decision per + SESSION-1 §3.2.7. -These specifications are *prescriptive*. Some of what they prescribe -matches what runs in OVOS today verbatim; some is a deliberate -cleanup the implementations are expected to grow into. This section -catalogues every known divergence so implementers know what to -migrate and reviewers know what to expect. (OVOS-MSG-1 is by far the -spec closest to current code; the catalogue below is correspondingly -short. Later specs will add more entries.) +--- + +## 5. Where the specs differ from current OVOS code -### 6.1 Already aligned +These specifications are *prescriptive*. Some of what they +prescribe matches what runs in OVOS today verbatim; some is a +deliberate cleanup the implementations are expected to grow +into. This section catalogues every known divergence so +implementers know what to migrate and reviewers know what to +expect. -The following are formalizations of behaviour that already exists in -current OVOS code paths and need no implementation change: +### 5.1 Already aligned + +Formalizations of behaviour that exists in current OVOS code +and needs no implementation change: - The Message envelope (`type` / `data` / `context`) — matches `ovos-bus-client.Message`. -- `source`, `destination` semantics, including the +- `source`, `destination` semantics including the `Message.reply` swap — matches `ovos-bus-client/message.py`. - `context.session` as a serialized Session object — matches - `ovos-bus-client/client/client.py`'s `message.context["session"] = - sess.serialize()`. 
-- `session.session_id == "default"` for device-local origin — matches - `ovos-audio/utils.py`'s `require_default_session` decorator. -- `session.lang` as the user's preferred language — matches the - Session class's `lang` attribute and existing OVOS read paths. -- `forward` / `reply` / `response` derivation semantics — matches - `ovos-bus-client.Message.{forward,reply,response}`. -- The `.response` suffix convention — pervasive across OVOS topics - today. - -### 6.2 New, no legacy - -The only thing OVOS-MSG-1 introduces that has no direct precedent in -current code: - -- The **materialize-default-session** rule on `forward` / `reply` / - `response` (MSG-1 §4.3) — formalizes a "MAY" convenience for - in-process subsystems; not currently implemented, but compatible - with current behaviour (today `session` is propagated only when - present, never materialized). - -### 6.3 Things the spec does *not* change - -- The session object's internal shape beyond `session_id` and `lang` - — every other field current OVOS puts inside `context.session` - remains opaque under this spec until the future session - specification. -- The Mycroft-era `mycroft.*` topic prefix outside the intent layer - (e.g. `mycroft.audio.*`) — these are not part of any spec here and - are out of scope. - ---- - -## 7. Known gaps and planned work - -- **A bus-level intent registration and dispatch spec.** OVOS-MSG-1 - defines the envelope and the routing/session keys, but the - *concrete topics* for intent registration, match notification, - handler dispatch, and the handler-lifecycle messages - (`mycroft.skill.handler.{start,complete,error}` etc.) are still - informal. The natural next bus spec is OVOS-INTENT-4, which builds - on OVOS-MSG-1 + OVOS-INTENT-3. -- **A pipeline specification.** Stage ordering, the confidence-tier - model, and the contracts for `converse`, `fallback`, - `common_query`, `ocp`, and `persona` stages are unspecified (§3). 
-- **A session specification.** MSG-1 §4 carries `session` opaquely - and names only `session_id` and `lang`. Everything else about the - session is deferred — see §5.2 for the explicit list: session - lifecycle (start, end, expiry, resumption), the full set of - session preferences current OVOS already carries (`pipeline`, - `site_id`, `persona_id`, `time_format`, `date_format`, - `system_unit`, `tts_preferences`, …), and the shape of any - conversational state. The future session specification will pick - these up; MSG-1's job is to make sure the carrier is in place. -- **A multi-turn conversation specification.** When a skill asks a - question and waits for the next utterance, the "next utterance - belongs to that pending question" link is not formalized today - (handled informally by `converse` + skill-side state). MSG-1's - async-by-default stance (§5.2) leaves room for this to be - formalized either in the session spec or as a separate one. -- **Intent context.** Adapt's `add_context` / `remove_context` - feature — where one intent's match influences a later intent's - eligibility — is not formalized at the spec level. See §5.2. -- **Text normalization of ASR output.** The basis for slot value - typing (OVOS-INTENT-1 §5.3). Deferred to its own specification. -- **A machine-checkable conformance corpus** of `template → sample - set` pairs for OVOS-INTENT-1 expansion, so expander conformance - can be verified automatically. A parallel corpus of bus-message - fixtures for MSG-1 would be the equivalent at the bus layer. -- **An end-to-end worked example.** The specs have local examples; - none shows a single skill defining one keyword intent and one - template intent through the whole path — files, registration, - match, handler. -- **i18n corpus.** OVOS-INTENT-2 defines the locale file format, and - ovos-localize (§8) provides the operations layer; what remains is - the *scale* of the translated corpus. 
+ `ovos-bus-client/client/client.py`'s + `message.context["session"] = sess.serialize()`. +- `session.session_id == "default"` for device-local origin — + matches `ovos-audio/utils.py`'s `require_default_session` + decorator. +- `session.lang` as the user's preferred language — matches + the Session class's `lang` attribute. +- `forward` / `reply` / `response` derivation semantics — + matches `ovos-bus-client.Message.{forward,reply,response}`. +- The `.response` suffix convention — pervasive across OVOS + topics today. +- `ovos.utterance.cancelled` and `ovos.utterance.handled` + (PIPELINE-1) — match current topic names verbatim. +- Per-utterance first-match-wins iteration (PIPELINE-1) — + matches `ovos-core/intent_services/service.py`'s + `handle_utterance` / `get_pipeline`. +- Per-session pipeline configuration (PIPELINE-1) — matches + `Session.pipeline`. +- The `:` dispatch topic shape + (PIPELINE-1) — matches current OVOS practice; skills + already subscribe to these topics. + +### 5.2 Prescriptive renames + +| Spec | Current | Prescribed | Notes | +|------|---------|------------|-------| +| INTENT-3 v1.1 | "host" | "orchestrator" | Editorial; conformance unchanged. | +| PIPELINE-1 | `mycroft.skill.handler.start` / `.complete` / `.error` | `ovos.intent.handler.start` / `.complete` / `.error` | Renamed into the `ovos.intent.*` namespace for uniformity. Breaks every existing handler-lifecycle observer; the migration cost is real. | +| PIPELINE-1 | `recognizer_loop:utterance` | `ovos.utterance.handle` | See §5.4 entry. Migration touches `ovos-dinkum-listener`, `ovos-simple-listener`, `ovos-audio`, and `ovos-core/intent_services/service.py`. | +| PIPELINE-1 | `complete_intent_failure` | `ovos.intent.unmatched` | Follows `ovos.intent.*` namespace; pairs with `ovos.intent.matched`. 
| + +### 5.2.1 Topics to remove from ovos-core + +The following topics exist in current ovos-core but are **not +defined by any spec** and should be removed or replaced: + +- **`ovos.session.sync` / `ovos.session.update_default`**: + emitted by `SessionManager` to broadcast the current default + session to interested components. SESSION-2 §6.4 acknowledges + that an orchestrator MAY emit default-session state on a + deployer-defined topic but assigns no normative name. These + ad-hoc topics should be retired: any component that needs the + default-session state can subscribe to `ovos.utterance.handled` + (PIPELINE-1 §9.5) and read the session it carries, or listen + to any other assistant-emitted Message on the default session. + A named sync topic adds an implicit state-broadcast contract + that the specs deliberately avoid; clients are expected to + track session from Message flow, not from dedicated sync + broadcasts. + +### 5.3 Prescriptive shape changes + +- **Keyword intent registration is atomic** (INTENT-4 §5). + Today a keyword intent is built up via multiple + `register_vocab` messages followed by a `register_intent` + with an Adapt `IntentBuilder.__dict__` payload. INTENT-4 + collapses this into a single message with structured + `{required, optional, one_of, excluded}` arrays of + vocabulary descriptors. Every skill's keyword-intent path + needs to be rewritten in the workshop layer. +- **Template intent registration uses structured identity** + (INTENT-4 §6). Today `padatious:register_intent` carries + `{name, samples, file_name, lang, blacklisted_words}`; the + prescribed shape uses the structured `(skill_id, + intent_name, lang)` triple plus `samples|file` and + `blacklist|blacklist_file`. +- **Dispatch payload is minimal** (PIPELINE-1 §7.1). Today + dispatch carries `skill_id` and `intent_name` in the + payload. PIPELINE-1 drops both from the payload because they + are already in the topic (`<skill_id>:<intent_name>`); + a consumer that needs them splits the topic.
The + prescribed payload is `{lang, utterance, slots}`. + For plugin-bundled handlers (`pipeline_id == skill_id`), + the same uniform dispatch applies. +- **Handler-lifecycle payload updated** (PIPELINE-1 §8.2). + Today the trio payload is `{name: }`. + Prescribed: `{skill_id, intent_name, optional exception}`. + +### 5.4 Architectural divergences + +- **The orchestrator maintains a passive registration index** + (INTENT-4 §10). Today there is no central index — each + plugin knows what it consumed; nothing aggregates that + view. INTENT-4 prescribes the orchestrator subscribe to + all registration topics in parallel with plugins and serve + `ovos.intent.list` / `ovos.intent.describe` from the + passive view. This is a new orchestrator responsibility, + not a change to existing behaviour. +- **The match contract is the single obligation** (PIPELINE-1 + §4.2). The plugin's `match` operation has one MUST: return + a `Match` or `null`. Bus emissions during `match` are + allowed — converse plugins, LLM-backed matchers, and + agent-backed shapes are all conformant. Session mutation + during `match` goes via `Match.updated_session` so + declined matches' mutations never escape. +- **`Match.updated_session` as the match-phase session channel** + (PIPELINE-1 §4.1, §4.2). Promotes the existing ovos-core + code pattern + `sess = match.updated_session or SessionManager.get(message)` + to a normative Match field. The plugin that produces a + claiming match composes any session mutations it needs + (decrementing a response-mode counter, pre-promoting an + active-handler to the head, setting intent_context + alongside the match) into a fresh snapshot returned in + `Match.updated_session`. The orchestrator uses that + snapshot for the dispatch and every downstream stage; a + declined-match (plugin returns `null`) drops the snapshot + at the plugin boundary. This is what makes match-phase + mutation safe under §6.2 first-match-wins iteration. 
+- **`ovos.utterance.handled` on every terminal path** + (PIPELINE-1 §9.5). Current `ovos-workshop`'s + `_on_event_error` does not emit it on the handler-error + path (`ovos.py:1478-1497`). PIPELINE-1 §8 places trio + emission on the orchestrator-wrapper around the handler, + not on the handler itself — workshop is the wrapper in + current OVOS, and the spec contract requires the wrapper + to emit `ovos.utterance.handled` unconditionally. +- **Handler-trio is orchestrator-owned** (PIPELINE-1 §8). + The orchestrator that invokes the handler wraps the call + and emits `ovos.intent.handler.start` / `.complete` / + `.error` around it. Third-party handler code carries **no + normative obligation** to participate in trio emission. + Skill authors are not protocol authors; the wrapper + observes start / return / exception around an opaque + callable. +- **Per-pipeline_id intent introspection** (PIPELINE-1 §10). + Pull-query / scatter-response surface keyed on + `pipeline_id`, giving consumers visibility into *which + intents a particular pipeline plugin's matcher has + compiled*, distinct from the orchestrator's manifest of + declared intents (INTENT-4 §10). No current OVOS analogue. +- **CONTEXT-1 scope and ownership encoded in the key shape** + (CONTEXT-1 §2, §3). A bare key `Person` is shared; a + prefixed key `music.skill:Person` is private to + `music.skill`. The `:` is load-bearing — mirroring the + `:` dispatch topic. Drops separate + `scope` and `origin` fields on stored entries (both were + redundant with the key shape). `requires_context` and + `excludes_context` declarations take an OPTIONAL + `scope: private|shared` discriminator (default `private`) + to express which lookup the gate uses; bare-string + declarations default to private to prevent shared-leak. +- **Skill self-identification on every emission** (INTENT-4 + §3.1). Current OVOS skills set `context.skill_id` on some + emissions but not uniformly. 
Enforcement is structural on + the dispatch path: the orchestrator stamps + `context.skill_id` from the `<skill_id>:<intent_name>` + dispatch topic prefix, and skill emissions via + `forward`/`reply` inherit automatically. Loader-side + interception covers off-dispatch emissions. +- **Entry-point topic renamed `ovos.utterance.handle`** + (PIPELINE-1 §9.1). `recognizer_loop:utterance` fails + MSG-1 §2.1.2 naming conventions: `:` as a segment + separator, an implementation-role prefix, and no pairing + with the terminal `ovos.utterance.handled`. Migration cost + is real: every audio-input service and intent-service + handler is affected. A transitional deployment MAY + subscribe to both names during migration. + +### 5.5 New topics with no direct precedent + +- **`ovos.intent.matched`** (PIPELINE-1 §9.2). The + positive-match broadcast notification. No current equivalent. +- **`ovos.intent.unmatched`** (PIPELINE-1 §9.4). Renamed from + `complete_intent_failure`; follows the `ovos.intent.*` + namespace for symmetry with `ovos.intent.matched`. +- **`ovos.utterance.speak`** (PIPELINE-1 §9.6). The NL output + exit point; symmetric to `ovos.utterance.handle`. No current + equivalent; the TTS trigger is currently implicit. +- **`ovos.intent.list` / `ovos.intent.describe`** (INTENT-4 + §10). Introspection topics served from the orchestrator's + passive registration index. +- **`ovos.context.set` / `.unset` / `.clear` / `.list`** + (CONTEXT-1 §5). Skill-facing API replacing Adapt-specific + `add_context` / `remove_context` plus + `mycroft.skill.set_cross_context`. +- **`ovos.transformer.{type}.list`** (TRANSFORM-1 §6). + Per-type introspection of loaded transformers. +- **Materialize-default-session rule** on `forward` / + `reply` / `response` (MSG-1 §4.3). Formalizes a "MAY" + convenience for in-process subsystems; not currently + implemented but compatible with current behaviour.
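
The renames in §5.2 and the dual-subscription allowance in §5.4 lend themselves to a small lookup table. The sketch below is illustrative only: the topic names are copied from the tables in this section, while the helper names (`prescribed_topic`, `topics_to_subscribe`, `split_dispatch_topic`) are hypothetical and not part of any OVOS package. The dispatch-topic split also assumes that a `skill_id` never contains a colon, which the text here does not state.

```python
# Legacy -> prescribed topic names, copied from the rename table in §5.2.
LEGACY_TO_PRESCRIBED = {
    "recognizer_loop:utterance": "ovos.utterance.handle",
    "complete_intent_failure": "ovos.intent.unmatched",
    "mycroft.skill.handler.start": "ovos.intent.handler.start",
    "mycroft.skill.handler.complete": "ovos.intent.handler.complete",
    "mycroft.skill.handler.error": "ovos.intent.handler.error",
}


def prescribed_topic(topic: str) -> str:
    """Return the prescribed name for a legacy topic, or the topic unchanged."""
    return LEGACY_TO_PRESCRIBED.get(topic, topic)


def topics_to_subscribe(prescribed: str) -> list[str]:
    """Names a transitional deployment might listen on for one prescribed topic."""
    legacy = [old for old, new in LEGACY_TO_PRESCRIBED.items() if new == prescribed]
    return [prescribed, *legacy]


def split_dispatch_topic(topic: str) -> tuple[str, str]:
    """Split a dispatch topic into (skill_id, intent_name).

    Assumption (hypothetical): skill_id never contains ':', so the first
    colon is treated as the separator.
    """
    skill_id, sep, intent_name = topic.partition(":")
    if not sep or not skill_id or not intent_name:
        raise ValueError(f"not a dispatch topic: {topic!r}")
    return skill_id, intent_name
```

A consumer that needs the identity fields dropped from the dispatch payload could call `split_dispatch_topic("music.skill:play.intent")` to recover them. Note that `recognizer_loop:utterance` would split just as cleanly, which hints at why a colon used as a segment separator sits awkwardly next to the dispatch-topic shape.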
+ +### 5.6 Things the specs do *not* change + +- The session object's internal shape is owned by + OVOS-SESSION-1; the field set is the closed set defined + there plus whatever future specs claim via SESSION-1 §2.1. + The "extra" fields current OVOS Session carries + (`persona_id`, `system_unit`, `time_format`, `date_format`, + …) ride through as non-normative pass-through and may be + claimed by future per-domain specs. +- The `mycroft.*` topic prefix outside the intent layer (e.g. + `mycroft.audio.*`) — these are not part of any spec here. +- The `:` dispatch topic — kept + verbatim from current OVOS so no skill needs to migrate + its handler subscription. +- **Engine-specific introspection topics.** The standard + plugins expose their own debug / inspection topics — for + example `intent.service.adapt.reply`, + `intent.service.adapt.manifest`, + `intent.service.adapt.vocab.manifest`, and + `intent.service.padatious.get`. These are plugin-specific + surface, parallel to the spec's generic + `ovos.intent.list` / `ovos.intent.describe` (INTENT-4 + §10). The specs do not claim authority over them — they + remain plugin-defined and may continue to coexist with + the orchestrator's generic index. + +### 5.7 Predecessor-topic mapping + +The bus topics formalized by INTENT-4 and PIPELINE-1 replace +a number of legacy names. Implementer migration aid: + +#### Registration topics (INTENT-4) + +| Legacy topic | v1 replacement | Notes | +|--------------|---------------|-------| +| `register_vocab` | folded into `ovos.intent.register.keyword` | Vocabularies in v1 are inline `samples` or `file`-by-path inside the registration. | +| `register_intent` (Adapt parser) | `ovos.intent.register.keyword` | Adapt's `IntentBuilder.__dict__` payload replaced by the structured shape. | +| `padatious:register_intent` | `ovos.intent.register.template` | Same content, structured payload. | +| `padatious:register_entity` | `ovos.entity.register` | Entities are not Padatious-specific. 
| +| `detach_intent` | `ovos.intent.deregister` | Identity now expressed as the structured triple, not the munged `skill_id:intent_name` string. | +| `detach_skill` | `ovos.skill.deregister` | | +| `mycroft.skill.enable_intent` / `mycroft.skill.disable_intent` | `ovos.intent.enable` / `ovos.intent.disable` | First-class topics under v1, with the prefix dropped. | + +#### Utterance-lifecycle topics (PIPELINE-1) + +| Legacy topic | Status | +|--------------|--------| +| `recognizer_loop:utterance` | renamed to `ovos.utterance.handle` (see §5.4) | +| `complete_intent_failure` | renamed to `ovos.intent.unmatched` — follows `ovos.intent.*` namespace. | +| `ovos.utterance.cancelled` | **unchanged** — kept as the cancellation signal. | +| `ovos.utterance.handled` | **unchanged** — kept as the universal end-marker. | +| `:` | **unchanged** — dispatch topic; a plugin-bundled handler has `skill_id == pipeline_id`. | +| `mycroft.skill.handler.start` / `.complete` / `.error` | renamed to `ovos.intent.handler.start` / `.complete` / `.error` | + +#### Out of scope + +| Legacy topic | Status | +|--------------|--------| +| `add_context` / `remove_context` | Replaced by `ovos.context.set` / `.unset` under CONTEXT-1. | +| `mycroft.skill.set_cross_context` / `remove_cross_context` | Replaced by `ovos.context.set` / `.unset` with `scope: "shared"` under CONTEXT-1. | +| `.activate` | Activity-tracking emit currently in `ovos-core`; not part of any spec here. | --- -## 8. Ecosystem tooling: ovos-localize - -The specifications define formats and contracts; turning those into a working -i18n operation takes tooling. **ovos-localize** is that layer — a GitHub-native -localization platform for OVOS skills, built specifically around the resource -roles of OVOS-INTENT-2. 
- -It scans skill repositories for locale files; analyzes each skill's Python -source (via AST) to recover the **handler context** of a resource — which -function uses a file, what its slots mean, what dialog it triggers, which is -exactly the intent↔handler binding of OVOS-INTENT-3 §1; validates translations -against a rule set (slot preservation, expansion validity, variant counts); and -lets translators browse, edit, preview, and submit translations as pull -requests. It also exports a unified intent/dialog/vocabulary dataset. - -ovos-localize is the OVOS counterpart to Home Assistant's managed -`intents` repository. Two honest notes: it is currently -**descriptive** of real OVOS skills — it also handles legacy file -types these specs deliberately drop — so as the specs and the -ecosystem converge, its file-type coverage and the specs will need to -meet in the middle; and its translation validators are a natural home -for spec conformance checks, distinct from but related to the planned -grammar-level conformance corpus (§7). +## 6. Implementer reference + +Material an implementer reaches for repeatedly: cross-spec +tables that don't fit cleanly in any single normative spec. + +### 6.1 Topic-name conventions across the family + +The naming conventions of OVOS-MSG-1 v2 §2.1.2 — dot-separated +hierarchy, stable root, verb-tense pattern for the trailing +segment, request/terminal pairs sharing a root verb, +`.response` suffix, per-instance +`...` form — apply across the family. +The four-way collision of the word "intent" in introspection +topics deserves an explicit callout: + +- `ovos.intent.list` (INTENT-4 §10) — list of registered + *intents* (skills declare them; `data` entries name + `intent_name`). +- `ovos.pipeline..intents.list` (PIPELINE-1 + §10) — list of *intents currently compiled by one plugin's + matcher* (`data` entries name `intent_name`). 
+- `ovos.transformer.intent.list` (TRANSFORM-1 §6): list of + *intent-transformer plugins* loaded at the intent-transformer + injection point (`data` entries name `transformer_id`). + Despite the topic shape, this is **not** an intent-listing + surface; it follows the per-chain pattern + `ovos.transformer.<type>.list` where `<type>` happens to + be `intent` for this chain (alongside `audio`, `utterance`, + `metadata`, `dialog`, `tts`). + +The collision is at the human-reading level only; payload +shapes are distinct and a consumer subscribing to one cannot +accidentally parse responses from another. + +### 6.2 Session-field cheat-sheet + +Every spec in the family that claims a `session` field does +so via the OVOS-SESSION-1 §2.1 registry mechanism. The full +set spans four specs; this table consolidates them. All +fields follow the canonical SHOULD-omit / +`[]`-equivalent-to-omission wire-weight rule of +OVOS-SESSION-1 §3.4. + +| Field | Owner | Role | Empty-array semantics | +|-------|-------|------|------------------------| +| `session_id` | SESSION-1 §3.1 | identity / channel | n/a (string; `"default"` reserved) | +| `lang` | SESSION-1 §3.2.1 | preference (user) | n/a (string) | +| `secondary_langs` | SESSION-1 §3.2.2 | preference (user) | ≡ absent | +| `output_lang` | SESSION-1 §3.2.3 | preference (renderer) | n/a (string) | +| `stt_lang` | SESSION-1 §3.2.4 | signal (per-utterance) | n/a (string) | +| `request_lang` | SESSION-1 §3.2.5 | signal (emitter hint) | n/a (string) | +| `detected_lang` | SESSION-1 §3.2.6 | signal (lang-detect) | n/a (string) | +| `site_id` | SESSION-1 §3.3 | opaque group identifier | n/a (string) | +| `pipeline` | PIPELINE-1 §5.1 | preference (ordering) | ≡ absent | +| `blacklisted_pipelines` | PIPELINE-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_skills` | PIPELINE-1 §5.3 | policy (denylist) | ≡ absent | +| `blacklisted_intents` | PIPELINE-1 §5.4 | policy (denylist) | ≡ absent | +| `audio_transformers` | TRANSFORM-1 §5.1 | preference
(chain) | ≡ absent | +| `utterance_transformers` | TRANSFORM-1 §5.1 | preference (chain) | ≡ absent | +| `metadata_transformers` | TRANSFORM-1 §5.1 | preference (chain) | ≡ absent | +| `intent_transformers` | TRANSFORM-1 §5.1 | preference (chain) | ≡ absent | +| `dialog_transformers` | TRANSFORM-1 §5.1 | preference (chain) | ≡ absent | +| `tts_transformers` | TRANSFORM-1 §5.1 | preference (chain) | ≡ absent | +| `blacklisted_audio_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_utterance_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_metadata_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_intent_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_dialog_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `blacklisted_tts_transformers` | TRANSFORM-1 §5.2 | policy (denylist) | ≡ absent | +| `intent_context` | CONTEXT-1 §2 | per-session state | object; absent ≡ empty | + +**Role glossary:** + +- *Preference* — populated by the session origin to request + specific behaviour. Orchestrator narrows the request by + availability and policy. +- *Policy* — populated by deployment / layer-2 substrate to + enforce constraints. Overrides preference at the + composition stage (PIPELINE-1 §5.5, TRANSFORM-1 §5.3). +- *Signal* — recorded by a producer or earlier lifecycle + stage to communicate information about this specific + utterance. +- *Identity / channel* — names the session itself; not a + preference or policy knob. + +### 6.3 Component-identity stamp-rule cheat-sheet + +Each component type self-identifies via a reserved context +key. The keys coexist freely on a single Message when the +derivation chain crosses component boundaries; attribution +consumers apply the eight-level lifecycle-position precedence +of CONTEXT-1 §5.2 to pick a single owner when needed. 
+ +| Context key | Owner | Stamps on (origination + modify-in-place) | `.reply` / `.response` | `.forward` | +|-------------|-------|------|----------|--------| +| `skill_id` | INTENT-4 §3.1 | yes | yes (authorial: overwrite) | no (preserve inherited) | +| `pipeline_id` | PIPELINE-1 §3.1 | yes | yes (authorial: overwrite) | no (preserve inherited) | +| six `<type>_transformer_ids` (list-valued) | TRANSFORM-1 §1.3 | yes (append) | yes (append) | no (list rides through) | + +The `<type>_transformer_ids` list-valued form preserves the +full per-type chain provenance on the wire (every transformer +of that type that touched the Message, in order of touch). +Single-string `skill_id` / `pipeline_id` reflect that those +component types *originate* Messages rather than chain over +them. + +### 6.4 Introspection patterns + +Four specs in this set define pull-query / scatter-response +introspection surfaces. The shapes are intentionally similar +but serve different scopes: + +| Spec | Topic | Scope | Authoritative responder | +|------|-------|-------|-------------------------| +| INTENT-4 §10 | `ovos.intent.list` / `.describe` | Declared intents observed on the bus | Orchestrator (the manifest) | +| PIPELINE-1 §10 | `ovos.pipeline.<pipeline_id>.intents.list` | Intents currently compiled inside a specific plugin's matcher | The pipeline plugin | +| CONTEXT-1 §5.4 | `ovos.context.list` | Post-decay session-context snapshot | The orchestrator process owning the match round | +| TRANSFORM-1 §6 | `ovos.transformer.<type>.list` | Loaded transformers per injection point | The orchestrator process implementing that chain | + +Three properties hold across all four: + +1. **Pull-query is the source of truth.** Producers MAY + broadcast load-time announcements; consumers MUST NOT + rely on having received them. The bus is asynchronous + and gives no delivery guarantee; a consumer that started + late missed the broadcast. +2.
**No completeness signal.** A consumer that wants + completeness keeps its own roster of expected responders + and times out non-responders. +3. **Per-process slices under split orchestrators.** When + the orchestrator is split (PIPELINE-1 §2), each process + responds from its own slice; consumers aggregate. + +All four surfaces share the `ovos..` prefix; verb +segments vary by domain (some nest, some don't). The +uniformity is in the namespace, not in a fixed depth. --- -## 9. Design history - -How the specification set was arrived at — context that explains -the *why*, but that has no place in a normative document. - -### 9.1 The set, in two stacks - -Built bottom-up in two stacks: - -- The **intent stack**, in dependency order: OVOS-INTENT-1 (template - grammar) → OVOS-INTENT-2 (resource files built on it) → - OVOS-INTENT-3 (the intent concept, built on both). -- The **bus stack**, anchored on existing `ovos-bus-client` wire - format: OVOS-MSG-1 formalizes the envelope, routing, session - carrier, and `forward`/`reply`/`response` derivations. - Originally drafted as two specs (envelope + session/routing) and - merged once it became clear the derivations could only - meaningfully be defined where the routing keys lived. - -Each was a formalization pass over machinery already running in -production (§1), not a greenfield design. - -### 9.2 The reference implementation - -The specs are implementation-agnostic, but a spec benefits from -one conformant implementation. **ovos-spec-tools** is that for the -intent stack — expander, resource loader, dialog renderer, language -matching, locale linter, in one dependency-light package. It -exists because the same machinery had drifted across six separate -copies in the ecosystem; ovos-spec-tools is what those components -are meant to converge on, and the intended home of the planned -conformance corpus. 
- -The bus stack does not yet have a comparable reference; -`ovos-bus-client` is the closest match for MSG-1 but predates the -spec. - -### 9.3 Audit-driven refinement - -Before initial release, each spec was revised across several review -rounds — malformed-form rules, the expansion algorithm, slot -handling, the envelope/routing split (later un-split, see §9.1), -cross-spec terminology. Those rounds happened pre-release, so they -left no intermediate version numbers behind: the audited result -*is* version 1. The CHANGELOG records versioned changes from there -on. - ---- - -## 10. Compatibility levels +## 7. Known gaps and planned work -Each specification carries its own integer `Version`, bumped per -PR per the contributing rules in the README. The architecture as a -whole is also spoken of at **compatibility levels** — versioned -snapshots a tool may target, and that `ovos-spec-lint` checks -against. - -The levels defined to date apply to the **intent stack** -(OVOS-INTENT-1/2/3): - -- **V0** — *informal.* The undocumented, de-facto behaviour of - Mycroft- and OVOS-derived code from before these specifications - existed. V0 is not specified anywhere; it is the baseline the - formalization started from, named here only so tools can refer to - "pre-spec" behaviour. V0 has no notion of the `.blacklist` - resource role or of `` references. -- **V1** — the specifications as first formalized: OVOS-INTENT-1, - -2 and -3, each at version 1. V1's headline addition over V0 is - the `.blacklist` role — formalized intent suppression. -- **V2** — V1 plus **inline vocabulary references** (the `` - token): OVOS-INTENT-1 and OVOS-INTENT-2 at version 2. A V2 - template cannot be expanded by a V1 tool, so V2 is not backward - compatible with V1. - -A specification that does not change between levels keeps its -lower version number — OVOS-INTENT-3 is at version 1 in both V1 -and V2. 
-
-### How the bus stack will be layered in
-
-OVOS-MSG-1 introduces the bus envelope, which is structurally
-orthogonal to the intent stack — a tool can implement the intent
-stack without the bus envelope and vice versa. As more bus-layer
-specs land, the compatibility-level model is expected to evolve;
-the current V0–V2 ladder may grow a second axis or be replaced
-with per-stack ladders.
-
-Until that's settled, the bus-layer specs (OVOS-MSG-1 and the
-others in the pipeline behind it) are versioned individually but
-not yet placed on a compatibility ladder.
+- **Per-plugin behavioural specs.** OVOS-PIPELINE-1 defines
+  the plugin contract (the `match` shape, the orchestrator's
+  iteration semantics) but explicitly defers what each
+  non-trivial plugin type actually *does*. Real candidates
+  for their own specifications: `converse`, `fallback`,
+  `common_query`, `ocp`, `persona`, `stop`. Each defines its
+  own internal behaviour and its own bus emissions beyond
+  the universal lifecycle PIPELINE-1 prescribes.
+- **Session preference fields not yet claimed.** SESSION-1
+  defines the wire shape and OVOS-SESSION-2 (in flight at
+  PR #27) defines the lifecycle and state-ownership model;
+  what remains deferred is the full set of session
+  preferences current OVOS already carries (`persona_id`,
+  `time_format`, `date_format`, `system_unit`,
+  `tts_preferences`, `location`, …) — these need to be
+  claimed under SESSION-1 §2.1's field registry by their
+  respective owning specs (a future preferences spec,
+  OCP / persona / locale specs as appropriate).
+- **Text normalization of ASR output.** The basis for slot
+  value typing (INTENT-1 §5.3). Deferred to its own
+  specification.
+- **A machine-checkable conformance corpus** of `template →
+  sample set` pairs for INTENT-1 expansion, so expander
+  conformance can be verified automatically. A parallel
+  corpus of bus-message fixtures for MSG-1 would be the
+  equivalent at the bus layer.
+- **An end-to-end worked example.** The specs have local
+  examples; none shows a single skill defining one keyword
+  intent and one template intent through the whole path —
+  files, registration, match, handler.
+- **Conversation-level evaluation infrastructure.** Rasa
+  has story-based testing and end-to-end success metrics;
+  the OVOS specs do not currently have a counterpart.
+- **OVOS-INTENT-2 ↔ hassil `intents` translation tool.**
+  The grammar lineage (§2.1) makes a mechanical translator
+  between OVOS-INTENT-2 locale resources and HA's `intents`
+  YAML feasible. Such a tool would let the two corpora
+  cross-pollinate without either format changing. Sits at
+  injection point 3 of §3.3 conceptually but is
+  build-time rather than runtime tooling.
+- **i18n corpus.** OVOS-INTENT-2 defines the locale file
+  format, and `ovos-localize` (§1.4) provides the
+  operations layer; what remains is the *scale* of the
+  translated corpus.
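
Reviewer sketch (not part of the diff): the "machine-checkable conformance corpus" item in the list above could be exercised by a harness roughly like the one below. Everything here is hypothetical: the corpus shape (a mapping from template string to expected sample set), the `check_corpus` function, and the `toy_expand` stand-in are illustrative names, not anything the specs or `ovos-spec-tools` define. `toy_expand` handles only flat `(a|b)` alternation and is not the INTENT-1 grammar; a real run would pass the expander under test instead.

```python
import itertools
import re
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical corpus shape: template string -> expected sample set.
Corpus = Dict[str, Set[str]]


def toy_expand(template: str) -> Set[str]:
    """Stand-in expander for demonstration only.

    Handles flat "(a|b)" alternation groups and nothing else. It is NOT
    the INTENT-1 grammar; it exists so the harness below can be run.
    """
    # Split into literal text and parenthesised groups (groups are kept
    # because the pattern is a capturing group).
    parts = re.split(r"(\([^()]*\))", template)
    choices = [
        p[1:-1].split("|") if p.startswith("(") and p.endswith(")") else [p]
        for p in parts
    ]
    # Cartesian product of all choices, with whitespace normalised.
    return {
        " ".join("".join(combo).split())
        for combo in itertools.product(*choices)
    }


def check_corpus(
    expand: Callable[[str], Iterable[str]], corpus: Corpus
) -> List[dict]:
    """Run `expand` over every template; report set differences."""
    failures = []
    for template, expected in corpus.items():
        actual = set(expand(template))
        if actual != expected:
            failures.append(
                {
                    "template": template,
                    "missing": sorted(expected - actual),
                    "unexpected": sorted(actual - expected),
                }
            )
    return failures


if __name__ == "__main__":
    corpus = {
        "turn (on|off) the light": {
            "turn on the light",
            "turn off the light",
        },
    }
    print(check_corpus(toy_expand, corpus))
```

Comparing sets rather than ordered lists keeps the check independent of the order in which an expander enumerates samples, which seems the safer default for a corpus meant to be shared across implementations.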