1,491 changes: 20 additions & 1,471 deletions APPENDIX.md

Large diffs are not rendered by default.

9 changes: 5 additions & 4 deletions README.md
@@ -121,11 +121,11 @@ picture — the tables above are an index.
- *Writing a skill?* INTENT-1 → INTENT-2 → INTENT-3. INTENT-4 only if you need the registration wire format.
- *Building a pipeline plugin?* PIPELINE-1, then SESSION-1 + SESSION-2, then the role spec (CONVERSE-1, CONTEXT-1, or TRANSFORM-1).
- *Building an orchestrator?* MSG-1 → SESSION-1 → SESSION-2 → PIPELINE-1, then INTENT-4, CONTEXT-1, CONVERSE-1, TRANSFORM-1.
- *Surveying the architecture?* [APPENDIX §1](APPENDIX.md) for the three-stack narrative.
- *Surveying the architecture?* [appendix/overview.md §1](appendix/overview.md) for the three-stack narrative.

For background — design rationale, comparisons with other systems,
the catalogue of known divergences from current code, and known
gaps — see [APPENDIX.md](APPENDIX.md). For term definitions, see
gaps — see [APPENDIX.md](APPENDIX.md) (index) or browse by topic under [appendix/](appendix/). For term definitions, see
[GLOSSARY.md](GLOSSARY.md). For the version history of each spec,
see [CHANGELOG.md](CHANGELOG.md).

@@ -163,8 +163,9 @@ implementations and conformance results can name the version they
target.

PRs that touch only the non-normative material —
[APPENDIX.md](APPENDIX.md), [GLOSSARY.md](GLOSSARY.md), this
README, examples — do not require a version bump.
[APPENDIX.md](APPENDIX.md) and [appendix/](appendix/) files,
[GLOSSARY.md](GLOSSARY.md), this README, examples — do not
require a version bump.

---

183 changes: 183 additions & 0 deletions appendix/comparisons.md
@@ -0,0 +1,183 @@
---
[← APPENDIX.md](../APPENDIX.md) · Non-normative

> **⚠️ AI-generated draft — not yet fully reviewed.** This content
> was produced by a large language model (Claude Code) and
> has not yet been fully reviewed for accuracy, completeness, or
> consistency with the specifications. The normative specifications
> themselves are human-reviewed; this appendix is supplementary
> context. Readers should verify claims before relying on them.

## 2. Comparison with other voice-assistant systems

The OVOS specifications occupy territory adjacent to several
existing voice-assistant systems. This section locates the
design choices against each comparator. The summary in §2.5
records where the voice OS leads architecturally, where it
follows, and where it makes a deliberately different choice.

### 2.1 Home Assistant and Rhasspy — shared grammar lineage

OVOS, Home Assistant (HA), and Rhasspy share a common lineage.
The bracket-expansion grammar of OVOS-INTENT-1 — `(a|b)`
alternatives, `[optional]` segments, `{slot}` placeholders — is
the same family as HA's `hassil` sentence templates and
Rhasspy's `sentences.ini`. The *syntax* is not novel. What is
distinctive about the OVOS approach is everything around the
grammar.
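
To make the shared syntax concrete, here is a minimal sketch of bracket expansion in Python. It is not the `hassil`, Rhasspy, or OVOS implementation: the function name `expand` and the whitespace handling are assumptions made for illustration, and `{slot}` placeholders are simply passed through untouched.

```python
from __future__ import annotations

import re

# Split a template into bracket/pipe symbols and runs of literal text.
_TOKENS = re.compile(r"[()\[\]|]|[^()\[\]|]+")


def expand(template: str) -> list[str]:
    """Expand (a|b) alternatives and [optional] segments (hypothetical helper)."""
    tokens = _TOKENS.findall(template)
    pos = 0

    def sequence(stops: set[str]) -> list[str]:
        # Parse literals and nested groups until a stop symbol is reached.
        nonlocal pos
        results = [""]
        while pos < len(tokens) and tokens[pos] not in stops:
            tok = tokens[pos]
            pos += 1
            if tok in ("(", "["):
                closer = ")" if tok == "(" else "]"
                options = alternation(closer)
                pos += 1  # consume the closing bracket
                if tok == "[":
                    options = options + [""]  # an optional segment may be absent
                results = [r + o for r in results for o in options]
            else:
                results = [r + tok for r in results]
        return results

    def alternation(closer: str | None) -> list[str]:
        # Parse one or more "|"-separated sequences.
        nonlocal pos
        stops = {"|"} | ({closer} if closer else set())
        options = sequence(stops)
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            options += sequence(stops)
        return options

    # Collapse doubled spaces left by absent optionals, then drop duplicates.
    expanded = (" ".join(s.split()) for s in alternation(None))
    return list(dict.fromkeys(expanded))


for sentence in expand("(turn|switch) on [the] {device}"):
    print(sentence)
```

The template above yields four sentences, each still containing the `{device}` placeholder. What an engine then does with those sentences is exactly where the three systems diverge.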

**What OVOS does differently:**

- **An implementation-agnostic spec at all.** HA and Rhasspy
have no format-level specification independent of their
implementation — the code is the contract. OVOS now has one,
which is what lets multiple engines (and other assistants)
implement the same contract.
- **Engine-agnostic matching.** OVOS-INTENT-1 §4 treats
templates as *training data* and leaves matching, scoring,
and generalization to the engine. HA's core matching is
`hassil`, a deterministic template matcher; Rhasspy compiles
templates into a closed ASR grammar. The OVOS contract
accommodates a deterministic matcher, a neural classifier,
or an LLM behind one interface.
- **Templates are training data, not a closed grammar.** A
capable OVOS engine generalizes beyond the authored samples.
Rhasspy's closed-grammar model is deterministic and
offline-guaranteed but brittle — an utterance not derivable
from `sentences.ini` cannot be recognized at all.
- **A multi-stage pipeline** (§3.2). Intent engines are two
stage kinds among many. Neither HA nor Rhasspy exposes an
intent layer this structured.
- **An intent is bound to one handler, owned by one skill**
(OVOS-INTENT-3 §1). See §2.2 — this follows necessarily from
the open skill ecosystem.
- **A bus substrate openable to layer-2 systems** (§3.1).
Neither HA nor Rhasspy exposes its bus this openly.
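
The "templates are training data" point can be illustrated with a small sketch. The `IntentEngine` protocol and both engine classes below are hypothetical and are not defined by any OVOS specification. They only show how a deterministic matcher and a generalizing matcher could sit behind one interface and be trained on the same samples; the fuzzy engine uses `difflib` from the Python standard library as a stand-in for a real classifier.

```python
from __future__ import annotations

import difflib
from typing import Protocol


class IntentEngine(Protocol):
    """Hypothetical engine interface: train on samples, then match utterances."""

    def train(self, intent: str, samples: list[str]) -> None: ...

    def match(self, utterance: str) -> str | None: ...


class ExactEngine:
    """Closed-grammar behaviour: only authored samples are recognized."""

    def __init__(self) -> None:
        self._samples: dict[str, str] = {}

    def train(self, intent: str, samples: list[str]) -> None:
        for sample in samples:
            self._samples[sample.lower()] = intent

    def match(self, utterance: str) -> str | None:
        return self._samples.get(utterance.lower())


class FuzzyEngine(ExactEngine):
    """Generalizing behaviour: near misses of authored samples still match."""

    def __init__(self, cutoff: float = 0.8) -> None:
        super().__init__()
        self._cutoff = cutoff  # assumed similarity threshold

    def match(self, utterance: str) -> str | None:
        close = difflib.get_close_matches(
            utterance.lower(), list(self._samples), n=1, cutoff=self._cutoff
        )
        return self._samples[close[0]] if close else None


engines: list[IntentEngine] = [ExactEngine(), FuzzyEngine()]
for engine in engines:
    engine.train("lights.on", ["turn on the lights"])
    print(type(engine).__name__, engine.match("turn on the light"))
```

Given the same training sample, the exact engine rejects the unseen utterance while the fuzzy engine accepts it. That is the trade-off between closed-grammar determinism and generalization described above.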

**What HA and Rhasspy do better:**

- **Reusable template fragments.** `hassil` has
`expansion_rules` and Rhasspy has `<rule>` references —
named, reusable sub-templates that let authors share common
fragments (politeness prefixes, articles, recurring
phrasings). OVOS-INTENT-1 version 2 closes this gap with the
`<name>` inline vocabulary reference, which expands a named
`.voc` in place, reusing the existing slot-free format
rather than adding a new construct.
- **i18n corpus maturity.** HA's community `intents`
repository is a large, managed, professionally translated
corpus covering many languages. OVOS has the tooling
counterpart in `ovos-localize` (§1.4) — so the gap here is
the *scale and maturity* of the corpus, not the absence of
tooling.
- **Concrete, testable completeness.** HA and Rhasspy ship
systems where the hard parts — matching, number and range
handling, slot typing — are solved concretely. The OVOS
specs deliberately defer some of these (slot typing to a
future normalization spec; matching to the engine). That
deferral is intellectually consistent but means the specs'
value depends on the engines and tooling that fill the gaps.
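
As an illustration of the reusable-fragment idea, the sketch below inlines `<name>` references into a template. It rests on assumptions: the text above only says that `<name>` expands a named `.voc` in place, so the helper name `inline_vocab`, the in-memory `vocabs` mapping standing in for `.voc` files, the permitted name characters, and the choice to render the entries as a `(a|b)` alternative group are all hypothetical.

```python
import re

# Assumed reference syntax: <name>, where name is letters, digits, or underscores.
_REFERENCE = re.compile(r"<([A-Za-z0-9_]+)>")


def inline_vocab(template: str, vocabs: dict[str, list[str]]) -> str:
    """Replace each <name> with an alternative group built from its entries."""

    def replace(match: re.Match) -> str:
        entries = vocabs[match.group(1)]  # KeyError if the vocabulary is unknown
        return "(" + "|".join(entries) + ")"

    return _REFERENCE.sub(replace, template)


# Hypothetical stand-in for the contents of a "please.voc" file.
vocabs = {"please": ["please", "could you"]}
print(inline_vocab("<please> turn on {device}", vocabs))
```

Because the result is an ordinary template, any later expansion or training step needs no knowledge of the reference mechanism.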

### 2.2 Closed domain vs open ecosystem

The sharpest difference between OVOS and HA is not technical
but structural. **Home Assistant is a curated, closed domain**:
home automation, with a vendor-managed intent vocabulary. HA
can treat an intent such as `HassTurnOn` as a *shared contract*
honoured uniformly across hundreds of integrations and many
languages, because HA controls and curates that vocabulary.

**OVOS is an open ecosystem.** Skills are arbitrary third-party
Python packages, installed by pip, developed independently,
running as arbitrary code in process. A skill can do anything;
OVOS voice-enables anything. In that setting a shared global
intent vocabulary is not a missing feature — it is incoherent.
When skills are unbounded, an intent *must* be private to the
skill that defines it and bound directly to that skill's
handler. OVOS-INTENT-3's "an intent is not an event" stance is
therefore the correct model for an open ecosystem, just as HA's
shared-vocabulary model is correct for a curated one. The two
models are right for different platforms; neither is
universally better.

### 2.3 Rasa — closest comparator for intent context

Rasa's "active forms" and slot mappings perform context-aware
matching, but they are baked into the policy engine; you
cannot run a Rasa NLU pipeline without Rasa policies.
OVOS-CONTEXT-1 separates **gating** (`requires_context` /
`excludes_context`, §6 / §6.1 of that spec) from **match-time
capture** (the context-supplied capture rule, §7) from **engine
matching hints** (engine-internal use of values, §6), so every
intent engine that consumes OVOS-INTENT-3 registrations can
gate uniformly without buying into a particular dialog policy.
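
A minimal sketch of the gating step follows, assuming the simplest plausible semantics: every key in `requires_context` must be active, and no key in `excludes_context` may be. The `Registration` shape and the `is_eligible` helper are hypothetical and are not taken from OVOS-CONTEXT-1, which remains the authority on the actual rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration record carrying only the gating fields."""

    intent: str
    requires_context: frozenset[str] = frozenset()
    excludes_context: frozenset[str] = frozenset()


def is_eligible(reg: Registration, active: set[str]) -> bool:
    """Gate a registration against the currently active context keys."""
    has_required = reg.requires_context <= active  # all required keys present
    has_excluded = bool(reg.excludes_context & active)  # any excluded key present
    return has_required and not has_excluded


confirm = Registration("confirm", requires_context=frozenset({"awaiting_confirmation"}))
greet = Registration("greet", excludes_context=frozenset({"greeted"}))

active: set[str] = set()
print(is_eligible(confirm, active), is_eligible(greet, active))

active.add("greeted")  # once set, the greeting no longer matches
print(is_eligible(greet, active))
```

Because the check reads only the registration and the active keys, it can run before any engine sees the utterance. This is what lets engines with different matching strategies gate in the same way.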

Rasa wins on conversation-level evaluation infrastructure —
story-based testing, end-to-end success metrics — for which
the OVOS specs have no analogue yet (§7 catalogues this as a
known gap).

Rasa's NLU pipeline is also the closest analogue to
OVOS-TRANSFORM-1's utterance / metadata / intent chains, but
it is a single sequence per language model and the
policy/preference split (TRANSFORM-1 §5.3) does not exist.
TRANSFORM-1's six-injection-point model is genuinely more
expressive.

### 2.4 Amazon ASK / Alexa Skills Kit, Google Dialogflow

Both are closed-domain, centrally trained stacks. Their
built-in entity-type systems (`AMAZON.DATE`,
`@sys.date-time`) are what OVOS-TRANSFORM-1 §3.4 replicates as
an *injectable, deployer-replaceable, engine-agnostic*
contract — at the spec level OVOS is strictly more flexible,
though OVOS defers the **typed value formats themselves**
(date encoding, number representation, duration units) to a
future text-normalization spec (§7), while ASK and Dialogflow
ship them as built-ins.

Neither ASK nor Dialogflow has a `session.pipeline`-equivalent
(the assistant picks one matcher per skill); neither has
anything like the layer-2 substrate of OVOS-MSG-1 §3.4. ASK
has built-in intents (`AMAZON.HelpIntent`) but they are
handled inside the skill; Dialogflow has fallback intents but
they do not have first-class dispatch identity.
OVOS-PIPELINE-1's dispatch polymorphism
(`skill_id == pipeline_id` for plugin-bundled handlers) lets a
non-skill component advertise its own intent identity on the bus,
indistinguishable from a skill. Among the systems compared here,
this is original to the OVOS architecture.

### 2.5 Summary — where the voice OS leads, follows, and differs

**OVOS leads architecturally** in three places:

- **The pipeline-plugin model with first-class dispatch
polymorphism.** No comparator lets a non-skill component
(LLM persona, chatbot, fallback) be a first-class handler
owner on the same dispatch surface.
- **The six-injection-point transformer chain with per-session
preference/policy separation.** Nothing in HA, Rhasspy,
Rasa, ASK, or Dialogflow has a comparable lifecycle-uniform
extensibility surface.
- **Negative gating (`excludes_context` "match if absent")
in CONTEXT-1.** ASK/Dialogflow contexts are purely
positive; Rasa forms are not engine-agnostic; HA has no
context model. The fire-once and modal-suppression patterns
fall out of negative gating.

**OVOS follows** where ecosystem investment matters more than
architecture:

- HA's translation corpus scale (the `intents` repository).
- ASK / Dialogflow's typed entity systems.
- Rasa's conversation-level evaluation infrastructure.

**OVOS makes a deliberately different choice** in two places:

- *Engine-agnostic templates as training data* (OVOS-INTENT-1
§4) rather than Rhasspy-style closed grammars. The trade-off:
generalization beyond authored samples vs. offline-deterministic
recognition guarantees.
- *Open skill ecosystem with skill-private intents*
(OVOS-INTENT-3 §1) rather than HA-style curated vocabulary.
The trade-off: skill author freedom vs. cross-integration
vocabulary sharing.