27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -251,6 +251,33 @@ version 2: its `{{ … }}` sequences become substitution points, and its
- See-also — cross-references OVOS-AUDIO-1 §4.4 as the defining spec
for `ovos.mic.listen`.

## OVOS-COMMON-QUERY-1 — Common Query Pipeline Plugin

### 2

- Initial draft. Specifies the common query pipeline plugin: a
scatter-gather contest that answers factual questions by
broadcasting the utterance, collecting competing answers from
skills, ranking them, and speaking the best. Reserves the
`common_query` intent_name (PIPELINE-1 §7.3). The full contest runs
in the plugin's blocking `match` (a deliberate, documented exception
to PIPELINE-1 §4.4 latency discipline, since the answer is the claim
decision): a fast `ovos.common_query.ping`/`pong` poll filters
skills down to plausible answerers using only cheap local checks,
then `<skill_id>:common_query` requests full answers (where network
and DB I/O are expected) collected on `<skill_id>.common_query.response`.
Filtering and selection (minimum confidence, denylist, fast-win,
optional reranker) run against the live session; if no answer
survives, `match` returns `None` so the pipeline reaches fallback.
A surviving answer is carried in `Match.slots.answer` and spoken by
the plugin's trivial handler — skills never speak. Defines an
optional question gate (SHOULD, for latency) and an early-start
optimisation subscribing to `ovos.utterance.handle` to overlap the
contest with upstream pipeline stages, caching only raw responses
keyed by `(session_id, utterance)`. All poll/answer messages carry
the `utterance` as correlation key and derive via MSG-1 `reply`,
with the session in `context.session`. Tunable defaults and
confidence-range guidance are collected in appendices.
## OVOS-FALLBACK-1 — Fallback Pipeline Plugin

### 2
3 changes: 3 additions & 0 deletions GLOSSARY.md
@@ -37,3 +37,6 @@ open a PR adding it.
| **Context** | The assistant-metadata object on a Message; an extensible JSON object whose keys are defined by companion specs ([MSG-1 §2.3](msg-1.md)). |
| **Session** | The per-conversation carrier in `context.session`; carries `session_id` (with `"default"` reserved for "originates from the device itself") and `lang` (the user's preferred language, distinct from any `data.lang` describing the payload's own language) ([MSG-1 §4](msg-1.md)). |
| **Listening lifecycle signal** | A payload-free bus signal the audio input service emits or consumes around voice-command capture and sleep mode — `ovos.listener.record.started` / `.record.ended`, `ovos.listener.sleep`, `ovos.listener.awoken` ([AUDIO-IN-1 §6](audio-in.md)). |
| **Common query** | A pipeline plugin that answers factual questions by holding a timed contest among skills — broadcast, collect competing answers, rank, speak the best ([COMMON-QUERY-1 §2](common-query.md)). |
| **Scatter-gather** | The contest pattern: one broadcast fans out to many skills (scatter), their answers are collected and ranked (gather) ([COMMON-QUERY-1 §2](common-query.md)). |
| **Wants-to-answer poll** | Common query's fast ping/pong phase — a cheap local filter where skills self-nominate before the expensive full-answer phase ([COMMON-QUERY-1 §6](common-query.md)). |
39 changes: 39 additions & 0 deletions appendix/comparisons.md
@@ -181,3 +181,42 @@ architecture:
(OVOS-INTENT-3 §1) rather than HA-style curated vocabulary.
The trade-off: skill author freedom vs. cross-integration
vocabulary sharing.

### 2.6 Mycroft CommonQuerySkill — the direct ancestor

COMMON-QUERY-1's closest comparator is not another assistant but
OVOS's own lineage: Mycroft's `CommonQuerySkill` base class, from
which the scatter-gather question-answering pattern is inherited.
The shapes rhyme — broadcast a query, let skills self-nominate,
collect answers, speak the best — but the formalization diverges in
three ways worth recording.

**Two phases, different reason.** Mycroft's CommonQuery was also
two-phase (a query broadcast, then answer collection), but the split
was driven by **message-bus timeout management** — the framework
needed a bounded window to gather responses from skills that might
never reply. COMMON-QUERY-1 keeps a two-phase poll for a different,
sharper reason (§6): the ping is a *cheap local filter* that exists
to keep I/O-heavy skills from querying their backends on every
utterance. The window is incidental; the filtering is the point.

**Where the contest lives.** Mycroft ran the gather inside a skill
handler — common query was itself a skill. COMMON-QUERY-1 lifts it
into a pipeline plugin and runs the entire contest in `match`, so
the no-answer case returns `None` and the pipeline reaches fallback
(rationale §4.9). A skill-layer implementation cannot do this: by
the time a skill handler runs, the claim is already made and
fallback is foreclosed — the same layering argument that puts STOP-1
in the pipeline (rationale §4.8).

**Single speaker.** In COMMON-QUERY-1 the plugin is the only voice:
skills return answer *strings* and the plugin speaks the winner
(§10). This removes the ambiguity, present in the original, about
which component renders speech, and lets the plugin re-rank or
suppress answers without a skill having already spoken.

No mainstream closed stack (Alexa, Google) exposes a comparable
mechanism, because answer resolution there happens centrally in the
cloud rather than as an open contest among independently authored
local skills. The scatter-gather-over-a-bus shape is specific to the
open-ecosystem voice OS.
64 changes: 64 additions & 0 deletions appendix/rationale.md
@@ -679,3 +679,67 @@ subscribed to `<own_skill_id>:stop`. The pipeline plugin matches
and selects; the skill stops. Stop is one of the few cases in
the spec set where the pipeline / skill split is not
substitutable.

### 4.9 Common query pipeline plugin (COMMON-QUERY-1)

Common query answers factual questions by holding a timed contest
among skills — broadcast the question, collect competing answers,
rank them, speak the best. Four of its design choices are
unusual enough to be worth recording, because each one trades
against an instinct a reader brings to the spec.

**`match` blocks, and that is deliberate.** PIPELINE-1 §4.4 tells
plugins to return from `match` quickly and defer expensive work to
the handler, because match-phase latency is response latency. Common
query openly violates that discipline, and it has to: the answer
*is* the claim decision. The plugin cannot return a `Match` and
collect afterwards, because whether it claims at all depends on
whether any skill produced an answer above threshold. Routing and
processing are the same act here, so both happen in `match`. This is
the one place the spec set says "yes, this matcher blocks for
seconds" — and it pays for that admission with the early-start
optimisation and explicit pipeline positioning, rather than
pretending the cost away.

**Returning `None` on no-answer is what keeps fallback alive.** The
earlier, discarded design had the plugin claim the utterance, then
discover during the handler that no skill could answer, and speak a
dead-end "I don't know." That permanently starves fallback: once a
plugin claims, first-match-wins means no later stage runs. Moving the
whole contest into `match` lets the plugin make an honest claim — it
returns a `Match` only when it actually has an answer, and `None`
otherwise — so a failed contest flows naturally to fallback. The
correctness of the whole pipeline tail depends on the contest
finishing before the claim is made.
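
The claim-or-decline decision can be sketched as below. This is a minimal
illustration, not the plugin's implementation: the `Answer` and `Match`
dataclasses, the `select` helper, and the `0.5` threshold are all
hypothetical names and values chosen for the example. Only the
`common_query` intent name and the `slots.answer` carrier come from the
spec text.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    # Hypothetical shape of one skill's answer collected by the contest.
    skill_id: str
    text: str
    confidence: float


@dataclass
class Match:
    # Hypothetical stand-in for the pipeline's Match object.
    intent_name: str
    slots: dict = field(default_factory=dict)


def select(answers, min_confidence=0.5, denylist=frozenset()):
    """Return a Match for the best surviving answer, or None if none survives."""
    survivors = [
        a for a in answers
        if a.confidence >= min_confidence and a.skill_id not in denylist
    ]
    if not survivors:
        # No claim is made, so later pipeline stages (fallback) still run.
        return None
    best = max(survivors, key=lambda a: a.confidence)
    return Match("common_query", {"answer": best.text})
```

Because the function returns `None` rather than a "no answer" `Match`,
a failed contest is indistinguishable, from the pipeline's point of
view, from a plugin that never matched.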

**Ping/pong is a cheap filter gating an expensive operation, not
ceremony.** It would be simpler to broadcast the question once and
let skills answer directly. The two-phase poll earns its place
because the full-answer request invites real I/O — a knowledge skill
will hit Wikipedia, Wolfram, or a database. Without the cheap local
pong filter, every such skill performs that I/O for every question
that passes the gate, including ones far outside its domain. The
~500ms poll window buys the right to *not* hammer every backend on
every utterance. (Mycroft's original CommonQuerySkill was also
two-phase, but for a different reason — message-bus timeout
management; see comparisons §2.6.)
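
The saving can be made concrete with a small in-process simulation. It
is a sketch under stated assumptions: the `Skill` class, its keyword
check, and the `io_calls` counter are hypothetical, and a real contest
exchanges bus messages rather than calling methods directly.

```python
class Skill:
    """Hypothetical skill with a cheap local check and an expensive answer."""

    def __init__(self, skill_id, keywords, answer):
        self.skill_id = skill_id
        self.keywords = set(keywords)
        self._answer = answer
        self.io_calls = 0  # counts simulated backend hits

    def pong(self, utterance):
        # Cheap local filter: no network, no database.
        words = utterance.lower().split()
        return any(k in words for k in self.keywords)

    def full_answer(self, utterance):
        # Stands in for the expensive phase (network or DB I/O).
        self.io_calls += 1
        return self._answer


def contest(skills, utterance):
    """Phase 1 filters by pong; phase 2 asks only the skills that ponged."""
    candidates = [s for s in skills if s.pong(utterance)]
    return {s.skill_id: s.full_answer(utterance) for s in candidates}
```

With a geography skill and a maths skill registered, a question about a
capital city triggers one simulated backend call instead of two; the
maths skill declines at the pong stage and does no I/O.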

**Early start hides latency without shrinking the contest.** Because
`match` blocks, the plugin MAY begin the contest the instant the
utterance arrives (`ovos.utterance.handle`), running it in parallel
with the upstream stop/converse/intent stages that get first refusal
anyway. The subtle requirement is that the early-start cache holds
only *raw* responses — never a selected answer — and all filtering
and selection run at `match` time against the *live* session. That
keeps the optimisation transparent: an upstream stage that
blacklists a skill or changes session state still takes full effect,
because the denylist and confidence filters run only at `match` time
against the live session, never against the session state captured
when the contest started.
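
A minimal sketch of that separation follows. The `EarlyStartCache`
class, the dict-shaped responses, and the `denylist` key on the session
are assumptions made for illustration; only the `(session_id,
utterance)` key and the raw-responses-only rule come from the spec text.

```python
class EarlyStartCache:
    """Holds raw contest responses only, keyed by (session_id, utterance)."""

    def __init__(self):
        self._raw = {}

    def store(self, session_id, utterance, responses):
        self._raw[(session_id, utterance)] = list(responses)

    def pop(self, session_id, utterance):
        return self._raw.pop((session_id, utterance), None)


def match(cache, session, utterance, min_confidence=0.5):
    """Filter and select at match time, against the live session."""
    raw = cache.pop(session["session_id"], utterance)
    if raw is None:
        return None  # a real plugin would run the contest here instead
    live_denylist = set(session.get("denylist", ()))  # hypothetical field
    survivors = [
        r for r in raw
        if r["skill_id"] not in live_denylist
        and r["confidence"] >= min_confidence
    ]
    if not survivors:
        return None
    return max(survivors, key=lambda r: r["confidence"])["answer"]
```

If a skill is added to the session denylist after its response was
cached but before `match` runs, its answer is still discarded, because
no selection happened at cache time.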

The question gate (COMMON-QUERY-1 §4) is the other half of the
latency story: a cheap up-front classifier that rejects weather
requests, music commands, timers, and plain statements before any
broadcast. It is a SHOULD, not a MUST — the confidence filter
guarantees correctness without it — but on mixed traffic it is the
single largest latency win available, since it skips the entire
contest for utterances no knowledge skill would answer anyway.
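
As a rough illustration of how cheap such a gate can be, the sketch
below uses a word-list heuristic. The word lists and the `passes_gate`
name are hypothetical; the spec does not prescribe a classifier, and a
production gate would likely be language-aware.

```python
# Hypothetical English-only word lists, for illustration.
QUESTION_STARTERS = {"who", "what", "when", "where", "why", "how", "which"}
OTHER_DOMAIN_WORDS = {"weather", "forecast", "play", "timer", "alarm"}


def passes_gate(utterance):
    """Return True if the utterance looks like a factual question."""
    words = utterance.lower().strip("?!. ").split()
    if not words:
        return False
    # Reject utterances that clearly belong to another skill domain.
    if OTHER_DOMAIN_WORDS.intersection(words):
        return False
    # Treat contractions such as "what's" as their question word.
    first = words[0].split("'")[0]
    return first in QUESTION_STARTERS
```

An utterance that fails the gate never triggers a broadcast, so the
plugin can return `None` from `match` immediately.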