13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,19 @@ status quo, `2` once it is not backwards compatible. Entries are grouped under
the spec's current class. Every pull request that alters normative content adds
an entry here.

## OVOS-AUDIO-1 — Audio Output Service

### 2

- The audio output service: the rendering pipeline (dialog-transformer
chain, TTS synthesis, TTS-transformer chain, playback queue), the
sequential playback queue shared by speech (`ovos.utterance.speak`) and
sound effects (`ovos.audio.queue` / `ovos.audio.play_sound`), the
remote-client rendering mode (`ovos.utterance.speak.b64` →
`ovos.audio.speech`), output lifecycle signals
(`ovos.audio.output.started` / `.ended`), the speaking-status query
(`ovos.audio.is_speaking`), stop integration (`ovos.audio.stop`,
`ovos.stop`), and the `listen`-triggered `ovos.mic.listen` follow-up.
## OVOS-PERSONA-1 — Persona Pipeline Plugin

### 2
15 changes: 15 additions & 0 deletions appendix/divergences.md
@@ -195,6 +195,21 @@ defined by any spec** and should be removed or replaced:
- **`ovos.utterance.speak`** (PIPELINE-1 §9.6). The NL output
exit point; symmetric to `ovos.utterance.handle`. No current
equivalent — TTS trigger is currently implicit.
- **`ovos.utterance.speak.b64`** (AUDIO-1 §3.4). Variant of
`ovos.utterance.speak` for remote-client delivery: the audio
output service runs the same TTS pipeline but emits synthesised
audio as base64 via `ovos.audio.speech` instead of queuing for
local playback. Used by bridges serving satellites without TTS
(BRIDGE-1 §4.2.4).
- **`ovos.audio.speech`** (AUDIO-1 §4.3). Base64-encoded
synthesised audio broadcast; emitted in response to
`ovos.utterance.speak.b64`. Carries a `listen` flag. Remote
clients (e.g. satellites relayed by a bridge) decode and play
the audio themselves.
- **`ovos.audio.queue`** / **`ovos.audio.play_sound`** (AUDIO-1
§4.1, §4.2). Sound-effect playback topics. Payloads accept
either a `uri` or inline base64 `audio` field, enabling
cross-host audio delivery without shared filesystem access.
- **`ovos.intent.list` / `ovos.intent.describe`** (INTENT-4
§10). Introspection topics served from the orchestrator's
passive registration index.
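
The `uri`-or-inline-`audio` payload shape described above for `ovos.audio.queue` / `ovos.audio.play_sound` could be handled roughly as follows. This is a minimal sketch, not the service's actual code: the function name `resolve_sound_payload` is hypothetical, and preferring `audio` over `uri` when both are present is an assumption that the text above does not settle.

```python
import base64
from typing import Tuple, Union


def resolve_sound_payload(data: dict) -> Tuple[str, Union[bytes, str]]:
    """Classify a sound-effect payload (hypothetical helper).

    Returns ("inline", raw_bytes) when the payload carries base64 audio,
    or ("uri", uri_string) when it points at a resource instead.
    """
    audio = data.get("audio")
    if audio is not None:
        # Assumption: inline audio wins if both fields are present.
        # validate=True rejects characters outside the base64 alphabet.
        return "inline", base64.b64decode(audio, validate=True)
    uri = data.get("uri")
    if uri:
        return "uri", uri
    raise ValueError("payload needs either 'uri' or 'audio'")
```

A receiver on another host can play the decoded bytes directly, which is what removes the need for a shared filesystem.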
19 changes: 19 additions & 0 deletions appendix/rationale.md
@@ -680,6 +680,25 @@ and selects; the skill stops. Stop is one of the few cases in
the spec set where the pipeline / skill split is not
substitutable.


### 4.9 Audio output service (AUDIO-1)

**Sentence segmentation as a latency-reduction technique (AUDIO-1 §3.2).**
When a TTS engine synthesises a long utterance as a single unit, the
user must wait for the entire synthesis to complete before hearing
anything. An implementation can reduce perceived latency by splitting
the utterance at sentence boundaries, synthesising each sentence
independently, and enqueuing each segment as soon as it is ready —
so the first sentence begins playing while later sentences are still
being synthesised.

This is an internal implementation strategy: no other bus participant
observes whether the TTS engine segments or not. The visible contract
is unchanged — `ovos.audio.output.started` fires when the first
audio begins, `ovos.audio.output.ended` fires when the last audio
completes. The `listen` flag is honoured after all audio for the
originating utterance has played, regardless of how many internal
segments were used.
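
The segmentation strategy and its unchanged visible contract can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `speak` function, its `synthesize` / `play` / `emit` callables, and the naive regex splitter are hypothetical, and a real implementation would also need error handling and stop integration.

```python
import queue
import re
import threading


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def speak(text, listen, synthesize, play, emit):
    """Synthesise sentence by sentence while earlier segments play."""
    segments = queue.Queue()
    done = object()  # sentinel marking the end of synthesis

    def producer():
        try:
            for sentence in split_sentences(text):
                # Each segment is enqueued as soon as it is ready.
                segments.put(synthesize(sentence))
        finally:
            segments.put(done)

    threading.Thread(target=producer, daemon=True).start()

    started = False
    while True:
        audio = segments.get()
        if audio is done:
            break
        if not started:
            # Fires once, when the first audio begins.
            emit("ovos.audio.output.started")
            started = True
        play(audio)  # blocks until this segment has finished playing
    if started:
        # Fires once, after the last segment completes.
        emit("ovos.audio.output.ended")
    if listen:
        # Honoured only after all audio for the utterance has played.
        emit("ovos.mic.listen")
```

However many segments the splitter produces, an observer on the bus sees one `started`, one `ended`, and at most one `ovos.mic.listen`, in that order.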
### 4.10 Common query pipeline plugin (COMMON-QUERY-1)

Common query answers factual questions by holding a timed contest