diff --git a/CHANGELOG.md b/CHANGELOG.md
index a187812..3b2658f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,19 @@ status quo, `2` once it is not backwards compatible. Entries are grouped under
 the spec's current class. Every pull request that alters normative content adds
 an entry here.
 
+## OVOS-AUDIO-1 — Audio Output Service
+
+### 2
+
+- The audio output service: the rendering pipeline (dialog-transformer
+  chain, TTS synthesis, TTS-transformer chain, playback queue), the
+  sequential playback queue shared by speech (`ovos.utterance.speak`) and
+  queued sounds (`ovos.audio.queue`), instant sounds
+  (`ovos.audio.play_sound`), the remote-client rendering mode
+  (`ovos.utterance.speak.b64` → `ovos.audio.speech`), output lifecycle
+  signals (`ovos.audio.output.started` / `.ended`), the speaking-status
+  query (`ovos.audio.is_speaking`), stop integration (`ovos.audio.stop`,
+  `ovos.stop`), and the `listen`-triggered `ovos.mic.listen` follow-up.
 ## OVOS-PERSONA-1 — Persona Pipeline Plugin
 
 ### 2
diff --git a/appendix/divergences.md b/appendix/divergences.md
index 328aecf..33c7477 100644
--- a/appendix/divergences.md
+++ b/appendix/divergences.md
@@ -195,6 +195,21 @@ defined by any spec** and should be removed or replaced:
 - **`ovos.utterance.speak`** (PIPELINE-1 §9.6). The NL output
   exit point; symmetric to `ovos.utterance.handle`. No current
   equivalent — TTS trigger is currently implicit.
+- **`ovos.utterance.speak.b64`** (AUDIO-1 §3.4). Variant of
+  `ovos.utterance.speak` for remote-client delivery: the audio
+  output service runs the same TTS pipeline but emits synthesised
+  audio as base64 via `ovos.audio.speech` instead of queuing for
+  local playback. Used by bridges serving satellites without TTS
+  (BRIDGE-1 §4.2.4).
+- **`ovos.audio.speech`** (AUDIO-1 §4.3). Base64-encoded
+  synthesised audio broadcast; emitted in response to
+  `ovos.utterance.speak.b64`. Carries a `listen` flag. Remote
+  clients (e.g. satellites relayed by a bridge) decode and play
+  the audio themselves.
+- **`ovos.audio.queue` / `ovos.audio.play_sound`** (AUDIO-1
+  §4.1, §4.2). Sound-effect playback topics. Payloads accept
+  either a `uri` or an inline base64 `audio` field, enabling
+  cross-host audio delivery without shared filesystem access.
 - **`ovos.intent.list` / `ovos.intent.describe`** (INTENT-4
   §10). Introspection topics served from the orchestrator's
   passive registration index.
diff --git a/appendix/rationale.md b/appendix/rationale.md
index 72bb31f..7d0aaa5 100644
--- a/appendix/rationale.md
+++ b/appendix/rationale.md
@@ -680,6 +680,25 @@ and selects; the skill stops. Stop is one of the few cases in
 the spec set where the pipeline / skill split is not
 substitutable.
 
+### 4.9 Audio output service (AUDIO-1)
+
+**Sentence segmentation as a latency-reduction technique (AUDIO-1 §3.2).**
+When a TTS engine synthesises a long utterance as a single unit, the
+user must wait for the entire synthesis to complete before hearing
+anything. An implementation can reduce perceived latency by splitting
+the utterance at sentence boundaries, synthesising each sentence
+independently, and enqueuing each segment as soon as it is ready,
+so the first sentence begins playing while later sentences are still
+being synthesised.
+
+This is an internal implementation strategy: no other bus participant
+observes whether the TTS engine segments or not. The visible contract
+is unchanged: `ovos.audio.output.started` fires when the first
+audio begins, and `ovos.audio.output.ended` fires when the last audio
+completes. The `listen` flag is honoured after all audio for the
+originating utterance has played, regardless of how many internal
+segments were used.
+
 ### 4.10 Common query pipeline plugin (COMMON-QUERY-1)
 
 Common query answers factual questions by holding a timed contest
diff --git a/audio-out.md b/audio-out.md
new file mode 100644
index 0000000..ccbfc30
--- /dev/null
+++ b/audio-out.md
@@ -0,0 +1,418 @@
+# Audio Output Service Specification
+
+**Spec ID:** OVOS-AUDIO-1 · **Version:** 2 · **Status:** Draft
+
+This specification defines the **audio output service** — the
+pipeline's output-side counterpart that consumes natural-language
+responses and renders them as audio. It covers two rendering modes
+(`ovos.utterance.speak` for local playback and
+`ovos.utterance.speak.b64` for remote-client delivery), a sequential
+playback queue for speech and sound effects, fire-and-forget instant
+sounds, and the output lifecycle signals that bookend audio playback.
+
+It builds on three companion specifications:
+
+- the *Utterance Lifecycle and Pipeline Specification*
+  (OVOS-PIPELINE-1) — the pipeline iteration, the `Match` and
+  dispatch contract, the handler-lifecycle trio, and the
+  `ovos.utterance.speak` output exit point;
+- the *Bus Message Specification* (OVOS-MSG-1) — the envelope,
+  routing keys, session carrier, and derivations every Message
+  defined here travels in;
+- the *Transformer Injection Point Specification*
+  (OVOS-TRANSFORM-1) — the dialog-transformer and TTS-transformer
+  chains that run before and after TTS synthesis.
+
+The key words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**,
+and **MAY** are used as in RFC 2119.
+
+---
+
+## 1. Scope
+
+This specification defines:
+
+- **the audio output service role** (§2) — the component that
+  receives natural-language responses and renders them as audio;
+- **the rendering pipeline** (§3) — two rendering modes sharing the
+  same TTS pipeline: `ovos.utterance.speak` enqueues for local
+  playback; `ovos.utterance.speak.b64` emits synthesised audio as
+  base64 for remote clients instead;
+- **the playback model** (§4) — the scheduled queue for TTS
+  speech and queued sounds, and fire-and-forget instant sounds
+  for immediate playback;
+- **output lifecycle signals** (§5) — the start/end markers that
+  bookend audio playback;
+- **stop integration** (§6) — how the audio service responds to
+  stop signals;
+- **bus surface** (§7);
+- **conformance** (§8).
+
+It does **not** define:
+
+- **the internal machinery of TTS synthesis** — how a TTS plugin
+  converts text to audio, including model inference, voice
+  selection, and audio formatting, is entirely the plugin's
+  business. The spec fixes only the observable bus contract;
+- **the transformer plugin internals** — dialog and TTS
+  transformer chains are defined by OVOS-TRANSFORM-1; this spec
+  only fixes when they run in the output pipeline;
+- **the audio-input pipeline** — microphone capture, wake-word
+  detection, and speech-to-text are separate services covered by
+  other specifications;
+- **hardware access** — how the service accesses audio output
+  hardware is a deployment concern;
+- **volume control, audio routing, or hardware abstraction** —
+  these are deployment-level concerns;
+- **music and media playback** — long-form audio is managed by a
+  separate media-playback service. This spec covers TTS speech and
+  sound effects only.
+
+---
+
+## 2. The audio output service role
+
+The **audio output service** is the component that receives
+natural-language response text from the pipeline and renders it as
+audible output. It:
+
+- subscribes to `ovos.utterance.speak` (OVOS-PIPELINE-1 §9.6) and
+  `ovos.utterance.speak.b64` (§3.4) and processes each through the
+  same TTS rendering pipeline (§3), differing only in output stage;
+- maintains a **scheduled playback queue** (§4.1) for TTS speech
+  and queued sounds, ensuring that audio is played back in order
+  without overlapping;
+- plays **instant sounds** (§4.2) immediately on receipt,
+  independently of the scheduled queue and without stopping it;
+- emits **output lifecycle signals** (§5) around each playback
+  session;
+- responds to **stop signals** (§6) by clearing the queue and
+  terminating in-progress playback.
+
+A deployment **MAY** have no audio output service. The pipeline
+and handler lifecycle are unaffected by its absence.
+
+The handler does not block on audio output; playback may occur after
+`ovos.utterance.handled` has fired (PIPELINE-1 §6.1).
+
+---
+
+## 3. Rendering pipeline
+
+Both `ovos.utterance.speak` and `ovos.utterance.speak.b64` pass
+through the same TTS pipeline. They differ only in the output stage:
+
+```
+ovos.utterance.speak        ovos.utterance.speak.b64
+         │                           │
+         ▼                           ▼
+ [dialog transformers]      [dialog transformers]   ← TRANSFORM-1 §3.5
+         │                           │
+         ▼                           ▼
+   TTS synthesis               TTS synthesis
+         │                           │
+         ▼                           ▼
+  [tts transformers]         [tts transformers]      ← TRANSFORM-1 §3.6
+         │                           │
+         ▼                           ▼
+ scheduled queue          ovos.audio.speech (§4.3)
+  → local playback          (b64 for remote client)
+```
+
+All rendering stages execute in the audio output service, which MAY
+run in the same process as the utterance orchestrator or separately.
+
+### 3.1 Dialog transformer stage
+
+Before TTS synthesis, the utterance text is passed through the
+**dialog-transformer chain** (OVOS-TRANSFORM-1 §3.5) hosted by the
+audio output service. Each transformer plugin in the chain receives
+the text and the Message context and MAY mutate either.
+
+The transformed text replaces the original `utterance` field for
+all downstream stages.
+
+### 3.2 TTS synthesis
+
+The audio output service synthesises the utterance text into audio.
+Language is taken from `data.lang` in the received Message
+(PIPELINE-1 §9.6); when absent, the service resolves it from the
+session (OVOS-SESSION-1 §3.2).
+
+When synthesis fails, the service **SHOULD** attempt a fallback.
+Selection and fallback logic are deployment concerns.
+
+For `ovos.utterance.speak`, the synthesised audio is enqueued for
+local playback (§4). For `ovos.utterance.speak.b64`, the synthesised
+audio is emitted as `ovos.audio.speech` (§3.4, §4.3) instead; it is not
+enqueued and does not play locally.
+
+> **Note (non-normative):** See the rationale appendix, §4.9, for a
+> discussion of sentence segmentation as a latency-reduction technique.
+
+### 3.3 TTS transformer stage
+
+After synthesis, the audio data and Message context are passed through
+the **TTS-transformer chain** (OVOS-TRANSFORM-1 §3.6) hosted by the
+audio output service. Each transformer plugin MAY mutate the audio data.
+
+The transformed audio replaces the original for playback.
+
+### 3.4 Remote-client rendering mode — `ovos.utterance.speak.b64`
+
+The audio output service **MUST** subscribe to
+`ovos.utterance.speak.b64`. A Message on this topic carries the same
+`utterance` text as `ovos.utterance.speak` and passes through the
+same dialog-transformer, TTS-synthesis, and TTS-transformer stages
+(§3.1–§3.3). The output stage differs: instead of enqueueing for
+local playback, the service **MUST** emit `ovos.audio.speech` (§4.3)
+with the synthesised audio encoded as base64. The audio is not
+enqueued and does not play on the local device.
+
+The `listen` flag (§4.4) applies: if the originating Message carries
+`listen: true`, the service **MUST** emit `ovos.mic.listen` after
+emitting `ovos.audio.speech`.
+
+---
+
+## 4. Playback model
+
+The audio output service has one scheduled queue and a separate
+instant-sound mechanism:
+
+- **Scheduled playback queue** (§4.1) — sequential, one-at-a-time
+  playback for TTS speech and queued sound effects. Audio plays in
+  FIFO order without overlapping.
+- **Instant sounds** (§4.2) — fire-and-forget playback that starts
+  immediately on receipt. Instant sounds are independent of the
+  queue: they play over whatever is currently scheduled, MAY overlap
+  each other, and are not stoppable.
+
+### 4.1 Scheduled playback queue
+
+This queue holds TTS speech (from `ovos.utterance.speak`, §3.2)
+and queued sounds (from `ovos.audio.queue`, below).
+
+**Session scope.** The audio output service MUST only enqueue items
+whose `context.session.session_id` matches a session it is
+configured to serve locally. A service co-located with the
+orchestrator on a single device SHOULD serve only
+`session_id: "default"` (**OVOS-SESSION-2 §5**) and MUST NOT
+enqueue audio for named sessions — those sessions belong to remote
+participants and their audio is delivered via
+`ovos.utterance.speak.b64` / `ovos.audio.speech` (§3.4, §4.3).
+
+**Discipline:**
+- **FIFO**. Items are dequeued in the order they were enqueued.
+- **Sequential**. Each item plays to completion before the next
+  item begins.
+- **Clearable**. On a stop signal (§6), the queue is emptied of
+  all pending items and any in-progress playback is terminated.
+
+**Queued sounds** use topic `ovos.audio.queue`:
+
+| Field | Type | Required | Meaning |
+|-------|------|----------|---------|
+| `uri` | string | no | URI referencing the audio data. |
+| `audio` | string | no | Base64-encoded audio data, used when the audio source is on a different host (alternative to `uri`). |
+| `listen` | bool | no | When `true`, re-opens the user input channel after this item plays (§4.4). |
+
+Exactly one of `uri` or `audio` MUST be present.
+
+### 4.2 Instant sounds
+
+Instant sounds are played via `ovos.audio.play_sound`. They start
+immediately on receipt, play over any audio currently in progress
+from the scheduled queue, MAY overlap each other, and are **not**
+affected by stop signals (§6).
+
+**Play-sound topic** `ovos.audio.play_sound`:
+
+| Field | Type | Required | Meaning |
+|-------|------|----------|---------|
+| `uri` | string | no | URI referencing the audio data. |
+| `audio` | string | no | Base64-encoded audio data, used when the audio source is on a different host (alternative to `uri`). |
+
+Exactly one of `uri` or `audio` MUST be present.
+
+### 4.3 Synthesised audio delivery — `ovos.audio.speech`
+
+`ovos.audio.speech` is emitted by the audio output service when
+processing an `ovos.utterance.speak.b64` Message (§3.4). It carries
+the synthesised audio as base64; the receiving client is responsible
+for decoding and playing it.
+
+| Field | Type | Required | Meaning |
+|-------|------|----------|---------|
+| `audio` | string | yes | Base64-encoded synthesised audio. |
+| `listen` | bool | no | When `true`, the client SHOULD re-open its microphone after playback. |
+
+The session is identified via `context.session` as usual. A bridge
+(OVOS-BRIDGE-1 §4.2.4) subscribes by `session_id` or `destination`
+and relays this Message to the client.
+
+### 4.4 Listen flag
+
+The `listen` field on `ovos.utterance.speak` is defined by
+OVOS-PIPELINE-1 §9.6. When a received Message carries `listen: true`,
+the audio output service **MUST** emit `ovos.mic.listen` after all
+audio for that utterance has completed and after
+`ovos.audio.output.ended` (§5.2).
+
+On a stop-initiated end (§6), `ovos.mic.listen` **MUST NOT** be emitted,
+regardless of the `listen` flag.
+
+---
+
+## 5. Output lifecycle signals
+
+The audio output service emits lifecycle signals around playback
+to notify other components of audio state.
+
+### 5.1 Playback start
+
+When the first item in a playback session begins (queue was empty,
+first item dequeued), the audio output service **MUST** emit:
+
+`ovos.audio.output.started`
+
+Payload:
+
+None. The session is identified by `context.session.session_id`
+of this Message.
+
+A playback session runs from the first item's start until the queue
+is empty and the last item completes. `ovos.audio.output.started`
+fires once per idle→active transition.
+
+### 5.2 Playback end
+
+When the queue becomes empty and the last item has completed
+playback, the audio output service **MUST** emit:
+
+`ovos.audio.output.ended`
+
+Payload:
+
+None. The session is identified by `context.session.session_id`
+of this Message.
+
+Components that changed state on `ovos.audio.output.started` use this
+signal to restore it.
+
+If the last completed item carried `listen: true` (§4.4), the audio
+output service emits `ovos.mic.listen` **after** `ovos.audio.output.ended`.
+On a stop-initiated end, `ovos.mic.listen` is not emitted (§4.4).
+
+### 5.3 Speaking-status query
+
+A component MAY query whether the audio output service is
+currently speaking by emitting:
+
+`ovos.audio.is_speaking`
+
+Request payload: none. To scope the query to a specific session,
+the requester sets `context.session.session_id` in the request
+Message; the service answers for that session only. An absent or
+`"default"` `session_id` asks about the device-local default session
+(OVOS-SESSION-1 §3.1); it is not a wildcard over all sessions.
+
+The service replies with:
+
+```json
+{ "speaking": true }
+```
+
+| Field | Type | Required | Meaning |
+|-------|------|----------|---------|
+| `speaking` | bool | yes | Whether audio is currently playing for the session identified by `context.session.session_id` of the request. |
+
+---
+
+## 6. Stop integration
+
+When the audio output service receives a stop signal, it:
+
+1. **clears** the scheduled playback queue of all pending items;
+2. **terminates** any in-progress scheduled playback;
+3. **emits** `ovos.audio.output.ended` if a playback session was
+   active.
+
+Instant sounds (§4.2) are not affected by stop signals — they play
+to completion regardless.
+
+The stop signal topics are:
+
+| Topic | Purpose |
+|-------|---------|
+| `ovos.audio.stop` | Stop audio output. |
+| `ovos.stop` | Universal stop broadcast (OVOS-STOP-1). |
+
+Both signals carry `context.session.session_id` (OVOS-MSG-1 §4).
+The audio output service **MAY** scope its response to that session.
+
+---
+
+## 7. Bus surface
+
+| Topic | Direction | Purpose |
+|-------|-----------|---------|
+| `ovos.utterance.speak` | handler → audio | Natural-language response text for TTS + local playback (PIPELINE-1 §9.6). |
+| `ovos.utterance.speak.b64` | handler/bridge → audio | Natural-language response text for TTS + remote delivery via `ovos.audio.speech` (§3.4). |
+| `ovos.audio.queue` | any component → audio | Queue a sound for scheduled playback (§4.1). |
+| `ovos.audio.play_sound` | any component → audio | Play a sound immediately (§4.2). |
+| `ovos.audio.stop` | any component → audio | Stop audio playback and clear queue (§6). |
+| `ovos.audio.is_speaking` | any component → audio | Query whether audio is currently playing (§5.3). |
+| `ovos.audio.output.started` | audio → broadcast | Playback session started (§5.1). |
+| `ovos.audio.output.ended` | audio → broadcast | Playback session ended (§5.2). |
+| `ovos.audio.speech` | audio → broadcast | Synthesised audio as base64 for remote clients (§4.3). |
+| `ovos.mic.listen` | audio → broadcast | Request microphone re-open after `listen: true` (§4.4). |
+
+---
+
+## 8. Conformance
+
+### An audio output service **MUST**:
+
+- subscribe to `ovos.utterance.speak` and process each Message
+  through the TTS rendering pipeline for local playback (§3);
+- subscribe to `ovos.utterance.speak.b64` and process each Message
+  through the same TTS pipeline, emitting `ovos.audio.speech`
+  instead of enqueueing for local playback (§3.4);
+- maintain a scheduled playback queue that plays one item at a
+  time in FIFO order (§4.1);
+- support queued sound playback via `ovos.audio.queue` (§4.1);
+- play instant sounds immediately on `ovos.audio.play_sound` without
+  queuing or stopping scheduled playback (§4.2);
+- emit `ovos.audio.output.started` when a playback session begins
+  (§5.1);
+- emit `ovos.audio.output.ended` when a playback session ends (§5.2);
+- clear the scheduled queue and terminate playback on stop signals (§6);
+- emit `ovos.mic.listen` after playback when the last item carries
+  `listen: true` (§4.4);
+- suppress `ovos.mic.listen` when playback ends due to a stop signal (§4.4, §6).
+
+### An audio output service **SHOULD**:
+
+- pass utterance text through the dialog-transformer chain before
+  TTS synthesis (§3.1);
+- pass the synthesised audio through the TTS-transformer chain
+  before enqueueing or emitting it (§3.3).
+
+### An audio output service **MAY**:
+
+- scope stop responses to the `context.session.session_id` in the stop signal (§6).
+
+---
+
+## See also
+
+- *Utterance Lifecycle and Pipeline Specification* (OVOS-PIPELINE-1)
+  — the pipeline iteration, `ovos.utterance.speak`, and `ovos.utterance.handled`.
+- *Bus Message Specification* (OVOS-MSG-1) — the envelope and
+  derivations used for all bus communication.
+- *Transformer Injection Point Specification* (OVOS-TRANSFORM-1) —
+  the dialog-transformer and TTS-transformer chains that plug into
+  the rendering pipeline.
+- *Stop Pipeline Plugin Specification* (OVOS-STOP-1) — the universal
+  `ovos.stop` broadcast that the audio output service responds to.