Add visualizer@v1 role
#86
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -218,7 +218,7 @@ sequenceDiagram | |
| Server->>Client: binary Types 8-11 (artwork channels 0-3) | ||
| end | ||
| alt Visualizer role | ||
| Server->>Client: binary Type 16 (visualization data) | ||
| Server->>Client: binary Types 16-20 (loudness, beat, f_peak, spectrum, peak) | ||
| end | ||
| end | ||
|
|
||
|
|
@@ -395,10 +395,11 @@ Instructs clients to clear buffers without ending the stream. Used for seek oper | |
|
|
||
| ### Client → Server: `stream/request-format` | ||
|
|
||
| Request different stream format (upgrade or downgrade). Available for clients with the `player` or `artwork` role. | ||
| Request different stream format (upgrade or downgrade). Available for clients with the `player`, `artwork`, or `visualizer` role. | ||
|
|
||
| - `player?`: object - only for clients with the `player` role ([see player object details](#client--server-streamrequest-format-player-object)) | ||
| - `artwork?`: object - only for clients with the `artwork` role ([see artwork object details](#client--server-streamrequest-format-artwork-object)) | ||
| - `visualizer?`: object - only for clients with the `visualizer` role ([see visualizer object details](#client--server-streamrequest-format-visualizer-object)) | ||
|
|
||
| [Application-specific roles](#application-specific-roles) may also include objects in this message (keys starting with `_`). | ||
|
|
||
|
|
@@ -704,36 +705,107 @@ The timestamp indicates when this artwork should be displayed. Clients must tran | |
| **Clearing artwork:** To clear the currently displayed artwork on a specific channel, the server sends an empty binary message (only the message type byte and timestamp, with no image data) for that channel. | ||
|
|
||
| ## Visualizer messages | ||
| This section describes messages specific to clients with the `visualizer` role, which create visual representations of the audio being played. Visualizer clients receive audio analysis data like FFT information that corresponds to the current audio timeline. | ||
| This section describes messages specific to clients with the `visualizer` role, which create visual representations of the audio being played. Visualizer clients receive audio analysis data computed from the audio currently playing in the group. | ||
|
|
||
| Each visualizer binary message carries exactly one frame. The server emits messages in non-decreasing timestamp order so clients can process them in arrival order. Types the server cannot stream for the current source are silently omitted from the set echoed in [`stream/start`](#server--client-streamstart-visualizer-object). `beat` and `peak` are event-driven and not throttled by `rate_max`; all other types are periodic. | ||
|
|
||
| **`beat` vs `peak`:** `beat` is a musical pulse derived from tempo/beat tracking, landing on the rhythmic grid with downbeats marking bar starts. Accurate beat detection often relies on offline analysis (e.g. neural beat trackers); servers without such analysis omit the type. `peak` is an energy onset detected live from the audio stream and fires on any transient (drum hits, cymbal crashes, attacks), independent of the rhythmic grid. A `beat` and a `peak` can fire on the same hit, or a `peak` can fire mid-bar with no `beat`. | ||
|
|
||
| ### Client → Server: `client/hello` visualizer@v1 support object | ||
|
|
||
| The `visualizer@v1_support` object in [`client/hello`](#client--server-clienthello) has this structure: | ||
|
|
||
| - `visualizer@v1_support`: object | ||
| - Desired FFT details (to be determined) | ||
| - `buffer_capacity`: integer - max size in bytes of visualization data messages in the buffer that are yet to be displayed | ||
| - `types`: string[] - visualization data types requested by the client: 'beat', 'loudness', 'f_peak', 'peak', 'pitch', 'spectrum' | ||
| - `buffer_capacity`: integer - max total size in bytes of buffered visualizer binary messages, counting each message's full wire size (message-type byte + timestamp + data) | ||
| - `rate_max`: integer - maximum periodic visualization frames per second (applies to `loudness`, `f_peak`, `spectrum`). Beat events are not throttled and are bounded by tempo. Clients should set this to their display refresh rate | ||
| - `spectrum?`: object - spectrum configuration, required if `types` includes 'spectrum' | ||
| - `n_disp_bins`: integer - number of display bins (i.e. bars on a graphical equalizer) | ||
| - `scale`: 'mel' | 'log' | 'lin' - mapping from FFT frequencies to display bins. 'mel' uses the HTK mel formula (`m = 2595 * log10(1 + f/700)`), 'log' uses base-10 logarithm of frequency, 'lin' uses linear frequency spacing | ||
| - `f_min`: integer - lowest frequency in Hz to bin | ||
| - `f_max`: integer - highest frequency in Hz to bin | ||
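A minimal sketch of how a client might sanity-check its own `visualizer@v1_support` object before sending it. Only the rule that `spectrum` is required when `types` includes 'spectrum', and the three `scale` values, come from the field list above; the function name `check_support`, the error strings, and the example values are hypothetical.

```python
SCALES = {"mel", "log", "lin"}  # scale values listed in the field description


def check_support(support: dict) -> list[str]:
    """Return a list of problems found in a visualizer@v1_support dict."""
    problems = []
    types = support.get("types", [])
    spectrum = support.get("spectrum")
    # `spectrum` is required if `types` includes 'spectrum'.
    if "spectrum" in types and spectrum is None:
        problems.append("'spectrum' requested but no spectrum object given")
    if spectrum is not None and spectrum.get("scale") not in SCALES:
        problems.append("spectrum.scale must be 'mel', 'log' or 'lin'")
    return problems


# Illustrative values only; none of these numbers are mandated by the spec.
support = {
    "types": ["loudness", "beat", "spectrum"],
    "buffer_capacity": 4096,
    "rate_max": 60,
    "spectrum": {"n_disp_bins": 16, "scale": "mel", "f_min": 40, "f_max": 16000},
}
```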
|
|
||
| ### Server → Client: `stream/start` visualizer object | ||
|
|
||
| The `visualizer` object in [`stream/start`](#server--client-streamstart) has this structure: | ||
|
|
||
| - `visualizer`: object | ||
| - FFT details (to be determined) | ||
| - `types`: string[] - visualization data types the server will stream | ||
| - `rate_max`: integer - periodic frames per second the server will emit | ||
| - `tracks_downbeats`: boolean - only if `types` includes 'beat'. True if the server's beat tracker also identifies bar starts (downbeats). When false, the downbeat flag on `beat` messages is always 0 | ||
| - `spectrum?`: object - spectrum configuration, only if `types` includes 'spectrum' | ||
| - `n_disp_bins`: integer - number of display bins | ||
| - `scale`: 'mel' | 'log' | 'lin' - mapping from FFT frequencies to display bins | ||
| - `f_min`: integer - lowest frequency in Hz | ||
| - `f_max`: integer - highest frequency in Hz | ||
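As an illustration of the 'mel' scale, the sketch below derives display-bin edge frequencies from `n_disp_bins`, `f_min`, and `f_max` using the HTK formula quoted in the `client/hello` section. Spacing the edges evenly in the mel domain is an assumption; the text gives only the conversion formula, not how bin boundaries are placed. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK mel formula: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_bin_edges(n_disp_bins: int, f_min: float, f_max: float) -> list[float]:
    """Return n_disp_bins + 1 edge frequencies in Hz.

    Assumption: edges are equally spaced in mel between f_min and f_max.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / n_disp_bins
    return [mel_to_hz(m_lo + i * step) for i in range(n_disp_bins + 1)]
```

A client could use such edges to label the bars of a spectrum display; bin `i` would then cover the range from edge `i` to edge `i + 1`.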
|
|
||
| ### Client → Server: `stream/request-format` visualizer object | ||
|
|
||
| The `visualizer` object in [`stream/request-format`](#client--server-streamrequest-format) has this structure: | ||
|
|
||
| - `visualizer`: object | ||
| - `types?`: string[] - new set of visualization data types | ||
| - `rate_max?`: integer - new periodic frames-per-second cap | ||
| - `buffer_capacity?`: integer - new buffer capacity in bytes | ||
| - `spectrum?`: object - new spectrum configuration ([see spectrum object details](#client--server-clienthello-visualizerv1-support-object)) | ||
|
|
||
| All fields are optional; omitted fields keep their current value. | ||
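The "omitted fields keep their current value" rule can be sketched as a shallow merge over the client's current settings. This is only a model of the requested state: the function name `apply_request` is hypothetical, and replacing the `spectrum` object as a whole (rather than merging its sub-fields) is an assumption.

```python
REQUEST_FIELDS = ("types", "rate_max", "buffer_capacity", "spectrum")


def apply_request(current: dict, request: dict) -> dict:
    """Return the settings that result from a stream/request-format visualizer object.

    Fields missing from `request` keep their value from `current`.
    Assumption: a supplied `spectrum` object replaces the old one entirely.
    """
    updated = dict(current)
    for key in REQUEST_FIELDS:
        if key in request:
            updated[key] = request[key]
    return updated
```

For example, a request of `{"rate_max": 30}` would change only the frame-rate cap and leave `types`, `buffer_capacity`, and `spectrum` untouched.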
|
|
||
| Response: [`stream/start`](#server--client-streamstart) with the new visualizer configuration. | ||
|
|
||
| ### Server → Client: `stream/clear` visualizer | ||
|
|
||
| When [`stream/clear`](#server--client-streamclear) includes the visualizer role, clients should clear all buffered visualization data and continue with data received after this message. | ||
|
|
||
| ### Server → Client: Visualization Data (Binary) | ||
|
Member
Author
Another potential concern is the number of messages we are sending. There are two reasons why messages are completely split now:
Member
Author
In case this is really a problem, we could also later release a This just needs to be tested (with encryption) on an ESP8266 or similar. |
||
|
|
||
| Binary messages should be rejected if there is no active stream. | ||
| Binary messages should be rejected if there is no active stream. Each visualization `type` has its own binary message type. Every message carries exactly one frame of `[timestamp:8][data]`: | ||
|
|
||
| - Byte 0: message type (uint8, one of the types listed below) | ||
| - Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when this data should be displayed. Clients must translate this server timestamp to their local clock using the offset computed from clock synchronization | ||
| - Remaining bytes: data, layout per type below | ||
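The common framing described above (one type byte, an 8-byte big-endian signed timestamp, then type-specific data) can be split with Python's `struct` module. This is a sketch, not a reference parser; the function name `split_message` is hypothetical, and raising `ValueError` on short input is an illustrative choice.

```python
import struct

# uint8 message type followed by a big-endian int64 timestamp: 9 bytes total.
HEADER = struct.Struct(">Bq")


def split_message(message: bytes) -> tuple[int, int, bytes]:
    """Split a visualizer binary message into (type, timestamp_us, data)."""
    if len(message) < HEADER.size:
        raise ValueError("message shorter than the 9-byte header")
    msg_type, timestamp_us = HEADER.unpack_from(message, 0)
    return msg_type, timestamp_us, message[HEADER.size:]
```

The returned timestamp is still in server clock time; translating it to the local clock is a separate step that depends on the clock synchronization offset.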
|
|
||
| `loudness`, `spectrum` bins, and the `f_peak` amplitude use the full `uint16` range 0-65535, where 0 = silence and 65535 = full scale. Values are A-weighted and dB-scaled: -60 dB → 0, 0 dB → 65535, mapped linearly across that range. | ||
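The linear dB mapping above (-60 dB to 0, 0 dB to 65535) can be sketched as a pair of conversion helpers. The function names are hypothetical; clamping inputs outside the -60 to 0 dB range and rounding to the nearest integer are assumptions, since the text only defines the two endpoints and the linear mapping between them.

```python
DB_FLOOR = -60.0   # maps to 0
DB_CEIL = 0.0      # maps to 65535
U16_MAX = 65535


def db_to_u16(db: float) -> int:
    """Map a dB value onto 0..65535 (assumption: out-of-range input is clamped)."""
    clamped = min(DB_CEIL, max(DB_FLOOR, db))
    return round((clamped - DB_FLOOR) / (DB_CEIL - DB_FLOOR) * U16_MAX)


def u16_to_db(value: int) -> float:
    """Map a received uint16 value back to dB in the range -60..0."""
    return value / U16_MAX * (DB_CEIL - DB_FLOOR) + DB_FLOOR
```

With this mapping one step of the `uint16` value corresponds to 60 / 65535 dB, which is under 0.001 dB.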
|
|
||
| Message types `22` and `23` are reserved for future visualizer types within the role's 16-23 allocation and must not be used by implementations. | ||
|
|
||
| #### `loudness` — message type `16` | ||
|
|
||
| - 2 bytes: `uint16` value | ||
|
|
||
| Overall A-weighted loudness in dB (see scaling above). | ||
|
|
||
| #### `beat` — message type `17` | ||
|
|
||
| - 1 byte: `uint8` flags. Bit 0 = downbeat (bar start). Bits 1-7 are reserved: the server must set them to zero and the client must ignore them | ||
|
|
||
| Musical beat event. Bit 0 is only meaningful when [`stream/start`](#server--client-streamstart-visualizer-object) sets `tracks_downbeats: true`; otherwise it is always 0. | ||
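A small sketch of decoding the `beat` data byte on the client, masking off the reserved bits and honoring `tracks_downbeats` from `stream/start`. The function name `is_downbeat` is hypothetical, and tolerating extra trailing bytes is an illustrative choice rather than something the text prescribes.

```python
DOWNBEAT_BIT = 0x01  # bit 0 of the flags byte


def is_downbeat(data: bytes, tracks_downbeats: bool) -> bool:
    """Return True if this beat message marks a bar start.

    Bits 1-7 are reserved and ignored. When the server reported
    tracks_downbeats as false, bit 0 carries no information.
    """
    if len(data) < 1:
        raise ValueError("beat message needs a 1-byte flags field")
    return tracks_downbeats and bool(data[0] & DOWNBEAT_BIT)
```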
|
|
||
| #### `f_peak` — message type `18` | ||
|
|
||
| - 2 bytes: `uint16` freq - dominant frequency in Hz (0 = no peak detected, amp must also be 0) | ||
| - 2 bytes: `uint16` amp - amplitude (see scaling above) | ||
|
|
||
| Tracks the dominant FFT bin. For pitched sources strong harmonics can dominate the fundamental, so `f_peak` is not a substitute for `pitch`. | ||
|
|
||
| #### `spectrum` — message type `19` | ||
|
|
||
| - 2*n bytes: `uint16[n]` bins from low to high frequency. `n` = `n_disp_bins` in [`stream/start`](#server--client-streamstart-visualizer-object) | ||
|
|
||
| Magnitude per display bin. Servers may impose an implementation-defined upper bound on `n_disp_bins` to keep per-frame size sensible. | ||
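A sketch of decoding a `spectrum` data field into a tuple of bin magnitudes. The byte order of the `uint16` values is not stated in this section; big-endian is assumed here to match the timestamp field. The function name `decode_spectrum` is hypothetical.

```python
import struct


def decode_spectrum(data: bytes, n_disp_bins: int) -> tuple[int, ...]:
    """Decode uint16[n] bins, ordered from low to high frequency.

    Assumption: values are big-endian, like the message timestamp.
    `n_disp_bins` is the value the server echoed in stream/start.
    """
    if len(data) != 2 * n_disp_bins:
        raise ValueError("spectrum data length does not match n_disp_bins")
    return struct.unpack(f">{n_disp_bins}H", data)
```

Checking the length against the negotiated `n_disp_bins` lets a client drop frames that belong to a previous spectrum configuration instead of misreading them.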
|
|
||
| #### `peak` — message type `20` | ||
|
|
||
| - 1 byte: `uint8` strength | ||
|
|
||
| Energy onset event. Fires on any transient (drum hits, cymbal crashes, attacks), independent of musical timing. `strength` 0-255 lets clients scale flash intensity. | ||
|
|
||
| #### `pitch` — message type `21` | ||
|
Member
Author
I have a couple of concerns with pitch that came up while implementing this in sendspin-cli.
But if there is no reliable way to get a single useful pitch value, we could also just consider removing |
||
|
|
||
| - Byte 0: message type `16` (uint8) | ||
| - Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when the visualization should be displayed by the device | ||
| - Rest of bytes: visualization data | ||
| - 2 bytes: `uint16` midi (8.8 fixed-point) - fractional MIDI note (integer part = MIDI note number, e.g. 69 = A4; fractional part = sub-semitone for vibrato/glissando) | ||
| - 1 byte: `uint8` confidence - 0-255. Clients should ignore pitches below their own threshold | ||
|
|
||
| The timestamp indicates when this visualization data should be displayed, corresponding to the audio timeline. Clients must translate this server timestamp to their local clock using the offset computed from clock synchronization. | ||
| Perceived pitch. Emitted periodically up to `rate_max`. Distinct from `f_peak`, which tracks the dominant FFT bin. | ||
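A sketch of decoding the `pitch` data field and converting the 8.8 fixed-point MIDI value to a frequency. The big-endian byte order of the `uint16` is an assumption (only the timestamp's byte order is stated), as is the conversion to Hz using equal temperament with A4 = 440 Hz; the text itself only says that note 69 is A4. The function name `decode_pitch` is hypothetical.

```python
import struct


def decode_pitch(data: bytes) -> tuple[float, float, int]:
    """Return (fractional MIDI note, frequency in Hz, confidence 0-255).

    Assumptions: big-endian uint16, equal temperament, A4 (MIDI 69) = 440 Hz.
    """
    if len(data) != 3:
        raise ValueError("pitch data must be 3 bytes")
    raw, confidence = struct.unpack(">HB", data)
    midi = raw / 256.0  # 8.8 fixed point: high byte = note, low byte = fraction
    freq_hz = 440.0 * 2.0 ** ((midi - 69.0) / 12.0)
    return midi, freq_hz, confidence
```

A client would typically compare `confidence` against its own threshold before using the note, as the field description suggests.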
|
|
||
| ## Color messages | ||
| This section describes messages specific to clients with the `color` role, which receive colors derived from the current audio. Colors may be extracted from album artwork, provided by the music source, or manually programmed by the server. | ||
|
|
||