Add visualizer@v1 role #86
@Aircoookie this looks to be the successor to #28. Does this include all the information you would like available?
Implements the `visualizer@v1` role: the server computes audio features (loudness, spectrum, dominant frequency, onset peaks, pitch, and beats) and streams them to visualizer clients as per-frame binary messages, replacing the batched `_draft_r1` blob. Legacy `visualizer@_draft_r1` clients are still accepted.

Spec:
* Sendspin/spec#86

Beats come from server-fed offline analysis via `append_beat_schedule` and ride the wire interleaved with periodic frames in timestamp order. `beat` is deferred from `stream/start` until the first schedule lands.

### Late join and pacing

A visualizer grouped onto an active stream now receives buffered audio immediately instead of waiting up to the producer-buffer depth (about 30s) for the first FFT frames. While a beat schedule is still computing, periodic frames are not sent too far in advance (only about 3s). This keeps the timestamps of all visualizer data non-decreasing while still delivering data to clients as early as possible.

### Optional `pitch` computation

`SendspinServer.set_visualizer_pitch_enabled(enabled=False)` disables the heaviest feature (YINFFT) server-wide in case it turns out to be too computationally intensive. The quality of the `pitch` data and the algorithm also need more testing. All other data is rather cheap to compute since the FFT constants (Hanning window, frequency grid, spectrum bin assignment) are cached, so the steady per-frame cost stays low.

### Breaking changes

`Roles.VISUALIZER` switches to `"visualizer@v1"` and the exported visualizer models (`VisualizerFrame`, `ClientHelloVisualizerSupport`, `StreamStartVisualizer.from_support`) are updated to the newer spec version. Connected clients stay backwards-compatible. `visualizer@_draft_r1` remains registered and keeps working as before.
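The interleaving and pacing described above can be illustrated with a minimal sketch. It assumes both streams are already sorted by timestamp. `Frame`, `timestamp_us`, `interleave`, `sendable`, and `PENDING_BEATS_LEAD_US` are hypothetical names chosen for this example and are not part of the aiosendspin API; only the roughly 3s lead comes from the description.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple

# Assumed lead while a beat schedule is still pending (about 3s per the description).
PENDING_BEATS_LEAD_US = 3_000_000


class Frame(NamedTuple):
    timestamp_us: int  # hypothetical field: presentation time in microseconds
    kind: str          # e.g. "spectrum", "loudness", "beat"


def interleave(periodic: Iterable[Frame], beats: Iterable[Frame]) -> Iterator[Frame]:
    """Merge two timestamp-sorted streams into one non-decreasing stream."""
    return heapq.merge(periodic, beats, key=lambda f: f.timestamp_us)


def sendable(frames: Iterable[Frame], now_us: int, lead_us: int) -> List[Frame]:
    """Keep only the frames that are at most lead_us ahead of now_us."""
    return [f for f in frames if f.timestamp_us <= now_us + lead_us]
```

Holding back periodic frames that are too far ahead is what leaves room for late-arriving beats to be merged in without ever sending a timestamp that goes backwards.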
When [`stream/clear`](#server--client-streamclear) includes the visualizer role, clients should clear all buffered visualization data and continue with data received after this message.
### Server → Client: Visualization Data (Binary)
Another potential concern is the number of messages we are sending.
I don't think the overhead of WebSocket messages is too large though, so this just needs testing on more low-powered hardware.
There are two reasons why messages are completely split now:
- Consistency with other roles: all other roles already use one message per datum.
- Difficulty of defining batching behavior. Requiring batching of multiple messages is difficult since it's always a compromise between latency and message count. But leaving batching open to the server would cause most implementations to never use it, defeating the whole purpose.
In case this is really a problem, we could also later release a visualizer@v2 if the overhead turns out to be bigger than expected.
This just needs to be tested (with encryption) on an ESP8266 or similar.
#4042)

Implements the `visualizer@v1` role on the Sendspin server and updates Music Assistant to aiosendspin 6.0.1. The older `visualizer@_draft_r1` role is left unchanged and still carries no beats. Clients connecting with the `visualizer@_draft_r1` role remain fully functional.

## Visualizer

The Sendspin player derives a per-track beat schedule from `smart_fades` analysis and streams it to visualizer clients over `visualizer@v1`. The Hue Entertainment plugin is reworked to use the v1 role, including beat-based colour cycling with selectable modes. Older clients using `visualizer@_draft_r1` stay fully compatible after this PR.

## Player timing

Reports each player's lead-time and live-source hints to the push stream so it schedules the first chunk far enough ahead. This stops the AirPlay bridge from cutting off the start of a track. Older clients not implementing this stay fully compatible after this PR.

## Repeat and shuffle

Repeat and shuffle now ride on controller state, following the aiosendspin 6.0 move off metadata. The server sets the controller state and keeps the legacy metadata copy in sync so older and newer clients both work.

## Known limitations

The Hue plugin's beat effects need the track's beat analysis. On the first play of a track whose analysis has not been computed yet, the lights use the peak and onset fallback for the first stretch (up to ~30s) until beats arrive.

During a smart-fades transition the beats of the crossfading tracks are used as is and can drift slightly while the two tracks overlap. Alignment is correct again once the transition completes.

The Hue bridge uses a small visualizer buffer for near-realtime delivery, so after a track change beats take a few seconds to start arriving (the first moments of the new schedule are kept near the playhead and not delivered yet). The lights use the peak and onset fallback until they do.

Beat data is loaded by a lightweight retry poller rather than an audio-analysis event subscription. This is intentional, to keep the changes contained to the Sendspin provider and avoid touching core Music Assistant code before the 2.9 release.

Visualizer pitch detection is disabled server-wide for now. It is the heaviest visualizer DSP and its result quality is still mixed, so it needs more testing before being enabled.

## Relevant Specification this PR implements

- Sendspin/spec#86
- Sendspin/spec#69
- Sendspin/spec#81

## Testing

I tested this locally, on a Raspberry Pi 4, and on a Home Assistant Green. Lower-powered clients may not be powerful enough to compute `beats` in time, but the Hue bridge falls back to `peaks` if that's the case.
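The retry poller mentioned under the known limitations could look roughly like the sketch below. This is only an illustration of the general pattern, not the code in this PR: `poll_for_beats`, the `load` callback, and the interval and attempt defaults are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def poll_for_beats(
    load: Callable[[], Awaitable[Optional[Sequence[float]]]],
    interval_s: float = 1.0,   # assumed polling interval
    max_attempts: int = 30,    # assumed upper bound on retries
) -> Optional[Sequence[float]]:
    """Call load() until it returns a beat schedule or the attempts run out."""
    for attempt in range(max_attempts):
        beats = await load()
        if beats is not None:
            return beats
        # Sleep between attempts, but not after the final one.
        if attempt + 1 < max_attempts:
            await asyncio.sleep(interval_s)
    return None
```

A caller would keep using the peak and onset fallback while this coroutine is pending and switch to beats once it returns a schedule.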
Energy onset event. Fires on any transient (drum hits, cymbal crashes, attacks), independent of musical timing. `strength` 0-255 lets clients scale flash intensity.
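As a rough illustration of how a client might use `strength`, the sketch below maps the byte value linearly onto a brightness range. The function name `flash_brightness` and the linear mapping are assumptions for this example; the quoted text only states that `strength` spans 0-255.

```python
def flash_brightness(strength: int, max_brightness: float = 1.0) -> float:
    """Map a peak strength byte (0-255) linearly onto 0.0..max_brightness."""
    if not 0 <= strength <= 255:
        raise ValueError("strength must fit in one unsigned byte")
    return max_brightness * strength / 255
```

A client could just as well apply a perceptual curve (for example a gamma) on top of this; the linear form is only the simplest choice.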
#### `pitch` — message type `21`
I have a couple of concerns with pitch that came up while implementing this in sendspin-cli.
- First of all, the pitch given by `aiosendspin` isn't very precise or useful (it is disabled in Music Assistant for this reason). But that's more of an implementation issue.
- Secondly, how long is a pitch supposed to be valid? If the server stops emitting when there's nothing tonal, the last value just sticks until the next track or so. One could interpret "ignore below your own threshold" as "clear when confidence is below your threshold or 0", but that's not defined in the specification.
- And then the confidence scale itself: every client picking its own threshold, with no defined meaning for the threshold, makes behavior inconsistent between server and client implementations.

But if there is no reliable way to get a single useful pitch value, we could also just consider removing pitch from the specification.
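To make the ambiguity concrete, here is one possible client-side interpretation, sketched under explicit assumptions: a pitch is cleared when confidence falls below a client-chosen threshold, and it also expires after a fixed age. None of this is defined by the specification. `PitchHold`, the 0-255 confidence scale, the threshold of 128, and the 500ms expiry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PitchHold:
    min_confidence: int = 128   # hypothetical client-chosen threshold (assumes 0-255 scale)
    max_age_us: int = 500_000   # hypothetical expiry, not taken from the spec
    _freq_hz: Optional[float] = None
    _seen_at_us: int = 0

    def update(self, timestamp_us: int, freq_hz: float, confidence: int) -> None:
        """Store a confident pitch, or clear the held value on a weak one."""
        if confidence >= self.min_confidence:
            self._freq_hz = freq_hz
            self._seen_at_us = timestamp_us
        else:
            self._freq_hz = None

    def current(self, now_us: int) -> Optional[float]:
        """Return the held pitch, or None if it was cleared or has expired."""
        if self._freq_hz is None or now_us - self._seen_at_us > self.max_age_us:
            return None
        return self._freq_hz
```

Two clients picking different values for `min_confidence` or `max_age_us` would behave differently on the same stream, which is exactly the inconsistency raised above.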
Adds the visualizer role, based on the previous `visualizer@_draft_r1` proposal.

This alternative version aims to resolve a couple of gaps that were open in the previous PR:
### One frame per binary message

Batching multiple frames of mixed types into one WebSocket message, as in the previous proposal, forces the server to either delay frames waiting for siblings (hurting low-latency playback) or send tiny batches anyway. It also makes ordering awkward across batches and pushes sort work onto the client. Per-type messages keep ordering trivial. They are also more consistent with how other roles structure their binary messages.

The only concern with this approach is the increased number of messages. We could alternatively batch multiple message types with the same timestamp together.
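The alternative mentioned above, batching message types that share a timestamp, could be sketched as follows. This is not part of the proposal; `batch_by_timestamp` and the `(timestamp_us, type_name)` tuple shape are assumptions for illustration, and the input is assumed to be sorted by timestamp already.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def batch_by_timestamp(
    frames: Iterable[Tuple[int, str]],
) -> List[Tuple[int, List[str]]]:
    """Group (timestamp_us, type_name) pairs that share a timestamp.

    The input must already be sorted by timestamp, because groupby only
    merges adjacent items with equal keys.
    """
    return [
        (ts, [kind for _, kind in group])
        for ts, group in groupby(frames, key=itemgetter(0))
    ]
```

Because a batch never spans more than one timestamp, the server would not have to delay a frame while waiting for later data, which is the latency problem with mixed-timestamp batches.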
### Added `peak` and `pitch`

Both give clients more to work with for more reactive effects. They do not replace the existing types since they are still different: `peak` fires on any transient independent of the musical grid (where `beat` is rhythmic), and `pitch` tracks the perceived fundamental (where `f_peak` tracks the dominant FFT bin, which strong harmonics can hijack).

### Downbeat flag on `beat`

Lets clients drive bar-aware effects. `stream/start` advertises `tracks_downbeats` so clients know whether to trust the bit. Accurate beat detection is hard and often relies on offline analysis, so servers without it omit the `beat` type entirely. Even when supported, it may be unavailable for some content (live streams, sparse non-percussive material).

### Top-level `rate_max`

Now bounds all periodic types, not just `spectrum`. `beat` and `peak` are event-driven and unthrottled.

### Scaling

Pins down what was previously hand-waved as "perceptual weighting" so implementations agree on the numbers.
### Version name

Called `@v1` here, but we might rename to `@_draft_r2` if we find some concerns after implementing a prototype. This is also the reason why this is marked as draft for now.