
Reduce audible startup sync correction #255

Open
trisweb wants to merge 1 commit into Sendspin:main from trisweb:fix-audio-startup-warble

@trisweb (Contributor) commented May 24, 2026

Summary

This reduces audible pitch shift / warble during Sendspin client playback startup and other stream transitions.

The main issue appears to be that the client can start playback with a consistent initial sync offset and then correct that offset quite aggressively using sample insert/drop correction. On my Linux endpoint this was especially noticeable at the beginning of tracks. This was previously discussed in #107, but that discussion was inconclusive and the cause proved difficult to pin down.

With analysis help from GPT5.5, and with my full understanding and detailed review of every change, this PR makes three related changes:

  • Fixes the drop-frame correction path so it discards one input frame and outputs the following frame, instead of repeating the previous output frame while consuming two input frames (bug fix)
  • Reduces the maximum correction rate from +/-4% to +/-0.2% (much more reliably below the audible threshold)
  • Adds a short startup grace period before sync correction begins, so DAC/time-sync estimates can settle before the client starts inserting or dropping samples (the most impactful improvement: previously, correction was reacting to offsets that were not really present, but only appeared to be because too little timing data had been collected yet)
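To put the rate change in musical terms: playing audio at a speed factor s shifts its pitch by 1200 * log2(s) cents (100 cents is one semitone). The small helper below is only an illustration for this description, not code from the PR. It shows that +/-4% corresponds to roughly 68 to 71 cents, about two-thirds of a semitone, while +/-0.2% is about 3.5 cents.

```python
import math


def speed_to_cents(speed: float) -> float:
    """Pitch shift, in cents, caused by playing audio at `speed` times normal rate."""
    # 1200 cents per octave; one octave is a factor of 2 in frequency.
    return 1200 * math.log2(speed)


for pct in (4.0, 0.2):
    up = speed_to_cents(1 + pct / 100)
    down = speed_to_cents(1 - pct / 100)
    print(f"+/-{pct}%: {up:+.1f} / {down:+.1f} cents")
```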

Analysis

I was hearing pitch shift and warble on a fully up-to-date Sendspin endpoint, most noticeably at playback start and during track changes.

Looking through sendspin/audio.py, the most suspicious path was the sync correction logic. The previous drop correction branch did this:

  1. Read one frame.
  2. Read another frame.
  3. Output the previous frame again.

That effectively produced a duplicate-then-skip pattern, which is more audible than a simple one-frame drop.
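The difference between the two drop paths is easiest to see on a list of frame numbers. The sketch below is a simplified, hypothetical model (`apply_drop` and its parameters are illustrative, not sendspin's actual code): each output slot normally emits one input frame, and a drop is requested at a chosen output slot.

```python
def apply_drop(frames: list[int], drop_slots: set[int], *, old_behavior: bool = False) -> list[int]:
    """Emit frames, applying a one-frame drop correction at the given output slots.

    Hypothetical model for illustration; not sendspin's actual implementation.
    """
    out: list[int] = []
    i = 0  # read position in the input frames
    while i < len(frames):
        if out and len(out) in drop_slots:
            if old_behavior:
                # Old path: read one frame, read another, then output the
                # previous output frame again (a duplicate, then a two-frame skip).
                i += 2
                out.append(out[-1])
                continue
            # Fixed path: discard exactly one input frame...
            i += 1
            if i >= len(frames):
                break
        # ...and output the following frame as normal.
        out.append(frames[i])
        i += 1
    return out


frames = list(range(8))
print(apply_drop(frames, {3}))                     # [0, 1, 2, 4, 5, 6, 7]
print(apply_drop(frames, {3}, old_behavior=True))  # [0, 1, 2, 2, 5, 6, 7]
```

Both paths consume eight input frames to produce seven output frames, so the net timing correction is the same. The old path, however, holds frame 2 and then jumps over frames 3 and 4, instead of making a single one-frame jump.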

Separately, the correction loop allowed up to +/-4% playback speed correction over a 2 second target window. In real playback that is enough to be heard as pitch movement, especially right after startup while the first sync estimate is still settling.

This was changed to a maximum of +/-0.2% correction over an 8 second window. That is more conservative, but still within reasonable sync-delay expectations: if users would reasonably accept out-of-sync clients converging within 8 seconds (counting to 8 gives a feel for how long that is), this is a reasonable default. Importantly, it is well below the threshold of audible pitch shift or warble while still providing a way to converge.

In any case, if the sync is too far out, a reanchor will be triggered.
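A minimal sketch of how a clamped correction rate and a reanchor fallback fit together. It is a simplified model under stated assumptions, not the code in sendspin/audio.py: `plan_correction`, `max_offset_per_window_s`, and all constant names are hypothetical; `REANCHOR_THRESHOLD_S` is an invented value because the description does not state the real threshold; and a negative error is assumed to be corrected by slowing playback (speed below 1.0, i.e. inserting frames), which matches the -42 ms / inserted-frames pattern in the stats below. The real loop may filter the error, so its numbers will not match this model exactly.

```python
# Illustrative parameters; names are hypothetical, not sendspin's identifiers.
MAX_CORRECTION_RATE = 0.002   # +/-0.2% (previously 0.04, i.e. +/-4%)
TARGET_WINDOW_S = 8.0         # previously 2.0 s
REANCHOR_THRESHOLD_S = 0.5    # hypothetical value, for illustration only


def plan_correction(
    sync_error_s: float,
    window_s: float = TARGET_WINDOW_S,
    max_rate: float = MAX_CORRECTION_RATE,
    reanchor_threshold_s: float = REANCHOR_THRESHOLD_S,
) -> tuple[str, float]:
    """Return ("reanchor", 1.0) or ("correct", speed_factor)."""
    if abs(sync_error_s) > reanchor_threshold_s:
        # Too far out to slew gently: resynchronize instead.
        return ("reanchor", 1.0)
    # Rate that would cancel the error over the target window, then clamp it.
    rate = sync_error_s / window_s
    rate = max(-max_rate, min(max_rate, rate))
    return ("correct", 1.0 + rate)


def max_offset_per_window_s(window_s: float, max_rate: float) -> float:
    """Largest offset that can be removed within one window at the rate cap."""
    return window_s * max_rate


print(plan_correction(-0.042))                               # new limits
print(plan_correction(-0.042, window_s=2.0, max_rate=0.04))  # old limits
print(max_offset_per_window_s(8.0, 0.002))  # 0.016 (16 ms)
print(max_offset_per_window_s(2.0, 0.04))   # 0.08 (80 ms)
```

In this model, the new cap removes at most 2 ms of offset per second of playback, so offsets up to 16 ms are absorbed within the 8 second window and larger ones take proportionally longer (about 21 s for a 42 ms offset at the cap) unless they are far enough out to trigger a reanchor. The old limits could slew up to 80 ms within 2 s, at the cost of audible pitch movement.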

Additionally, a 750ms sync correction delay was added; this does not delay audible playback, it only suppresses sample insert/drop correction briefly after playback enters the PLAYING state. That gives the DAC timing and clock-sync estimates a short window to settle before the client starts making speed adjustments based on them. In practice this avoids reacting to the first unstable startup measurements while still allowing scheduled playback to begin on time.
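The grace period amounts to a time gate in front of the insert/drop logic. The sketch below assumes a monotonic clock and a hook that fires on every transition into PLAYING; `StartupCorrectionGate` and its methods are hypothetical names, and only the 750 ms value comes from this PR.

```python
import time
from dataclasses import dataclass
from typing import Optional

# 750 ms, per this PR; the names below are hypothetical.
STARTUP_CORRECTION_DELAY_S = 0.75


@dataclass
class StartupCorrectionGate:
    """Suppresses insert/drop correction for a short time after entering PLAYING."""

    delay_s: float = STARTUP_CORRECTION_DELAY_S
    playing_since: Optional[float] = None

    def on_playing(self, now: float) -> None:
        # Called on every transition into PLAYING (stream start, track change, ...).
        self.playing_since = now

    def on_stopped(self) -> None:
        self.playing_since = None

    def correction_allowed(self, now: float) -> bool:
        # Playback itself is never delayed; only speed correction waits.
        if self.playing_since is None:
            return False
        return now - self.playing_since >= self.delay_s


gate = StartupCorrectionGate()
gate.on_playing(time.monotonic())
print(gate.correction_allowed(time.monotonic()))  # False during the grace period
```

Passing `now` in explicitly, rather than reading the clock inside the gate, keeps the logic deterministic and easy to unit-test.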

Real Endpoint Comparison

I tested this on my actual Sendspin Linux endpoint using the same daemon config, audio device, and negotiated format:

  • Device: HiFiBerry DAC+
  • Format: flac:48000:24:2
  • Server: Music Assistant
  • Output latency reported by PortAudio: ~42.7 ms

The original version repeatedly started streams with a sync error of around -42 ms, then corrected aggressively.

Original observed debug stats:

  • Underflows: 0
  • Reanchors: 0
  • Warnings/errors: 0
  • Speed range: 98.29% to 100.11%
  • Max inserted frames per 1s log window: 821
  • Max dropped frames per 1s log window: 55

Patched observed debug stats:

  • Underflows: 0
  • Reanchors: 0
  • Warnings/errors: 0
  • Speed range: 99.78% to 100.04%
  • Max inserted frames per 1s log window: 106
  • Max dropped frames per 1s log window: 21
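As a rough cross-check of the two stat blocks, a speed deviation converts to frames per second at the negotiated 48 kHz rate as |100% - speed| x 48000. The helper below is illustrative only and assumes speeds below 100% are realized by inserting frames and speeds above 100% by dropping them. The insert maxima line up exactly with the speed minima (821 and 106 frames/s), and the drop maxima are within a few frames of the speed maxima (53 vs 55, 19 vs 21).

```python
SAMPLE_RATE_HZ = 48_000  # from the negotiated format flac:48000:24:2


def correction_frames_per_second(speed_pct: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Frames inserted (speed < 100%) or dropped (speed > 100%) per second of output."""
    return round(abs(100.0 - speed_pct) / 100.0 * sample_rate_hz)


for label, speed in [
    ("original min", 98.29),
    ("original max", 100.11),
    ("patched min", 99.78),
    ("patched max", 100.04),
]:
    print(f"{label}: {speed}% -> {correction_frames_per_second(speed)} frames/s")
```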

The startup offset is still visible, but correction is much less aggressive and avoids the previous drop-frame artifact.

Tests

Added focused regression tests for:

  • Drop correction discarding one frame without repeating the previous output frame.
  • Startup correction grace period suppressing immediate insert/drop correction.

Local verification:

uv run --extra test ruff check sendspin/audio.py tests/test_audio.py
uv run --extra test mypy sendspin
uv run --extra test pytest

Results:

All checks passed
Success: no issues found in 30 source files
91 passed

A day of testing in real-world scenarios has also been very successful.
