Define minimum sync standards for the player@v1 role #104

Open: maximmaxim345 wants to merge 4 commits into main from feat/define-minimum-player-syncing-standards

maximmaxim345 (Member) commented Jun 16, 2026
The specification did not define a minimum bar for synced playback. As a result, badly out-of-sync players were still valid, even though the end-user experience was unusable when they were grouped with other Sendspin players.

The new rules:

  • require the use of time-filter and the bursting strategy
  • a floor on the client/time cadence
  • inaudible corrections in steady state
  • max ±0.5% speed in steady state (sliding average over the maximum chunk size, 150 ms)
  • ±2 ms accuracy in steady state
  • a rare one-shot resync (startup, underrun, large disturbance) is exempt from the speed and accuracy bounds
  • no startup warble
  • server chunk duration bounded to 15-150 ms (covers Opus at 20 ms and FLAC at 105 ms)
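
The speed cap above is stated as a sliding average rather than an instantaneous limit. Below is a minimal sketch of how a test harness might check it, assuming playback is logged as segments of `(duration_us, speed)` with `1.0` as the nominal speed. The function name, the log format, and the choice to evaluate only full trailing windows are illustrative assumptions, not taken from the spec.

```python
from collections import deque

WINDOW_US = 150_000    # 150 ms averaging window (maximum chunk size in this PR)
MAX_DEVIATION = 0.005  # ±0.5% steady-state speed cap from this PR


def max_windowed_deviation(segments, window_us=WINDOW_US):
    """Return the largest |mean speed - 1| over trailing windows.

    segments: iterable of (duration_us, speed) pairs (hypothetical log format).
    Each window is the shortest run of recent segments covering at least
    window_us; the mean speed is weighted by segment duration.
    """
    window = deque()
    total_us = 0
    weighted = 0.0
    worst = 0.0
    for duration_us, speed in segments:
        window.append((duration_us, speed))
        total_us += duration_us
        weighted += duration_us * speed
        # Drop the oldest segments while the rest still covers the window.
        while total_us - window[0][0] >= window_us:
            old_us, old_speed = window.popleft()
            total_us -= old_us
            weighted -= old_us * old_speed
        # Only judge full windows, so startup is not over-penalized.
        if total_us >= window_us:
            worst = max(worst, abs(weighted / total_us - 1.0))
    return worst


# A single 10 ms excursion to +4% averages out well below the cap ...
brief = [(10_000, 1.0)] * 14 + [(10_000, 1.04)]
# ... while a sustained +1% offset does not.
sustained = [(20_000, 1.01)] * 10

print(max_windowed_deviation(brief) <= MAX_DEVIATION)
print(max_windowed_deviation(sustained) <= MAX_DEVIATION)
```

Under these assumptions the brief excursion passes and the sustained offset fails, which is the distinction the sliding average is meant to draw.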

To give new implementations a head start, this PR also adds a simple suggested strategy: discrete, bit-exact sample deletion and insertion. The player drops or duplicates whole frames to correct drift, leaving the audio untouched except at the moments it corrects. The number of frames per correction, N, scales with the sample rate. Other algorithms, such as ASRC, are still encouraged.
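
A rough sketch of the drop/duplicate idea, assuming a chunk is held as a list of frames (one tuple of per-channel samples each). The helper names, the choice to correct at the end of the chunk, and the rounding rule for N are assumptions for illustration; the spec text in this PR is authoritative.

```python
def frames_per_correction(sample_rate_hz, base_rate_hz=48_000):
    """N: one frame at 48 kHz, scaled by sample rate (assumed rounding rule)."""
    return max(1, round(sample_rate_hz / base_rate_hz))


def apply_correction(frames, n):
    """Lengthen or shorten a chunk by whole frames, leaving the rest bit-exact.

    n > 0: duplicate the last frame n times (the chunk plays slightly longer).
    n < 0: drop the last |n| frames (the chunk plays slightly shorter).
    n == 0: return an unchanged copy.
    frames must be a non-empty list; at least one frame is always kept.
    """
    if n > 0:
        return frames + [frames[-1]] * n
    if n < 0:
        keep = max(1, len(frames) + n)
        return frames[:keep]
    return list(frames)


chunk = [(0, 0), (1, -1), (2, -2)]  # three stereo frames
n = frames_per_correction(48_000)   # 1 frame at 48 kHz

print(apply_correction(chunk, n))   # one duplicated frame at the end
print(apply_correction(chunk, -n))  # one dropped frame at the end
```

All untouched frames are returned exactly as they came in, which is what "bit-exact" means here; only the frames at the correction point differ.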

Constants

I'm still not sure what exact values we should pick. The values I picked in this draft are:

  • Maximum speed deviation ±0.5%: tighter than sendspin-cli's old ±4% warble; looser than what sendspin-cpp and sendspin-cli hold today (~0.1% and ~0.2%). This is a ~8.6 cent pitch shift, on the edge of being inaudible with music. In steady state, pitch tracks clock drift, so this cap is rarely reached.
  • Accuracy floor ±2 ms: achievable continuously by native clients. It might be difficult for some implementations, such as sendspin-js.
  • Accuracy target ±1 ms: in-room target, enough so individual speakers are not discernible when grouped.
  • Chunk duration bounds 15-150 ms: the 150 ms cap gives headroom over aiosendspin's current 105 ms max (FLAC at 44.1 kHz). The 15 ms floor keeps enough samples per chunk to correct within the ±0.5% cap.
  • Soft correction baseline ≈21 µs per chunk: one frame at 48 kHz, scaled by sample rate. Implementations may use a larger step to keep up with drift, bounded by the ±0.5% cap.
  • Dead band 100 µs: matches sendspin-cpp's existing value.
  • One-shot resync threshold 2 ms: set equal to the accuracy floor so the snap fires before the floor is violated.
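
The arithmetic behind these constants, and how the three thresholds could partition the sync error, can be sketched as follows. The function names and the handling of values exactly on a boundary are assumptions; only the numeric constants come from the list above.

```python
import math

MAX_SPEED_DEVIATION = 0.005  # ±0.5% cap
DEAD_BAND_US = 100           # dead band
RESYNC_THRESHOLD_US = 2_000  # one-shot resync threshold (equal to accuracy floor)


def pitch_shift_cents(speed_ratio):
    """Pitch shift in cents for a playback speed ratio (1200 cents per octave)."""
    return 1200.0 * math.log2(speed_ratio)


def frame_duration_us(sample_rate_hz):
    """Duration of one frame in microseconds."""
    return 1_000_000 / sample_rate_hz


def speed_change(frames, sample_rate_hz, chunk_ms):
    """Fractional speed change from adding or dropping `frames` per chunk."""
    return frames / (sample_rate_hz * chunk_ms / 1000)


def choose_action(error_us):
    """Map a signed sync error to an action (boundary handling is assumed)."""
    magnitude = abs(error_us)
    if magnitude >= RESYNC_THRESHOLD_US:
        return "resync"  # rare one-shot snap, exempt from the bounds
    if magnitude <= DEAD_BAND_US:
        return "hold"    # inside the dead band: leave the audio alone
    return "soft"        # drop or duplicate frames within the speed cap


print(round(pitch_shift_cents(1 + MAX_SPEED_DEVIATION), 2))  # roughly 8.6 cents
print(round(frame_duration_us(48_000), 2))                   # roughly 21 µs
print(speed_change(1, 48_000, 15) <= MAX_SPEED_DEVIATION)    # one frame per 15 ms chunk
print(choose_action(50), choose_action(-500), choose_action(2_500))
```

One frame per 15 ms chunk at 48 kHz works out to about 0.14% speed change, which leaves room under the ±0.5% cap for a somewhat larger step when drift demands it.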

Related issues

Make the correction-quality rule outcome-based (Inaudible corrections)
and exempt a rare one-shot resync from both the speed cap and the
accuracy floor. The speed cap is now a sliding average over 150 ms.
Interpolation only very slightly decreased distortion, so let's drop it
to keep the spec simpler. Other (better) strategies like ASRC are
encouraged.
@maximmaxim345 maximmaxim345 marked this pull request as ready for review June 17, 2026 07:53
maximmaxim345 added a commit to Sendspin/sendspin-js that referenced this pull request Jun 18, 2026
With the proposed minimum sync standards in Sendspin/spec#104 the
default `sync` mode in `sendspin-js` was no longer spec compliant.

This PR tweaks that mode's defaults to follow the spec.
There were two problems before this PR:
- Drift was corrected with playback-rate changes of up to ±2%, which
were audible.
- Startup errors could sit at 100-150 ms for tens of seconds because
a resync re-anchored its backlog forward instead of dropping it.
maximmaxim345 (Member, Author) commented

The limits/constants even work for tricky platforms like sendspin-js running on Chromium (which randomizes the clock for security, AFAIK). It still held within about ±1 ms there.

This works because we define accuracy from the Kalman filter to the output, not end-to-end. And since we also define how the filter has to be implemented and used, this essentially factors out the things we can't control, like network delay and stability.

So even listening through a VPN across the world, it might not be perfectly in sync, but the implementation is still spec compliant, since it's doing as well as it can.
