
Silero VAD #608

Open
docteurZ wants to merge 12 commits into main from silero-vad


@docteurZ
Collaborator

This PR improves audio processing and VAD reliability for transcription.

It refactors utterance buffering to track silence explicitly, trims trailing silence, and filters out low-quality utterances, which reduces noise and false positives. VAD is now pluggable: WebRTC or Silero, selected with the VAD_PROVIDER environment variable. Silero is more CPU-intensive but generally more robust in noisy conditions, and it does not need high RMS input levels to perform well. The Docker setup was updated to support Silero on CPU only.

Streaming audio can optionally be buffered into chunks for async transcription, with cleaner flushing and tuned silence thresholds. Overall, this improves accuracy, configurability, and async transcription support.
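The optional chunk buffering for async transcription can be pictured as a small accumulator that releases a batch once enough audio has arrived and is flushed explicitly at the end of the stream. This is a hypothetical sketch: the class name `AudioChunkBuffer`, the 16-bit mono PCM format and the 2000 ms default batch size are assumptions, not the PR's actual values.

```python
from typing import List, Optional


class AudioChunkBuffer:
    """Accumulates 16-bit mono PCM chunks and releases them in batches (hypothetical sketch)."""

    BYTES_PER_SAMPLE = 2  # 16-bit PCM

    def __init__(self, sample_rate: int = 16000, max_ms: int = 2000) -> None:
        self.sample_rate = sample_rate
        # Byte budget that corresponds to max_ms of audio at this sample rate.
        self.max_bytes = sample_rate * self.BYTES_PER_SAMPLE * max_ms // 1000
        self._chunks: List[bytes] = []
        self._size = 0

    def add(self, chunk: bytes) -> Optional[bytes]:
        """Buffer a chunk; return a full batch once at least max_ms of audio has accumulated."""
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._size >= self.max_bytes:
            return self.flush()
        return None

    def flush(self) -> Optional[bytes]:
        """Return everything buffered so far (e.g. at end of stream) and reset, or None if empty."""
        if not self._chunks:
            return None
        batch = b"".join(self._chunks)
        self._chunks.clear()
        self._size = 0
        return batch
```

Calling `flush()` when the stream closes ensures the tail of the audio is still sent for transcription instead of being dropped because it never reached `max_ms`.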

…torch. Didn't manage to do it in requirement.txt without pulling in 2 GB of CUDA
- WebRTC VAD (default): Fast, lightweight, good for most cases
- Silero VAD: More accurate, especially for noisy environments
…ering

- Pluggable VAD: WebRTC (default) or Silero via VAD_PROVIDER env var
- Trim trailing silence from utterances before saving
- Filter low-quality utterances: min 15% speech ratio, min 200ms duration
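The trimming and filtering rules above (minimum 15% speech ratio, minimum 200 ms duration) can be sketched over per-frame VAD decisions. The `Utterance` type, the helper names and the 30 ms frame length are hypothetical. This sketch also assumes both thresholds are checked after trailing silence is trimmed; the PR may apply them in a different order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    frames: List[bytes]      # fixed-length PCM frames (30 ms each is an assumption)
    is_speech: List[bool]    # per-frame VAD decision, same length as frames
    frame_ms: int = 30


def trim_trailing_silence(utt: Utterance) -> Utterance:
    """Drop all non-speech frames after the last speech frame."""
    last = max((i for i, s in enumerate(utt.is_speech) if s), default=-1)
    return Utterance(utt.frames[: last + 1], utt.is_speech[: last + 1], utt.frame_ms)


def keep_utterance(utt: Utterance, min_speech_ratio: float = 0.15,
                   min_duration_ms: int = 200) -> bool:
    """Keep only utterances that are long enough and contain enough speech frames."""
    if not utt.frames:
        return False
    duration_ms = len(utt.frames) * utt.frame_ms
    speech_ratio = sum(utt.is_speech) / len(utt.is_speech)
    return duration_ms >= min_duration_ms and speech_ratio >= min_speech_ratio


def finalize(utt: Utterance) -> Optional[Utterance]:
    """Trim, then filter; None means the utterance should not be saved."""
    trimmed = trim_trailing_silence(utt)
    return trimmed if keep_utterance(trimmed) else None
```

For example, 7 speech frames followed by 3 silent ones trim to 210 ms of pure speech and are kept, while 6 speech frames (180 ms) are rejected as too short.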
@docteurZ requested a review from a team as a code owner on January 21, 2026 05:29
…cing false cutoffs and utterance fragmentation.
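The commit message is truncated, but the description mentions tuned silence thresholds. One common way a silence threshold reduces false cutoffs and fragmentation is a "hangover": an utterance only ends after a run of consecutive silent frames, so short pauses stay inside it. This is a hypothetical sketch of that technique; `segment`, the 30 ms frames and the 600 ms default are assumptions, not the PR's code or values.

```python
from typing import Iterable, List


def segment(is_speech: Iterable[bool], frame_ms: int = 30,
            silence_ms: int = 600) -> List[List[int]]:
    """Group frame indices into utterances, ending one only after silence_ms of silence."""
    max_silent = max(1, silence_ms // frame_ms)  # consecutive silent frames needed to close
    utterances: List[List[int]] = []
    current: List[int] = []
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            current.append(i)
            silent_run = 0
        elif current:
            current.append(i)  # short pauses stay inside the utterance
            silent_run += 1
            if silent_run >= max_silent:
                # Close the utterance, excluding the silent run that ended it.
                utterances.append(current[:-silent_run])
                current, silent_run = [], 0
    if current:
        # End of input: keep what was collected, minus any trailing silence.
        utterances.append(current[: len(current) - silent_run])
    return utterances
```

A longer `silence_ms` merges speech separated by brief pauses into one utterance (fewer fragments) at the cost of slightly later end-of-utterance detection.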
…ic VAD on the server side). Add a basic RMS energy filter to avoid sending pure silence, though.
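A basic RMS energy gate like the one described can be sketched with the standard library only, assuming 16-bit little-endian mono PCM frames. The function names and the -50 dBFS threshold are hypothetical; the PR's actual threshold is not stated.

```python
import math
import struct


def rms_dbfs(frame: bytes) -> float:
    """RMS level of a 16-bit little-endian mono PCM frame in dBFS (0 dBFS = full scale)."""
    n = len(frame) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", frame[: 2 * n])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768.0) if rms > 0 else float("-inf")


def should_send(frame: bytes, threshold_dbfs: float = -50.0) -> bool:
    """Skip frames that are effectively silence; -50 dBFS is an assumed threshold."""
    return rms_dbfs(frame) > threshold_dbfs
```

Because this check is cheap and stateless, it can run on every frame before streaming, leaving actual speech detection to the provider's server-side VAD.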
…ion. Streaming providers (Deepgram, Kyutai) have their own VAD in the streaming manager. Async transcription uses audio_chunk_buffer_manager inside the streaming manager.
@docteurZ requested a review from noah-duncan on January 24, 2026 20:16