Silero VAD by docteurZ · Pull Request #608 · attendee-labs/attendee

docteurZ · 2026-01-21T05:29:20Z

This PR improves audio processing and VAD reliability for transcription.

It refactors utterance buffering to explicitly track silence, trims trailing silence, and filters out low-quality utterances, reducing noise and false positives. VAD is now pluggable: WebRTC (default) or Silero, selected via the VAD_PROVIDER environment variable. Silero is more CPU-intensive but generally more robust in noisy conditions, and it does not need high RMS levels to detect speech reliably. The Dockerfile was updated to install a CPU-only torch build for Silero.
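A minimal sketch of what a VAD selected through VAD_PROVIDER could look like. The class names, the accepted values "webrtc" and "silero", the WebRTC aggressiveness of 2 and the 0.5 Silero probability threshold are assumptions for illustration, not the PR's actual code. The webrtcvad and silero_vad calls are those packages' public APIs; both are imported lazily so only the selected provider has to be installed.

```python
import os
from abc import ABC, abstractmethod


class VoiceActivityDetector(ABC):
    """Hypothetical common interface: classify one 16-bit mono PCM frame."""

    @abstractmethod
    def is_speech(self, frame: bytes, sample_rate: int) -> bool: ...


class WebRTCVAD(VoiceActivityDetector):
    def __init__(self, aggressiveness: int = 2):  # assumed default, range 0-3
        import webrtcvad  # lazy import: only needed when this provider is chosen

        self._vad = webrtcvad.Vad(aggressiveness)

    def is_speech(self, frame: bytes, sample_rate: int) -> bool:
        # webrtcvad accepts 10, 20 or 30 ms frames of 16-bit mono PCM
        return self._vad.is_speech(frame, sample_rate)


class SileroVAD(VoiceActivityDetector):
    def __init__(self, threshold: float = 0.5):  # assumed probability threshold
        import numpy as np
        import torch
        from silero_vad import load_silero_vad

        self._np, self._torch = np, torch
        self._model = load_silero_vad()
        self._threshold = threshold

    def is_speech(self, frame: bytes, sample_rate: int) -> bool:
        # Silero expects float32 samples in [-1, 1]; at 16 kHz a chunk is 512 samples (1024 bytes)
        samples = self._np.frombuffer(frame, dtype=self._np.int16).astype(self._np.float32) / 32768.0
        prob = self._model(self._torch.from_numpy(samples), sample_rate).item()
        return prob >= self._threshold


# Hypothetical registry mapping VAD_PROVIDER values to implementations
_PROVIDERS = {"webrtc": WebRTCVAD, "silero": SileroVAD}


def resolve_vad_class(provider=None):
    """Pick the implementation from an explicit name or the VAD_PROVIDER env var (default: webrtc)."""
    name = (provider or os.environ.get("VAD_PROVIDER", "webrtc")).strip().lower()
    try:
        return _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown VAD_PROVIDER {name!r}; expected one of {sorted(_PROVIDERS)}") from None


def create_vad(provider=None) -> VoiceActivityDetector:
    return resolve_vad_class(provider)()
```

Keeping the heavy imports inside the constructors means a WebRTC-only deployment never loads torch at all.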

Streaming audio can optionally be buffered in chunks for async transcription, with cleaner flushing and tuned silence thresholds. Overall, this improves accuracy, configurability, and async transcription support.
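A sketch of buffering streamed chunks for async transcription, assuming each chunk arrives with its duration and a VAD decision. AsyncChunkBuffer and both limits are hypothetical names and values; this is not the PR's audio_chunk_buffer_manager, only an illustration of silence-triggered flushing with a size cap.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AsyncChunkBuffer:
    """Hypothetical buffer that collects streamed chunks and releases one utterance at a time."""

    silence_limit_ms: int = 1000  # assumed: flush once this much silence follows speech
    max_duration_ms: int = 30000  # assumed: hard cap so the buffer cannot grow without bound
    _chunks: List[bytes] = field(default_factory=list, init=False)
    _duration_ms: int = field(default=0, init=False)
    _silence_ms: int = field(default=0, init=False)
    _has_speech: bool = field(default=False, init=False)

    def add(self, chunk: bytes, chunk_ms: int, is_speech: bool) -> Optional[bytes]:
        """Append a chunk; return the buffered audio when this chunk triggers a flush."""
        if not is_speech and not self._has_speech:
            return None  # drop leading silence: nothing worth transcribing yet
        self._chunks.append(chunk)
        self._duration_ms += chunk_ms
        if is_speech:
            self._has_speech = True
            self._silence_ms = 0
        else:
            self._silence_ms += chunk_ms
        if self._silence_ms >= self.silence_limit_ms or self._duration_ms >= self.max_duration_ms:
            return self.flush()
        return None

    def flush(self) -> Optional[bytes]:
        """Release whatever is buffered (e.g. at the end of a recording); None if empty."""
        audio = b"".join(self._chunks) or None
        self._chunks.clear()
        self._duration_ms = 0
        self._silence_ms = 0
        self._has_speech = False
        return audio
```

The flushed audio still ends with the silence that triggered the flush; trimming it is a separate step.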


docteurZ added 5 commits January 20, 2026 15:43

Allow saving audio in streaming mode

a54218b

torch and silero-vad installed separately in Dockerfile for CPU-only torch

cb7e006

Couldn't do this via requirement.txt without pulling in 2 GB of CUDA.

Interface for VAD with two implementations:

b2c3165

- WebRTC VAD (default): fast, lightweight, good for most cases
- Silero VAD: more accurate, especially in noisy environments

Add VAD abstraction with trailing silence trimming and utterance filtering

39bcb91

- Pluggable VAD: WebRTC (default) or Silero via VAD_PROVIDER env var
- Trim trailing silence from utterances before saving
- Filter low-quality utterances: min 15% speech ratio, min 200 ms duration
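A sketch of the trim-then-filter step, assuming the utterance is held as fixed-length frames with one VAD decision per frame. The 15% and 200 ms limits come from the commit message; the function name, the frame representation, and measuring both limits after trimming are assumptions.

```python
from typing import Optional, Sequence

MIN_SPEECH_RATIO = 0.15  # from the commit message: at least 15% speech
MIN_DURATION_MS = 200    # from the commit message: at least 200 ms long


def trim_and_filter(frames: Sequence[bytes], speech_flags: Sequence[bool], frame_ms: int) -> Optional[bytes]:
    """Trim trailing non-speech frames, then drop utterances that are too short or mostly silence.

    Assumption (hypothetical helper): duration and speech ratio are measured after trimming.
    """
    if len(frames) != len(speech_flags):
        raise ValueError("expected one VAD decision per frame")
    # Walk back from the end to the last frame the VAD classified as speech
    end = len(speech_flags)
    while end > 0 and not speech_flags[end - 1]:
        end -= 1
    if end == 0:
        return None  # no speech at all
    duration_ms = end * frame_ms
    speech_ratio = sum(speech_flags[:end]) / end
    if duration_ms < MIN_DURATION_MS or speech_ratio < MIN_SPEECH_RATIO:
        return None
    return b"".join(frames[:end])
```

With 20 ms frames, ten speech frames followed by silence yields exactly 200 ms and is kept, while a single speech frame at each end of a 400 ms window has a 10% speech ratio and is dropped.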

add tests

1e51eab

docteurZ requested a review from a team as a code owner January 21, 2026 05:29

docteurZ added 7 commits January 21, 2026 00:53

revert non_streaming_audio_silence_duration_limit

ebb9bae

Add hysteresis to prevent Silero VAD flickering near thresholds, reducing false cutoffs and utterance fragmentation

6a11a67
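A sketch of threshold hysteresis applied to per-frame Silero speech probabilities. The function name and the 0.5/0.35 thresholds are assumptions; the idea is that entering speech requires a higher probability than staying in it, so values hovering around a single threshold no longer toggle the decision every frame.

```python
from typing import Iterable, List


def hysteresis_gate(probs: Iterable[float], on_threshold: float = 0.5, off_threshold: float = 0.35) -> List[bool]:
    """Hypothetical helper: turn per-frame speech probabilities into stable speech/non-speech decisions."""
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    in_speech = False
    decisions = []
    for p in probs:
        if in_speech:
            # Stay in speech until the probability falls below the lower threshold
            in_speech = p >= off_threshold
        else:
            # Only enter speech once the probability reaches the upper threshold
            in_speech = p >= on_threshold
        decisions.append(in_speech)
    return decisions
```

For probabilities [0.2, 0.6, 0.45, 0.4, 0.3, 0.45, 0.55], a single 0.5 threshold gives [False, True, False, False, False, False, True], splitting the utterance after one frame, while the gate keeps speech on through 0.45 and 0.4.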

Don't instantiate Silero VAD when using Kyutai (it has its own semantic VAD on the server side)

84cfa40

Still add a basic RMS energy filter to avoid sending pure silence.
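A sketch of a basic RMS energy filter, assuming little-endian 16-bit mono PCM chunks. The function names and the minimum RMS of 50 are hypothetical; the PR's actual threshold is not stated.

```python
import math
import struct


def pcm16_rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM (0.0 for an empty chunk)."""
    n = len(chunk) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def should_send(chunk: bytes, min_rms: float = 50.0) -> bool:
    """Hypothetical gate: skip chunks that are effectively pure silence before they reach the provider."""
    return pcm16_rms(chunk) >= min_rms
```

Unlike a VAD, this only rejects near-zero energy; quiet speech above the floor is still forwarded for the provider's own VAD to judge.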

Only create non-streaming manager when NOT using streaming transcription

79ef742

Streaming providers (Deepgram, Kyutai) have their own VAD in the streaming manager. Async transcription uses audio_chunk_buffer_manager inside the streaming manager.
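A sketch of the selection rule described above. AudioPipelinePlan, plan_audio_pipeline and their fields are hypothetical names, not the PR's code, and how async transcription works without streaming is not described in the commits, so that case is simply assumed not to use the chunk buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioPipelinePlan:
    """Hypothetical summary of which audio components get created."""

    streaming_manager: bool
    non_streaming_manager: bool  # owns the local WebRTC/Silero VAD
    async_chunk_buffer: bool     # buffer inside the streaming manager for async transcription


def plan_audio_pipeline(uses_streaming_transcription: bool, async_transcription: bool) -> AudioPipelinePlan:
    if uses_streaming_transcription:
        # Streaming providers (e.g. Deepgram, Kyutai) do their own VAD, so no local
        # non-streaming manager (and no Silero model) is created.
        return AudioPipelinePlan(
            streaming_manager=True,
            non_streaming_manager=False,
            async_chunk_buffer=async_transcription,
        )
    # Without streaming transcription, the non-streaming manager segments utterances with
    # the configured local VAD; the async chunk buffer is assumed unused here.
    return AudioPipelinePlan(streaming_manager=False, non_streaming_manager=True, async_chunk_buffer=False)
```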

Refactor and log some RMS stats

3fa4cc2

Prevent OOM by deferring audio_blob and using an iterator

d6e0a08

Fix test: increase audio duration to pass MIN_DURATION_MS filter

8445743

docteurZ requested a review from noah-duncan January 24, 2026 20:16


docteurZ wants to merge 12 commits into main from silero-vad
