Various optimizations and reliability improvements by docteurZ · Pull Request #658 · attendee-labs/attendee

docteurZ · 2026-02-07T17:19:37Z

No description provided.

… threshold

The new VAD filtering in PerParticipantNonStreamingAudioInputManager discards utterances shorter than 200ms. MockPCMAudioFrame was only generating 10ms of audio, causing all zoom bot tests that depend on utterance creation to fail.
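For a sense of scale, the arithmetic behind the 10ms vs 200ms mismatch can be sketched as below. The helper name and the defaults (16 kHz, mono, 16-bit samples) are assumptions for illustration only; the format MockPCMAudioFrame actually uses is not shown in this PR.

```python
def pcm_byte_count(duration_ms, sample_rate_hz=16000, channels=1, bytes_per_sample=2):
    """Bytes of raw PCM needed to cover duration_ms (hypothetical helper).

    The default format values are assumptions, not the mock's real format.
    """
    samples = sample_rate_hz * duration_ms // 1000
    return samples * channels * bytes_per_sample


# A 10ms frame is 20x smaller than the 200ms minimum, whatever the format.
short_frame = pcm_byte_count(10)
minimum_frame = pcm_byte_count(200)
```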

Zoom SDK delivers audio/video on separate C++ threads that compete for Python's GIL. When the video thread holds the GIL during scale_i420() or blocks on push-buffer while x264 catches up, the audio callback can't run and the SDK silently drops 10ms audio frames. This produces thousands of micro-gaps that worsen as participant count increases.

Three changes:

1. Decouple SDK audio callback via deque + drain thread (zoom_bot_adapter.py). The callback now just copies bytes and returns in ~20μs. A separate thread pushes to GStreamer at its own pace. Timestamps captured at SDK delivery time preserve A/V sync.
2. Set block=False on video and audio appsrcs (gstreamer_pipeline.py). push-buffer returns immediately regardless of pipeline backpressure.
3. Make video queues q1/q2/q3 leaky=downstream (gstreamer_pipeline.py). x264 backpressure drops old video frames instead of propagating to the muxer and starving the audio path.

Google Meet/Teams are unaffected: they use separate pipelines or none. Zoom RTMS uses its own RTMSGstreamerPipeline class, also unaffected.

Tradeoff: if x264 can't keep up, video frames drop (logged by queue monitor) instead of invisible audio gaps. Video drops are far less perceptible than audio micro-gaps.

- Decouple audio SDK callback from GStreamer via deque + drain thread
- Decouple video SDK callback from scale_i420 via deque + drain thread
- Set block=False on appsrcs, leaky downstream video queues
- Remove audiorate element (amplified jitter into silence insertions)
- Pass SDK timestamps through to GStreamer for accurate A/V sync
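The deque + drain thread pattern can be sketched roughly as follows. This is a minimal illustration, not the code in zoom_bot_adapter.py: the class name AudioDrain, the push_fn callback (a stand-in for the GStreamer push), and the max_frames bound are all assumptions.

```python
import threading
import time
from collections import deque


class AudioDrain:
    """Hypothetical helper: the SDK callback only enqueues, a worker thread pushes."""

    def __init__(self, push_fn, max_frames=1000):
        self._push_fn = push_fn  # stand-in for the (possibly slow) pipeline push
        self._frames = deque(maxlen=max_frames)  # when full, the oldest frame is dropped
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def on_audio(self, data):
        # Runs on the SDK thread: copy the bytes, record the delivery time, return.
        self._frames.append((bytes(data), time.monotonic_ns()))
        self._wake.set()

    def _run(self):
        while True:
            self._wake.wait(timeout=0.1)
            self._wake.clear()
            # Single consumer, so checking then popping is safe here.
            while self._frames:
                data, delivered_ns = self._frames.popleft()
                self._push_fn(data, delivered_ns)
            if self._stop.is_set() and not self._frames:
                return

    def stop(self):
        # Call after the producer has stopped; drains what is left, then joins.
        self._stop.set()
        self._wake.set()
        self._thread.join()
```

The timestamp is taken in on_audio rather than in the drain thread, so a slow consumer delays delivery without shifting the recorded capture time.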

docteurZ and others added 20 commits January 20, 2026 15:43

Allow to save audio in streaming mode

a54218b

torch and silero-vad installed separately in Dockerfile for CPU-only …

cb7e006

…torch. Didn't manage to do it in requirements.txt without pulling 2 GB of CUDA

Interface for VAD with two implementations:

b2c3165

- WebRTC VAD (default): Fast, lightweight, good for most cases
- Silero VAD: More accurate, especially for noisy environments
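A pluggable interface of this kind might look like the sketch below. Only the VAD_PROVIDER variable and the two provider names come from this PR; the class names, the is_speech signature, and the create_vad factory are assumptions, and the detector bodies are left unimplemented on purpose.

```python
import os
from typing import Protocol


class VAD(Protocol):
    """Hypothetical interface: one decision per audio frame."""

    def is_speech(self, frame: bytes, sample_rate: int) -> bool: ...


class WebRTCVADStub:
    name = "webrtc"

    def is_speech(self, frame, sample_rate):
        # A real implementation would delegate to a WebRTC VAD binding.
        raise NotImplementedError


class SileroVADStub:
    name = "silero"

    def is_speech(self, frame, sample_rate):
        # A real implementation would run the Silero model on the frame.
        raise NotImplementedError


_PROVIDERS = {"webrtc": WebRTCVADStub, "silero": SileroVADStub}


def create_vad(provider=None):
    """Pick a provider explicitly, or from VAD_PROVIDER, defaulting to WebRTC."""
    name = (provider or os.environ.get("VAD_PROVIDER", "webrtc")).lower()
    if name not in _PROVIDERS:
        raise ValueError(f"unknown VAD provider: {name!r}")
    return _PROVIDERS[name]()
```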

Add VAD abstraction with trailing silence trimming and utterance filt…

39bcb91

…ering

- Pluggable VAD: WebRTC (default) or Silero via VAD_PROVIDER env var
- Trim trailing silence from utterances before saving
- Filter low-quality utterances: min 15% speech ratio, min 200ms duration
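The trimming and filtering rules can be illustrated on per-frame speech flags. This is a sketch under assumptions: the function names, the MIN_SPEECH_RATIO constant name, the 30ms frame size, and whether the ratio is computed before or after trimming are not stated in the PR (MIN_DURATION_MS is the only name that appears in a later commit message).

```python
MIN_SPEECH_RATIO = 0.15  # threshold from the commit message; constant name is assumed
MIN_DURATION_MS = 200


def trim_trailing_silence(flags):
    """Drop non-speech frames from the end of an utterance."""
    end = len(flags)
    while end > 0 and not flags[end - 1]:
        end -= 1
    return flags[:end]


def keep_utterance(flags, frame_ms=30):
    """Return True if the utterance is long enough and contains enough speech."""
    if not flags:
        return False
    duration_ms = len(flags) * frame_ms
    speech_ratio = sum(flags) / len(flags)
    return duration_ms >= MIN_DURATION_MS and speech_ratio >= MIN_SPEECH_RATIO
```

For example, five speech frames followed by ten silent ones trim down to 150ms at 30ms per frame, which falls under the 200ms minimum and would be discarded.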

add tests

1e51eab

revert non_streaming_audio_silence_duration_limit

ebb9bae

add Hysteresis to prevent Silero VAD flickering near thresholds, redu…

6a11a67

…cing false cutoffs and utterance fragmentation.
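Hysteresis here means using a higher threshold to enter the speech state than to leave it. A minimal sketch, assuming the VAD yields one speech probability per frame; the function name and the 0.5 / 0.35 thresholds are illustrative, not the values used in this PR:

```python
def apply_hysteresis(probs, on_threshold=0.5, off_threshold=0.35):
    """Turn per-frame speech probabilities into stable speech/non-speech flags."""
    speaking = False
    flags = []
    for p in probs:
        if speaking:
            # Once speaking, only a clear drop below the lower threshold ends it.
            if p < off_threshold:
                speaking = False
        elif p >= on_threshold:
            speaking = True
        flags.append(speaking)
    return flags
```

With a single 0.5 threshold, a sequence hovering around 0.4 to 0.6 would toggle on nearly every frame; with two thresholds the state changes only when the probability clearly crosses one of them.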

don't instantiate Silero VAD when using Kyutai (it has its own semant…

84cfa40

…ic VAD on the server side). Add a basic RMS energy filter to avoid sending pure silence, though.
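An RMS energy gate of this kind could look like the sketch below. It assumes 16-bit native-endian PCM; the function names and the threshold of 100 are assumptions, since the PR does not state the value it uses.

```python
import math
from array import array


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit native-endian PCM samples."""
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # ignore a dangling odd byte
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_send(pcm: bytes, threshold: float = 100.0) -> bool:
    """Skip chunks whose energy is below the (assumed) silence threshold."""
    return rms_int16(pcm) >= threshold
```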

Only create non-streaming manager when NOT using streaming transcript…

79ef742

…ion. Streaming providers (Deepgram, Kyutai) have their own VAD in the streaming manager. Async transcription uses audio_chunk_buffer_manager inside the streaming manager

refactor and log some RMS stats

3fa4cc2

Prevent OOM by deferring audio_blob and using an iterator

d6e0a08

implement a flushing trick

6299fbf

One flush per speech segment, not per word gap. The flush is an end-of…

3e93870

…-speech signal, not a micro-pause signal
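One way to get a single flush per speech segment is a small state machine that fires only after sustained silence following speech. The sketch below is an assumption about the approach, not the PR's code: the FlushGate class, its update method, and the 500ms default are hypothetical (the commits only name a SILENCE_BEFORE_FLUSH setting).

```python
class FlushGate:
    """Hypothetical helper: signal one flush per speech segment."""

    def __init__(self, silence_before_flush_ms=500):
        self.silence_before_flush_ms = silence_before_flush_ms
        self._silence_started_ms = None
        self._speech_since_flush = False

    def update(self, is_speech, now_ms):
        """Return True exactly once, when silence after speech has lasted long enough."""
        if is_speech:
            # Any speech cancels a pending flush, so short word gaps never trigger it.
            self._speech_since_flush = True
            self._silence_started_ms = None
            return False
        if not self._speech_since_flush:
            return False  # nothing new to flush
        if self._silence_started_ms is None:
            self._silence_started_ms = now_ms
        if now_ms - self._silence_started_ms >= self.silence_before_flush_ms:
            self._speech_since_flush = False
            self._silence_started_ms = None
            return True
        return False
```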

more tweaks

8655af5

increase SILENCE_BEFORE_FLUSH

944d144

Health check: detect silent connection failures (audio sent but no wo…

8e0c169

…rds received)

The timestamp now reflects when the audio arrived (when the person ac…

7023bc2

…tually spoke), not when Kyutai returned the transcription 500ms later.

fix: prevent crash when audio arrives before participant join event

f5a7266

fix(kyutai): detect and recover from mid-session transcriber stalls

bc6c36d

The health check only detected initial connection failures (no words ever received). Once any word was received, _ever_received_word became True and the health check was effectively disabled for the session.
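A check that keeps working after the first word can track how long sent audio has gone unanswered, instead of relying on a one-time flag. The following is a sketch of that idea only; StallDetector, its method names, and the 10 second timeout are assumptions, and the actual fix in this commit is not shown on this page.

```python
class StallDetector:
    """Hypothetical helper: flag a transcriber that stops returning words."""

    def __init__(self, timeout_ms=10_000):
        self.timeout_ms = timeout_ms
        self._unanswered_since_ms = None  # first audio sent since the last word

    def on_audio_sent(self, now_ms):
        if self._unanswered_since_ms is None:
            self._unanswered_since_ms = now_ms

    def on_word_received(self, now_ms):
        # Every word resets the clock, so the check stays armed for the whole session.
        self._unanswered_since_ms = None

    def is_stalled(self, now_ms):
        return (
            self._unanswered_since_ms is not None
            and now_ms - self._unanswered_since_ms >= self.timeout_ms
        )
```

This simple version assumes that audio is only sent when someone is likely speaking (for instance behind an energy filter); otherwise a long stretch of forwarded silence would look like a stall.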

destroy the renderer

2868382

docteurZ requested a review from a team as a code owner February 7, 2026 17:19

docteurZ added 9 commits February 7, 2026 12:21

Merge branch 'main' into optim_kyutai

10040c8

oops, didn't want to commit that

f88931c

Fix web adapters not sending audio to Kyutai when async transcription…

7bbb3d9

… is disabled

Add use_streaming_transcription() to should_capture_audio_chunks() and bump test audio duration to meet MIN_DURATION_MS threshold.

ruff fixes

17ad597

more ruff

565ecf1

fix: decouple Zoom SDK audio callback from GStreamer to eliminate mic…

39f0bff

…ro-gaps in recordings

docteurZ added 2 commits February 12, 2026 21:02

Merge remote-tracking branch 'origin/main' into optim_kyutai

0b85ed8

ruffing

4945247

docteurZ wants to merge 31 commits into main from optim_kyutai
