Various optimizations and reliability improvements #658
Open
docteurZ wants to merge 31 commits into
…torch. Didn't manage to do it in requirement.txt without pulling 2 GB of CUDA
- WebRTC VAD (default): Fast, lightweight, good for most cases
- Silero VAD: More accurate, especially for noisy environments
…ering
- Pluggable VAD: WebRTC (default) or Silero via VAD_PROVIDER env var
- Trim trailing silence from utterances before saving
- Filter low-quality utterances: min 15% speech ratio, min 200ms duration
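The trimming and filtering rules listed above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the VAD has already produced one speech/non-speech flag per 10 ms frame, and the names `FRAME_MS`, `trim_trailing_silence`, and `keep_utterance` are hypothetical. Only the 15% and 200 ms thresholds come from the commit message. Whether the PR applies the thresholds before or after trimming is not stated, so the order here is an assumption.

```python
from typing import List

FRAME_MS = 10            # assumed frame length; not stated in the commit message
MIN_SPEECH_RATIO = 0.15  # "min 15% speech ratio" from the commit message
MIN_DURATION_MS = 200    # "min 200ms duration" from the commit message


def trim_trailing_silence(flags: List[bool]) -> List[bool]:
    """Drop non-speech frames from the end of the utterance."""
    end = len(flags)
    while end > 0 and not flags[end - 1]:
        end -= 1
    return flags[:end]


def keep_utterance(flags: List[bool]) -> bool:
    """Return True if the trimmed utterance passes both thresholds."""
    trimmed = trim_trailing_silence(flags)
    duration_ms = len(trimmed) * FRAME_MS
    if duration_ms < MIN_DURATION_MS:
        return False  # also covers the all-silence case (duration 0)
    speech_ratio = sum(trimmed) / len(trimmed)
    return speech_ratio >= MIN_SPEECH_RATIO
```

With these assumptions, 300 ms of speech followed by 500 ms of silence is kept after trimming, while a 100 ms blip or a buffer containing no speech frames is discarded.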
…cing false cutoffs and utterance fragmentation.
…ic VAD on the server side). Add a basic RMS energy filter to avoid sending pure silence, though.
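A basic RMS energy filter of the kind mentioned above might look like this. It is a sketch under stated assumptions: audio is 16-bit signed PCM in native byte order, and the threshold value and the function names `rms_int16` and `is_silence` are hypothetical, not taken from the PR.

```python
import array
import math

# Hypothetical threshold; a real value would need tuning against actual audio.
SILENCE_RMS_THRESHOLD = 100.0


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit signed PCM (native byte order)."""
    samples = array.array("h")
    # Ignore a dangling odd byte so frombytes() does not raise.
    samples.frombytes(pcm[: len(pcm) - (len(pcm) % 2)])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silence(pcm: bytes, threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    """True when the chunk's energy is below the threshold and can be skipped."""
    return rms_int16(pcm) < threshold
```

A chunk for which `is_silence` returns True would simply not be forwarded to the streaming provider; anything above the threshold is left to the provider's own VAD.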
…ion. Streaming providers (Deepgram, Kyutai) have their own VAD in the streaming manager. Async transcription uses audio_chunk_buffer_manager inside the streaming manager
…-speech signal, not a micro-pause signal
…tually spoke), not when Kyutai returned the transcription 500ms later.
The health check only detected initial connection failures (no words ever received). Once any word was received, _ever_received_word became True and the health check was effectively disabled for the session.
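The fix itself is not shown in the text above. One way to keep a health check active for the whole session is to compare against the time of the most recent word instead of a one-time flag. The sketch below illustrates that idea only; the class `StreamHealth`, its methods, and the 30-second window are hypothetical and are not the PR's implementation.

```python
from typing import Optional


class StreamHealth:
    """Tracks the time of the latest word instead of a one-shot boolean."""

    def __init__(self, started_at: float, stale_after_s: float = 30.0):
        self.started_at = started_at
        self.stale_after_s = stale_after_s  # hypothetical window
        self.last_word_at: Optional[float] = None

    def on_word(self, now: float) -> None:
        """Record that the provider returned a word at time `now`."""
        self.last_word_at = now

    def is_healthy(self, now: float) -> bool:
        """Healthy while a word (or the session start) is recent enough."""
        reference = self.last_word_at if self.last_word_at is not None else self.started_at
        return (now - reference) <= self.stale_after_s
```

Because every word refreshes `last_word_at`, a connection that goes quiet mid-session is flagged just like one that never produced a word. A real check would also need to tell a dead connection apart from a silent meeting, for example by only counting time during which speech audio was actually sent.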
… threshold

The new VAD filtering in PerParticipantNonStreamingAudioInputManager discards utterances shorter than 200ms. MockPCMAudioFrame was only generating 10ms of audio, causing all zoom bot tests that depend on utterance creation to fail.
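The size mismatch can be made concrete with a little arithmetic. The helpers below are illustrative only: the 32 kHz mono 16-bit format is an assumption about the mock audio, and `pcm_bytes_for_duration` and `duration_ms_of` are hypothetical names rather than functions from the test suite.

```python
def pcm_bytes_for_duration(
    duration_ms: int,
    sample_rate_hz: int = 32000,  # assumed sample rate
    channels: int = 1,            # assumed mono
    bytes_per_sample: int = 2,    # assumed 16-bit PCM
) -> int:
    """Number of PCM bytes needed to represent `duration_ms` of audio."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * channels * bytes_per_sample


def duration_ms_of(
    pcm: bytes,
    sample_rate_hz: int = 32000,
    channels: int = 1,
    bytes_per_sample: int = 2,
) -> int:
    """Duration in milliseconds of a PCM buffer in the assumed format."""
    samples = len(pcm) // (channels * bytes_per_sample)
    return samples * 1000 // sample_rate_hz
```

Under these assumptions a 10 ms mock frame is 640 bytes, while clearing the 200 ms threshold takes at least 12,800 bytes, so a single mock frame can never produce an utterance.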
… is disabled

Add use_streaming_transcription() to should_capture_audio_chunks() and bump test audio duration to meet MIN_DURATION_MS threshold.
…ro-gaps in recordings
Zoom SDK delivers audio/video on separate C++ threads that compete for Python's GIL. When the video thread holds the GIL during scale_i420() or blocks on push-buffer while x264 catches up, the audio callback can't run and the SDK silently drops 10ms audio frames. This produces thousands of micro-gaps that worsen as participant count increases.

Three changes:

1. Decouple SDK audio callback via deque + drain thread (zoom_bot_adapter.py)
   The callback now just copies bytes and returns in ~20μs. A separate thread pushes to GStreamer at its own pace. Timestamps captured at SDK delivery time preserve A/V sync.
2. Set block=False on video and audio appsrcs (gstreamer_pipeline.py)
   push-buffer returns immediately regardless of pipeline backpressure.
3. Make video queues q1/q2/q3 leaky=downstream (gstreamer_pipeline.py)
   x264 backpressure drops old video frames instead of propagating to the muxer and starving the audio path.

Google Meet/Teams are unaffected: they use separate pipelines or none. Zoom RTMS uses its own RTMSGstreamerPipeline class, also unaffected.

Tradeoff: if x264 can't keep up, video frames drop (logged by queue monitor) instead of invisible audio gaps. Video drops are far less perceptible than audio micro-gaps.
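The deque + drain thread pattern from change 1 can be sketched roughly as below. This is not the code in zoom_bot_adapter.py: the class `AudioDrain`, its method names, the `maxlen` bound, and the `sink` callable (standing in for the GStreamer push) are all hypothetical.

```python
import threading
import time
from collections import deque
from typing import Callable


class AudioDrain:
    """Callback enqueues quickly; a separate thread does the slow push."""

    def __init__(self, sink: Callable[[int, bytes], None], maxlen: int = 1000):
        self._sink = sink                    # stand-in for the pipeline push
        self._queue = deque(maxlen=maxlen)   # bounded: oldest entries drop when full
        self._wakeup = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def on_audio(self, data) -> None:
        """SDK-side callback: copy the bytes, stamp them, and return at once."""
        self._queue.append((time.monotonic_ns(), bytes(data)))
        self._wakeup.set()

    def _run(self) -> None:
        while True:
            self._wakeup.wait(timeout=0.1)
            self._wakeup.clear()
            while True:
                try:
                    timestamp_ns, data = self._queue.popleft()
                except IndexError:
                    break  # queue drained
                self._sink(timestamp_ns, data)
            if self._stop.is_set() and not self._queue:
                return

    def stop(self) -> None:
        """Flush whatever is queued, then join the drain thread."""
        self._stop.set()
        self._wakeup.set()
        self._thread.join()
```

The timestamp is taken inside `on_audio`, so it reflects when the frame was delivered rather than when the drain thread got around to pushing it, which is the property the commit message relies on for A/V sync. `deque.append` and `deque.popleft` are thread-safe in CPython, so no extra lock is used here.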
- Decouple audio SDK callback from GStreamer via deque + drain thread
- Decouple video SDK callback from scale_i420 via deque + drain thread
- Set block=False on appsrcs, leaky downstream video queues
- Remove audiorate element (amplified jitter into silence insertions)
- Pass SDK timestamps through to GStreamer for accurate A/V sync
No description provided.