@@ -132,6 +132,45 @@ Always proofread technical translations.
132132
133133---
134134
135+ ### How accurate is speaker diarization?
136+
137+ Speaker diarization (enabled with `--speakers`) identifies the different speakers in an audio file and labels them `SPEAKER_00`, `SPEAKER_01`, and so on.
138+
139+ **Accuracy depends on:**
140+ - **Audio quality:** single-channel recordings may reduce accuracy
141+ - **Recording setup:** studio mics (high quality) vs phone recordings (lower quality)
142+ - **Speaker overlap:** people talking over each other causes confusion
143+ - **Voice similarity:** similar-sounding speakers are harder to distinguish
144+ - **Microphone distance changes:** one speaker moving closer/farther may be split into multiple labels
145+
146+ **Typical results:**
147+ - ✅ Clean studio recordings with distinct voices: 85–95% accuracy
148+ - ⚠️ Phone/video calls: 70–85% accuracy
149+ - ⚠️ Noisy environments or overlapping speech: 50–70% accuracy
150+
151+ **Known limitations:**
152+ - ⚠️ **Over-segmentation:** One speaker may be assigned multiple labels (e.g., SPEAKER_00 and SPEAKER_01 for the same person)
153+   - Common when a speaker changes tone or distance from the mic, or after long pauses
154+   - Manual review recommended for critical applications
155+ - ⚠️ **Under-segmentation:** Multiple speakers may be assigned the same label
156+   - Less common; happens with very similar voices
157+
158+ **Recommendation:** Use speaker labels as a guide, but verify important segments manually.
159+
160+ **Built-in audio preprocessing:**
161+ The system automatically preprocesses audio before diarization to improve accuracy:
162+ - ✅ Conversion to mono 16 kHz (standard for speech models)
163+ - ✅ RMS volume normalization to -20 dBFS (prevents quiet sections from being misclassified)
164+ - ✅ Clipping prevention (avoids distortion)
165+ - 🔄 Optional noise reduction (available with the `noisereduce` library, installed separately)
166+
167+ This preprocessing helps reduce false speaker clusters caused by:
168+ - Volume variations (one speaker at different distances from mic)
169+ - Background noise (can be classified as a separate "speaker")
170+ - Audio quality inconsistencies
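
The normalization and clipping-prevention steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `normalize_rms`, the `peak_limit` value, and the use of plain Python lists of mono float samples in the range -1.0 to 1.0 are all assumptions, and resampling to 16 kHz is left out.

```python
import math


def normalize_rms(samples, target_dbfs=-20.0, peak_limit=0.99):
    """Scale mono float samples (-1.0..1.0) toward a target RMS level.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale

    # -20 dBFS corresponds to a linear RMS of 10 ** (-20 / 20) = 0.1.
    gain = 10 ** (target_dbfs / 20) / rms

    # Clipping prevention: reduce the gain if the loudest sample
    # would otherwise exceed the peak limit.
    peak = max(abs(s) for s in samples) * gain
    if peak > peak_limit:
        gain *= peak_limit / peak

    return [s * gain for s in samples]
```

Applying one gain to the whole file keeps the relative dynamics intact while bringing quiet recordings up to a level comparable with louder ones. When the peak limit applies, the result ends up below the target RMS rather than distorted.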
171+
172+ ---
173+
135174## Troubleshooting
136175
137176### “FFmpeg not found”
@@ -228,6 +267,68 @@ Ensure the desired model is pulled (`ollama pull qwen2.5:3b`).
228267
229268---
230269
270+ ### Warning: "torchcodec is not installed correctly" (Speaker Diarization)
271+
272+ **Message:**
273+ ```
274+ UserWarning: torchcodec is not installed correctly so built-in audio decoding will fail.
275+ Could not load libtorchcodec... FFmpeg is not properly installed...
276+ We support versions 4, 5, 6 and 7.
277+ ```
278+
279+ **Cause:** FFmpeg 8.0 is installed, but the torchcodec library used by pyannote supports only FFmpeg 4 through 7.
280+
281+ **Is this critical?** ❌ **No, this is safe to ignore.**
282+
283+ The speaker diarization system has built-in fallback audio loaders:
284+ 1. First tries `soundfile` (doesn't need FFmpeg)
285+ 2. Falls back to `librosa` if needed
286+ 3. Only uses direct file loading as a last resort
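
The fallback order above can be pictured as a simple try-each-loader chain. The sketch below is illustrative only and is not the project's implementation: `load_audio`, `load_with_soundfile`, `load_with_librosa`, and `DEFAULT_LOADERS` are hypothetical names, and the final "direct file loading" step is omitted. It relies only on `soundfile.read` and `librosa.load`, which both return the samples together with the sample rate.

```python
def load_audio(path, loaders):
    """Try each (name, loader) pair in order; return the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # a missing library or unreadable file moves on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all audio loaders failed: " + "; ".join(errors))


def load_with_soundfile(path):
    import soundfile as sf  # imported lazily so a missing package is just a failed step
    data, sample_rate = sf.read(path)
    return data, sample_rate


def load_with_librosa(path):
    import librosa  # imported lazily for the same reason
    data, sample_rate = librosa.load(path, sr=None)  # sr=None keeps the native rate
    return data, sample_rate


# Hypothetical default order, mirroring the list above.
DEFAULT_LOADERS = [
    ("soundfile", load_with_soundfile),
    ("librosa", load_with_librosa),
]
```

A call such as `load_audio("meeting.wav", DEFAULT_LOADERS)` would return the name of the loader that succeeded along with its result, which makes it easy to log which path was taken.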
287+
288+ **What happens:**
289+ - ✅ Speaker diarization works correctly
290+ - ✅ Audio is loaded via soundfile/librosa
291+ - ⚠️ Warning appears but can be ignored
292+
293+ **If you want to suppress the warning:**
294+
295+ Option 1: Keep FFmpeg 8 (recommended, everything works)
296+ ```bash
297+ # Do nothing: the fallback loaders handle audio decoding (the warning still appears)
298+ ```
299+
300+ Option 2: Downgrade to FFmpeg 7 (optional, only to remove warning)
301+ ```bash
302+ # macOS
303+ brew uninstall ffmpeg
304+ brew install ffmpeg@7
305+ brew link ffmpeg@7
306+ ```
307+
308+ **Note:** Downgrading FFmpeg is unnecessary since the fallback mechanism works reliably.
309+
310+ ---
311+
312+ ### Warning: "std(): degrees of freedom is <= 0" (Speaker Diarization)
313+
314+ **Message:**
315+ ```
316+ UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than...
317+ ```
318+
319+ **Cause:** An internal pyannote.audio calculation takes a standard deviation over too few values during speaker diarization.
320+
321+ **Is this critical?** ❌ **No, this is safe to ignore.**
322+
323+ This warning appears during normal operation of the speaker diarization pipeline and does not affect:
324+ - ✅ Accuracy of speaker detection
325+ - ✅ Quality of diarization results
326+ - ✅ Stability of the process
327+
328+ **What to do:** Nothing. The process will complete successfully and identify speakers correctly.
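
If the message clutters your logs, one option is to filter just this warning with Python's standard `warnings` module. This is a sketch under assumptions: the helper name `silence_std_warning` is hypothetical, and the pattern is based on the message text quoted above, so it may need adjusting if the wording differs in your version.

```python
import warnings

# Regex matched against the start of the warning text quoted above.
STD_WARNING_PATTERN = r"std\(\): degrees of freedom is <= 0"


def silence_std_warning():
    """Hide only the known-harmless std() warning; other warnings stay visible."""
    warnings.filterwarnings(
        "ignore",
        message=STD_WARNING_PATTERN,
        category=UserWarning,
    )


# Call once at startup, before the diarization pipeline runs.
silence_std_warning()
```

Filtering by message rather than silencing all `UserWarning`s keeps unrelated warnings visible.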
329+
330+ ---
331+
231332### Where are logs and outputs saved?
232333
233334- Transcripts & translations: `output/`