biyachuev
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 35 additions & 0 deletions
diff --git a/CHEATSHEET.md b/CHEATSHEET.md
Lines changed: 3 additions & 3 deletions
diff --git a/FAQ.md b/FAQ.md
Lines changed: 101 additions & 0 deletions
diff --git a/QUICKSTART.md b/QUICKSTART.md
Lines changed: 3 additions & 3 deletions
diff --git a/README.md b/README.md
Lines changed: 8 additions & 2 deletions
@@ -2,6 +2,41 @@
 
 All significant changes to this project are documented here.
 
+## [Unreleased]
+
+### Fixed
+- 🐛 **TextRefiner topic detection now respects backend setting**
+  - Fixed hardcoded Ollama call in `_detect_topic()` method
+  - Topic detection now correctly uses OpenAI API when `--refine-backend openai_api` is specified
+  - Resolves 404 error when using OpenAI backend without Ollama running
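
The fix described above amounts to routing topic detection through whichever backend was configured instead of calling Ollama directly. The sketch below shows one way that dispatch could look. It is not the project's code: `TopicDetector`, the `clients` mapping, and the prompt wording are hypothetical, and only the `openai_api` backend name comes from the changelog (`ollama` is an assumed name).

```python
from typing import Callable, Dict

# A completion function sends a prompt to one backend and returns its reply.
Completion = Callable[[str], str]


class TopicDetector:
    """Hypothetical stand-in for the topic-detection part of TextRefiner."""

    def __init__(self, backend: str, clients: Dict[str, Completion]) -> None:
        # Fail early on an unknown backend instead of silently falling back.
        if backend not in clients:
            raise ValueError(f"unknown refine backend: {backend!r}")
        self.backend = backend
        self._complete = clients[backend]

    def detect_topic(self, text: str) -> str:
        # The call always goes through the configured backend; no backend
        # is hardcoded here. The prompt wording is illustrative only.
        prompt = "Name the main topic of this text in a few words:\n" + text[:500]
        return self._complete(prompt).strip()
```

With this shape, selecting `openai_api` never touches the Ollama client, which is the behaviour the changelog entry describes.
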
+
+### Added
+- ✨ **OpenAI support for Whisper prompt generation**
+  - Enhanced `create_whisper_prompt_with_llm()` to support both Ollama and OpenAI backends
+  - Automatically uses the same backend as refinement (`--refine-backend`) for prompt generation
+  - Improves consistency when using OpenAI API throughout the pipeline
+- 🎯 **Audio preprocessing for speaker diarization**
+  - Automatic conversion to mono 16kHz
+  - RMS volume normalization to -20 dBFS
+  - Clipping prevention
+  - Optional noise reduction support (via noisereduce library)
+  - Helps reduce false speaker clusters from volume variations and background noise
+  - New function: `_preprocess_audio_for_diarization()`
+
+### Documentation
+- 📝 **Added FAQ entries for speaker diarization warnings**
+  - Documented torchcodec FFmpeg version warning (safe to ignore)
+  - Documented pyannote std() warning (safe to ignore)
+  - Explained fallback audio loading mechanism
+  - Added quick reference in README troubleshooting section
+- 📝 **Added speaker diarization accuracy information**
+  - New FAQ section: "How accurate is speaker diarization?"
+  - Documented over-segmentation limitation (one speaker → multiple labels)
+  - Added accuracy guidelines based on audio quality
+  - Recommendation to verify speaker labels manually for critical applications
+  - Added warning note in README highlights
+  - Documented automatic audio preprocessing features
+
 ## [1.3.0] - 2025-10-10
 
 ### Added
 
@@ -90,7 +90,7 @@ python -m src.main --url "..." --transcribe whisper_base --prompt prompt.txt
 ## 📁 Project layout
 
 ```
-youtube-transcriber/
+yt-transcriber/
 ├── src/              # Source code
 ├── tests/            # Automated tests
 ├── output/           # Results ← start here
@@ -192,9 +192,9 @@ ffmpeg -version          # verify FFmpeg
 ## 🐳 Docker essentials
 
 ```bash
-docker build -t youtube-transcriber .
+docker build -t yt-transcriber .
 
-docker run -v $(pwd)/output:/app/output   youtube-transcriber   --url "YOUTUBE_URL"   --transcribe whisper_base
+docker run -v $(pwd)/output:/app/output   yt-transcriber   --url "YOUTUBE_URL"   --transcribe whisper_base
 
 # docker compose
 docker-compose up           # foreground
 
@@ -132,6 +132,45 @@ Always proofread technical translations.
 
 ---
 
+### How accurate is speaker diarization?
+
+Speaker diarization (enabled with `--speakers`) identifies different speakers in audio and labels them as SPEAKER_00, SPEAKER_01, etc.
+
+**Accuracy depends on:**
+- **Audio quality:** single-channel recordings may reduce accuracy
+- **Recording setup:** studio mics (high quality) vs phone recordings (lower quality)
+- **Speaker overlap:** people talking over each other causes confusion
+- **Voice similarity:** similar-sounding speakers are harder to distinguish
+- **Microphone distance changes:** one speaker moving closer/farther may be split into multiple labels
+
+**Typical results:**
+- ✅ Clean studio recordings with distinct voices: 85–95% accuracy
+- ⚠️ Phone/video calls: 70–85% accuracy
+- ⚠️ Noisy environments or overlapping speech: 50–70% accuracy
+
+**Known limitations:**
+- ⚠️ **Over-segmentation:** One speaker may be assigned multiple labels (e.g., SPEAKER_00 and SPEAKER_01 for the same person)
+  - Common when a speaker changes tone or distance from the mic, or after long pauses
+  - Manual review recommended for critical applications
+- ⚠️ **Under-segmentation:** Multiple speakers may be assigned the same label
+  - Less common; happens with very similar voices
+
+**Recommendation:** Use speaker labels as a guide, but verify important segments manually.
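
When manual review shows that two labels belong to the same person, the labels can be merged after the fact. The helper below is a minimal sketch of that step. The segment shape (a dict with a `speaker` key) and the function name `merge_speaker_labels` are assumptions for illustration, not the project's output format.

```python
from typing import Any, Dict, List

# Assumed segment shape: {"speaker": str, "start": float, "end": float, "text": str}
Segment = Dict[str, Any]


def merge_speaker_labels(segments: List[Segment], aliases: Dict[str, str]) -> List[Segment]:
    """Return copies of the segments with aliased speaker labels rewritten.

    `aliases` maps a duplicate label to the label it should be merged into,
    for example {"SPEAKER_01": "SPEAKER_00"}. Unlisted labels are kept.
    """
    return [
        {**seg, "speaker": aliases.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]
```

The input list is left untouched, so the original labels remain available for comparison.
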
+
+**Built-in audio preprocessing:**
+The system automatically preprocesses audio before diarization to improve accuracy:
+- ✅ Conversion to mono 16kHz (standard for speech models)
+- ✅ RMS volume normalization to -20 dBFS (prevents quiet sections from being misclassified)
+- ✅ Clipping prevention (avoids distortion)
+- 🔄 Optional noise reduction (requires the `noisereduce` library, installed separately)
+
+This preprocessing helps reduce false speaker clusters caused by:
+- Volume variations (one speaker at different distances from mic)
+- Background noise (can be classified as a separate "speaker")
+- Audio quality inconsistencies
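
The preprocessing steps listed above (mono downmix, 16 kHz, RMS normalization to -20 dBFS, clipping prevention) can be sketched in a few lines. This is an illustration under stated assumptions, not the body of `_preprocess_audio_for_diarization()`: the function names are hypothetical, the code is deliberately dependency-free, and the linear-interpolation resampler stands in for the band-limited resampler a real pipeline would likely use.

```python
import math
from typing import List, Sequence

TARGET_SR = 16_000    # sample rate named in the FAQ entry
TARGET_DBFS = -20.0   # RMS target named in the FAQ entry (full scale = 1.0)


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame, e.g. (left, right) pairs."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(x: Sequence[float], sr_in: int, sr_out: int = TARGET_SR) -> List[float]:
    """Resample by linear interpolation (simplification, no anti-alias filter)."""
    if sr_in == sr_out or len(x) < 2:
        return list(x)
    n_out = int(len(x) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out          # fractional index into the input
        lo = min(int(pos), len(x) - 2)
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[lo + 1] * frac)
    return out


def normalize_rms(x: Sequence[float], target_dbfs: float = TARGET_DBFS) -> List[float]:
    """Scale to the target RMS level, then rescale if any peak exceeds full scale."""
    if not x:
        return []
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    if rms == 0.0:
        return list(x)                    # pure silence: nothing to normalize
    gain = 10 ** (target_dbfs / 20) / rms
    scaled = [s * gain for s in x]
    peak = max(abs(s) for s in scaled)
    if peak > 1.0:                        # clipping prevention
        scaled = [s / peak for s in scaled]
    return scaled


def preprocess(frames: Sequence[Sequence[float]], sr: int) -> List[float]:
    """Hypothetical end-to-end chain: mono, 16 kHz, RMS-normalized, unclipped."""
    return normalize_rms(resample_linear(to_mono(frames), sr))
```

Normalizing after resampling means the -20 dBFS target applies to the signal the diarizer actually sees. When the peak guard triggers, the result ends up below the RMS target, which is the trade-off for avoiding distortion.
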
+
+---
+
 ## Troubleshooting
 
 ### “FFmpeg not found”
@@ -228,6 +267,68 @@ Ensure the desired model is pulled (`ollama pull qwen2.5:3b`).
 
 ---
 
+### Warning: "torchcodec is not installed correctly" (Speaker Diarization)
+
+**Message:**
+```
+UserWarning: torchcodec is not installed correctly so built-in audio decoding will fail.
+Could not load libtorchcodec... FFmpeg is not properly installed...
+We support versions 4, 5, 6 and 7.
+```
+
+**Cause:** FFmpeg 8 is installed, but torchcodec, which pyannote.audio uses for built-in audio decoding, supports only FFmpeg 4-7.
+
+**Is this critical?** ❌ **No, this is safe to ignore.**
+
+The speaker diarization system has built-in fallback audio loaders:
+1. First tries `soundfile` (doesn't need FFmpeg)
+2. Falls back to `librosa` if needed
+3. Only uses direct file loading as last resort
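
The first two steps of this fallback order can be sketched as a loader chain. This is an assumption about the shape of the logic, not the project's implementation: `load_audio` and the two wrapper functions are hypothetical names, and the third step ("direct file loading") is omitted because the FAQ does not say what it consists of. The library calls themselves, `soundfile.read` and `librosa.load`, are real.

```python
from typing import Any, Callable, List, Sequence, Tuple

Audio = Tuple[Any, int]  # (samples, sample_rate)


def load_with_soundfile(path: str) -> Audio:
    # Imported lazily so a missing package only fails this one loader.
    import soundfile as sf
    data, sr = sf.read(path, dtype="float32")
    return data, sr


def load_with_librosa(path: str) -> Audio:
    import librosa
    # sr=None keeps the file's native sample rate; mono=False keeps channels.
    data, sr = librosa.load(path, sr=None, mono=False)
    return data, int(sr)


def load_audio(
    path: str,
    loaders: Sequence[Callable[[str], Audio]] = (load_with_soundfile, load_with_librosa),
) -> Audio:
    """Try each loader in order and return the first successful result."""
    errors: List[str] = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as exc:  # ImportError, decode errors, missing file, ...
            errors.append(f"{loader.__name__}: {exc}")
    raise RuntimeError("all audio loaders failed: " + "; ".join(errors))
```

Collecting the per-loader errors keeps the final message useful when every loader fails, for example when the file path is wrong.
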
+
+**What happens:**
+- ✅ Speaker diarization works correctly
+- ✅ Audio is loaded via soundfile/librosa
+- ⚠️ Warning appears but can be ignored
+
+**What you can do:**
+
+Option 1: Keep FFmpeg 8 (recommended, everything works)
+```bash
+# Do nothing - the fallback works perfectly
+```
+
+Option 2: Downgrade to FFmpeg 7 (optional, only to remove warning)
+```bash
+# macOS
+brew uninstall ffmpeg
+brew install ffmpeg@7
+brew link ffmpeg@7
+```
+
+**Note:** Downgrading FFmpeg is unnecessary since the fallback mechanism works reliably.
+
+---
+
+### Warning: "std(): degrees of freedom is <= 0" (Speaker Diarization)
+
+**Message:**
+```
+UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than...
+```
+
+**Cause:** Internal pyannote.audio calculation during speaker diarization.
+
+**Is this critical?** ❌ **No, this is safe to ignore.**
+
+This warning appears during normal operation of the speaker diarization pipeline and does not affect:
+- ✅ Accuracy of speaker detection
+- ✅ Quality of diarization results
+- ✅ Stability of the process
+
+**What to do:** Nothing. The warning does not change the diarization output.
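
If the two warnings quoted in these FAQ entries clutter the logs, they can also be filtered with Python's standard `warnings` module rather than by changing the FFmpeg version. The sketch below is an assumption about how one might do that, not something the project ships: `silence_diarization_warnings` is a hypothetical name, and the patterns assume the messages begin with the text quoted above (optionally after leading whitespace).

```python
import warnings


def silence_diarization_warnings() -> None:
    """Ignore the two known-harmless UserWarnings from the diarization stack.

    `message` is a regular expression matched against the start of the
    warning text, so only these two messages are hidden.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"\s*torchcodec is not installed correctly",
        category=UserWarning,
    )
    warnings.filterwarnings(
        "ignore",
        message=r"std\(\): degrees of freedom is <= 0",
        category=UserWarning,
    )
```

The filters would need to be installed before the diarization libraries are imported, since an import-time warning is emitted before any later filter can apply. Other warnings still pass through.
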
+
+---
+
 ### Where are logs and outputs saved?
 
 - Transcripts & translations: `output/`
 
@@ -9,7 +9,7 @@ Get up and running with YouTube Transcriber in five minutes.
 ```bash
 # Clone the repository
 git clone <repository-url>
-cd youtube-transcriber
+cd yt-transcriber
 
 # Create a virtual environment
 python -m venv venv
@@ -104,7 +104,7 @@ python -m src.main --url "URL" --transcribe whisper_base --translate NLLB
 ## 📁 Where to find results
 
 ```
-youtube-transcriber/
+yt-transcriber/
 ├── output/              # ← Processed documents
 │   ├── Video_Title.docx
 │   └── Video_Title.md
@@ -239,7 +239,7 @@ Speedups:
 ## 💬 Need help?
 
 - Check the [FAQ](FAQ.md)
-- Open an [issue on GitHub](https://github.com/yourusername/youtube-transcriber/issues)
+- Open an [issue on GitHub](https://github.com/yourusername/yt-transcriber/issues)
 - Reach out to the maintainers
 
 ---
 
@@ -11,6 +11,7 @@ A flexible toolkit for transcribing and translating YouTube videos, audio files,
   - Works with both local Whisper and OpenAI API
   - Optimal speaker detection using VAD integration
   - Enable with `--speakers` flag
+  - ⚠️ Note: May over-segment speakers (one person → multiple labels); manual review recommended for critical use
 - ✅ **Enhanced logging with colored output**
   - Color-coded log levels for better visibility
   - WARNING messages in orange for important notices
@@ -84,7 +85,7 @@ A flexible toolkit for transcribing and translating YouTube videos, audio files,
 
 ```bash
 git clone <repository-url>
-cd youtube-transcriber
+cd yt-transcriber
 ```
 
 ### 2. Create a virtual environment
@@ -291,7 +292,7 @@ python -m src.main --help
 ## 📁 Project structure
 
 ```
-youtube-transcriber/
+yt-transcriber/
 ├── src/                      # Source code
 │   ├── main.py              # Entry point
 │   ├── config.py            # Configuration
@@ -384,6 +385,11 @@ python -m src.main --url "..." --transcribe whisper_base
 - Confirm that GPU/MPS acceleration is active (see logs)
 - Close other resource-heavy applications
 
+**Safe to ignore:** Speaker diarization warnings
+- `UserWarning: torchcodec is not installed correctly` — Audio loading uses soundfile/librosa fallback (works correctly)
+- `UserWarning: std(): degrees of freedom is <= 0` — Internal pyannote calculation (does not affect results)
+- See [FAQ.md](FAQ.md) for detailed explanations
+
 ## 🧪 Testing
 
 ```bash