
Commit ba1d832 ("tuning")

1 parent: e8d23e0

10 files changed: +2143 −714 lines (README.md and ROADMAP.md shown below)

README.md

Lines changed: 43 additions & 4 deletions
@@ -4,7 +4,21 @@ A flexible toolkit for transcribing and translating YouTube videos, audio files,

 ## 🎯 Highlights

-### Version 1.4 (current)
+### Version 1.5 (current)
+- **Speaker Diarization**
+- Automatic speaker identification using pyannote.audio
+- Speaker labels in transcripts ([SPEAKER_00], [SPEAKER_01], etc.)
+- Works with both local Whisper and OpenAI API
+- Optimal speaker detection using VAD integration
+- Enable with `--speakers` flag
+- **Enhanced logging with colored output**
+- Color-coded log levels for better visibility
+- WARNING messages in orange for important notices
+- INFO messages in green for successful operations
+- ERROR/CRITICAL messages in red for failures
+- Smart warnings (e.g., missing Whisper prompt suggestions)
+
+### Version 1.4
 - **Video file support**
 - Process local video files (MP4, MKV, AVI, MOV, etc.)
 - Automatic audio extraction using FFmpeg
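The colored-output highlight above describes behavior that is usually implemented with a custom `logging.Formatter` wrapping level names in ANSI escape codes. A minimal illustrative sketch follows; the class name and color codes are assumptions, not this repo's actual logger (a true "orange" WARNING would likely use a 256-color code such as `\033[38;5;208m`):

```python
import logging

# ANSI colors per level -- hypothetical palette for illustration.
COLORS = {
    "INFO": "\033[32m",      # green
    "WARNING": "\033[33m",   # yellow/orange
    "ERROR": "\033[31m",     # red
    "CRITICAL": "\033[31m",  # red
}
RESET = "\033[0m"

class ColorFormatter(logging.Formatter):
    """Wrap the level name in an ANSI color before normal formatting."""
    def format(self, record: logging.LogRecord) -> str:
        color = COLORS.get(record.levelname, "")
        record.levelname = f"{color}{record.levelname}{RESET}"
        return super().format(record)

handler = logging.StreamHandler()
handler.setFormatter(ColorFormatter("%(levelname)s %(message)s"))
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.info("transcription finished")      # prints in green
logging.warning("no Whisper prompt given")  # prints in yellow/orange
```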
@@ -45,9 +59,8 @@ A flexible toolkit for transcribing and translating YouTube videos, audio files,
 - ✅ Apple M1/M2 optimisations

 ### In progress
-- 🔄 Whisper via OpenAI API
-- 🔄 Translation via OpenAI API
-- 🔄 Speaker diarisation
+- 🔄 Optimized chunk processing for OpenAI API
+- 🔄 Batch processing support
 - 🔄 Docker support

 ## 📋 Requirements
@@ -188,6 +201,32 @@ Produces two documents:
 python -m src.main --url "https://youtube.com/watch?v=YOUR_VIDEO_ID" --transcribe whisper_base --prompt prompt.txt
 ```

+#### 7. Enable speaker diarization (v1.5)
+
+```bash
+# Transcribe with automatic speaker identification
+python -m src.main \
+    --url "https://youtube.com/watch?v=YOUR_VIDEO_ID" \
+    --transcribe whisper_medium \
+    --speakers
+```
+
+**Requirements for speaker diarization:**
+1. Get HuggingFace token: https://huggingface.co/settings/tokens (create a "Read" token)
+2. Accept model terms for all required models:
+   - https://huggingface.co/pyannote/speaker-diarization-3.1
+   - https://huggingface.co/pyannote/segmentation-3.0
+   - https://huggingface.co/pyannote/speaker-diarization-community-1
+   - https://huggingface.co/pyannote/voice-activity-detection (optional, for better chunking)
+3. Set token in environment: `export HF_TOKEN=your_token_here` (add to `~/.zshrc` or `~/.bashrc`)
+
+Output will include speaker labels:
+```
+[00:00] [SPEAKER_00] Hello everyone, welcome to the show
+[00:05] [SPEAKER_01] Thanks for having me
+[00:08] [SPEAKER_00] Let's get started with today's topic
+```
+
 ## ⚖️ Legal notice
 - Make sure you respect YouTube Terms of Service and copyright law before downloading or processing any content. Only use the tool for media you own or have explicit permission to process.
 - Output documents and logs may contain fragments of the original content. Store them locally and review licences before sharing.
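For context on what the `--speakers` flag drives, here is a minimal sketch of invoking pyannote.audio's diarization pipeline directly, using the model name and HF_TOKEN setup from the README hunk above. The audio filename is a placeholder, and this is not the repo's own code:

```python
import os

from pyannote.audio import Pipeline

# Requires HF_TOKEN and accepted model terms, as the README describes.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HF_TOKEN"],
)

# "audio.wav" is a hypothetical path for illustration.
diarization = pipeline("audio.wav")

# Each turn is a time span attributed to one anonymous speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"[{turn.start:06.1f}-{turn.end:06.1f}] {speaker}")  # e.g. SPEAKER_00
```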

ROADMAP.md

Lines changed: 32 additions & 31 deletions
@@ -2,6 +2,32 @@

 ## Recently Completed

+### ✅ Speaker Diarization (v1.5.0)
+**Status:** Implemented and committed (2025-10-13)
+
+**What was done:**
+- Implemented speaker identification using pyannote.audio
+- Added `_perform_speaker_diarization()` method in Transcriber
+- Integrated speaker labels with TranscriptionSegment
+- Updated document_writer to format speaker labels in output
+- Works with both local Whisper and OpenAI Whisper API
+- Graceful fallback when HF_TOKEN not available
+- Added comprehensive test suite (8 tests)
+
+**Technical details:**
+- Uses pyannote/speaker-diarization-3.1 model
+- Assigns speakers based on maximum overlap with speech segments
+- Seamlessly integrates with existing VAD infrastructure
+- Speaker labels automatically included in DOCX and Markdown outputs
+- Enable with `--speakers` CLI flag
+
+**Benefits:**
+- Better readability for multi-speaker content (interviews, podcasts)
+- Professional quality output with clear speaker attribution
+- Foundation for future speaker name mapping
+
+---
+
 ### ✅ VAD-based Intelligent Audio Chunking (v1.4.0)
 **Status:** Implemented and committed (2025-10-12)
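The "maximum overlap" rule in the technical details above can be made concrete with a small helper: for each transcription segment, sum its overlap with every diarization turn per speaker and pick the speaker with the largest total. This is a hypothetical sketch, not the repo's `_perform_speaker_diarization()`:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    start: float   # seconds
    end: float
    speaker: str   # e.g. "SPEAKER_00"

def assign_speaker(seg_start: float, seg_end: float, turns: list[Turn]) -> str | None:
    """Pick the diarization speaker whose turns overlap this
    transcription segment the most (None if nothing overlaps)."""
    totals: dict[str, float] = {}
    for t in turns:
        overlap = min(seg_end, t.end) - max(seg_start, t.start)
        if overlap > 0:
            totals[t.speaker] = totals.get(t.speaker, 0.0) + overlap
    return max(totals, key=totals.get) if totals else None

# A Whisper segment spanning 3.0-7.5 s mostly overlaps SPEAKER_01's turn:
turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
print(assign_speaker(3.0, 7.5, turns))  # SPEAKER_01
```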

@@ -23,34 +49,8 @@

 ### 🎯 High Priority

-#### 1. Speaker Diarization
-**Target:** v1.5.0
-**Dependencies:** HF_TOKEN, pyannote.audio (already installed)
-
-**Description:**
-Implement speaker identification and labeling in transcriptions using pyannote.audio.
-
-**Implementation plan:**
-- Add `_perform_speaker_diarization()` method using `pyannote/speaker-diarization-3.1`
-- Integrate speaker labels with TranscriptionSegment (already has speaker field)
-- Match diarization timestamps with Whisper transcription segments
-- Add `--with-speakers` CLI flag functionality
-- Update document output to include speaker labels (e.g., "[Speaker 1]:")
-- Reuse VAD data from chunking to improve diarization accuracy
-
-**Benefits:**
-- Better readability for multi-speaker content (interviews, podcasts)
-- Synergy with existing VAD implementation
-- Professional quality output
-
-**Requirements:**
-- HuggingFace token with access to pyannote/speaker-diarization-3.1
-- Accept terms: https://huggingface.co/pyannote/speaker-diarization-3.1
-
----
-
-#### 2. Optimized Chunk Processing for OpenAI API
-**Target:** v1.4.1
+#### 1. Optimized Chunk Processing for OpenAI API
+**Target:** v1.5.1
 **Dependencies:** None

 **Description:**
@@ -70,8 +70,8 @@ Improve processing efficiency when handling chunked audio files.

 ---

-#### 3. Batch Processing Support
-**Target:** v1.5.0
+#### 2. Batch Processing Support
+**Target:** v1.6.0
 **Dependencies:** None

 **Description:**
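The chunk-processing item is only a heading and a one-line description at this point. One plausible shape of such an optimization, sketched here purely under stated assumptions (the openai Python SDK, hypothetical chunk paths from the VAD-based chunker, and a guessed concurrency level), is transcribing chunks concurrently rather than serially:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_chunk(path: Path) -> str:
    # whisper-1 is the hosted Whisper model; each uploaded chunk
    # must stay under the API's 25 MB file-size limit.
    with path.open("rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

# Hypothetical chunk files produced by the VAD-based chunker.
chunks = sorted(Path("chunks").glob("chunk_*.mp3"))
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = list(pool.map(transcribe_chunk, chunks))
print("\n".join(texts))
```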
@@ -182,6 +182,7 @@ If you want to work on any of these features:

 ## Version History

+- **v1.5.0** (2025-10-13): Speaker diarization
 - **v1.4.0** (2025-10-12): VAD-based intelligent chunking
 - **v1.3.0** (2025-10-XX): OpenAI API integration (Whisper + GPT)
 - **v1.2.0** (2025-XX-XX): NLLB translation support
@@ -190,5 +191,5 @@ If you want to work on any of these features:

 ---

-**Last updated:** 2025-10-12
+**Last updated:** 2025-10-13
 **Maintainer:** @biyachuev
