TubeScript

TubeScript is an application that generates speaker-labeled, timestamped, punctuated transcripts from YouTube videos, using pyannote.audio for speaker diarization and OpenAI's Whisper for transcription.

Features

Extract audio from YouTube videos
Perform speaker diarization (who spoke when)
Generate accurate transcriptions with proper punctuation
Label speakers and timestamps
Interactive UI for reviewing and editing transcripts
Export in multiple formats (.txt, .srt, .vtt)

Requirements

Python 3.9+
GPU with CUDA support (NVIDIA RTX 4070 Super or better recommended)
FFmpeg installed and accessible via system PATH
Hugging Face account and access token (for downloading the pyannote.audio models)
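
Before installing, it can help to confirm that the basic requirements are met. The following is a minimal sketch and not part of the repository; the function name check_requirements is hypothetical, and it only checks the Python version and whether FFmpeg is on the PATH (it does not probe for a CUDA-capable GPU).

```python
import shutil
import sys

# Minimum Python version taken from the Requirements list above.
MIN_PYTHON = (3, 9)


def check_requirements(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Compare only (major, minor) against the required version.
    if sys.version_info[:2] < min_python:
        required = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python {required}+ is required, found {found}")

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on the system PATH")

    return problems


if __name__ == "__main__":
    issues = check_requirements()
    if issues:
        for issue in issues:
            print("Missing requirement:", issue)
    else:
        print("Python and FFmpeg look fine.")
```

An empty result only means these two checks passed; GPU drivers and the Hugging Face token still need to be set up separately.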

Installation

1. Clone the repository

git clone https://github.com/yourusername/TubeScript.git
cd TubeScript

2. Set up the Python backend

cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the backend directory with the following content:

HUGGINGFACE_TOKEN=your_huggingface_token_here

You can create or copy a Hugging Face access token at https://huggingface.co/settings/tokens
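
To illustrate how a backend might pick up this value, here is a minimal sketch using only the standard library. It is an assumption about usage, not the repository's actual code: the helper names load_env_file and get_huggingface_token are hypothetical, and the real backend may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path

# Placeholder value from the example .env content above.
PLACEHOLDER = "your_huggingface_token_here"


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_huggingface_token(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if not token and Path(env_path).is_file():
        token = load_env_file(env_path).get("HUGGINGFACE_TOKEN")
    if not token or token == PLACEHOLDER:
        raise RuntimeError("HUGGINGFACE_TOKEN is not set; add it to backend/.env")
    return token
```

Failing early with a clear message is usually friendlier than letting the model download fail later with an authorization error.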

4. Preload AI models (recommended)

python preload_models.py

5. Set up the frontend

cd ../frontend
npm install

Usage

1. Using the start script (Windows)

For convenience, a start script is included that launches both backend and frontend servers:

start.bat

2. Manual startup

Start the backend server

cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
python app.py

The API will be available at http://localhost:8000

Start the frontend development server

cd frontend
npm run dev

The web interface will be available at http://localhost:5173

3. Process a YouTube video

Enter a YouTube URL in the input field
Click "Process Video"
Wait for the processing to complete
Review the transcript and rename speakers if desired
Export the transcript in your preferred format
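
As an illustration of the export step, the sketch below renders labeled segments as .srt text. It is not taken from the TubeScript source: the segment shape (dictionaries with start, end, speaker, and text keys, times in seconds) and the function names are assumptions made for this example.

```python
def format_srt_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        # Prefix the text with the speaker label so it survives in plain players.
        body = f"{segment['speaker']}: {segment['text']}"
        cues.append(f"{index}\n{start} --> {end}\n{body}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "Hello."},
        {"start": 2.5, "end": 5.0, "speaker": "Speaker 2", "text": "Hi there."},
    ]
    print(to_srt(example))
```

WebVTT differs mainly in starting with a WEBVTT header line and using a period instead of a comma before the milliseconds.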

How It Works

YouTube Audio Extraction: Downloads and extracts audio using yt-dlp
Speaker Diarization: Uses pyannote.audio to identify different speakers
Transcription: Applies OpenAI's Whisper to generate accurate text with punctuation
Transcript Assembly: Combines speaker information with transcribed text
Frontend Display: Shows an interactive transcript with editing capabilities
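
The transcript assembly step can be sketched as follows. This is one common approach (assign each transcribed segment to the speaker turn it overlaps most in time), offered as an assumption rather than a description of TubeScript's actual implementation; the data shapes and the function names overlap and assign_speakers are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns, unknown="Unknown"):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    transcript_segments: dicts with "start", "end", "text" (seconds).
    speaker_turns: dicts with "start", "end", "speaker" (seconds).
    Segments that overlap no turn are labeled with `unknown`.
    """
    labeled = []
    for segment in transcript_segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in speaker_turns:
            shared = overlap(segment["start"], segment["end"],
                             turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the input list is left unmodified.
        labeled.append({**segment, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    turns = [
        {"start": 0.0, "end": 5.0, "speaker": "SPEAKER_00"},
        {"start": 5.0, "end": 10.0, "speaker": "SPEAKER_01"},
    ]
    segments = [
        {"start": 0.5, "end": 4.0, "text": "Welcome to the show."},
        {"start": 4.5, "end": 9.0, "text": "Thanks for having me."},
    ]
    for row in assign_speakers(segments, turns):
        print(f"[{row['start']:.1f}s] {row['speaker']}: {row['text']}")
```

The nested loop is quadratic in the worst case, which is usually acceptable for a single video; sorting both lists and sweeping through them once would scale better for very long recordings.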

License

MIT License

Acknowledgements

pyannote.audio for speaker diarization
OpenAI Whisper for transcription
yt-dlp for YouTube downloading
