SubGen is a streamlined subtitle generation tool based on VideoLingo. It focuses on high-quality subtitle recognition and translation while removing dubbing components. The project features updated dependencies with full support for RTX 50 series graphics cards and the latest CUDA/CuDNN environments.
- 🎙️ Word-level accurate speech recognition with WhisperX
- 📝 AI-powered subtitle segmentation with NLP
- 📚 Custom terminology + AI-generated terms for consistent translation
- 🔄 Three-step Translate-Reflect-Adapt workflow for cinema-quality results
- ✅ Netflix-standard single-line subtitles
- 🚀 Command-line interface with batch processing
- 📝 Detailed logging with resume capability
- 🎯 RTX 50 series optimization support
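As an illustration of the single-line constraint, Netflix's style guide caps subtitle lines at 42 characters for Latin scripts. A minimal check might look like this (the function name and default limit are illustrative, not part of SubGen's API):

```python
def fits_single_line(text: str, max_chars: int = 42) -> bool:
    """Return True if a subtitle cue fits on one line within the limit.

    42 characters per line follows Netflix's guideline for Latin scripts;
    CJK scripts typically use lower limits.
    """
    return "\n" not in text and len(text) <= max_chars
```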
Input Language Support:
🇺🇸 English | 🇷🇺 Russian | 🇫🇷 French | 🇩🇪 German | 🇮🇹 Italian | 🇪🇸 Spanish | 🇯🇵 Japanese | 🇨🇳 Chinese*
*Chinese uses a separate punctuation-enhanced Whisper model
Translation supports all target languages.
- Python 3.10-3.12
- CUDA ≥ 12.3 (Tested on 12.8)
- cuDNN 9
- Windows/Linux/macOS
- FFmpeg (for audio/video processing)
Note: FFmpeg is required. Install it via a package manager:

- Windows (Chocolatey):

  ```bash
  choco install ffmpeg
  ```

- macOS (Homebrew):

  ```bash
  brew install ffmpeg
  ```

- Linux (Debian/Ubuntu):

  ```bash
  sudo apt install ffmpeg
  ```
Note: For Windows users with NVIDIA GPU, complete these steps before installation:
- Install CUDA Toolkit 12.8
- Install cuDNN 9
- Restart your computer
1. Clone the repository

   ```bash
   git clone https://github.com/tukipona/SubGen.git
   cd SubGen
   ```

2. Install uv (recommended package manager)

   ```bash
   pip install uv
   ```

3. Create the virtual environment and install dependencies

   ```bash
   uv sync
   ```

4. Configure settings

   ```bash
   cp config.example.yaml config.yaml
   # Edit config.yaml to set API keys and other parameters
   ```

5. Model download (automatic on first run)

   - Whisper models: downloaded to `models/whisper_models/`
   - spaCy NLP models: downloaded to `models/spacy_models/`
   - Speech alignment models: downloaded to `models/alignment_models/`
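Since models download lazily on first run, a quick pre-flight check can report which model directories are still missing. This is a hypothetical helper, not part of SubGen:

```python
from pathlib import Path

# Directories SubGen populates on first run.
MODEL_DIRS = (
    "models/whisper_models",
    "models/spacy_models",
    "models/alignment_models",
)

def missing_model_dirs(root: str = ".") -> list[str]:
    """Return the model directories under root that do not exist yet."""
    return [d for d in MODEL_DIRS if not (Path(root) / d).is_dir()]
```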
1. Place video/audio files in the `input/` directory

2. Run the main program

   ```bash
   uv run main.py
   ```

3. Interactive configuration. The program will guide you through:

   - Selecting input files
   - Setting source and target languages
   - Choosing whether to enable translation
   - Configuring other advanced options

4. Processing pipeline. The program automatically executes the following steps:

   - Step 1: Automatic Speech Recognition (ASR)
   - Step 2: NLP sentence splitting and semantic segmentation
   - Step 3: Content summarization and translation
   - Step 4: Subtitle splitting optimization
   - Step 5: Timestamp alignment
   - Step 6: Cleanup and archiving

   Each step can also be executed individually.

5. Output results

   - Source subtitles: `output/src.srt`
   - Translated subtitles: `output/trans.srt` (if translation is enabled)
   - Bilingual subtitles: `output/src_trans.srt` and `output/trans_src.srt` (if translation is enabled)
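The `.srt` outputs are plain text and easy to post-process. A minimal parser sketch (not part of SubGen) is:

```python
import re

def parse_srt(text: str) -> list[dict]:
    """Parse SRT text into dicts with index, start, end, and text fields."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, _, end = lines[1].partition(" --> ")
        entries.append({
            "index": int(lines[0]),
            "start": start.strip(),
            "end": end.strip(),
            "text": "\n".join(lines[2:]),
        })
    return entries
```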
SubGen supports the OpenAI-compatible API format:

```yaml
api:
  key: 'your-api-key'
  base_url: 'https://api.openai.com/v1'  # or another compatible API endpoint
  model: 'gpt-4'
```

For detailed configuration options, refer to config.yaml:

- Subtitle length control: `subtitle.max_length`
- Translation quality settings: `reflect_translate`
- Concurrency control: `max_workers`
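For illustration, a request against the configured endpoint can be built with only the standard library. The helper below is a sketch, not SubGen's internal client; it assumes a chat-completions style endpoint, which is what OpenAI-compatible APIs expose:

```python
import json
from urllib import request

def build_chat_request(api_cfg: dict, prompt: str) -> request.Request:
    """Build an OpenAI-compatible /chat/completions request from the
    api section of config.yaml (key, base_url, model)."""
    url = api_cfg["base_url"].rstrip("/") + "/chat/completions"
    payload = {
        "model": api_cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_cfg['key']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`; SubGen's actual client may differ.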
- WhisperX transcription quality may degrade on videos with background noise, since it relies on a wav2vec model for alignment.
- Weaker LLMs can cause errors mid-pipeline because responses must follow a strict JSON format. If this occurs, retry with a stronger model.
- For multilingual videos, only the main language is retained, because WhisperX forces word-level alignment with a model specialized for a single language.
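Regarding the strict-JSON issue, replies wrapped in markdown fences or surrounded by prose are a common failure mode. A tolerant extraction step (illustrative, not SubGen's actual parser) can salvage many of them before giving up and retrying:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Pull the first {...} object out of an LLM reply, tolerating
    surrounding prose or ```json fences, and parse it strictly."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM response")
    return json.loads(match.group(0))
```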
This project is licensed under the Apache 2.0 License. Special thanks to:
- VideoLingo - Original project foundation
- WhisperX - Speech recognition
- spaCy - Natural language processing