SubGen - High-Quality Subtitle Generation Tool

Overview

SubGen is a streamlined subtitle generation tool based on VideoLingo. It focuses on high-quality subtitle recognition and translation while removing dubbing components. The project features updated dependencies with full support for RTX 50 series graphics cards and the latest CUDA/CuDNN environments.

Key Features

  • 🎙️ Word-level accurate speech recognition with WhisperX
  • 📝 AI-powered subtitle segmentation with NLP
  • 📚 Custom terminology + AI-generated terms for consistent translation
  • 🔄 Three-step Translate-Reflect-Adapt workflow for cinema-quality results
  • 🎬 Netflix-standard single-line subtitles
  • 🚀 Command-line interface with batch processing
  • 📝 Detailed logging with resume capability
  • 🎯 RTX 50 series optimization support

Language Support

Input Language Support:

🇺🇸 English | 🇷🇺 Russian | 🇫🇷 French | 🇩🇪 German | 🇮🇹 Italian | 🇪🇸 Spanish | 🇯🇵 Japanese | 🇨🇳 Chinese*

*Chinese uses a separate punctuation-enhanced Whisper model

Translation into any target language is supported.

System Requirements

  • Python 3.10-3.12
  • CUDA ≥ 12.3 (tested on 12.8)
  • cuDNN 9
  • Windows/Linux/macOS
  • FFmpeg (for audio/video processing)
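Before installing, the requirements above can be sanity-checked with a short script. This is a minimal sketch using only the Python standard library; it only confirms the interpreter version and that the FFmpeg binary is on PATH (it does not verify CUDA or cuDNN):

```python
import shutil
import sys

def check_environment() -> list[str]:
    """Return a list of environment problems (empty list means OK)."""
    problems = []
    # SubGen targets Python 3.10-3.12.
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} "
            "is outside the supported 3.10-3.12 range"
        )
    # FFmpeg must be discoverable on PATH for audio/video processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```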

Installation

Note: FFmpeg is required. Install it via your package manager:

  • Windows: choco install ffmpeg (via Chocolatey)
  • macOS: brew install ffmpeg (via Homebrew)
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu)

Note: Windows users with an NVIDIA GPU should complete these steps before installation:

  1. Install CUDA Toolkit 12.8
  2. Install cuDNN 9
  3. Restart your computer

Installation steps:

  1. Clone the repository

    git clone https://github.com/tukipona/SubGen.git
    cd SubGen
  2. Install uv (recommended package manager)

    pip install uv
  3. Create virtual environment and install dependencies

    uv sync
  4. Configure Settings

    cp config.example.yaml config.yaml
    # Edit config.yaml to set API keys and other parameters
  5. Model Download (automatic on first run)

    • Whisper models: Downloaded to models/whisper_models/
    • spaCy NLP models: Downloaded to models/spacy_models/
    • Speech alignment models: Downloaded to models/alignment_models/

Usage

  1. Place video/audio files in input/ directory

  2. Run main program

    uv run main.py
  3. Interactive Configuration. The program will guide you through:

    • Select input files
    • Set source and target languages
    • Choose whether to enable translation
    • Configure other advanced options
  4. Processing Pipeline. The program will automatically execute the following steps:

    • Step 1: Automatic Speech Recognition (ASR)
    • Step 2: NLP sentence splitting and semantic segmentation
    • Step 3: Content summarization and translation
    • Step 4: Subtitle splitting optimization
    • Step 5: Timestamp alignment
    • Step 6: Cleanup and archiving

    You can also execute each step individually.

  5. Output Results

    • Source subtitles: output/src.srt
    • Translated subtitles: output/trans.srt (if translation enabled)
    • Bilingual subtitles: output/src_trans.srt and output/trans_src.srt (if translation enabled)
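The resume capability works by tracking which pipeline steps have already completed. As an illustrative sketch (not SubGen's actual implementation; the step names and the state-file location are assumptions), a runner can record finished steps in a small JSON state file and skip them on the next invocation:

```python
import json
from pathlib import Path
from typing import Callable

def run_pipeline(steps: dict[str, Callable[[], None]], state_file: Path) -> None:
    """Run steps in order, skipping any already recorded in state_file."""
    done: set[str] = set()
    if state_file.exists():
        done = set(json.loads(state_file.read_text()))
    for name, step in steps.items():
        if name in done:
            print(f"skipping {name} (already completed)")
            continue
        step()
        # Persist progress after each step so an interrupted run can resume.
        done.add(name)
        state_file.parent.mkdir(parents=True, exist_ok=True)
        state_file.write_text(json.dumps(sorted(done)))

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        run_pipeline({
            "asr": lambda: print("Step 1: ASR"),
            "split": lambda: print("Step 2: NLP splitting"),
            "translate": lambda: print("Step 3: summarization and translation"),
            "optimize": lambda: print("Step 4: subtitle splitting optimization"),
            "align": lambda: print("Step 5: timestamp alignment"),
            "cleanup": lambda: print("Step 6: cleanup and archiving"),
        }, Path(tmp) / "pipeline_state.json")
```

Rerunning the same pipeline with the same state file skips every completed step, which is the behavior that lets an interrupted run pick up where it left off.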

API Configuration

SubGen supports OpenAI-compatible API format:

api:
  key: 'your-api-key'
  base_url: 'https://api.openai.com/v1'  # or other compatible API endpoint
  model: 'gpt-4'
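Any endpoint that follows the OpenAI chat-completions convention should work with the settings above. As a sketch of how such a request is shaped (standard library only; the helper name and system prompt are illustrative, and actually sending the request is left out):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat-completions request."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a subtitle translator."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because only `base_url` changes, the same code path serves OpenAI itself or any self-hosted compatible gateway.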

Advanced Configuration

For detailed configuration options, refer to config.yaml:

  • Subtitle length control: subtitle.max_length
  • Translation quality settings: reflect_translate
  • Concurrency control: max_workers
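The three option names above come from the project; the nesting and values below are illustrative assumptions, so check your own config.yaml for the exact structure:

```yaml
subtitle:
  max_length: 75         # maximum characters per subtitle line (assumed value)
reflect_translate: true  # enable the reflect pass of the three-step translation
max_workers: 4           # number of concurrent requests (assumed value)
```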

Current Limitations

  1. WhisperX transcription quality may be affected by background noise in the video, since it relies on the wav2vec model for alignment.

  2. Weaker LLMs can fail during processing because responses must follow a strict JSON format. If this error occurs, please retry with a different LLM.

  3. For multilingual videos, transcription retains only the main language, because WhisperX forcibly aligns word-level subtitles with a model specialized for a single language.
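The strict-JSON failure mode in item 2 can be mitigated generically by validating the model's reply and retrying on failure. A hedged sketch (the `call_llm` callable and the `"translation"` key are hypothetical placeholders, not SubGen's actual schema):

```python
import json

class InvalidResponse(Exception):
    """Raised when an LLM reply does not meet the strict JSON requirements."""

def parse_strict_json(raw: str, required_keys: set[str]) -> dict:
    """Parse a reply that must be a JSON object containing required_keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidResponse(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidResponse("top-level value is not a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise InvalidResponse(f"missing keys: {sorted(missing)}")
    return data

def translate_with_retry(call_llm, prompt: str, retries: int = 3) -> dict:
    """call_llm is a hypothetical stand-in for the actual LLM request."""
    last_error = None
    for _ in range(retries):
        try:
            return parse_strict_json(call_llm(prompt), {"translation"})
        except InvalidResponse as exc:
            last_error = exc  # retry; a different sample may be well-formed
    raise last_error
```

Retrying with the same prompt often succeeds because LLM sampling is nondeterministic, but as the limitation notes, persistently malformed output is a sign to switch models.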

License

This project is licensed under the Apache 2.0 License. Special thanks to the VideoLingo project, on which SubGen is based.
