SubGen - High-Quality Subtitle Generation Tool

Overview

SubGen is a streamlined subtitle generation tool based on VideoLingo. It focuses on high-quality subtitle recognition and translation while removing dubbing components. The project features updated dependencies with full support for RTX 50 series graphics cards and the latest CUDA/CuDNN environments.

Key Features

  • 🎙️ Word-level accurate speech recognition with WhisperX
  • 📝 AI-powered subtitle segmentation with NLP
  • 📚 Custom terminology + AI-generated terms for consistent translation
  • 🔄 Three-step Translate-Reflect-Adapt workflow for cinema-quality results
  • 🎬 Netflix-standard single-line subtitles
  • 🚀 Command-line interface with batch processing
  • 📝 Detailed logging with resume capability
  • 🎯 RTX 50 series optimization support

Language Support

Input Language Support:

🇺🇸 English | 🇷🇺 Russian | 🇫🇷 French | 🇩🇪 German | 🇮🇹 Italian | 🇪🇸 Spanish | 🇯🇵 Japanese | 🇨🇳 Chinese*

*Chinese uses a separate punctuation-enhanced Whisper model

Translation into any target language is supported.

System Requirements

  • Python 3.10-3.12
  • CUDA ≥ 12.3 (tested on 12.8)
  • cuDNN 9
  • Windows/Linux/macOS
  • FFmpeg (for audio/video processing)
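Before installing, the requirements above can be sanity-checked with a short script. This is a minimal sketch using only the Python standard library; it only confirms the interpreter version and that the FFmpeg binary is on PATH (it does not verify CUDA or cuDNN):

```python
import shutil
import sys

def check_environment() -> list[str]:
    """Return a list of environment problems (empty list means OK)."""
    problems = []
    # SubGen targets Python 3.10-3.12.
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} "
            "is outside the supported 3.10-3.12 range"
        )
    # FFmpeg must be discoverable on PATH for audio/video processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```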

Installation

Note: FFmpeg is required. Install it via your package manager:

  • Windows: choco install ffmpeg (via Chocolatey)
  • macOS: brew install ffmpeg (via Homebrew)
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu)

Note: Windows users with an NVIDIA GPU should complete these steps before installation:

  1. Install CUDA Toolkit 12.8
  2. Install cuDNN 9
  3. Restart your computer

Installation steps:

  1. Clone the repository

    git clone https://github.com/tukipona/SubGen.git
    cd SubGen
  2. Install uv (recommended package manager)

    pip install uv
  3. Create virtual environment and install dependencies

    uv sync
  4. Configure Settings

    cp config.example.yaml config.yaml
    # Edit config.yaml to set API keys and other parameters
  5. Model Download (automatic on first run)

    • Whisper models: Downloaded to models/whisper_models/
    • spaCy NLP models: Downloaded to models/spacy_models/
    • Speech alignment models: Downloaded to models/alignment_models/

Usage

  1. Place video/audio files in input/ directory

  2. Run main program

    uv run main.py
  3. Interactive Configuration. The program will guide you through:

    • Select input files
    • Set source and target languages
    • Choose whether to enable translation
    • Configure other advanced options
  4. Processing Pipeline. The program will automatically execute the following steps:

    • Step 1: Automatic Speech Recognition (ASR)
    • Step 2: NLP sentence splitting and semantic segmentation
    • Step 3: Content summarization and translation
    • Step 4: Subtitle splitting optimization
    • Step 5: Timestamp alignment
    • Step 6: Cleanup and archiving

    You can also execute each step individually.

  5. Output Results

    • Source subtitles: output/src.srt
    • Translated subtitles: output/trans.srt (if translation enabled)
    • Bilingual subtitles: output/src_trans.srt and output/trans_src.srt (if translation enabled)
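The resume capability works by tracking which pipeline steps have already completed. As an illustrative sketch (not SubGen's actual implementation; the step names and the state-file location are assumptions), a runner can record finished steps in a small JSON state file and skip them on the next invocation:

```python
import json
from pathlib import Path
from typing import Callable

def run_pipeline(steps: dict[str, Callable[[], None]], state_file: Path) -> None:
    """Run steps in order, skipping any already recorded in state_file."""
    done: set[str] = set()
    if state_file.exists():
        done = set(json.loads(state_file.read_text()))
    for name, step in steps.items():
        if name in done:
            print(f"skipping {name} (already completed)")
            continue
        step()
        # Persist progress after each step so an interrupted run can resume.
        done.add(name)
        state_file.parent.mkdir(parents=True, exist_ok=True)
        state_file.write_text(json.dumps(sorted(done)))

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        run_pipeline({
            "asr": lambda: print("Step 1: ASR"),
            "split": lambda: print("Step 2: NLP splitting"),
            "translate": lambda: print("Step 3: summarization and translation"),
            "optimize": lambda: print("Step 4: subtitle splitting optimization"),
            "align": lambda: print("Step 5: timestamp alignment"),
            "cleanup": lambda: print("Step 6: cleanup and archiving"),
        }, Path(tmp) / "pipeline_state.json")
```

Rerunning the same pipeline with the same state file skips every completed step, which is the behavior that lets an interrupted run pick up where it left off.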

API Configuration

SubGen supports OpenAI-compatible API format:

api:
  key: 'your-api-key'
  base_url: 'https://api.openai.com/v1'  # or other compatible API endpoint
  model: 'gpt-4'
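Any endpoint that follows the OpenAI chat-completions convention should work with the settings above. As a sketch of how such a request is shaped (standard library only; the helper name and system prompt are illustrative, and actually sending the request is left out):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat-completions request."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a subtitle translator."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because only `base_url` changes, the same code path serves OpenAI itself or any self-hosted compatible gateway.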

Advanced Configuration

For detailed configuration options, refer to config.yaml:

  • Subtitle length control: subtitle.max_length
  • Translation quality settings: reflect_translate
  • Concurrency control: max_workers
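The three option names above come from the project; the nesting and values below are illustrative assumptions, so check your own config.yaml for the exact structure:

```yaml
subtitle:
  max_length: 75         # maximum characters per subtitle line (assumed value)
reflect_translate: true  # enable the reflect pass of the three-step translation
max_workers: 4           # number of concurrent requests (assumed value)
```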

Current Limitations

  1. WhisperX transcription quality may be affected by background noise in the video, since it relies on the wav2vec model for alignment.

  2. Weaker LLMs can fail during processing because responses must follow a strict JSON format. If this error occurs, please retry with a different LLM.

  3. For multilingual videos, transcription retains only the main language, because WhisperX forcibly aligns word-level subtitles with a model specialized for a single language.
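The strict-JSON failure mode in item 2 can be mitigated generically by validating the model's reply and retrying on failure. A hedged sketch (the `call_llm` callable and the `"translation"` key are hypothetical placeholders, not SubGen's actual schema):

```python
import json

class InvalidResponse(Exception):
    """Raised when an LLM reply does not meet the strict JSON requirements."""

def parse_strict_json(raw: str, required_keys: set[str]) -> dict:
    """Parse a reply that must be a JSON object containing required_keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidResponse(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidResponse("top-level value is not a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise InvalidResponse(f"missing keys: {sorted(missing)}")
    return data

def translate_with_retry(call_llm, prompt: str, retries: int = 3) -> dict:
    """call_llm is a hypothetical stand-in for the actual LLM request."""
    last_error = None
    for _ in range(retries):
        try:
            return parse_strict_json(call_llm(prompt), {"translation"})
        except InvalidResponse as exc:
            last_error = exc  # retry; a different sample may be well-formed
    raise last_error
```

Retrying with the same prompt often succeeds because LLM sampling is nondeterministic, but as the limitation notes, persistently malformed output is a sign to switch models.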

License

This project is licensed under the Apache 2.0 License. Special thanks to the VideoLingo project, on which SubGen is based.
