Skip to content

Merserk/BeatSync-Engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

20 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

BeatSync Engine - AI Audio-Visual Music Video Generator

Downloads Windows Python CUDA NVENC FFmpeg Gradio Qwen3--VL

A portable Windows app that creates beat-synchronized AMV/GMV/music videos from one audio track and one or more source videos.

BeatSync Engine analyzes music rhythm, energy, sections, and source-video moments, then builds a frame-locked edit timeline with optional Qwen3-VL semantic scene matching.

image

Goal: upload audio + video clips, click one button, and get a finished rhythmic music video with clean beat cuts, strong source-moment selection, and GPU-accelerated rendering when available.


๐ŸŽฅ Demo Video

Watch the BeatSync Engine demo video

Watch the special showcase video for BeatSync Engine on YouTube.


โœจ Features

  • ๐ŸŽต Automatic Beat Editing: Detects a stable beat grid and cuts source footage to the music rhythm.
  • ๐ŸŒŠ Energy-Wave Cut Density: Calm sections hold longer; drops, impacts, and high-energy parts cut faster.
  • ๐ŸŽผ Song Structure Detection: Finds broad intro, verse, chorus, bridge, drop, build, body, and outro-style sections.
  • ๐Ÿฅ Rhythm Feature Analysis: Reads kick, clap, bass, hi-hat, novelty, impact, bar anchors, and phrase anchors.
  • ๐ŸŽฌ Source Video Moment Library: Scans source videos for motion, quality, scene changes, action, beauty, and usable moments.
  • ๐Ÿง  Qwen3-VL Semantic Tags: Optional local vision-language tagging for action, combat, chase, beauty, emotion, drops, builds, and soft moments.
  • ๐ŸŽฏ Audio-Visual Planner: Chooses planned source moments instead of relying only on random clip sampling.
  • โšก NVIDIA GPU Acceleration: Uses CuPy/CUDA for analysis when available and FFmpeg NVENC for fast H.264/H.265 encoding.
  • ๐ŸŽž๏ธ Frame-Locked Timeline: Quantizes cut boundaries to absolute output frames to avoid timing drift.
  • ๐ŸŽฌ Multiple Export Modes: NVENC H.264, NVENC HEVC, CPU H.264, and ProRes 422 Proxy precise mode.
  • ๐Ÿ“ฆ Portable Runtime: Designed around bundled Python, CUDA, FFmpeg, Transformers, and local model folders.
  • ๐ŸŒ Local Web UI: Launches a Gradio interface at http://127.0.0.1:7860 by default.
  • ๐Ÿงน Local Caches: Reuses visual analysis data so repeated runs can be faster.

๐Ÿ“ฆ Portable Folder Layout

A complete portable release is expected to look like this:

BeatSync Engine/
โ”œโ”€โ”€ install.bat                              # One-click Windows installer
โ”œโ”€โ”€ run.bat                                  # One-click Windows launcher
โ”œโ”€โ”€ requirements.txt                         # Python package requirements
โ”œโ”€โ”€ scripts/
โ”‚   โ””โ”€โ”€ install.ps1                          # Portable runtime/model installer
โ”œโ”€โ”€ bin/
โ”‚   โ”œโ”€โ”€ python-3.13.13-embed-amd64/          # Main portable Python runtime
โ”‚   โ”œโ”€โ”€ CUDA/v13.0/                          # Portable CUDA runtime files
โ”‚   โ”œโ”€โ”€ ffmpeg/ffmpeg.exe                    # Portable FFmpeg
โ”‚   โ”œโ”€โ”€ ffmpeg/ffprobe.exe                   # Portable FFprobe
โ”‚   โ””โ”€โ”€ models/
โ”‚       โ””โ”€โ”€ Qwen3-VL-2B-Instruct/            # Local vision-language model
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ gui.py                               # Gradio web UI
โ”‚   โ”œโ”€โ”€ video_processor.py                   # Rendering pipeline
โ”‚   โ”œโ”€โ”€ video_analysis.py                    # Source-video visual library
โ”‚   โ”œโ”€โ”€ ffmpeg_processing.py                 # FFmpeg/FFprobe helpers
โ”‚   โ”œโ”€โ”€ auto_mode/
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py                      # Auto Mode pipeline entry point
โ”‚   โ”‚   โ”œโ”€โ”€ stage1_audio.py                  # Beat grid detection
โ”‚   โ”‚   โ”œโ”€โ”€ stage2_features.py               # Energy/rhythm features
โ”‚   โ”‚   โ”œโ”€โ”€ stage3_sections.py               # Music section analysis
โ”‚   โ”‚   โ”œโ”€โ”€ stage4_select.py                 # Rhythmic cut selection
โ”‚   โ”‚   โ”œโ”€โ”€ stage5_qwen_scene_worker.py      # Qwen/Transformers semantic worker
โ”‚   โ”‚   โ””โ”€โ”€ stage6_av_planner.py             # Audio-visual clip planner
โ”‚   โ””โ”€โ”€ ui_content.py                        # UI labels and status text
โ”œโ”€โ”€ input/
โ”‚   โ”œโ”€โ”€ audio/                               # Latest uploaded audio copy
โ”‚   โ”œโ”€โ”€ video/                               # Latest uploaded source videos
โ”‚   โ”œโ”€โ”€ processing/                          # Temporary processing files
โ”‚   โ”œโ”€โ”€ gradio_uploads/                      # Gradio upload/session temp files
โ”‚   โ””โ”€โ”€ video_analysis_cache/                # Visual analysis cache
โ””โ”€โ”€ output/                                  # Final exported videos

The app creates input/ and output/ subfolders automatically if they are missing.


๐Ÿ› ๏ธ Quick Start - Windows Portable

  1. Download or extract BeatSync Engine.

  2. Keep the folder path simple, for example:

    D:\AI\BeatSync Engine\
    
  3. Run the installer once if the portable runtime is not already included:

    install.bat
  4. Start the app:

    run.bat
  5. The browser opens automatically:

    http://127.0.0.1:7860
    
  6. Upload:

    • one audio file: .mp3, .wav, or .flac;
    • one or more source videos: .mp4 or .mkv.
  7. Choose a processing mode.

  8. Click:

    ๐ŸŽฌ Create Music Video
    
  9. Finished videos are saved to:

    output/
    

๐ŸŽฎ How to Use the Web UI

1. Upload files

Use the left panel to upload an audio file and one or more source video files.

BeatSync copies the latest files into:

input/audio/
input/video/

This keeps processing local and avoids depending on browser upload temp paths.

2. Pick FPS

Leave Custom FPS empty to auto-detect FPS from the first source video.

Common values:

24
30
60

3. Choose processing mode

Mode Output Best for
NVIDIA NVENC H.264 .mp4 Fast, compatible exports.
NVIDIA NVENC HEVC .mp4 Smaller files, slower compatibility on old players.
CPU H.264 .mp4 Systems without NVENC.
ProRes 422 Proxy .mov Frame-perfect precise workflow and editing handoff.

4. Create the video

The CMD window shows clean stage progress like:

Stage 1 processing started:
  Beat grid: 259 beats at 152.0 BPM
Stage 1 ended in 8 seconds.

Stage 5 processing started:
  Source videos: 1
  Qwen: enabled
  Qwen tags: 116/116 in 97.5s
  Visual library: 804 visual moments, action=0.54, beauty=0.47, quality=0.62
Stage 5 ended in 192 seconds.

Total time processing: 217 seconds

The UI shows a preview and success stats after the render finishes.


๐ŸŽต Auto Mode Details

Auto Mode is the main creative engine.

Stage 1 - Beat grid

Detects stable beat positions and tempo from the percussive part of the song.

Stage 2 - Energy and rhythm

Builds beat-synchronous curves for:

  • RMS energy;
  • spectral brightness;
  • flux and novelty;
  • kick, clap, bass, and hi-hat strength;
  • impact score;
  • bar and phrase anchors.

Stage 3 - Sections

Groups the song into broad musical sections so the edit can breathe instead of cutting every transient.

Stage 4 - Cut selection

Selects a deliberate subset of beats. The selector prefers downbeats, bar anchors, phrase anchors, strong impacts, and section-aware cut density.

Stage 5 - Video analysis + Qwen3-VL

Builds a visual library from the uploaded source videos.

Deterministic analysis reads:

  • scene changes;
  • motion strength;
  • visual quality;
  • action score;
  • beauty score;
  • reusable candidate moments.

Optional local Qwen3-VL analysis through Transformers adds semantic tags like:

action
combat
chase
explosion
character_focus
camera_motion
emotion
recommended_use
visual_quality

Stage 6 - Audio-visual render

The renderer creates a frame-accurate cut timeline, chooses source moments for each segment, extracts clips with FFmpeg, concatenates the result, and adds the music track.


๐ŸŽฌ Export Modes

NVIDIA NVENC H.264

Fast hardware-encoded H.264 export.

Best for:

  • general sharing;
  • YouTube/TikTok/Discord workflows;
  • fast iteration.

NVIDIA NVENC HEVC

Hardware-encoded H.265/HEVC export.

Best for:

  • smaller files;
  • archival previews;
  • modern playback devices.

CPU H.264

Software encoding with libx264.

Best for:

  • systems without NVIDIA NVENC;
  • compatibility fallback.

ProRes 422 Proxy

Precise workflow that converts input videos to ProRes 422 Proxy, extracts frame-counted segments, concatenates, and adds the music track.

Best for:

  • editing handoff;
  • frame-perfect workflows;
  • maximum timeline stability.

ProRes files are larger. The app can create an H.264 preview for the Gradio video player while keeping the .mov output.


If BeatSync Engine saves you editing time, give the repository a star! โญ

About

BeatSync Engine is a portable AI video editor that automatically creates beat-synced AMV, GMV, and music videos from your audio and source clips.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors