A portable Windows app that creates beat-synchronized AMV/GMV/music videos from one audio track and one or more source videos.
BeatSync Engine analyzes music rhythm, energy, sections, and source-video moments, then builds a frame-locked edit timeline with optional Qwen3-VL semantic scene matching.
Goal: upload audio + video clips, click one button, and get a finished rhythmic music video with clean beat cuts, strong source-moment selection, and GPU-accelerated rendering when available.
Watch the special showcase video for BeatSync Engine on YouTube.
- ๐ต Automatic Beat Editing: Detects a stable beat grid and cuts source footage to the music rhythm.
- ๐ Energy-Wave Cut Density: Calm sections hold longer; drops, impacts, and high-energy parts cut faster.
- ๐ผ Song Structure Detection: Finds broad intro, verse, chorus, bridge, drop, build, body, and outro-style sections.
- ๐ฅ Rhythm Feature Analysis: Reads kick, clap, bass, hi-hat, novelty, impact, bar anchors, and phrase anchors.
- ๐ฌ Source Video Moment Library: Scans source videos for motion, quality, scene changes, action, beauty, and usable moments.
- ๐ง Qwen3-VL Semantic Tags: Optional local vision-language tagging for action, combat, chase, beauty, emotion, drops, builds, and soft moments.
- ๐ฏ Audio-Visual Planner: Chooses planned source moments instead of relying only on random clip sampling.
- โก NVIDIA GPU Acceleration: Uses CuPy/CUDA for analysis when available and FFmpeg NVENC for fast H.264/H.265 encoding.
- ๐๏ธ Frame-Locked Timeline: Quantizes cut boundaries to absolute output frames to avoid timing drift.
- ๐ฌ Multiple Export Modes: NVENC H.264, NVENC HEVC, CPU H.264, and ProRes 422 Proxy precise mode.
- ๐ฆ Portable Runtime: Designed around bundled Python, CUDA, FFmpeg, Transformers, and local model folders.
- ๐ Local Web UI: Launches a Gradio interface at
http://127.0.0.1:7860by default. - ๐งน Local Caches: Reuses visual analysis data so repeated runs can be faster.
A complete portable release is expected to look like this:
BeatSync Engine/
โโโ install.bat # One-click Windows installer
โโโ run.bat # One-click Windows launcher
โโโ requirements.txt # Python package requirements
โโโ scripts/
โ โโโ install.ps1 # Portable runtime/model installer
โโโ bin/
โ โโโ python-3.13.13-embed-amd64/ # Main portable Python runtime
โ โโโ CUDA/v13.0/ # Portable CUDA runtime files
โ โโโ ffmpeg/ffmpeg.exe # Portable FFmpeg
โ โโโ ffmpeg/ffprobe.exe # Portable FFprobe
โ โโโ models/
โ โโโ Qwen3-VL-2B-Instruct/ # Local vision-language model
โโโ src/
โ โโโ gui.py # Gradio web UI
โ โโโ video_processor.py # Rendering pipeline
โ โโโ video_analysis.py # Source-video visual library
โ โโโ ffmpeg_processing.py # FFmpeg/FFprobe helpers
โ โโโ auto_mode/
โ โ โโโ __init__.py # Auto Mode pipeline entry point
โ โ โโโ stage1_audio.py # Beat grid detection
โ โ โโโ stage2_features.py # Energy/rhythm features
โ โ โโโ stage3_sections.py # Music section analysis
โ โ โโโ stage4_select.py # Rhythmic cut selection
โ โ โโโ stage5_qwen_scene_worker.py # Qwen/Transformers semantic worker
โ โ โโโ stage6_av_planner.py # Audio-visual clip planner
โ โโโ ui_content.py # UI labels and status text
โโโ input/
โ โโโ audio/ # Latest uploaded audio copy
โ โโโ video/ # Latest uploaded source videos
โ โโโ processing/ # Temporary processing files
โ โโโ gradio_uploads/ # Gradio upload/session temp files
โ โโโ video_analysis_cache/ # Visual analysis cache
โโโ output/ # Final exported videos
The app creates
input/andoutput/subfolders automatically if they are missing.
-
Download or extract BeatSync Engine.
-
Keep the folder path simple, for example:
D:\AI\BeatSync Engine\ -
Run the installer once if the portable runtime is not already included:
install.bat
-
Start the app:
run.bat
-
The browser opens automatically:
http://127.0.0.1:7860 -
Upload:
- one audio file:
.mp3,.wav, or.flac; - one or more source videos:
.mp4or.mkv.
- one audio file:
-
Choose a processing mode.
-
Click:
๐ฌ Create Music Video -
Finished videos are saved to:
output/
Use the left panel to upload an audio file and one or more source video files.
BeatSync copies the latest files into:
input/audio/
input/video/
This keeps processing local and avoids depending on browser upload temp paths.
Leave Custom FPS empty to auto-detect FPS from the first source video.
Common values:
24
30
60
| Mode | Output | Best for |
|---|---|---|
| NVIDIA NVENC H.264 | .mp4 |
Fast, compatible exports. |
| NVIDIA NVENC HEVC | .mp4 |
Smaller files, slower compatibility on old players. |
| CPU H.264 | .mp4 |
Systems without NVENC. |
| ProRes 422 Proxy | .mov |
Frame-perfect precise workflow and editing handoff. |
The CMD window shows clean stage progress like:
Stage 1 processing started:
Beat grid: 259 beats at 152.0 BPM
Stage 1 ended in 8 seconds.
Stage 5 processing started:
Source videos: 1
Qwen: enabled
Qwen tags: 116/116 in 97.5s
Visual library: 804 visual moments, action=0.54, beauty=0.47, quality=0.62
Stage 5 ended in 192 seconds.
Total time processing: 217 seconds
The UI shows a preview and success stats after the render finishes.
Auto Mode is the main creative engine.
Detects stable beat positions and tempo from the percussive part of the song.
Builds beat-synchronous curves for:
- RMS energy;
- spectral brightness;
- flux and novelty;
- kick, clap, bass, and hi-hat strength;
- impact score;
- bar and phrase anchors.
Groups the song into broad musical sections so the edit can breathe instead of cutting every transient.
Selects a deliberate subset of beats. The selector prefers downbeats, bar anchors, phrase anchors, strong impacts, and section-aware cut density.
Builds a visual library from the uploaded source videos.
Deterministic analysis reads:
- scene changes;
- motion strength;
- visual quality;
- action score;
- beauty score;
- reusable candidate moments.
Optional local Qwen3-VL analysis through Transformers adds semantic tags like:
action
combat
chase
explosion
character_focus
camera_motion
emotion
recommended_use
visual_quality
The renderer creates a frame-accurate cut timeline, chooses source moments for each segment, extracts clips with FFmpeg, concatenates the result, and adds the music track.
Fast hardware-encoded H.264 export.
Best for:
- general sharing;
- YouTube/TikTok/Discord workflows;
- fast iteration.
Hardware-encoded H.265/HEVC export.
Best for:
- smaller files;
- archival previews;
- modern playback devices.
Software encoding with libx264.
Best for:
- systems without NVIDIA NVENC;
- compatibility fallback.
Precise workflow that converts input videos to ProRes 422 Proxy, extracts frame-counted segments, concatenates, and adds the music track.
Best for:
- editing handoff;
- frame-perfect workflows;
- maximum timeline stability.
ProRes files are larger. The app can create an H.264 preview for the Gradio video player while keeping the
.movoutput.
If BeatSync Engine saves you editing time, give the repository a star! โญ
