[ACL 2026 Findings] MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation

🎏 Overview

Dataset Construction


The MSVBench dataset follows a hierarchical construction paradigm that decomposes complex stories into global priors, scene-level segments, and shot-level conditions.
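
A purely illustrative sketch of this hierarchy is shown below; the field names are hypothetical and do not reflect the actual schema of Dataset/script/<story_id>.json.

# Hypothetical illustration of the hierarchical decomposition: story-level
# global priors, scene-level segments, and shot-level conditions.
story = {
    "global_priors": {"characters": ["..."], "style": "..."},
    "scenes": [
        {
            "scene_id": 1,
            "shots": [
                {"shot_id": 1, "prompt": "...", "camera": "..."},
            ],
        },
    ],
}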

Evaluation Metrics


The MSVBench evaluation metrics form a hybrid framework that combines the high-level semantic reasoning of Large Multimodal Models (LMMs) with the fine-grained perceptual rigor of domain-specific expert models.
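
As a rough illustration of this hybrid idea, the sketch below blends an LMM-based semantic score with an expert-model perceptual score; the function and the equal weighting are assumptions for exposition, not the actual MSVBench scoring rule.

# Hypothetical blending of an LMM judgment with an expert-model score.
def hybrid_score(lmm_score: float, expert_score: float, w: float = 0.5) -> float:
    # w weights the LMM's high-level semantic reasoning against the
    # expert model's fine-grained perceptual measurement.
    return w * lmm_score + (1.0 - w) * expert_score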

Leaderboard


The MSVBench leaderboard presents a thorough evaluation of 20 diverse video generation methods.

🚩 Latest Updates

  • [2026.04.06] 🔥 ACL 2026 Findings accepted!
  • [2025.04.16] 🚀 Project launch and code release.
  • [2025.02.27] 📄 ArXiv v1 has been published.

🛠️ Setup

Download

git clone --recursive https://github.com/MSVBench/MSVBench.git
cd MSVBench

Download required assets from HuggingFace: https://huggingface.co/datasets/MrSunshy/MSVBench

# 1) Download archives
wget https://huggingface.co/datasets/MrSunshy/MSVBench/resolve/main/Dataset.zip
wget https://huggingface.co/datasets/MrSunshy/MSVBench/resolve/main/ConsistencyEvalModels.zip

# 2) Place Dataset.zip contents into ./Dataset
unzip -o Dataset.zip -d Dataset

# 3) Place consistency model files into VideoConsistency output
unzip -o ConsistencyEvalModels.zip
cp -f cartooncharacter.pth Metrics/VideoConsistency/tools/Inceptionnext/output/
cp -f cartoonface.pth Metrics/VideoConsistency/tools/Inceptionnext/output/
# model_best.pth is assumed to be the real-character checkpoint; copy it to the expected filename
cp -f model_best.pth Metrics/VideoConsistency/tools/Inceptionnext/output/realcharacter.pth

Expected model paths:

  • Metrics/VideoConsistency/tools/Inceptionnext/output/cartooncharacter.pth
  • Metrics/VideoConsistency/tools/Inceptionnext/output/cartoonface.pth
  • Metrics/VideoConsistency/tools/Inceptionnext/output/realcharacter.pth

Note: Model deployment and download guidelines for submetrics can be found in Metrics/*/checkpoints/instruction.txt and Metrics/*/tools/instruction.txt.
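
Optionally, a short Python check (not part of the repository) can confirm that the checkpoints landed in the expected locations listed above:

# Sanity check: verify the consistency-model checkpoints are in place.
from pathlib import Path

expected = [
    "Metrics/VideoConsistency/tools/Inceptionnext/output/cartooncharacter.pth",
    "Metrics/VideoConsistency/tools/Inceptionnext/output/cartoonface.pth",
    "Metrics/VideoConsistency/tools/Inceptionnext/output/realcharacter.pth",
]
missing = [p for p in expected if not Path(p).exists()]
print("All checkpoints in place." if not missing else f"Missing: {missing}")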

Environment

conda create -n MSVBench python=3.10
conda activate MSVBench
# for cuda 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt

Choose the PyTorch build that matches your CUDA version from https://pytorch.org/get-started/previous-versions/.
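
After installation, an optional sanity check confirms that the PyTorch build sees your GPU:

# Verify the installed PyTorch version and CUDA availability.
import torch

print(torch.__version__)          # expected: 2.4.0 for the command above
print(torch.cuda.is_available())  # should be True on a CUDA-enabled machine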

🚀 Usage

Data Preparation

  1. Generate videos from Dataset/script/<story_id>.json and save them to Evaluation/data/videos/<story_id>/*.mp4.
  2. Input lookup for prompt/script/camera/characters, where <type> is one of prompt, script, camera, characters (see the sketch after this list):
  • Default: Dataset/<type>/<story_id>.*
  • Custom override (higher priority): Dataset/baselineinfo/<method>/<type>/<story_id>.*
  3. Video lookup:
  • Default: Evaluation/data/videos/<story_id>
  • Custom override (higher priority): Dataset/baselineinfo/<method>/videos/<story_id>
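
The override-over-default behavior can be pictured with the following sketch; resolve_input is a hypothetical helper for illustration (the file extension is also an assumption), not part of the MSVBench codebase.

# Illustrative lookup: a method-specific file under Dataset/baselineinfo/
# takes priority over the default location under Dataset/<type>/.
from pathlib import Path

def resolve_input(method: str, type_: str, story_id: str, ext: str = "json") -> Path:
    override = Path(f"Dataset/baselineinfo/{method}/{type_}/{story_id}.{ext}")
    default = Path(f"Dataset/{type_}/{story_id}.{ext}")
    return override if override.exists() else default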

Run Evaluation

bash MSVBench.sh

Useful options:

# skip cases that already have output json
SKIP_IF_EXISTS=1 bash MSVBench.sh

# run selected modules only
MODULES="visual_quality,story_alignment" bash MSVBench.sh

# run selected submetrics (format: module=sub1,sub2;module=sub1)
SUBMETRICS="story_alignment=blip_bleu_score,shot_perspective_alignment;motion_quality=action_recognition" bash MSVBench.sh

Run a single case directly:

METHOD=<method> STORY_ID="01" python3 MSVBench.py
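
To evaluate several stories for one method, the same environment variables can be set from a small driver script; this loop is only a convenience sketch built on the command above.

# Run MSVBench.py once per story by setting METHOD and STORY_ID, as above.
import os
import subprocess

method = "LongLive"               # example method name from the leaderboard
for story_id in ["01", "02", "03"]:
    env = dict(os.environ, METHOD=method, STORY_ID=story_id)
    subprocess.run(["python3", "MSVBench.py"], env=env, check=True)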

Output Structure

MSVBench evaluation results are saved under Evaluation/results/ with one JSON file per (method, story_id) pair:

  • Evaluation/results/: root directory for evaluation outputs.
  • <method>/: one subdirectory per evaluated method (e.g., LongLive, CogVideo, Wan2.2-i2v).
  • <story_id>.json: evaluation result for one story (e.g., 01.json, 27.json).

Each JSON file typically contains:

  • evaluation_info: metadata (method, story_id, video_directory, evaluation_timestamp, modules_evaluated).
  • timing_info: runtime per module.
  • Module outputs: visual_quality, story_alignment, video_consistency, motion_quality.

By default, MSVBench.sh writes results to:

  • Evaluation/results/<method>/<story_id>.json
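
A result file can be inspected with a few lines of Python, assuming the layout described above (evaluation_info, timing_info, and one entry per module):

# Load one evaluation result and print the module outputs it contains.
import json
from pathlib import Path

result = json.loads(Path("Evaluation/results/LongLive/01.json").read_text())
print(result["evaluation_info"]["modules_evaluated"])
for module in ["visual_quality", "story_alignment", "video_consistency", "motion_quality"]:
    if module in result:
        print(module, result[module])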

📚 Citation

@article{shi2026msvbench,
  title={MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation},
  author={Shi, Haoyuan and Li, Yunxin and Deng, Nanhao and Xu, Zhenran and Chen, Xinyu and Wang, Longyue and Hu, Baotian and Zhang, Min},
  journal={arXiv preprint arXiv:2602.23969},
  year={2026}
}
