A pipeline that automatically generates a summary clip from a video based on a text description of the video.
Specify the necessary directories.
Run `bash run.sh`.
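For example, the directories could be specified as shell variables before launching. This is a hypothetical sketch — the variable names below are assumptions; check run.sh for the ones it actually reads:

```shell
# Hypothetical variable names; substitute whatever run.sh actually expects.
VIDEO_DIR=/path/to/input/videos       # raw input videos
SHOT_DIR=/path/to/shot_output         # per-shot segments from shot_detection
OUTPUT_DIR=/path/to/summary/clips     # final summary clip
```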
- shot_detection: segments the video into shots
- text2frame: matches the text description to frames, i.e. finds the frames that best match the description of the video
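The description-to-frame matching step can be sketched as follows. This is a minimal illustration only: the real pipeline captions frames with a trained image-captioning model, whereas here the frame captions are assumed to be given as plain strings, and similarity is a simple bag-of-words cosine rather than a learned embedding.

```python
# Sketch of text2frame-style matching: score each frame's caption against
# the video description and keep the best-matching frames.
# The captions and the similarity measure here are simplifying assumptions.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_frames(description: str, frame_captions: list[str], k: int = 2) -> list[int]:
    """Return indices of the k frames whose captions best match the description."""
    desc_vec = Counter(description.lower().split())
    scores = [cosine(desc_vec, Counter(c.lower().split())) for c in frame_captions]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

captions = [
    "a dog runs on the beach",
    "a man cooks in a kitchen",
    "a dog plays with a ball on the sand",
]
print(best_matching_frames("a dog playing on the beach", captions))  # → [0, 2]
```

The selected frame indices would then point back into the shots produced by shot_detection, from which the summary clip is assembled.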
Download the necessary trained image-captioning model and put it in <text2frame/caption/checkpoints>.