A pipeline that automatically generates a summary clip from a video based on a text description of the video.
Specify the necessary directories.
Run `bash run.sh`.
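For example, the directories could be specified as shell variables before launching. This is a hypothetical sketch — the variable names below are assumptions; check run.sh for the ones it actually reads:

```shell
# Hypothetical variable names; substitute whatever run.sh actually expects.
VIDEO_DIR=/path/to/input/videos       # raw input videos
SHOT_DIR=/path/to/shot_output         # per-shot segments from shot_detection
OUTPUT_DIR=/path/to/summary/clips     # final summary clip
```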
- shot_detection: segments the video into shots
- text2frame: matches the text description to frames, i.e. finds the frames that best match the description of the video
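The description-to-frame matching step can be sketched as follows. This is a minimal illustration only: the real pipeline captions frames with a trained image-captioning model, whereas here the frame captions are assumed to be given as plain strings, and similarity is a simple bag-of-words cosine rather than a learned embedding.

```python
# Sketch of text2frame-style matching: score each frame's caption against
# the video description and keep the best-matching frames.
# The captions and the similarity measure here are simplifying assumptions.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_frames(description: str, frame_captions: list[str], k: int = 2) -> list[int]:
    """Return indices of the k frames whose captions best match the description."""
    desc_vec = Counter(description.lower().split())
    scores = [cosine(desc_vec, Counter(c.lower().split())) for c in frame_captions]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

captions = [
    "a dog runs on the beach",
    "a man cooks in a kitchen",
    "a dog plays with a ball on the sand",
]
print(best_matching_frames("a dog playing on the beach", captions))  # → [0, 2]
```

The selected frame indices would then point back into the shots produced by shot_detection, from which the summary clip is assembled.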
Download the necessary trained image-captioning model and put it in <text2frame/caption/checkpoints>.