This repository provides the Python implementation of our proposed model architecture, which integrates a video adapter into MusicGen, as introduced in our ISMIR 2025 paper, "Video-Guided Text-to-Music Generation Using Public Domain Movie Collections".
If you find this repository useful for your research, please consider citing our paper.
@article{kim2025ossl,
title = {Video-Guided Text-to-Music Generation Using Public Domain Movie Collections},
author = {Haven Kim and Zachary Novack and Weihan Xu and Julian McAuley and Hao-Wen Dong},
journal = {ISMIR 2025},
year = {2025},
url = {https://arxiv.org/abs/2506.12573}
}
Our implementation builds heavily on the official audiocraft repository.
Please see this webpage to download the dataset.
