Video-Guided Text-to-Music Generation Using Public Domain Movie Collections (ISMIR 2025)

This repository provides the Python implementation of the model architecture proposed in our ISMIR 2025 paper, "Video-Guided Text-to-Music Generation Using Public Domain Movie Collections". The architecture integrates a video adapter into MusicGen.

If you find this repository useful for your research, please consider citing our paper.

@article{kim2025ossl,
  title = {Video-Guided Text-to-Music Generation Using Public Domain Movie Collections},
  author = {Haven Kim and Zachary Novack and Weihan Xu and Julian McAuley and Hao-Wen Dong},
  journal = {ISMIR 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2506.12573}
}

Acknowledgements

Our implementation builds heavily on the official audiocraft repository.

Open Screen Sound Library Version 1 (OSSL-v1)

Please see this webpage to download the dataset.

