diff --git a/README.md b/README.md
index 1e4c86e8..8d459151 100644
--- a/README.md
+++ b/README.md
@@ -28,15 +28,17 @@ developers to train custom multimodal large language model (MLLM), focusing on <
 6. [Citation](#citation)
 
 # News
-- [Update Jun. 12, 2024] Recipes for [MaLa-ASR](examples/mala_asr_slidespeech/README.md) has been supported.
+- [Update Sep. 28, 2024] Recipes for [CoT-ST](examples/st_covost2/README.md) have been supported.
+- [Update Sep. 25, 2024] Recipes for [DRCap](examples/drcap_zeroshot_aac/README.md) have been supported.
+- [Update Jun. 12, 2024] Recipes for [MaLa-ASR](examples/mala_asr_slidespeech/README.md) have been supported.
 - **[CALL FOR EXAMPLE]** We sincerely invite developers and researchers to develop new applications, conduct academic research based on SLAM-LLM, and pull request your examples! We also acknowledge engineering PR (such as improving and speeding up multi-node training).
 - [Update May. 22, 2024] Please join [slack](https://join.slack.com/t/slam-llm/shared_invite/zt-2mc0pkhhs-5jjOi8Cwc8R1Xc8IQmykDA) or [WeChat group](./docs/Wechat.jpg). We will sync our updates and Q&A here.
-- [Update May. 21, 2024] Recipes for [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md) has been supported.
-- [Update May. 20, 2024] Recipes for [music caption (MC)](examples/mc_musiccaps/README.md) has been supported.
-- [Update May. 8, 2024] Recipes for [visual speech recognition (VSR)](examples/vsr_LRS3/README.md) has been supported.
-- [Update May. 4, 2024] Recipes for [zero-shot text-to-speech (TTS)](examples/vallex/README.md) has been supported.
-- [Update Apr. 28, 2024] Recipes for [automated audio captioning (AAC)](examples/aac_audiocaps/README.md) has been supported.
-- [Update Mar. 31, 2024] Recipes for [automatic speech recognition (ASR)](examples/asr_librispeech/README.md) has been supported.
+- [Update May. 21, 2024] Recipes for [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md) have been supported.
+- [Update May. 20, 2024] Recipes for [music caption (MC)](examples/mc_musiccaps/README.md) have been supported.
+- [Update May. 8, 2024] Recipes for [visual speech recognition (VSR)](examples/vsr_LRS3/README.md) have been supported.
+- [Update May. 4, 2024] Recipes for [zero-shot text-to-speech (TTS)](examples/vallex/README.md) have been supported.
+- [Update Apr. 28, 2024] Recipes for [automated audio captioning (AAC)](examples/aac_audiocaps/README.md) have been supported.
+- [Update Mar. 31, 2024] Recipes for [automatic speech recognition (ASR)](examples/asr_librispeech/README.md) have been supported.
 
 # Installation
 ```bash
@@ -75,12 +77,24 @@ docker run -it --gpus all --name slam --shm-size=256g slam-llm:latest /bin/bash
 ## List of Recipes
 We provide reference implementations of various LLM-based speech, audio, and music tasks:
 - **Speech Task**
-  - [Automatic Speech Recognition (ASR)](examples/asr_librispeech/README.md)
-  - [Text-to-Speech (TTS)](examples/vallex/README.md)
-  - [Visual Speech Recognition (VSR)](examples/vsr_LRS3/README.md)
+  - Automatic Speech Recognition (ASR)
+    - [SLAM-ASR](examples/asr_librispeech/README.md)
+
+  - Contextual Automatic Speech Recognition (CASR)
+    - [MaLa-ASR](examples/mala_asr_slidespeech/README.md)
+
+  - [Visual Speech Recognition (VSR)](examples/vsr_LRS3/README.md)
+  - Speech-to-Text Translation (S2TT)
+    - [CoT-ST](examples/st_covost2/README.md)
+
+  - Text-to-Speech (TTS)
+    - [VALL-E-X](examples/vallex/README.md)
+
 - **Audio Task**
   - [Automated Audio Captioning (AAC)](examples/aac_audiocaps/README.md)
-  - [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md)
+    - [DRCap](examples/drcap_zeroshot_aac/README.md)
+  - Spatial Audio Understanding
+    - [BAT](examples/seld_spatialsoundqa/README.md)
 - **Music Task**
   - [Music Caption (MC)](examples/mc_musiccaps/README.md)
 
diff --git a/examples/st_covost2/README.md b/examples/st_covost2/README.md
index df5f2c15..13d708f8 100755
--- a/examples/st_covost2/README.md
+++ b/examples/st_covost2/README.md
@@ -59,10 +59,10 @@ bash infer.sh
 ## Citation
 You can refer to the paper for more results.
 ```
-@article{ma2024embarrassingly,
-  title={An Embarrassingly Simple Approach for LLM with Strong ASR Capacity},
-  author={Ma, Ziyang and Yang, Guanrou and Yang, Yifan and Gao, Zhifu and Wang, Jiaming and Du, Zhihao and Yu, Fan and Chen, Qian and Zheng, Siqi and Zhang, Shiliang and others},
-  journal={arXiv preprint arXiv:2402.08846},
+@article{du2024cot,
+  title={CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought},
+  author={Du, Yexing and Ma, Ziyang and Yang, Yifan and Deng, Keqi and Chen, Xie and Yang, Bo and Xiang, Yang and Liu, Ming and Qin, Bing},
+  journal={arXiv preprint arXiv:2409.19510},
   year={2024}
 }
 ```
\ No newline at end of file