Commit fbe3b65

Merge pull request #139 from X-LANCE/dev-mzy
Update readme
2 parents 3a7c195 + b93bbf3 commit fbe3b65

2 files changed: 29 additions and 15 deletions

README.md

Lines changed: 25 additions & 11 deletions
@@ -28,15 +28,17 @@ developers to train custom multimodal large language model (MLLM), focusing on <
 6. [Citation](#citation)
 
 # News
-- [Update Jun. 12, 2024] Recipes for [MaLa-ASR](examples/mala_asr_slidespeech/README.md) has been supported.
+- [Update Sep. 28, 2024] Recipes for [CoT-ST](examples/st_covost2/README.md) have been supported.
+- [Update Sep. 25, 2024] Recipes for [DRCap](examples/drcap_zeroshot_aac/README.md) have been supported.
+- [Update Jun. 12, 2024] Recipes for [MaLa-ASR](examples/mala_asr_slidespeech/README.md) have been supported.
 - **[CALL FOR EXAMPLE]** We sincerely invite developers and researchers to develop new applications, conduct academic research based on SLAM-LLM, and pull request your examples! We also acknowledge engineering PR (such as improving and speeding up multi-node training).
 - [Update May. 22, 2024] Please join [slack](https://join.slack.com/t/slam-llm/shared_invite/zt-2mc0pkhhs-5jjOi8Cwc8R1Xc8IQmykDA) or [WeChat group](./docs/Wechat.jpg). We will sync our updates and Q&A here.
-- [Update May. 21, 2024] Recipes for [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md) has been supported.
-- [Update May. 20, 2024] Recipes for [music caption (MC)](examples/mc_musiccaps/README.md) has been supported.
-- [Update May. 8, 2024] Recipes for [visual speech recognition (VSR)](examples/vsr_LRS3/README.md) has been supported.
-- [Update May. 4, 2024] Recipes for [zero-shot text-to-speech (TTS)](examples/vallex/README.md) has been supported.
-- [Update Apr. 28, 2024] Recipes for [automated audio captioning (AAC)](examples/aac_audiocaps/README.md) has been supported.
-- [Update Mar. 31, 2024] Recipes for [automatic speech recognition (ASR)](examples/asr_librispeech/README.md) has been supported.
+- [Update May. 21, 2024] Recipes for [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md) have been supported.
+- [Update May. 20, 2024] Recipes for [music caption (MC)](examples/mc_musiccaps/README.md) have been supported.
+- [Update May. 8, 2024] Recipes for [visual speech recognition (VSR)](examples/vsr_LRS3/README.md) have been supported.
+- [Update May. 4, 2024] Recipes for [zero-shot text-to-speech (TTS)](examples/vallex/README.md) have been supported.
+- [Update Apr. 28, 2024] Recipes for [automated audio captioning (AAC)](examples/aac_audiocaps/README.md) have been supported.
+- [Update Mar. 31, 2024] Recipes for [automatic speech recognition (ASR)](examples/asr_librispeech/README.md) have been supported.
 
 # Installation
 ```bash
@@ -75,12 +77,24 @@ docker run -it --gpus all --name slam --shm-size=256g slam-llm:latest /bin/bash
 ## List of Recipes
 We provide reference implementations of various LLM-based speech, audio, and music tasks:
 - **Speech Task**
-- [Automatic Speech Recognition (ASR)](examples/asr_librispeech/README.md)
-- [Text-to-Speech (TTS)](examples/vallex/README.md)
-- [Visual Speech Recognition (VSR)](examples/vsr_LRS3/README.md)
+- Automatic Speech Recognition (ASR)
+- [SLAM-ASR](examples/asr_librispeech/README.md)
+
+- Contextual Automatic Speech Recognition (CASR)
+- [ Mala-ASR](examples/mala_asr_slidespeech/README.md)
+
+- [Visual Speech Recognition (VSR)](examples/vsr_LRS3/README.md)
+- Speech-to-Text Translation (S2TT)
+- [CoT-ST](examples/st_covost2/README.md)
+
+- Text-to-Speech (TTS)
+- [VALL-E-X](examples/vallex/README.md)
+
 - **Audio Task**
 - [Automated Audio Captioning (AAC)](examples/aac_audiocaps/README.md)
-- [Spatial Audio Understanding](examples/seld_spatialsoundqa/README.md)
+- [DRCap](examples/drcap_zeroshot_aac/README.md)
+- Spatial Audio Understanding
+- [BAT](examples/seld_spatialsoundqa/README.md)
 - **Music Task**
 - [Music Caption (MC)](examples/mc_musiccaps/README.md)
 

examples/st_covost2/README.md

Lines changed: 4 additions & 4 deletions
@@ -59,10 +59,10 @@ bash infer.sh
 ## Citation
 You can refer to the paper for more results.
 ```
-@article{ma2024embarrassingly,
-title={An Embarrassingly Simple Approach for LLM with Strong ASR Capacity},
-author={Ma, Ziyang and Yang, Guanrou and Yang, Yifan and Gao, Zhifu and Wang, Jiaming and Du, Zhihao and Yu, Fan and Chen, Qian and Zheng, Siqi and Zhang, Shiliang and others},
-journal={arXiv preprint arXiv:2402.08846},
+@article{du2024cot,
+title={CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought},
+author={Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin},
+journal={arXiv preprint arXiv:2409.19510},
 year={2024}
 }
 ```
