We only train the Q-Former projector in this recipe.
Encoder | Projector | LLM
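For orientation, the sketch below shows what a Q-Former-style projector computes: a small set of learnable query embeddings cross-attends to the frozen encoder's output and is projected to the LLM's hidden size, so the LLM receives a fixed-length speech prefix. This is a minimal PyTorch illustration under assumed names and dimensions, not this recipe's actual implementation.

```
# Illustrative sketch of a Q-Former-style projector (NOT the recipe's code).
# Learnable queries cross-attend to frozen encoder features, producing a
# fixed-length sequence projected into the LLM embedding space.
import torch
import torch.nn as nn

class QFormerProjector(nn.Module):
    def __init__(self, encoder_dim=1280, llm_dim=4096, num_queries=64,
                 hidden_dim=768, num_heads=8, num_layers=2):
        super().__init__()
        # Learnable query embeddings; only the projector parameters train.
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim) * 0.02)
        self.enc_proj = nn.Linear(encoder_dim, hidden_dim)
        self.layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=num_heads,
                                       batch_first=True)
            for _ in range(num_layers)
        ])
        self.out_proj = nn.Linear(hidden_dim, llm_dim)

    def forward(self, encoder_feats):
        # encoder_feats: (batch, time, encoder_dim) from the frozen speech encoder.
        memory = self.enc_proj(encoder_feats)
        x = self.queries.unsqueeze(0).expand(encoder_feats.size(0), -1, -1)
        for layer in self.layers:
            # Queries self-attend, then cross-attend to the encoder memory.
            x = layer(tgt=x, memory=memory)
        return self.out_proj(x)  # (batch, num_queries, llm_dim)

# A variable-length utterance becomes a fixed 64-token prefix for the LLM.
feats = torch.randn(2, 500, 1280)       # e.g. 2 clips, 500 encoder frames
print(QFormerProjector()(feats).shape)  # torch.Size([2, 64, 4096])
```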
You can find the test jsonl in "test_st.jsonl".
```
{"audio": "/userhome/speech/data/common/4/en/clips/common_voice_en_699711.mp3", "prompt": "\"She'll be all right.\"<|zh|>", "gt": "\"She'll be all right.\"<|zh|>她会没事的。", "source": "covost_enenzh"}
```
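Each line is a standalone JSON object: `audio` points to the source clip, `prompt` is the model input (the transcript plus a target-language tag such as `<|zh|>`), `gt` is the expected output, and `source` names the data origin. A minimal way to read the file (the file name comes from the text above; the rest is generic Python):

```
import json

# Iterate over test_st.jsonl; each line holds one test example.
with open("test_st.jsonl", encoding="utf-8") as f:
    for line in f:
        ex = json.loads(line)
        # ex["prompt"] ends with the target-language tag (e.g. <|zh|>);
        # ex["gt"] is the prompt followed by the reference translation.
        print(ex["audio"], ex["prompt"], ex["gt"], ex["source"])
```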
## Train Stage
Here we design a three-step training process, where each stage starts from the checkpoint produced by the previous stage.
```
#In this step, we perform ASR pretraining to acquire speech recognition capabilities.
bash asr_pretrain.sh

#In this phase, we conduct multimodal machine translation training to enhance the final performance.
bash mmt.sh

#monolingual SRT training and multitask training.
bash srt.sh
bash zsrt.sh
```
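The hand-off between stages is simply "save a checkpoint, start the next stage from it". As a rough, hypothetical illustration of that pattern (the real scripts manage this through their own configs and paths, not the names used here):

```
import torch
import torch.nn as nn

def run_stage(model, ckpt_in, ckpt_out, train_step):
    # Resume from the checkpoint written by the previous stage, if any.
    if ckpt_in is not None:
        model.load_state_dict(torch.load(ckpt_in, map_location="cpu"))
    train_step(model)                        # placeholder for the stage's training loop
    torch.save(model.state_dict(), ckpt_out)

projector = nn.Linear(8, 8)                  # stand-in for the trainable projector
noop = lambda m: None                        # stand-in training loop
run_stage(projector, None, "stage1.pt", noop)         # ASR pretraining
run_stage(projector, "stage1.pt", "stage2.pt", noop)  # MMT training
run_stage(projector, "stage2.pt", "stage3.pt", noop)  # SRT + multitask training
```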
## Infer Stage
You can try our pre-trained model.
```
bash infer_enzh.sh
```
## Citation
You can refer to the paper for more results.
```
@article{ma2024embarrassingly,
  title={An Embarrassingly Simple Approach for LLM with Strong ASR Capacity},
  author={Ma, Ziyang and Yang, Guanrou and Yang, Yifan and Gao, Zhifu and Wang, Jiaming and Du, Zhihao and Yu, Fan and Chen, Qian and Zheng, Siqi and Zhang, Shiliang and others},
  journal={arXiv preprint arXiv:2402.08846},
  year={2024}
}
```