Commit 12b8772 ("md"), 1 parent: e1be609
examples/st_covost2/README.md: 17 additions & 9 deletions
@@ -1,5 +1,15 @@

# ST_covost2

## Model Structure
<img src="image/framework.jpg" alt="example image" style="width:75%;">

## Multitask
<img src="image/prompt.png" alt="example image" style="width:50%;">

## Download Model
We only train the Q-Former projector in this recipe (see the sketch below).

Encoder | Projector | LLM
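
Since only the projector is trained, it helps to see what a Q-Former-style projector does: learnable queries cross-attend to the speech encoder's output and emit a fixed number of tokens in the LLM's embedding space. The following is a minimal sketch assuming PyTorch; the class name, dimensions, and query count are illustrative guesses, not the recipe's actual module or hyperparameters.

```python
import torch
import torch.nn as nn

class QFormerProjector(nn.Module):
    """Illustrative Q-Former-style projector (hypothetical, not the repo's code)."""
    def __init__(self, enc_dim=1280, llm_dim=4096, num_queries=64, num_heads=8):
        super().__init__()
        # Learnable queries that pool variable-length speech features
        # into a fixed number of tokens for the LLM.
        self.queries = nn.Parameter(torch.randn(num_queries, enc_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(enc_dim, llm_dim)  # map into the LLM embedding space

    def forward(self, enc_out):  # enc_out: (batch, time, enc_dim)
        q = self.queries.unsqueeze(0).expand(enc_out.size(0), -1, -1)
        pooled, _ = self.cross_attn(q, enc_out, enc_out)
        return self.proj(pooled)  # (batch, num_queries, llm_dim)

x = torch.randn(2, 300, 1280)        # e.g. a Whisper-style encoder output
print(QFormerProjector()(x).shape)   # torch.Size([2, 64, 4096])
```

With the encoder and LLM frozen, only these query, attention, and projection weights receive gradients.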
@@ -33,24 +43,22 @@ You can find the test jsonl in "test_st.jsonl"
```
{"audio": "/userhome/speech/data/common/4/en/clips/common_voice_en_699711.mp3", "prompt": "\"She'll be all right.\"<|zh|>", "gt": "\"She'll be all right.\"<|zh|>她会没事的。", "source": "covost_enenzh"}
```
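
A minimal loader for entries in this format might look like the sketch below; the field names come from the example line above, while `load_st_jsonl` is our illustrative helper, not part of the recipe.

```python
import json

def load_st_jsonl(path):
    """Yield (audio, prompt, gt) triples from a CoVoST-2 style test jsonl."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            # "prompt" carries the source transcript plus a target-language tag
            # such as <|zh|>; "gt" is the prompt followed by the reference
            # translation.
            yield item["audio"], item["prompt"], item["gt"]

for audio, prompt, gt in load_st_jsonl("test_st.jsonl"):
    print(audio, prompt, gt)
```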

## Train Stage
Here, we have designed a three-step training process, where each training session uses the checkpoint obtained from the previous one; a chaining sketch follows the commands below.
```
# In this step, we perform ASR pretraining to acquire speech recognition capabilities.
bash asr_pretrain.sh

# In this phase, we conduct multimodal machine translation training to enhance the final performance.
bash mmt.sh

# Monolingual SRT training and multitask training.
bash srt.sh
bash zsrt.sh
```
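
To make the checkpoint chaining concrete, here is a hedged Python sketch of running the stages in order; the `CKPT` environment variable and the output paths are hypothetical conventions for illustration, not what these scripts actually read.

```python
import os
import subprocess

ckpt = None
for script in ["asr_pretrain.sh", "mmt.sh", "srt.sh", "zsrt.sh"]:
    # Pass the previous stage's checkpoint to the next stage (hypothetical
    # CKPT convention; the real scripts configure this internally).
    env = dict(os.environ, CKPT=ckpt) if ckpt else os.environ.copy()
    subprocess.run(["bash", script], check=True, env=env)
    # Hypothetical convention: each stage writes its best checkpoint here.
    ckpt = f"out/{script.removesuffix('.sh')}/model.pt"
```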

## Infer Stage
You can try our pre-trained model.

```
bash infer_enzh.sh
```
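
Since "gt" in the jsonl above is the prompt followed by the reference translation, a decoded output in the same format can be post-processed by stripping the prompt prefix. This is a minimal sketch; `extract_translation` is a hypothetical helper, not the recipe's own code.

```python
# Recover the translation by stripping the prompt prefix, per the
# "gt" = prompt + target layout shown in test_st.jsonl above.
def extract_translation(output: str, prompt: str) -> str:
    return output[len(prompt):] if output.startswith(prompt) else output

print(extract_translation(
    "\"She'll be all right.\"<|zh|>她会没事的。",
    "\"She'll be all right.\"<|zh|>",
))  # -> 她会没事的。
```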
@@ -59,10 +67,10 @@ bash infer_enzh.sh

## Citation
You can refer to the paper for more results.
```
@article{ma2024embarrassingly,
  title={An Embarrassingly Simple Approach for LLM with Strong ASR Capacity},
  author={Ma, Ziyang and Yang, Guanrou and Yang, Yifan and Gao, Zhifu and Wang, Jiaming and Du, Zhihao and Yu, Fan and Chen, Qian and Zheng, Siqi and Zhang, Shiliang and others},
  journal={arXiv preprint arXiv:2402.08846},
  year={2024}
}
```
