Commit d599ce4

Merge pull request #140 from X-LANCE/yxdu
Update README
2 parents fbe3b65 + 12b8772 commit d599ce4

4 files changed, with 13 additions and 5 deletions

examples/st_covost2/README.md

Lines changed: 13 additions & 5 deletions
@@ -1,5 +1,15 @@
 # ST_covost2
 
+
+## Model Structure
+<img src="image/framework.jpg" alt="Example image" style="width:75%;">
+
+
+## Multitask
+<img src="image/prompt.png" alt="Example image" style="width:50%;">
+
+
+
 ## Download Model
 We only train the q-former projector in this recipe.
 Encoder | Projector | LLM
@@ -33,18 +43,16 @@ You can find the test jsonl in "test_st.jsonl"
 {"audio": "/userhome/speech/data/common/4/en/clips/common_voice_en_699711.mp3", "prompt": "\"She'll be all right.\"<|zh|>", "gt": "\"She'll be all right.\"<|zh|>她会没事的。", "source": "covost_enenzh"}
 ```
 ## Train Stage
-Here, we have designed a four-step training process, where each training session uses the checkpoint obtained from the previous training session.
+Here, we have designed a three-step training process, where each training session uses the checkpoint obtained from the previous training session.
 ```
 #In this step, we perform ASR pretraining to acquire speech recognition capabilities.
 bash asr_pretrain.sh
 
 #In this phase, we conduct multimodal machine translation training to enhance the final performance.
 bash mmt.sh
 
-#monolingual SRT training.
+#monolingual SRT training and multitask training.
 bash srt.sh
-
-#multilingual multitask training.
 bash zsrt.sh
 ```

@@ -53,7 +61,7 @@ bash zsrt.sh
 You can try our pre-trained model.

 ```
-bash infer.sh
+bash infer_enzh.sh
 ```

 ## Citation
1.52 MB
221 KB
File renamed without changes.
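
The Train Stage section in the README diff says each training session starts from the checkpoint of the previous one, so the stage scripts have to run in order and a failed stage should stop the chain. A minimal sketch of such a wrapper follows. It uses the four script names that remain in the README after this change. The `run_stages` function and its directory argument are hypothetical, and it assumes each script locates the previous checkpoint on its own, since the diff does not show how checkpoints are passed between stages.

```shell
#!/usr/bin/env bash
# Sketch only: run the stage scripts listed in the README in order and stop
# at the first failure, so a later stage never starts without the checkpoint
# it depends on.
# Assumption: all stage scripts live in one directory, and each script finds
# the previous stage's checkpoint by itself (not shown in the README diff).

run_stages() {
    local script_dir="${1:-.}"   # hypothetical parameter: directory holding the scripts
    local stage
    for stage in asr_pretrain.sh mmt.sh srt.sh zsrt.sh; do
        echo "==> running ${stage}"
        if ! bash "${script_dir}/${stage}"; then
            echo "stage ${stage} failed; stopping" >&2
            return 1
        fi
    done
}

# Usage, from the directory that contains the scripts:
#   run_stages .
```

Stopping on the first non-zero exit status keeps a broken stage from silently feeding a stale or missing checkpoint into the next one.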
