18 changes: 13 additions & 5 deletions examples/st_covost2/README.md
@@ -1,5 +1,15 @@
# ST_covost2


## Model Structure
<img src="image/framework.jpg" alt="Example image" style="width:75%;">


## Multitask
<img src="image/prompt.png" alt="Example image" style="width:50%;">



## Download Model
We only train the q-former projector in this recipe.
Encoder | Projector | LLM
@@ -33,18 +43,16 @@ You can find the test jsonl in "test_st.jsonl"
{"audio": "/userhome/speech/data/common/4/en/clips/common_voice_en_699711.mp3", "prompt": "\"She'll be all right.\"<|zh|>", "gt": "\"She'll be all right.\"<|zh|>她会没事的。", "source": "covost_enenzh"}
```
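The sample record above suggests that `gt` is the source transcript, followed by a target-language tag such as `<|zh|>`, followed by the translation. As a minimal sketch under that assumption (the tag pattern and the helper name `split_record` are illustrative and not part of the recipe), one line of `test_st.jsonl` could be split like this:

```python
import json
import re

# Assumed tag syntax, inferred from the single sample record: <|zh|>, <|en|>, ...
LANG_TAG = re.compile(r"<\|([a-z]+)\|>")


def split_record(line: str) -> dict:
    """Split one JSONL line into transcript, target language, and translation."""
    record = json.loads(line)
    match = LANG_TAG.search(record["gt"])
    if match is None:
        raise ValueError("no language tag found in 'gt'")
    return {
        "audio": record["audio"],
        "transcript": record["gt"][:match.start()],
        "target_lang": match.group(1),
        "translation": record["gt"][match.end():],
        "source": record["source"],
    }


# The sample record from this README, rebuilt with json.dumps to avoid manual escaping.
sample_line = json.dumps(
    {
        "audio": "/userhome/speech/data/common/4/en/clips/common_voice_en_699711.mp3",
        "prompt": "\"She'll be all right.\"<|zh|>",
        "gt": "\"She'll be all right.\"<|zh|>她会没事的。",
        "source": "covost_enenzh",
    },
    ensure_ascii=False,
)

parsed = split_record(sample_line)
print(parsed["target_lang"], parsed["translation"])
```

Records whose `gt` carries no tag raise a `ValueError` rather than being silently skipped, which makes malformed lines easier to spot before evaluation.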
## Train Stage
-Here, we have designed a four-step training process, where each training session uses the checkpoint obtained from the previous training session.
+Here, we have designed a three-step training process, where each training session uses the checkpoint obtained from the previous training session.
```
#In this step, we perform ASR pretraining to acquire speech recognition capabilities.
bash asr_pretrain.sh

#In this phase, we conduct multimodal machine translation training to enhance the final performance.
bash mmt.sh

-#monolingual SRT training.
+#monolingual SRT training and multitask training.
bash srt.sh

-#multilingual multitask training.
-bash zsrt.sh
```
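Because each stage starts from the checkpoint of the previous one, the scripts need to run strictly in order, and a failed stage should stop the chain. The following is a minimal sketch of such a driver, assuming the three scripts sit in the current directory and locate their own checkpoints; `run_stages` and its `runner` parameter are hypothetical helpers, not part of the recipe.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Stage scripts in the order given by the three-step process.
STAGES = ("asr_pretrain.sh", "mmt.sh", "srt.sh")


def run_stages(
    stages: Sequence[str] = STAGES,
    runner: Optional[Callable[[List[str]], object]] = None,
) -> List[str]:
    """Run each stage in order and return the scripts that completed.

    The default runner raises subprocess.CalledProcessError on a non-zero
    exit code, so later stages never start from a missing checkpoint.
    """
    if runner is None:
        def runner(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    completed = []
    for script in stages:
        runner(["bash", script])
        completed.append(script)
    return completed


# Usage (not executed here): run_stages()
```

How the checkpoint path is handed from one script to the next is not covered by this sketch; that is assumed to be configured inside the scripts themselves.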

@@ -53,7 +61,7 @@ bash zsrt.sh
You can try our pre-trained model.

```
-bash infer.sh
+bash infer_enzh.sh
```

## Citation
Binary file added examples/st_covost2/image/framework.jpg
Binary file added examples/st_covost2/image/prompt.png