
Commit 85191fb

update readme
1 parent 2b2f33f commit 85191fb

File tree

2 files changed (+23 additions, -5 deletions)


examples/s2s/README.md

Lines changed: 19 additions & 1 deletion
@@ -46,6 +46,7 @@ We also support JSONL format for its concise structure. Below is an example:
 We reproduced the single-stage fine-tuning results of SLAM-Omni with a group size of **3**. The following checkpoints are available for download:
 - [Single-Round Dialogue (English)](https://drive.google.com/drive/folders/1ZmM1h5ZTvS-piuN-msmctmZdi51GWLAu?usp=sharing): Trained on VoiceAssistant-400K.
 - [Multi-Round Dialogue (English)](https://drive.google.com/drive/folders/1xBNrqR2LWC0uEjezjx4aUgdsbstisboS?usp=sharing): Trained on VoiceAssistant-400K and UltraChat-300K.
+- [Multi-Round Dialogue (Chinese)](https://drive.google.com/drive/folders/1sExIp-UDdL37gb-mh9YlhuDIib0-wUVP?usp=sharing): Trained on Belle_1.4M.
 
 
 ## Training
@@ -114,4 +115,21 @@ bash ./examples/s2s/scripts/inference/mini-omni/inference_s2s_batch.sh
 
 ## Acknowledgement
 - We borrow some code from [Mini-Omni](https://github.com/gpt-omni/mini-omni) for SNAC-based modeling.
-- We borrow some code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) for the vocoder.
+- We borrow some code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) for the vocoder.
+
+## Citation
+<!-- ```bibtex
+
+``` -->
+
+```bibtex
+@article{xie2024mini,
+  title={Mini-omni: Language models can hear, talk while thinking in streaming},
+  author={Xie, Zhifei and Wu, Changqiao},
+  journal={arXiv preprint arXiv:2408.16725},
+  year={2024}
+}
+```
+
+## License
+Our code is released under MIT License. The Chinese dialogue model is licensed under GPL-3.0 due to its use of Belle data and is intended for research purposes only.

examples/s2s/scripts/inference/inference_s2s_online_multi-round.sh

Lines changed: 4 additions & 4 deletions
@@ -33,7 +33,7 @@ num_latency_tokens=0 # number of latency tokens (same as the numb
 do_layershift=false  # if false, tokens in each layers use the same codebook, otherwise, use different codebooks
 
 # load the backbone model
-ckpt_path=/valleblob/v-wenxichen/exp/s2s/en-mix/s2s_train_v4-Qwen2-0.5b-gpu4-btz3-lr1e-4-fp16-epochs10-whisper_small-latency0-group3-multiround-from_pretrained/s2s_epoch_2_step_23152
+ckpt_path=/valleblob/v-wenxichen/exp/s2s/zh-single/s2s_train_v4-Qwen2-0.5b-gpu4-btz3-lr1e-4-fp16-epochs10-whisper_small-latency0-group3-chinese-multiround-from_scratch/s2s_epoch_2_step_82467
 
 # model settings
 group_decode=true
@@ -56,9 +56,9 @@ output_text_only=false
 speech_sample_rate=22050 # 22050 for CosyVoice, 24000 for SNAC
 inference_online=true
 multi_round=true
-online_output_dir=/home/v-wenxichen/exp/cosyvoice/multi-round-en
-# audio_prompt_path=./examples/s2s/audio_prompt/zh/prompt_6.wav # replace this with your own audio prompt path or our provided audio prompt path
-audio_prompt_path=./examples/s2s/audio_prompt/en/prompt_6.wav # replace this with your own audio prompt path or our provided audio prompt path
+online_output_dir=/home/v-wenxichen/exp/cosyvoice/multi-round-zh
+audio_prompt_path=./examples/s2s/audio_prompt/zh/prompt_6.wav # replace this with your own audio prompt path or our provided audio prompt path
+# audio_prompt_path=./examples/s2s/audio_prompt/en/prompt_6.wav # replace this with your own audio prompt path or our provided audio prompt path
 
 decode_log=$ckpt_path/s2s_decode_${split}_trp${text_repetition_penalty}_arp${audio_repetition_penalty}_seed${dataset_sample_seed}_greedy
 if [ "$do_sample" = true ] ; then
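As a side note, the unchanged `decode_log` line at the bottom of this hunk builds the decode-log path by interpolating the script's decoding parameters, so each run's log name records its sampling settings. A minimal sketch of that interpolation, using hypothetical values (the real ones are set elsewhere in the script):

```shell
# Hypothetical values standing in for variables defined earlier in the script
ckpt_path=/tmp/ckpt
split=test
text_repetition_penalty=1.2
audio_repetition_penalty=1.2
dataset_sample_seed=42

# Same parameter expansion as the script's decode_log assignment
decode_log=$ckpt_path/s2s_decode_${split}_trp${text_repetition_penalty}_arp${audio_repetition_penalty}_seed${dataset_sample_seed}_greedy
echo "$decode_log"   # → /tmp/ckpt/s2s_decode_test_trp1.2_arp1.2_seed42_greedy
```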
