
Commit f9fed09

Merge pull request #10 from alibaba-damo-academy/dev
update FunASR version==0.1.4
2 parents fd27829 + 0b83483 commit f9fed09

349 files changed

Lines changed: 34018 additions & 2968 deletions


README.md

Lines changed: 16 additions & 4 deletions
@@ -1,9 +1,15 @@
1-
<div align="left"><img src="image/funasr_logo.jpg" width="400"/></div>
1+
<div align="left"><img src="docs/images/funasr_logo.jpg" width="400"/></div>
22

33
# FunASR: A Fundamental End-to-End Speech Recognition Toolkit
44

55
<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology. ASR for Fun!
66

7+
## Highlights
8+
- FunASR supports many types of models, such as Transformer, Conformer, and [Paraformer](https://arxiv.org/abs/2206.08317).
9+
- A large number of ASR models trained on academic or industrial datasets are open-sourced on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition).
10+
- The pretrained model [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) takes first place on many tasks in the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard).
11+
- FunASR supports large-scale dataset dataloader and multi-GPU training.
12+
713
## Installation(Training and Developing)
814

915
- Clone the repo:
@@ -27,27 +33,33 @@ conda activate funasr
2733
| 10.2 | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch |
2834
| 11.1 | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch |
2935

30-
For more versions, please see https://pytorch.org/get-started/locally/
36+
For more versions, please see [https://pytorch.org/get-started/locally](https://pytorch.org/get-started/locally)
3137

3238
- Install ModelScope:
3339
``` sh
3440
pip install "modelscope[audio]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
3541
```
3642

37-
- Install other packages:
43+
For more details about modelscope, please see [modelscope installation](https://modelscope.cn/docs/%E7%8E%AF%E5%A2%83%E5%AE%89%E8%A3%85)
44+
45+
- Install FunASR and other packages:
3846

3947
``` sh
4048
pip install --editable ./
4149
```
4250

51+
## Pretrained model hub
52+
53+
We have trained many academic and industrial models; see the [model hub](docs/modelscope_models.md).
54+
4355
## Contact
4456

4557
If you have any questions about FunASR, please contact us by
4658

4759
- email: [funasr@list.alibaba-inc.com](funasr@list.alibaba-inc.com)
4860

4961
- Dingding group:
50-
<div align="left"><img src="image/dingding.jpg" width="400"/></div>
62+
<div align="left"><img src="docs/images/dingding.jpg" width="400"/></div>
5163

5264

5365
## Acknowledge

docs/images/.DS_Store

6 KB
Binary file not shown.
File renamed without changes.

docs/modelscope_models.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
# Pretrained models on ModelScope
2+
3+
## Model License
4+
- Apache License 2.0
5+
6+
## Model Zoo
7+
Here we provide several pretrained models trained on different datasets. Details of the models and datasets can be found on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition).
8+
9+
| Datasets | Hours | Model | Online/Offline | Language | Framework | Checkpoint |
10+
|:-----:|:-----:|:--------------:|:--------------:| :---: | :---: | --- |
11+
| Alibaba Speech Data | 60000 | Paraformer | Offline | CN | Pytorch |[speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) |
12+
| Alibaba Speech Data | 50000 | Paraformer | Offline | CN | Tensorflow |[speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) |
13+
| Alibaba Speech Data | 50000 | Paraformer | Offline | CN | Tensorflow |[speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) |
14+
| Alibaba Speech Data | 50000 | Paraformer | Online | CN | Tensorflow |[speech_paraformer_asr_nat-zh-cn-16k-common-vocab3444-tensorflow1-online](http://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab3444-tensorflow1-online/summary) |
15+
| Alibaba Speech Data | 50000 | UniASR | Online | CN | Tensorflow |[speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary) |
16+
| Alibaba Speech Data | 50000 | UniASR | Offline | CN | Tensorflow |[speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline](https://www.modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary) |
17+
| Alibaba Speech Data | 50000 | UniASR | Online | CN&EN | Tensorflow |[speech_UniASR_asr_2pass-cn-en-moe-16k-vocab8358-tensorflow1-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-en-moe-16k-vocab8358-tensorflow1-online/summary) |
18+
| Alibaba Speech Data | 50000 | UniASR | Offline | CN&EN | Tensorflow |[speech_UniASR_asr_2pass-cn-en-moe-16k-vocab8358-tensorflow1-offline](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-en-moe-16k-vocab8358-tensorflow1-offline/summary) |
19+
| Alibaba Speech Data | 20000 | UniASR | Online | CN-Accent | Tensorflow |[speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/summary) |
20+
| Alibaba Speech Data | 20000 | UniASR | Offline | CN-Accent | Tensorflow |[speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/summary) |
21+
| Alibaba Speech Data | 30000 | Paraformer-8K | Online | CN | Tensorflow |[speech_paraformer_asr_nat-zh-cn-8k-common-vocab3444-tensorflow1-online](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-8k-common-vocab3444-tensorflow1-online/summary) |
22+
| Alibaba Speech Data | 30000 | Paraformer-8K | Offline | CN | Tensorflow |[speech_paraformer_asr_nat-zh-cn-8k-common-vocab8358-tensorflow1](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-8k-common-vocab8358-tensorflow1/summary) |
23+
| Alibaba Speech Data | 30000 | Paraformer-8K | Online | CN | Pytorch |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) |
24+
| Alibaba Speech Data | 30000 | Paraformer-8K | Offline | CN | Pytorch |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary) |
25+
| Alibaba Speech Data | 30000 | UniASR-8K | Online | CN | Tensorflow |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online/summary) |
26+
| Alibaba Speech Data | 30000 | UniASR-8K | Offline | CN | Tensorflow |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/summary) |
27+
| Alibaba Speech Data | 30000 | UniASR-8K | Online | CN | Pytorch |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) |
28+
| Alibaba Speech Data | 30000 | UniASR-8K | Offline | CN | Pytorch |[speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary) |
29+
| AISHELL-1 | 178 | Paraformer | Offline | CN | Pytorch | [speech_paraformer_asr_nat-aishell1-pytorch](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-aishell1-pytorch/summary) |
30+
| AISHELL-2 | 1000 | Paraformer | Offline | CN | Pytorch | [speech_paraformer_asr_nat-aishell2-pytorch](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-aishell2-pytorch/summary) |
31+
| AISHELL-1 | 178 | ParaformerBert | Offline | CN | Pytorch | [speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch](https://modelscope.cn/models/damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/summary) |
32+
| AISHELL-2 | 1000 | ParaformerBert | Offline | CN | Pytorch | [speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch](https://modelscope.cn/models/damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/summary) |
33+
| AISHELL-1 | 178 | Conformer | Offline | CN | Pytorch | [speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch](https://modelscope.cn/models/damo/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/summary) |
34+
| AISHELL-2 | 1000 | Conformer | Offline | CN | Pytorch | [speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch](https://modelscope.cn/models/damo/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/summary) |

egs/aishell/conformer/run.sh

Lines changed: 66 additions & 26 deletions
@@ -10,9 +10,10 @@ gpu_inference=true # Whether to perform gpu decoding, set false for cpu decodin
1010
# for gpu decoding, inference_nj=ngpu*njob; for cpu decoding, inference_nj=njob
1111
njob=8
1212
train_cmd=utils/run.pl
13+
infer_cmd=utils/run.pl
1314

1415
# general configuration
15-
feats_dir=".." #feature output dictionary, for large data
16+
feats_dir="../DATA" #feature output dictionary, for large data
1617
exp_dir="."
1718
lang=zh
1819
dumpdir=dump/fbank
@@ -59,8 +60,10 @@ ngpu=$(echo $gpuid_list | awk -F "," '{print NF}')
5960

6061
if ${gpu_inference}; then
6162
inference_nj=$[${ngpu}*${njob}]
63+
_ngpu=1
6264
else
6365
inference_nj=$njob
66+
_ngpu=0
6467
fi
6568

6669
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
@@ -83,18 +86,18 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
8386
echo "stage 1: Feature Generation"
8487
# compute fbank features
8588
fbankdir=${feats_dir}/fbank
86-
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj --speed_perturb ${speed_perturb} \
89+
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj --feats_dim ${feats_dim} --sample_frequency ${sample_frequency} --speed_perturb ${speed_perturb} \
8790
${feats_dir}/data/train ${exp_dir}/exp/make_fbank/train ${fbankdir}/train
8891
utils/fix_data_feat.sh ${fbankdir}/train
89-
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj \
92+
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj --feats_dim ${feats_dim} --sample_frequency ${sample_frequency} \
9093
${feats_dir}/data/dev ${exp_dir}/exp/make_fbank/dev ${fbankdir}/dev
9194
utils/fix_data_feat.sh ${fbankdir}/dev
92-
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj \
95+
utils/compute_fbank.sh --cmd "$train_cmd" --nj $nj --feats_dim ${feats_dim} --sample_frequency ${sample_frequency} \
9396
${feats_dir}/data/test ${exp_dir}/exp/make_fbank/test ${fbankdir}/test
9497
utils/fix_data_feat.sh ${fbankdir}/test
9598

9699
# compute global cmvn
97-
utils/compute_cmvn.sh --cmd "$train_cmd" --nj $nj \
100+
utils/compute_cmvn.sh --cmd "$train_cmd" --nj $nj --feats_dim ${feats_dim} \
98101
${fbankdir}/train ${exp_dir}/exp/make_fbank/train
99102

100103
# apply cmvn
@@ -112,6 +115,10 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
112115
utils/fix_data_feat.sh ${feat_train_dir}
113116
utils/fix_data_feat.sh ${feat_dev_dir}
114117
utils/fix_data_feat.sh ${feat_test_dir}
118+
119+
#generate ark list
120+
utils/gen_ark_list.sh --cmd "$train_cmd" --nj $nj ${feat_train_dir} ${fbankdir}/train ${feat_train_dir}
121+
utils/gen_ark_list.sh --cmd "$train_cmd" --nj $nj ${feat_dev_dir} ${fbankdir}/dev ${feat_dev_dir}
115122
fi
116123

117124
token_list=${feats_dir}/data/${lang}_token_list/char/tokens.txt
@@ -140,9 +147,10 @@ fi
140147
# Training Stage
141148
world_size=$gpu_num # run on one machine
142149
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
150+
echo "stage 3: Training"
143151
mkdir -p ${exp_dir}/exp/${model_dir}
144152
mkdir -p ${exp_dir}/exp/${model_dir}/log
145-
INIT_FILE=$exp_dir/ddp_init
153+
INIT_FILE=${exp_dir}/exp/${model_dir}/ddp_init
146154
if [ -f $INIT_FILE ];then
147155
rm -f $INIT_FILE
148156
fi
@@ -184,25 +192,57 @@ fi
184192

185193
# Testing Stage
186194
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
187-
utils/easy_asr_infer.sh \
188-
--lang zh \
189-
--datadir ${feats_dir} \
190-
--feats_type ${feats_type} \
191-
--feats_dim ${feats_dim} \
192-
--token_type ${token_type} \
193-
--gpu_inference ${gpu_inference} \
194-
--inference_config "${inference_config}" \
195-
--test_sets "${test_sets}" \
196-
--token_list $token_list \
197-
--asr_exp ${exp_dir}/${model_dir} \
198-
--stage 12 \
199-
--stop_stage 12 \
200-
--scp $scp \
201-
--text text \
202-
--inference_nj $inference_nj \
203-
--njob $njob \
204-
--inference_asr_model $inference_asr_model \
205-
--gpuid_list $gpuid_list \
206-
--mode asr
195+
echo "stage 4: Inference"
196+
for dset in ${test_sets}; do
197+
asr_exp=${exp_dir}/exp/${model_dir}
198+
inference_tag="$(basename "${inference_config}" .yaml)"
199+
_dir="${asr_exp}/${inference_tag}/${inference_asr_model}/${dset}"
200+
_logdir="${_dir}/logdir"
201+
if [ -d ${_dir} ]; then
202+
echo "${_dir} already exists. If you want to decode again, please delete this dir first."
203+
exit 0
204+
fi
205+
mkdir -p "${_logdir}"
206+
_data="${feats_dir}/${dumpdir}/${dset}"
207+
key_file=${_data}/${scp}
208+
num_scp_file="$(<${key_file} wc -l)"
209+
_nj=$([ $inference_nj -le $num_scp_file ] && echo "$inference_nj" || echo "$num_scp_file")
210+
split_scps=
211+
for n in $(seq "${_nj}"); do
212+
split_scps+=" ${_logdir}/keys.${n}.scp"
213+
done
214+
# shellcheck disable=SC2086
215+
utils/split_scp.pl "${key_file}" ${split_scps}
216+
_opts=
217+
if [ -n "${inference_config}" ]; then
218+
_opts+="--config ${inference_config} "
219+
fi
220+
${infer_cmd} --gpu "${_ngpu}" --max-jobs-run "${_nj}" JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \
221+
python -m funasr.bin.asr_inference_launch \
222+
--batch_size 1 \
223+
--ngpu "${_ngpu}" \
224+
--njob ${njob} \
225+
--gpuid_list ${gpuid_list} \
226+
--data_path_and_name_and_type "${_data}/${scp},speech,${type}" \
227+
--key_file "${_logdir}"/keys.JOB.scp \
228+
--asr_train_config "${asr_exp}"/config.yaml \
229+
--asr_model_file "${asr_exp}"/"${inference_asr_model}" \
230+
--output_dir "${_logdir}"/output.JOB \
231+
--mode asr \
232+
${_opts}
233+
234+
for f in token token_int score text; do
235+
if [ -f "${_logdir}/output.1/1best_recog/${f}" ]; then
236+
for i in $(seq "${_nj}"); do
237+
cat "${_logdir}/output.${i}/1best_recog/${f}"
238+
done | sort -k1 >"${_dir}/${f}"
239+
fi
240+
done
241+
python utils/proce_text.py ${_dir}/text ${_dir}/text.proc
242+
python utils/proce_text.py ${_data}/text ${_data}/text.proc
243+
python utils/compute_wer.py ${_data}/text.proc ${_dir}/text.proc ${_dir}/text.cer
244+
tail -n 3 ${_dir}/text.cer > ${_dir}/text.cer.txt
245+
cat ${_dir}/text.cer.txt
246+
done
207247
fi
208248
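
The inference stage above caps the job count at the number of utterances, splits the key file into one shard per job, and finally concatenates the per-job outputs sorted by utterance key. A minimal Python sketch of that bookkeeping follows. The helper names (`plan_jobs`, `split_keys`, `merge_outputs`) are hypothetical, and the contiguous-chunk split is an assumption for illustration; it may not match how `utils/split_scp.pl` actually distributes lines.

```python
from typing import List


def plan_jobs(inference_nj: int, num_utts: int) -> int:
    """Cap the job count at the number of utterances (mirrors _nj in run.sh)."""
    return min(inference_nj, num_utts)


def split_keys(lines: List[str], nj: int) -> List[List[str]]:
    """Split scp lines into nj contiguous shards whose sizes differ by at most one.

    Assumption: contiguous chunks; the real split script may order lines differently.
    """
    if nj < 1:
        raise ValueError("nj must be at least 1")
    base, extra = divmod(len(lines), nj)
    shards, start = [], 0
    for job in range(nj):
        size = base + (1 if job < extra else 0)
        shards.append(lines[start:start + size])
        start += size
    return shards


def merge_outputs(per_job: List[List[str]]) -> List[str]:
    """Concatenate per-job result lines and sort them by utterance key.

    Assumes every line is non-empty and starts with the key.
    """
    merged = [line for job_lines in per_job for line in job_lines]
    return sorted(merged, key=lambda line: line.split(maxsplit=1)[0])


if __name__ == "__main__":
    scp = [f"utt{i} feats.ark:{i}" for i in range(5)]
    nj = plan_jobs(inference_nj=8, num_utts=len(scp))
    for job, shard in enumerate(split_keys(scp, nj), start=1):
        print(f"keys.{job}.scp: {len(shard)} line(s)")
```

Capping the job count avoids launching jobs with empty key files when a test set has fewer utterances than `inference_nj`.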

egs/aishell/conformer/utils

Lines changed: 0 additions & 1 deletion
This file was deleted.

egs/aishell/conformer/utils/__init__.py

Whitespace-only changes.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+from kaldiio import ReadHelper
+from kaldiio import WriteHelper
+
+import argparse
+import json
+import math
+import numpy as np
+
+
+def get_parser():
+    parser = argparse.ArgumentParser(
+        description="apply cmvn",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument(
+        "--ark-file",
+        "-a",
+        default=False,
+        required=True,
+        type=str,
+        help="fbank ark file",
+    )
+    parser.add_argument(
+        "--cmvn-file",
+        "-c",
+        default=False,
+        required=True,
+        type=str,
+        help="cmvn file",
+    )
+    parser.add_argument(
+        "--ark-index",
+        "-i",
+        default=1,
+        required=True,
+        type=int,
+        help="ark index",
+    )
+    parser.add_argument(
+        "--output-dir",
+        "-o",
+        default=False,
+        required=True,
+        type=str,
+        help="output dir",
+    )
+    return parser
+
+
+def main():
+    parser = get_parser()
+    args = parser.parse_args()
+
+    ark_file = args.output_dir + "/feats." + str(args.ark_index) + ".ark"
+    scp_file = args.output_dir + "/feats." + str(args.ark_index) + ".scp"
+    ark_writer = WriteHelper('ark,scp:{},{}'.format(ark_file, scp_file))
+
+    with open(args.cmvn_file) as f:
+        cmvn_stats = json.load(f)
+
+    means = cmvn_stats['mean_stats']
+    vars = cmvn_stats['var_stats']
+    total_frames = cmvn_stats['total_frames']
+
+    for i in range(len(means)):
+        means[i] /= total_frames
+        vars[i] = vars[i] / total_frames - means[i] * means[i]
+        if vars[i] < 1.0e-20:
+            vars[i] = 1.0e-20
+        vars[i] = 1.0 / math.sqrt(vars[i])
+
+    with ReadHelper('ark:{}'.format(args.ark_file)) as ark_reader:
+        for key, mat in ark_reader:
+            mat = (mat - means) * vars
+            ark_writer(key, mat)
+
+
+if __name__ == '__main__':
+    main()
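
The normalization loop in the script above turns accumulated statistics into a per-dimension mean and inverse standard deviation: with N = `total_frames`, mean = sum / N and var = sum_sq / N - mean², floored at 1e-20 before taking 1 / sqrt(var). The following is a vectorized NumPy sketch of the same arithmetic. It assumes, as the loop implies, that `mean_stats` holds per-dimension sums and `var_stats` holds per-dimension sums of squares; the function names `cmvn_from_stats` and `apply_cmvn` are hypothetical and not part of FunASR.

```python
import numpy as np


def cmvn_from_stats(mean_stats, var_stats, total_frames, floor=1.0e-20):
    """Return (mean, inv_std) from per-dimension sums and sums of squares."""
    sums = np.asarray(mean_stats, dtype=np.float64)
    sq_sums = np.asarray(var_stats, dtype=np.float64)
    mean = sums / total_frames
    # Var[x] = E[x^2] - E[x]^2, floored so the square root stays well defined.
    var = np.maximum(sq_sums / total_frames - mean * mean, floor)
    return mean, 1.0 / np.sqrt(var)


def apply_cmvn(mat, mean, inv_std):
    """Normalize a (frames, dims) feature matrix by broadcasting over frames."""
    return (mat - mean) * inv_std


if __name__ == "__main__":
    feats = np.array([[1.0, 2.0], [3.0, 6.0]])
    mean, inv_std = cmvn_from_stats(
        feats.sum(axis=0), (feats ** 2).sum(axis=0), total_frames=len(feats)
    )
    print(apply_cmvn(feats, mean, inv_std))
```

Storing the inverse standard deviation rather than the variance means each utterance needs only a subtraction and a multiplication per element.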
