https://github.com/thinhlx1993/vietnamese_asr
I collected data from many different sources for training. The training set contains over 10,000 hours of speech from the sources below:
- Common Voice dataset
- VIVOS dataset
- AN4 database audio files
- Vietnamese Speech recognition
- YouTube public dataset
- Vietnamese Dialogue Telephony speech dataset
- Travel Call Center Speech Data
- LibriSpeech
The model uses a SentencePieceTokenizer with a vocabulary of 128 tokens and has 121 M total parameters.
| # | Name | Type |
|---|---|---|
| 0 | preprocessor | AudioToMelSpectrogramPreprocessor |
| 1 | encoder | ConformerEncoder |
| 2 | decoder | ConvASRDecoder |
| 3 | loss | CTCLoss |
| 4 | spec_augmentation | SpectrogramAugmentation |
| 5 | wer | WER |
| Decoding | WER (%) | CER (%) |
|---|---|---|
| without ngram LM | 10.71 | 12.21 |
| with ngram LM | 9.15 | 10.2 |
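For reference, WER is the word-level Levenshtein edit distance between the hypothesis and the reference, divided by the reference length; CER is the same computation over characters. A minimal stdlib sketch (function name and example strings are illustrative, not from this repo):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# One substitution out of three reference words:
print(wer("toi di hoc", "toi di choi"))  # → 0.3333...
```

Computing CER is identical, except the inputs are split into characters instead of words.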
https://drive.google.com/drive/folders/1SVNibfeMshfVkmatIU90LYok_Mf0zMD0?usp=sharing
https://github.com/NVIDIA/NeMo
I created a free-to-use API server for submitting inference data.
The input file should have a 16,000 Hz sample rate to avoid hidden bugs.
File duration must be under 10 seconds.
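Before uploading, both constraints can be checked locally with Python's built-in `wave` module. A sketch assuming plain PCM WAV input (the function name is illustrative):

```python
import wave

def check_input(path: str) -> None:
    """Validate a WAV file against the API's assumed constraints:
    a 16,000 Hz sample rate and a duration under 10 seconds."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    if rate != 16000:
        raise ValueError(f"expected 16000 Hz, got {rate} Hz")
    if duration >= 10.0:
        raise ValueError(f"expected duration < 10 s, got {duration:.2f} s")
```

If your audio does not meet the constraints, resample it (for example with `ffmpeg -i in.wav -ar 16000 out.wav`) and split it into shorter segments before uploading.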
```python
import subprocess
import json

# Upload a WAV file (16 kHz, < 10 s) to the API via curl.
command = [
    "curl", "--location", "https://api.voicesplitter.com/api/v1/uploads",
    "--form", "file=@/path/to/your/wav_file.wav",
]
result = subprocess.run(command, capture_output=True, text=True, check=True)

# json.loads decodes any \uXXXX escapes in the response correctly;
# the unicode_escape round-trip mangles Vietnamese (non-Latin-1) text.
print(json.loads(result.stdout))
```