thinhlx1993/vietnamese_asr


The best Vietnamese speech recognition using Conformer-CTC 2024

https://github.com/thinhlx1993/vietnamese_asr

Training data

I collected the training data from many different sources. It contains over 10k hours of speech from the sources below:

  • Common Voice dataset
  • VIVOS dataset
  • (AN4) database audio files
  • Vietnamese Speech recognition
  • YouTube public dataset
  • Vietnamese Dialogue Telephony speech dataset
  • Travel Call Center Speech Data
  • LibriSpeech

Model setup

Tokenizer: SentencePieceTokenizer initialized with 128 tokens

Total params: 121 M

#  Name               Type
0  preprocessor       AudioToMelSpectrogramPreprocessor
1  encoder            ConformerEncoder
2  decoder            ConvASRDecoder
3  loss               CTCLoss
4  spec_augmentation  SpectrogramAugmentation
5  wer                WER

Benchmark WER and CER results

                   WER     CER
without ngram LM   10.71   12.21
with ngram LM      9.15    10.2
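
For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between the reference and the hypothesis, divided by the reference length. It assumes the figures above are percentages. The function names `edit_distance`, `wer`, and `cer` are illustrative and are not taken from this repository or from NeMo, and the exact text normalization used for the benchmark is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; the reference must not be empty."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; the reference must not be empty."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

With one wrong word out of four, `wer("a b c d", "a x c d")` gives 25.0.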

How to use

Download the model weights here:

https://drive.google.com/drive/folders/1SVNibfeMshfVkmatIU90LYok_Mf0zMD0?usp=sharing

Install the NeMo framework:

https://github.com/NVIDIA/NeMo
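
Once NeMo is installed, local inference might look like the sketch below. It assumes the downloaded weights are a `.nemo` checkpoint and that the model loads as NeMo's `EncDecCTCModelBPE` class, which matches the CTC loss and SentencePiece tokenizer listed above but is not confirmed by this README. The helper name `transcribe_files` and the file names in the comment are hypothetical.

```python
def transcribe_files(model_path, wav_paths):
    """Load a Conformer-CTC checkpoint with NeMo and transcribe WAV files."""
    # Imported lazily so this snippet can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Assumption: the checkpoint is a .nemo file for a BPE-based CTC model.
    model = nemo_asr.models.EncDecCTCModelBPE.restore_from(restore_path=model_path)
    model.eval()

    # Depending on the NeMo version, this returns a list of strings
    # or a list of Hypothesis objects.
    return model.transcribe(wav_paths)


# Example call (hypothetical file names):
# transcribe_files("vietnamese_asr.nemo", ["/path/to/your/wav_file.wav"])
```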

You can try the demo in the example folder or this one.

I created a free-to-use API server that accepts audio files for inference.

The input file should have a sample rate of 16000 Hz to avoid hidden bugs.

The file duration must be shorter than 10 seconds.
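
Before uploading, it can help to check a file against these two limits locally. The sketch below uses Python's standard `wave` module and interprets the 16000 requirement as a 16 kHz sample rate. The name `check_wav` and the two constants are illustrative and are not part of the API.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, from the requirement above
MAX_DURATION_SECONDS = 10.0   # from the requirement above


def check_wav(path):
    """Return (ok, message) for a WAV file against the stated limits."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    if sample_rate != EXPECTED_SAMPLE_RATE:
        return False, f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
    if duration >= MAX_DURATION_SECONDS:
        return False, f"duration is {duration:.2f} s, must be under {MAX_DURATION_SECONDS:.0f} s"
    return True, "ok"
```

Calling `check_wav("/path/to/your/wav_file.wav")` returns a pair whose first element says whether the file is within both limits.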

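import subprocess

# Upload a WAV file to the inference API as a multipart form using curl.
command = [
    "curl", "--location", "https://api.voicesplitter.com/api/v1/uploads",
    "--form", 'file=@"/path/to/your/wav_file.wav"'
]

# Run curl and capture the response body as text.
result = subprocess.run(command, capture_output=True, text=True)
# Turn \uXXXX escape sequences in the response into readable characters.
print(result.stdout.encode('utf-8').decode('unicode_escape'))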

Contact

[email protected] | Thinh Le's LinkedIn Profile
