FAQ
-
The JIT model supports both 8000 and 16000 Hz; the ONNX model supports 16000 Hz only. Other sampling rates are not directly supported, but multiples of 16000 (e.g. 32000 or 48000) are downsampled to 16000 Hz inside the JIT model.
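The rule above can be expressed as a small pre-processing check. This is a minimal sketch, assuming that "downsampled" for an exact multiple of 16000 means keeping every k-th sample (plain decimation, with no anti-aliasing filter). The helper name to_16k is hypothetical and is not part of the Silero utils. For real audio, a proper resampler is usually the safer choice.

```python
def to_16k(samples, sampling_rate):
    """Hypothetical helper: bring audio to a rate the VAD model accepts.

    Assumes naive decimation (keep every k-th sample) for exact multiples
    of 16000; this is an illustration, not Silero's actual internals.
    """
    if sampling_rate in (8000, 16000):
        # Natively supported by the JIT model (the ONNX model needs 16000).
        return samples, sampling_rate
    if sampling_rate % 16000 == 0:
        step = sampling_rate // 16000
        # Keep every step-th sample, e.g. step = 3 for 48000 Hz.
        return samples[::step], 16000
    raise ValueError(f"unsupported sampling rate: {sampling_rate}")


audio_48k = [0.0] * 48000          # one second of silence at 48 kHz
audio_16k, sr = to_16k(audio_48k, 48000)
print(len(audio_16k), sr)
```

Rates that are not multiples of 16000 (such as 44100) are rejected here rather than silently handled, mirroring the statement that they are not supported.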
-
Although by design most use cases need no tuning, a good starting point is to plot the speech probabilities and then select threshold, min_speech_duration_ms, window_size_samples and min_silence_duration_ms. See this discussion and the docstrings for examples.
-
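To show what these parameters control, here is a simplified sketch that turns a sequence of per-window speech probabilities into (start, end) sample ranges. It is not the library's own segmentation code: the function name probs_to_segments and its default values are hypothetical placeholders, and the real utils may merge, pad or split segments differently. It assumes one probability per window of window_size_samples samples.

```python
def probs_to_segments(probs, threshold=0.5, window_size_samples=512,
                      sampling_rate=16000, min_speech_duration_ms=250,
                      min_silence_duration_ms=100):
    """Hypothetical sketch: per-window probabilities -> (start, end) in samples."""
    min_speech = sampling_rate * min_speech_duration_ms / 1000
    min_silence = sampling_rate * min_silence_duration_ms / 1000
    segments = []
    start = None          # first sample of the current speech run
    silence_start = None  # first sample of a pending silence gap
    for i, p in enumerate(probs):
        pos = i * window_size_samples
        if p >= threshold:
            silence_start = None          # speech resumed: short gap is ignored
            if start is None:
                start = pos
        elif start is not None:
            if silence_start is None:
                silence_start = pos
            # Close the segment only once the gap reaches min_silence.
            if pos + window_size_samples - silence_start >= min_silence:
                if silence_start - start >= min_speech:
                    segments.append((start, silence_start))
                start, silence_start = None, None
    if start is not None:
        # Audio ended during speech (or during a gap too short to close it).
        end = silence_start if silence_start is not None else len(probs) * window_size_samples
        if end - start >= min_speech:
            segments.append((start, end))
    return segments


probs = [0.9] * 10 + [0.1] * 5 + [0.9] * 10
print(probs_to_segments(probs))
```

Raising threshold makes detection stricter, raising min_silence_duration_ms merges segments separated by short pauses, and raising min_speech_duration_ms drops short bursts. Plotting probs first makes it easier to see where a given threshold would cut.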
This should give you some idea. Also see the docstrings for some baseline values. Typically, sampling rates above 16 kHz are not needed for speech. The model will most likely have problems with extremely long chunks.
-
Yes. Although the models were designed for streaming, they can also be used to process long audio files. See the provided utils; the JIT model, for example, has a model.reset_states() method.
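The streaming pattern for long audio can be sketched as follows: split the audio into fixed-size chunks (rather than one extremely long chunk), feed them in order, and clear the model state before each new, unrelated audio. This is a minimal sketch under stated assumptions: ToyStatefulVAD is a hypothetical stand-in that only mimics having a reset_states() method like the JIT model. Its "probability" is a smoothed amplitude, not real VAD output, and the chunk size of 512 is a placeholder.

```python
class ToyStatefulVAD:
    """Hypothetical stand-in for a streaming VAD; NOT the Silero model.

    It carries state between calls, which is why a streaming model needs
    reset_states() before an unrelated audio is processed.
    """

    def __init__(self):
        self.reset_states()

    def reset_states(self):
        self.energy = 0.0  # smoothed level carried from chunk to chunk

    def __call__(self, chunk):
        # Exponentially smoothed mean absolute amplitude as a fake "probability".
        level = sum(abs(x) for x in chunk) / len(chunk)
        self.energy = 0.8 * self.energy + 0.2 * level
        return min(1.0, self.energy)


def stream_probs(model, samples, chunk_size=512):
    """Feed samples to model chunk by chunk; return one value per chunk."""
    model.reset_states()  # clean state for every new audio
    probs = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        if len(chunk) < chunk_size:
            chunk = chunk + [0.0] * (chunk_size - len(chunk))  # zero-pad last chunk
        probs.append(model(chunk))
    return probs


model = ToyStatefulVAD()
first = stream_probs(model, [1.0] * 1024)
second = stream_probs(model, [1.0] * 1024)
print(first == second)  # identical, because state is reset between audios
```

Without the reset_states() call at the start of stream_probs, the second audio would start from the state left over by the first one and produce different values.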

