Hi @jcvasquezc,
I would like to combine the dynamic phonation features with MFCC/fbank features as the input to a DNN.
The relevant code is shown below:
phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
fbankfeature, energies = python_speech_features.fbank(filename, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)
I noticed that the dynamic phonation features use winlen=0.04 and winstep=0.02, so I set the same values for the fbank function.
However, for the same input file, len(phonafeature) and len(fbankfeature) are not the same.
For example, with filename=demo.wav, which is 15 s long and has a 16000 Hz sample rate,
the shape of phonafeature is (430, 7), while the shape of fbankfeature is (749, 40).
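As a sanity check on the fbank side, here is a rough sketch of the frame count I would expect from the window settings alone. It assumes the framing pads the last partial frame, so the count is 1 + ceil((N - frame_len) / frame_step); the helper name `expected_num_frames` is my own and not part of either library.

```python
import math


def expected_num_frames(num_samples, samplerate, winlen, winstep):
    """Estimate the number of analysis frames for a padded framing scheme.

    Assumption: the last partial frame is zero-padded and kept, so the
    count is 1 + ceil((N - frame_len) / frame_step).
    """
    frame_len = int(round(winlen * samplerate))    # 0.04 s -> 640 samples
    frame_step = int(round(winstep * samplerate))  # 0.02 s -> 320 samples
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


# demo.wav: 15 s at 16000 Hz -> 240000 samples
print(expected_num_frames(15 * 16000, 16000, 0.04, 0.02))
```

For an exactly 15 s file this gives 749, which matches the fbank output, so the window settings alone do not seem to explain the 430 phonation rows. One possibility, which I have not confirmed, is that the phonation features are only computed for a subset of the frames.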
To concatenate them, I have to pad phonafeature with the constant value 0 to match the length of fbankfeature, i.e., from (430, 7) to (749, 7). Then I get the concatenated phonation plus fbank features, with shape (749, 47), for demo.wav.
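A minimal sketch of the padding and concatenation described above, using placeholder arrays with the same shapes instead of the real features. Appending the zero rows at the end is an assumption made for illustration.

```python
import numpy as np

# Placeholder arrays standing in for the real features (shapes from demo.wav).
phonafeature = np.ones((430, 7))
fbankfeature = np.ones((749, 40))

# Pad the phonation matrix with zero rows at the end so both have 749 rows.
pad_rows = fbankfeature.shape[0] - phonafeature.shape[0]
phona_padded = np.pad(
    phonafeature, ((0, pad_rows), (0, 0)), mode="constant", constant_values=0
)

# Concatenate along the feature axis: 7 + 40 = 47 columns per frame.
combined = np.hstack([phona_padded, fbankfeature])
print(phona_padded.shape, combined.shape)
```

Note that this only makes the shapes match. If row i of the two matrices does not refer to the same time window, the padded concatenation would pair unrelated frames.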
But I don't think this is the correct way to combine the dynamic phonation features with MFCC/fbank features as the input to a DNN.
Could you help me with this issue?
Also, why do the phonation and fbank outputs differ in length when the same winlen and winstep are used?
Many thanks