How to use dynamic phonation features with MFCC/fbank features as the input to a DNN #32

Hi @jcvasquezc,

I would like to use dynamic phonation features together with MFCC/fbank features as the input to a DNN.

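Here is the relevant code:

```python
import numpy as np
import python_speech_features
from scipy.io import wavfile

# `phonation` is the DisVoice Phonation object created earlier
phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
rate, signal = wavfile.read(filename)  # fbank() expects the sample array, not a file path
fbankfeature, energies = python_speech_features.fbank(signal, samplerate=rate, nfilt=40, nfft=768,
                                                      winlen=0.04, winstep=0.02, winfunc=np.hamming)
```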

I noticed that the dynamic phonation features are computed with a 40 ms window and a 20 ms step, so I passed the same values (`winlen=0.04`, `winstep=0.02`) to the fbank function.
However, `phonafeature` and `fbankfeature` do not have the same number of frames for the same input file.
For example, `demo.wav` is 15 s long with a 16000 Hz sample rate:
`phonafeature` has shape (430, 7), while `fbankfeature` has shape (749, 40).
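For reference, the fbank row count can be reproduced from the framing rule that `python_speech_features` applies when it splits the signal: one frame for the first window, then one more per full or partial step, i.e. `1 + ceil((N - frame_len) / frame_step)` with all lengths in samples. This is a sketch assuming `demo.wav` is exactly 15 s at 16 kHz; `num_frames` is a hypothetical helper, not part of either library.

```python
import math


def num_frames(num_samples, samplerate, winlen, winstep):
    """Hypothetical helper: frame count under python_speech_features' framing rule."""
    frame_len = int(round(winlen * samplerate))    # 0.04 s -> 640 samples at 16 kHz
    frame_step = int(round(winstep * samplerate))  # 0.02 s -> 320 samples at 16 kHz
    if num_samples <= frame_len:
        return 1  # a signal shorter than one window still yields one (padded) frame
    # first window, plus one frame per remaining (possibly partial) step
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


samplerate = 16000
num_samples = 15 * samplerate  # assumes demo.wav is exactly 15 s long
print(num_frames(num_samples, samplerate, 0.04, 0.02))  # 749

# nfft=768 must be at least the 640-sample window, otherwise frames are truncated
assert 768 >= int(round(0.04 * samplerate))
```

The result matches the 749 fbank rows, so the fbank side behaves as expected for 40 ms / 20 ms framing; the 430 phonation rows are not explained by the window settings alone.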

To concatenate them, I have to pad `phonafeature` with zeros (constant value 0) until it has as many rows as `fbankfeature`, i.e. from (430, 7) to (749, 7). This gives a concatenated phonation + fbank feature matrix of shape (749, 47) for `demo.wav`.
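The padding step described above can be sketched with NumPy as follows. The arrays are random placeholders with the reported shapes, not real features. Note that this only equalizes the row counts; it does not by itself guarantee that row i of both matrices describes the same stretch of audio.

```python
import numpy as np

# Placeholder stand-ins with the shapes reported for demo.wav
phonafeature = np.random.randn(430, 7)   # dynamic phonation: 430 frames x 7 features
fbankfeature = np.random.randn(749, 40)  # fbank: 749 frames x 40 filters

# Zero-pad the phonation matrix at the end of the time axis up to the fbank frame count
missing = fbankfeature.shape[0] - phonafeature.shape[0]
phona_padded = np.pad(phonafeature, ((0, missing), (0, 0)), mode="constant", constant_values=0)

# Concatenate along the feature axis: 7 + 40 = 47 columns
combined = np.hstack([phona_padded, fbankfeature])
print(combined.shape)  # (749, 47)
```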

But I don't think this is the correct way to combine dynamic phonation features with MFCC/fbank features as the input to a DNN.

Could you help me with this issue?
Also, why do the phonation and fbank outputs have different lengths when they use the same `winlen` and `winstep`?

Many thanks
