Hi,
Thanks a lot for your work.
I have reproduced the same results on the TIMIT dataset. I have now converted that model to ONNX and then to TFLite for d-vector computation and speaker identification on mobile.
I verified the SincNet TFLite model in Python and it worked for me, but now I have to run the same inference on a mobile device.
So I am trying to convert raw audio into a tensor and replicate the same NumPy computation in C/C++.
I have not found any direct way to convert audio into a tensor, since there is no mobile implementation of torchaudio, so I am looking at computing MFCCs from the audio and then converting them into a tensor of the same dimensions.
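For the raw-audio-to-tensor part, a library may not actually be needed: SincNet takes the raw waveform, and `soundfile.read()` essentially just returns the PCM samples scaled to [-1, 1]. Below is a minimal C++ sketch of that conversion under the assumption of 16-bit mono PCM WAV input on a little-endian host; `load_pcm16_as_floats` is a hypothetical helper name, not part of any library.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Parse a WAV byte buffer and return its samples scaled to [-1, 1],
// mirroring what soundfile.read() produces in Python.
// Assumes 16-bit mono PCM and a little-endian host.
std::vector<float> load_pcm16_as_floats(const std::vector<uint8_t>& wav) {
    std::vector<float> out;
    size_t pos = 12;  // skip "RIFF" + file size + "WAVE"
    // Walk the RIFF chunks until the "data" chunk is found.
    while (pos + 8 <= wav.size()) {
        char id[5] = {0};
        std::memcpy(id, &wav[pos], 4);
        uint32_t sz = 0;
        std::memcpy(&sz, &wav[pos + 4], 4);
        if (std::strncmp(id, "data", 4) == 0) {
            size_t n = sz / 2;  // two bytes per int16 sample
            out.reserve(n);
            for (size_t i = 0; i < n; ++i) {
                int16_t s = 0;
                std::memcpy(&s, &wav[pos + 8 + 2 * i], 2);
                out.push_back(s / 32768.0f);  // int16 -> [-1, 1)
            }
            break;
        }
        pos += 8 + sz;  // advance to the next chunk
    }
    return out;
}
```

The resulting float vector can then be copied straight into the model's input buffer (e.g. `interpreter->typed_input_tensor<float>(0)` in the TFLite C++ API), so no MFCC step would be required if the model was trained on raw waveforms.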
I can compute MFCC features using a C++ library or my own C++ code. Now I want to know: have you tried computing the d-vector (or training the speaker-ID model) using MFCCs instead of the raw-waveform torch tensor (soundfile features converted to a torch tensor)?
I found that you did something similar with an MFCC comparison, as mentioned in the link below:
pytorch/audio#328
So could you please confirm:
- If I compute MFCC features directly from the audio using a C++ library and then load the SincNet model on mobile for the final d-vector calculation, will that change my d-vector values, or lead to a major difference in final speaker-identification accuracy compared to using the torch tensor?
- Can you suggest a method to convert raw audio to a tensor on a mobile device, similar to what has been done here?
I hope my question is clear.
Thanks a lot.