
Difference between MFCC features and the torch tensor features used when computing d-vectors? Does MFCC affect speaker identification accuracy? #99

@alamnasim

Description

Hi,

Thanks a lot for your work.

I have reproduced your results on the TIMIT dataset. I have now converted that model to ONNX and then to TFLite for d-vector computation and speaker identification on mobile.
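
For reference, the export was done with something like the following sketch (the chunk length, variable names, and file name are placeholders, not the exact values I used):

```python
import torch

# `model` is the trained SincNet d-vector model loaded from a checkpoint elsewhere
# (an assumption for this sketch; the real loading code is not shown here).
model.eval()
dummy_wave = torch.randn(1, 3200)          # hypothetical 200 ms chunk at 16 kHz
torch.onnx.export(
    model,
    dummy_wave,
    "sincnet_dvector.onnx",                # hypothetical output file name
    input_names=["waveform"],
    output_names=["d_vector"],
    opset_version=11,
)
```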

I verified the SincNet TFLite model in Python and it worked for me, but now I have to run the same inference on a mobile device.
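
This is roughly how I checked the TFLite model in Python (the input name, shape, and file name here are placeholders, not values read from the actual converted graph):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="sincnet_dvector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed one hypothetical 200 ms waveform chunk and read back the d-vector.
chunk = np.random.randn(1, 3200).astype(np.float32)
interpreter.set_tensor(inp["index"], chunk)
interpreter.invoke()
d_vector = interpreter.get_tensor(out["index"])
print(d_vector.shape)
```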

So I am trying to reproduce the raw-audio-to-tensor conversion and the same NumPy computation in C/C++.

I have not found a direct way to convert audio into a tensor, as there is no torchaudio implementation for mobile, so I am looking at computing MFCCs from the audio and then converting them into a tensor of the same dimensions.
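
For context, my understanding of the soundfile-to-tensor path in Python is something like the sketch below; the per-utterance amplitude normalisation is my assumption about your data loader, so please correct me if it differs:

```python
import numpy as np
import soundfile as sf
import torch

# Read a mono WAV file; soundfile returns float64 samples in [-1, 1].
signal, fs = sf.read("example.wav")
# Per-utterance amplitude normalisation (my assumption about the SincNet loader).
signal = signal / np.max(np.abs(signal))
# One fixed-length chunk; 3200 samples = 200 ms at 16 kHz (hypothetical values).
chunk = signal[:3200].astype(np.float32)
# The model input tensor, shape [1, 3200].
waveform = torch.from_numpy(chunk).unsqueeze(0)
```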

I can compute MFCC features using a C++ library or my own C++ code. Have you tried computing d-vectors (or training the speaker_id model) using MFCCs instead of the torch tensor input (i.e. the soundfile features converted to a torch tensor)?

I found that you did something like an MFCC comparison, as mentioned in the link below:
pytorch/audio#328
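
For such a comparison, I would generate a reference MFCC matrix with torchaudio and compare the C++ library's output against it offline; the parameter values below are placeholders, not the settings used in this repo:

```python
import torchaudio

# Load the waveform; torchaudio returns a [channels, samples] tensor and the sample rate.
waveform, fs = torchaudio.load("example.wav")
mfcc = torchaudio.transforms.MFCC(
    sample_rate=fs,
    n_mfcc=40,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 40},
)(waveform)
print(mfcc.shape)   # [channels, n_mfcc, frames]
```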

So can you please confirm:

  1. What if I directly compute MFCC features from the audio using a C++ library and then load the SincNet model on mobile for the final d-vector calculation? Does this affect my d-vector values, or does it lead to a major difference in the final speaker identification accuracy compared to using the torch tensor input?
  2. Can you suggest a method to convert raw audio to a tensor on a mobile device, similar to what has been done here?

I hope my question is clear.

Thanks a lot.
