Hi, thanks for sharing. I wonder how the pre-trained model was trained. What data did you use to train the encoder? LibriSpeech? VoxCeleb?