Optimize weighted embedding extraction with pyannote 3.1 #214

@juanmc2005


With pyannote 3.1, we could do a single forward pass over the audio instead of one pass per speaker (`num_speakers` passes) when extracting embeddings with weights. The repeated passes are probably at least one of the reasons the PyTorch version of the WeSpeaker embedding model is so much slower.
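To make the idea concrete, here is a minimal, framework-free sketch. It assumes the embedding model can be split into a frame-level encoder followed by weighted pooling; `CountingEncoder`, `embed_per_speaker` and `embed_single_pass` are hypothetical names for illustration, not pyannote or diart APIs.

```python
from typing import List

Frames = List[List[float]]   # shape: (num_frames, dim)
Weights = List[List[float]]  # shape: (num_speakers, num_frames)


def weighted_mean(frames: Frames, weights: List[float]) -> List[float]:
    """Weighted average of frame-level features for one speaker."""
    total = sum(weights) + 1e-8  # guard against all-zero weights
    dim = len(frames[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, frames)) / total
        for d in range(dim)
    ]


class CountingEncoder:
    """Toy stand-in for the frame-level network; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, audio: List[float]) -> Frames:
        self.calls += 1
        # Two toy features per sample; a real model would be a neural network.
        return [[x, x * x] for x in audio]


def embed_per_speaker(encoder, audio: List[float], weights: Weights) -> Frames:
    """Current behaviour as described above: one forward pass per speaker."""
    return [weighted_mean(encoder(audio), w) for w in weights]


def embed_single_pass(encoder, audio: List[float], weights: Weights) -> Frames:
    """Proposed behaviour: encode once, then pool once per speaker."""
    frames = encoder(audio)
    return [weighted_mean(frames, w) for w in weights]


audio = [0.1, 0.4, -0.2, 0.3]
weights = [[1.0, 0.8, 0.0, 0.1], [0.0, 0.2, 1.0, 0.9]]  # 2 speakers
slow, fast = CountingEncoder(), CountingEncoder()
same = embed_per_speaker(slow, audio, weights) == embed_single_pass(fast, audio, weights)
print(same, slow.calls, fast.calls)  # True 2 1
```

In this toy setting both paths return identical embeddings, and only the number of encoder calls changes. In a real model the pooling step would presumably be a batched tensor operation rather than Python loops.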

This optimization would also reduce the latency of pyannote/embedding, so the latency of both models would need to be re-measured and updated in the README table.

Important: we should verify that this method is also compatible with masking (e.g. in speechbrain embeddings).
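A toy illustration of why this check matters, under the assumption (not confirmed here) that masking might be applied to the input before encoding rather than at pooling time. The `encode` function below is a hypothetical encoder with temporal context, not the speechbrain model.

```python
from typing import List, Tuple


def encode(audio: List[float]) -> List[float]:
    """Toy encoder with temporal context: each frame adds the previous sample."""
    return [audio[i] + (audio[i - 1] if i > 0 else 0.0) for i in range(len(audio))]


def pool(frames: List[float], weights: List[float]) -> float:
    """Weighted mean over frames."""
    return sum(w * f for w, f in zip(weights, frames)) / sum(weights)


def compare_masking(audio: List[float], mask: List[float]) -> Tuple[float, float]:
    # Masking applied to the input, followed by a dedicated forward pass.
    masked_input = pool(encode([m * x for m, x in zip(mask, audio)]), mask)
    # Single shared forward pass; the mask is applied only when pooling.
    masked_pooling = pool(encode(audio), mask)
    return masked_input, masked_pooling


print(compare_masking([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 0.0]))  # (3.5, 4.0)
```

The two results differ because the shared forward pass lets masked-out samples leak into neighbouring frames through the encoder's context. If a model masks its input this way, the single-pass shortcut would change its embeddings, which is what needs to be verified.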

Labels: feature (New feature or request)