Replies: 1 comment
The underlying model has indeed been specifically trained to recognize at most 2 simultaneous speakers. You'll need to train (or fine-tune) a new one with
-
I have an audio file that includes parts where three or more speakers are speaking at the same time. However, the pre-trained pyannote models seem to have problems recognizing this. What I hoped to try:

- `min_duration_on` for the pre-trained model - maybe the overlapping speech segments are too short, and setting this to `0` would make the model more sensitive to multiple speakers. However, I was not able to achieve this; potential solutions are discussed here with only a link to the documentation.
- `max_speakers_per_chunk` or `max_speakers_per_frame` - however, these seem to be available only when fine-tuning the models.

Is there an easy way to use `pipeline` while making the model more willing to recognize three or more speakers at the same timestamp? Current code, which recognizes only up to two speakers at a given timestamp:
The audio file is a test video compiled from Mozilla Common Voice dataset, the last 15 seconds contain overlapping speech of three people. I haven't found an example where three or more speakers at the same timestamp are recognized.
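For checking whether a diarization result ever contains three or more simultaneous speakers (e.g. in those last 15 seconds), a plain sweep-line over the output segments is enough. This is a minimal, self-contained sketch; the `segments` list of `(start, end, speaker)` tuples is hypothetical example data, not the actual output of the pipeline above:

```python
def max_simultaneous_speakers(segments):
    """Return the largest number of speakers active at the same instant.

    segments: iterable of (start, end, speaker) tuples, times in seconds.
    """
    events = []
    for start, end, _speaker in segments:
        events.append((start, +1))  # a speaker turn begins
        events.append((end, -1))    # a speaker turn ends
    # Sort by time; at equal timestamps process ends (-1) before starts (+1)
    # so segments that merely touch do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best

# Hypothetical diarization output: three speakers overlap around 12-13 s.
segments = [
    (0.0, 5.0, "SPEAKER_00"),
    (4.0, 13.0, "SPEAKER_01"),
    (10.0, 15.0, "SPEAKER_02"),
    (12.0, 14.0, "SPEAKER_00"),
]
print(max_simultaneous_speakers(segments))  # → 3
```

In practice one would build `segments` from the pipeline result via `diarization.itertracks(yield_label=True)`; if this helper never returns 3 on audio known to contain three-way overlap, that is consistent with the answer above that the underlying model caps out at 2 simultaneous speakers.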