Replies: 1 comment
The underlying model has indeed been specifically trained to recognize at most 2 simultaneous speakers. You'll need to train (or fine-tune) a new one with
-
I have an audio file that includes parts where three or more speakers are speaking at the same time. However, the pre-trained pyannote models seem to have problems recognizing this. What I hoped to try:

- `min_duration_on` for the pre-trained model - maybe the overlapping speech segments are too short, and setting this to `0` would make the model more sensitive to multiple speakers. However, I was not able to achieve this; potential solutions are discussed here with only a link to the documentation.
- `max_speakers_per_chunk` or `max_speakers_per_frame` - however, these seem to be available only when fine-tuning the models.

Is there an easy way to use `pipeline` while making the model more willing to recognize three or more speakers at the same timestamp? Current code, which recognizes only up to two speakers at a given timestamp:
The audio file is a test video compiled from Mozilla Common Voice dataset, the last 15 seconds contain overlapping speech of three people. I haven't found an example where three or more speakers at the same timestamp are recognized.
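For checking whether a diarization result ever contains three or more simultaneous speakers (e.g. in those last 15 seconds), a plain sweep-line over the output segments is enough. This is a minimal, self-contained sketch; the `segments` list of `(start, end, speaker)` tuples is hypothetical example data, not the actual output of the pipeline above:

```python
def max_simultaneous_speakers(segments):
    """Return the largest number of speakers active at the same instant.

    segments: iterable of (start, end, speaker) tuples, times in seconds.
    """
    events = []
    for start, end, _speaker in segments:
        events.append((start, +1))  # a speaker turn begins
        events.append((end, -1))    # a speaker turn ends
    # Sort by time; at equal timestamps process ends (-1) before starts (+1)
    # so segments that merely touch do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best

# Hypothetical diarization output: three speakers overlap around 12-13 s.
segments = [
    (0.0, 5.0, "SPEAKER_00"),
    (4.0, 13.0, "SPEAKER_01"),
    (10.0, 15.0, "SPEAKER_02"),
    (12.0, 14.0, "SPEAKER_00"),
]
print(max_simultaneous_speakers(segments))  # → 3
```

In practice one would build `segments` from the pipeline result via `diarization.itertracks(yield_label=True)`; if this helper never returns 3 on audio known to contain three-way overlap, that is consistent with the answer above that the underlying model caps out at 2 simultaneous speakers.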