Description
Is there a way to use pyannote.metrics' `DiarizationErrorRate()` to compute errors based on the identified speaker labels themselves, rather than after the built-in Hungarian (optimal) or greedy speaker mapping?
Most speaker diarization systems label speakers generically (SPEAKER_00, SPEAKER_01, ...), but when a speaker identification system runs on top, it would be useful to see how identification errors affect the DER.
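As background (a standard restatement, not taken from pyannote's source code), DER sums missed speech, false alarm and speaker confusion over the total reference speech. Split the file into elementary segments $s$ between consecutive segment boundaries, each with duration $d_s$, active reference speakers $R_s$ and active hypothesis speakers $H_s$. Only the confusion term depends on the speaker mapping $\pi$ applied to the hypothesis labels, so the question amounts to which $\pi$ is used there:

$$
\begin{aligned}
\mathrm{DER}(\pi) &= \frac{T_{\mathrm{miss}} + T_{\mathrm{FA}} + T_{\mathrm{conf}}(\pi)}{T_{\mathrm{ref}}}, \qquad T_{\mathrm{ref}} = \sum_s d_s\,|R_s|,\\
T_{\mathrm{miss}} &= \sum_s d_s \max\left(0,\,|R_s| - |H_s|\right), \qquad T_{\mathrm{FA}} = \sum_s d_s \max\left(0,\,|H_s| - |R_s|\right),\\
T_{\mathrm{conf}}(\pi) &= \sum_s d_s \left(\min\left(|R_s|,\,|H_s|\right) - \left|R_s \cap \pi(H_s)\right|\right).
\end{aligned}
$$

Standard DER takes $\pi$ to be the one-to-one mapping that maximises total overlap between hypothesis and reference speakers; a speaker-attributed DER would instead fix $\pi$ to the identity, so a hypothesis label only counts as correct where it literally equals an active reference name.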
Example
Reference RTTM file:

```
SPEAKER AUDIOFILE1 1 1 5 ANN
SPEAKER AUDIOFILE1 1 6 3 BOB
SPEAKER AUDIOFILE1 1 8 2 ANN
```
Hypothesis RTTM file 1 from a speaker diarization system:

```
SPEAKER AUDIOFILE1 1 1 5 SPEAKER_00
SPEAKER AUDIOFILE1 1 6 3 SPEAKER_01
SPEAKER AUDIOFILE1 1 8 2 SPEAKER_00
```
DER for hypothesis RTTM file 1 is 20%, comprising 10% miss and 10% false alarm.
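To make the mapping step concrete, here is a minimal pure-Python sketch, assuming the abbreviated six-field RTTM lines above (onset and duration in seconds, speaker name last), a single file, no collar, and overlapped speech scored. It is not pyannote's implementation: all function names are hypothetical, and it swaps the Hungarian algorithm for a brute-force search over one-to-one mappings, which is only practical for a handful of speakers.

```python
from itertools import permutations


def parse_rttm(lines):
    """Parse RTTM-style lines into (onset, offset, speaker) tuples.

    Assumption: fields 4 and 5 are onset and duration in seconds; the speaker
    is field 8 in full 10-field RTTM, or the last field in the short form above.
    """
    segments = []
    for line in lines:
        fields = line.split()
        onset, duration = float(fields[3]), float(fields[4])
        speaker = fields[7] if len(fields) >= 8 else fields[-1]
        segments.append((onset, onset + duration, speaker))
    return segments


def elementary_intervals(reference, hypothesis):
    """Yield (duration, ref_speakers, hyp_speakers) between consecutive boundaries."""
    bounds = sorted({t for seg in reference + hypothesis for t in seg[:2]})
    for start, end in zip(bounds, bounds[1:]):
        ref = {spk for on, off, spk in reference if on <= start and end <= off}
        hyp = {spk for on, off, spk in hypothesis if on <= start and end <= off}
        yield end - start, ref, hyp


def error_components(reference, hypothesis):
    """Return (total, miss, false_alarm, confusion), comparing labels as given."""
    total = miss = false_alarm = confusion = 0.0
    for d, ref, hyp in elementary_intervals(reference, hypothesis):
        total += d * len(ref)
        miss += d * max(0, len(ref) - len(hyp))
        false_alarm += d * max(0, len(hyp) - len(ref))
        confusion += d * (min(len(ref), len(hyp)) - len(ref & hyp))
    return total, miss, false_alarm, confusion


def optimal_mapping(reference, hypothesis):
    """One-to-one hyp->ref mapping maximising overlap (brute force, not Hungarian)."""
    overlap = {}
    for d, ref, hyp in elementary_intervals(reference, hypothesis):
        for h in hyp:
            for r in ref:
                overlap[h, r] = overlap.get((h, r), 0.0) + d
    hyp_labels = sorted({spk for _, _, spk in hypothesis})
    # Padding with None lets a hypothesis speaker stay unmapped.
    candidates = sorted({spk for _, _, spk in reference}) + [None] * len(hyp_labels)
    best = max(
        permutations(candidates, len(hyp_labels)),
        key=lambda refs: sum(overlap.get((h, r), 0.0) for h, r in zip(hyp_labels, refs)),
    )
    return dict(zip(hyp_labels, best))


def diarization_error_rate(reference, hypothesis):
    """DER after relabelling the hypothesis with the optimal mapping."""
    mapping = optimal_mapping(reference, hypothesis)
    relabelled = [
        # Unmapped speakers get a tuple label that can never equal a reference name.
        (on, off, mapping[spk] if mapping[spk] is not None else ("unmapped", spk))
        for on, off, spk in hypothesis
    ]
    total, miss, false_alarm, confusion = error_components(reference, relabelled)
    return (miss + false_alarm + confusion) / total


REFERENCE = parse_rttm([
    "SPEAKER AUDIOFILE1 1 1 5 ANN",
    "SPEAKER AUDIOFILE1 1 6 3 BOB",
    "SPEAKER AUDIOFILE1 1 8 2 ANN",
])
HYP_1 = parse_rttm([
    "SPEAKER AUDIOFILE1 1 1 5 SPEAKER_00",
    "SPEAKER AUDIOFILE1 1 6 3 SPEAKER_01",
    "SPEAKER AUDIOFILE1 1 8 2 SPEAKER_00",
])
print(optimal_mapping(REFERENCE, HYP_1))  # {'SPEAKER_00': 'ANN', 'SPEAKER_01': 'BOB'}
```

Because `optimal_mapping` only looks at how long each pair of labels co-occurs, the label strings never enter the score: renaming the hypothesis speakers, even to real names such as BOB and ANN, cannot change the DER.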
Hypothesis RTTM file 2 from speaker diarization followed by speaker identification:

```
SPEAKER AUDIOFILE1 1 1 5 BOB
SPEAKER AUDIOFILE1 1 6 3 ANN
SPEAKER AUDIOFILE1 1 8 2 BOB
```
DER for hypothesis RTTM file 2 is still 20%, comprising 10% miss and 10% false alarm. It does not reflect the speaker identification errors, even though ANN and BOB have been swapped.
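A speaker-attributed variant can be sketched by skipping the mapping step entirely and comparing labels as given. The plain-Python sketch below assumes the same abbreviated RTTM lines, a single file, no collar, and overlapped speech scored; `speaker_attributed_der` is a hypothetical name, not a pyannote function. For what it's worth, pyannote.metrics also provides an `IdentificationErrorRate` class (in `pyannote.metrics.identification`) which, as far as I can tell, scores labels without remapping and is the parent class of `DiarizationErrorRate`; it may be worth checking its documentation before rolling your own.

```python
def parse_rttm(lines):
    """(onset, offset, speaker) tuples; speaker is field 8, or last in the short form."""
    segments = []
    for line in lines:
        fields = line.split()
        onset, duration = float(fields[3]), float(fields[4])
        speaker = fields[7] if len(fields) >= 8 else fields[-1]
        segments.append((onset, onset + duration, speaker))
    return segments


def speaker_attributed_der(reference, hypothesis):
    """DER with the identity mapping: a hypothesis speaker is only correct
    where its label equals the name of an active reference speaker."""
    bounds = sorted({t for seg in reference + hypothesis for t in seg[:2]})
    total = errors = 0.0
    for start, end in zip(bounds, bounds[1:]):
        d = end - start
        ref = {spk for on, off, spk in reference if on <= start and end <= off}
        hyp = {spk for on, off, spk in hypothesis if on <= start and end <= off}
        total += d * len(ref)
        # miss + false alarm + confusion collapse to max(|R|, |H|) - |R & H|
        errors += d * (max(len(ref), len(hyp)) - len(ref & hyp))
    return errors / total


REFERENCE = parse_rttm([
    "SPEAKER AUDIOFILE1 1 1 5 ANN",
    "SPEAKER AUDIOFILE1 1 6 3 BOB",
    "SPEAKER AUDIOFILE1 1 8 2 ANN",
])
HYP_1 = parse_rttm([
    "SPEAKER AUDIOFILE1 1 1 5 SPEAKER_00",
    "SPEAKER AUDIOFILE1 1 6 3 SPEAKER_01",
    "SPEAKER AUDIOFILE1 1 8 2 SPEAKER_00",
])
HYP_2 = parse_rttm([
    "SPEAKER AUDIOFILE1 1 1 5 BOB",
    "SPEAKER AUDIOFILE1 1 6 3 ANN",
    "SPEAKER AUDIOFILE1 1 8 2 BOB",
])
print(speaker_attributed_der(REFERENCE, HYP_1), speaker_attributed_der(REFERENCE, HYP_2))  # 1.0 0.8
```

Here the names matter. On the toy lines read literally, hypothesis 2 comes out at 80%: the swapped names are charged as confusion everywhere except the overlapped second (8 to 9 s) where both names are active. Hypothesis 1 comes out at 100%, since generic labels never match a reference name, which is arguably the intended behaviour for a speaker-attributed metric.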
Apologies if I am asking something obvious. I feel there must be an easy answer out there, but I have not found it. I am aware of the speaker-attributed word error rate (SAWER) and its variants, but not of any speaker-attributed DER metric.