Calculate DER for specified speaker identification mapping #69

@swm35

Description

Is there a way to use pyannote.metrics' DiarizationErrorRate() to compute the error based on the identified speaker labels themselves, rather than after the built-in Hungarian (optimal) or greedy mapping?

Most speaker diarization systems distinguish speakers with generic labels such as SPEAKER_00. With a speaker identification system on top, it would be useful to see how identification errors affect the DER.
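For context, the mapping step is what hides identification errors. Before scoring, each hypothesis label is renamed to the reference label it overlaps most, so the names the system actually output never matter. Below is a minimal sketch of that step. It assumes segments are plain `(start, end, label)` tuples, and `cooccurrence` and `optimal_mapping` are hypothetical helper names. It also brute-forces permutations, whereas pyannote.metrics uses the Hungarian algorithm; for two or three speakers the result is the same.

```python
from itertools import permutations

# Hypothetical segment representation: (start, end, label), times in seconds.
Segment = tuple[float, float, str]


def overlap(a: Segment, b: Segment) -> float:
    """Duration shared by two segments (0 if they do not overlap)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def cooccurrence(reference: list[Segment], hypothesis: list[Segment]) -> dict[tuple[str, str], float]:
    """Total overlap for every (reference label, hypothesis label) pair."""
    matrix: dict[tuple[str, str], float] = {}
    for r in reference:
        for h in hypothesis:
            key = (r[2], h[2])
            matrix[key] = matrix.get(key, 0.0) + overlap(r, h)
    return matrix


def optimal_mapping(reference: list[Segment], hypothesis: list[Segment]) -> dict[str, str]:
    """Rename hypothesis labels to reference labels, maximising total overlap.

    Brute force over permutations (sketch only). Assumes the hypothesis
    has no more distinct labels than the reference.
    """
    matrix = cooccurrence(reference, hypothesis)
    ref_labels = sorted({r[2] for r in reference})
    hyp_labels = sorted({h[2] for h in hypothesis})
    best: dict[str, str] = {}
    best_score = -1.0
    for perm in permutations(ref_labels, len(hyp_labels)):
        # perm[i] is the reference label assigned to hyp_labels[i]
        score = sum(matrix.get((ref, hyp), 0.0) for hyp, ref in zip(hyp_labels, perm))
        if score > best_score:
            best, best_score = dict(zip(hyp_labels, perm)), score
    return best
```

Applied to hypothesis 2 in the example, this maps BOB to ANN and ANN to BOB. After renaming, the hypothesis lines up with the reference, so the swapped identities are never counted as errors.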

Example

Reference RTTM file:
SPEAKER AUDIOFILE1 1 1.000 5.000 <NA> <NA> ANN <NA> <NA>
SPEAKER AUDIOFILE1 1 6.000 3.000 <NA> <NA> BOB <NA> <NA>
SPEAKER AUDIOFILE1 1 8.000 2.000 <NA> <NA> ANN <NA> <NA>
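To score these files with a script, each SPEAKER line has to become a `(start, end, speaker)` segment. Below is a minimal parser sketch. It assumes standard 10-field RTTM (onset in field 4, duration in field 5, speaker name in field 8) and also accepts a 6-field shorthand (file, channel, onset, duration, speaker). `parse_rttm` is a hypothetical name, not a library function.

```python
def parse_rttm(text: str) -> dict[str, list[tuple[float, float, str]]]:
    """Group SPEAKER lines by file id as (start, end, speaker) segments."""
    segments: dict[str, list[tuple[float, float, str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-SPEAKER records
        uri, onset, duration = fields[1], float(fields[3]), float(fields[4])
        # Standard RTTM: speaker name is field 8 (index 7); shorthand: last field.
        speaker = fields[7] if len(fields) >= 8 else fields[-1]
        segments.setdefault(uri, []).append((onset, onset + duration, speaker))
    return segments
```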

Hypothesis RTTM file 1 from a speaker diarization system:
SPEAKER AUDIOFILE1 1 1.000 5.000 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER AUDIOFILE1 1 6.000 3.000 <NA> <NA> SPEAKER_01 <NA> <NA>
SPEAKER AUDIOFILE1 1 8.000 2.000 <NA> <NA> SPEAKER_00 <NA> <NA>

DER for hypothesis RTTM file 1 is 20%, comprising 10% miss and 10% false alarm.

Hypothesis RTTM file 2 from speaker diarization followed by speaker identification:
SPEAKER AUDIOFILE1 1 1.000 5.000 <NA> <NA> BOB <NA> <NA>
SPEAKER AUDIOFILE1 1 6.000 3.000 <NA> <NA> ANN <NA> <NA>
SPEAKER AUDIOFILE1 1 8.000 2.000 <NA> <NA> BOB <NA> <NA>

DER for hypothesis RTTM file 2 is still 20%, comprising 10% miss and 10% false alarm. It does not account for the speaker identification errors, even though every speaker has been wrongly identified.
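One way to get a speaker-attributed score is to skip the mapping and compare labels as they are, counting a wrong name as confusion. pyannote.metrics appears to separate these two steps already. DiarizationErrorRate is a subclass of IdentificationErrorRate (in `pyannote.metrics.identification`), which scores labels without remapping them, so that class may cover this use case.

The sketch below reimplements the idea in plain Python rather than calling the library. It makes several assumptions:

- `error_components` and `active` are hypothetical names.
- Segments are `(start, end, label)` tuples.
- No collar is applied.
- Overlapping speech is counted per speaker in the usual NIST way: miss = max(0, n_ref - n_hyp), false alarm = max(0, n_hyp - n_ref), and confusion = min(n_ref, n_hyp) - n_correct.

```python
from typing import Optional

# Hypothetical segment representation: (start, end, label), times in seconds.
Segment = tuple[float, float, str]


def active(segments: list[Segment], t: float) -> set[str]:
    """Labels of all segments that cover time t."""
    return {label for start, end, label in segments if start <= t < end}


def error_components(reference: list[Segment], hypothesis: list[Segment],
                     mapping: Optional[dict[str, str]] = None) -> dict[str, float]:
    """Miss, false alarm and confusion durations for a FIXED label mapping.

    mapping=None keeps hypothesis labels unchanged, so BOB only counts as
    correct where the reference also says BOB (speaker-attributed scoring).
    """
    mapping = mapping or {}
    hyp = [(s, e, mapping.get(lbl, lbl)) for s, e, lbl in hypothesis]
    # Split the timeline into elementary regions between segment boundaries.
    bounds = sorted({t for s, e, _ in reference + hyp for t in (s, e)})
    result = {"missed detection": 0.0, "false alarm": 0.0, "confusion": 0.0, "total": 0.0}
    for t0, t1 in zip(bounds, bounds[1:]):
        mid, dur = (t0 + t1) / 2, t1 - t0
        ref_spk, hyp_spk = active(reference, mid), active(hyp, mid)
        n_ref, n_hyp = len(ref_spk), len(hyp_spk)
        n_correct = len(ref_spk & hyp_spk)
        result["total"] += dur * n_ref
        result["missed detection"] += dur * max(0, n_ref - n_hyp)
        result["false alarm"] += dur * max(0, n_hyp - n_ref)
        result["confusion"] += dur * (min(n_ref, n_hyp) - n_correct)
    errors = result["missed detection"] + result["false alarm"] + result["confusion"]
    result["error rate"] = errors / result["total"]
    return result
```

Called with `mapping=None` on hypothesis 2, every region where ANN and BOB are swapped is counted as confusion. Passing `{"ANN": "BOB", "BOB": "ANN"}`, the mapping an optimal mapper would pick, removes that confusion again.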

Apologies if I am asking something obvious. I feel there must be an easy answer out there, but I have not found it. I am aware of the speaker-attributed word error rate (SAWER) and its variants, but not of any speaker-attributed DER metric.
