[Not for merge] Diarization workflow with SpeechBrain #1031

desh2608 · 2023-04-17T18:40:45Z

This workflow shows how we can use SpeechBrain x-vectors + sklearn agglomerative clustering to perform crude speaker diarization. It can be used on top of the Whisper workflow to obtain speaker-attributed transcripts.
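The clustering step can be sketched as follows. This is not the code in this PR. It assumes the per-window speaker embeddings have already been extracted (for example with a SpeechBrain ECAPA-TDNN `EncoderClassifier`). In sklearn, the equivalent call would be roughly `AgglomerativeClustering(n_clusters=None, distance_threshold=0.5, metric="cosine", linkage="average")`; `metric` replaced `affinity` in sklearn 1.2. To avoid model downloads and extra dependencies, the sketch implements average-linkage cosine clustering directly in NumPy. The helper name `cluster_embeddings`, the 0.5 threshold, and the random toy embeddings are all illustrative assumptions, not tuned values.

```python
import numpy as np


def cluster_embeddings(embeddings: np.ndarray, threshold: float = 0.5) -> list:
    """Average-linkage agglomerative clustering with cosine distance.

    Repeatedly merges the two closest clusters; stops once the smallest
    average cosine distance is at or above `threshold`.
    Returns one integer label per embedding row.
    """
    # L2-normalise rows so that cosine distance is 1 - dot product.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T
    clusters = [[i] for i in range(len(x))]  # start with singletons
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= threshold:
            break  # nothing close enough left to merge
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(x)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Toy stand-in for per-window speaker embeddings: two synthetic "speakers".
rng = np.random.default_rng(0)
spk_a, spk_b = rng.normal(size=192), rng.normal(size=192)
emb = np.stack(
    [spk_a + 0.1 * rng.normal(size=192) for _ in range(3)]
    + [spk_b + 0.1 * rng.normal(size=192) for _ in range(2)]
)
print(cluster_embeddings(emb, threshold=0.5))
```

A distance threshold is used instead of a fixed `n_clusters` because the number of speakers is usually unknown. As noted in the discussion, how well such a threshold works for ECAPA-TDNN embeddings has not been benchmarked.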


pzelasko

This is cool, what is the reason you don't want to merge it?

desh2608 · 2023-04-18T23:52:11Z

> This is cool, what is the reason you don't want to merge it?

Mainly because this approach isn't really benchmarked on anything, and I am not sure how well the ECAPA-TDNN embeddings would work with agglomerative clustering.

flyingleafe · 2023-05-16T11:47:44Z

@desh2608 pyannote.audio is basically ECAPA-TDNN + agglomerative clustering, and it is benchmarked quite well (https://github.com/pyannote/pyannote-audio). Why not use it directly?

desh2608 · 2023-05-16T12:37:42Z

> @desh2608 pyannote.audio is basically ECAPA-TDNN + agglomerative clustering, and it is benchmarked quite well. (https://github.com/pyannote/pyannote-audio) Why not use it directly?

I think that was in the older Pyannote, if I'm not mistaken? Pyannote 2.0 uses end-to-end segmentation, which performs much better. In any case, this was just a quick DIY workflow. It should be relatively easy for folks to use Pyannote to create RTTMs and then use SupervisionSet.from_rttm() to create Lhotse manifests.
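For the RTTM route, any diarizer's output only needs to be written as standard RTTM `SPEAKER` lines before it is passed to `SupervisionSet.from_rttm()`. The following minimal sketch formats `(start, end, speaker)` tuples. It assumes times are in seconds. The hypothetical helper `to_rttm` and its default channel value of 1 (the usual RTTM convention) are assumptions, so check how your own manifests index channels.

```python
from typing import Iterable, Tuple


def to_rttm(
    recording_id: str,
    segments: Iterable[Tuple[float, float, str]],
    channel: int = 1,
) -> str:
    """Format (start, end, speaker) tuples as RTTM "SPEAKER" lines.

    RTTM fields: type, file id, channel, onset, duration, <NA>, <NA>,
    speaker name, <NA>, <NA>. Times are in seconds.
    """
    lines = []
    for start, end, speaker in sorted(segments):
        duration = end - start
        if duration <= 0:
            continue  # drop zero- or negative-length segments
        lines.append(
            f"SPEAKER {recording_id} {channel} {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "".join(line + "\n" for line in lines)


rttm = to_rttm("meeting1", [(0.0, 2.5, "spk0"), (2.5, 4.0, "spk1")])
print(rttm, end="")
```

Writing this string to a file such as `meeting1.rttm` should allow it to be loaded with `SupervisionSet.from_rttm("meeting1.rttm")`.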

flyingleafe · 2023-05-16T14:23:47Z

> > @desh2608 pyannote.audio is basically ECAPA-TDNN + agglomerative clustering, and it is benchmarked quite well. (https://github.com/pyannote/pyannote-audio) Why not use it directly?

> I think that was in the older Pyannote, if I'm not mistaken? Pyannote 2.0 uses end-to-end segmentation which performs much better. In any case, this was just a quick DIY workflow. It should be relatively easy for folks to just use Pyannote to create RTTMs and then use the SupervisionSet.from_rttm() to create Lhotse manifests.

Well, not quite: the segmentation model in Pyannote 2.0 is only the first step, and speakers are still assigned to the segments using ECAPA-TDNN embeddings + clustering. But whatever.
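In a two-stage pipeline like this, after each segment gets a cluster label, consecutive segments with the same label are usually joined into speaker turns. The following minimal sketch shows only that bookkeeping step. It assumes the `(start, end)` segments and their labels already exist. The helper `merge_turns` and the 0.5 s `max_gap` are hypothetical and are not part of Pyannote or Lhotse. The sketch also does not model overlapped speech.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_turns(
    segments: List[Segment], labels: List[int], max_gap: float = 0.5
) -> List[Tuple[float, float, int]]:
    """Join time-ordered segments that share a speaker label into turns.

    Two consecutive segments are merged when they have the same label and
    the silence between them is at most `max_gap` seconds.
    """
    turns: List[Tuple[float, float, int]] = []
    for (start, end), spk in sorted(zip(segments, labels)):
        if turns and turns[-1][2] == spk and start - turns[-1][1] <= max_gap:
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), spk)  # extend turn
        else:
            turns.append((start, end, spk))  # speaker change or long pause
    return turns


segments = [(0.0, 1.0), (1.2, 2.0), (2.0, 3.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
print(merge_turns(segments, labels))
```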

desh2608 added 6 commits November 2, 2022 11:14

remove zero duration segments for indexing (7a2d059)

Merge branch 'master' of https://github.com/lhotse-speech/lhotse into diar_workflow (ea28375)

add diarization workflow with speechbrain (0c53722)

add missing file (33783d3)

merge upstream (3063429)

remove unwanted change (6c9ce1a)

pzelasko reviewed Apr 18, 2023


Adel-Moumen mentioned this pull request Jan 27, 2024

Display coverage status in PRs speechbrain/speechbrain#2277 (Draft, 13 tasks)


3 participants
