Releases · pyannote/pyannote-audio
Version 3.0.0
TL;DR
Better pretrained pipeline and model
- Much better overlapping speech detection with powerset pyannote/segmentation-3.0
- Much better speaker diarization performance with pyannote/speaker-diarization-3.0
| Benchmark (DER %) | v2.1 | v3.0 |
|---|---|---|
| AISHELL-4 | 14.1 | 12.3 |
| AliMeeting (channel 1) | 27.4 | 24.3 |
| AMI (IHM) | 18.9 | 19.0 |
| AMI (SDM) | 27.1 | 22.2 |
| AVA-AVD | - | 49.1 |
| DIHARD 3 (full) | 26.9 | 21.7 |
| MSDWild | - | 24.6 |
| REPERE (phase2) | 8.2 | 7.8 |
| VoxConverse (v0.3) | 11.2 | 11.3 |
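The numbers above are diarization error rates (DER), reported as percentages. For reference, DER can be computed with the companion pyannote.metrics package; here is a minimal sketch, where the toy `reference` and `hypothesis` annotations (segments and labels) are made up for illustration:

```python
# Minimal sketch of how a DER number like those above is computed,
# using toy reference/hypothesis annotations (values are made up).
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

reference = Annotation()
reference[Segment(0.0, 10.0)] = "speaker_A"
reference[Segment(12.0, 20.0)] = "speaker_B"

hypothesis = Annotation()
hypothesis[Segment(0.0, 11.0)] = "spk1"
hypothesis[Segment(11.0, 20.0)] = "spk2"

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)  # returned as a fraction
print(f"DER = {100 * der:.1f}%")
```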
Major breaking changes
- BREAKING: pipelines now run on CPU by default.
  Use `pipeline.to(torch.device('cuda'))` to use GPU (see the sketch below).
- BREAKING: removed `SpeakerSegmentation` pipeline.
  Use `SpeakerDiarization` pipeline instead.
- BREAKING: removed support for `prodi.gy` recipes.
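A minimal usage sketch reflecting the new CPU default, using the `pyannote/speaker-diarization-3.0` pipeline from this release; `"HF_TOKEN"` and `"audio.wav"` are hypothetical placeholders:

```python
# Load the v3.0 pretrained pipeline and move it to GPU explicitly,
# since pipelines now run on CPU by default.
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HF_TOKEN",  # hypothetical Hugging Face token placeholder
)
pipeline.to(torch.device("cuda"))  # BREAKING in v3.0: no longer automatic

diarization = pipeline("audio.wav")  # hypothetical input file
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s {speaker}")
```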
Full changelog
Features and improvements
- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines (see the sketch after this list)
- feat(task): add powerset support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications
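A sketch combining two of the pipeline features above, the progress hook and `return_embeddings`, assuming the hook utilities ship at `pyannote.audio.pipelines.utils.hook`; `"HF_TOKEN"` and `"audio.wav"` remain hypothetical placeholders:

```python
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HF_TOKEN",  # hypothetical placeholder
)
pipeline.to(torch.device("cuda"))

with ProgressHook() as hook:
    diarization, embeddings = pipeline(
        "audio.wav",
        hook=hook,                # reports per-step progress
        return_embeddings=True,   # also return one embedding per speaker
    )

# `embeddings` holds one vector per global speaker, aligned with
# the speaker labels of `diarization`
for s, speaker in enumerate(diarization.labels()):
    print(speaker, embeddings[s].shape)
```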
Breaking changes
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated (see the migration sketch after this list):
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If, for some weird reason, you wrote some custom code based on that,
  you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
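A migration sketch for the new `Audio` defaults described above (0-indexed channels, no automatic downmix); `"stereo.wav"` is a hypothetical two-channel file:

```python
# Migration sketch for the new multi-channel behavior.
from pyannote.audio.core.io import Audio

# pre-3.0 behavior (downmix to mono) must now be requested explicitly:
audio = Audio(mono="downmix")
waveform, sample_rate = audio("stereo.wav")  # waveform: (1, num_samples)

# the new default keeps every channel:
audio = Audio()
waveform, sample_rate = audio("stereo.wav")  # waveform: (2, num_samples)
first_channel = waveform[0]                  # channels are now 0-indexed
```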
Fixes and improvements
- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags
Dependencies update
- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+
Version 2.1.1
Version 2.1.x introduces a major overhaul of the pyannote.audio default speaker diarization pipeline, made of three main stages:
- neural speaker segmentation applied to a short sliding window;
- neural speaker embedding of each (local) speaker;
- (global) agglomerative clustering.
More details in the attached technical report.
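The first two stages produce one embedding per local speaker per chunk; the final stage groups these embeddings across the whole file. Here is a toy, self-contained illustration of that last stage using scikit-learn's `AgglomerativeClustering` on made-up embeddings (the actual pipeline uses its own clustering code, so this only conveys the idea):

```python
# Toy illustration of stage 3: agglomerative clustering of
# (here, randomly generated stand-in) local speaker embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# pretend we extracted one 16-dim embedding per local speaker per chunk
local_embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 16)),  # chunks where speaker A talks
    rng.normal(loc=1.0, scale=0.1, size=(5, 16)),  # chunks where speaker B talks
])

# let the distance threshold, not a fixed cluster count, decide
# how many global speakers there are
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
global_labels = clustering.fit_predict(local_embeddings)
print(global_labels)  # local speakers mapped to global speaker indices
```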
Version 1.1.1
- chore: do not update to pyannote.pipeline >= 2.0