Releases · pyannote/pyannote-audio
Version 3.0.0
TL;DR
Better pretrained pipeline and model
- Much better overlapping speech detection with powerset pyannote/segmentation-3.0
- Much better speaker diarization performance with pyannote/speaker-diarization-3.0
| Benchmark (DER %) | v2.1 | v3.0 |
|---|---|---|
| AISHELL-4 | 14.1 | 12.3 |
| AliMeeting (channel 1) | 27.4 | 24.3 |
| AMI (IHM) | 18.9 | 19.0 |
| AMI (SDM) | 27.1 | 22.2 |
| AVA-AVD | - | 49.1 |
| DIHARD 3 (full) | 26.9 | 21.7 |
| MSDWild | - | 24.6 |
| REPERE (phase2) | 8.2 | 7.8 |
| VoxConverse (v0.3) | 11.2 | 11.3 |
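The numbers above are diarization error rates (DER), reported as percentages. For reference, DER can be computed with the companion pyannote.metrics package; here is a minimal sketch, where the toy `reference` and `hypothesis` annotations (segments and labels) are made up for illustration:

```python
# Minimal sketch of how a DER number like those above is computed,
# using toy reference/hypothesis annotations (values are made up).
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

reference = Annotation()
reference[Segment(0.0, 10.0)] = "speaker_A"
reference[Segment(12.0, 20.0)] = "speaker_B"

hypothesis = Annotation()
hypothesis[Segment(0.0, 11.0)] = "spk1"
hypothesis[Segment(11.0, 20.0)] = "spk2"

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)  # returned as a fraction
print(f"DER = {100 * der:.1f}%")
```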
Major breaking changes
- BREAKING: pipelines now run on CPU by default.
  Use `pipeline.to(torch.device('cuda'))` to use GPU (see the sketch below).
- BREAKING: removed `SpeakerSegmentation` pipeline.
  Use `SpeakerDiarization` pipeline instead.
- BREAKING: removed support for `prodi.gy` recipes.
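A minimal usage sketch reflecting the new CPU default, using the `pyannote/speaker-diarization-3.0` pipeline from this release; `"HF_TOKEN"` and `"audio.wav"` are hypothetical placeholders:

```python
# Load the v3.0 pretrained pipeline and move it to GPU explicitly,
# since pipelines now run on CPU by default.
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HF_TOKEN",  # hypothetical Hugging Face token placeholder
)
pipeline.to(torch.device("cuda"))  # BREAKING in v3.0: no longer automatic

diarization = pipeline("audio.wav")  # hypothetical input file
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s {speaker}")
```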
Full changelog
Features and improvements
- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines (see the sketch after this list)
- feat(task): add powerset support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications
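A sketch combining two of the pipeline features above, the progress hook and `return_embeddings`, assuming the hook utilities ship at `pyannote.audio.pipelines.utils.hook`; `"HF_TOKEN"` and `"audio.wav"` remain hypothetical placeholders:

```python
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HF_TOKEN",  # hypothetical placeholder
)
pipeline.to(torch.device("cuda"))

with ProgressHook() as hook:
    diarization, embeddings = pipeline(
        "audio.wav",
        hook=hook,                # reports per-step progress
        return_embeddings=True,   # also return one embedding per speaker
    )

# `embeddings` holds one vector per global speaker, aligned with
# the speaker labels of `diarization`
for s, speaker in enumerate(diarization.labels()):
    print(speaker, embeddings[s].shape)
```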
Breaking changes
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated (see the migration sketch after this list):
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If, for some weird reason, you wrote some custom code based on that,
  you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
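A migration sketch for the new `Audio` defaults described above (0-indexed channels, no automatic downmix); `"stereo.wav"` is a hypothetical two-channel file:

```python
# Migration sketch for the new multi-channel behavior.
from pyannote.audio.core.io import Audio

# pre-3.0 behavior (downmix to mono) must now be requested explicitly:
audio = Audio(mono="downmix")
waveform, sample_rate = audio("stereo.wav")  # waveform: (1, num_samples)

# the new default keeps every channel:
audio = Audio()
waveform, sample_rate = audio("stereo.wav")  # waveform: (2, num_samples)
first_channel = waveform[0]                  # channels are now 0-indexed
```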
Fixes and improvements
- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags
Dependencies update
- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+
Version 2.1.1
Version 2.1.x introduces a major overhaul of the pyannote.audio default speaker diarization pipeline, made of three main stages:
- neural speaker segmentation applied to a short sliding window;
- neural speaker embedding of each (local) speaker;
- (global) agglomerative clustering.
More details in the attached technical report.
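The first two stages produce one embedding per local speaker per chunk; the final stage groups these embeddings across the whole file. Here is a toy, self-contained illustration of that last stage using scikit-learn's `AgglomerativeClustering` on made-up embeddings (the actual pipeline uses its own clustering code, so this only conveys the idea):

```python
# Toy illustration of stage 3: agglomerative clustering of
# (here, randomly generated stand-in) local speaker embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# pretend we extracted one 16-dim embedding per local speaker per chunk
local_embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 16)),  # chunks where speaker A talks
    rng.normal(loc=1.0, scale=0.1, size=(5, 16)),  # chunks where speaker B talks
])

# let the distance threshold, not a fixed cluster count, decide
# how many global speakers there are
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
global_labels = clustering.fit_predict(local_embeddings)
print(global_labels)  # local speakers mapped to global speaker indices
```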
Version 1.1.1
- chore: do not update to pyannote.pipeline >= 2.0