pyannote.audio 4.0

Version 4.0.0

TL;DR

Improved speaker assignment and counting

pyannote/speaker-diarization-community-1 pretrained pipeline relies on VBx clustering instead of agglomerative hierarchical clustering (as suggested by BUT Speech@FIT researchers Petr Pálka and Jiangyu Han).

Exclusive speaker diarization

pyannote/speaker-diarization-community-1 pretrained pipeline returns a new exclusive speaker diarization, on top of the regular speaker diarization.
This feature is backported from our latest commercial model and simplifies reconciling fine-grained speaker diarization timestamps with (sometimes less precise) transcription timestamps.

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
output = pipeline("/path/to/conversation.wav")
print(output.speaker_diarization)            # regular speaker diarization
print(output.exclusive_speaker_diarization)  # exclusive speaker diarization
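
Since the exclusive speaker diarization is meant to simplify alignment with transcription timestamps, one natural use is labeling each transcribed word with a single speaker. A minimal sketch, assuming the exclusive output behaves like a regular pyannote.core.Annotation and using a hypothetical word list with (text, start, end) timestamps from any transcription tool:

from pyannote.core import Segment

# hypothetical word-level timestamps from a transcription tool
words = [("hello", 0.52, 0.90), ("there", 0.95, 1.30)]

exclusive = output.exclusive_speaker_diarization
for text, start, end in words:
    # pick the speaker overlapping the word the most
    speaker = exclusive.crop(Segment(start, end)).argmax()
    print(f"{speaker}: {text}")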

Faster training

Metadata caching and optimized dataloaders make training on large-scale datasets much faster.
This led to a 15x speed-up on pyannoteAI's internal large-scale training.

pyannoteAI premium speaker diarization

Change one line of code to use pyannoteAI premium models and enjoy more accurate speaker diarization.

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
+    "pyannote/speaker-diarization-precision-2, token="pyannoteAI-api-key")
diarization = pipeline("/path/to/conversation.wav")

Offline (air-gapped) use

Pipelines can now be stored alongside their internal models in the same repository, streamlining fully offline use.

  1. Accept pyannote/speaker-diarization-community-1 pipeline user agreement

  2. Clone the pipeline repository from Hugging Face (if prompted for a password, use a Hugging Face access token with the appropriate permissions)

    $ git lfs install
    $ git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
  3. Enjoy!

    # load pipeline from disk (works without internet connection)
    from pyannote.audio import Pipeline
    pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')
    
    # run the pipeline locally on your computer
    diarization = pipeline("audio.wav")

Telemetry

With the optional telemetry feature in pyannote.audio, you can choose to send anonymous usage metrics to help the pyannote team improve the library.

Breaking changes

  • BREAKING(io): remove support for sox and soundfile audio I/O backends (only ffmpeg or in-memory audio is supported)
  • BREAKING(setup): drop support for Python < 3.10
  • BREAKING(hub): rename use_auth_token to token
  • BREAKING(hub): drop support for {pipeline_name}@{revision} syntax in Model.from_pretrained(...) and Pipeline.from_pretrained(...) -- use the new revision keyword argument instead (see the migration sketch after this list)
  • BREAKING(task): remove OverlappedSpeechDetection task (part of SpeakerDiarization task)
  • BREAKING(pipeline): remove OverlappedSpeechDetection and Resegmentation unmaintained pipelines (part of SpeakerDiarization)
  • BREAKING(cache): rely on huggingface_hub caching directory (PYANNOTE_CACHE is no longer used)
  • BREAKING(inference): Inference now only supports already instantiated models
  • BREAKING(task): drop support for multilabel training in SpeakerDiarization task
  • BREAKING(task): drop support for warm_up option in SpeakerDiarization task
  • BREAKING(task): drop support for weigh_by_cardinality option in SpeakerDiarization task
  • BREAKING(task): drop support for vad_loss option in SpeakerDiarization task
  • BREAKING(chore): switch to native namespace package
  • BREAKING(cli): remove deprecated pyannote-audio-train CLI
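
For the two hub-related changes above (use_auth_token renamed to token, and the @{revision} syntax replaced by a keyword argument), here is a minimal migration sketch; repository name, revision, and token values are placeholders:

from pyannote.audio import Pipeline

# before (3.x)
# pipeline = Pipeline.from_pretrained(
#     "pyannote/speaker-diarization-community-1@main",
#     use_auth_token="huggingface-access-token")

# after (4.0): pass revision and token as keyword arguments
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    revision="main",
    token="huggingface-access-token")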

New features

  • feat(io): switch from torchaudio to torchcodec for audio I/O
  • feat(pipeline): add support for VBx clustering (@Selesnyan and jyhan03)
  • feat(pyannoteAI): add wrapper around pyannoteAI SDK
  • improve(hub): add support for pipeline repos that also include underlying models
  • feat(clustering): add support for k-means clustering
  • feat(model): add wav2vec_frozen option to freeze/unfreeze wav2vec in SSeRiouSS architecture
  • feat(task): add support for manual optimization in SpeakerDiarization task
  • feat(utils): add hidden option to ProgressHook
  • feat(utils): add FilterByNumberOfSpeakers protocol files filter
  • feat(core): add Calibration class to calibrate logits/distances into probabilities
  • feat(metric): add DetectionErrorRate, SegmentationErrorRate, DiarizationPrecision, and DiarizationRecall metrics
  • feat(cli): add CLI to download, apply, benchmark, and optimize pipelines
  • feat(cli): add CLI to strip checkpoints to their bare inference minimum

Improvements

  • improve(model): improve WavLM (un)freezing support for SSeRiouSS architecture (@clement-pages)
  • improve(task): improve SpeakerDiarization training with manual optimization (@clement-pages)
  • improve(train): speed up dataloaders
  • improve(setup): switch to uv
  • improve(setup): switch to lightning from pytorch-lightning
  • improve(utils): improve dependency check when loading pretrained models and/or pipeline
  • improve(utils): add option to skip dependency check
  • improve(utils): add option to load a pretrained model checkpoint from an io.BytesIO buffer (see the sketch after this list)
  • improve(pipeline): add option to load a pretrained pipeline from a dict (@benniekiss)
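
The io.BytesIO option above makes it possible to load a checkpoint that never touches the local filesystem (e.g. fetched from object storage). A minimal sketch, assuming the buffer is accepted wherever a checkpoint path is expected; the entry point and checkpoint path below are illustrative assumptions:

import io

from pyannote.audio import Model

# read checkpoint bytes from anywhere (object storage, database, ...);
# a local file is used here as a stand-in
with open("/path/to/model.ckpt", "rb") as f:
    buffer = io.BytesIO(f.read())

# assumption: the buffer can be passed in place of a checkpoint path
model = Model.from_pretrained(buffer)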

Fixes

  • fix(model): improve WavLM (un)freezing support for ToTaToNet architecture (@clement-pages)
  • fix(separation): fix clipping issue in speech separation pipeline (@joonaskalda)
  • fix(separation): fix alignment between separated sources and diarization (@Lebourdais and @clement-pages)
  • fix(separation): prevent leakage removal collar from being applied to diarization (@clement-pages)
  • fix(separation): fix PixIT training with manual optimization (@clement-pages)
  • fix(doc): fix link to pytorch (@emmanuel-ferdman)
  • fix(task): fix corner case with small (<9) number of validation samples (@antoinelaurent)
  • fix(doc): fix default embedding in SpeechSeparation and SpeakerDiarization docstring (@razi-tm)