# pyannote.audio 4.0

Version 4.0.0

## TL;DR

### Improved speaker assignment and counting
The `pyannote/speaker-diarization-community-1` pretrained pipeline relies on VBx clustering instead of agglomerative hierarchical clustering (as suggested by BUT Speech@FIT researchers Petr Pálka and Jiangyu Han).

### Exclusive speaker diarization

The `pyannote/speaker-diarization-community-1` pretrained pipeline returns a new exclusive speaker diarization, on top of the regular speaker diarization. This feature, backported from our latest commercial model, simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes less precise) transcription timestamps.
```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")

output = pipeline("/path/to/conversation.wav")
print(output.speaker_diarization)            # regular speaker diarization
print(output.exclusive_speaker_diarization)  # exclusive speaker diarization
```

### Faster training

Metadata caching and optimized dataloaders make training on large-scale datasets much faster. This led to a 15x speedup on pyannoteAI's internal large-scale training.
### pyannoteAI premium speaker diarization

Change one line of code to use pyannoteAI premium models and enjoy more accurate speaker diarization.

```diff
  from pyannote.audio import Pipeline
  pipeline = Pipeline.from_pretrained(
-     "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
+     "pyannote/speaker-diarization-precision-2", token="pyannoteAI-api-key")
  diarization = pipeline("/path/to/conversation.wav")
```

### Offline (air-gapped) use
Pipelines can now be stored alongside their internal models in the same repository, streamlining fully offline use.
1. Accept the `pyannote/speaker-diarization-community-1` pipeline user agreement.

2. Clone the pipeline repository from Hugging Face (if prompted for a password, use a Hugging Face access token with the correct permissions):

   ```shell
   $ git lfs install
   $ git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
   ```

3. Enjoy!

   ```python
   # load pipeline from disk (works without internet connection)
   from pyannote.audio import Pipeline
   pipeline = Pipeline.from_pretrained(
       "/path/to/directory/pyannote-speaker-diarization-community-1")

   # run the pipeline locally on your computer
   diarization = pipeline("audio.wav")
   ```
## Telemetry
With the optional telemetry feature in pyannote.audio, you can choose to send anonymous usage metrics to help the pyannote team improve the library.
## Breaking changes
- BREAKING(io): remove support for `sox` and `soundfile` audio I/O backends (only `ffmpeg` or in-memory audio is supported)
- BREAKING(setup): drop support for Python < 3.10
- BREAKING(hub): rename `use_auth_token` to `token`
- BREAKING(hub): drop support for `{pipeline_name}@{revision}` syntax in `Model.from_pretrained(...)` and `Pipeline.from_pretrained(...)` -- use the new `revision` keyword argument instead
- BREAKING(task): remove `OverlappedSpeechDetection` task (part of `SpeakerDiarization` task)
- BREAKING(pipeline): remove unmaintained `OverlappedSpeechDetection` and `Resegmentation` pipelines (part of `SpeakerDiarization`)
- BREAKING(cache): rely on `huggingface_hub` caching directory (`PYANNOTE_CACHE` is no longer used)
- BREAKING(inference): `Inference` now only supports already instantiated models
- BREAKING(task): drop support for `multilabel` training in `SpeakerDiarization` task
- BREAKING(task): drop support for `warm_up` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `weigh_by_cardinality` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `vad_loss` option in `SpeakerDiarization` task
- BREAKING(chore): switch to native namespace package
- BREAKING(cli): remove deprecated `pyannote-audio-train` CLI
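Taken together, the `use_auth_token` rename and the removal of the `@{revision}` syntax mean existing loading code needs a small update. A migration sketch (the repository name and revision below are illustrative, not prescriptive):

```diff
  from pyannote.audio import Pipeline

  pipeline = Pipeline.from_pretrained(
-     "pyannote/speaker-diarization-community-1@main",
-     use_auth_token="huggingface-access-token")
+     "pyannote/speaker-diarization-community-1",
+     revision="main",
+     token="huggingface-access-token")
```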
## New features
- feat(io): switch from `torchaudio` to `torchcodec` for audio I/O
- feat(pipeline): add support for VBx clustering (@Selesnyan and jyhan03)
- feat(pyannoteAI): add wrapper around the pyannoteAI SDK
- improve(hub): add support for pipeline repos that also include underlying models
- feat(clustering): add support for `k-means` clustering
- feat(model): add `wav2vec_frozen` option to freeze/unfreeze `wav2vec` in the `SSeRiouSS` architecture
- feat(task): add support for manual optimization in `SpeakerDiarization` task
- feat(utils): add `hidden` option to `ProgressHook`
- feat(utils): add `FilterByNumberOfSpeakers` protocol files filter
- feat(core): add `Calibration` class to calibrate logits/distances into probabilities
- feat(metric): add `DetectionErrorRate`, `SegmentationErrorRate`, `DiarizationPrecision`, and `DiarizationRecall` metrics
- feat(cli): add CLI to download, apply, benchmark, and optimize pipelines
- feat(cli): add CLI to strip checkpoints to their bare inference minimum
## Improvements
- improve(model): improve WavLM (un)freezing support for `SSeRiouSS` architecture (@clement-pages)
- improve(task): improve `SpeakerDiarization` training with manual optimization (@clement-pages)
- improve(train): speed up dataloaders
- improve(setup): switch to `uv`
- improve(setup): switch to `lightning` from `pytorch-lightning`
- improve(utils): improve dependency check when loading pretrained models and/or pipelines
- improve(utils): add option to skip dependency check
- improve(utils): add option to load a pretrained model checkpoint from an `io.BytesIO` buffer
- improve(pipeline): add option to load a pretrained pipeline from a `dict` (@benniekiss)
## Fixes
- fix(model): improve WavLM (un)freezing support for `ToTaToNet` architecture (@clement-pages)
- fix(separation): fix clipping issue in speech separation pipeline (@joonaskalda)
- fix(separation): fix alignment between separated sources and diarization (@Lebourdais and @clement-pages)
- fix(separation): prevent leakage removal collar from being applied to diarization (@clement-pages)
- fix(separation): fix `PixIT` training with manual optimization (@clement-pages)
- fix(doc): fix link to PyTorch (@emmanuel-ferdman)
- fix(task): fix corner case with small (<9) number of validation samples (@antoinelaurent)
- fix(doc): fix default embedding in `SpeechSeparation` and `SpeakerDiarization` docstrings (@razi-tm)