Commit 2cf1490

Merge branch 'release/2.1'
2 parents 25462d5 + 6d9d98c commit 2cf1490

23 files changed, +2853 -1746 lines changed

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -37,4 +37,4 @@ jobs:
           file: ./coverage.xml
           env_vars: PYTHON
           name: codecov-pyannote-audio
-          fail_ci_if_error: true
+          fail_ci_if_error: false

CHANGELOG.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# Changelog
+
+## Version 2.1 (2022-11-xx)
+
+- BREAKING(pipeline): rewrite speaker diarization pipeline
+- feat(pipeline): add option to optimize for DER variant
+- feat(clustering): add support for NeMo speaker embedding
+- feat(clustering): add FINCH clustering
+- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
+- feat(hub): add support for private/gated models
+- setup(hub): switch to latest huggingface_hub API
+- fix(pipeline): fix support for missing reference in Resegmentation pipeline
+- fix(clustering): fix corner case where HMM.fit finds too few states
+
+## Version 2.0.1 (2022-07-20)
+
+- BREAKING: complete rewrite
+- feat: much better performance
+- feat: Python-first API
+- feat: pretrained pipelines (and models) on Huggingface model hub
+- feat: multi-GPU training with pytorch-lightning
+- feat: data augmentation with torch-audiomentations
+- feat: Prodigy recipe for model-assisted audio annotation
+
+## Version 1.1.2 (2021-01-28)
+
+- fix: make sure master branch is used to load pretrained models (#599)
+
+## Version 1.1 (2020-11-08)
+
+- last release before complete rewriting
+
+## Version 1.0.1 (2018-07-19)
+
+- fix: fix regression in Precomputed.__call__ (#110, #105)
+
+## Version 1.0 (2018-07-03)
+
+- chore: switch from keras to pytorch (with tensorboard support)
+- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
+- feat: add tunable speaker diarization pipeline (with its own tutorial)
+- chore: drop support for Python 2 (use Python 3.6 or later)
+
+## Version 0.3.1 (2017-07-06)
+
+- feat: add python 3 support
+- chore: rewrite neural speaker embedding using autograd
+- feat: add new embedding architectures
+- feat: add new embedding losses
+- chore: switch to Keras 2
+- doc: add tutorial for (MFCC) feature extraction
+- doc: add tutorial for (LSTM-based) speech activity detection
+- doc: add tutorial for (LSTM-based) speaker change detection
+- doc: add tutorial for (TristouNet) neural speaker embedding
+
+## Version 0.2.1 (2017-03-28)
+
+- feat: add LSTM-based speech activity detection
+- feat: add LSTM-based speaker change detection
+- improve: refactor LSTM-based speaker embedding
+- feat: add librosa basic support
+- feat: add SMORMS3 optimizer
+
+## Version 0.1.4 (2016-09-26)
+
+- feat: add 'covariance_type' option to BIC segmentation
+
+## Version 0.1.3 (2016-09-23)
+
+- chore: rename sequence generator in preparation of the release of
+  TristouNet reproducible research package.
+
+## Version 0.1.2 (2016-09-22)
+
+- first public version

README.md

Lines changed: 29 additions & 18 deletions
@@ -11,23 +11,26 @@
 
 
 ```python
-# instantiate pretrained speaker diarization pipeline
+# 1. visit hf.co/pyannote/speaker-diarization and accept user conditions (only if requested)
+# 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)
+# 3. instantiate pretrained speaker diarization pipeline
 from pyannote.audio import Pipeline
-pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
+pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
+                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
 
-# apply pretrained pipeline
+# 4. apply pretrained pipeline
 diarization = pipeline("audio.wav")
 
-# print the result
+# 5. print the result
 for turn, _, speaker in diarization.itertracks(yield_label=True):
     print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
-# start=0.2s stop=1.5s speaker_A
-# start=1.8s stop=3.9s speaker_B
-# start=4.2s stop=5.7s speaker_A
+# start=0.2s stop=1.5s speaker_0
+# start=1.8s stop=3.9s speaker_1
+# start=4.2s stop=5.7s speaker_0
 # ...
 ```
 
-## What's new in `pyannote.audio` 2.0
+## What's new in `pyannote.audio` 2.x?
 
 For version 2.x of `pyannote.audio`, [I](https://herve.niderb.fr) decided to rewrite almost everything from scratch.
 Highlights of this release are:
@@ -51,11 +54,12 @@ conda activate pyannote
 # (see https://pytorch.org/get-started/previous-versions/#v1110)
 conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
 
-pip install pyannote.audio
+pip install -qq https://github.com/pyannote/pyannote-audio/archive/develop.zip
 ```
 
 ## Documentation
 
+- [Changelog](CHANGELOG.md)
 - Models
   - Available tasks explained
   - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
@@ -69,6 +73,9 @@ pip install pyannote.audio
   - [Adding a new task](tutorials/add_your_own_task.ipynb)
   - Adding a new pipeline
   - Sharing pretrained models and pipelines
+- Blog
+  - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
+  - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
 - Miscellaneous
   - [Training with `pyannote-audio-train` command line tool](tutorials/training_with_cli.md)
   - [Annotating your own data with Prodigy](tutorials/prodigy.md)
@@ -94,15 +101,19 @@ pip install pyannote.audio
 
 ## Benchmark
 
-Out of the box, `pyannote.audio` default speaker diarization pipeline is expected to be much better (and faster) in v2.0 than in v1.1.:
-
-| Dataset     | DER% with v1.1 | DER% with v2.0 | Relative improvement |
-| ----------- | -------------- | -------------- | -------------------- |
-| AMI         | 29.7%          | 18.2%          | 38%                  |
-| DIHARD      | 29.2%          | 21.0%          | 28%                  |
-| VoxConverse | 21.5%          | 12.8%          | 40%                  |
-
-A more detailed benchmark is available [here](https://hf.co/pyannote/speaker-diarization).
+Out of the box, `pyannote.audio` default speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization) is expected to be much better (and faster) in v2.x than in v1.1. The numbers below are diarization error rates (in %):
+
+| Dataset \ Version      | v1.1 | v2.0 | v2.1 (finetuned) |
+| ---------------------- | ---- | ---- | ---------------- |
+| AISHELL-4              | -    | 14.6 | 14.1 (14.5)      |
+| AliMeeting (channel 1) | -    | -    | 27.4 (23.8)      |
+| AMI (IHM)              | 29.7 | 18.2 | 18.9 (18.5)      |
+| AMI (SDM)              | -    | 29.0 | 27.1 (22.2)      |
+| CALLHOME (part2)       | -    | 30.2 | 32.4 (29.3)      |
+| DIHARD 3 (full)        | 29.2 | 21.0 | 26.9 (21.9)      |
+| VoxConverse (v0.3)     | 21.5 | 12.6 | 11.2 (10.7)      |
+| REPERE (phase2)        | -    | 12.6 | 8.2 ( 8.3)       |
+| This American Life     | -    | -    | 20.8 (15.2)      |
 
 ## Citations
 
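The removed README table carried a "Relative improvement" column that the new table drops. The same figure can be recomputed for any row as (old - new) / old. A minimal sketch, using three (v1.1, v2.1) pairs copied from the new table; the helper name `relative_improvement` is an illustration and not part of `pyannote.audio`:

```python
def relative_improvement(old_der: float, new_der: float) -> float:
    """Relative DER reduction, in percent, going from old_der to new_der."""
    return 100.0 * (old_der - new_der) / old_der


# (v1.1, v2.1) diarization error rates copied from the benchmark table
rows = {
    "AMI (IHM)": (29.7, 18.9),
    "DIHARD 3 (full)": (29.2, 26.9),
    "VoxConverse (v0.3)": (21.5, 11.2),
}

for dataset, (v1_1, v2_1) in rows.items():
    print(f"{dataset}: {relative_improvement(v1_1, v2_1):.1f}% relative improvement")
```

Only rows that have a v1.1 entry can be compared this way; most datasets in the new table have none.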

doc/source/changelog.rst

Lines changed: 14 additions & 1 deletion
@@ -2,9 +2,22 @@
 Changelog
 #########
 
-Version 2.0.1 (2022-07-20)
+Version 2.1 (2022-11-xx)
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
+- BREAKING(pipeline): rewrite speaker diarization pipeline
+- feat(pipeline): add option to optimize for DER variant
+- feat(clustering): add support for NeMo speaker embedding
+- feat(clustering): add FINCH clustering
+- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
+- feat(hub): add support for private/gated models
+- setup(hub): switch to latest huggingface_hub API
+- fix(pipeline): fix support for missing reference in Resegmentation pipeline
+- fix(clustering): fix corner case where HMM.fit finds too few states
+
+Version 2.0.1 (2022-07-20)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 - BREAKING: complete rewrite
 - feat: much better performance
 - feat: Python-first API
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# @package _group_
+_target_: adan_pytorch.Adan
+lr: 1e-3
+betas: [0.1, 0.1, 0.001]
+weight_decay: 0.0

pyannote/audio/core/model.py

Lines changed: 51 additions & 16 deletions
@@ -33,7 +33,8 @@
 import torch
 import torch.nn as nn
 import torch.optim
-from huggingface_hub import cached_download, hf_hub_url
+from huggingface_hub import hf_hub_download
+from huggingface_hub.utils import RepositoryNotFoundError
 from pyannote.core import SlidingWindow
 from pytorch_lightning.utilities.cloud_io import load as pl_load
 from pytorch_lightning.utilities.model_summary import ModelSummary
@@ -415,6 +416,10 @@ def on_save_checkpoint(self, checkpoint):
 
     @staticmethod
     def check_version(library: Text, theirs: Text, mine: Text):
+
+        theirs = ".".join(theirs.split(".")[:3])
+        mine = ".".join(mine.split(".")[:3])
+
         theirs = VersionInfo.parse(theirs)
         mine = VersionInfo.parse(mine)
         if theirs.major != mine.major:
@@ -777,32 +782,62 @@ def from_pretrained(
                 model_id = checkpoint
                 revision = None
 
-            url = hf_hub_url(
-                model_id, filename=HF_PYTORCH_WEIGHTS_NAME, revision=revision
-            )
-            path_for_pl = cached_download(
-                url=url,
-                library_name="pyannote",
-                library_version=__version__,
-                cache_dir=cache_dir,
-                use_auth_token=use_auth_token,
-            )
+            try:
+                path_for_pl = hf_hub_download(
+                    model_id,
+                    HF_PYTORCH_WEIGHTS_NAME,
+                    repo_type="model",
+                    revision=revision,
+                    library_name="pyannote",
+                    library_version=__version__,
+                    cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
+                    use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
+                )
+            except RepositoryNotFoundError:
+                print(
+                    f"""
+Could not download '{model_id}' model.
+It might be because the model is private or gated so make
+sure to authenticate. Visit https://hf.co/settings/tokens to
+create your access token and retry with:
+
+   >>> Model.from_pretrained('{model_id}',
+   ...                       use_auth_token=YOUR_AUTH_TOKEN)
+
+If this still does not work, it might be because the model is gated:
+visit https://hf.co/{model_id} to accept the user conditions."""
+                )
+                return None
 
             # HACK Huggingface download counters rely on config.yaml
             # HACK Therefore we download config.yaml even though we
             # HACK do not use it. Fails silently in case model does not
             # HACK have a config.yaml file.
             try:
-                config_url = hf_hub_url(
-                    model_id, filename=HF_LIGHTNING_CONFIG_NAME, revision=revision
-                )
-                _ = cached_download(
-                    url=config_url,
+
+                _ = hf_hub_download(
+                    model_id,
+                    HF_LIGHTNING_CONFIG_NAME,
+                    repo_type="model",
+                    revision=revision,
                     library_name="pyannote",
                     library_version=__version__,
                     cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
                     use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
                 )
+
             except Exception:
                 pass
 
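The `check_version` hunk keeps only the first three dot-separated components of each version string before passing it to `VersionInfo.parse`. A minimal sketch of that truncation in plain Python, presumably meant for version strings that carry extra components; the helper name `truncate_version` and the sample strings are hypothetical and do not come from the repository:

```python
def truncate_version(version: str) -> str:
    """Keep only the first three dot-separated components of a version string."""
    return ".".join(version.split(".")[:3])


# Hypothetical version strings, for illustration only
for raw in ("1.12.1.post200", "0.13.0.dev20220915", "2.1"):
    print(f"{raw!r} -> {truncate_version(raw)!r}")
```

Strings with three or fewer components pass through unchanged, so the truncation only affects versions with a fourth (or later) dot-separated part.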

pyannote/audio/core/pipeline.py

Lines changed: 42 additions & 14 deletions
@@ -28,14 +28,15 @@
 from typing import Callable, List, Optional, Text, Union
 
 import yaml
-from huggingface_hub import cached_download, hf_hub_url
+from huggingface_hub import hf_hub_download
+from huggingface_hub.utils import RepositoryNotFoundError
+from pyannote.core.utils.helper import get_class_by_name
+from pyannote.database import FileFinder, ProtocolFile
+from pyannote.pipeline import Pipeline as _Pipeline
 
 from pyannote.audio import Audio, __version__
 from pyannote.audio.core.io import AudioFile
 from pyannote.audio.core.model import CACHE_DIR
-from pyannote.core.utils.helper import get_class_by_name
-from pyannote.database import FileFinder, ProtocolFile
-from pyannote.pipeline import Pipeline as _Pipeline
 
 PIPELINE_PARAMS_NAME = "config.yaml"
 
@@ -77,15 +78,40 @@ def from_pretrained(
             else:
                 model_id = checkpoint_path
                 revision = None
-            url = hf_hub_url(model_id, filename=PIPELINE_PARAMS_NAME, revision=revision)
-
-            config_yml = cached_download(
-                url=url,
-                library_name="pyannote",
-                library_version=__version__,
-                cache_dir=cache_dir,
-                use_auth_token=use_auth_token,
-            )
+
+            try:
+                config_yml = hf_hub_download(
+                    model_id,
+                    PIPELINE_PARAMS_NAME,
+                    repo_type="model",
+                    revision=revision,
+                    library_name="pyannote",
+                    library_version=__version__,
+                    cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
+                    use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
+                )
+
+            except RepositoryNotFoundError:
+                print(
+                    f"""
+Could not download '{model_id}' pipeline.
+It might be because the pipeline is private or gated so make
+sure to authenticate. Visit https://hf.co/settings/tokens to
+create your access token and retry with:
+
+   >>> Pipeline.from_pretrained('{model_id}',
+   ...                          use_auth_token=YOUR_AUTH_TOKEN)
+
+If this still does not work, it might be because the pipeline is gated:
+visit https://hf.co/{model_id} to accept the user conditions."""
+                )
+                return None
 
         with open(config_yml, "r") as fp:
             config = yaml.load(fp, Loader=yaml.SafeLoader)
@@ -95,7 +121,9 @@ def from_pretrained(
         Klass = get_class_by_name(
             pipeline_name, default_module_name="pyannote.pipeline.blocks"
         )
-        pipeline = Klass(**config["pipeline"].get("params", {}))
+        params = config["pipeline"].get("params", {})
+        params.setdefault("use_auth_token", use_auth_token)
+        pipeline = Klass(**params)
 
         # freeze parameters
         if "freeze" in config:
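The last `pipeline.py` hunk relies on `dict.setdefault`, which writes a key only when it is absent. A `use_auth_token` already present in the pipeline's `params` therefore takes precedence over the one passed to `from_pretrained`. A minimal sketch of that behavior, with a hypothetical stand-in class; `DummyPipeline`, `build`, and the config values are illustrative and not taken from the repository:

```python
class DummyPipeline:
    """Hypothetical stand-in for the class resolved by get_class_by_name."""

    def __init__(self, segmentation=None, use_auth_token=None):
        self.segmentation = segmentation
        self.use_auth_token = use_auth_token


def build(config: dict, use_auth_token=None) -> DummyPipeline:
    # Mirror the hunk: fall back to the caller's token only if params has none
    params = config["pipeline"].get("params", {})
    params.setdefault("use_auth_token", use_auth_token)
    return DummyPipeline(**params)


# Token missing from the config: the caller's token is forwarded
p1 = build(
    {"pipeline": {"params": {"segmentation": "some/model"}}},
    use_auth_token="CALLER_TOKEN",
)
print(p1.use_auth_token)

# Token already present in the config: it is kept as-is
p2 = build(
    {"pipeline": {"params": {"use_auth_token": "CONFIG_TOKEN"}}},
    use_auth_token="CALLER_TOKEN",
)
print(p2.use_auth_token)
```

One consequence worth noting: because the key is always added to `params`, the instantiated pipeline class has to accept a `use_auth_token` keyword argument.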

pyannote/audio/interactive/pipeline/recipe.py

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ def pipeline(
     beep: bool = False,
 ) -> Dict[str, Any]:
 
-    pipeline = Pipeline.from_pretrained(pipeline)
+    pipeline = Pipeline.from_pretrained(pipeline, use_auth_token=True)
     classes = pipeline.classes()
 
     if isinstance(classes, Iterator):
