
Commit 6a972c0

Merge branch 'release/3.1.1'

2 parents f45da71 + c657362 · commit 6a972c0

7 files changed (+1186 / -3865 lines)

CHANGELOG.md

Lines changed: 9 additions & 1 deletion

@@ -1,6 +1,14 @@
 # Changelog
 
-## `develop` branch
+## Version 3.1.1 (2023-12-01)
+
+### TL;DR
+
+Providing `num_speakers` to [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) now [works as expected](https://github.com/pyannote/pyannote-audio/issues/1567).
+
+### Fixes
+
+- fix(pipeline): fix support for setting `num_speakers` in [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) pipeline
 
 ## Version 3.1.0 (2023-11-16)
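
In practice, the fix means a known speaker count can be enforced at inference time. A minimal, hedged sketch of the fixed usage (the audio path, token, and speaker count below are placeholders, not part of the commit):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# constrain the pipeline to exactly two speakers (the behaviour fixed in 3.1.1);
# min_speakers / max_speakers can be passed instead to give a range
diarization = pipeline("audio.wav", num_speakers=2)
```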

CODE_OF_CONDUCT.md

Lines changed: 128 additions & 0 deletions (new file)

# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at

All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of actions.

**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.

README.md

Lines changed: 35 additions & 34 deletions

@@ -1,4 +1,4 @@
-Using `pyannote.audio` open-source toolkit in production?
+Using `pyannote.audio` open-source toolkit in production?
 Make the most of it thanks to our [consulting services](https://herve.niderb.fr/consulting.html).
 
 # `pyannote.audio` speaker diarization toolkit

@@ -9,19 +9,17 @@ Make the most of it thanks to our [consulting services](https://herve.niderb.fr/
 <a href="https://www.youtube.com/watch?v=37R_R82lfwA"><img src="https://img.youtube.com/vi/37R_R82lfwA/0.jpg"></a>
 </p>
 
-
 ## TL;DR
 
-1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.0` with `pip install pyannote.audio`
+1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
 2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
-3. Accept [`pyannote/speaker-diarization-3.0`](https://hf.co/pyannote/speaker-diarization-3.0) user conditions
+3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
 4. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
 
-
 ```python
 from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-3.0",
+    "pyannote/speaker-diarization-3.1",
     use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
 
 # send pipeline to GPU (when available)
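
The hunk above stops at the GPU comment; for orientation, here is a hedged sketch of how the README's TL;DR snippet continues (the audio path is a placeholder and the print format is illustrative, not the README's exact wording):

```python
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# apply pretrained pipeline ("audio.wav" is a placeholder path)
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```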
@@ -47,50 +45,53 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 - :snake: Python-first API
 - :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)
 
-
 ## Documentation
 
 - [Changelog](CHANGELOG.md)
 - [Frequently asked questions](FAQ.md)
 - Models
   - Available tasks explained
   - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
   - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
 - Pipelines
   - Available pipelines explained
   - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
   - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
   - [Training a pipeline](tutorials/voice_activity_detection.ipynb)
 - Contributing
   - [Adding a new model](tutorials/add_your_own_model.ipynb)
   - [Adding a new task](tutorials/add_your_own_task.ipynb)
   - Adding a new pipeline
   - Sharing pretrained models and pipelines
 - Blog
   - 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
   - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
   - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
 - Videos
   - [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
   - [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
   - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min

(The nested items under Documentation, Blog, and Videos changed only in indentation in this hunk; they are shown once, as they read after the commit.)

 ## Benchmark
 
-Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.0) v3.0 is expected to be much better (and faster) than v2.x.
+Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.
 Those numbers are diarization error rates (in %):
 
-| Dataset \ Version      | v1.1 | v2.0 | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.0](https://hf.co/pyannote/speaker-diarization-3.0) | <a href="mailto:herve-at-niderb-dot-fr?subject=Premium pyannote.audio pipeline&body=Looks like I got your attention! Drop me an email for more details. Hervé.">Premium</a> |
-| ---------------------- | ---- | ---- | ---- | ---- | ---- |
-| AISHELL-4              | -    | 14.6 | 14.1 | 12.3 | 12.3 |
-| AliMeeting (channel 1) | -    | -    | 27.4 | 24.3 | 19.4 |
-| AMI (IHM)              | 29.7 | 18.2 | 18.9 | 19.0 | 16.7 |
-| AMI (SDM)              | -    | 29.0 | 27.1 | 22.2 | 20.1 |
-| AVA-AVD                | -    | -    | -    | 49.1 | 42.7 |
-| DIHARD 3 (full)        | 29.2 | 21.0 | 26.9 | 21.7 | 17.0 |
-| MSDWild                | -    | -    | -    | 24.6 | 20.4 |
-| REPERE (phase2)        | -    | 12.6 | 8.2  | 7.8  | 7.8  |
-| VoxConverse (v0.3)     | 21.5 | 12.6 | 11.2 | 11.3 | 9.5  |
+| Benchmark              | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
+| ---------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ---------------------------------------------- |
+| AISHELL-4              | 14.1 | 12.3 | 11.9 |
+| AliMeeting (channel 1) | 27.4 | 24.5 | 22.5 |
+| AMI (IHM)              | 18.9 | 18.8 | 16.6 |
+| AMI (SDM)              | 27.1 | 22.6 | 20.9 |
+| AVA-AVD                | 66.3 | 50.0 | 39.8 |
+| CALLHOME (part 2)      | 31.6 | 28.4 | 22.2 |
+| DIHARD 3 (full)        | 26.9 | 21.4 | 17.2 |
+| Ego4D (dev.)           | 61.5 | 51.2 | 43.8 |
+| MSDWild                | 32.8 | 25.4 | 19.8 |
+| REPERE (phase2)        | 8.2  | 7.8  | 7.6  |
+| VoxConverse (v0.3)     | 11.2 | 11.2 | 9.4  |
+
+[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
 
 ## Citations
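
The benchmark table above reports [diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization), which the linked `pyannote.metrics` package computes. A hedged, self-contained sketch of how DER is obtained for one file (toy reference/hypothesis annotations; the benchmark's exact evaluation setup is not restated here):

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# toy ground truth: who speaks when
reference = Annotation()
reference[Segment(0.0, 10.0)] = "alice"
reference[Segment(10.0, 20.0)] = "bob"

# toy system output (labels need not match: they are mapped optimally)
hypothesis = Annotation()
hypothesis[Segment(0.0, 11.0)] = "spk_1"
hypothesis[Segment(11.0, 20.0)] = "spk_2"

metric = DiarizationErrorRate()
der = metric(reference, hypothesis)  # returns a fraction
print(f"DER = {100 * der:.1f}%")     # multiply by 100 to match the table's %
```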

doc/requirements.txt

Lines changed: 2 additions & 2 deletions

@@ -1,4 +1,4 @@
-ipython==7.16.3
+ipython==8.10.0
 recommonmark
-Sphinx==2.2.2
+Sphinx==3.0.4
 sphinx_rtd_theme==0.4.3

pyannote/audio/pipelines/clustering.py

Lines changed: 8 additions & 1 deletion

@@ -97,7 +97,13 @@ def filter_embeddings(
         speaker_idx : (num_embeddings, ) array
         """
 
-        chunk_idx, speaker_idx = np.where(~np.any(np.isnan(embeddings), axis=2))
+        # whether speaker is active
+        active = np.sum(segmentations.data, axis=1) > 0
+        # whether speaker embedding extraction went fine
+        valid = ~np.any(np.isnan(embeddings), axis=2)
+
+        # indices of embeddings that are both active and valid
+        chunk_idx, speaker_idx = np.where(active * valid)
 
         # sample max_num_embeddings embeddings
         num_embeddings = len(chunk_idx)

@@ -240,6 +246,7 @@ def __call__(
         )
 
         num_embeddings, _ = train_embeddings.shape
+
         num_clusters, min_clusters, max_clusters = self.set_num_clusters(
             num_embeddings,
             num_clusters=num_clusters,
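
For intuition, the new filtering step in `filter_embeddings` keeps a (chunk, speaker) embedding only if that speaker is active somewhere in the chunk and its embedding contains no NaN. A small standalone numpy sketch with toy shapes (not the pipeline's real data structures — `segmentations` is a plain array here rather than a `SlidingWindowFeature`):

```python
import numpy as np

# toy dimensions: 4 chunks, 20 frames, 3 local speakers, 8-dim embeddings
num_chunks, num_frames, num_speakers, dim = 4, 20, 3, 8
rng = np.random.default_rng(0)

# (num_chunks, num_frames, num_speakers) binary speaker activity per chunk
segmentations = rng.integers(0, 2, size=(num_chunks, num_frames, num_speakers))
segmentations[2, :, 0] = 0   # speaker 0 never active in chunk 2

# (num_chunks, num_speakers, dim) embeddings; NaNs mark failed extraction
embeddings = rng.standard_normal((num_chunks, num_speakers, dim))
embeddings[0, 1] = np.nan    # extraction failed for chunk 0, speaker 1

# whether speaker is active in each chunk
active = np.sum(segmentations, axis=1) > 0      # (num_chunks, num_speakers)
# whether speaker embedding extraction went fine
valid = ~np.any(np.isnan(embeddings), axis=2)   # (num_chunks, num_speakers)

# keep only embeddings that are both active and valid
# (the patch writes active * valid, the same logical AND for boolean arrays)
chunk_idx, speaker_idx = np.where(active & valid)
filtered = embeddings[chunk_idx, speaker_idx]   # (num_kept, dim)
print(filtered.shape)
```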
