Using `pyannote.audio` open-source toolkit in production?
Make the most of it thanks to our [consulting services](https://herve.niderb.fr/consulting.html).

# `pyannote.audio` speaker diarization toolkit

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art [pretrained models and pipelines](https://hf.co/pyannote) that can be further finetuned to your own data for even better performance.

## TL;DR

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.0` with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.0`](https://hf.co/pyannote/speaker-diarization-3.0) user conditions
4. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# apply pretrained pipeline on an audio file
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```
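Diarization results are conventionally exchanged in RTTM format, where each `SPEAKER` line stores a file id, channel, start time, duration, and speaker label. As a minimal, self-contained sketch (a hypothetical helper, not part of `pyannote.audio`), parsing RTTM lines into `(start, end, speaker)` segments might look like:

```python
def parse_rttm(lines):
    """Parse RTTM 'SPEAKER' lines into (start, end, speaker) tuples.

    Hypothetical helper for illustration; RTTM fields are:
    SPEAKER <file> <chan> <tbeg> <tdur> <ortho> <stype> <name> <conf> <slat>
    """
    segments = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            start = float(fields[3])       # <tbeg>: segment start, in seconds
            duration = float(fields[4])    # <tdur>: segment duration, in seconds
            segments.append((start, start + duration, fields[7]))
    return segments

rttm = [
    "SPEAKER audio 1 0.20 1.30 <NA> <NA> speaker_A <NA> <NA>",
    "SPEAKER audio 1 1.80 0.70 <NA> <NA> speaker_B <NA> <NA>",
]
print(parse_rttm(rttm))
```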

- 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
- 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
- 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)

- [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
- [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
- [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min

## Benchmark

Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.0) v3.0 is expected to be much better (and faster) than v2.x.
Those numbers are diarization error rates (in %):

| Dataset \ Version | v1.1 | v2.0 | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.0](https://hf.co/pyannote/speaker-diarization-3.0) | <a href="mailto:herve-at-niderb-dot-fr?subject=Premium pyannote.audio pipeline&body=Looks like I got your attention! Drop me an email for more details. Hervé.">Premium</a> |
| ----------------- | ---- | ---- | ------ | ------ | ------- |
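Diarization error rate sums three error components over the total amount of speech: false alarm (speech detected where there is none), missed detection (speech not detected), and speaker confusion (speech attributed to the wrong speaker). A minimal illustration of the metric (the reference implementation lives in `pyannote.metrics`; the durations below are made-up numbers):

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """DER = (false alarm + missed detection + confusion) / total speech, in %."""
    return 100.0 * (false_alarm + missed_detection + confusion) / total_speech

# 12 s false alarm + 30 s missed + 18 s confusion over 600 s of speech
# gives 60 s of error over 600 s, i.e. a DER of 10 %
print(diarization_error_rate(12.0, 30.0, 18.0, 600.0))
```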

If you use `pyannote.audio`, please use the following citations:

```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

```bibtex
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```
## Development

The commands below will set up the pre-commit hooks and packages needed for developing the `pyannote.audio` library.