Commit 795b92a

Merge branch 'release/3.0.0'
2 parents 7ead17e + 9a5a902 commit 795b92a

File tree

110 files changed

+8413
-14658
lines changed

.faq/FAQ.md

Lines changed: 20 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,20 @@
1+
2+
# Frequently Asked Questions
3+
4+
{%- for question in questions %}
5+
- [{{ question.title }}](#{{ question.slug }})
6+
{%- endfor %}
7+
8+
9+
{%- for question in questions %}
10+
11+
<a name="{{ question.slug }}"></a>
12+
## {{ question.title }}
13+
14+
{{ question.body }}
15+
16+
{%- endfor %}
17+
18+
<hr>
19+
20+
Generated by [FAQtory](https://github.com/willmcgugan/faqtory)

.faq/suggest.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
Thank you for your issue.
2+
3+
{%- if questions -%}
4+
{% if questions|length == 1 %}
5+
We found the following entry in the [FAQ]({{ faq_url }}) which you may find helpful:
6+
{%- else %}
7+
We found the following entries in the [FAQ]({{ faq_url }}) which you may find helpful:
8+
{%- endif %}
9+
10+
{% for question in questions %}
11+
- [{{ question.title }}]({{ faq_url }}#{{ question.slug }})
12+
{%- endfor %}
13+
14+
{%- else -%}
15+
You might want to check the [FAQ]({{ faq_url }}) if you haven't done so already.
16+
{%- endif %}
17+
18+
Feel free to close this issue if you found an answer in the FAQ.
19+
20+
If your issue is a feature request, please read [this](https://xyproblem.info/) first and update your request accordingly, if needed.
21+
22+
If your issue is a bug report, please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) as a link to a self-contained [Google Colab](https://colab.research.google.com/) notebook containing everything needed to reproduce the bug:
23+
- installation
24+
- data preparation
25+
- model download
26+
- etc.
27+
28+
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).
29+
30+
Companies relying on `pyannote.audio` in production may contact [me](https://herve.niderb.fr) via email regarding:
31+
* paid scientific consulting around speaker diarization and speech processing in general;
32+
* custom models and tailored features (via the local tech transfer office).
33+
34+
> This is an automated reply, generated by [FAQtory](https://github.com/willmcgugan/faqtory)
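
The branching in this template can be sketched in plain Python. This is an illustrative rewrite, not FAQtory's implementation: `suggestion_intro` and the dictionary keys are names assumed here, mirroring the `question.title` and `question.slug` fields used above.

```python
def suggestion_intro(questions, faq_url):
    """Build the opening of the automated reply (hypothetical helper)."""
    # No matching FAQ entry: fall back to a generic pointer to the FAQ.
    if not questions:
        return (
            f"You might want to check the [FAQ]({faq_url}) "
            "if you haven't done so already."
        )
    # One match uses the singular wording, several matches the plural one.
    noun = "entry" if len(questions) == 1 else "entries"
    lines = [
        f"We found the following {noun} in the [FAQ]({faq_url}) "
        "which you may find helpful:",
        "",
    ]
    # Each suggestion links to the anchor of the matching FAQ section.
    lines += [f"- [{q['title']}]({faq_url}#{q['slug']})" for q in questions]
    return "\n".join(lines)


example = [{"title": "How can I improve performance?",
            "slug": "how-can-i-improve-performance"}]
print(suggestion_intro(example, "FAQ.md"))
```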

.github/stale.yml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
# Number of days of inactivity before an issue becomes stale
2-
daysUntilStale: 60
2+
daysUntilStale: 180
33
# Number of days of inactivity before a stale issue is closed
4-
daysUntilClose: 7
4+
daysUntilClose: 30
55
# Issues with these labels will never be considered stale
66
exemptLabels:
77
- pinned
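
As a rough illustration of the new thresholds, the sketch below computes when an inactive issue would be marked stale and then closed. It assumes the usual reading of this configuration (the close countdown starts once the issue is stale and stays inactive); `stale_timeline` is a hypothetical helper, not part of any bot.

```python
from datetime import date, timedelta

DAYS_UNTIL_STALE = 180  # previously 60
DAYS_UNTIL_CLOSE = 30   # previously 7


def stale_timeline(last_activity):
    """Return (stale_on, closed_on) for an issue with no further activity."""
    stale_on = last_activity + timedelta(days=DAYS_UNTIL_STALE)
    closed_on = stale_on + timedelta(days=DAYS_UNTIL_CLOSE)
    return stale_on, closed_on


last_activity = date(2023, 9, 26)
stale_on, closed_on = stale_timeline(last_activity)
# Total inactivity before closure: 180 + 30 = 210 days (was 60 + 7 = 67).
print((closed_on - last_activity).days)
```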

.github/workflows/new_issue.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
name: issues
2+
on:
3+
issues:
4+
types: [opened]
5+
jobs:
6+
add-comment:
7+
runs-on: ubuntu-latest
8+
permissions:
9+
issues: write
10+
steps:
11+
- uses: actions/checkout@v3
12+
with:
13+
ref: develop
14+
- name: Install FAQtory
15+
run: pip install FAQtory
16+
- name: Run Suggest
17+
env:
18+
TITLE: ${{ github.event.issue.title }}
19+
run: faqtory suggest "$TITLE" > suggest.md
20+
- name: Read suggest.md
21+
id: suggest
22+
uses: juliangruber/read-file-action@v1
23+
with:
24+
path: ./suggest.md
25+
- name: Suggest FAQ
26+
uses: peter-evans/create-or-update-comment@a35cf36e5301d70b76f316e867e7788a55a31dae
27+
with:
28+
issue-number: ${{ github.event.issue.number }}
29+
body: ${{ steps.suggest.outputs.content }}
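
Note that the workflow hands the issue title to the shell through the `TITLE` environment variable rather than interpolating `${{ github.event.issue.title }}` into the `run` command, so shell metacharacters in a title are treated as data. A minimal sketch of that behavior, assuming a POSIX `sh` is available and using `printf` as a stand-in for `faqtory suggest`:

```python
import os
import subprocess

# A title that would break out of the command if it were pasted into it.
title = 'Crash on load"; echo injected; "'

# The command text is fixed; the title only arrives via the environment,
# and "$TITLE" expands to a single argument without being re-parsed.
result = subprocess.run(
    ["sh", "-c", 'printf "%s" "$TITLE"'],
    env={**os.environ, "TITLE": title},
    capture_output=True,
    text=True,
    check=True,
)

# The program receives the title verbatim; nothing extra was executed.
print(result.stdout == title)
```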

.github/workflows/test.yml

Lines changed: 18 additions & 25 deletions
@@ -2,9 +2,9 @@ name: Tests
22

33
on:
44
push:
5-
branches: [ develop ]
5+
branches: [develop]
66
pull_request:
7-
branches: [ develop ]
7+
branches: [develop]
88

99
jobs:
1010
build:
@@ -13,28 +13,21 @@ jobs:
1313
strategy:
1414
matrix:
1515
os: [ubuntu-latest]
16-
python-version: [3.7, 3.8, 3.9]
16+
python-version: [3.8, 3.9, "3.10"]
1717
steps:
18-
- uses: actions/checkout@v2
19-
- name: Set up Python ${{ matrix.python-version }}
20-
uses: actions/setup-python@v2
21-
with:
22-
python-version: ${{ matrix.python-version }}
23-
- name: Install libsndfile
24-
if: matrix.os == 'ubuntu-latest'
25-
run: |
26-
sudo apt-get install libsndfile1
27-
- name: Install pyannote.audio
28-
run: |
18+
- uses: actions/checkout@v2
19+
- name: Set up Python ${{ matrix.python-version }}
20+
uses: actions/setup-python@v2
21+
with:
22+
python-version: ${{ matrix.python-version }}
23+
- name: Install libsndfile
24+
if: matrix.os == 'ubuntu-latest'
25+
run: |
26+
sudo apt-get update
27+
sudo apt-get install libsndfile1
28+
- name: Install pyannote.audio
29+
run: |
2930
pip install -e .[dev,testing]
30-
- name: Test with pytest
31-
run: |
32-
export PYANNOTE_DATABASE_CONFIG=$GITHUB_WORKSPACE/tests/data/database.yml
33-
pytest --cov-report=xml
34-
- name: Upload coverage to Codecov
35-
uses: codecov/codecov-action@v1
36-
with:
37-
file: ./coverage.xml
38-
env_vars: PYTHON
39-
name: codecov-pyannote-audio
40-
fail_ci_if_error: false
31+
- name: Test with pytest
32+
run: |
33+
pytest
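
The quotes around `"3.10"` in the matrix matter: an unquoted `3.10` is read as a number and collapses to `3.1`. The sketch below reproduces the effect with plain `float` parsing as a stand-in for a YAML loader (an assumption made for illustration; no YAML library is involved, and `as_version_label` is a hypothetical helper).

```python
def as_version_label(scalar, quoted):
    """Return the version label a CI matrix would end up with."""
    # Quoted scalars stay strings and keep their trailing zero.
    if quoted:
        return scalar
    # Unquoted numeric-looking scalars become floats, which drops it.
    return str(float(scalar))


unquoted = as_version_label("3.10", quoted=False)
quoted = as_version_label("3.10", quoted=True)
print(unquoted)  # 3.1
print(quoted)    # 3.10
```

Versions such as `3.8` and `3.9` survive the round trip unchanged, which is why only `"3.10"` needs the quotes.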

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ repos:
2020
args: ["--profile", "black"]
2121

2222
# Formatting, Whitespace, etc
23-
- repo: git://github.com/pre-commit/pre-commit-hooks
23+
- repo: https://github.com/pre-commit/pre-commit-hooks
2424
rev: v2.2.3
2525
hooks:
2626
- id: trailing-whitespace

CHANGELOG.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
# Changelog
2+
3+
## Version 3.0.0 (2023-09-26)
4+
5+
### Features and improvements
6+
7+
- feat(pipeline): send pipeline to device with `pipeline.to(device)`
8+
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
9+
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
10+
- feat(pipeline): add progress hook to pipelines
11+
- feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
12+
- feat(task): add support for multi-task models
13+
- feat(task): add support for label scope in speaker diarization task
14+
- feat(task): add support for missing classes in multi-label segmentation task
15+
- feat(model): add segmentation model based on torchaudio self-supervised representation
16+
- feat(pipeline): check version compatibility at load time
17+
- improve(task): load metadata as tensors rather than pyannote.core instances
18+
- improve(task): improve error message on missing specifications
19+
20+
### Breaking changes
21+
22+
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
23+
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
24+
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
25+
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
26+
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
27+
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
28+
- BREAKING(setup): drop support for Python 3.7
29+
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
30+
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
31+
You should update how `pyannote.audio.core.io.Audio` is instantiated:
32+
* replace `Audio()` by `Audio(mono="downmix")`;
33+
* replace `Audio(mono=True)` by `Audio(mono="downmix")`;
34+
* replace `Audio(mono=False)` by `Audio()`.
35+
- BREAKING(model): get rid of (flaky) `Model.introspection`
36+
If, for some weird reason, you wrote some custom code based on that,
37+
you should instead rely on `Model.example_output`.
38+
- BREAKING(interactive): remove support for Prodigy recipes
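
The two `io` breaking changes listed above (0-indexed channels and no default downmix) can be summarized as a small migration sketch. `migrate_mono` and `migrate_channel` are hypothetical helpers written for this illustration and are not part of `pyannote.audio`; they only encode the replacement rules stated in the changelog.

```python
def migrate_mono(old_mono=True):
    """Map the 2.x `mono` argument of `Audio` to 3.0 keyword arguments."""
    # 2.x: Audio() and Audio(mono=True) both downmixed to mono.
    # 3.0: downmixing has to be requested explicitly with mono="downmix".
    if old_mono:
        return {"mono": "downmix"}
    # 2.x Audio(mono=False) kept every channel, which is now the default.
    return {}


def migrate_channel(old_channel):
    """Convert a 1-indexed channel number (2.x) to a 0-indexed one (3.0)."""
    if old_channel < 1:
        raise ValueError("2.x channel numbers start at 1")
    return old_channel - 1


print(migrate_mono())       # {'mono': 'downmix'}
print(migrate_mono(False))  # {}
print(migrate_channel(1))   # 0
```

With these helpers, a 2.x call such as `Audio(mono=True)` would become `Audio(**migrate_mono(True))`, that is, `Audio(mono="downmix")`.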
39+
40+
41+
### Fixes and improvements
42+
43+
- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
44+
- fix(pipeline): fix support for IOBase audio
45+
- fix(pipeline): fix corner case with no speaker
46+
- fix(train): prevent metadata preparation from happening twice
47+
- fix(task): fix support for "balance" option
48+
- improve(task): shorten and improve structure of Tensorboard tags
49+
50+
### Dependencies update
51+
52+
- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
53+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
54+
- setup: switch to speechbrain 0.5.14+
55+
56+
## Version 2.1.1 (2022-10-27)
57+
58+
- BREAKING(pipeline): rewrite speaker diarization pipeline
59+
- feat(pipeline): add option to optimize for DER variant
60+
- feat(clustering): add support for NeMo speaker embedding
61+
- feat(clustering): add FINCH clustering
62+
- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
63+
- feat(hub): add support for private/gated models
64+
- setup(hub): switch to latest huggingface_hub API
65+
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
66+
- fix(clustering): fix corner case where HMM.fit finds too few states
67+
68+
## Version 2.0.1 (2022-07-20)
69+
70+
- BREAKING: complete rewrite
71+
- feat: much better performance
72+
- feat: Python-first API
73+
- feat: pretrained pipelines (and models) on Huggingface model hub
74+
- feat: multi-GPU training with pytorch-lightning
75+
- feat: data augmentation with torch-audiomentations
76+
- feat: Prodigy recipe for model-assisted audio annotation
77+
78+
## Version 1.1.2 (2021-01-28)
79+
80+
- fix: make sure master branch is used to load pretrained models (#599)
81+
82+
## Version 1.1 (2020-11-08)
83+
84+
- last release before complete rewriting
85+
86+
## Version 1.0.1 (2018-07-19)
87+
88+
- fix: fix regression in Precomputed.__call__ (#110, #105)
89+
90+
## Version 1.0 (2018-07-03)
91+
92+
- chore: switch from keras to pytorch (with tensorboard support)
93+
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
94+
- feat: add tunable speaker diarization pipeline (with its own tutorial)
95+
- chore: drop support for Python 2 (use Python 3.6 or later)
96+
97+
## Version 0.3.1 (2017-07-06)
98+
99+
- feat: add python 3 support
100+
- chore: rewrite neural speaker embedding using autograd
101+
- feat: add new embedding architectures
102+
- feat: add new embedding losses
103+
- chore: switch to Keras 2
104+
- doc: add tutorial for (MFCC) feature extraction
105+
- doc: add tutorial for (LSTM-based) speech activity detection
106+
- doc: add tutorial for (LSTM-based) speaker change detection
107+
- doc: add tutorial for (TristouNet) neural speaker embedding
108+
109+
## Version 0.2.1 (2017-03-28)
110+
111+
- feat: add LSTM-based speech activity detection
112+
- feat: add LSTM-based speaker change detection
113+
- improve: refactor LSTM-based speaker embedding
114+
- feat: add librosa basic support
115+
- feat: add SMORMS3 optimizer
116+
117+
## Version 0.1.4 (2016-09-26)
118+
119+
- feat: add 'covariance_type' option to BIC segmentation
120+
121+
## Version 0.1.3 (2016-09-23)
122+
123+
- chore: rename sequence generator in preparation of the release of
124+
TristouNet reproducible research package.
125+
126+
## Version 0.1.2 (2016-09-22)
127+
128+
- first public version

FAQ.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
2+
# Frequently Asked Questions
3+
- [Can I apply pretrained pipelines on audio already loaded in memory?](#can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory)
4+
- [Can I use gated models (and pipelines) offline?](#can-i-use-gated-models-(and-pipelines)-offline)
5+
- [Does pyannote support streaming speaker diarization?](#does-pyannote-support-streaming-speaker-diarization)
6+
- [How can I improve performance?](#how-can-i-improve-performance)
7+
- [How does one spell and pronounce pyannote.audio?](#how-does-one-spell-and-pronounce-pyannoteaudio)
8+
9+
<a name="can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory"></a>
10+
## Can I apply pretrained pipelines on audio already loaded in memory?
11+
12+
Yes: read [this tutorial](tutorials/applying_a_pipeline.ipynb) until the end.
13+
14+
<a name="can-i-use-gated-models-(and-pipelines)-offline"></a>
15+
## Can I use gated models (and pipelines) offline?
16+
17+
**Short answer**: yes, see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.
18+
19+
**Long answer**: gating models and pipelines allows [me](https://herve.niderb.fr) to know a bit more about the `pyannote.audio` user base and eventually helps me write grant proposals to make `pyannote.audio` even better. So, please fill in the gating forms as precisely as possible.
20+
21+
For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time-consuming.
22+
23+
That being said, this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production): see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.
24+
25+
<a name="does-pyannote-support-streaming-speaker-diarization"></a>
26+
## Does pyannote support streaming speaker diarization?
27+
28+
**Short answer:** not out of the box, no.
29+
30+
**Long answer:** [I](https://herve.niderb.fr) am looking for sponsors to add this feature. In the meantime, [`diart`](https://github.com/juanmc2005/StreamingSpeakerDiarization) is the closest you can get to a streaming `pyannote.audio`. You might also be interested in [this blog post](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html) about streaming voice activity detection based on `pyannote.audio`.
31+
32+
<a name="how-can-i-improve-performance"></a>
33+
## How can I improve performance?
34+
35+
**Long answer:**
36+
37+
1. Manually annotate dozens of conversations as precisely as possible.
38+
2. Separate them into train (80%), development (10%) and test (10%) subsets.
39+
3. Set up the data for use with [`pyannote.database`](https://github.com/pyannote/pyannote-database#speaker-diarization).
40+
4. Follow [this recipe](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb).
41+
5. Enjoy.
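
Step 2 of the list above can be sketched as follows. This is only one possible way to do the split: the file identifiers, the fixed seed, and the `split_files` helper are assumptions made for the example. Splitting at the conversation level keeps each recording in exactly one subset.

```python
import random


def split_files(uris, seed=0):
    """Shuffle file identifiers and split them 80/10/10 (hypothetical helper)."""
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(uris)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    return {
        "train": shuffled[:n_train],
        "development": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],
    }


uris = [f"conversation_{i:02d}" for i in range(50)]
subsets = split_files(uris)
print({name: len(files) for name, files in subsets.items()})
# {'train': 40, 'development': 5, 'test': 5}
```

The resulting lists can then be written out in whatever form step 3 requires.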
42+
43+
**Also:** [I am available](https://herve.niderb.fr) for contracting to help you with that.
44+
45+
<a name="how-does-one-spell-and-pronounce-pyannoteaudio"></a>
46+
## How does one spell and pronounce pyannote.audio?
47+
48+
📝 Written in lower case: `pyannote.audio` (or `pyannote` if you are lazy). Not `PyAnnote` nor `PyAnnotate` (sic).
49+
📢 Pronounced like the French verb `pianoter`. `pi` as in `pi`ano, not `py` as in `py`thon.
50+
🎹 `pianoter` means to play the piano (hence the logo 🤯).
51+
52+
<hr>
53+
54+
Generated by [FAQtory](https://github.com/willmcgugan/faqtory)
