# Changelog

## Version 3.0.0 (2023-09-26)

### Features and improvements

 - feat(pipeline): send pipeline to device with `pipeline.to(device)` (see the sketch after this list)
 - feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
 - feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
 - feat(pipeline): add progress hook to pipelines
 - feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
 - feat(task): add support for multi-task models
 - feat(task): add support for label scope in speaker diarization task
 - feat(task): add support for missing classes in multi-label segmentation task
 - feat(model): add segmentation model based on torchaudio self-supervised representation
 - feat(pipeline): check version compatibility at load time
 - improve(task): load metadata as tensors rather than pyannote.core instances
 - improve(task): improve error message on missing specifications
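
A minimal, hedged sketch of how these pipeline-level additions fit together is shown below; the `pyannote/speaker-diarization-3.0` checkpoint name and the `audio.wav` path are placeholders for illustration, not part of this release:

```python
# Sketch only: illustrates the 3.0.0 pipeline options listed above
# (device placement, mutable batch sizes, return_embeddings).
import torch
from pyannote.audio import Pipeline

# checkpoint name is a placeholder; any SpeakerDiarization pipeline should do
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")

# pipelines now default to CPU; move them explicitly with .to(device)
pipeline.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

# segmentation/embedding batch sizes are now mutable (they default to 1)
pipeline.segmentation_batch_size = 32
pipeline.embedding_batch_size = 32

# optionally return one embedding per speaker along with the diarization
diarization, embeddings = pipeline("audio.wav", return_embeddings=True)
```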

### Breaking changes

 - BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
 - BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
 - BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
 - BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
 - BREAKING(task): remove support for variable chunk duration for segmentation tasks
 - BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
 - BREAKING(setup): drop support for Python 3.7
 - BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
 - BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
   You should update how `pyannote.audio.core.io.Audio` is instantiated (see the sketch after this list):
   * replace `Audio()` with `Audio(mono="downmix")`;
   * replace `Audio(mono=True)` with `Audio(mono="downmix")`;
   * replace `Audio(mono=False)` with `Audio()`.
 - BREAKING(model): get rid of (flaky) `Model.introspection`.
   If, for some weird reason, you wrote some custom code based on that,
   you should instead rely on `Model.example_output`.
 - BREAKING(interactive): remove support for Prodigy recipes
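
For the `Audio` entry above, here is a minimal before/after sketch of the new default; the `audio.wav` path is a placeholder and only the mapping listed in that entry is assumed:

```python
# Sketch only: updating pyannote.audio.core.io.Audio for 3.0.0 semantics.
from pyannote.audio.core.io import Audio

# the pre-3.0.0 default (downmix to mono) must now be requested explicitly
audio_downmix = Audio(mono="downmix")
waveform, sample_rate = audio_downmix("audio.wav")       # shape: (1, num_samples)

# a plain Audio() now keeps every channel (channel indices are 0-based)
audio_multichannel = Audio()
waveform, sample_rate = audio_multichannel("audio.wav")  # shape: (num_channels, num_samples)
```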

### Fixes and improvements

 - fix(pipeline): fix reproducibility issue with Ampere CUDA devices
 - fix(pipeline): fix support for IOBase audio
 - fix(pipeline): fix corner case with no speaker
 - fix(train): prevent metadata preparation from happening twice
 - fix(task): fix support for "balance" option
 - improve(task): shorten and improve structure of Tensorboard tags

### Dependencies update

 - setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
 - setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
 - setup: switch to speechbrain 0.5.14+

## Version 2.1.1 (2022-10-27)

 - BREAKING(pipeline): rewrite speaker diarization pipeline
 - feat(pipeline): add option to optimize for DER variant
 - feat(clustering): add support for NeMo speaker embedding
 - feat(clustering): add FINCH clustering
 - feat(clustering): add `min_cluster_size` hyperparameter to AgglomerativeClustering
 - feat(hub): add support for private/gated models
 - setup(hub): switch to latest huggingface_hub API
 - fix(pipeline): fix support for missing reference in Resegmentation pipeline
 - fix(clustering): fix corner case where `HMM.fit` finds too few states

## Version 2.0.1 (2022-07-20)

 - BREAKING: complete rewrite
 - feat: much better performance
 - feat: Python-first API
 - feat: pretrained pipelines (and models) on Huggingface model hub
 - feat: multi-GPU training with pytorch-lightning
 - feat: data augmentation with torch-audiomentations
 - feat: Prodigy recipe for model-assisted audio annotation

## Version 1.1.2 (2021-01-28)

 - fix: make sure master branch is used to load pretrained models (#599)

## Version 1.1 (2020-11-08)

 - last release before the complete rewrite

## Version 1.0.1 (2018-07-19)

 - fix: fix regression in `Precomputed.__call__` (#110, #105)

## Version 1.0 (2018-07-03)

 - chore: switch from keras to pytorch (with tensorboard support)
 - improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
 - feat: add tunable speaker diarization pipeline (with its own tutorial)
 - chore: drop support for Python 2 (use Python 3.6 or later)

## Version 0.3.1 (2017-07-06)

 - feat: add Python 3 support
 - chore: rewrite neural speaker embedding using autograd
 - feat: add new embedding architectures
 - feat: add new embedding losses
 - chore: switch to Keras 2
 - doc: add tutorial for (MFCC) feature extraction
 - doc: add tutorial for (LSTM-based) speech activity detection
 - doc: add tutorial for (LSTM-based) speaker change detection
 - doc: add tutorial for (TristouNet) neural speaker embedding

## Version 0.2.1 (2017-03-28)

 - feat: add LSTM-based speech activity detection
 - feat: add LSTM-based speaker change detection
 - improve: refactor LSTM-based speaker embedding
 - feat: add basic librosa support
 - feat: add SMORMS3 optimizer

## Version 0.1.4 (2016-09-26)

 - feat: add `covariance_type` option to BIC segmentation

## Version 0.1.3 (2016-09-23)

 - chore: rename sequence generator in preparation for the release of the
   TristouNet reproducible research package

## Version 0.1.2 (2016-09-22)

 - first public version