v6.0
New v6 VAD Released
- Improved quality;
- v5 features and improvements kept;
- 16% fewer errors on noisy real-life data;
- 11% fewer errors on multi-domain validation;
- Various community contributions;
- Added quality comparison with TenVAD;
- Changed the training algorithm, which should ideally result in higher robustness;
- Metrics on a new (manually noised) community-provided dataset to be added soon;
- Known persisting issues: music with instruments that sound like a human voice, very high-pitched voices (artificial, cartoons, small children);
What's Changed
- Improve documentation. by @EarningsCall in #553
- Adamnsandle by @adamnsandle in #573
- fx #576 by @adamnsandle in #579
- Add cpp source based on libtorch by @NathanJHLee in #578
- fx negative ths bug by @adamnsandle in #581
- Add haskell example by @qwbarch in #591
- Add CITATION.cff file for proper citation by @kiwamizamurai in #601
- Fix/cpp vad context by @OJRYK in #605
- Specify time resolution when returning speech coordinates in seconds by @b3by in #627
- Use second coordinates for audio concatenation in collect_chunks and drop_chunks by @b3by in #626
- Surface drop_chunks in init by @davidrs in #656
- Adamnsandle by @adamnsandle in #669
- fx by @adamnsandle in #670
- Adamnsandle by @adamnsandle in #671
- fx by @adamnsandle in #672
- Adamnsandle by @adamnsandle in #673
- Adamnsandle by @adamnsandle in #674
- Adamnsandle by @adamnsandle in #675
- Adamnsandle by @adamnsandle in #676
- Adding additional params to get_speech_timestamps by @shashank14k in #664
- get rid of hop_size_ratio by @adamnsandle in #677
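
Several of the changes above (#627, #626) concern speech coordinates expressed in seconds rather than samples. The sketch below only illustrates the general idea under stated assumptions. It is not the library's implementation: the helper names `to_seconds`, `collect_by_seconds` and `drop_by_seconds`, the 16 kHz default, and the use of plain Python lists for audio are all hypothetical and chosen for this example.

```python
from typing import Dict, List, Sequence

SAMPLING_RATE = 16000  # assumed sampling rate, for illustration only


def to_seconds(timestamps: List[Dict[str, int]],
               sampling_rate: int = SAMPLING_RATE,
               time_resolution: int = 1) -> List[Dict[str, float]]:
    """Convert sample-based start/end coordinates to seconds.

    `time_resolution` is the number of decimal places kept (hypothetical helper).
    """
    return [
        {
            "start": round(ts["start"] / sampling_rate, time_resolution),
            "end": round(ts["end"] / sampling_rate, time_resolution),
        }
        for ts in timestamps
    ]


def collect_by_seconds(timestamps_s: List[Dict[str, float]],
                       audio: Sequence,
                       sampling_rate: int = SAMPLING_RATE) -> list:
    """Concatenate only the audio inside the given second-based segments."""
    chunks: list = []
    for ts in timestamps_s:
        # round() avoids off-by-one errors from float multiplication
        start = round(ts["start"] * sampling_rate)
        end = round(ts["end"] * sampling_rate)
        chunks.extend(audio[start:end])
    return chunks


def drop_by_seconds(timestamps_s: List[Dict[str, float]],
                    audio: Sequence,
                    sampling_rate: int = SAMPLING_RATE) -> list:
    """Concatenate the audio outside the given second-based segments.

    Assumes the segments are sorted and do not overlap.
    """
    kept: list = []
    cursor = 0
    for ts in timestamps_s:
        start = round(ts["start"] * sampling_rate)
        end = round(ts["end"] * sampling_rate)
        kept.extend(audio[cursor:start])
        cursor = end
    kept.extend(audio[cursor:])
    return kept
```

A coarser `time_resolution` gives shorter, more readable coordinates, but the rounded values no longer map back to the exact original sample positions, so a finer resolution is the safer choice when the second-based coordinates are later used to cut audio.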
New Contributors
- @EarningsCall made their first contribution in #553
- @NathanJHLee made their first contribution in #578
- @qwbarch made their first contribution in #591
- @kiwamizamurai made their first contribution in #601
- @OJRYK made their first contribution in #605
- @b3by made their first contribution in #627
- @davidrs made their first contribution in #656
- @shashank14k made their first contribution in #664
Full Changelog: v5.1.2...v6.0