Releases · m-bain/whisperX

24 Jun 14:23

Barabazs

v3.4.0

b93e9b6


What's Changed

chore: add lockfile check step to CI workflows by @Barabazs in #1130
docs: add common issue section for libcudnn dependencies in README by @Barabazs in #1161
feat: diarization model env config by @bgdnvk in #1101
docs: add missing torch import to Python usage example in README by @hammerill in #1168
feat: enhance diarization with optional output of speaker embeddings by @eek in #1085

New Contributors

@bgdnvk made their first contribution in #1101
@hammerill made their first contribution in #1168
@eek made their first contribution in #1085

Full Changelog: v3.3.4...v3.4.0

Contributors

eek, Barabazs, and 2 other contributors


03 May 09:39

Barabazs

v3.3.4

b2d50a0


What's Changed

feat: improve CLI loading speed by lazy loading public API by @Barabazs in #1128

Full Changelog: v3.3.3...v3.3.4

Contributors

Barabazs


01 May 09:09

Barabazs

v3.3.3

f5b40b5


What's Changed

Silero VAD support by @3manifold in #888
refactor: add more types by @Barabazs in #996
fix vad_method is none by @winking324 in #995
support timestamps for numbers by @bfs18 in #986
Update links to language models in README by @MJochim in #757
chore: remove deprecated VAD_SEGMENTATION_URL by @Barabazs in #1003
chore: handle empty segments_list case in silero by @tan90xx in #1005
chore: fix variable naming inconsistency from segments to segments_list by @tan90xx in #1006
feat: add Latvian align model by @slikts in #1017
Add models_cache_only param by @philmcmahon in #1024
Added Phoneme-Based ASR Model for Tagalog by @mtfugin in #1067
Basque alignment model by @xezpeleta in #1074
Revert "feat: add Basque alignment model (#1074)" by @Barabazs in #1077
feat: use uv for package/project management by @Barabazs in #1002
feat(hotwords): Pass hotwords option to faster-whisper by @jademlc in #1073
refactor: update import statements to use explicit module paths by @Barabazs in #1091
docs: update installation instructions by @Barabazs in #1092
fix: update setuptools configuration to include package discovery by @Barabazs in #1093
Remove duplicated item by @yccheok in #1109
fix: resolve dependency issues by @Barabazs in #1126

New Contributors

@3manifold made their first contribution in #888
@winking324 made their first contribution in #995
@bfs18 made their first contribution in #986
@MJochim made their first contribution in #757
@tan90xx made their first contribution in #1005
@slikts made their first contribution in #1017
@philmcmahon made their first contribution in #1024
@mtfugin made their first contribution in #1067
@xezpeleta made their first contribution in #1074
@jademlc made their first contribution in #1073
@yccheok made their first contribution in #1109

Full Changelog: v3.3.1...v3.3.3

Contributors

slikts, yccheok, and 10 other contributors


10 Apr 07:38

Barabazs

v3.3.2

73db397


What's Changed

chore: update ctranslate2 version requirement to >=4.5.0

This patch release should resolve cuDNN-related issues.
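To check whether an existing environment already satisfies the new ctranslate2>=4.5.0 requirement, the installed version can be read with `importlib.metadata`. The helper names below are hypothetical, and the version comparison is deliberately simplified (numeric release parts only); the `packaging` library handles full PEP 440 ordering.

```python
import re
from importlib import metadata


def _release_tuple(version: str) -> tuple:
    """Extract leading numeric release parts, e.g. '4.5.0' -> (4, 5, 0).

    Simplification: suffixes such as '.dev1' or '+cu121' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '10.0' correctly sorts after '4.5'."""
    a, b = _release_tuple(installed), _release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '4.6' compares like '4.6.0'
    b += (0,) * (width - len(b))
    return a >= b


def ctranslate2_ok(minimum: str = "4.5.0") -> bool:
    """True if ctranslate2 is installed at or above `minimum`."""
    try:
        installed = metadata.version("ctranslate2")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```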

Full Changelog: v3.3.1...v3.3.2


08 Jan 17:01

Barabazs

v3.3.1

734084c


What's Changed

refactor: add type hints and fix import statement by @Barabazs in #975
feat: include speaker information in WriteTXT when diarizing by @Barabazs in #976
Bug Fix: Suppress Numerals dataclasses replace method by @jmt0221 in #981
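PR #981 concerns how suppress_numerals calls `replace` on a dataclass. As a general illustration, assuming the options object may be either a `NamedTuple` or a dataclass: a `NamedTuple` provides `_replace`, whereas a dataclass instance has no such method and is copied with `dataclasses.replace`. `TupleOptions`, `DataclassOptions`, and `with_suppressed_tokens` are hypothetical stand-ins, not faster-whisper or whisperX classes.

```python
from dataclasses import dataclass, field, replace
from typing import List, NamedTuple


# Hypothetical option containers (NOT faster-whisper's actual classes).
class TupleOptions(NamedTuple):
    beam_size: int
    suppress_tokens: List[int]


@dataclass(frozen=True)
class DataclassOptions:
    beam_size: int
    suppress_tokens: List[int] = field(default_factory=list)


def with_suppressed_tokens(options, extra_tokens):
    """Return a copy of `options` with extra token ids added to suppress_tokens.

    Works for both NamedTuple options (which have `_replace`) and dataclass
    options (which must go through `dataclasses.replace`).
    """
    merged = sorted(set(options.suppress_tokens) | set(extra_tokens))
    if hasattr(options, "_replace"):  # NamedTuple API
        return options._replace(suppress_tokens=merged)
    return replace(options, suppress_tokens=merged)  # dataclass API
```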

New Contributors

@jmt0221 made their first contribution in #981

Full Changelog: v3.3.0...v3.3.1

Contributors

Barabazs and jmt0221


02 Jan 13:09

Barabazs

v3.3.0

4916192


What's Changed

Update faster-whisper to 1.0.2 to enable model distil-large-v3 by @moritzbrantner in #814
latest faster-whisper support added by @Hasan-Naseer in #875
Working version with pyannote:3.3.2 and faster-whisper:1.1.0 by @ibombonato in #936
Add utilization to verbose flag by @H4CK3Rabhi in #759
Added local_files_only option on whisperx.load_model for offline mode by @RoqueGio in #867
adding cache_dir to wav2vec2 by @bnitsan in #681
feat: add basic installation test flow & restrict python versions by @Barabazs in #965
chore: add build and release workflow by @Barabazs in #966
fix: update README image source and enhance setup.py for long description by @Barabazs in #968
docs: update installation instructions in README by @Barabazs in #969
fix: add UTF-8 encoding when reading README.md by @xigh in #970
chore: loosen ctranslate2 version restriction & bump whisperX version by @Barabazs in #971
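PR #970 adds an explicit UTF-8 encoding when reading README.md. Without one, Python's `open()` and `Path.read_text()` fall back to a locale-dependent default (for example cp1252 on many Windows setups), which can fail on non-ASCII text. A minimal sketch, with hypothetical helper names, of how a setup.py might read its long description:

```python
import tempfile
from pathlib import Path


def read_long_description(path: Path) -> str:
    """Read a README for setup.py's long_description with an explicit encoding,
    so the result does not depend on the build machine's locale."""
    return path.read_text(encoding="utf-8")


def roundtrip(text: str) -> str:
    """Write `text` as UTF-8 to a temporary README.md and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp) / "README.md"
        readme.write_text(text, encoding="utf-8")
        return read_long_description(readme)
```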

New Contributors

@moritzbrantner made their first contribution in #814
@Hasan-Naseer made their first contribution in #875
@ibombonato made their first contribution in #936
@H4CK3Rabhi made their first contribution in #759
@RoqueGio made their first contribution in #867
@bnitsan made their first contribution in #681
@xigh made their first contribution in #970

Full Changelog: v3.2.0...v3.3.0

Contributors

ibombonato, xigh, and 6 other contributors


18 Dec 08:03

Barabazs

v3.2.0

7307306


Device and Language Support

added Korean wav2vec2 model by @Boulaouaney in #277
Add Czech alignment model by @Thebys in #280
Adding Norwegian Bokmål and Norwegian Nynorsk by @peregilk in #636
Support language names in --language parameter. by @jkukul in #517
Add align model for Catalan language by @davidmartinrius in #581
add missing Cantonese in supported languages by @MahmoudAshraf97 in #617
Add alignment model for Malayalam by @kurianbenoy in #585
Added Romanian phoneme-based ASR model by @Majdoddin in #791
added alignment for sk and sl languages by @jan-panoch in #852
Add wav2vec model for Vietnamese in #278
Add Urdu model support for alignment by @abCods in #374
chore(writer): Join words without spaces for ja, zh by @jim60105 in #440
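PR #440 joins words without spaces for Japanese (ja) and Chinese (zh) in the writers. A minimal sketch of that idea, assuming a hypothetical `NO_SPACE_LANGUAGES` set and `join_words` helper (not whisperX's actual names):

```python
from typing import Iterable

# Hypothetical constant: languages whose words are written without spaces.
# The PR title names ja and zh; this set is illustrative only.
NO_SPACE_LANGUAGES = {"ja", "zh"}


def join_words(words: Iterable[str], language: str) -> str:
    """Join word tokens into display text, omitting spaces for ja/zh."""
    separator = "" if language in NO_SPACE_LANGUAGES else " "
    return separator.join(word.strip() for word in words)
```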

Bug Fixes and Stability Improvements

fix Unequal Stack Size VAD error by @m-bain in #281
fix: Bug in type hinting by @VisionOra in #294
pin faster whisper by @sorgfresser in #474
Fix repeat transcription on different languages and proper suppress_numerals use by @Joemgu7 in #395
fix writer fail on segments 0 by @sorgfresser in #429
fix missing speaker prefix by @invisprints in #438
fix: correct default_asr_options with new options (patch 0.8) by @remic33 in #458
Fixes --model_dir path by @canoalberto in #648
fix: force ctranslate2 to version 4.4.0 by @Barabazs in #946
fix: update faster-whisper dependencies by @cococig in #716
fix: ZeroDivisionError when --print_progress True by @mvoggu in #494
Minor fixes for word options and subtitles by @amolinasalazar in #549
fix UnboundLocalError by @sorgfresser in #554
Fix: Allow vad options to be configurable by passing to FasterWhisperPipeline and merge_chunks. by @abettke in #507
fix minimum input length for torch wav2vec2 models by @MahmoudAshraf97 in #510
fix(diarize): key error on empty track by @characat0 in #518
pip compliance for git+ installs by @spbisc97 in #603
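PR #510 concerns the minimum input length for torch wav2vec2 models. Assuming the standard wav2vec2 convolutional feature encoder (seven layers with the kernel/stride pairs below), one output frame needs 400 input samples, i.e. 25 ms at 16 kHz, so shorter segments yield no frames to align against. The sketch computes that receptive field and zero-pads short inputs; the helper names are hypothetical and this is not necessarily how whisperX implements the fix.

```python
from typing import List, Sequence, Tuple

# Assumption: standard wav2vec2 feature-encoder layout, (kernel, stride) per conv layer.
CONV_LAYERS: Tuple[Tuple[int, int], ...] = (
    (10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2),
)


def num_frames(num_samples: int) -> int:
    """Feature frames produced by the conv encoder (convolutions without padding)."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0
        length = (length - kernel) // stride + 1
    return length


def min_input_length() -> int:
    """Receptive field of one output frame, i.e. the smallest input giving one frame."""
    receptive = 1
    for kernel, stride in reversed(CONV_LAYERS):
        receptive = (receptive - 1) * stride + kernel
    return receptive


def pad_to_min_length(samples: Sequence[float]) -> List[float]:
    """Right-pad a waveform with zeros so the encoder yields at least one frame."""
    shortfall = min_input_length() - len(samples)
    return list(samples) + [0.0] * max(0, shortfall)
```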

Documentation Updates

adds link to whisperX medium on replicate.com by @CaRniFeXeR in #431
Document --compute_type command line option by @dotgrid in #430
adding link to Replicate demo by @daanelson in #352
fix: typo in error message by @zamoshchin in #493
Fix link in README.md by @jimregan in #668
Update README.md by @valentt in #509
Add a special note about Speaker-Diarization-3.0 in readme by @kaihe-stori in #521
Update README to correct speaker diarization version link by @gillens in #618
Update README.md by @mlopsengr in #630
fix link by @M0HID in #605
Remove torchvision from README by @baer in #378

Miscellaneous Changes

move model to assets by @m-bain in #945
Update alignment.py by @Ayushi-Desynova in #418
Update alignment.py by @awerks in #427
push contributions from main by @m-bain in #290
make diarization faster by @davidas1 in #400
Add device_index option by @sorgfresser in #266
Add transcribe keywords by @sorgfresser in #269
Added download path parameter. by @prameshbajra in #284
Suppress numerals by @m-bain in #303
Add Audacity export by @Ca-ressemble-a-du-fake in #309
Update transcribe.py -> small change in batch_size description by @mabergerx in #382
Suggest using pytorch-cuda 11.8 instead of 11.7 by @tijszwinkels in #255
feat: Add merge chunks chunk_size as arguments. by @jim60105 in #445
A solution to long subtitles and words without timestamps by @awerks in #459
chore(writer): improve text display(ja etc) in json file by @darwintree in #472
add faster whisper threading by @sorgfresser in #473
Pyannote3 by @remic33 in #492
Update alignment.py by @piuy11 in #487
Pass patience and beam_size to faster-whisper. by @jkukul in #527
remove the minimum length for alignment and print the failing segment by @MahmoudAshraf97 in #529
Update setup.py to use pyannote.audio version with working GPU by @wuurrd in #531
Update setup.py to download pyannote depending on platform by @justinwlin in #541
Drop ffmpeg-python dependency and call ffmpeg directly. by @hidenori-endo in #570
no align based on space by @sorgfresser in #556
Update asr.py and make the model parameter be used by @kaka1909 in #580
Move load_model after WhisperModel by @DougTrajano in #584
Update pyannote to 3.1.0 by @remic33 in #586
support for large-v3 by @MahmoudAshraf97 in #599
Added option to load Custom VAD model to load model method by @Swami-Abhinav in #654
Update pyannote to v3.1.1 to fix a diarization problem (and diarize.py) by @santialferez in #646
Get rid of numeral_symbol_tokens variable in printed message by @KossaiSbai in #669
Add Replicate large-v3 demo by @victor-upmeet in #703
local vad model by @m-bain in #944
Feat: add new align models - SHORT by @Equipo45 in #922
Update alignment.py by @peregilk in #687
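PR #445 exposes the `chunk_size` used when merging VAD chunks as an argument. As a rough illustration of what such a parameter controls (a hypothetical greedy strategy, not whisperX's `merge_chunks` implementation), consecutive speech segments, assumed sorted by start time, can be merged while the combined span stays within `chunk_size` seconds:

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Segment], chunk_size: float) -> List[Segment]:
    """Greedily merge consecutive speech segments into chunks spanning at most
    `chunk_size` seconds. A single segment longer than `chunk_size` is kept whole."""
    merged: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in segments:
        if cur_start is None:
            cur_start, cur_end = start, end  # open the first chunk
        elif end - cur_start <= chunk_size:
            cur_end = end  # segment still fits: extend the current chunk
        else:
            merged.append((cur_start, cur_end))  # close chunk, start a new one
            cur_start, cur_end = start, end
    if cur_start is not None:
        merged.append((cur_start, cur_end))
    return merged
```

Larger `chunk_size` values yield fewer, longer chunks; this sketch never splits an over-long segment, which a real pipeline may need to handle.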