Releases: SYSTRAN/faster-whisper
faster-whisper 1.2.1
What's Changed
- only merge when `clip_timestamps` are not provided by @MahmoudAshraf97 in #1345
- Fix: Prevent <|nocaptions|> tokens in `BatchedInferencePipeline` by @mmichelli in #1338
- Upgrade to Silero-VAD V6 by @MahmoudAshraf97 and @sssshhhhhh in #1373
- Offload retry logic to hf hub by @MahmoudAshraf97 in #1382
- Refinements for clip timestamps in batched inference by @MahmoudAshraf97 in #1376
New Contributors
- @mmichelli made their first contribution in #1338
- @sssshhhhhh made their first contribution in #1365
Full Changelog: v1.2.0...v1.2.1
faster-whisper 1.2.0
What's Changed
- feat: allow passing specific revision to download by @felixmosh in #1292
- Support `distil-large-v3.5` by @MahmoudAshraf97 in #1311
- Feature: Allow loading of private HF models by @r15hil in #1309
- bugfix: Get correct chunk index when restoring timestamps by @MahmoudAshraf97 in #1336
- Remove Silence in Batched transcription by @MahmoudAshraf97 in #1297
New Contributors
- @djpg made their first contribution in #1267
- @felixmosh made their first contribution in #1292
- @r15hil made their first contribution in #1309
Full Changelog: v1.1.1...v1.2.0
faster-whisper 1.1.1
What's Changed
- Brings back original VAD parameters naming by @Purfview in #1181
- Make batched `suppress_tokens` behaviour same as in sequential by @Purfview in #1194
- Fixes OOM errors - too high RAM usage by VAD by @Purfview in #1198
- Add duration of audio and VAD-removed duration to `BatchedInferencePipeline` by @greenw0lf in #1186
- Fix `neg_threshold` by @Purfview in #1191
New Contributors
- @greenw0lf made their first contribution in #1186
Full Changelog: v1.1.0...v1.1.1
faster-whisper 1.1.0
New Features
- New batched inference that is 4x faster while remaining accurate; refer to the README for usage instructions.
- Support for the new `large-v3-turbo` model.
- VAD filter is now 3x faster on CPU.
- Feature Extraction is now 3x faster.
- Added `log_progress` to `WhisperModel.transcribe` to print transcription progress.
- Added `multilingual` option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial to the `medium` model or smaller.
- `WhisperModel.detect_language` now has the option to use the VAD filter, and improved language detection using `language_detection_segments` and `language_detection_threshold`.
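The batched API introduced above pairs a `WhisperModel` with a `BatchedInferencePipeline`, as described in the project README. A minimal sketch (model size, device, and audio path are illustrative placeholders; the import guard only keeps the sketch loadable when the package is absent):

```python
# Sketch of batched transcription with faster-whisper >= 1.1.
try:
    from faster_whisper import WhisperModel, BatchedInferencePipeline
except ImportError:  # keep the sketch importable without the package installed
    WhisperModel = BatchedInferencePipeline = None

def transcribe_batched(audio_path, model_size="large-v3", batch_size=16):
    """Transcribe one file with the batched pipeline; returns (text, language)."""
    model = WhisperModel(model_size, device="auto")
    pipeline = BatchedInferencePipeline(model=model)
    segments, info = pipeline.transcribe(audio_path, batch_size=batch_size)
    # segments is a generator; consuming it drives the actual decoding
    return " ".join(s.text.strip() for s in segments), info.language
```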
Bug Fixes
- Use correct features padding for encoder input when `chunk_length` < 30s
- Use correct `seek` value in output
Other Changes
- Replace `NamedTuple` with `dataclass` in `Word`, `Segment`, `TranscriptionOptions`, `TranscriptionInfo`, and `VadOptions`; this allows conversion to `json` without nesting. Note that the `_asdict()` method is still available in the `Word` and `Segment` classes for backward compatibility but will be removed in the next release; use `dataclasses.asdict()` instead.
- Added new tests for development
- Updated benchmarks in the README
- Use `jiwer` instead of `evaluate` in benchmarks
- Filter out non_speech_tokens in suppressed tokens by @jordimas in #898
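With `Word` and `Segment` as dataclasses, `dataclasses.asdict()` converts them, including nested dataclasses, straight into JSON-ready dicts. A minimal sketch using simplified stand-in classes (the real ones carry more fields such as probabilities and tokens):

```python
import json
from dataclasses import dataclass, asdict

# Simplified stand-ins for faster-whisper's Word and Segment dataclasses.
@dataclass
class Word:
    start: float
    end: float
    word: str

@dataclass
class Segment:
    start: float
    end: float
    text: str
    words: list

seg = Segment(0.0, 1.2, "hello world",
              [Word(0.0, 0.5, "hello"), Word(0.5, 1.2, "world")])
payload = json.dumps(asdict(seg))  # asdict recurses into nested dataclasses
```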
New Contributors
- @Jiltseb made their first contribution in #856
- @heimoshuiyu made their first contribution in #1092
Full Changelog: v1.0.3...v1.1.0
faster-whisper 1.0.3
Upgrade Silero-VAD model to latest V5 version (#884)
Silero-VAD V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0
- `window_size_samples` parameter is fixed at 512.
- Uses a single `state` variable instead of the previous `h` and `c` variables.
- Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk.
- The dimensions of the `state` variable changed from 64 to 128.
- Replaced the ONNX file with the V5 version.
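The fixed 512-sample windowing with carried-over context described above can be sketched as follows (the context length here is illustrative, not Silero's exact value):

```python
# Sketch of Silero-VAD V5 style chunking: fixed 512-sample windows,
# each prefixed with a short context tail from the previous chunk.
WINDOW = 512   # window_size_samples is fixed at 512 in V5
CONTEXT = 64   # carried-over context length (illustrative, not the real value)

def iter_vad_chunks(samples):
    """Yield (context + window) slices the way V5 feeds its model."""
    context = [0.0] * CONTEXT  # leading context starts as silence
    for start in range(0, len(samples) - WINDOW + 1, WINDOW):
        window = samples[start:start + WINDOW]
        yield context + window
        context = window[-CONTEXT:]  # tail of this chunk seeds the next one

chunks = list(iter_vad_chunks([0.0] * 1024))  # two windows of 512 samples
```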
faster-whisper 1.0.2
- Add support for distil-large-v3 (#755)
  The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Benchmarks (#773)
  Introduces functionality to measure memory usage, Word Error Rate (WER), and speed in faster-whisper.
- Support initializing more whisper model args (#807)
- Small bug fix
- New feature from the original OpenAI Whisper project
faster-whisper 1.0.1
faster-whisper 1.0.0
- Support distil-whisper model (#557)
  Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
- Upgrade ctranslate2 version to 4.0 to support CUDA 12 (#694)
- Upgrade PyAV version to 11.* to support Python 3.12.x (#679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
faster-whisper 0.10.1
Fix the broken tag v0.10.0
faster-whisper 0.10.0
- Support "large-v3" model with:
  - The ability to load `feature_size`/`num_mels` and other parameters from `preprocessor_config.json`
  - A new language token for Cantonese (`yue`)
- Update `CTranslate2` requirement to include the latest version 3.22.0
- Update `tokenizers` requirement to include the latest version 0.15
- Change the hub to fetch models from the Systran organization