Commit d8a2b4f

Merge pull request #246 from m-bain/v3
V3
2 parents 46b4162 + 9ffb7e7 commit d8a2b4f

7 files changed, +18 -119 lines changed

Dockerfile

Lines changed: 0 additions & 19 deletions
This file was deleted.

README.md

Lines changed: 4 additions & 4 deletions
@@ -32,12 +32,12 @@
 <!-- <h2 align="left", id="what-is-it">What is it 🔎</h2> -->
 
 
-This repository provides fast automatic speaker recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
+This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
 
 - ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
 - 🪶 [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5
 - 🎯 Accurate word-level timestamps using wav2vec2 alignment
-- 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (labels each segment/word with speaker ID)
+- 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels)
 - 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation
 
 
@@ -74,9 +74,9 @@ GPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be inst
 
 ### 2. Install PyTorch2.0, e.g. for Linux and Windows CUDA11.7:
 
-`pip3 install torch torchvision torchaudio`
+`conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia`
 
-See other methods [here.](https://pytorch.org/get-started/locally/)
+See other methods [here.](https://pytorch.org/get-started/previous-versions/#v200)
 
 ### 3. Install this repo
 
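The new install step pins exact versions (`pytorch==2.0.0`, `torchvision==0.15.0`, `torchaudio==2.0.0`). As a hedged sketch, an environment can be checked against those pins with the standard library's `importlib.metadata`, which is available from Python 3.8 (the minimum set in setup.py). The conda package `pytorch` installs the Python distribution named `torch`, so that is the name looked up. The helper `version_mismatches` and the `PINNED` table are hypothetical, not part of whisperx, and the check assumes the installed packages carry standard Python distribution metadata; a build without it is reported as missing.

```python
from importlib import metadata

# Hypothetical table of the versions pinned by the README's conda command.
PINNED = {"torch": "2.0.0", "torchvision": "0.15.0", "torchaudio": "2.0.0"}


def version_mismatches(pinned=PINNED, lookup=metadata.version):
    """Map each package that does not match its pin to its installed version (None if missing)."""
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            mismatches[package] = None
            continue
        # Ignore a local build tag such as "+cu117" on PyTorch's pip wheels.
        if installed.split("+", 1)[0] != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches() or "PyTorch versions match the README pins")
```

This does not verify the `pytorch-cuda=11.7` part of the command; once torch is importable, `torch.version.cuda` reports the CUDA version the build targets.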

notebooks/whisperx.ipynb

Lines changed: 0 additions & 91 deletions
This file was deleted.

setup.py

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@
 setup(
 name="whisperx",
 py_modules=["whisperx"],
-version="3.1.0",
-description="Time-Accurate Automatic Speech Recognition.",
+version="3.1.1",
+description="Time-Accurate Automatic Speech Recognition using Whisper.",
 readme="README.md",
 python_requires=">=3.8",
 author="Max Bain",

whisperx/alignment.py

Lines changed: 4 additions & 0 deletions
@@ -261,6 +261,10 @@ def align(
 word_text = "".join(word_chars["char"].tolist()).strip()
 if len(word_text) == 0:
 continue
+
+# dont use space character for alignment
+word_chars = word_chars[word_chars["char"] != " "]
+
 word_start = word_chars["start"].min()
 word_end = word_chars["end"].max()
 word_score = round(word_chars["score"].mean(), 3)
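The four lines added to whisperx/alignment.py drop space characters from `word_chars` before a word's start, end, and score are aggregated. The stand-alone sketch below reproduces that aggregation with plain tuples instead of the pandas DataFrame the real code uses. The timings and scores are made up, assuming a space character received a timestamp inside the pause before the word, and `word_timing` is a hypothetical helper, not a whisperx function. With the space included, the word's start is pulled back into the pause and its score is averaged with the space's.

```python
from statistics import mean

# Made-up character alignments for the word "hi": (char, start_s, end_s, score).
# whisperx keeps these in a pandas DataFrame; plain tuples are used here instead.
WORD_CHARS = [
    (" ", 0.50, 1.00, 0.25),  # assumed: a space aligned into the pause before the word
    ("h", 1.00, 1.25, 0.75),
    ("i", 1.25, 1.50, 1.00),
]


def word_timing(chars, skip_spaces=True):
    """Return (start, end, score) for one word, or None if it has no visible text."""
    text = "".join(c for c, _, _, _ in chars).strip()
    if len(text) == 0:
        return None  # mirrors the `continue` for empty words
    if skip_spaces:
        # Equivalent of the line added in this commit: drop spaces before aggregating.
        chars = [row for row in chars if row[0] != " "]
    start = min(row[1] for row in chars)
    end = max(row[2] for row in chars)
    score = round(mean(row[3] for row in chars), 3)
    return start, end, score


print(word_timing(WORD_CHARS, skip_spaces=False))  # (0.5, 1.5, 0.667)  before the commit
print(word_timing(WORD_CHARS))                     # (1.0, 1.5, 0.875)  after the commit
```

Because the empty-word check runs on the stripped text first, any word that reaches the filter still has at least one non-space character, so the aggregation never sees an empty list.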

whisperx/asr.py

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@
 from .types import TranscriptionResult, SingleSegment
 
 def load_model(whisper_arch, device, compute_type="float16", asr_options=None, language=None,
-vad_options=None, model=None):
+vad_options=None, model=None, task="transcribe"):
 '''Load a Whisper model for inference.
 Args:
 whisper_arch: str - The name of the Whisper model to load.
@@ -31,7 +31,7 @@ def load_model(whisper_arch, device, compute_type="float16", asr_options=None, l
 
 model = WhisperModel(whisper_arch, device=device, compute_type=compute_type)
 if language is not None:
-tokenizer = faster_whisper.tokenizer.Tokenizer(model.hf_tokenizer, model.model.is_multilingual, task="transcribe", language=language)
+tokenizer = faster_whisper.tokenizer.Tokenizer(model.hf_tokenizer, model.model.is_multilingual, task=task, language=language)
 else:
 print("No language specified, language will be first be detected for each audio file (increases inference time).")
 tokenizer = None
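With this change, `load_model` forwards its new `task` argument to faster-whisper's `Tokenizer` instead of hard-coding `"transcribe"`, so a caller can request Whisper's other task, `"translate"`, e.g. `whisperx.load_model("large-v2", "cuda", language="de", task="translate")`. The sketch below isolates only that branch so it runs without a GPU or a model download. `StubTokenizer` and `build_tokenizer` are hypothetical stand-ins, not whisperx or faster-whisper APIs. Within the lines shown in this hunk, `task` only takes effect when `language` is given; otherwise the tokenizer is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTokenizer:
    """Hypothetical stand-in for faster_whisper.tokenizer.Tokenizer; it only stores its settings."""
    multilingual: bool
    task: str
    language: str


def build_tokenizer(multilingual: bool, language: Optional[str] = None,
                    task: str = "transcribe") -> Optional[StubTokenizer]:
    """Hypothetical mirror of the tokenizer branch in load_model after this commit."""
    if language is not None:
        # Before the commit this branch always passed task="transcribe".
        return StubTokenizer(multilingual, task=task, language=language)
    # No language: load_model prints a notice and leaves the tokenizer unset here.
    return None


print(build_tokenizer(True, language="de", task="translate"))
# StubTokenizer(multilingual=True, task='translate', language='de')
print(build_tokenizer(True, task="translate"))  # None
```

Keeping `"transcribe"` as the default means existing callers that never pass `task` get the same tokenizer as before.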
