whisperX vs fasterWhisper #1054

sijitang · 2025-02-09T19:06:03Z

Hi,

I used both whisperX and fasterwhisper to transcribe the same audio, and the two resulting subtitles have the following differences:

WhisperX’s subtitles miss some parts of the content, but the timeline alignment is relatively good.
The subtitles transcribed by fasterwhisper are almost complete in terms of content, but some timestamps seem off: either inaccurate or spanning too long a duration.
Even when I use the alternative VAD method (Silero) in whisperX to transcribe the audio, it still doesn’t capture as much content as fasterwhisper.

My question is: why does this happen? Isn’t whisperX also using fasterwhisper for transcription? Why is there missing content?
Is it possible to modify some parameters in whisperX so that it achieves the same transcription completeness as fasterwhisper while retaining whisperX’s alignment capability?

Does anyone with experience in improving transcription quality have any suggestions that could help me out?

Thanks

adamlogan · 2025-10-18T00:14:35Z

VAD isn't entirely accurate; it's as simple as that. Any speech the VAD fails to detect never reaches the transcription step.

I haven't dived into these flags yet, but have you tried messing with the VAD options?

--vad_onset VAD_ONSET
Onset threshold for VAD (see pyannote.audio), reduce this if speech is not being detected (default: 0.5)

--vad_offset VAD_OFFSET
Offset threshold for VAD (see pyannote.audio), reduce this if speech is not being detected. (default: 0.363)
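To make the effect of those two thresholds concrete, here is a minimal sketch of onset/offset hysteresis over per-frame speech probabilities. It is an illustration only, not whisperX's or pyannote.audio's actual code: the function name `speech_regions`, the `frame_s` parameter, and the sample probabilities are all made up for the example.

```python
def speech_regions(probs, onset=0.5, offset=0.363, frame_s=0.02):
    """Return (start_s, end_s) spans found by onset/offset hysteresis.

    Hypothetical helper: a region opens when the speech probability rises
    above `onset` and closes when it falls below `offset`.
    """
    regions = []
    active = False
    start = 0.0
    for i, p in enumerate(probs):
        t = i * frame_s
        if not active and p > onset:
            active, start = True, t
        elif active and p < offset:
            regions.append((start, t))
            active = False
    if active:
        # Speech was still active when the audio ended.
        regions.append((start, len(probs) * frame_s))
    return regions


# Made-up probabilities, one frame per second to keep the numbers readable.
probs = [0.1, 0.45, 0.6, 0.4, 0.3, 0.45, 0.2]
default = speech_regions(probs, frame_s=1.0)
relaxed = speech_regions(probs, onset=0.4, offset=0.25, frame_s=1.0)
print(default)  # [(2.0, 4.0)]
print(relaxed)  # [(1.0, 6.0)]
```

With the lower thresholds the quiet frames at the start and end are kept, which is why reducing `--vad_onset` and `--vad_offset` is worth trying when content goes missing.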

Other related options include:

--logprob_threshold LOGPROB_THRESHOLD
if the average log probability is lower than this value, treat the decoding as failed (default: -1.0)

--no_speech_threshold NO_SPEECH_THRESHOLD
if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to logprob_threshold, consider the segment as silence (default: 0.6)
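Read literally, the two help strings above combine into a single check. The sketch below only restates that wording in code; it is not taken from the whisperX or faster-whisper source, and the function name `is_silence` is hypothetical.

```python
def is_silence(avg_logprob, no_speech_prob,
               logprob_threshold=-1.0, no_speech_threshold=0.6):
    """Hypothetical restatement of the help text, not library code.

    A segment is treated as silence only when BOTH conditions hold:
    decoding failed (average log probability below logprob_threshold)
    and the no-speech probability is above no_speech_threshold.
    """
    decoding_failed = avg_logprob < logprob_threshold
    return decoding_failed and no_speech_prob > no_speech_threshold


# Low-confidence decode plus high no-speech probability: dropped as silence.
print(is_silence(avg_logprob=-1.2, no_speech_prob=0.7))  # True
# Confident decode: kept even though no-speech probability is high.
print(is_silence(avg_logprob=-0.3, no_speech_prob=0.7))  # False
# Raising no_speech_threshold keeps the first segment as well.
print(is_silence(avg_logprob=-1.2, no_speech_prob=0.7,
                 no_speech_threshold=0.8))  # False
```

Under this reading, raising `--no_speech_threshold` or lowering `--logprob_threshold` makes it less likely that a segment is discarded as silence.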

I'm considering using voice alignment tools a la carte, independent of WhisperX, because of how slow it is, because pyenv sucks for running CLI tools, and because of the dependency hell involved in getting a fully functional WhisperX.
