Replies: 1 comment
VAD isn't entirely accurate; it's as simple as that. I haven't dug into these flags yet, but have you tried adjusting the VAD options?

--vad_onset VAD_ONSET
--vad_offset VAD_OFFSET

Other related options include:

--logprob_threshold LOGPROB_THRESHOLD  if the average log probability is lower than this value, treat the decoding as failed (default: -1.0)
--no_speech_threshold NO_SPEECH_THRESHOLD

I'm considering using voice alignment tools a la carte, independent of WhisperX, for three reasons: WhisperX is slow, pyenv sucks for running CLI tools, and the requirements for a fully functional WhisperX are dependency hell.
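If it helps, here is a rough sketch of how one might sweep those flags. The Python helper only builds a whisperx command line from the options quoted above and prints it; it does not run anything. The function name `build_whisperx_cmd`, the file name `audio.wav`, and the threshold values are illustrative assumptions, not recommended settings. The working guess is that lower onset/offset values make the VAD less likely to drop quiet speech, which would need to be checked against your own audio.

```python
import shlex


def build_whisperx_cmd(audio_path, vad_onset=None, vad_offset=None,
                       logprob_threshold=None, no_speech_threshold=None):
    """Build (but do not run) a whisperx CLI invocation as an argument list.

    Only the flags quoted in this thread are used. Any option left as None
    is omitted, so whisperx falls back to its own default for it.
    """
    cmd = ["whisperx", audio_path]
    options = {
        "--vad_onset": vad_onset,
        "--vad_offset": vad_offset,
        "--logprob_threshold": logprob_threshold,
        "--no_speech_threshold": no_speech_threshold,
    }
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical (onset, offset) pairs to compare; pick your own values.
    for onset, offset in [(0.5, 0.363), (0.3, 0.2), (0.1, 0.1)]:
        cmd = build_whisperx_cmd("audio.wav", vad_onset=onset, vad_offset=offset)
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)`, so the resulting subtitles can be compared side by side for missing content.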
-
Hi,
I used both WhisperX and faster-whisper to transcribe the same audio, and the two resulting subtitle files differ as follows:
WhisperX's subtitles miss some parts of the content, but the timeline alignment is relatively good.
The subtitles transcribed by faster-whisper are almost complete in terms of content, but there seem to be some timestamp issues: the timestamps are either inaccurate or span too long.
Even when I use the alternative VAD method (Silero) in WhisperX to transcribe the audio, it still doesn't capture as much content as faster-whisper.
My question is: why does this happen? Isn't WhisperX also using faster-whisper for transcription? Why is there missing content?
Is it possible to modify some parameters in WhisperX so that it achieves the same transcription completeness as faster-whisper while retaining WhisperX's alignment capability?
Does anyone with experience in improving transcription quality have any suggestions that could help me out?
Thanks