Whisper giving surprising results #457

colemar · 2025-04-21T00:17:36Z

Hi all,

I tested the Whisper audio-to-text tool for the first time on a 2-hour audio file extracted from the Italian dub of the movie "On the Beach" (1959).

According to whisper_log.txt, the command line was:

faster-whisper-xxl.exe --language it --model "medium" --standard "C:\Users\colem\AppData\Local\Temp\a8b79e0e-14b2-46f6-85d6-5b0b5ff5949e.wav"

In about 10 minutes it produced a .srt file whose first lines are:

1
00:00:24,140 --> 00:00:27,080
Sottotitoli e revisione a Kanataka Pronti
all'emersione.

(English: "Subtitles and revision at Kanataka Ready to surface")

and the last lines are:

1363
02:14:27,060 --> 02:14:28,780
Sottotitoli creati dalla comunità Amara.org

(English: "Subtitles created by the Amara.org community")

There is no trace of the sentences I highlighted in bold in the audio file, at least not around those timestamps.
Amara.org is a subtitle service.

Should I assume that the Whisper medium model was also trained with a subtitle from Amara.org for this specific movie?

Purfview (Maintainer) · 2025-04-21T00:30:28Z · accepted answer

> In about 10 minutes it produced a .srt file whose first lines are...
> There is no trace of the sentences I highlighted in bold in the audio file, at least not around those timestamps.
> Amara.org is a subtitle service.

This is just a hallucination.
To reduce hallucinations, try --vad_method pyannote_v3 and --model large-v2.

> Should I assume that the Whisper medium model was also trained with a subtitle from Amara.org for this specific movie?

No.
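
Credit lines like the two in the question are a commonly reported Whisper hallucination on silent or music-only stretches. If you want to spot them in an existing .srt before re-running with better settings, one option is to parse the cues and flag any whose text contains a known credit phrase. This is a minimal sketch, not part of faster-whisper-xxl: the phrase list and the helper names (`parse_srt`, `flag_suspect_cues`) are hypothetical, and a plain substring match can also flag legitimate dialogue, so treat the result as a list to review by ear.

```python
import re

# Hypothetical phrase list (lowercase): subtitle-credit text that Whisper models
# are commonly reported to produce on silence or music. Extend for your language.
SUSPECT_PHRASES = (
    "amara.org",
    "sottotitoli e revisione",
    "sottotitoli creati",
)

# Timing line of an SRT cue, e.g. "00:00:24,140 --> 00:00:27,080".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")

# The two cues quoted in the question, used as sample input.
SAMPLE_SRT = """1
00:00:24,140 --> 00:00:27,080
Sottotitoli e revisione a Kanataka Pronti
all'emersione.

1363
02:14:27,060 --> 02:14:28,780
Sottotitoli creati dalla comunità Amara.org
"""


def parse_srt(text):
    """Yield (index, start, end, text) for each well-formed cue in an SRT string."""
    text = text.lstrip("\ufeff").strip()  # drop a UTF-8 BOM if present
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks
        match = TIMING.search(lines[1])
        if match is None:
            continue
        # Multi-line cue text is joined with single spaces.
        yield int(lines[0]), match.group(1), match.group(2), " ".join(lines[2:])


def flag_suspect_cues(text, phrases=SUSPECT_PHRASES):
    """Return cues whose text contains any suspect phrase (case-insensitive)."""
    return [cue for cue in parse_srt(text)
            if any(phrase in cue[3].lower() for phrase in phrases)]


if __name__ == "__main__":
    for index, start, end, cue_text in flag_suspect_cues(SAMPLE_SRT):
        print(f"{index} [{start} --> {end}] {cue_text}")
```

Both sample cues are flagged; a real file would be read with `open(path, encoding="utf-8").read()` and passed to `flag_suspect_cues`.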

colemar (Author) · 2025-04-21T20:57:30Z

Yes, much better with this command line (run from Subtitle Edit):
faster-whisper-xxl.exe --language it --model "large-v2" --standard --vad_method pyannote_v3 "C:\Users\colem\AppData\Local\Temp\17fa141a-0da9-493a-9ece-b2337a4cc6a6.wav"

Purfview Apr 21, 2025
Maintainer

If you use it on movies or series with background music or noise, I would recommend --ff_vocal_extract mdx_kim2; this should improve things further.
However, it doesn't work well on the .wav files extracted by SE. I don't know why your command uses SE's .wav as input; it should be the original file.
Maybe you are using it in some unusual way.

And there is much better voice extraction with --ff_vocal_extract mb-roformer in the Pro version: #456
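
The flags suggested in this thread can be collected into a single command. The Python sketch below builds the argument list (large-v2 model, pyannote_v3 VAD, optional --ff_vocal_extract) and runs it with `subprocess`. It assumes `faster-whisper-xxl.exe` is on PATH and that you pass the original media file rather than SE's temporary .wav; `build_whisper_command`, `run_whisper`, and `movie.mkv` are hypothetical names, and only the flags themselves come from the posts above.

```python
import subprocess
from pathlib import Path


def build_whisper_command(media_path, language="it", model="large-v2",
                          vad_method="pyannote_v3", vocal_extract=None,
                          exe="faster-whisper-xxl.exe"):
    """Return the argv list for faster-whisper-xxl with the flags used in this thread."""
    args = [exe, "--language", language, "--model", model, "--standard",
            "--vad_method", vad_method]
    if vocal_extract:
        # e.g. "mdx_kim2"; "mb-roformer" is mentioned above as a Pro-version option.
        args += ["--ff_vocal_extract", vocal_extract]
    # Pass the original media file (e.g. .mkv/.opus), not a temporary .wav made by SE.
    args.append(str(media_path))
    return args


def run_whisper(media_path, **options):
    """Run the tool; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_whisper_command(media_path, **options), check=True)


if __name__ == "__main__":
    cmd = build_whisper_command(Path("movie.mkv"), vocal_extract="mdx_kim2")
    print(" ".join(cmd))  # for display only; run_whisper passes the list unquoted
```

Passing a list instead of a single string avoids shell quoting problems with paths that contain spaces.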

colemar Apr 21, 2025
Author

I see that SE displays a message saying that it is converting the audio. The input was a .opus file.
I believe that .wav is lossless anyway.

Purfview Apr 22, 2025
Maintainer

> The input was a .opus file.

That must be it: for MKV/MP4 containers, SE passes the original file as input.
EDIT: I just tested .opus and .mp3; with both, SE used that temporary .wav as input.

@niksedk, could you extend the direct-file feature to all input file types?

> I believe that .wav is lossless anyway.

No, those are very low-quality .wav files, not suitable for audio filtering at all.
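
A .wav file stores uncompressed PCM, but that only means nothing further is lost when it is written: if the audio was decoded from a lossy source such as .opus, resampled to a lower rate, or downmixed before being saved, that detail is already gone. One way to see what a temporary .wav actually contains is to read its header with Python's standard `wave` module. This sketch assumes an uncompressed PCM .wav; `describe_wav` and `make_silent_wav` are hypothetical helpers, and the in-memory file only stands in for a real one such as SE's temp file.

```python
import io
import os
import wave


def describe_wav(source):
    """Return basic format info for a PCM .wav file (path or binary file object)."""
    if isinstance(source, os.PathLike):
        source = os.fspath(source)  # wave.open expects a str path or a file object
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def make_silent_wav(rate=16000, channels=1, seconds=1):
    """Build an in-memory 16-bit silent .wav, only to demonstrate describe_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

For a real file, call `describe_wav(r"C:\path\to\file.wav")` and compare the sample rate and channel count with those of the original media.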
