Is there a way to save the file produced when using --ff_vocal_extract? #521

subgrinder · 2025-10-10T14:22:29Z

subgrinder
Oct 10, 2025

Is there a way to save the vocal extraction file produced when using --ff_vocal_extract? If not, can one be added? I am doing a lot of runs using the same audio file to test different combinations of options to produce text files - extracting the audio once would save a lot of time. The goal is to find the least bad transcription.

Purfview · 2025-10-10T15:04:24Z

Purfview
Oct 10, 2025
Maintainer

Is there a way to save the vocal extraction file produced when using --ff_vocal_extract?

Use --ff_dump: it dumps the audio pre-processed by the "--ff_..." filters to a 16000 Hz file and prevents deletion of some intermediate audio files.
Note: 16 kHz mono audio is what the whisper/VAD/diarization models need, whereas most "--ff_..." filters work best with untouched audio.

I am doing a lot of runs using the same audio file to test different combinations of options to produce text files... The goal is to find the least bad transcription.

Keep in mind that by default the whisper model is not deterministic, and a single-byte change in the audio data can have random effects on the transcription.
Such a comparison is not easy: you need a large amount of audio with already proofread transcriptions, plus code to calculate the error rate. You may need to process and compare hundreds of hours of audio to get a tangible comparison of transcription quality.
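
As an illustration of the "code to calculate the error rate" mentioned above, here is a minimal sketch of one common metric, word error rate (WER), computed as word-level edit distance divided by the number of reference words. The choice of WER and the function name `word_error_rate` are assumptions for illustration; the thread does not say which metric or tool is meant, and real comparisons usually also normalize punctuation and spelling first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

An identical transcript scores 0.0, and a hypothesis that gets every one of four reference words wrong scores 1.0. Because the model is not deterministic, a score from a single short file says little on its own.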

0 replies

subgrinder · 2025-10-10T15:49:03Z

subgrinder
Oct 10, 2025
Author

Where are the files saved when using --ff_dump?

You need to process and compare maybe hundreds of hours of audio

Exactly. For my test files (~22 mins of audio), mb-roformer takes 2 minutes to extract the audio and mdx2_kim2 takes 45 seconds; the faster-whisper-xxl processing takes about 1 min 45 secs with few options set (this will get longer for many combinations of options). There will be hundreds of runs using various combinations of settings.

The faster-whisper processing for the runs, without audio extraction, will take a minimum of 30 hours (actually more, due to settings and the time between starting each run). If mb-roformer is run every time for the same file, that adds about 33 hours to the runs (the same tests with mdx2_kim2 will be another 30+ hours). I don't know if the vocal extractors also have non-deterministic output, but if they do, that would be eliminated by using the first extracted audio file as input to all subsequent runs.
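
The arithmetic behind those estimates can be sketched as follows. The per-run timings come from the post above; the run count of 1000 is an assumption, chosen only because it roughly reproduces the 30-hour and 33-hour figures, and the names are hypothetical.

```python
def total_hours(runs: int, seconds_per_run: float) -> float:
    """Total wall-clock hours for a number of runs of equal length."""
    return runs * seconds_per_run / 3600

RUNS = 1000          # assumption: not stated in the thread
WHISPER_S = 105      # about 1 min 45 secs of transcription per run
ROFORMER_S = 120     # about 2 minutes of mb-roformer extraction per run

whisper_only = total_hours(RUNS, WHISPER_S)          # about 29.2 hours
roformer_every_run = total_hours(RUNS, ROFORMER_S)   # about 33.3 hours
roformer_once = total_hours(1, ROFORMER_S)           # 2 minutes, in hours
hours_saved = roformer_every_run - roformer_once

print(f"transcription only:  {whisper_only:.1f} h")
print(f"extraction each run: {roformer_every_run:.1f} h extra")
print(f"saved by extracting once: {hours_saved:.1f} h")
```

Under these assumptions, extracting the vocals once and reusing the file removes nearly all of the extraction time.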

1 reply

Purfview Oct 10, 2025
Maintainer

For my test files (~22 mins of audio)

22 mins is nothing. I meant hundreds of hours of actual audio, not hours of runs.

subgrinder · 2025-10-10T18:42:39Z

subgrinder
Oct 10, 2025
Author

I still do not know where to find the saved extracted vocal file when
using --ff_dump. Where is the file and what is the name?

A single 22 minute track has been very useful. There are lots of "knobs" to
turn when transcribing using faster-whisper-xxl. Using the same vocal track
over and over with different command line options produces different results that
can be checked to determine which command line arguments produce the best result.
There are a lot more combinations to be run, which I can automate rather than
hand-editing each run. The assumption is that the "best" set of command line options
will provide a best guess for transcribing other vocal tracks of the same series. A
different set is likely needed for a different series....
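
One way such automation could look is sketched below: it only builds the command lines for every combination of options and does not run anything. The helper `build_commands`, the file name `episode_dump.wav`, and the executable name are hypothetical, and the option names in the grid are examples that should be checked against the tool's own `--help` output before use.

```python
from itertools import product

def build_commands(exe, audio, grid, fixed=()):
    """Return one argument list per combination of the option values in grid."""
    names = list(grid)
    commands = []
    for values in product(*(grid[name] for name in names)):
        cmd = [exe, audio, *fixed]
        for name, value in zip(names, values):
            cmd += [name, str(value)]
        commands.append(cmd)
    return commands

# Example grid: option names and values are illustrative assumptions.
grid = {
    "--beam_size": [1, 5],
    "--temperature": [0.0, 0.2],
}
commands = build_commands(
    "faster-whisper-xxl",        # hypothetical executable name/path
    "episode_dump.wav",          # hypothetical pre-extracted vocal file
    grid,
    fixed=("--model", "large-v2"),
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can then be passed to `subprocess.run`. Every run would also need its own output location so that results from different combinations do not overwrite each other.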

I used OneClickTranscribe.bat (model changed to large-v2) to get SRT files for a
set of 52 TV shows. The shows all end with the same closing words. Half the
transcriptions got it right (with spelling and punctuation variations - that is
expected). Here are some examples of the transcriptions that are wrong:

I'll steal the boy!
I'll kill my horse!
Well, sit over and wait!
I don't feel very high!

And a real doozy: Tonto's been overhauled!

The actual sentence is "Hi-yo, Silver! Away!" - The Lone Ranger. Other old
shows are getting similar results. Those shows do not have great quality
audio - it is what we have.

Note that the 22 minute audio is not a Lone Ranger vocal track but another
old show.

3 replies

Purfview Oct 10, 2025
Maintainer

I still do not know where to find the saved extracted vocal file when
using --ff_dump. Where is the file and what is the name?

At the same place where the input file is or at the exe (I don't remember now).
Named same as input file + "dump".

Answer selected by subgrinder

subgrinder Oct 10, 2025
Author

Thank you - got it.

It is in the same folder as the input file. Note that there are 2 .wav files there. I did two runs to confirm what the second file is - one with mdx2_kim2 and one with mb-roformer. The second file has "mdx2" or "roformer" in its name, matching the vocal extractor used. Those two files are what I will pass to faster-whisper-xxl, as they are (I think) the result of the vocal extraction: their sizes are comparable to a spleeter extraction.
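
For scripted runs, locating those files could be sketched like this. It assumes only what is described above: the .wav files sit next to the input file and the extracted-vocals file has "mdx2" or "roformer" in its name. The exact file names are not given in this thread, so the function name and the matching rule are hypothetical.

```python
from pathlib import Path

def find_dumped_wavs(input_file, markers=("mdx2", "roformer")):
    """List .wav files beside input_file that share its stem and contain a marker."""
    src = Path(input_file)
    stem = src.stem.lower()
    hits = []
    for wav in sorted(src.parent.glob("*.wav")):
        name = wav.name.lower()
        # Assumed naming: dumped files start with the input's base name.
        if wav != src and name.startswith(stem) and any(m in name for m in markers):
            hits.append(wav)
    return hits
```

Calling `find_dumped_wavs` with the path of the original input file returns the candidate vocal tracks to feed into later runs; if the real names differ, the markers or the prefix check would need adjusting.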

Purfview Oct 10, 2025
Maintainer

spleeter is very outdated, much worse than "mdx_kim2".
