Is there a way to save the file produced when using --ff_vocal_extract? #521
Is there a way to save the vocal extraction file produced when using --ff_vocal_extract? If not, can one be added? I am doing a lot of runs on the same audio file to test different combinations of options for producing text files, so extracting the audio once would save a lot of time. The goal is to find the least bad transcription.
Use
Keep in mind that by default the Whisper model is not deterministic, and a single-byte change in the audio data can have random effects on the transcription.
Where are the files saved when using --ff_dump?
Exactly: for my test files (~22 minutes of audio), mb-roformer takes 2 minutes to extract the audio and mdx2_kim2 takes 45 seconds; the faster-whisper-xxl processing takes about 1 minute 45 seconds with few options set (this will get longer for many combinations of options). There will be hundreds of runs using various combinations of settings. The faster-whisper processing for the runs without audio extraction will take a minimum of 30 hours (actually more, due to settings and the time between starting each run). If mb-roformer is run every time for the same file, that adds about 33 hours to the runs (the same tests with mdx2_kim2 will be another 30+ hours). I don't know whether the vocal extractors also have non-deterministic output, but if they do, that would be eliminated by using the first extracted audio file as input to all subsequent runs.
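To make the arithmetic above concrete, here is a rough sketch in Python. The per-run timings are the ones quoted in the post. The run count of 1,000 is an assumption: the post only says "hundreds" of runs, and 1,000 is simply the figure that reproduces the roughly 30 and 33 hour totals. The helper names are made up for illustration.

```python
# Per-run timings quoted in the post, in seconds (~22 minutes of audio).
WHISPER_SECONDS = 105       # faster-whisper-xxl, about 1 min 45 s
MB_ROFORMER_SECONDS = 120   # mb-roformer vocal extraction
MDX2_KIM2_SECONDS = 45      # mdx2_kim2 vocal extraction

# ASSUMPTION: the post does not give an exact run count.
ASSUMED_RUNS = 1000


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600


def extraction_hours(seconds_per_extraction, runs, extract_once=False):
    """Total extraction time if the extractor runs every time or only once."""
    extractions = 1 if extract_once else runs
    return to_hours(seconds_per_extraction * extractions)


whisper_total = to_hours(WHISPER_SECONDS * ASSUMED_RUNS)
roformer_every_run = extraction_hours(MB_ROFORMER_SECONDS, ASSUMED_RUNS)
roformer_once = extraction_hours(MB_ROFORMER_SECONDS, ASSUMED_RUNS, extract_once=True)
kim2_every_run = extraction_hours(MDX2_KIM2_SECONDS, ASSUMED_RUNS)

print(f"Transcription only:          {whisper_total:.1f} h")
print(f"mb-roformer on every run:   +{roformer_every_run:.1f} h")
print(f"mb-roformer extracted once: +{roformer_once:.2f} h")
print(f"mdx2_kim2 on every run:     +{kim2_every_run:.1f} h")
print(f"Saved by extracting once:    {roformer_every_run - roformer_once:.1f} h")
```

Under that assumption, nearly all of the extraction overhead disappears if the extracted vocal file can be saved and reused as the input for every later run.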
I still do not know where to find the saved extracted vocal file when
A single 22 minute track has been very useful. There are lots of "knobs" to
I used OneClickTranscribe.bat (model changed to large-v2) to get SRT files for a
I'll steal the boy! And a real doozy: Tonto's been overhauled! The actual sentence is "Hi-yo, Silver! Away!" - The Lone Ranger. Other old
Note that the 22 minute audio is not a Lone Ranger vocal track but another
In the same folder as the input file, or next to the exe (I don't remember which now).
It is named the same as the input file plus "dump".
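Since the exact location and file name are not confirmed above, one way to track the file down is to search both candidate folders. The sketch below is only a guess at the naming: it assumes the dump file's name starts with the input file's name (without extension) and contains "dump" somewhere. The function name and the example paths are hypothetical.

```python
from pathlib import Path


def find_dump_candidates(input_file, extra_dirs=()):
    """List files that look like a dump of input_file.

    ASSUMPTION: the dump is named like the input file plus "dump"; the exact
    pattern and extension are not confirmed, so the match is deliberately loose.
    """
    input_path = Path(input_file)
    stem = input_path.stem.lower()
    # Look next to the input file first, then in any extra folders
    # (for example the folder that holds the exe).
    search_dirs = [input_path.parent, *(Path(d) for d in extra_dirs)]
    matches = []
    for folder in search_dirs:
        if not folder.is_dir():
            continue
        for candidate in sorted(folder.iterdir()):
            name = candidate.name.lower()
            if (
                candidate.is_file()
                and candidate != input_path
                and name.startswith(stem)
                and "dump" in name
            ):
                matches.append(candidate)
    return matches
```

For example, `find_dump_candidates(r"D:\audio\episode.mp3", extra_dirs=[r"C:\Faster-Whisper-XXL"])` would check both folders (both paths are placeholders). Sorting the results by modification time can help if several runs have left files behind.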