Standalone Faster-Whisper-XXL features #231
Replies: 17 comments · 65 replies

---

I really like the new --vad_alt_method parameter. silero_v3, silero_v4, and pyannote_onnx_v3 are all much better than the original VAD: the original VAD leaves gaps, and sentences starting with "So" often get a delayed start on the timeline. These issues are resolved with silero_v3/silero_v4/pyannote_onnx_v3. Finally, let me ask: which of the three gives the best test results, and what are their characteristics?
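
For anyone wanting to compare the methods on their own material, a minimal sketch (the executable name varies by release and the audio file is a placeholder; the `--vad_method` values are from the option list in the opening post):

```
:: Run the same clip through two candidate VADs and diff the resulting
:: subtitle files (rename the output between runs so it isn't overwritten).
faster-whisper-xxl.exe sample.mp3 --vad_method silero_v4
faster-whisper-xxl.exe sample.mp3 --vad_method pyannote_onnx_v3
```
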

---

Any hope of doing something similar for Mac in the future?

---

A little annoyance: I'm running Faster-Whisper-XXL in a Nextcloud folder (with a cronjob checking whether new audio files have been synchronized, then running faster-whisper-xxl). So far this worked fine, but in r192.3.3 with MDX filtering enabled it seems the *_mdx.wav file is first created and then moved to a temp folder (?). This move fails because Nextcloud is already trying to sync the mdx file, which leads to whisper-faster-xxl just quitting with an error that the *_mdx.wav file is already in use. I've now set Nextcloud rules to ignore *_mdx.wav files, but would it be possible to create them in a temp folder from the start?
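
For reference, a concrete form of the ignore rule the poster set up: the Nextcloud desktop client reads ignore patterns from a sync-exclude.lst file, so the pattern can be appended there (the path below is the usual Windows location; treat it as an assumption to verify):

```
:: Tell the Nextcloud client never to sync the intermediate MDX files
:: (config path assumed; adjust for your install).
echo *_mdx.wav>> "%APPDATA%\Nextcloud\sync-exclude.lst"
```
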

---

Do I need to use some kind of flag to make recognition better against a little noise or soft music?
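
The `--ff_mdx_kim2` preprocessing documented in the opening post is aimed at exactly this: it separates vocals from background music/noise before transcription. A sketch (executable and file names are placeholders):

```
:: Extract vocals first, then transcribe the cleaned track.
faster-whisper-xxl.exe noisy_interview.mp3 --ff_mdx_kim2
```
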

---

Such a great tool, especially for those who aren't very savvy with Python or the command line! Thanks for creating it! Is it possible to perform speaker diarization with this standalone version?
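
Yes; the opening post lists a `--diarize` option. A sketch using one of the documented backends (executable and file names are placeholders):

```
:: Label speakers with the pyannote v3.1 backend from the OP's list.
faster-whisper-xxl.exe meeting.wav --diarize pyannote_v3.1
```
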

---

Hey @Purfview, I was wondering if you have (or are willing to run) any benchmarks that compare …

---

Hi @Purfview, I did a test with the --ff_mdx_kim2 feature and it took a long time to complete: about 45 min for a 10 min video. Is the voice-extraction step processed on the GPU or the CPU?

---

Is there a set of parameters that works best for capturing very short audio clips? My clips containing just "Yes" or "Let's go" produce a blank transcription. I've adjusted --vad_min_speech_duration_ms and others, but nothing catches these short clips.
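
A sketch of the kind of combination worth trying, using the two VAD parameters named in the opening post; the values are untested guesses, not recommendations:

```
:: Accept arbitrarily short speech and pad each detected segment.
faster-whisper-xxl.exe short_clip.wav --vad_min_speech_duration_ms 0 --vad_speech_pad_ms 400
```

If that still misses the clips, disabling VAD entirely for these tiny files (if your build exposes such a toggle) takes the detector out of the equation.
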

---

Is there any way to make auto dialogs work, instead of …? Thanks!

---

Since this Faster-Whisper build has been modified from the original version, could you please upload the source code so the community can contribute and add new features? Or am I missing something? Thanks!

---

Are …

---

Hi, first I want to thank you for this great solution; it makes life much easier. Maybe I'm missing one of the myriad call arguments: is there any option to keep the Python backend running between invocations to shorten startup times? I want to use this for a voice chat bot, and short turnaround times are key here.

---

I’ve been thinking about this issue during my music-collection transcription project (which has grown to 6,000+ lines of script & code outside of Whisper-XXL)... I’ve cranked out 7,200 SRT files now.

Really, we just need a separate agent that keeps the model in memory and is called upon by the transcriber.

The agent would just take the filename and/or type/alias of the model and hold it in a way that can be recalled via that same filename and/or type/alias. So we could run `load_model_into_memory.exe whisper` or `load_model_into_memory.exe c:\whatever\model_folder`.

Which I'm sure is much easier said than done.

But at this point 50% of my energy bill is going to reloading the model 500+ times a day. I hit a lifetime-record electric bill of $620 this month, and I’ve only done 10% of my collection, with ≥50% to go. We’re talking 3-5 more months of hugely increased electric bills and the environmental impact that comes with them.

I’ve had people call me a monster for not pausing my music when I leave the house. That was quite over-the-top, but my point is that some people care about this more than others, and for those people it’s a way to mitigate the ethical concerns around the environmental impact of AI.

My January electric bill is usually $370-$470, and this January it was $620, partly due to the extra-cold month and us setting the heat a bit higher now that we’re older... but no doubt the lifetime record was partly achieved by running my GPU non-stop, and I expect a huge bill at the end of February too.

And I have months to go. I’ve gotten transcription compliance for my 60,000-song collection from an initial 29% up to 45%, but that’s not even halfway; I’ve contributed 14% myself with the 7,200 SRTs I’ve generated.

All of this would finish so much faster without that model-load time. It would halve the impact. And the cost. We’re talking several hundred dollars being thrown out the window to further warm our planet. Nobody has an agent like this, and I’m sure it would be rapidly picked up by other coders as a way of speeding up workflows and mitigating some of the ethical concerns with AI usage. I almost feel it would be award-worthy for introducing a concept that, if it caught on, could greatly mitigate various ethical and environmental concerns. Concerns I don’t care about too much personally... I care about my electric bill, lol.

-𝓒𝓵𝓪𝓲𝓻𝓮

p.s. Alas, batch_dir is not useful for me, as I have a lot of per-file processing that occurs outside of Whisper, to the point of using 6 different alternate data stream tags to manage my music files and their status while passing through this workflow.
On Fri, Feb 14, 2025 at 4:48 AM Purfview ***@***.***> wrote:

> Currently such a feature is not implemented.
> I'm not sure how it would work; I guess subsequent commands could be passed with a pipe.

---

This is probably a remarkably naive question, but I can't find any method to put the command-line options into a text file and run like this: …
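
I haven't seen a config-file option documented either; one portable workaround is a small batch wrapper that keeps the options in one editable place (all names here are placeholders):

```
:: my_whisper_opts.cmd -- %* forwards whatever files you pass on the
:: command line; edit the flags here instead of retyping them.
@echo off
faster-whisper-xxl.exe %* --language en --vad_method pyannote_onnx_v3
```

Then `my_whisper_opts.cmd recording.mp3` runs with the saved options.
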

---

Thank you, @ClaireCJS. Another seemingly obvious question: I am using "--without_timestamps true" but I'm still getting output with timestamps. Maybe "true" is the wrong parameter value.
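
If the end goal is plain text without timestamps, it may be simpler to request a text output format; the flag below follows the usual Whisper CLI convention, so check --help for your build before relying on it:

```
:: txt output carries no per-segment timestamps (flag name assumed).
faster-whisper-xxl.exe lecture.mp3 --output_format txt
```
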

---

Hello …

---

A new VAD called ten-vad is out. It shows superior precision compared to Silero VAD, and offers lower computational complexity and reduced memory usage.

---

EDIT1: Don't post your questions here; it's already littered with random posts.

Includes all Standalone Faster-Whisper features + the additional ones mentioned below.
Includes all needed libs.
Vocal extraction model:

- `--ff_mdx_kim2`: Preprocess audio with the MDX23 Kim vocal v2 model (thanks to Kimberley Jensen). [Better than HT Demucs v4 FT]

Alternative VAD (Voice Activity Detection) methods, via `--vad_method`:

- `silero_v3` - Generally less accurate than v4, but doesn't have some of v4's quirks.
- `silero_v4` - Same as `silero_v4_fw`, but runs the original Silero code instead of the adapted one.
- `silero_v5` - Same as `silero_v5_fw`, but runs the original Silero code instead of the adapted one.
- `silero_v4_fw` - Default model. The most accurate Silero version; has some non-fatal quirks.
- `silero_v5_fw` - Bad accuracy. Not a VAD, it's a Random Detector of Some Speech :), with various fatal quirks. Avoid!
- `pyannote_v3` - The best accuracy; supports CUDA.
- `pyannote_onnx_v3` - Lite version of `pyannote_v3`. Similar accuracy to Silero v4, maybe a bit better; supports CUDA.
- `webrtc` - Low accuracy, outdated VAD. Takes only 'vad_min_speech_duration_ms' & 'vad_speech_pad_ms'.
- `auditok` - Actually not a VAD; it's AAD - Audio Activity Detection.

Speaker Diarization, via `--diarize`:

- `pyannote_v3.0` - Fastest on CPU.
- `pyannote_v3.1` - Same as v3.0, but should be faster with CUDA.
- `reverb_v1` - Allegedly better than pyannote v3.
- `reverb_v2` - The slowest; allegedly the best.

For more, read and post there -> Speaker Diarization. (A combined usage sketch follows at the end of this post.)
Legal notice: Reverb models are only for personal non-profit use.
Latest CTranslate2:

- Up to ~26% faster on CPU with the int8 quantizations.
- Flash attention support (CUDA only), but benchmarks show no effect on performance.
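
Putting the feature groups above together, a hypothetical single invocation (executable name and input file are placeholders; the flags are from the lists above):

```
:: Vocal extraction + alternative VAD + diarization in one pass.
faster-whisper-xxl.exe episode.mkv --ff_mdx_kim2 --vad_method pyannote_v3 --diarize pyannote_v3.0
```
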