Why Does faster-whisper-large-v2 Use More Than Twice the VRAM on RTX 5060 Compared to RTX 2060? #471
I am experiencing a significant difference in VRAM usage when running the faster-whisper-large-v2 model on different GPUs. On an RTX 2060 (6 GB VRAM), the quantized model (int8_float16) runs smoothly and uses about 4 GB of VRAM. On an RTX 5060 (8 GB VRAM), however, the same model with the same quantization settings exceeds 8 GB of VRAM, causing out-of-memory errors and preventing the model from running at all. This is unexpected: the RTX 5060 is a newer-generation GPU with more VRAM, so it should not consume more memory than the older 2060, let alone more than double.
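For reference, here is a minimal sketch of the loading code in question, assuming the standard faster-whisper Python API (the audio file name is a placeholder); the same call is used on both GPUs:

```python
from faster_whisper import WhisperModel

# Load large-v2 with int8_float16 quantization, as described above.
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

# "audio.wav" is a placeholder input file.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```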
Sorry, but I can't make sense of your post. Please post screenshots of what you are doing there.

I still don't see what you are doing there. Where are the parameters you used? Where is the verbose output?
What software is this? Maybe you should ask at the place of that software, because your OP doesn't make sense; if there is such a big difference in memory usage, obviously the same quantization is not actually being used on both GPUs.
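One thing worth checking is whether int8_float16 is actually supported on both GPUs. A sketch using CTranslate2 (the inference engine underneath faster-whisper) to list the compute types the default CUDA device supports:

```python
import ctranslate2

# Compute types this GPU actually supports. If "int8_float16" is not
# in the returned set, CTranslate2 silently falls back to a supported
# type, which changes the memory footprint.
print(ctranslate2.get_supported_compute_types("cuda"))
```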