Why Does faster-whisper-large-v2 Use More Than Twice the VRAM on RTX 5060 Compared to RTX 2060? #471
I am experiencing a significant difference in VRAM usage when running the faster-whisper-large-v2 model on different GPUs. On an RTX 2060 (6 GB VRAM), the quantized model (int8_float16) runs smoothly and uses about 4 GB of VRAM. On an RTX 5060 (8 GB VRAM), however, the same model with the same quantization settings exceeds 8 GB of VRAM, causing out-of-memory errors and preventing the model from running at all. This is unexpected: the RTX 5060 is a newer-generation GPU with more VRAM, so it should not consume more memory than the older 2060, let alone more than double.
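For reference, here is a minimal sketch of the loading code in question, assuming the standard faster-whisper Python API (the audio file name is a placeholder); the same call is used on both GPUs:

```python
from faster_whisper import WhisperModel

# Load large-v2 with int8_float16 quantization, as described above.
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

# "audio.wav" is a placeholder input file.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```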
Sorry, but I can't make sense of your post. Please post screenshots of what you are doing there.

I still don't see what you are doing there. Where are the parameters you used? Where is the verbose output?
What software is this? Maybe you should ask at the place of that software, because your OP doesn't make sense; if there is such a big difference in memory usage, obviously the same quantization is not actually being used on both GPUs.
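One thing worth checking is whether int8_float16 is actually supported on both GPUs. A sketch using CTranslate2 (the inference engine underneath faster-whisper) to list the compute types the default CUDA device supports:

```python
import ctranslate2

# Compute types this GPU actually supports. If "int8_float16" is not
# in the returned set, CTranslate2 silently falls back to a supported
# type, which changes the memory footprint.
print(ctranslate2.get_supported_compute_types("cuda"))
```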