Replies: 1 comment
What happens depends on the NVIDIA driver settings, specifically the Sysmem Fallback Policy. When it is enabled, an overloaded GPU automatically spills into system memory (which actually makes things slower); when it is disabled, the program exits instead. You can look up how to change the Sysmem Fallback Policy for your setup, but the better approach is simply to offload fewer layers, which will give you better results.
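If you want to pick a safe layer count up front instead of relying on the driver's fallback, here is a minimal sketch (assuming the `nvidia-ml-py`/`pynvml` bindings and a single GPU) that estimates how many layers of a GGUF file fit in free VRAM. The per-layer size estimate and the headroom value are rough assumptions, not KoboldCpp's own logic, and the model file name is hypothetical; the result is meant to be passed to KoboldCpp's `--gpulayers` flag.

```python
# Rough heuristic, NOT KoboldCpp's own allocation logic: estimate how many
# layers fit in free VRAM so you can choose --gpulayers up front instead of
# relying on the driver's sysmem fallback.
import os
import pynvml  # pip install nvidia-ml-py

def suggest_gpulayers(model_path: str, total_layers: int, headroom_mib: int = 1536) -> int:
    """Return a conservative layer count to pass to koboldcpp's --gpulayers."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        free_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).free
    finally:
        pynvml.nvmlShutdown()

    model_bytes = os.path.getsize(model_path)
    per_layer = model_bytes / total_layers        # assumes roughly equal layer sizes
    budget = free_bytes - headroom_mib * 1024**2  # keep room for KV cache / CUDA overhead
    return max(0, min(total_layers, int(budget // per_layer)))

if __name__ == "__main__":
    # Hypothetical file name; 41 matches the model discussed in this thread.
    n = suggest_gpulayers("model.Q4_K_M.gguf", total_layers=41)
    print(f"Try: python koboldcpp.py model.Q4_K_M.gguf --gpulayers {n}")
```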
Does Kobold on Linux offload layers to the CPU the same way as Kobold on Windows? I'm trying to use a model that needs 41 layers. Here's the question: on Windows, even though I set it to 41 layers, it automatically offloads some of them to the CPU. On Linux, however, it crashes and says CUDA couldn't allocate the memory. I understand why, so I lowered it to the most I could fit, 32 layers, but the performance is much worse than on Windows. Why is that, and is there anything I can do?
Both systems were tested on the same PC.