Replies: 1 comment
What happens depends on the NVIDIA driver settings, specifically the Sysmem Fallback Policy. When it is enabled, an overloaded GPU automatically spills into system memory (which actually makes things slower); when it is disabled, the program exits instead. You can look up how to change the Sysmem Fallback Policy for your setup, but the better approach is simply to offload fewer layers, which will give you better results.
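If you want to pick a safe layer count up front instead of relying on the driver's fallback, here is a minimal sketch (assuming the `nvidia-ml-py`/`pynvml` bindings and a single GPU) that estimates how many layers of a GGUF file fit in free VRAM. The per-layer size estimate and the headroom value are rough assumptions, not KoboldCpp's own logic, and the model file name is hypothetical; the result is meant to be passed to KoboldCpp's `--gpulayers` flag.

```python
# Rough heuristic, NOT KoboldCpp's own allocation logic: estimate how many
# layers fit in free VRAM so you can choose --gpulayers up front instead of
# relying on the driver's sysmem fallback.
import os
import pynvml  # pip install nvidia-ml-py

def suggest_gpulayers(model_path: str, total_layers: int, headroom_mib: int = 1536) -> int:
    """Return a conservative layer count to pass to koboldcpp's --gpulayers."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        free_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).free
    finally:
        pynvml.nvmlShutdown()

    model_bytes = os.path.getsize(model_path)
    per_layer = model_bytes / total_layers        # assumes roughly equal layer sizes
    budget = free_bytes - headroom_mib * 1024**2  # keep room for KV cache / CUDA overhead
    return max(0, min(total_layers, int(budget // per_layer)))

if __name__ == "__main__":
    # Hypothetical file name; 41 matches the model discussed in this thread.
    n = suggest_gpulayers("model.Q4_K_M.gguf", total_layers=41)
    print(f"Try: python koboldcpp.py model.Q4_K_M.gguf --gpulayers {n}")
```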
Does Kobold on Linux offload layers to the CPU the same way as Kobold on Windows? I'm trying to use a model that needs 41 layers. Here's the question: on Windows, even though I set it to 41 layers, it automatically offloads some of them to the CPU. On Linux, however, it crashes and says CUDA couldn't allocate the memory. I understand why, so I lowered it to the most I could fit, 32 layers, but the performance is much worse than on Windows. Why is that, and is there anything I can do?
Both systems were tested on the same PC.