Name and Version
[docker@dd353b48e141 ~]$ llama-server --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
version: 6096 (fd1234c)
built with cc (GCC) 15.1.1 20250729 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
AMD Ryzen AI Max+ 395 (Strix Halo) APU with Radeon 8060S iGPU
Models
GPT-OSS-120B
Problem description & steps to reproduce
llama-server -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 0 --host 0.0.0.0 --port 9000 -ngl 999
This gives the following error:
ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648
The model itself is under 70 GB, and there is 128 GB of GTT memory available. Note that the failing allocation is a single buffer of 17482395648 bytes (about 16.3 GiB), so this appears to be a per-allocation limit being exceeded rather than overall memory exhaustion. The issue seems specific to GPT-OSS-120B, since larger models (such as Qwen3-235B in Q3_K_XL) do not show the same behavior.
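For reference, the per-allocation ceiling the driver reports can be checked with vulkaninfo (assuming the Vulkan tools are installed); the relevant field is maxMemoryAllocationSize under VkPhysicalDeviceMaintenance3Properties:

vulkaninfo | grep -i maxMemoryAllocationSize

As a possible workaround (untested here, and assuming the Vulkan backend still honors this environment variable), GGML_VK_FORCE_MAX_ALLOCATION_SIZE can cap the size of individual buffer allocations so each one stays under the driver limit, e.g. forcing 4 GiB chunks:

GGML_VK_FORCE_MAX_ALLOCATION_SIZE=4294967296 llama-server -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 0 --host 0.0.0.0 --port 9000 -ngl 999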
First Bad Commit
No response
Relevant log output
ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648