Eval bug: GPT-OSS-120B: Vulkan backend fails to allocate KV cache with OOM error, despite enough free memory #15120

@Mushoz

Description

Name and Version

[docker@dd353b48e141 ~]$ llama-server --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
version: 6096 (fd1234c)
built with cc (GCC) 15.1.1 20250729 for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

AMD 395+ Strix Halo APU with 8060s iGPU

Models

GPT-OSS-120B

Problem description & steps to reproduce

llama-server -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 0 --host 0.0.0.0 --port 9000 -ngl 999

This gives the following error:

ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648

The model itself is under 70 GB, and there is 128 GB of GTT memory available, so total memory is not the problem: the failure is a single 17482395648-byte (~16.3 GiB) KV-cache buffer that exceeds the device's per-allocation limit. (With -c 0, llama-server uses the model's full trained context length, which is what makes the KV cache this large.) The issue seems specific to GPT-OSS-120B, since bigger models (such as Qwen3-235B in Q3_K_XL) do not show the same failure.
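
Since the log complains about the allocation limit rather than the heap, the first thing to check is the device's VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize (vulkaninfo prints the same value). Below is a minimal standalone sketch to dump it, assuming a Vulkan 1.1 loader and the standard headers; the file name and build line are only illustrative:

// check_alloc_limit.cpp - print each Vulkan device's maxMemoryAllocationSize
// build: g++ check_alloc_limit.cpp -o check_alloc_limit -lvulkan
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Request a Vulkan 1.1 instance so vkGetPhysicalDeviceProperties2 is core.
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t n = 0;
    vkEnumeratePhysicalDevices(instance, &n, nullptr);
    std::vector<VkPhysicalDevice> devs(n);
    vkEnumeratePhysicalDevices(instance, &n, devs.data());

    for (VkPhysicalDevice dev : devs) {
        // maxMemoryAllocationSize lives in the maintenance3 property struct.
        VkPhysicalDeviceMaintenance3Properties m3 = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
        props.pNext = &m3;
        vkGetPhysicalDeviceProperties2(dev, &props);
        printf("%s: maxMemoryAllocationSize = %llu bytes (%.2f GiB)\n",
               props.properties.deviceName,
               (unsigned long long) m3.maxMemoryAllocationSize,
               m3.maxMemoryAllocationSize / 1073741824.0);
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}

If the reported value is below 17482395648, a KV-cache buffer of that size can never be allocated in one piece no matter how much GTT is free, and retrying with a smaller explicit context (e.g. -c 65536 instead of -c 0) should confirm the diagnosis. If I recall correctly, the Vulkan backend also reads a GGML_VK_FORCE_MAX_ALLOCATION_SIZE environment variable that overrides the limit ggml assumes, but treat that name as unverified.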

First Bad Commit

No response

Relevant log output

ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648
