Eval bug: GPT-OSS-120B: Vulkan backend fails to allocate KV cache with OOM error, despite enough free memory #15120

@Mushoz

Description

Name and Version

[docker@dd353b48e141 ~]$ llama-server --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
version: 6096 (fd1234c)
built with cc (GCC) 15.1.1 20250729 for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

AMD 395+ Strix Halo APU with 8060s iGPU

Models

GPT-OSS-120B

Problem description & steps to reproduce

llama-server -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 0 --host 0.0.0.0 --port 9000 -ngl 999

This gives the following error:

ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648

The model itself is under 70 GB, and there is 128 GB of GTT memory available, so total memory is not the problem: the failure is a single 17482395648-byte (~16.3 GiB) KV-cache buffer that exceeds the device's per-allocation limit. (With -c 0, llama-server uses the model's full trained context length, which is what makes the KV cache this large.) The issue seems specific to GPT-OSS-120B, since bigger models (such as Qwen3-235B in Q3_K_XL) do not show the same failure.
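
Since the log complains about the allocation limit rather than the heap, the first thing to check is the device's VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize (vulkaninfo prints the same value). Below is a minimal standalone sketch to dump it, assuming a Vulkan 1.1 loader and the standard headers; the file name and build line are only illustrative:

// check_alloc_limit.cpp - print each Vulkan device's maxMemoryAllocationSize
// build: g++ check_alloc_limit.cpp -o check_alloc_limit -lvulkan
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Request a Vulkan 1.1 instance so vkGetPhysicalDeviceProperties2 is core.
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t n = 0;
    vkEnumeratePhysicalDevices(instance, &n, nullptr);
    std::vector<VkPhysicalDevice> devs(n);
    vkEnumeratePhysicalDevices(instance, &n, devs.data());

    for (VkPhysicalDevice dev : devs) {
        // maxMemoryAllocationSize lives in the maintenance3 property struct.
        VkPhysicalDeviceMaintenance3Properties m3 = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES };
        VkPhysicalDeviceProperties2 props = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2 };
        props.pNext = &m3;
        vkGetPhysicalDeviceProperties2(dev, &props);
        printf("%s: maxMemoryAllocationSize = %llu bytes (%.2f GiB)\n",
               props.properties.deviceName,
               (unsigned long long) m3.maxMemoryAllocationSize,
               m3.maxMemoryAllocationSize / 1073741824.0);
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}

If the reported value is below 17482395648, a KV-cache buffer of that size can never be allocated in one piece no matter how much GTT is free, and retrying with a smaller explicit context (e.g. -c 65536 instead of -c 0) should confirm the diagnosis. If I recall correctly, the Vulkan backend also reads a GGML_VK_FORCE_MAX_ALLOCATION_SIZE environment variable that overrides the limit ggml assumes, but treat that name as unverified.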

First Bad Commit

No response

Relevant log output

ggml_vulkan: Device memory allocation of size 17482395648 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 17482395648
