Skip to content

vulkan : add GGML_VK_FORCE_HEAP_INDEX env var #9734

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 15 additions & 3 deletions ggml/src/ggml-vulkan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@ struct vk_device_struct {
vk::PhysicalDeviceProperties properties;
std::string name;
uint64_t max_memory_allocation_size;
uint32_t force_heap_index;
bool fp16;
vk::Device device;
uint32_t vendor_id;
Expand Down Expand Up @@ -1008,9 +1009,12 @@ static void ggml_vk_queue_cleanup(vk_device& device, vk_queue& q) {
q.cmd_buffer_idx = 0;
}

static uint32_t find_properties(const vk::PhysicalDeviceMemoryProperties* mem_props, vk::MemoryRequirements* mem_req, vk::MemoryPropertyFlags flags) {
static uint32_t find_properties(const vk::PhysicalDeviceMemoryProperties* mem_props, vk::MemoryRequirements* mem_req, vk::MemoryPropertyFlags flags, uint32_t force_heap_index = UINT32_MAX) {
for (uint32_t i = 0; i < mem_props->memoryTypeCount; ++i) {
vk::MemoryType memory_type = mem_props->memoryTypes[i];
if (force_heap_index != UINT32_MAX && memory_type.heapIndex != force_heap_index) {
continue;
}
if ((mem_req->memoryTypeBits & ((uint64_t)1 << i)) &&
(flags & memory_type.propertyFlags) == flags &&
mem_props->memoryHeaps[memory_type.heapIndex].size >= mem_req->size) {
Expand Down Expand Up @@ -1053,11 +1057,11 @@ static vk_buffer ggml_vk_create_buffer(vk_device& device, size_t size, vk::Memor

uint32_t memory_type_index = UINT32_MAX;

memory_type_index = find_properties(&mem_props, &mem_req, req_flags);
memory_type_index = find_properties(&mem_props, &mem_req, req_flags, device->force_heap_index);
buf->memory_property_flags = req_flags;

if (memory_type_index == UINT32_MAX && fallback_flags) {
memory_type_index = find_properties(&mem_props, &mem_req, fallback_flags);
memory_type_index = find_properties(&mem_props, &mem_req, fallback_flags, device->force_heap_index);
buf->memory_property_flags = fallback_flags;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this fallback already handle that situation? That's basically what it was introduced for. You have a smaller (DEVICE_LOCAL) and a larger heap (DEVICE_LOCAL, HOST_VISIBLE and HOST_COHERENT), so once the small one is exhausted you fall back to the larger one.
Can you add the vulkaninfo of the device where this fallback does not work and describe why?

Copy link
Author

@gyf304 gyf304 Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vulkaninfo.txt

The problem is that find_properties doesn't take in account about memory usage (as vk::PhysicalDeviceMemoryProperties reports total mem, not available mem)
On boot, with a few chrome tabs open, my laptop consumes 2.8/4GiB of the smaller heap, so if it tries to allocate 2GiB, it will fail.

One could argue the better fix is to either

  • use VK_EXT_memory_budget for querying actual memory usage in find_properties, so it doesn't select a memory type that does not have enough available memory
  • use a loop while allocating, if the allocation fails with ErrorOutOfDeviceMemory, try the next available memory type
  • or a combination of both

I did the minimal viable solution here, which is to force allocation from a known bigger heap (in my case, heap 1).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gyf304 Thank you.

Take a look at ggml-org/whisper.cpp#2451. This might be the kind of fallback that you described in your second point. Can you check whether that would fix your issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I just pulled ggml-org/whisper.cpp#2451 and will soon sync it upstream with ggml and llama.cpp. Not sure if it fixes the problem here, but it worked for a couple of whisper.cpp-related issues.


Expand Down Expand Up @@ -1851,6 +1855,14 @@ static vk_device ggml_vk_get_device(size_t idx) {
device->max_memory_allocation_size = props3.maxMemoryAllocationSize;
}

const char* GGML_VK_FORCE_HEAP_INDEX = getenv("GGML_VK_FORCE_HEAP_INDEX");

if (GGML_VK_FORCE_HEAP_INDEX != nullptr) {
device->force_heap_index = std::stoi(GGML_VK_FORCE_HEAP_INDEX);
} else {
device->force_heap_index = UINT32_MAX;
}

device->vendor_id = device->properties.vendorID;
device->subgroup_size = subgroup_props.subgroupSize;
device->uma = device->properties.deviceType == vk::PhysicalDeviceType::eIntegratedGpu;
Expand Down