@spalicki spalicki commented Nov 19, 2025

Allow allocating buffers larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE, up to the limit set by CL_DEVICE_GLOBAL_MEM_SIZE.
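
For reference, the size check described here can be sketched roughly as follows. This is a minimal illustration, not code from this PR: `alloc_kind_t` and `classify_alloc` are hypothetical names, and both limits are assumed to have been queried beforehand with `clGetDeviceInfo`.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from this PR: classifies a requested buffer
// size against the two device limits named in the description.
enum class alloc_kind_t {
    regular, // fits within CL_DEVICE_MAX_MEM_ALLOC_SIZE
    large, // exceeds it, but still fits within CL_DEVICE_GLOBAL_MEM_SIZE
    too_big // exceeds CL_DEVICE_GLOBAL_MEM_SIZE
};

// max_alloc_size and global_mem_size are assumed to hold the values reported
// by clGetDeviceInfo() for the two device parameters above.
inline alloc_kind_t classify_alloc(std::size_t size,
        std::size_t max_alloc_size, std::size_t global_mem_size) {
    if (size > global_mem_size) return alloc_kind_t::too_big;
    if (size > max_alloc_size) return alloc_kind_t::large;
    return alloc_kind_t::regular;
}
```

Only the `large` case would need the relaxed allocation path; `regular` requests can keep using the existing one.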

@spalicki spalicki self-assigned this Nov 19, 2025
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel component:graph-api Codeowner: @oneapi-src/onednn-graph component:tests Codeowner: @oneapi-src/onednn-arch component:common labels Nov 19, 2025
@spalicki spalicki force-pushed the spalicki/add_ocl_4gb_flag branch from 83b8209 to bdf6bda Compare November 19, 2025 20:00
@spalicki spalicki marked this pull request as ready for review November 19, 2025 20:00
@spalicki spalicki requested review from a team as code owners November 19, 2025 20:00
@spalicki
Contributor Author

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb

bool large_buffer = size
> utils::downcast<const xpu::ocl::engine_impl_t *>(engine->impl())
->max_allocation_size();
static cl_bitfield properties[]
Contributor

Minor: I'm not sure static is a good fit for a properties object that's passed to CL calls, especially if it needs to be extended in the future. The current code also doesn't provide a scalable way to expand it; a TODO noting that could help point in that direction when/if needed.
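
One possible shape for a non-static, extensible properties list is sketched below. This is only an illustration under stated assumptions: `props_t`, `k_flags_key`, `k_unrestricted_size`, and `make_buffer_properties` are hypothetical, and the constant values are placeholders rather than the real values from the OpenCL headers.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins so the sketch compiles without OpenCL headers. Real code would
// use cl_bitfield and the macros from <CL/cl.h> and <CL/cl_ext.h> instead of
// these placeholder values.
using props_t = std::uint64_t;
constexpr props_t k_flags_key = 0x1; // placeholder for the flags property key
constexpr props_t k_unrestricted_size
        = 0x2; // placeholder for CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL

// Builds a zero-terminated {key, value, ..., 0} list on every call instead of
// keeping it in a function-local static array.
inline std::vector<props_t> make_buffer_properties(bool large_buffer) {
    std::vector<props_t> props;
    if (large_buffer) {
        props.push_back(k_flags_key);
        props.push_back(k_unrestricted_size);
    }
    // Additional key/value pairs could be appended here later.
    props.push_back(0); // list terminator
    return props;
}
```

The caller would pass `props.data()` to the properties-based creation call; because the vector is built per call, adding another key/value pair does not touch shared state.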

#endif

int get_gpu_ram_sizes(size_t &ram_size, size_t &max_alloc_size) {
int get_gpu_ram_size(size_t &ram_size) {
Contributor

Did you have a chance to verify that the change works with all 4 supported memory kinds, in both correctness mode and fast performance mode, where a different approach is used for memory object management?

As a part of this question also: should this call be updated?

Contributor Author

@spalicki spalicki Nov 20, 2025

Well, it seems that CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL is truly unrestricted: it allows allocations of any size, even beyond the GPU VRAM. So, I am going to have to add an additional guard.
Never mind, it looks like a driver bug.

According to the documentation, the flag only applies to clCreateBuffer, clCreateBufferWithProperties, clCreateBufferWithPropertiesINTEL, clSVMAlloc, clSharedMemAllocINTEL, clDeviceMemAllocINTEL and clHostMemAllocINTEL.

@spalicki spalicki force-pushed the spalicki/add_ocl_4gb_flag branch 5 times, most recently from f3bf210 to 79caa93 Compare November 24, 2025 18:10
@spalicki spalicki force-pushed the spalicki/add_ocl_4gb_flag branch from 79caa93 to 179f908 Compare November 24, 2025 18:31
@spalicki
Contributor Author

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb

@atkassen
Contributor

make test perf-gpu

4 participants