OpenCL issue on AMD AI Max+ Pro 395+ #1083

@LispEngineer

Hi folks,

I'm using KataGo v1.16.3 (OpenCL, Linux x64) on an HP ZBook Ultra G1a, which has an AMD Strix Halo (Ryzen AI Max+ PRO 395) CPU/GPU/NPU in it. I have 128 GB of RAM and allocated 32 GB of it to the GPU in the BIOS.

I'm running the OEM Ubuntu 24.04. To get KataGo to run, I installed the mesa-opencl-icd package, version 24.2.8-1ubuntu1~24.04.1.

KataGo launches, but after a while it crashes and dumps core, or sometimes just locks up the whole machine hard. The command I run is:

./katago benchmark -model kata1-b28c512nbt-s9584861952-d4960414494.bin.gz -config default_gtp.cfg

The output I get is:

2025-07-13 14:06:50-0400: Running with following config:
allowResignation = true
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logSearchInfoForChosenMove = false
logToStderr = false
maxTimePondering = 60.0
maxVisits = 500
numSearchThreads = 6
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.90
rules = tromp-taylor
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

2025-07-13 14:06:50-0400: Loading model and initializing benchmark...
2025-07-13 14:06:50-0400: Testing with default positions for board size: 19
2025-07-13 14:06:50-0400: nnRandSeed0 = 5831653519054986926
2025-07-13 14:06:50-0400: After dedups: nnModelFile0 = kata1-b28c512nbt-s9584861952-d4960414494.bin.gz useFP16 auto useNHWC auto
2025-07-13 14:06:50-0400: Initializing neural net buffer to be size 19 * 19 exactly
2025-07-13 14:06:52-0400: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2025-07-13 14:06:52-0400: Found OpenCL Platform 1: rusticl (Mesa/X.org) (OpenCL 3.0 )
2025-07-13 14:06:52-0400: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
2025-07-13 14:06:52-0400: Found OpenCL Device 0: AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) (AMD) (score 11000101)
2025-07-13 14:06:52-0400: Creating context for OpenCL Platform: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Using OpenCL Device 0: AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) (AMD) OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1 (Extensions: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning)
2025-07-13 14:06:52-0400: No existing tuning parameters found or parseable or valid at: /home/dfields/.katago/opencltuning/tune11_gpuAMDRadeonGraphicsradeonsigfx1151LLVM1911DRM36161101025oem_x19_y19_c512_mv15.txt
2025-07-13 14:06:52-0400: Performing autotuning
2025-07-13 14:06:52-0400: *** On some systems, this may take several minutes, please be patient ***
2025-07-13 14:06:52-0400: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2025-07-13 14:06:52-0400: Found OpenCL Platform 1: rusticl (Mesa/X.org) (OpenCL 3.0 )
2025-07-13 14:06:52-0400: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
2025-07-13 14:06:52-0400: Found OpenCL Device 0: AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) (AMD) (score 11000101)
2025-07-13 14:06:52-0400: Creating context for OpenCL Platform: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Using OpenCL Device 0: AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) (AMD) OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1 (Extensions: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning)
Beginning GPU tuning for AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) modelVersion 15 channels 512
2025-07-13 14:06:52-0400: Dummy tuning thread starting
2025-07-13 14:06:52-0400: Creating context for OpenCL Platform: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Using OpenCL Device 0: AMD Radeon Graphics (radeonsi, gfx1151, LLVM 19.1.1, DRM 3.61, 6.11.0-1025-oem) (AMD) OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1 (Extensions: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning)
Setting winograd3x3TileSize = 4
------------------------------------------------------
Tuning xGemmDirect for 1x1 convolutions and matrix mult
Testing 55 different configs
amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Aborted (core dumped)

At this point, I sometimes get a command prompt back, and other times the machine locks up hard. Either way, during the run the screen goes blank for a moment once or twice, and the mouse stops responding for a few moments now and again.

Does anyone have any thoughts on how to get KataGo to run on this AMD Ryzen AI Max+ PRO 395 (with Radeon 8060S graphics) using either the GPU or the NPU under Ubuntu 24.04, please?
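In case it helps anyone triaging a similar log, here is a minimal Python sketch that summarizes what the output above reports: which OpenCL platforms were found, how many devices each exposed, and whether the amdgpu "context is lost" message appeared. The `summarize` helper and the regular expressions are my own illustration, not part of KataGo, and they assume the log line wording shown in this issue.

```python
import re

# A few lines copied verbatim from the benchmark output in this issue.
LOG = """\
2025-07-13 14:06:52-0400: Found OpenCL Platform 0: Clover (Mesa) (OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1)
2025-07-13 14:06:52-0400: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2025-07-13 14:06:52-0400: Found OpenCL Platform 1: rusticl (Mesa/X.org) (OpenCL 3.0 )
2025-07-13 14:06:52-0400: Found 0 device(s) on platform 1 with type CPU or GPU or Accelerator, skipping
amdgpu: The CS has cancelled because the context is lost. This context is innocent.
"""

# Hypothetical patterns, matched against the wording seen in the log above.
PLATFORM_RE = re.compile(r"Found OpenCL Platform (\d+): (.+)$")
DEVICES_RE = re.compile(r"Found (\d+) device\(s\) on platform (\d+)")


def summarize(log_text):
    """Return ({platform_index: {"name", "devices"}}, context_lost_flag)."""
    platforms = {}
    context_lost = False
    for line in log_text.splitlines():
        match = PLATFORM_RE.search(line)
        if match:
            entry = platforms.setdefault(
                int(match.group(1)), {"name": None, "devices": None}
            )
            entry["name"] = match.group(2)
            continue
        match = DEVICES_RE.search(line)
        if match:
            entry = platforms.setdefault(
                int(match.group(2)), {"name": None, "devices": None}
            )
            entry["devices"] = int(match.group(1))
            continue
        if "context is lost" in line:
            context_lost = True
    return platforms, context_lost


platforms, context_lost = summarize(LOG)
for index in sorted(platforms):
    info = platforms[index]
    print(f"platform {index}: {info['name']} -> {info['devices']} device(s)")
print("amdgpu context lost:", context_lost)
```

On the log in this issue, the summary makes the situation easy to see at a glance: the Clover platform exposes the one device KataGo ends up using, while the rusticl platform is listed but reports zero devices and is skipped.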

Thanks!
