EXO Kernel Panic Bug Report: IOGPUMemory::completeMemory() Prepare Count Underflow on Apple M4 Max 128GB
Report Date: April 23, 2026
Submitted By: EXO user (Shag, Houston TX)
Target: EXO Development Team (exo-explore/exo)
Severity: Critical — results in full macOS kernel panic and forced reboot
Reproducibility: Confirmed reproducible (4 occurrences on affected device; 0 occurrences on comparison device under identical setup)
1. Summary
Running EXO distributed inference on an Apple M4 Max MacBook Pro with 128GB unified memory (chip identifier T6041) on macOS 26.4 (Darwin 25.4.0, IOGPUFamily 130.13) produces a reproducible kernel panic from IOGPUMemory::completeMemory() with the assertion "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550.
The same EXO version, macOS version, Thunderbolt 5 cluster setup, and model configuration do not produce this panic on an M4 Pro MacBook Pro with 48GB unified memory. The panic always occurs in a thread owned by the exo process.
This appears to be a bug in Apple's IOGPUFamily.kext, but EXO's allocation pattern (large Metal buffers during distributed inference) is a reliable trigger. This issue is being filed so EXO can implement user-space mitigations (guards, logging) and track affected Apple driver versions.
2. Environment
2.1 Panicking machine
- Model: MacBook Pro (14-inch, M4 Max, 2024)
- SoC: Apple M4 Max, 16-core CPU / 40-core GPU (T6041)
- Unified memory: 128GB
- Memory bandwidth: 546 GB/s (512-bit bus)
- macOS: 26.4, build 25E253
- Kernel: Darwin 25.4.0; xnu-12377.101.15~1/RELEASE_ARM64_T6041
- IOGPUFamily: 130.13
- iBoot: mBoot-18000.101.7
- EXO: version unknown
- Connection: Thunderbolt 5 cable to EXO cluster
2.2 Comparison machine (no panics)
- Model: MacBook Pro (14-inch, M4 Pro, 2024)
- SoC: Apple M4 Pro, 14-core CPU / 20-core GPU
- Unified memory: 48GB
- Memory bandwidth: 273 GB/s (256-bit bus)
- macOS: 26.4, build 25E253
- EXO: same version and config as above
- Connection: same Thunderbolt 5 setup, same EXO cluster
3. Panic details
Key fields from one representative panic log:
| Field | Value |
| --- | --- |
| Panic string | "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550 |
| Panicking process | pid 1577: exo |
| OS version | 25E253 |
| Kernel version | Darwin Kernel Version 25.4.0; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041 |
| IOGPUFamily | 130.13 |
| Panicking core | core 14 (PACC2 cluster) |
| Panicked task | 127674 pages, 31 threads: pid 1577: exo |
| Panicked thread | tid: 18654 (thread owned by exo) |
| Wired pages at crash | 127,674 pages (~1.95 GB at the 16 KB Apple Silicon page size) |
| Cores 4–9 (PACC1) | offline (normal power management) at time of panic |
The backtrace shows the thread passing through IOGPUFamily functions and panicking inside IOGPUMemory::completeMemory() with the prepare count underflow assertion.
I can attach the full panic log as a file if needed. For now, this is the key excerpt:
```
panic(cpu 14 caller 0xfffffe00515238e8): "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550
…
Panicked task 0xfffffe2adb458ec0: 127674 pages, 31 threads: pid 1577: exo
Panicked thread: 0xfffffe2f99fbb3c8, backtrace: 0xfffffe8de993ec80, tid: 18654
  lr: 0xfffffe0052da5ce0  fp: 0xfffffe8de993f3a0
  lr: 0xfffffe00515238e8  fp: 0xfffffe8de993f3c0   // IOGPUMemory::completeMemory()
  lr: 0xfffffe005150be94  fp: 0xfffffe8de993f430
  lr: 0xfffffe00514f8a58  fp: 0xfffffe8de993f4b0
  lr: 0xfffffe00515096f8  fp: 0xfffffe8de993f4e0
  lr: 0xfffffe00514e8b7c  fp: 0xfffffe8de993fd20
  lr: 0xfffffe00514e8bfc  fp: 0xfffffe8de993fd50
  lr: 0xfffffe0052cef2cc  fp: 0xfffffe8de993fda0
  lr: 0xfffffe005262a6e4  fp: 0xfffffe8de993fe50
  lr: 0xfffffe0052630778  fp: 0xfffffe8de993ff10
```
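For triage, the handful of key fields above can be pulled out of a panic-log excerpt with a small script. This is just a sketch against the excerpt format shown here; real `.panic` files carry many more fields:

```python
import re

def parse_panic(text: str) -> dict:
    """Extract a few key fields from a macOS panic-log excerpt."""
    out = {}
    # panic(cpu N caller 0x...): "panic string"
    m = re.search(r'panic\(cpu (\d+)[^)]*\): "([^"]+)"', text)
    if m:
        out["cpu"] = int(m.group(1))
        out["panic_string"] = m.group(2)
    # ... pid N: process-name
    m = re.search(r"pid (\d+): (\w+)", text)
    if m:
        out["pid"], out["process"] = int(m.group(1)), m.group(2)
    return out

excerpt = (
    'panic(cpu 14 caller 0xfffffe00515238e8): '
    '"completeMemory() prepare count underflow" @IOGPUMemory.cpp:550\n'
    'Panicked task 0xfffffe2adb458ec0: 127674 pages, 31 threads: pid 1577: exo'
)
print(parse_panic(excerpt))
```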
4. Reproduction
Unfortunately I do not have a minimal reproduction script, because this occurs under EXO's distributed inference environment. However, the crash characteristics match existing public bug reports in the MLX ecosystem (e.g., the same panic under mlx_lm.server).
My scenario:
- Run EXO on an M4 Max 128GB MBP as part of a Thunderbolt 5 EXO cluster.
- Use large models (e.g., multi-7B/8x7B/70B class), with long-running sessions and large effective contexts (due to KV cache growth).
- After some time (typically under heavy load / long context), macOS kernel panics with the completeMemory() underflow.
The same workload and cluster setup on the M4 Pro 48GB MBP (14-core CPU, 20-core GPU, 48GB unified memory) has never produced this kernel panic so far.
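As rough context for why long-running sessions push allocations up, the KV cache grows linearly with context length. A back-of-the-envelope estimate, with entirely illustrative dimensions (not a specific model's actual shape):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache footprint: K and V tensors per layer, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Illustrative 70B-class shape: 80 layers, 8 KV heads, head_dim 128,
# 128k-token context, fp16 cache.
print(kv_cache_bytes(80, 8, 128, 131072) / 2**30, "GiB")  # → 40.0 GiB
```

At these sizes, a single machine in the cluster can plausibly hold tens of GB of cache on top of the weights, which is exactly the "very large Metal allocation" regime described above.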
5. Why this looks driver-side, but EXO is the trigger
- The panic is in IOGPUFamily (IOGPUMemory::completeMemory()), which is Apple's GPU kernel extension, not EXO's code.
- The panic string explicitly describes a reference-count underflow in the prepare/complete cycle for GPU memory.
- Independent MLX and mlx-lm reports show the same kernel panic with different apps, suggesting a general driver bug for large Metal GPU workloads.
- EXO's distributed inference patterns (large Metal buffers for attention / KV cache, RDMA over TB5 using MLX/JACCL) supply a reproducible trigger for this bug, especially on large-unified-memory SoCs like the M4 Max 128GB.
I am not asking EXO to fix the kernel, but it may be possible for EXO to implement practical mitigations.
6. Requested EXO-side mitigations
- Context / allocation guard: Provide a configurable context/token limit (or internal guard) to prevent Metal allocations from reaching the size range known to trigger this driver bug (similar to what MLX is discussing in mlx#3186).
- Logging: Add logging around large Metal allocations: model size, batch size, context length, and estimated buffer sizes. That will make future crash reports far more actionable.
- Version / platform warning: Detect large-unified-memory Apple Silicon (e.g., M4 Max 128GB, M3 Ultra 96–192GB) and IOGPUFamily 130.13, then warn users that there is a known macOS kernel bug that can be triggered by very large contexts.
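To make the first two asks concrete, here is a minimal sketch of what a guard-plus-logging hook could look like. All names are hypothetical and this is not EXO's real allocation API; it only illustrates the shape of the mitigation:

```python
import logging

logger = logging.getLogger("exo.metal_guard")

# Hypothetical cap, set below the buffer sizes observed to trigger the
# IOGPUFamily underflow on this machine; would be user-configurable.
MAX_METAL_BUFFER_BYTES = 16 * 2**30  # 16 GiB

def guarded_alloc_size(requested_bytes: int) -> int:
    """Clamp a requested Metal buffer size and log large allocations."""
    if requested_bytes > MAX_METAL_BUFFER_BYTES:
        logger.warning(
            "Metal allocation of %.1f GiB exceeds %.1f GiB guard; clamping",
            requested_bytes / 2**30,
            MAX_METAL_BUFFER_BYTES / 2**30,
        )
        return MAX_METAL_BUFFER_BYTES
    if requested_bytes > 2**30:  # log anything over 1 GiB to aid crash triage
        logger.info("Large Metal allocation: %.1f GiB", requested_bytes / 2**30)
    return requested_bytes
```

Even if clamping turns out to be the wrong policy (e.g., it should refuse rather than shrink the request), the logging half alone would let future panic reports include the allocation sizes in flight at crash time.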
7. Attachments
I can attach:
- Full panic log from the M4 Max 128GB system.
- EXO config and model details (redacted if needed).
Let me know what additional instrumentation or debugging builds would help narrow this further. I'm happy to run patched EXO binaries on this M4 Max 128GB system to gather more data.
UPDATE: I added an environment variable in Settings on the M4 Max 128GB MBP that crashed earlier: MLX_METAL_MAX_BUFFER_SIZE=17179869184 (16 GiB).
I've been running and using this model via Hermes Agent TG UI now for the last 3 hours successfully:
- Model ID: mlx-community/MiniMax-M2.7-4bit-mxfp4
- Family: minimax
- Base model: MiniMax M2.7
- Quantization: 4bit-mxfp4
- Size: 113GB
- Layers: 62
- Capabilities: text, thinking
- Tensor parallelism: Yes
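For anyone wanting to try the same workaround from a shell rather than the Settings UI, the equivalent would be exporting the variable before launching EXO (the launch command itself is illustrative; adjust to your setup):

```shell
# Workaround attempt: cap MLX's maximum Metal buffer size at 16 GiB
# (17179869184 bytes) before launching EXO.
export MLX_METAL_MAX_BUFFER_SIZE=$((16 * 1024 * 1024 * 1024))
echo "$MLX_METAL_MAX_BUFFER_SIZE"
# then start EXO as usual, e.g.: exo
```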
UPDATE 2: After about 4 hours of running, while my Hermes agent was working on a task, the M4 Max 128GB MBP crashed again. That is considerably longer before crashing, but still disappointing and not viable for production. Info on the second crash follows:
Quick follow-up with a second, independent reproduction on the same M4 Max 128GB machine.
Environment is unchanged:
- MacBook Pro 14", M4 Max (16-core CPU / 40-core GPU), 128GB unified memory
- macOS 26.4 (25E253)
- Darwin Kernel Version 25.4.0; xnu-12377.101.15~1/RELEASE_ARM64_T6041
- IOGPUFamily 130.13
- EXO running as part of a Thunderbolt 5 cluster
New panic (today):
- Panic string: "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550
- Panicking process: pid 1717: exo
- Panicking CPU: core 10 (PACC2)
- Panicked task: 126041 pages, 31 threads: pid 1717: exo
- Compressor info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 1 swapfile and OK swap space
- Kernel extensions in backtrace: com.apple.iokit.IOGPUFamily (130.13), IOSurface (393.5.7), IOGraphicsFamily (600), IOPCIFamily (2.9), IOReportFamily (47)
Backtrace again shows the panicking thread in IOGPUMemory::completeMemory() called from the IOGPUFamily stack, with exo as the owning process.
This is now at least the second confirmed panic on this M4 Max 128GB under EXO workloads, with the same IOGPUMemory.cpp:550 prepare-count underflow, but a different EXO pid and a different CPU core. The comparison M4 Pro 48GB machine (same macOS, same EXO version, same TB5 cluster) still shows no panics so far.
I have attached the full .panic file for this new incident as exo-panic-M4Max-128GB-2026-04-23-<timestamp>.panic for your reference.