EXO Kernel Panic Bug Report: IOGPUMemory::completeMemory() Prepare Count Underflow on Apple M4 Max 128GB
Report Date: April 23, 2026
Submitted By: EXO user (Shag, Houston TX)
Target: EXO Development Team (exo-explore/exo)
Severity: Critical — results in full macOS kernel panic and forced reboot
Reproducibility: Confirmed reproducible (4 occurrences on affected device; 0 occurrences on comparison device under identical setup)
1. Summary
Running EXO distributed inference on an Apple M4 Max MacBook Pro with 128GB unified memory (chip identifier T6041) on macOS 26.4 (Darwin 25.4.0, IOGPUFamily 130.13) produces a reproducible kernel panic from IOGPUMemory::completeMemory() with the assertion "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550.
The same EXO version, macOS version, Thunderbolt 5 cluster setup, and model configuration do not produce this panic on an M4 Pro MacBook Pro with 48GB unified memory. The panic always occurs in a thread owned by the exo process.
This appears to be a bug in Apple's IOGPUFamily.kext, but EXO's allocation pattern (large Metal buffers during distributed inference) is a reliable trigger. This issue is being filed so EXO can implement user-space mitigations (guards, logging) and track affected Apple driver versions.
2. Environment
2.1 Panicking machine
- Model: MacBook Pro (14-inch, M4 Max, 2024)
- SoC: Apple M4 Max, 16-core CPU / 40-core GPU (T6041)
- Unified memory: 128GB
- Memory bandwidth: 546 GB/s (512-bit bus)
- macOS: 26.4, build 25E253
- Kernel: Darwin 25.4.0; xnu-12377.101.15~1/RELEASE_ARM64_T6041
- IOGPUFamily: 130.13
- iBoot: mBoot-18000.101.7
- EXO: version unknown
- Connection: Thunderbolt 5 cable to EXO cluster
2.2 Comparison machine (no panics)
- Model: MacBook Pro (14-inch, M4 Pro, 2024)
- SoC: Apple M4 Pro, 14-core CPU / 20-core GPU
- Unified memory: 48GB
- Memory bandwidth: 273 GB/s (256-bit bus)
- macOS: 26.4, build 25E253
- EXO: same version and config as above
- Connection: same Thunderbolt 5 setup, same EXO cluster
3. Panic details
Key fields from one representative panic log:
| Field | Value |
| --- | --- |
| Panic string | "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550 |
| Panicking process | pid 1577: exo |
| OS version | 25E253 |
| Kernel version | Darwin Kernel Version 25.4.0; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041 |
| IOGPUFamily | 130.13 |
| Panicking core | core 14 (PACC2 cluster) |
| Panicked task | 127674 pages, 31 threads: pid 1577: exo |
| Panicked thread | tid: 18654 (thread owned by exo) |
| Wired pages at crash | 127,674 pages (~1.95 GB at the 16 KB Apple Silicon page size) |
| Cores 4–9 (PACC1) | offline (normal power management) at time of panic |
The backtrace shows the thread passing through IOGPUFamily functions and panicking inside IOGPUMemory::completeMemory() with the prepare count underflow assertion.
I can attach the full panic log as a file if needed. For now, this is the key excerpt:
```
panic(cpu 14 caller 0xfffffe00515238e8): "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550
…
Panicked task 0xfffffe2adb458ec0: 127674 pages, 31 threads: pid 1577: exo
Panicked thread: 0xfffffe2f99fbb3c8, backtrace: 0xfffffe8de993ec80, tid: 18654
  lr: 0xfffffe0052da5ce0  fp: 0xfffffe8de993f3a0
  lr: 0xfffffe00515238e8  fp: 0xfffffe8de993f3c0   // IOGPUMemory::completeMemory()
  lr: 0xfffffe005150be94  fp: 0xfffffe8de993f430
  lr: 0xfffffe00514f8a58  fp: 0xfffffe8de993f4b0
  lr: 0xfffffe00515096f8  fp: 0xfffffe8de993f4e0
  lr: 0xfffffe00514e8b7c  fp: 0xfffffe8de993fd20
  lr: 0xfffffe00514e8bfc  fp: 0xfffffe8de993fd50
  lr: 0xfffffe0052cef2cc  fp: 0xfffffe8de993fda0
  lr: 0xfffffe005262a6e4  fp: 0xfffffe8de993fe50
  lr: 0xfffffe0052630778  fp: 0xfffffe8de993ff10
```
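For triage, the handful of key fields above can be pulled out of a panic-log excerpt with a small script. This is just a sketch against the excerpt format shown here; real `.panic` files carry many more fields:

```python
import re

def parse_panic(text: str) -> dict:
    """Extract a few key fields from a macOS panic-log excerpt."""
    out = {}
    # panic(cpu N caller 0x...): "panic string"
    m = re.search(r'panic\(cpu (\d+)[^)]*\): "([^"]+)"', text)
    if m:
        out["cpu"] = int(m.group(1))
        out["panic_string"] = m.group(2)
    # ... pid N: process-name
    m = re.search(r"pid (\d+): (\w+)", text)
    if m:
        out["pid"], out["process"] = int(m.group(1)), m.group(2)
    return out

excerpt = (
    'panic(cpu 14 caller 0xfffffe00515238e8): '
    '"completeMemory() prepare count underflow" @IOGPUMemory.cpp:550\n'
    'Panicked task 0xfffffe2adb458ec0: 127674 pages, 31 threads: pid 1577: exo'
)
print(parse_panic(excerpt))
```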
4. Reproduction
Unfortunately I do not have a minimal reproduction script, because this occurs under EXO's distributed inference environment. However, the crash characteristics match existing public bug reports in the MLX ecosystem (e.g., the same panic under mlx_lm.server).
My scenario:
- Run EXO on an M4 Max 128GB MBP as part of a Thunderbolt 5 EXO cluster.
- Use large models (e.g., multi-7B/8x7B/70B class), with long-running sessions and large effective contexts (due to KV cache growth).
- After some time (typically under heavy load / long context), macOS kernel panics with the completeMemory() underflow.
The same workload and cluster setup on the M4 Pro 48GB MBP (14-core CPU, 20-core GPU, 48GB unified memory) has never produced this kernel panic so far.
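As rough context for why long-running sessions push allocations up, the KV cache grows linearly with context length. A back-of-the-envelope estimate, with entirely illustrative dimensions (not a specific model's actual shape):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache footprint: K and V tensors per layer, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Illustrative 70B-class shape: 80 layers, 8 KV heads, head_dim 128,
# 128k-token context, fp16 cache.
print(kv_cache_bytes(80, 8, 128, 131072) / 2**30, "GiB")  # → 40.0 GiB
```

At these sizes, a single machine in the cluster can plausibly hold tens of GB of cache on top of the weights, which is exactly the "very large Metal allocation" regime described above.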
5. Why this looks driver-side, but EXO is the trigger
- The panic is in IOGPUFamily (IOGPUMemory::completeMemory()), which is Apple's GPU kernel extension, not EXO's code.
- The panic string explicitly describes a reference-count underflow in the prepare/complete cycle for GPU memory.
- Independent MLX and mlx-lm reports show the same kernel panic with different apps, suggesting a general driver bug for large Metal GPU workloads.
- EXO's distributed inference patterns (large Metal buffers for attention / KV cache, RDMA over TB5 using MLX/JACCL) supply a reproducible trigger for this bug, especially on large-unified-memory SoCs like the M4 Max 128GB.
I am not asking EXO to fix the kernel, but it may be possible for EXO to implement practical mitigations.
6. Requested EXO-side mitigations
- Context / allocation guard: Provide a configurable context/token limit (or internal guard) to prevent Metal allocations from reaching the size range known to trigger this driver bug (similar to what MLX is discussing in mlx#3186).
- Logging: Add logging around large Metal allocations: model size, batch size, context length, and estimated buffer sizes. That will make future crash reports far more actionable.
- Version / platform warning: Detect large-unified-memory Apple Silicon (e.g., M4 Max 128GB, M3 Ultra 96–192GB) and IOGPUFamily 130.13, then warn users that there is a known macOS kernel bug that can be triggered by very large contexts.
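To make the first two asks concrete, here is a minimal sketch of what a guard-plus-logging hook could look like. All names are hypothetical and this is not EXO's real allocation API; it only illustrates the shape of the mitigation:

```python
import logging

logger = logging.getLogger("exo.metal_guard")

# Hypothetical cap, set below the buffer sizes observed to trigger the
# IOGPUFamily underflow on this machine; would be user-configurable.
MAX_METAL_BUFFER_BYTES = 16 * 2**30  # 16 GiB

def guarded_alloc_size(requested_bytes: int) -> int:
    """Clamp a requested Metal buffer size and log large allocations."""
    if requested_bytes > MAX_METAL_BUFFER_BYTES:
        logger.warning(
            "Metal allocation of %.1f GiB exceeds %.1f GiB guard; clamping",
            requested_bytes / 2**30,
            MAX_METAL_BUFFER_BYTES / 2**30,
        )
        return MAX_METAL_BUFFER_BYTES
    if requested_bytes > 2**30:  # log anything over 1 GiB to aid crash triage
        logger.info("Large Metal allocation: %.1f GiB", requested_bytes / 2**30)
    return requested_bytes
```

Even if clamping turns out to be the wrong policy (e.g., it should refuse rather than shrink the request), the logging half alone would let future panic reports include the allocation sizes in flight at crash time.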
7. Attachments
I can attach:
- Full panic log from the M4 Max 128GB system.
- EXO config and model details (redacted if needed).
Let me know what additional instrumentation or debugging builds would help narrow this further. I'm happy to run patched EXO binaries on this M4 Max 128GB system to gather more data.
UPDATE: I added an environment variable in Settings on the M4 Max 128GB MBP that crashed earlier: MLX_METAL_MAX_BUFFER_SIZE=17179869184 (16 GiB).
I've been running and using this model via Hermes Agent TG UI now for the last 3 hours successfully:
- Model ID: mlx-community/MiniMax-M2.7-4bit-mxfp4
- Family: minimax
- Base model: MiniMax M2.7
- Quantization: 4bit-mxfp4
- Size: 113GB
- Layers: 62
- Capabilities: text, thinking
- Tensor parallelism: Yes
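For anyone wanting to try the same workaround from a shell rather than the Settings UI, the equivalent would be exporting the variable before launching EXO (the launch command itself is illustrative; adjust to your setup):

```shell
# Workaround attempt: cap MLX's maximum Metal buffer size at 16 GiB
# (17179869184 bytes) before launching EXO.
export MLX_METAL_MAX_BUFFER_SIZE=$((16 * 1024 * 1024 * 1024))
echo "$MLX_METAL_MAX_BUFFER_SIZE"
# then start EXO as usual, e.g.: exo
```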
UPDATE 2: After about 4 hours of running, while my Hermes agent was working on a task, the M4 Max 128GB MBP crashed again. That is considerably longer before crashing, but still disappointing and not viable for production. Info on the second crash follows:
Quick follow-up with a second, independent reproduction on the same M4 Max 128GB machine.
Environment is unchanged:
- MacBook Pro 14", M4 Max (16-core CPU / 40-core GPU), 128GB unified memory
- macOS 26.4 (25E253)
- Darwin Kernel Version 25.4.0; xnu-12377.101.15~1/RELEASE_ARM64_T6041
- IOGPUFamily 130.13
- EXO running as part of a Thunderbolt 5 cluster
New panic (today):
- Panic string: "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550
- Panicking process: pid 1717: exo
- Panicking CPU: core 10 (PACC2)
- Panicked task: 126041 pages, 31 threads: pid 1717: exo
- Compressor info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 1 swapfile and OK swap space
- Kernel extensions in backtrace: com.apple.iokit.IOGPUFamily (130.13), IOSurface (393.5.7), IOGraphicsFamily (600), IOPCIFamily (2.9), IOReportFamily (47)
Backtrace again shows the panicking thread in IOGPUMemory::completeMemory() called from the IOGPUFamily stack, with exo as the owning process.
This is now at least the second confirmed panic on this M4 Max 128GB under EXO workloads, with the same IOGPUMemory.cpp:550 prepare-count underflow, but a different EXO pid and a different CPU core. The comparison M4 Pro 48GB machine (same macOS, same EXO version, same TB5 cluster) still shows no panics so far.
I have attached the full .panic file for this new incident as exo-panic-M4Max-128GB-2026-04-23-<timestamp>.panic for your reference.