Fix Task WDT crashes from LVGL priority starvation#11
Conversation
Two root causes for frequent device reboots: 1. LVGL task (priority 2) starved loopTask (priority 1) on core 1. During heavy screen rendering, loopTask couldn't run for 30+ seconds, triggering the Task WDT. Fixed by lowering LVGL to priority 1 so FreeRTOS round-robins both tasks fairly. 2. BLE task was registered with the 30s Task WDT, but blocking NimBLE GATT operations (connect + service discovery + subscribe + read) can legitimately take 30-60s total. Removed BLE task from WDT since NimBLE has its own internal ~30s timeouts per GATT operation. Also added ble_hs_synced() guards to write(), read(), notify(), writeCharacteristic(), discoverServices(), and enableNotifications() to prevent use-after-free on stale NimBLE client pointers during host resets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR fixes two independent root causes of Task WDT crashes: LVGL priority starvation of Key changes:
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant LT as loopTask (Core 1, P1)
participant LV as LVGL Task (Core 1, P1)
participant BT as BLE Task (Core 1, P1)
participant NB as NimBLE Host (Core 0)
participant WDT as Task WDT
Note over LT,WDT: After PR — LVGL at P1 (was P2), BLE task unsubscribed from WDT
LT->>WDT: esp_task_wdt_reset() [subscribed]
LV->>LT: Round-robin share (equal priority)
LT->>WDT: esp_task_wdt_reset() [within 30s window]
BT->>NB: connect() [blocks ~5s]
BT->>NB: discoverServices() [blocks ~10s]
Note over BT: ble_hs_synced() guard checked before each call
BT->>NB: enableNotifications() [blocks ~5s]
BT->>NB: read(identity) [blocks ~5s]
Note over BT,WDT: BLE task NOT subscribed — 60s total block is fine
NB-->>BT: host reset (ble_hs_synced() → false)
BT->>BT: ble_hs_synced() guard → return false early
Note over BT: Stale client pointer use-after-free prevented
Last reviewed commit: ed8c081 |
| @@ -1447,10 +1449,13 @@ bool NimBLEPlatform::discoverServices(uint16_t conn_handle) { | |||
| return false; | |||
| } | |||
|
|
|||
| // Get characteristics | |||
| // Get characteristics — each is a blocking GATT operation | |||
| NimBLERemoteCharacteristic* rxChar = service->getCharacteristic(UUID::RX_CHAR); | |||
| esp_task_wdt_reset(); | |||
| NimBLERemoteCharacteristic* txChar = service->getCharacteristic(UUID::TX_CHAR); | |||
| esp_task_wdt_reset(); | |||
| NimBLERemoteCharacteristic* idChar = service->getCharacteristic(UUID::IDENTITY_CHAR); | |||
| esp_task_wdt_reset(); | |||
There was a problem hiding this comment.
esp_task_wdt_reset() calls are no-ops — BLE task is not subscribed to WDT
The BLE task is intentionally removed from the FreeRTOS Task WDT in this PR (the esp_task_wdt_add(NULL) call was removed from ble_task()). Per the ESP-IDF API, esp_task_wdt_reset() only resets the timer for the calling task, and returns ESP_ERR_NOT_FOUND (a no-op) if that task is not subscribed to the WDT.
This means all the newly added esp_task_wdt_reset() calls throughout discoverServices(), writeCharacteristic(), read(), and enableNotifications() are dead code when called from the BLE task context. They create a misleading impression of active WDT protection that doesn't exist.
Consider either:
- Removing these calls with a comment explaining WDT is intentionally disabled for this task, or
- Adding a comment here making clear these are preserved only for future re-subscription scenarios
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/ble_interface/platforms/NimBLEPlatform.cpp
Line: 1440-1458
Comment:
`esp_task_wdt_reset()` calls are no-ops — BLE task is not subscribed to WDT
The BLE task is intentionally removed from the FreeRTOS Task WDT in this PR (the `esp_task_wdt_add(NULL)` call was removed from `ble_task()`). Per the ESP-IDF API, `esp_task_wdt_reset()` only resets the timer for the **calling task**, and returns `ESP_ERR_NOT_FOUND` (a no-op) if that task is not subscribed to the WDT.
This means all the newly added `esp_task_wdt_reset()` calls throughout `discoverServices()`, `writeCharacteristic()`, `read()`, and `enableNotifications()` are dead code when called from the BLE task context. They create a misleading impression of active WDT protection that doesn't exist.
Consider either:
- Removing these calls with a comment explaining WDT is intentionally disabled for this task, or
- Adding a comment here making clear these are preserved only for future re-subscription scenarios
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (1)
The PR adds a During a NimBLE host reset, the host stack invalidates server-side characteristic objects on core 0. If Prompt To Fix With AIThis is a comment left during a code review.
Path: lib/ble_interface/platforms/NimBLEPlatform.cpp
Line: 1843-1849
Comment:
`notifyAll()` missing `ble_hs_synced()` guard — stale pointer risk during host reset
The PR adds a `ble_hs_synced()` guard to `notify()` (line 1835) to prevent use of a stale `_tx_char` pointer during a host reset. However, `notifyAll()` directly below it accesses the same `_tx_char` pointer without the same guard — it only checks `!_tx_char` for nullness, not for host sync state.
During a NimBLE host reset, the host stack invalidates server-side characteristic objects on core 0. If `notifyAll()` is called in that window, `_tx_char` could be a dangling pointer even though `_tx_char != nullptr`.
```suggestion
bool NimBLEPlatform::notifyAll(const Bytes& data) {
if (!ble_hs_synced() || !_tx_char) {
return false;
}
_tx_char->setValue(data.data(), data.size());
return _tx_char->notify(true); // Notifies all subscribed clients
}
```
How can I resolve this? If you propose a fix, please make it concise. |
BLE task is no longer subscribed to WDT, so these 23 calls were silently returning ESP_ERR_NOT_FOUND. Removes dead code and the now-unused esp_task_wdt.h include. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Additional Comments (1)
|
Matches the guard already on notify() to prevent use-after-free of _tx_char during a NimBLE host reset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
ble_hs_synced()guards to NimBLE GATT operations to prevent use of stale client pointers during host resetTest plan
🤖 Generated with Claude Code