
Fix Task WDT crashes from LVGL priority starvation#11

Merged
torlando-tech merged 3 commits into main from fix/ble-wdt-stability
Mar 4, 2026

@torlando-tech
Owner

Summary

  • Lower LVGL task priority from 2 to 1 (same core as loopTask) to prevent priority starvation that caused 30s Task WDT timeouts
  • Remove BLE task from FreeRTOS WDT — blocking NimBLE GATT operations (service discovery, subscribe, read) legitimately take 30-60s, exceeding the 30s WDT threshold
  • Add ble_hs_synced() guards to NimBLE GATT operations to prevent use of stale client pointers during host reset

Test plan

  • 10-minute soak test with zero WDT crashes (previously crashed every ~5 min)
  • Extended run to verify BLE connections still recover from desyncs

🤖 Generated with Claude Code

Two root causes for frequent device reboots:

1. LVGL task (priority 2) starved loopTask (priority 1) on core 1.
   During heavy screen rendering, loopTask couldn't run for 30+ seconds,
   triggering the Task WDT. Fixed by lowering LVGL to priority 1 so
   FreeRTOS round-robins both tasks fairly.

2. BLE task was registered with the 30s Task WDT, but blocking NimBLE
   GATT operations (connect + service discovery + subscribe + read) can
   legitimately take 30-60s total. Removed BLE task from WDT since
   NimBLE has its own internal ~30s timeouts per GATT operation.
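
The starvation in item 1 can be illustrated with a toy model. The sketch below is not FreeRTOS code: `Task`, `simulate`, and the premise that both tasks are always ready to run (LVGL rendering continuously) are hypothetical simplifications. It assumes a single core where the highest-priority ready task always runs and equal-priority tasks alternate on each tick.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a fixed-priority preemptive scheduler with time slicing
// among equal priorities. Hypothetical names; this is not the FreeRTOS API.
struct Task {
    std::string name;
    int priority;   // higher number = higher priority
    int ticks_run;  // ticks of CPU time received so far
};

// Run `ticks` scheduler ticks on one core, assuming every task is always ready.
void simulate(std::vector<Task>& tasks, int ticks) {
    assert(!tasks.empty());
    std::size_t last = tasks.size() - 1;  // index of the task that ran most recently
    for (int t = 0; t < ticks; ++t) {
        // Find the highest priority among the (always ready) tasks.
        int top = tasks[0].priority;
        for (const Task& task : tasks) {
            if (task.priority > top) top = task.priority;
        }
        // Round-robin: start searching just after the task that ran last.
        for (std::size_t i = 1; i <= tasks.size(); ++i) {
            std::size_t idx = (last + i) % tasks.size();
            if (tasks[idx].priority == top) {
                ++tasks[idx].ticks_run;
                last = idx;
                break;
            }
        }
    }
}
```

In this model, with LVGL at priority 2 and loopTask at priority 1, loopTask receives no ticks for as long as LVGL stays ready. With both at priority 1, the two tasks split the ticks evenly.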

Also added ble_hs_synced() guards to write(), read(), notify(),
writeCharacteristic(), discoverServices(), and enableNotifications()
to prevent use-after-free on stale NimBLE client pointers during
host resets.
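
A minimal sketch of the guard pattern, written so it compiles off-device. Everything here is a stand-in: `stub_ble_hs_synced()` takes the place of NimBLE's `ble_hs_synced()`, and `FakeClient` and `FakePlatform` are hypothetical types, not the classes in `NimBLEPlatform.cpp`. The point is only the ordering: check host sync before touching the pointer, because the pointer can be non-null and still stale.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the host sync state that ble_hs_synced() reports on device.
static std::atomic<bool> g_host_synced{true};
static bool stub_ble_hs_synced() { return g_host_synced.load(); }

// Hypothetical client object; counts writes so the effect of the guard is visible.
struct FakeClient {
    int writes = 0;
    bool writeValue(const std::vector<std::uint8_t>& data) {
        ++writes;
        return !data.empty();
    }
};

class FakePlatform {
public:
    explicit FakePlatform(FakeClient* client) : _client(client) {}

    bool write(const std::vector<std::uint8_t>& data) {
        // Guard first: during a host reset the pointer must not be dereferenced.
        if (!stub_ble_hs_synced()) return false;
        if (!_client) return false;
        return _client->writeValue(data);
    }

private:
    FakeClient* _client;
};

// Usage: the same call succeeds while synced and is refused during a reset.
inline void demo() {
    FakeClient client;
    FakePlatform platform(&client);
    const std::vector<std::uint8_t> payload{0x01, 0x02};
    assert(platform.write(payload));   // host synced: write goes through
    g_host_synced = false;             // simulate a host reset
    assert(!platform.write(payload));  // guard refuses; client is not touched
    assert(client.writes == 1);
    g_host_synced = true;              // restore state for later callers
}
```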

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR fixes two independent root causes of Task WDT crashes: LVGL priority starvation of loopTask (addressed by demoting the LVGL task from priority 2 to 1) and legitimate 30–60 s blocking in NimBLE GATT operations exceeding the 30 s WDT threshold (addressed by unsubscribing the BLE task from the WDT entirely). Additionally, ble_hs_synced() guards are introduced at the entry point of every blocking GATT function to prevent use-after-free on stale client pointers during a host reset.

Key changes:

  • src/main.cpp: LVGL task priority lowered from 2 → 1 (equal to loopTask), enabling FreeRTOS round-robin scheduling between them on core 1 and eliminating priority preemption
  • BLEInterface.cpp: esp_task_wdt_add(NULL) and the per-loop esp_task_wdt_reset() removed from ble_task(); rationale clearly documented in a block comment
  • NimBLEPlatform.cpp: ble_hs_synced() guards added to discoverServices, write, writeCharacteristic, read, enableNotifications, notify, and notifyAll to prevent stale client pointer access

Confidence Score: 5/5

  • PR is safe to merge — WDT and priority fixes are sound, ble_hs_synced() guards prevent stale pointer access.
  • All three changes are well-reasoned and minimal: (1) LVGL priority reduction from 2→1 eliminates priority preemption with loopTask and is proven by 10-minute soak test, (2) BLE task WDT unsubscription is justified by legitimate 30-60s GATT blocking times, (3) ble_hs_synced() guards at GATT operation entry points prevent use-after-free on stale client pointers. No logic defects remain.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/ble_interface/platforms/NimBLEPlatform.cpp | Removes all esp_task_wdt_reset() calls (consistent with BLE task no longer being WDT-subscribed) and adds ble_hs_synced() guards to seven GATT operation entry points (discoverServices, write, writeCharacteristic, read, enableNotifications, notify, notifyAll). Guards are correctly placed and prevent use-after-free on stale client pointers by checking host sync status before any NimBLE client operations. No logic defects. |
| lib/ble_interface/BLEInterface.cpp | Removes esp_task_wdt_add(NULL) and the per-loop esp_task_wdt_reset() from ble_task(), intentionally opting the BLE task out of the FreeRTOS Task WDT. The rationale (GATT operations legitimately blocking 30–60 s) is clearly documented in a block comment. Change is self-contained and carries no new logic risk. |
| src/main.cpp | Lowers the LVGL task priority from 2 to 1 (equal to Arduino loopTask) on core 1. This prevents LVGL from priority-starving loopTask, which was the root cause of 30 s Task WDT timeouts. At equal priority FreeRTOS time-slices between the two tasks, so loopTask is guaranteed scheduled within each tick period. Change is minimal and well-justified by the 10-minute soak test with zero crashes. |

Sequence Diagram

sequenceDiagram
    participant LT as loopTask (Core 1, P1)
    participant LV as LVGL Task (Core 1, P1)
    participant BT as BLE Task (Core 1, P1)
    participant NB as NimBLE Host (Core 0)
    participant WDT as Task WDT

    Note over LT,WDT: After PR — LVGL at P1 (was P2), BLE task unsubscribed from WDT

    LT->>WDT: esp_task_wdt_reset() [subscribed]
    LV->>LT: Round-robin share (equal priority)
    LT->>WDT: esp_task_wdt_reset() [within 30s window]

    BT->>NB: connect() [blocks ~5s]
    BT->>NB: discoverServices() [blocks ~10s]
    Note over BT: ble_hs_synced() guard checked before each call
    BT->>NB: enableNotifications() [blocks ~5s]
    BT->>NB: read(identity) [blocks ~5s]
    Note over BT,WDT: BLE task NOT subscribed — 60s total block is fine

    NB-->>BT: host reset (ble_hs_synced() → false)
    BT->>BT: ble_hs_synced() guard → return false early
    Note over BT: Stale client pointer use-after-free prevented

Last reviewed commit: ed8c081

Comment on lines +1440 to +1458
@@ -1447,10 +1449,13 @@ bool NimBLEPlatform::discoverServices(uint16_t conn_handle) {
         return false;
     }

-    // Get characteristics
+    // Get characteristics — each is a blocking GATT operation
     NimBLERemoteCharacteristic* rxChar = service->getCharacteristic(UUID::RX_CHAR);
+    esp_task_wdt_reset();
     NimBLERemoteCharacteristic* txChar = service->getCharacteristic(UUID::TX_CHAR);
+    esp_task_wdt_reset();
     NimBLERemoteCharacteristic* idChar = service->getCharacteristic(UUID::IDENTITY_CHAR);
+    esp_task_wdt_reset();

esp_task_wdt_reset() calls are no-ops — BLE task is not subscribed to WDT

The BLE task is intentionally removed from the FreeRTOS Task WDT in this PR (the esp_task_wdt_add(NULL) call was removed from ble_task()). Per the ESP-IDF API, esp_task_wdt_reset() only resets the timer for the calling task, and returns ESP_ERR_NOT_FOUND (a no-op) if that task is not subscribed to the WDT.

This means all the newly added esp_task_wdt_reset() calls throughout discoverServices(), writeCharacteristic(), read(), and enableNotifications() are dead code when called from the BLE task context. They create a misleading impression of active WDT protection that doesn't exist.

Consider either:

  • Removing these calls with a comment explaining WDT is intentionally disabled for this task, or
  • Adding a comment here making clear these are preserved only for future re-subscription scenarios
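
The behaviour described in this comment can be modelled in a few lines. This is a toy model, not the ESP-IDF implementation: `ToyTaskWdt`, `WdtResult`, and the string task names are hypothetical, and time is advanced manually. It captures only the two properties the comment relies on: a reset from an unsubscribed task reports not-found and changes nothing, and only subscribed tasks can ever become overdue.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Conceptual counterparts of ESP_OK and ESP_ERR_NOT_FOUND (hypothetical enum).
enum class WdtResult { Ok, NotFound };

class ToyTaskWdt {
public:
    explicit ToyTaskWdt(int timeout_s) : timeout_s_(timeout_s) {
        assert(timeout_s > 0);
    }

    // Subscribe a task; its timer starts at the current time.
    void subscribe(const std::string& task) { last_reset_[task] = now_s_; }

    // Reset the timer for `task`. Unsubscribed tasks get NotFound and no state changes.
    WdtResult reset(const std::string& task) {
        auto it = last_reset_.find(task);
        if (it == last_reset_.end()) return WdtResult::NotFound;
        it->second = now_s_;
        return WdtResult::Ok;
    }

    // Advance the simulated clock.
    void advance(int seconds) { now_s_ += seconds; }

    // Subscribed tasks whose last reset is older than the timeout.
    std::vector<std::string> overdue() const {
        std::vector<std::string> out;
        for (const auto& entry : last_reset_) {
            if (now_s_ - entry.second > timeout_s_) out.push_back(entry.first);
        }
        return out;
    }

private:
    int timeout_s_;
    int now_s_ = 0;
    std::map<std::string, int> last_reset_;
};
```

In this model a task that never subscribed can block for any length of time without appearing in `overdue()`, and its `reset` calls have no effect, which is why the calls flagged above protect nothing once the BLE task is unsubscribed.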

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

lib/ble_interface/platforms/NimBLEPlatform.cpp
notifyAll() missing ble_hs_synced() guard — stale pointer risk during host reset

The PR adds a ble_hs_synced() guard to notify() (line 1835) to prevent use of a stale _tx_char pointer during a host reset. However, notifyAll() directly below it accesses the same _tx_char pointer without the same guard — it only checks !_tx_char for nullness, not for host sync state.

During a NimBLE host reset, the host stack invalidates server-side characteristic objects on core 0. If notifyAll() is called in that window, _tx_char could be a dangling pointer even though _tx_char != nullptr.

bool NimBLEPlatform::notifyAll(const Bytes& data) {
    if (!ble_hs_synced() || !_tx_char) {
        return false;
    }

    _tx_char->setValue(data.data(), data.size());
    return _tx_char->notify(true);  // Notifies all subscribed clients
}

BLE task is no longer subscribed to WDT, so these 23 calls were
silently returning ESP_ERR_NOT_FOUND. Removes dead code and the
now-unused esp_task_wdt.h include.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@torlando-tech
Owner Author

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

lib/ble_interface/platforms/NimBLEPlatform.cpp, line 1826
notifyAll() missing ble_hs_synced() guard

notify() was updated in this PR to guard against stale _tx_char pointers during a host reset, but notifyAll() — which accesses _tx_char in exactly the same way — was not updated. If a host reset occurs while peripheral connections exist, notifyAll() can attempt to use an invalidated NimBLEServerCharacteristic* pointer, leading to the same use-after-free the notify() fix was designed to prevent.

bool NimBLEPlatform::notifyAll(const Bytes& data) {
    if (!ble_hs_synced() || !_tx_char) {
        return false;
    }

Matches the guard already on notify() to prevent use-after-free
of _tx_char during a NimBLE host reset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@torlando-tech
Owner Author

@greptileai

@torlando-tech torlando-tech merged commit 3c81eeb into main Mar 4, 2026
3 checks passed
@torlando-tech torlando-tech deleted the fix/ble-wdt-stability branch March 4, 2026 22:07