BLE: nimble_host LoadProhibited crash in BLEServer::removePeerDevice (race on m_connectedServersMap) #59

@skialpine

Summary

During a sustained multi-protocol soak (WS + HTTP/WS churn + mDNS + USB binary + a connected BT app) on fix/ws-oom-broadcast (which carries the OOM fix from PR #58 plus the telemetry from PR #57), the NimBLE host task crashed with a LoadProhibited exception inside BLEServer::removePeerDevice while handling a BLE disconnect event. The scale auto-rebooted (reset_reason=panic) and the user's BT app reconnected normally. The crash was visible to the operator only through the telemetry reset_reason field added in PR #57.

This is a separate bug class from the OOM crash PR #58 fixes (which was bad_alloc in the WS broadcast path). This one is a long-known data race in the upstream Arduino-ESP32 BLE library, not in this firmware's code.

Evidence (decoded coredump from /coredump flash partition)

Crashed task: 'nimble_host'              ← BLE stack thread
exccause:     0x1c (LoadProhibitedCause)
excvaddr:     0x60533                    ← garbage pointer (not a valid heap addr)
pc:           0x420202d9  std::_Rb_tree<uint16_t, conn_status_t>::erase + 109

Stack (innermost → outermost):

#0 std::_Rb_tree<...>::_M_upper_bound     (walking corrupted tree)
#1 std::_Rb_tree<...>::equal_range
#2 std::_Rb_tree<...>::erase
#3 std::map<uint16_t, conn_status_t>::erase(1)
#4 BLEServer::removePeerDevice(conn_id=1, _client=false)
   at framework-arduinoespressif32/libraries/BLE/src/BLEServer.cpp:345
#5 BLEServer::handleGATTServerEvent          (BLEServer.cpp:820)
#6 ble_gap_call_event_cb
#7 ble_gap_conn_broken
#8 ble_gap_rx_disconn_complete               ← BLE disconnect event
#9 ble_hs_hci_evt_disconn_complete
#10..#15  NimBLE host task → vPortTaskWrapper

Root cause

BLEServer::m_connectedServersMap (std::map<uint16_t, conn_status_t>) is accessed without synchronization from both:

  • the NimBLE host task (addPeerDevice on connect, removePeerDevice on disconnect), and
  • the loop / app task (any code that touches getPeerDevices / getPeerMTU / etc., directly or indirectly through notify()).

A concurrent access leaves the red-black tree in an inconsistent state; the next operation walks a corrupted child pointer and faults.
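
The usual remedy for this class of race is to route every access to the shared map through one lock. The sketch below is a minimal, host-compilable illustration of that pattern, not the upstream patch: `PeerRegistry`, `ConnStatus`, and their members are hypothetical names, and `std::mutex` is an assumption standing in for whatever semaphore the upstream fix uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for the library's conn_status_t.
struct ConnStatus {
    void*    peer = nullptr;
    bool     connected = false;
    uint16_t mtu = 23;  // default ATT MTU
};

// Hypothetical wrapper: every read and write of the map takes the same lock,
// so the BLE host task and the app task can no longer interleave mid-update.
class PeerRegistry {
public:
    void add(uint16_t connId, const ConnStatus& status) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_peers[connId] = status;
    }

    // Returns true if an entry was actually erased.
    bool remove(uint16_t connId) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_peers.erase(connId) > 0;
    }

    // Readers get a copy, so they never walk the live tree while
    // another task inserts or erases.
    std::map<uint16_t, ConnStatus> snapshot() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_peers;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_peers.size();
    }

private:
    mutable std::mutex m_mutex;
    std::map<uint16_t, ConnStatus> m_peers;
};
```

Returning a copy from `snapshot()` costs an allocation per call, but it keeps the lock held only for the duration of the copy rather than for the whole of the caller's iteration.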

Upstream

This is a known upstream bug; see espressif/arduino-esp32#12554.

Why we haven't picked it up

| Component | Version | Date |
| --- | --- | --- |
| Arduino-ESP32 in use | 3.3.8 (latest tag) | 2026-04-12 |
| Upstream fix landed (master) | cac3896c | 2026-05-18 |
| Master ahead of 3.3.8 | 53 commits, ~6 weeks | as of 2026-05-26 |
| Next arduino-esp32 release | 3.3.9 | not yet tagged |

The fix is in arduino-esp32 master but not yet in any release. PlatformIO doesn't auto-upgrade (we pin pioarduino's stable URL), so we'll need to pio pkg update once 3.3.9 ships.

Current behavior

  • Scale auto-recovers via the panic handler's reboot (a few seconds of outage); the user's BT app reconnects automatically.
  • The reset_reason=panic field from PR #57 (Scale telemetry: SoC temp, weight-stall watchdog, reset reason) is what makes this visible at all; without that telemetry, the reboot is invisible.
  • Coredump is captured in flash automatically (Arduino-ESP32's CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH=y + default partition table includes the coredump partition) and can be pulled with esptool / esp-coredump.
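
For context on how a panic reboot can surface in telemetry, here is a hedged sketch of mapping the ESP-IDF reset reason to a short label. `esp_reset_reason()` and the `ESP_RST_*` constants are real ESP-IDF names; `resetReasonLabel` and the label strings are hypothetical and are not taken from PR #57. The host-only enum is an assumption that mirrors a subset of the ESP-IDF enumerators so the helper can be compiled off-target.

```cpp
#include <cassert>
#include <cstring>

#if defined(ESP_PLATFORM)
#include "esp_system.h"  // esp_reset_reason(), esp_reset_reason_t
#else
// Host-only stand-in (assumption): a subset of the ESP-IDF enumerators so
// the helper below can be compiled and checked without the target SDK.
enum esp_reset_reason_t {
    ESP_RST_UNKNOWN,
    ESP_RST_POWERON,
    ESP_RST_SW,
    ESP_RST_PANIC,
    ESP_RST_TASK_WDT,
    ESP_RST_BROWNOUT
};
#endif

// Hypothetical helper: map a reset reason to a short telemetry label.
inline const char* resetReasonLabel(esp_reset_reason_t reason) {
    switch (reason) {
        case ESP_RST_POWERON:  return "poweron";
        case ESP_RST_SW:       return "software";
        case ESP_RST_PANIC:    return "panic";     // reset after an exception or panic
        case ESP_RST_TASK_WDT: return "task_wdt";
        case ESP_RST_BROWNOUT: return "brownout";
        default:               return "other";
    }
}
```

On target, calling `resetReasonLabel(esp_reset_reason())` once at boot would yield the string to report; a crash like the one above would show up as `panic`.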

Recommended resolution

  1. Wait for arduino-esp32 3.3.9 to be released, then pio pkg update to pick it up. The 10-line semaphore-wrap fix is exactly what we need.
  2. If we hit this crash painfully often before 3.3.9 ships, vendor a patched BLEServer.cpp / BLEServer.h into lib/ to override the framework version. The patch is the diff of cac3896c against 3.3.8.

Reproduction

Loose: run tools/thermal_load_test.sh (full multi-protocol load) with a real BT app connected for ~1 h. The crash triggers on a BLE disconnect under load. The device auto-recovers each time, so the only signal in the firmware (without pulling the coredump) is reset_reason flipping to panic.

🤖 Filed automatically — see espressif/arduino-esp32#12554 for the resolution path.
