During a sustained multi-protocol soak (WS + HTTP/WS churn + mDNS + USB binary + a connected BT app) on fix/ws-oom-broadcast (which carries the OOM fix from PR #58 plus the telemetry from PR #57), the NimBLE host task crashed with a LoadProhibited exception inside BLEServer::removePeerDevice while handling a BLE disconnect event. The scale auto-rebooted (reset_reason=panic) and the user's BT app reconnected normally — visible to the operator only via the telemetry reset_reason field added in PR #57.
This is a separate bug class from the OOM crash PR #58 fixes (which was bad_alloc in the WS broadcast path). This one is a long-known data race in the upstream Arduino-ESP32 BLE library, not in this firmware's code.
Evidence (decoded coredump from /coredump flash partition)
Fix commit: cac3896c on master, merged 2026-05-18.
Why we haven't picked it up
| Component | Version | Date |
| --- | --- | --- |
| Arduino-ESP32 in use | 3.3.8 (latest tag) | 2026-04-12 |
| Upstream fix landed (master) | cac3896c | 2026-05-18 |
| Master is N commits ahead of 3.3.8 | 53 commits, ~6 weeks | (as of 2026-05-26) |
| Next arduino-esp32 release | 3.3.9 (not yet tagged) | - |
The fix is in arduino-esp32 master but not yet in any release. PlatformIO doesn't auto-upgrade (we pin pioarduino's stable URL), so we'll need to pio pkg update once 3.3.9 ships.
Current behavior
Scale auto-recovers via the panic handler's reboot (a few seconds outage); the user's BT app reconnects automatically.
Coredump is captured in flash automatically (Arduino-ESP32's CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH=y + default partition table includes the coredump partition) and can be pulled with esptool / esp-coredump.
Recommended resolution
Wait for arduino-esp32 3.3.9 to be released, then pio pkg update to pick it up. The 10-line semaphore-wrap fix is exactly what we need.
If we hit this crash painfully often before 3.3.9 ships, vendor a patched BLEServer.cpp / BLEServer.h into lib/ to override the framework version. The patch is the diff of cac3896c against 3.3.8.
Reproduction
Loose — runs of tools/thermal_load_test.sh (full multi-protocol load) with a real BT app connected for ~1 h. Triggers on a BLE disconnect under load. Auto-recovers each time so the only signal in the firmware (without coredump pull) is reset_reason flipping to panic.
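Because the only in-band signal is reset_reason, a soak harness could count panic reboots from logged telemetry. The following is a minimal host-side sketch, not part of the firmware or of tools/thermal_load_test.sh. The TelemetrySample struct, its uptimeSeconds field, and the "poweron" value are assumptions made for illustration; only the "panic" value comes from this report.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical telemetry record; field names are assumptions for this sketch.
struct TelemetrySample {
    unsigned long uptimeSeconds;  // assumed: seconds since boot
    std::string resetReason;      // mirrors the reset_reason telemetry field
};

// Counts reboots attributed to a panic. A reboot is inferred when uptime
// decreases between two consecutive samples; it is counted only if the
// first sample after the reboot reports reset_reason == "panic".
inline std::size_t countPanicReboots(const std::vector<TelemetrySample>& samples) {
    std::size_t panics = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const bool rebooted = samples[i].uptimeSeconds < samples[i - 1].uptimeSeconds;
        if (rebooted && samples[i].resetReason == "panic") {
            ++panics;
        }
    }
    return panics;
}
```

Keying on the uptime drop rather than on the reset_reason value alone avoids counting the same panic once per sample, since the field presumably keeps reporting panic until the next reboot.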
Root cause
BLEServer::m_connectedServersMap (std::map<uint16_t, conn_status_t>) is accessed without synchronization from both the code that modifies it (addPeerDevice on connect, removePeerDevice on disconnect) and the code that reads it (getPeerDevices / getPeerMTU / etc., directly or indirectly through notify()).
A concurrent access leaves the red-black tree in an inconsistent state; the next operation walks a corrupted child pointer and faults.
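The locking discipline that removes this class of race can be sketched in portable C++. This is not the upstream patch, which uses a semaphore inside BLEServer; it is a simplified illustration using std::mutex. PeerRegistry, ConnStatus, and runChurn are hypothetical names invented for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the library's per-connection record.
struct ConnStatus {
    uint16_t mtu;
};

// Hypothetical registry: every read and write of the map takes the same lock,
// so no thread can observe the tree while another thread is rebalancing it.
class PeerRegistry {
public:
    void addPeer(uint16_t connId, uint16_t mtu) {
        std::lock_guard<std::mutex> lock(mutex_);
        peers_[connId] = ConnStatus{mtu};
    }

    void removePeer(uint16_t connId) {
        std::lock_guard<std::mutex> lock(mutex_);
        peers_.erase(connId);
    }

    // Returns a copy so callers never iterate the live tree.
    std::map<uint16_t, ConnStatus> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return peers_;
    }

    // Returns 0 when the connection is unknown.
    uint16_t peerMtu(uint16_t connId) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = peers_.find(connId);
        return it == peers_.end() ? 0 : it->second.mtu;
    }

private:
    mutable std::mutex mutex_;
    std::map<uint16_t, ConnStatus> peers_;
};

// Simulates connect/disconnect churn on one thread and readers on another,
// then returns the number of peers left in the registry.
inline std::size_t runChurn(PeerRegistry& registry, int rounds) {
    std::thread writer([&] {
        for (int i = 0; i < rounds; ++i) {
            const uint16_t id = static_cast<uint16_t>(i % 8);
            registry.addPeer(id, 247);
            registry.removePeer(id);
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < rounds; ++i) {
            (void)registry.snapshot();
            (void)registry.peerMtu(static_cast<uint16_t>(i % 8));
        }
    });
    writer.join();
    reader.join();
    return registry.snapshot().size();
}
```

Returning a copy from snapshot() trades some allocation for safety: a caller that iterated a reference to the live map after the lock was released would reintroduce the race.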
Upstream
This is a known upstream bug. The upstream fix adds an m_semaphoreMapAccess semaphore around every read/write of the map (fix commit cac3896c on master, merged 2026-05-18).
Note on visibility: the reset_reason=panic field from PR #57 ("Scale telemetry: SoC temp, weight-stall watchdog, reset reason") is what makes this crash visible at all; without that telemetry, the reboot is invisible.
🤖 Filed automatically; see espressif/arduino-esp32#12554 for the resolution path.