Description
Answers checklist.
- I have read the documentation for esp-protocols components and the issue is not addressed there.
- I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
- I have searched the issue tracker for a similar issue and not found a similar issue.
What component are you using? If you choose Other, provide details in More Information.
esp_websocket_client
component version
git commit c078c36 or current master
IDF version.
v6.0-dev-3218-g811e27118d
More Information.
Hi @david-cermak @gabsuren @euripedesrocha
With ESP-IDF v6.0 and recent versions of esp-websocket-client, calling esp_websocket_client_stop() can block forever under some conditions when there is no internet access. Here's our test environment:
- Genuine ESP32-S3-WROOM-1-N16R8 with PSRAM enabled
- ESP-IDF v6.0
- Recent esp-websocket-client (e.g. c078c36)
- Quectel EG800K modem with a SIM that has no data credit left: it can still dial out and obtain an IP address, but has no internet access. There is also a W5500 NIC, with no Ethernet cable plugged in
- An MQTT client is also running; it talks to an MQTT server on the internet over the WebSocket protocol
- Set esp_websocket_client_config_t->disable_auto_reconnect to true so that we can handle the reconnect logic manually
- Set esp_websocket_client_config_t->network_timeout_ms to 10000ms
- No external proxy is used
Now, if I let the device run for a while and attempt to connect to a WS server and an MQTT server over the modem, the connection always fails because there is no internet access. My firmware then calls esp_websocket_client_stop() followed by esp_websocket_client_destroy(), and tries to create a new WS client. After roughly 3-5 such attempts, it gets stuck forever.
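The cycle described above can be sketched roughly as follows. This is a simplified illustration, not the actual firmware: the function name ws_reconnect_cycles, the WAIT_MS macro, the placeholder URI and the 15-second wait are all assumptions. The #else branch contains host-only stand-ins so the sketch compiles off-target; they are not the real component and cannot reproduce the hang.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "esp_websocket_client.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#define WAIT_MS(ms) vTaskDelay(pdMS_TO_TICKS(ms))
#else
/* Host-only stand-ins (hypothetical): NOT the real API, they never block. */
typedef struct {
    const char *uri;
    bool disable_auto_reconnect;
    int network_timeout_ms;
} esp_websocket_client_config_t;
typedef const esp_websocket_client_config_t *esp_websocket_client_handle_t;
static esp_websocket_client_handle_t
esp_websocket_client_init(const esp_websocket_client_config_t *c) { return c; }
static int esp_websocket_client_start(esp_websocket_client_handle_t h) { (void)h; return 0; }
static int esp_websocket_client_stop(esp_websocket_client_handle_t h) { (void)h; return 0; }
static int esp_websocket_client_destroy(esp_websocket_client_handle_t h) { (void)h; return 0; }
#define WAIT_MS(ms) ((void)(ms))
#endif

/* Runs `attempts` create/start/stop/destroy cycles against `uri` and
 * returns how many cycles completed. */
int ws_reconnect_cycles(const char *uri, int attempts)
{
    const esp_websocket_client_config_t cfg = {
        .uri = uri,
        .disable_auto_reconnect = true, /* reconnect is handled by this loop */
        .network_timeout_ms = 10000,
    };
    int completed = 0;

    for (int i = 0; i < attempts; i++) {
        esp_websocket_client_handle_t client = esp_websocket_client_init(&cfg);
        if (client == NULL) {
            break;
        }
        esp_websocket_client_start(client);
        WAIT_MS(15000);                       /* assumed wait, longer than network_timeout_ms */
        esp_websocket_client_stop(client);    /* the call reported to block forever */
        esp_websocket_client_destroy(client);
        completed++;
    }
    return completed;
}
```

On target this would be called from a task, for example ws_reconnect_cycles("wss://example.invalid/ws", 10), where the URI is only a placeholder for any server that is unreachable without internet access.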
We dug into this issue a bit further and found that, in esp_websocket_client_task(), client->state is 1 (WEBSOCKET_STATE_INIT) and the websocket task is stuck in the preceding esp_transport_connect() call, even though network_timeout_ms is 10000ms and the call should have timed out and returned much earlier. Therefore I guess this might be a tcp_transport issue rather than an issue in the WS client itself.
This sort of lockup also occasionally happens with the MQTT client: it may get stuck in esp_mqtt_client_stop() forever as well, although for us that is less likely to happen.
Here are some of our logs:
- I added the log line ESP_LOGW(TAG, "Client state: %u; state_bit: 0x%x", client->state, xEventGroupGetBits(client->status_bits)); in esp_websocket_client_stop(), just before the xEventGroupGetBits(client->status_bits) & STOPPED_BIT check
- This is what it looks like when it does not get stuck; the WS client state is 0 (because it never connected successfully)
- This is what it looks like when it gets stuck; the WS client state is 1
Regards,
Jackson