Skip to content

Websocket client doesn't handle transport timeout errors correctly (IDFGH-16202) #858

@Lupindakaas

Description

@Lupindakaas

Answers checklist.

  • I have read the documentation for esp-protocols components and the issue is not addressed there.
  • I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Might be related to #795.

What i've gathered

The websocket client uses esp_transport_ws for it's transport layer. When using the esp_transport_read() function, you can use a timeout. This is configureable in the websocket client as 'network_timeout_ms'. When this timeout is reached, the esp_transport_read() function returns 0. This is however also the case for length 0 messages like PING or CLOSE. This already makes it hard to detect a timed out message. It seems that when this timeout is reached, an error isn't correctly detected, and we handle the message as regular. After this happens, the websocket client keeps publishing offset/randomly fragmented data.

Log

See log below. When the timeout is reached, the function returns 0. After this though, we don't handle this as a timeout and still emit the data event. This is what the client_handler is subscribed to. After this happens once, we enter a weird 'fragmented read' state where the client is confused about what bit is where. Also the websocket headers get shuffled around.

Funnily enough, this confused/fragmented read state continues untill 'E (294271) transport_ws: Error read data(0)' happens again. Then it recovers. This is clearly linked to the network_timeout_ms, as i can control the frequency of this error by enlongating/shortening the timeout time. Shorter timeout increases the frequency.

I (292621) client_handler: WS received data: opcode=1, len=63, data={"message":"type","data":{"data":{"point":3262,"point2":1952}}}
E (294271) transport_ws: Error read data(0)
E (294271) transport_ws: Error reading payload data(0)

-> len=0 && opcode=1 -> timeout reached but still publishes event

I (294271) client_handler: WS received data: opcode=1, len=0, data= 
E (294281) client_handler: on_data_event(467): Received unexpected text data

-> After this, the client publishes weirdly fragmented data:

I (295091) client_handler: WS received data: opcode=11, len=34, data={"message":"type","data":{"data"
W (295091) client_handler: WS unhandled op code 11
I (295101) client_handler: WS received data: opcode=10, len=30, data=:{"point":3262,"point2":1952}}}
I (295121) client_handler: WS received data: opcode=10, len=2, data=�U
I (295121) client_handler: WS received data: opcode=10, len=56, data={"message":"type","data":{"data":{"point":3262,"point2":
I (295131) client_handler: WS received data: opcode=14, len=7, data=1952}}}
W (295141) client_handler: WS unhandled op code 14
I (295611) client_handler: WS received data: opcode=14, len=2, data=�S
W (295611) client_handler: WS unhandled op code 14
I (295871) client_handler: WS received data: opcode=14, len=63, data={"message":"type","data":{"data":{"point":3262,"point2":1952}}}
W (295881) client_handler: WS unhandled op code 14

-> Another timeout

E (296401) transport_ws: Error read data(0)
E (296401) transport_ws: Error reading payload data(0)
I (296401) client_handler: WS received data: opcode=14, len=0, data=
W (296411) client_handler: WS unhandled op code 14

-> State is restored again

I (296711) app_remote: WS received data: opcode=1, len=63, data={"message":"type","data":{"data":{"point":3262,"point2":1952}}}

I had to obfuscate the actual data being send, so maybe the lengths don't match exactly.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions