EPoll is not notified about a fatal error when receiving data from a socket

We have a hidden issue in handling the fatal connection errors with epoll.
If `EReadStatus CChannel::recvfrom(sockaddr* addr, CPacket& packet) const` returns `RST_ERROR`, the epoll never gets notified about the error on that socket.
The test condition can be easily reproduced by returning `RST_ERROR` from the `recvfrom` function.
Also reverting changes of #468 (i.e. treating `WSAECONNRESET` as a fatal error on Windows) make the issue reproduceable on Windows (see the `ConnectionTimeoutTest` in the `test` folder).

**What is happening in case `recvfrom` DOES NOT return an error?**

The worker thread `CRcvQueue::worker` is in the while() loop. The socket, that attempts to connect, is in the list `m_lRendezvousID`. The `CRcvQueue::worker` thread calls `void CRendezvousQueue::updateConnStatus()` on every loop iteration. The function updates the socket status, in particular, when TTL is expired and the socket has to be closed.
```
// partial pseudocode
void* CRcvQueue::worker(void* param)
{
    // (...)
    while (!self->m_bClosing)
    {
        EReadStatus rst = self->worker_RetrieveUnit(Ref(id), Ref(unit), &sa);
        if (rst == RST_ERROR)
        {
            // !!! This break will skip updateConnStatus on the socket
            // and epoll is never notified
            break;
        }
    
        self->m_pRendezvousQueue->updateConnStatus(rst, cst, unit->m_Packet);
    }
    
    return;
}
```
One of the following calls to `void CRendezvousQueue::updateConnStatus()` triggers an opened socket to be closed by timeout.
After this the `epoll` gets notified about the error on that socket. And then all the threads are closed as expected.

**What is happening in case `recvfrom` DOES return an error?**

If the `recvfrom` return an error, the worker thread `CRcvQueue::worker` loop breaks and thread function returns. This way the next call to the `updateConnStatus()` is skipped and never happens. Therefore the epoll is never notified about the error on this socket. `srt_epoll_wait` can only return on a wait timeout, instead of handling the error event on the socket.

This is probably a minor issue, because in general we still have the `epoll` wait timeout, that handles these conditions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EPoll is not notified about a fatal error when receiving data from a socket #511

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

EPoll is not notified about a fatal error when receiving data from a socket #511

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions