Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion prov/util/src/uffd_mem_monitor.c
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,9 @@ static void *ofi_uffd_handler(void *arg)

for (;;) {
ret = poll(fds, 2, -1);
if (ret < 0 && errno == EINTR)
continue;

if (ret < 0 || fds[1].revents)
break;

Expand All @@ -117,7 +120,7 @@ static void *ofi_uffd_handler(void *arg)
if (ret != sizeof(msg)) {
pthread_mutex_unlock(&mm_lock);
pthread_rwlock_unlock(&mm_list_rwlock);
if (errno != EAGAIN)
if (errno != EAGAIN && errno != EINTR)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a review but thought it was something worth raising. This is from the man page of read:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number.
It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

As far as I can tell, there a chance that the size/amount read from the syscall might not exactly equal if interrupted by a signal. I do not know much about libfabric to know how this might affect this case but could there be a chance that a signal sent here could allow for only a partial read? Would a partial read here potential affect future reads in the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be an issue but according to the man page:

       If a read() is interrupted by a signal before it reads any data, it shall return −1 with errno set to [EINTR].

       If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes read.

Partial read would not set errno to EINTR so the thread would exit early as before.

Let me see if I can find a good way to handle this situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the userfaultfd man page https://www.man7.org/linux/man-pages/man2/userfaultfd.2.html, reading from the fd returns 1 or more msg structures based on available events and the buffer size. Passing a buffer smaller than the msg structure would return error. This implies that the partial reading scenario is not going to happen (otherwise application has no way to recover).

break;
continue;
}
Expand Down