Contributor

@j-xiong j-xiong commented Nov 19, 2025

If unhandled, a signal such as SIGPROF sent to the thread that is running ofi_uffd_handler will cause it to exit early.

Fix #11627
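
For readers unfamiliar with the failure mode, here is a minimal sketch of the general mechanism, not the libfabric code itself. It assumes a handler installed without `SA_RESTART` and uses `SIGALRM` from a real-time timer as a stand-in for `SIGPROF`; the function name `demo_interrupted_read` and the pipe are illustrative only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: its only purpose is to interrupt the blocked syscall. */
static void on_signal(int signo)
{
    (void)signo;
}

/* Hypothetical demo: block in read() on an empty pipe and let a timer
 * signal interrupt it. Returns the errno seen by the caller, or 0 if
 * read() unexpectedly succeeded. */
static int demo_interrupted_read(void)
{
    struct sigaction sa;
    struct itimerval tv;
    int fds[2];
    char byte;
    ssize_t ret;
    int rc, err;

    rc = pipe(fds);
    assert(rc == 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART */
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    /* One-shot timer: deliver SIGALRM after about 50 ms. */
    memset(&tv, 0, sizeof(tv));
    tv.it_value.tv_usec = 50000;
    rc = setitimer(ITIMER_REAL, &tv, NULL);
    assert(rc == 0);

    /* Nothing is ever written to the pipe, so this blocks until the
     * signal arrives and read() fails with EINTR. */
    ret = read(fds[0], &byte, 1);
    err = (ret < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A loop that treats every errno other than `EAGAIN` as fatal would leave its thread at this point, even though nothing is wrong with the file descriptor.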

If unhandled, a signal such as SIGPROF sent to the thread that is running
ofi_uffd_handler will cause it to exit early.

Signed-off-by: Jianxin Xiong <[email protected]>
Contributor Author

j-xiong commented Nov 20, 2025

@swelch Could you or someone else from HPE review this PR? The authors who made the most recent changes to this file were from HPE, but they have since moved on or retired.

  pthread_mutex_unlock(&mm_lock);
  pthread_rwlock_unlock(&mm_list_rwlock);
- if (errno != EAGAIN)
+ if (errno != EAGAIN && errno != EINTR)

This is not a review, but I thought this was worth raising. This is from the man page of read:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number.
It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

As far as I can tell, there is a chance that the number of bytes returned by the syscall will not equal the number requested if the read is interrupted by a signal. I do not know enough about libfabric to say how this affects this case, but could a signal sent here result in only a partial read? Would a partial read here potentially affect future reads in the loop?

Contributor Author

That can be an issue but according to the man page:

       If a read() is interrupted by a signal before it reads any data, it shall return −1 with errno set to [EINTR].

       If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes read.

A partial read would not set errno to EINTR, so with this change the thread would still exit early, as before.

Let me see if I can find a good way to handle this situation.
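
To make the distinction above concrete, here is a minimal sketch of a retry wrapper, assuming the POSIX behavior quoted above. `read_retry_eintr` is a hypothetical helper, not a libfabric function. It retries only when `read()` was interrupted before transferring any data; a short count is returned unchanged for the caller to deal with.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of libfabric: retry read() only when it
 * failed with EINTR, i.e. a signal arrived before any data was read.
 * A short (partial) read is NOT retried; its byte count is returned. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
    ssize_t ret;

    assert(buf != NULL);
    do {
        ret = read(fd, buf, len);
    } while (ret < 0 && errno == EINTR);

    return ret;
}
```

A caller would still compare the return value against the expected size, since a signal that arrives after some data was read yields a short count rather than EINTR.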

Contributor Author

According to the userfaultfd man page https://www.man7.org/linux/man-pages/man2/userfaultfd.2.html, reading from the fd returns one or more msg structures, depending on the available events and the buffer size. Passing a buffer smaller than the msg structure returns an error. This implies that the partial read scenario is not going to happen (otherwise the application would have no way to recover).
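
The resulting decision can be sketched as a small classifier. This is an illustration under the assumptions stated in this thread, not the actual `ofi_uffd_handler` code: the names `rd_action` and `classify_read` are hypothetical, and the message size is passed in as a parameter so the sketch does not depend on `<linux/userfaultfd.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome of one read() on the userfaultfd descriptor. */
enum rd_action {
    RD_HANDLE, /* one or more whole messages were read */
    RD_RETRY,  /* transient condition: go around the loop again */
    RD_STOP    /* unexpected result: leave the loop */
};

/* Classify the result of read(). `err` is the errno value captured
 * right after the call; `msg_size` is the size of one message. */
static enum rd_action classify_read(ssize_t ret, int err, size_t msg_size)
{
    assert(msg_size > 0);

    if (ret < 0)
        return (err == EAGAIN || err == EINTR) ? RD_RETRY : RD_STOP;

    /* Assumption from the man page discussion: a successful read
     * returns a whole number of messages, never a fragment. */
    if (ret == 0 || (size_t)ret % msg_size != 0)
        return RD_STOP;

    return RD_HANDLE;
}
```

Keeping the check for a non-multiple of the message size is cheap insurance: if the assumption about whole messages ever fails, the loop stops instead of parsing a fragment.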

Contributor Author

j-xiong commented Nov 25, 2025

Intel CI failure is an unrelated known issue (Bad file descriptor).

@j-xiong j-xiong merged commit d3e54cb into ofiwg:main Nov 29, 2025
19 of 20 checks passed

Development

Successfully merging this pull request may close these issues.

prov/uffd: The uffd memory cache monitor cannot handle signal interrupts

3 participants