[#722] fix segfault and hung threads on KeyboardInterrupt during parallel get #728

Open
wants to merge 2 commits into
base: main
from

Conversation

d-w-moore
Collaborator

@d-w-moore d-w-moore commented May 19, 2025

Wherein we close down threads in an orderly way, so that objects are not left to be disposed of in the wrong order for the ever-persnickety SSL shutdown logic.

Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.
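As a rough illustration of what "orderly" can mean here, the following is a minimal sketch, not the code in this PR: worker threads poll a shared stop flag, and the main thread sets that flag and joins every worker when a `KeyboardInterrupt` arrives, so nothing is torn down while a thread is still mid-operation. The names `run_workers`, `worker`, and `stop` are hypothetical and do not come from PRC.

```python
import threading
import time


def run_workers(n_workers=4, work_seconds=0.05, interrupt=False):
    """Start workers, then stop and join all of them, even on KeyboardInterrupt."""
    stop = threading.Event()
    lock = threading.Lock()
    finished = []

    def worker(idx):
        # Poll the stop flag between small units of work so the thread can
        # return promptly instead of being abandoned mid-operation.
        deadline = time.monotonic() + work_seconds
        while not stop.is_set() and time.monotonic() < deadline:
            time.sleep(0.005)
        with lock:
            finished.append(idx)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()

    interrupted = False
    try:
        if interrupt:
            raise KeyboardInterrupt  # stand-in for a real Ctrl-C in this sketch
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        interrupted = True
    finally:
        # Ask every worker to stop, then wait for each one before returning,
        # so shared resources are released only after all threads are done.
        stop.set()
        for t in threads:
            t.join()
    return interrupted, sorted(finished)
```

Calling `run_workers(interrupt=True)` exercises the interrupt path without needing a real signal; in both paths every thread has been joined by the time the function returns.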

@d-w-moore d-w-moore changed the title from "[_722] fix segfault and hung threads on SIGINT during parallel get" to "[#722] fix segfault and hung threads on KeyboardInterrupt during parallel get" May 19, 2025
@d-w-moore d-w-moore self-assigned this May 19, 2025
@d-w-moore d-w-moore marked this pull request as draft May 19, 2025 17:09
@d-w-moore
Collaborator Author

After a bit of manual testing, will attempt to make a proper test for SIGINT and SIGTERM to ensure things are left in an ok state.

@d-w-moore
Collaborator Author

d-w-moore commented Jun 5, 2025

A GUI, for example, that maintains background asynchronous parallel transfers using PRC could trap and guard against Ctrl-C thusly:

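from signal import SIGINT, signal
from irods.parallel import abort_asynchronous_transfers
# Assumes a truthy return value means the transfers were aborted cleanly; exit nonzero otherwise.
signal(SIGINT, lambda *_: exit(0 if abort_asynchronous_transfers() else 1))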

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from abafff5 to fb36836 Compare June 6, 2025 13:48
Contributor

@alanking alanking left a comment

Seems reasonable. Just a couple of things in the test.

Comment on lines +9 to +10
import irods
import irods.helpers
Contributor

It appears that we only actually use irods.helpers. Could we remove the import irods?

_clock_polling_interval = max(0.01, time.clock_getres(time.CLOCK_BOOTTIME))


def wait_till_true(function, timeout=None):
Contributor

Just in case... can we set the default timeout to some high-ish value that is abundantly more than enough time to complete whatever transfer we are waiting for, but that will eventually fail if it is stuck? We can still support None as "no timeout", but making it the default makes me squirm. The value I had in mind was like... 10 minutes?
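One way the suggestion could look is sketched below. This is an assumption-laden illustration rather than the helper in the PR: the `poll_interval` parameter, the `DEFAULT_TIMEOUT` constant, and the choice to return the last (falsy) result on timeout are all hypothetical. The default is ten minutes, and passing `timeout=None` still means "wait forever".

```python
import time

DEFAULT_TIMEOUT = 600.0  # ten minutes, the value floated in the review comment


def wait_till_true(function, timeout=DEFAULT_TIMEOUT, poll_interval=0.01):
    """Poll function() until it returns a truthy value or the timeout expires.

    Returns the truthy value on success, or the last falsy value on timeout.
    A timeout of None disables the limit.
    """
    start = time.monotonic()
    while True:
        result = function()
        if result:
            return result
        if timeout is not None and time.monotonic() - start >= timeout:
            return result
        time.sleep(poll_interval)
```

With this shape, a stuck transfer makes the caller's assertion on the return value fail after ten minutes instead of hanging the test run indefinitely.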

test_case.assertEqual(
process.wait(timeout=15),
-sig,
"Unexpected subprocess return code.",
Contributor

Please print the expected return code and the actual return code, if possible.
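A minimal sketch of a message that carries both values, assuming a POSIX system: when a child started with `subprocess.Popen` is killed by a signal, its return code is the negated signal number, which is why the test compares against `-sig`. The helper name `check_killed_by` and the sleeping child process are hypothetical stand-ins for the real test's subprocess.

```python
import signal
import subprocess
import sys


def check_killed_by(sig, timeout=15):
    """Kill a sleeping child with `sig`; return (expected, actual, message)."""
    process = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    process.send_signal(sig)
    actual = process.wait(timeout=timeout)
    expected = -sig  # POSIX: return code is the negative signal number
    message = (
        f"Unexpected subprocess return code: expected {expected}, got {actual}."
    )
    return expected, actual, message
```

Inside a test case this could be used as `test_case.assertEqual(actual, expected, message)`, so a failure report states both codes directly.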

@korydraughn
Contributor

Looks like we have a conflict.

Seems this PR is close to completion?

3 participants