copybara-service[bot]

Wait for the gofer process in Container.Wait().

The linked issue concerns `runsc run --detach=false` not waiting on (reaping) the
gofer, which is its child process. As a result, if the container is killed
externally via `runsc delete`, the gofer process becomes a zombie. This leads to a
deadlock: `runsc delete` waits for the gofer process to be reaped, while the
`runsc run` process, which would reap it in Container.Destroy(), is stuck waiting
for a filesystem lock held by `runsc delete`.

The fix is to also attempt to reap the child gofer process in Container.Wait().
This mirrors what Sandbox does: both Sandbox.Wait() and Sandbox.destroy() call
Sandbox.waitForStopped(), which reaps the child sandbox process. For containers,
however, the only path to waitForStopped() was Container.Destroy() => stop() =>
waitForStopped(). This change updates Container.Wait() to also call
Container.waitForStopped().
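A sketch of the shape of the change, using a hypothetical `container` type rather than gVisor's real `Container`. It assumes that waitForStopped() is a no-op once the gofer has been reaped (modeled here by clearing `goferPid`), which is what makes it safe to call from both Wait() and the later Destroy() => stop() path. `reap` is a hypothetical stand-in for the actual wait on the child process, and the sandbox wait, file locking, and kill steps are elided.

```go
package main

import "fmt"

// container is a hypothetical model, not gVisor's Container type.
// goferPid is cleared once the gofer has been reaped.
type container struct {
	goferPid  int
	reapCalls int // how many times the gofer was actually reaped
}

// reap stands in for waiting on the child gofer process (hypothetical).
func (c *container) reap() { c.reapCalls++ }

// waitForStopped reaps the gofer if that has not happened yet.
// Clearing goferPid makes repeated calls harmless.
func (c *container) waitForStopped() error {
	if c.goferPid == 0 {
		return nil // already reaped by an earlier call
	}
	c.reap()
	c.goferPid = 0
	return nil
}

// Wait models the updated Container.Wait(): after waiting for the
// container to exit (elided), it also reaps the gofer.
func (c *container) Wait() error {
	return c.waitForStopped()
}

// Destroy models Container.Destroy() => stop() => waitForStopped().
func (c *container) Destroy() error {
	// Acquiring the filesystem lock is elided in this model.
	return c.stop()
}

func (c *container) stop() error {
	// Killing any remaining processes is elided in this model.
	return c.waitForStopped()
}

func main() {
	c := &container{goferPid: 1234} // hypothetical PID
	_ = c.Wait()
	_ = c.Destroy()
	fmt.Println("reaped", c.reapCalls, "time(s)")
}
```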

This fixes the above deadlock because `runsc run` will reap the gofer before
getting stuck waiting for the filesystem lock.

Fixes #12051

PiperOrigin-RevId: 798074124
copybara-service bot added the exported label (Issue was exported automatically) on Aug 22, 2025.
Linked issue: runsc delete -force is slow with non-detached container