Terminate process group after cram execution #11841

Leonidas-from-XIV · 2025-05-23T10:47:59Z

After discussing an alternate solution for #11827 with @rgrinberg we agreed that the best course of action is to just terminate the entire process group of the cram test after execution.

This PR reuses @Alizter's cram test repro case.

It introduces a slight race condition: we might terminate an unrelated process group that happens to have been assigned the same ID as the cram test's PID. I am unaware of a way to avoid it, but the chance is small because OSes usually increment PIDs until they wrap around. In any case, the engine already terminates the process group when the process being waited on has a timeout, so this code just extends that to the case where there is no timeout.
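
For readers unfamiliar with the mechanism, here is a minimal sketch of the idea, not the actual code in `src/dune_engine/scheduler.ml`. It assumes a POSIX system and the stock `unix` library. `run_and_reap` is a hypothetical helper, and `kill_process_group` here only shares its name with the engine function discussed in this thread; its signature is an assumption.

```ocaml
(* Sketch only. Requires the [unix] library; POSIX systems only. *)

(* Send [signal] to every process in the group whose id is [pgid].
   Passing a negative pid to [Unix.kill] addresses a process group.
   ESRCH means no such group exists any more, which is treated as success. *)
let kill_process_group ?(signal = Sys.sigterm) pgid =
  try Unix.kill (-pgid) signal with
  | Unix.Unix_error (Unix.ESRCH, _, _) -> ()

(* Hypothetical helper: run [prog] in a fresh process group, wait for it,
   then terminate whatever it left behind in that group. *)
let run_and_reap prog args =
  match Unix.fork () with
  | 0 ->
    (* Child: [setsid] makes this process a session and group leader,
       so its process group id equals its pid. *)
    ignore (Unix.setsid ());
    (try Unix.execvp prog (Array.of_list (prog :: args)) with
     | _ -> Unix._exit 127)
  | pid ->
    let _, status = Unix.waitpid [] pid in
    (* The race described above lives here: [pid] has been reaped, so if
       the group is already empty the id may in principle be reused. *)
    kill_process_group pid;
    status
```

With this sketch, `run_and_reap "sh" ["-c"; "sleep 30 & exit 0"]` returns as soon as the shell exits, and the backgrounded `sleep` receives SIGTERM because it is still in the shell's process group.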

Fixes #11820

Leonidas-from-XIV · 2025-05-23T15:01:02Z

Surprising behavior on Windows: when we call `kill_process_group`, which attempts to `Unix.kill` the PID, the files that ocamldep writes somehow end up corrupted. That's odd because by then we have already awaited the process, so it should have exited.

Alizter · 2025-05-24T08:40:50Z

@Leonidas-from-XIV that sounds to me like we have a race condition. Is it similar to the issue in #11010?

src/dune_engine/scheduler.ml

rgrinberg · 2025-06-10T21:45:33Z

Seems like there's a failure in CI. Is it something relevant?

Leonidas-from-XIV · 2025-06-11T08:12:36Z

I'm trying to figure that out. It works reliably on my system (the only failures are from Melange). CI sometimes hangs forever, which is unrelated to this PR, but it's hard to say whether the failure is related since I can't reproduce it.

Alizter · 2025-06-11T22:23:49Z

@Leonidas-from-XIV Could you push a temporary commit disabling all of exec-watch and perhaps watching? Those would be my go-to culprits for tests that don't terminate.

Signed-off-by: Ali Caglayan <[email protected]> Signed-off-by: Marek Kubica <[email protected]>

Signed-off-by: Marek Kubica <[email protected]>

Not quite sure why this would make a difference, because by the time `kill_process_group` is called on the PID, the process should already have terminated and the operation should be a no-op. But CI is failing on Windows, so this commit skips the call there to check whether that makes a difference. Signed-off-by: Marek Kubica <[email protected]>
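
A sketch of what such a platform guard could look like, assuming the skip is keyed on `Sys.win32` (the commit's actual condition is not shown in this thread). `kill_process_group_if_supported` is a hypothetical name, and the `unix` library is required.

```ocaml
(* Sketch only: skip the group kill on Windows, perform it elsewhere. *)
let kill_process_group_if_supported pgid =
  if Sys.win32
  then () (* no-op on Windows, where the call is being skipped *)
  else (
    (* A negative pid addresses the whole process group on POSIX. *)
    try Unix.kill (-pgid) Sys.sigterm with
    (* ESRCH: the group no longer exists, so there is nothing to do. *)
    | Unix.Unix_error (Unix.ESRCH, _, _) -> ())
```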

Signed-off-by: Marek Kubica <[email protected]>

Leonidas-from-XIV requested review from rgrinberg and Alizter May 23, 2025 10:47

rgrinberg reviewed May 24, 2025

src/dune_engine/scheduler.ml (outdated, resolved)

rgrinberg approved these changes Jun 7, 2025

Leonidas-from-XIV force-pushed the cram-terminate-pgroup branch from bba405d to 87bd387 on June 9, 2025 09:53

Leonidas-from-XIV force-pushed the cram-terminate-pgroup branch from 87bd387 to 5a0e91b on June 11, 2025 08:07

Alizter mentioned this pull request Jun 12, 2025

Allow concurrent exec with watch mode #11840

Merged

Alizter and others added 6 commits June 19, 2025 11:41

test: crams not terminating subprocesses

3d95faf

Signed-off-by: Ali Caglayan <[email protected]> Signed-off-by: Marek Kubica <[email protected]>

Kill process group once main process exits

49d8460

Signed-off-by: Marek Kubica <[email protected]>

Update cram test to show that the process is terminated

14ed710

Signed-off-by: Marek Kubica <[email protected]>

Add changelog

1b7a05c

Signed-off-by: Marek Kubica <[email protected]>

Use SIGTERM instead of SIGKILL

ce45310

Signed-off-by: Marek Kubica <[email protected]>

Leonidas-from-XIV force-pushed the cram-terminate-pgroup branch from 5a0e91b to ce45310 on June 19, 2025 09:41
