shayne-fletcher
Contributor

Summary:
integrates the `ProcStatus` / `ProcHandle` lifecycle surface into `BootstrapProcManager`. the manager now stores `ProcHandle`s instead of raw `Child`s, stamps `Running` on `spawn` when a pid is observable, wires `stdout`/`stderr` to `BOOTSTRAP_LOG_CHANNEL`, and runs an exit monitor that `wait()`s and records `Stopped(code)` or `Killed(signal, core_dumped)` via `ExitStatusExt`. illegal transitions are rejected and the `mark_*`s return `bool` to make that explicit in tests.
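A minimal sketch of how an exit monitor might map a waited-on `ExitStatus` to a terminal status using `ExitStatusExt` (Unix only). The `Terminal` enum and `classify` function are hypothetical names for illustration; the real `ProcStatus` variants may carry different fields.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical stand-in for the terminal `ProcStatus` variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Terminal {
    /// The process exited on its own with this exit code.
    Stopped { exit_code: i32 },
    /// The process was terminated by a signal.
    Killed { signal: i32, core_dumped: bool },
    /// Neither an exit code nor a terminating signal was reported.
    Unknown,
}

/// Map an `ExitStatus` (as returned by `wait()`) to a terminal state.
pub fn classify(status: ExitStatus) -> Terminal {
    if let Some(exit_code) = status.code() {
        // `code()` is `Some` only when the process exited normally.
        Terminal::Stopped { exit_code }
    } else if let Some(signal) = status.signal() {
        // `signal()` and `core_dumped()` come from `ExitStatusExt`.
        Terminal::Killed {
            signal,
            core_dumped: status.core_dumped(),
        }
    } else {
        Terminal::Unknown
    }
}
```

In a monitor task this would be called as `classify(child.wait()?)`, with the result passed to the matching `mark_*` transition.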

adds `BootstrapProcManager::status(&ProcId)` for a snapshot of the current status. documents constructors, `transport()` (Unix), and clarifies `mark_running` as a best-effort OS-level liveness stamp, not a readiness guarantee.

tests exercise handle transitions, clean exit and `SIGKILL` observed by the monitor, unknown-proc status, the “child already taken” path, pid disappearing once the monitor claims the child, and the fast-exit path that goes `Starting` -> `Stopped` without `Running`. the manager-drop test now tolerates linux zombie semantics by polling `/proc/<pid>/status` and accepting `Z` or disappearance. includes an unsafe justification for `libc::kill` in the `SIGKILL` test.
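The zombie-tolerant check can be sketched roughly as follows. This is an illustration rather than the test's actual code: `parse_state` and `is_gone_or_zombie` are hypothetical helper names, and the sketch relies only on the `State:` line that Linux writes into `/proc/<pid>/status`.

```rust
use std::fs;

/// Extract the one-letter process state from the text of
/// `/proc/<pid>/status`, e.g. `State:\tZ (zombie)` yields `Some('Z')`.
pub fn parse_state(status_text: &str) -> Option<char> {
    status_text
        .lines()
        .find_map(|line| line.strip_prefix("State:"))
        .and_then(|rest| rest.trim().chars().next())
}

/// True if the pid is a zombie or its `/proc` entry can no longer be read.
/// Assumption: any read error is treated as "disappeared", so on systems
/// without `/proc` this always returns true.
pub fn is_gone_or_zombie(pid: u32) -> bool {
    match fs::read_to_string(format!("/proc/{}/status", pid)) {
        Ok(text) => parse_state(&text) == Some('Z'),
        Err(_) => true,
    }
}
```

A test would poll `is_gone_or_zombie(pid)` with a short sleep and a deadline, since a killed child stays a zombie until its parent reaps it.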

graceful shutdown (SIGTERM → wait → SIGKILL) remains a TODO. tombstone semantics in children are unchanged for now.

Differential Revision: D82846206

pytorch-bot bot commented Sep 19, 2025

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@meta-cla meta-cla bot added the CLA Signed label Sep 19, 2025
@facebook-github-bot
Contributor

@shayne-fletcher has exported this pull request. If you are a Meta employee, you can view the originating diff in D82846206.

shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Sep 19, 2025
…ch#1282)

Summary:

integrates the `ProcStatus` / `ProcHandle` lifecycle surface into `BootstrapProcManager`. the manager now stores `ProcHandle`s instead of raw `Child`s, stamps `Running` on `spawn` when a pid is observable, wires `stdout`/`stderr` to `BOOTSTRAP_LOG_CHANNEL`, and runs an exit monitor that `wait()`s and records `Stopped(code)` or `Killed(signal, core_dumped)` via `ExitStatusExt`. illegal transitions are rejected and the `mark_*`s return `bool` to make that explicit in tests.

adds `BootstrapProcManager::status(&ProcId)` for a snapshot of the current status. documents constructors, `transport()` (Unix), and clarifies `mark_running` as a best-effort OS-level liveness stamp, not a readiness guarantee.

tests exercise handle transitions, clean exit and `SIGKILL` observed by the monitor, unknown-proc status, the “child already taken” path, pid disappearing once the monitor claims the child, and the fast-exit path that goes `Starting` -> `Stopped` without `Running`. the manager-drop test now tolerates linux zombie semantics by polling `/proc/<pid>/status` and accepting `Z` or disappearance. includes an unsafe justification for `libc::kill` in the `SIGKILL` test.

graceful shutdown (SIGTERM → wait → SIGKILL) remains a TODO. tombstone semantics in children are unchanged for now.

Differential Revision: D82846206

shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Sep 19, 2025
…ch#1282)

Summary:

integrates the `ProcStatus` / `ProcHandle` lifecycle surface into `BootstrapProcManager`. the manager now stores `ProcHandle`s instead of raw `Child`s, stamps `Running` on `spawn` when a pid is observable, wires `stdout`/`stderr` to `BOOTSTRAP_LOG_CHANNEL`, and runs an exit monitor that `wait()`s and records `Stopped(code)` or `Killed(signal, core_dumped)` via `ExitStatusExt`. illegal transitions are rejected and the `mark_*`s return `bool` to make that explicit in tests.

adds `BootstrapProcManager::status(&ProcId)` for a snapshot of the current status. documents constructors, `transport()` (Unix), and clarifies `mark_running` as a best-effort OS-level liveness stamp, not a readiness guarantee.

adds a `pid_table` and a `Drop` impl that best-effort `SIGKILL`s recorded pids on `drop`. this replaces relying on `Child.kill_on_drop` now that the exit monitor owns the `Child`.
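As a rough illustration of the `pid_table` plus `Drop` idea: the manager records each pid at spawn, the monitor forgets it once the child is reaped, and `drop` kills whatever is left. All names here (`Manager`, `record`, `forget`) are hypothetical, and the kill action is injected as a closure so the sketch stays dependency-free; the real code would presumably send `SIGKILL` through `libc::kill`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical manager holding a table of live child pids.
pub struct Manager<K: Fn(u32)> {
    pid_table: Mutex<HashMap<String, u32>>,
    /// Injected kill action (assumption: stands in for a SIGKILL call).
    kill: K,
}

impl<K: Fn(u32)> Manager<K> {
    pub fn new(kill: K) -> Self {
        Manager {
            pid_table: Mutex::new(HashMap::new()),
            kill,
        }
    }

    /// Called at spawn time, once a pid is observable.
    pub fn record(&self, proc_id: &str, pid: u32) {
        self.pid_table.lock().unwrap().insert(proc_id.to_string(), pid);
    }

    /// Called by the exit monitor after the child has been reaped.
    pub fn forget(&self, proc_id: &str) {
        self.pid_table.lock().unwrap().remove(proc_id);
    }
}

impl<K: Fn(u32)> Drop for Manager<K> {
    fn drop(&mut self) {
        // Best effort: skip cleanup if the lock is poisoned rather than
        // panicking inside `drop`.
        if let Ok(mut table) = self.pid_table.lock() {
            for (_, pid) in table.drain() {
                (self.kill)(pid);
            }
        }
    }
}
```

Forgetting reaped pids matters because the OS can reuse a pid after its process is reaped, so killing a stale entry could hit an unrelated process.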

tests exercise handle transitions, clean exit and `SIGKILL` observed by the monitor, unknown-proc status, the “child already taken” path, pid disappearing once the monitor claims the child, and the fast-exit path that goes `Starting` -> `Stopped` without `Running`. the manager-drop test now tolerates linux zombie semantics by polling `/proc/<pid>/status` and accepting `Z` or disappearance. includes an unsafe justification for `libc::kill` in the `SIGKILL` test.

graceful shutdown (SIGTERM → wait → SIGKILL) remains a TODO. tombstone semantics in children are unchanged for now.

Differential Revision: D82846206

Summary:

this wires `BootstrapProcManager` up to retain child process handles and drain stdout/stderr so processes don’t block on full pipes. output is sent into tracing for now; we’ll come back to structured log forwarding (like v0) in a later diff. a simple test confirms that dropping the manager tears down its children correctly (`kill_on_drop(true)`), with a /proc check on linux. this gives us safe baseline lifecycle management, with graceful shutdown and other improvements to follow.
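To illustrate why draining matters: a child that writes to a piped stdout blocks once the pipe buffer fills unless something keeps reading. A minimal sketch of a drain loop, assuming plain `std` threads and `eprintln!` as stand-ins for the async tasks and tracing output used in the actual change (`drain` is a hypothetical name):

```rust
use std::io::{BufRead, BufReader, Read};
use std::thread::{self, JoinHandle};

/// Read `reader` line by line until EOF on a background thread so the
/// writer on the other end never blocks on a full pipe.
/// Returns a handle that yields the number of lines drained.
pub fn drain<R>(label: &'static str, reader: R) -> JoinHandle<usize>
where
    R: Read + Send + 'static,
{
    thread::spawn(move || {
        let mut reader = BufReader::new(reader);
        let mut buf = Vec::new();
        let mut count = 0;
        loop {
            buf.clear();
            match reader.read_until(b'\n', &mut buf) {
                // EOF or an I/O error ends the drain loop.
                Ok(0) | Err(_) => break,
                Ok(_) => {
                    // Lossy decoding keeps draining even on invalid UTF-8.
                    let line = String::from_utf8_lossy(&buf);
                    eprintln!("[{}] {}", label, line.trim_end());
                    count += 1;
                }
            }
        }
        count
    })
}
```

With a child spawned using `Stdio::piped()`, the call would look like `drain("stdout", child.stdout.take().unwrap())`, with a second call for stderr.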

Reviewed By: mariusae

Differential Revision: D82753182

Summary:

this replaces the ad-hoc stdout/stderr draining with structured log forwarding, borrowing the v0 model in a lighter form. the proc manager now tees child output into writers that forward over a channel (set in `BOOTSTRAP_LOG_CHANNEL`). on the child side, `boot_v1` starts a `LogForwardActor` + `LogClientActor`, so logs are picked up and aggregated without blocking pipes. a small test shows a fake log making it through forwarder → client → tap. this isn’t full parity with v0’s tailer and global aggregation, but it’s the minimal working building block.
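The tee idea can be sketched as a `Write` adapter that writes to a local sink and forwards a copy of the same bytes over a channel. This is only an illustration: `TeeWriter` is a hypothetical name, and `std::sync::mpsc` stands in for whatever channel is configured through `BOOTSTRAP_LOG_CHANNEL`.

```rust
use std::io::{self, Write};
use std::sync::mpsc::Sender;

/// Writes to `local` and forwards a copy of each accepted chunk.
pub struct TeeWriter<W: Write> {
    local: W,
    forward: Sender<Vec<u8>>,
}

impl<W: Write> TeeWriter<W> {
    pub fn new(local: W, forward: Sender<Vec<u8>>) -> Self {
        TeeWriter { local, forward }
    }

    /// Give back the local sink, e.g. to inspect it in a test.
    pub fn into_local(self) -> W {
        self.local
    }
}

impl<W: Write> Write for TeeWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let written = self.local.write(buf)?;
        // Forward only the bytes the local sink accepted. A closed
        // receiver is ignored so logging never fails the writer.
        let _ = self.forward.send(buf[..written].to_vec());
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.local.flush()
    }
}
```

Ignoring send errors is a deliberate choice in this sketch: if the log consumer goes away, local output should keep flowing rather than stall the pipe.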

Reviewed By: mariusae

Differential Revision: D82763884

Summary:

this diff lands `ProcStatus` and `ProcHandle` as the lifecycle surface for procs launched under bootstrap. the handle pairs a `ProcId` with the `Child` and tracks status through controlled `mark_*` transitions (starting → running → stopping → stopped/killed/failed). transitions are checked, illegal moves leave the state unchanged, and return values make that explicit in tests.
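A minimal sketch of checked transitions, to make the "illegal moves leave the state unchanged and return `false`" contract concrete. The `Status` and `Handle` types are simplified stand-ins, and the exact set of allowed transitions is an assumption; only the general starting → running → stopping → terminal shape comes from the description above.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified status; the real `ProcStatus` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Starting,
    Running,
    Stopping,
    Stopped(i32),
    Killed(i32),
    Failed(String),
}

impl Status {
    fn is_terminal(&self) -> bool {
        matches!(self, Status::Stopped(_) | Status::Killed(_) | Status::Failed(_))
    }
}

pub struct Handle {
    status: Mutex<Status>,
}

impl Handle {
    pub fn new() -> Self {
        Handle { status: Mutex::new(Status::Starting) }
    }

    /// Snapshot of the current status.
    pub fn status(&self) -> Status {
        self.status.lock().unwrap().clone()
    }

    /// Apply `next` only if `allowed` accepts the current state.
    fn transition(&self, allowed: fn(&Status) -> bool, next: Status) -> bool {
        let mut current = self.status.lock().unwrap();
        if allowed(&current) {
            *current = next;
            true
        } else {
            false // illegal move: state is left unchanged
        }
    }

    pub fn mark_running(&self) -> bool {
        self.transition(|s| matches!(s, Status::Starting), Status::Running)
    }

    pub fn mark_stopping(&self) -> bool {
        self.transition(|s| matches!(s, Status::Starting | Status::Running), Status::Stopping)
    }

    /// Terminal states are reachable from any non-terminal state.
    pub fn mark_stopped(&self, code: i32) -> bool {
        self.transition(|s| !s.is_terminal(), Status::Stopped(code))
    }

    pub fn mark_killed(&self, signal: i32) -> bool {
        self.transition(|s| !s.is_terminal(), Status::Killed(signal))
    }
}
```

Returning `bool` instead of panicking lets a test assert both the accepted and the rejected paths, for example that `mark_running()` fails once the handle is already `Stopped`.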

this establishes the basic lifecycle API; the next step is to integrate these handles into `BootstrapProcManager` so that callers can observe and control proc lifecycles directly.

Reviewed By: mariusae

Differential Revision: D82827662

…ch#1282)

Summary:

integrates the `ProcStatus` / `ProcHandle` lifecycle surface into `BootstrapProcManager`. the manager now stores `ProcHandle`s instead of raw `Child`s, stamps `Running` on `spawn` when a pid is observable, wires `stdout`/`stderr` to `BOOTSTRAP_LOG_CHANNEL`, and runs an exit monitor that `wait()`s and records `Stopped(code)` or `Killed(signal, core_dumped)` via `ExitStatusExt`. illegal transitions are rejected and the `mark_*`s return `bool` to make that explicit in tests.

adds `BootstrapProcManager::status(&ProcId)` for a snapshot of the current status. documents constructors, `transport()` (Unix), and clarifies `mark_running` as a best-effort OS-level liveness stamp, not a readiness guarantee.

adds a `pid_table` and a `Drop` impl that best-effort `SIGKILL`s recorded pids on `drop`. this replaces relying on `Child.kill_on_drop` now that the exit monitor owns the `Child`.

tests exercise handle transitions, clean exit and `SIGKILL` observed by the monitor, unknown-proc status, the “child already taken” path, pid disappearing once the monitor claims the child, and the fast-exit path that goes `Starting` -> `Stopped` without `Running`. the manager-drop test now tolerates linux zombie semantics by polling `/proc/<pid>/status` and accepting `Z` or disappearance. includes an unsafe justification for `libc::kill` in the `SIGKILL` test.

graceful shutdown (SIGTERM → wait → SIGKILL) remains a TODO. tombstone semantics in children are unchanged for now.

Reviewed By: mariusae

Differential Revision: D82846206

@facebook-github-bot
Contributor

This pull request has been merged in d944cc1.
