Signal handler for RemoteProcessAlloc #540
This pull request was exported from Phabricator. Differential Revision: D78097380 |
c049216 to a623c66
Summary:

Pull Request resolved: pytorch-labs#540

What's going on here:

1. `RemoteProcessAlloc` is instantiated in the client code (e.g. https://fburl.com/code/p4t5aewo).
2. `RemoteProcessAlloc::new()` now spawns a signal handler task and holds onto its `JoinHandle`. A tx-rx pair is created so that the signal handler task learns the addresses of the hosts in `RemoteProcessAlloc::host_states` as they are added and removed.
3. `RemoteProcessAlloc::host_states` is now wrapped in a struct, `HostStates`, which holds the tx side. It aims to expose the same interface as a `HashMap`, but also sends every update to the map over the tx. When a `RemoteProcessAllocHostState` is inserted, its address and `HostId` are sent over the tx. When a `RemoteProcessAllocHostState` is removed, only the `HostId` is sent over the tx (the address is `None`).
4. When the handler receives a `HostId` and `Some(ChannelAddr)`, it dials this address and inserts the resulting `ChannelTx` into its own `HashMap`, with the `HostId` as the key.
5. When the handler receives a `HostId` and `None`, it removes the corresponding entry from its `HashMap`.
6. When the handler receives a signal, it iterates over all `ChannelTx`s in the `HashMap` and sends `RemoteProcessAllocatorMessage::Signal(signal)` over each `ChannelTx` to the `RemoteProcessAllocator` running on the remote machine.
7. The `RemoteProcessAllocator` receives the message. If the signal is SIGINT, it calls `ensure_previous_alloc_stopped` to stop gracefully, then re-raises the signal.

Reviewed By: moonli

Differential Revision: D78097380
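Steps 2 through 6 can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in this PR: `HostId` and `ChannelAddr` are stand-in `String` aliases, `Event`, `run_handler`, `dial`, and `demo` are hypothetical names, a plain `std::sync::mpsc::Sender<i32>` stands in for `ChannelTx`, and signals are modeled as events on the same channel as the host updates rather than coming from a real OS signal stream.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Stand-in types (assumptions; the real types live in the PR's crate).
type HostId = String;
type ChannelAddr = String;

/// Everything the handler reacts to (hypothetical enum for this sketch).
enum Event {
    /// A host was inserted (`Some(addr)`) or removed (`None`).
    HostUpdate(HostId, Option<ChannelAddr>),
    /// A signal number arrived and should be forwarded to every host.
    Signal(i32),
}

/// HashMap-like wrapper that mirrors inserts and removals over `tx`.
struct HostStates<V> {
    inner: HashMap<HostId, V>,
    tx: Sender<Event>,
}

impl<V> HostStates<V> {
    fn new(tx: Sender<Event>) -> Self {
        Self { inner: HashMap::new(), tx }
    }

    /// Insert a host state and announce its address to the handler.
    fn insert(&mut self, id: HostId, addr: ChannelAddr, state: V) -> Option<V> {
        let _ = self.tx.send(Event::HostUpdate(id.clone(), Some(addr)));
        self.inner.insert(id, state)
    }

    /// Remove a host state and tell the handler to drop its connection.
    fn remove(&mut self, id: &str) -> Option<V> {
        let _ = self.tx.send(Event::HostUpdate(id.to_string(), None));
        self.inner.remove(id)
    }
}

/// Handler loop: keeps one outbound sender per host and fans signals out.
/// Returns once every `Sender<Event>` has been dropped.
fn run_handler(rx: Receiver<Event>, dial: impl Fn(&ChannelAddr) -> Sender<i32>) {
    let mut conns: HashMap<HostId, Sender<i32>> = HashMap::new();
    for event in rx {
        match event {
            Event::HostUpdate(id, Some(addr)) => {
                conns.insert(id, dial(&addr));
            }
            Event::HostUpdate(id, None) => {
                conns.remove(&id);
            }
            Event::Signal(sig) => {
                for tx in conns.values() {
                    // Ignore hosts whose receiving side has gone away.
                    let _ = tx.send(sig);
                }
            }
        }
    }
}

/// Adds hosts "a" and "b", removes "b", then delivers signal 2.
/// Returns the signals seen by host "a" and by host "b".
fn demo() -> (Vec<i32>, Vec<i32>) {
    let (tx, rx) = channel();
    let (a_tx, a_rx) = channel();
    let (b_tx, b_rx) = channel();

    let mut hosts = HostStates::new(tx.clone());
    hosts.insert("a".into(), "addr-a".into(), ());
    hosts.insert("b".into(), "addr-b".into(), ());
    hosts.remove("b");
    tx.send(Event::Signal(2)).unwrap();

    // Drop all event senders so the handler loop terminates.
    drop(hosts);
    drop(tx);

    run_handler(rx, move |addr: &ChannelAddr| {
        if addr.as_str() == "addr-a" { a_tx.clone() } else { b_tx.clone() }
    });
    (a_rx.try_iter().collect(), b_rx.try_iter().collect())
}
```

Calling `demo()` shows the intended effect of steps 4 through 6: only the host that is still registered when the signal arrives receives it. In the PR itself the handler runs as a spawned task alongside the allocator; the sketch runs it inline after queueing the events only to keep the example deterministic.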
7386ca5 to 4a12d9e
4a12d9e to 8620240
8620240 to 8cd728c
8cd728c to 6919663
6919663 to 4ab51d9
4ab51d9 to 075ec07
This pull request has been merged in 1c96f6d. |