Commit 675aceb

Add Elasticity Support via NIXL Integration

Co-authored-by: Roey Azran <[email protected]>
Co-authored-by: Asaf Schwartz <[email protected]>
Signed-off-by: Itay Alroy <[email protected]>

1 parent f0d34aa · commit 675aceb

27 files changed: +2377 −335 lines

NIXL_README.md

Lines changed: 100 additions & 0 deletions
# DeepEP with NIXL - Build and Setup Guide

## Overview

This guide covers building and running DeepEP with NIXL integration, which enables **elastic scaling capabilities**: dynamic addition and removal of processes (ranks) at runtime.
### Build Dependencies

Follow the build instructions in the [NIXL repository](https://github.com/ai-dynamo/nixl) to install:

- **NIXL** (NVIDIA Inference Xfer Library)
- **UCX** (Unified Communication X)
- **ETCD** and the ETCD C++ client library
- **DOCA** (with GPUNetIO)
## Building DeepEP with NIXL

### Step 1: Configure Environment Variables

Edit `scripts/set_env.sh` to match your installation paths, then source the environment:

```bash
source scripts/set_env.sh
```
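The exact contents of `scripts/set_env.sh` depend on where you installed the dependencies. The sketch below only illustrates the kind of variables it sets: `UCX_HOME` and `DOCA_HOME` appear later in this guide, while `NIXL_HOME` and all the paths shown here are assumptions to adjust for your system.

```shell
# Hypothetical example of set_env.sh contents -- all paths are assumptions.
export NIXL_HOME=/opt/nixl              # assumed NIXL install prefix
export UCX_HOME=/opt/ucx                # assumed UCX install prefix
export DOCA_HOME=/opt/mellanox/doca     # assumed DOCA install prefix

# Make the libraries and tools discoverable at build and run time.
export PATH=$UCX_HOME/bin:$PATH
export LD_LIBRARY_PATH=$NIXL_HOME/lib:$UCX_HOME/lib:$LD_LIBRARY_PATH
```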
### Step 2: Build DeepEP with NIXL

Edit the paths in `scripts/build.sh` to match your installation, then build DeepEP with the provided script:

```bash
./scripts/build.sh
```

**Build output**:

- Compiled library: `build/lib.linux-x86_64-3.10/deep_ep_cpp.cpython-310-x86_64-linux-gnu.so`
## Running Elastic Tests

### Adjust UCX Network Devices

Edit `tests/elastic/elastic.py` / `tests/test_internode.py` to adjust the UCX network devices to match your system:

```python
pxb_nics = ["mlx5_0", "mlx5_3", "mlx5_4", "mlx5_5", "mlx5_6", "mlx5_9", "mlx5_10", "mlx5_11"]
tcp_nics = ',ibp154s0,ibp192s0,ibp206s0,ibp220s0,ibp94s0'
os.environ['UCX_NET_DEVICES'] = f'cuda{local_rank}-{pxb_nics[local_rank]}:1' + tcp_nics
```

**Note**: This is a workaround to force UCX to choose the correct network devices on some systems.
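For a concrete picture of what the snippet above produces, here is the resulting `UCX_NET_DEVICES` string for `local_rank = 0`, using the example NIC names from the snippet (your device names will differ):

```python
# Reproduces the UCX_NET_DEVICES value built in tests/elastic/elastic.py
# for local_rank == 0, using the example NIC names shown above.
pxb_nics = ["mlx5_0", "mlx5_3", "mlx5_4", "mlx5_5",
            "mlx5_6", "mlx5_9", "mlx5_10", "mlx5_11"]
tcp_nics = ',ibp154s0,ibp192s0,ibp206s0,ibp220s0,ibp94s0'

local_rank = 0
ucx_net_devices = f'cuda{local_rank}-{pxb_nics[local_rank]}:1' + tcp_nics
print(ucx_net_devices)
# → cuda0-mlx5_0:1,ibp154s0,ibp192s0,ibp206s0,ibp220s0,ibp94s0
```

Rank 1 would instead pin `mlx5_3`, and so on down the `pxb_nics` list, while the TCP NICs are shared by all ranks.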
### Start ETCD Server

If not already running:

```bash
# Local test (single node)
etcd --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379

# Multi-node setup (on master node)
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://<MASTER_IP>:2379
```
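Before launching ranks, it can help to confirm the ETCD endpoint is actually reachable. The sketch below polls etcd's standard `/health` HTTP endpoint; the helper names, wait loop, and timeout values are illustrative, not part of DeepEP.

```python
import json
import time
import urllib.request


def etcd_health_url(server: str) -> str:
    """Build the health-check URL from an --etcd-server style value."""
    return server.rstrip('/') + '/health'


def wait_for_etcd(server: str, timeout_s: float = 30.0) -> bool:
    """Poll etcd's /health endpoint until it reports healthy or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(etcd_health_url(server), timeout=2) as resp:
                # etcd answers with JSON like {"health": "true", ...}
                if json.load(resp).get('health') == 'true':
                    return True
        except OSError:
            pass  # etcd not up yet; retry
        time.sleep(1)
    return False
```

Call `wait_for_etcd('http://127.0.0.1:2379')` (or the master's URL in the multi-node setup) before starting the test.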
### Set Runtime Environment

```bash
export UCX_LOG_LEVEL=error
export LD_PRELOAD=$DOCA_HOME/lib/x86_64-linux-gnu/libdoca_common.so:$DOCA_HOME/lib/x86_64-linux-gnu/libdoca_gpunetio.so:$DOCA_HOME/lib/x86_64-linux-gnu/libdoca_verbs.so
export LD_LIBRARY_PATH=$UCX_HOME/lib:$LD_LIBRARY_PATH
```
### Run Elastic Scaling Test

#### Single Node (8 ranks, 4→8 expansion):

```bash
python3 tests/elastic/elastic.py \
    --plan tests/elastic/single_expansion.json \
    --num-processes 8 \
    --etcd-server http://127.0.0.1:2379
```
77+
#### Multi-Node Setup:
78+
79+
**Node 1** (will launch the first phase with 4 ranks):
80+
```bash
81+
python3 tests/elastic/elastic.py \
82+
--plan tests/elastic/single_expansion.json \
83+
--num-processes 4 \
84+
```
85+
86+
**Node 2** (will join the second phase with additional 4 ranks):
87+
```bash
88+
python3 tests/elastic/elastic.py \
89+
--plan tests/elastic/single_expansion.json \
90+
--num-processes 4 \
91+
--rank-server $MASTER_IP \
92+
--etcd-server http://$MASTER_IP:2379
93+
```
94+
95+
### Available Test Plans
96+
97+
- `no_expansion.json`: Static 4 ranks (baseline)
98+
- `single_expansion.json`: 4 → 8 ranks (single expansion)
99+
- `double_expansion.json`: 4 → 6 → 8 ranks (two expansions)
100+
- `expansion_contraction.json`: 4 → 8 → 6 ranks (scale up then down)
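Each plan describes the sequence of world sizes the test steps through. The JSON shape below is a hypothetical illustration (the real schema in `tests/elastic/` may differ); the helper simply extracts the per-phase rank counts:

```python
# Hypothetical shape of an elastic test plan: a list of phases, each with a
# target world size. The real JSON schema in tests/elastic/ may differ.
expansion_contraction = {
    "phases": [
        {"num_ranks": 4},   # initial phase
        {"num_ranks": 8},   # scale up
        {"num_ranks": 6},   # scale down
    ]
}


def phase_sizes(plan: dict) -> list[int]:
    """Extract the world size of each phase, e.g. 4 -> 8 -> 6."""
    return [phase["num_ranks"] for phase in plan["phases"]]


print(phase_sizes(expansion_contraction))  # [4, 8, 6]
```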

csrc/config.hpp

Lines changed: 0 additions & 6 deletions
```diff
@@ -56,9 +56,7 @@ struct Config {
         size_t num_bytes = 0;
         num_bytes += num_channels * num_nvl_ranks * (2 * num_rdma_ranks + 3) * sizeof(int);
         num_bytes += num_channels * num_nvl_ranks * num_max_nvl_chunked_recv_tokens * hidden_bytes;
-#ifndef DISABLE_NVSHMEM
         num_bytes += num_channels * num_nvl_ranks * num_max_nvl_chunked_recv_tokens * internode::get_source_meta_bytes();
-#endif
         num_bytes += num_channels * num_nvl_ranks * num_max_nvl_chunked_recv_tokens * kNumMaxTopK * sizeof(int64_t);
         num_bytes += num_channels * num_nvl_ranks * num_max_nvl_chunked_recv_tokens * kNumMaxTopK * sizeof(float);
         num_bytes += num_channels * num_nvl_ranks * num_max_nvl_chunked_recv_tokens * kNumMaxScales * sizeof(float);
@@ -67,7 +65,6 @@ struct Config {
     }

     size_t get_rdma_buffer_size_hint(int64_t hidden_bytes, int num_ranks) const {
-#ifndef DISABLE_NVSHMEM
         // Legacy mode
         if (num_ranks <= NUM_MAX_NVL_PEERS)
             return 0;
@@ -91,9 +88,6 @@ struct Config {
         num_bytes += num_channels * num_rdma_ranks * num_max_rdma_chunked_recv_tokens * sizeof(int4) * 2;
         num_bytes = ((num_bytes + 127) / 128) * 128;
         return num_bytes;
-#else
-        EP_HOST_ASSERT(false and "NVSHMEM is disable during compilation");
-#endif
     }
 };
```
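The last hunk rounds the RDMA buffer size up to a 128-byte boundary via `((num_bytes + 127) / 128) * 128`. The same rounding as a quick standalone check (the function name here is just for illustration):

```python
def align_128(num_bytes: int) -> int:
    """Round up to the next multiple of 128 bytes, mirroring the
    alignment step in Config::get_rdma_buffer_size_hint above."""
    return ((num_bytes + 127) // 128) * 128


print(align_128(1), align_128(128), align_128(129))  # 128 128 256
```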
