RDMA support in ANO #819
base: main
Signed-off-by: Cliff Burdick <[email protected]>
… issue Signed-off-by: Cliff Burdick <[email protected]>
applications/adv_networking_bench/adv_networking_bench_rdma_tx_rx.yaml
```yaml
- name: data1
  rdma_mode: client
  rdma_transport_mode: RC
  address: 192.168.11.2  # The address to use, or leave blank for auto-detect
```
Re: "leave blank for auto-detect"
- RDMA only?
- For client only, or server as well?
- How does that work?
For the client, it acts like any other socket connection: if the client doesn't specify a source IP, the routing tables dictate which interface is used. By putting the IP here, we are telling it specifically which interface to use.
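The client-side behavior described above can be illustrated with ordinary TCP sockets. This is an analogy only, not ANO or RDMA CM code: the hypothetical helper `make_client_socket` binds to the given source IP when one is provided and otherwise skips `bind()`, leaving the kernel to choose the source address from the routing table at `connect()` time. For RDMA CM, the analogous knob is the `src_addr` argument of `rdma_resolve_addr()`, which may be NULL.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Hypothetical helper (analogy only): create a TCP client socket.
// If src_ip is non-empty, bind to it (port 0 = any ephemeral port) so traffic
// leaves via the interface that owns that address. If src_ip is empty, skip
// bind() and let the routing table pick the source address at connect() time.
// Returns the socket fd, or -1 on failure.
int make_client_socket(const std::string& src_ip) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  if (src_ip.empty()) return fd;  // "auto-detect": defer to routing

  sockaddr_in src{};
  src.sin_family = AF_INET;
  src.sin_port = htons(0);
  if (inet_pton(AF_INET, src_ip.c_str(), &src.sin_addr) != 1 ||
      bind(fd, reinterpret_cast<sockaddr*>(&src), sizeof(src)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Returns the local IPv4 address the socket is bound to ("0.0.0.0" if unbound),
// or an empty string if getsockname() fails.
std::string local_ip(int fd) {
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) return "";
  char buf[INET_ADDRSTRLEN] = {};
  inet_ntop(AF_INET, &addr.sin_addr, buf, sizeof(buf));
  return buf;
}
```

Binding explicitly pins the interface; leaving the field blank trades that control for convenience, which is why auto-detect makes sense on the connecting (client) side.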
- Update docs to indicate that `address` supports an IP, but only for RDMA, and that auto-detect only works for the client.
- Can we check the failure when passing an IP with non-RDMA backends? Make sure the error message is clear.
- Check the failure when passing a NIC to the RDMA backend; make sure the error message is clear.
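The error-handling items above could be covered by a validation step along these lines. This is a minimal sketch under stated assumptions: `Backend`, `RdmaMode`, and `validate_address` are hypothetical names, not part of the ANO API, and non-RDMA backends are assumed to expect a NIC identifier (for example a PCIe address) rather than an IP. IPv4 literals are detected with `inet_pton`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <string>

enum class Backend { RDMA, OTHER };      // hypothetical: RDMA vs. non-RDMA backends
enum class RdmaMode { CLIENT, SERVER };  // hypothetical: RDMA role

// True if s parses as an IPv4 address.
bool is_ipv4(const std::string& s) {
  in_addr tmp{};
  return inet_pton(AF_INET, s.c_str(), &tmp) == 1;
}

// Hypothetical check of the "address" field. Returns an empty string if the
// value is acceptable, otherwise a human-readable error message.
std::string validate_address(Backend backend, RdmaMode mode, const std::string& address) {
  if (backend == Backend::RDMA) {
    if (address.empty()) {
      // Blank means auto-detect, which is only meaningful for the client.
      return mode == RdmaMode::CLIENT
                 ? ""
                 : "address: auto-detect (blank) is only supported in RDMA client mode";
    }
    if (!is_ipv4(address)) {
      return "address: RDMA backend expects an IPv4 address, got '" + address +
             "' (NIC identifiers are not accepted)";
    }
    return "";
  }
  // Non-RDMA backends: an IP here is almost certainly a config mistake.
  if (is_ipv4(address)) {
    return "address: IP addresses are only supported by the RDMA backend; "
           "specify a NIC instead (got '" + address + "')";
  }
  if (address.empty()) return "address: a NIC must be specified for this backend";
  return "";
}
```

Returning the message instead of a bare boolean keeps the "make sure the error message is clear" requirement testable.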
```cpp
// Pick the memory region name from the direction (send/receive) and role (server/client)
if (send_.get()) {
  send_mr_name_ = server_.get() ? "DATA_TX_CPU_SERVER" : "DATA_TX_CPU_CLIENT";
}
if (receive_.get()) {
  receive_mr_name_ = server_.get() ? "DATA_RX_CPU_SERVER" : "DATA_RX_CPU_CLIENT";
}
```
If I'm not mistaken, we've never had to interface directly with the memory regions in the operators when using the other backends. I see it's passed to rdma_set_header. Can ANO not infer the appropriate memory region from the ANO config and the other inputs (port, queue)?
Any change to the memory regions in the YAML config file would also require updates to the operator implementation, which defeats the purpose of a config file.
Other backends referenced the memory region by port and queue number, which was tied 1:1 to the memory region. With RDMA you have a little more flexibility: you can use the same memory region for many different ports and queues if you want. We could allow users to specify a port/queue pair, but they would still have to edit the code. Another option is to add it to the client and server application config...
Per discussion, switch to port and queue for now, for consistency with the other backends and to ensure there is no conflicting "binding" between what is in the config and what is written in the app code. We will revisit the design for Tx with dynamic memory regions with the C++ interface in the future.
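The agreed direction (selecting the memory region by port and queue rather than hard-coding names such as `DATA_TX_CPU_SERVER` in the operator) could look roughly like the sketch below. `MrTable`, `add`, and `lookup` are hypothetical names, and in practice the table would be populated from the parsed YAML config rather than by hand. Because RDMA lets one memory region back several ports and queues, the mapping is many-to-one, and re-binding an existing pair to a different region is rejected to avoid the conflicting "binding" mentioned above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical (port, queue) -> memory-region-name table built from the config.
class MrTable {
 public:
  // Register a memory region for a (port, queue) pair. Several pairs may share
  // one region. Returns false if the pair is already bound to a different region.
  bool add(int port, int queue, const std::string& mr_name) {
    auto [it, inserted] = table_.emplace(std::make_pair(port, queue), mr_name);
    return inserted || it->second == mr_name;
  }

  // Look up the memory region for a (port, queue) pair; empty if unknown.
  std::optional<std::string> lookup(int port, int queue) const {
    auto it = table_.find({port, queue});
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::pair<int, int>, std::string> table_;
};
```

With this shape, operator code only ever passes (port, queue), so renaming a memory region in the YAML file does not require touching the operator.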
agirault left a comment
Thanks @cliffburdick.
Please update docs, tests, and CHANGELOG.md 🙏
```cpp
/**
 * ...
 * @param conn_id Connection ID
 * @param server True if server, false if client
 */
Status get_rx_burst(BurstParams** burst, uintptr_t conn_id, bool server);
```
`get_rdma_burst`?
I had it use an overload instead of a different name to keep the API the same. I'm not too convinced either way is better.
I'd say that for anyone not very familiar with RDMA versus the other backends, the signature alone won't make it clear that it's RDMA-only.
The connection ID only applies to RDMA. I could change it to a more specific type. For the RX and TX functions specifically, I really wanted to avoid changing the signature; otherwise the API is very different from the other backends.
Can the docs clarify how the connection ID and the server flag are used?
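One way to address both the naming concern and the idea of a more specific type for the connection ID is a strong type, so the RDMA-only overload is obvious at the call site while keeping the `get_rx_burst` name. This is a sketch only: `Status`, `BurstParams`, and `RdmaConnId` are simplified stand-ins defined here, the port/queue overload shape for non-RDMA backends is assumed, and the real ANO declarations may differ.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the real ANO types (assumed).
enum class Status { SUCCESS, NOT_SUPPORTED };
struct BurstParams { bool from_rdma = false; };

// Hypothetical strong type: a bare uintptr_t is easy to confuse with a port or
// queue number, while RdmaConnId makes the RDMA-only overload explicit.
struct RdmaConnId {
  uintptr_t value;
};

static BurstParams g_burst;  // placeholder storage for the example

/**
 * Get an RX burst from a port/queue pair (non-RDMA backends; assumed shape).
 */
Status get_rx_burst(BurstParams** burst, int port, int queue) {
  (void)port;
  (void)queue;
  g_burst.from_rdma = false;
  *burst = &g_burst;
  return Status::SUCCESS;
}

/**
 * Get an RX burst from an RDMA connection (RDMA backend only).
 * @param conn_id Connection ID (RDMA only)
 * @param server True if server, false if client
 */
Status get_rx_burst(BurstParams** burst, RdmaConnId conn_id, bool server) {
  (void)server;
  if (conn_id.value == 0) return Status::NOT_SUPPORTED;  // placeholder: treat 0 as "no connection"
  g_burst.from_rdma = true;
  *burst = &g_burst;
  return Status::SUCCESS;
}
```

Because `RdmaConnId` has no converting constructor, plain integers always resolve to the port/queue overload, and the RDMA overload is only chosen when a caller explicitly writes `RdmaConnId{...}`.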
…_rx.yaml Co-authored-by: Alexis Girault <[email protected]> Signed-off-by: Cliff Burdick <[email protected]>
Co-authored-by: Alexis Girault <[email protected]> Signed-off-by: Cliff Burdick <[email protected]>
Hi @cliffburdick, could you please resolve the conflicts for this PR and update it with the latest changes on
@bhashemian I can do that, but since I'm really the only person working on ANO, I think we need to merge these much faster once I've tested them. Rebasing them months later, after they've sat unmerged, takes a large amount of effort.
That’s fair, @cliffburdick! Thanks for your feedback. We’re working on streamlining the reviewing process to expedite the merging of PRs. |
@cliffburdick could you please let me know when you are planning to update this PR? I just want to make sure that we can merge it as soon as possible. Thanks
Hi Bruce, the PR needs a number of items addressed beyond rebasing. These are captured in some comments above and on Slack. I plan to get to it next week, since I don't have a system configured to test this on at the moment.
@cliffburdick that sounds great! Just ping me when this is ready. Thanks |
Adds RDMA support to ANO
Initial support is RC mode only, with support for both client and server modes, multiple queues, and multiple threads. The API is similar to the existing ANO backends. See the design document for more details.