verbs: make libibverbs plugins relocatable #1604

Open: wants to merge 1 commit into master

@fwyzard commented Apr 30, 2025

Allow overriding the libibverbs configuration directory by setting the environment variable VERBS_CONFIG_DIR.
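A minimal C sketch of the override this PR proposes. It assumes an empty VERBS_CONFIG_DIR should be ignored, and it uses a hypothetical macro, DEFAULT_VERBS_CONFIG_DIR, for the directory compiled in at build time (rdma-core's real name and value differ). driver_file_path() is also a hypothetical helper that only shows how the chosen directory would be used.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build-time default; the real value is whatever path was
 * configured when rdma-core was built. */
#define DEFAULT_VERBS_CONFIG_DIR "/etc/libibverbs.d"

/* Directory to scan for *.driver files: VERBS_CONFIG_DIR when it is set and
 * non-empty, otherwise the build-time default. */
static const char *verbs_config_dir(void)
{
    const char *dir = getenv("VERBS_CONFIG_DIR");

    if (dir != NULL && dir[0] != '\0')
        return dir;
    return DEFAULT_VERBS_CONFIG_DIR;
}

/* Hypothetical helper: write "<config dir>/<provider>.driver" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int driver_file_path(char *buf, size_t len, const char *provider)
{
    int n;

    /* A provider name is a bare name such as "mlx5", not a path. */
    assert(buf != NULL && provider != NULL && strchr(provider, '/') == NULL);
    n = snprintf(buf, len, "%s/%s.driver", verbs_config_dir(), provider);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the fallback is the build-time path, an install at the original prefix needs no variable at all. Only relocated installs have to set it.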

@fwyzard (Author) commented Apr 30, 2025

The reason for proposing this change is to make it possible to install the binaries in a path different from what was used at build time.

Allow overriding the libibverbs configuration directory by setting the
environment variable VERBS_CONFIG_DIR.

Signed-off-by: Andrea Bocci <[email protected]>
@fwyzard force-pushed the relocatable_libibverbs_plugins branch from fc0c97f to 0285f75 on April 30, 2025 14:13
@jgunthorpe (Member) commented

That's not enough, though: there are many other paths that get wired into the binary. Why do you want to do this?

@fwyzard (Author) commented Apr 30, 2025

The reason is that our software stack is (supposed to be) fully relocatable: we build it once and deploy it on different systems at different paths. Normally we rely on LD_LIBRARY_PATH to find all shared libraries, but libibverbs complains that it cannot find the .../etc/libibverbs.d directory - hence the proposal.

The other issue I had to address was finding the actual libibverbs plugins (mlx5, ...), but for those I wrote the full path of each plugin into its driver file (.../etc/libibverbs.d/mlx5.driver).

The difference here is that it's easy to update a path in a text file, but I'd rather not try to "patch" the paths in the library binary.
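The workaround described above (absolute paths in the .driver files) can be sketched as follows. This assumes each driver file line has the form "driver <name>", that an absolute name only gets a version suffix appended, and that a bare name is looked up in a provider directory fixed at build time. PROVIDER_SUFFIX, PROVIDER_DIR and provider_path_from_line() are hypothetical, not rdma-core's actual configuration or code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values for illustration only; the real suffix and provider
 * directory are fixed when rdma-core is configured. */
#define PROVIDER_SUFFIX "-rdmav34.so"
#define PROVIDER_DIR "/usr/lib64/libibverbs"

/* Parse one line of a *.driver file ("driver <name>") and write the shared
 * object to load into out. Returns 0 on success, -1 on a malformed line or
 * if the result does not fit. */
static int provider_path_from_line(const char *line, char *out, size_t len)
{
    char name[256];
    int n;

    assert(line != NULL && out != NULL);
    if (sscanf(line, " driver %255s", name) != 1)
        return -1;

    if (name[0] == '/') {
        /* Absolute path: only the suffix is appended, so a relocated
         * install just needs this one line rewritten. */
        n = snprintf(out, len, "%s%s", name, PROVIDER_SUFFIX);
    } else {
        /* Bare name: resolved against the build-time provider directory,
         * which is what breaks when the install is moved. */
        n = snprintf(out, len, "%s/lib%s%s", PROVIDER_DIR, name, PROVIDER_SUFFIX);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```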

@fwyzard (Author) commented Apr 30, 2025

It is possible that this is not enough; so far I've only done a quick test with ibv_devices.

I configured and built the modified rdma-core on one machine at /data/user/fwyzard/build/build/tmp/BUILDROOT/b68fb9f06cf5f415e08fb05b27e6e34c/opt/cmssw/el8_amd64_gcc12/external/rdma-core/57.0-b68fb9f06cf5f415e08fb05b27e6e34c.

Then I packaged it and installed it on another machine at /data/cmssw/el8_amd64_gcc12/external/rdma-core/57.0-b68fb9f06cf5f415e08fb05b27e6e34c, and added its bin and lib64 subdirectories to the PATH and LD_LIBRARY_PATH.

By default I get:

$ ibv_devices
libibverbs: Warning: couldn't open config directory '/data/user/fwyzard/build/build/tmp/BUILDROOT/b68fb9f06cf5f415e08fb05b27e6e34c/opt/cmssw/el8_amd64_gcc12/external/rdma-core/57.0-b68fb9f06cf5f415e08fb05b27e6e34c/etc/libibverbs.d'.
    device                 node GUID
    ------              ----------------

If I set VERBS_CONFIG_DIR, I get:

$ export VERBS_CONFIG_DIR=/data/cmssw/el8_amd64_gcc12/external/rdma-core/57.0-b68fb9f06cf5f415e08fb05b27e6e34c/etc/libibverbs.d
$ ibv_devices
    device                 node GUID
    ------              ----------------
    mlx5_4              48b02d03008fc360
    mlx5_2              7cc25515d9a7155a
    mlx5_0              506b4b03000ff644
    mlx5_5              48b02d03008fc361
    mlx5_3              7cc25515d9a7155b
    mlx5_1              506b4b03000ff645

@jgunthorpe (Member) commented

To be honest, I'd rather give an option to get rid of the loadable libraries than mess with environment variables. The whole plugin thing is an old holdover from long ago and doesn't make a lot of sense now. I think we can just link them all into libibverbs statically and be done with it. Most of the code required to do that should already be there.
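One way to picture the statically linked alternative is a table of providers compiled into the library, so nothing has to be located on disk at run time. Everything in this sketch (struct provider, the probe callbacks, find_provider()) is hypothetical; rdma-core's providers use richer device matching than the name prefixes used here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider descriptor; a real provider exposes many more
 * callbacks than a single probe function. */
struct provider {
    const char *name;
    int (*probe)(const char *dev_name); /* nonzero if it handles dev_name */
};

/* Simplified matching on the device-name prefix, for illustration only. */
static int mlx5_probe(const char *dev) { return strncmp(dev, "mlx5_", 5) == 0; }
static int rxe_probe(const char *dev) { return strncmp(dev, "rxe", 3) == 0; }

/* With every provider linked in, the list is a constant array: no config
 * directory, no dlopen(), nothing path-dependent at run time. */
static const struct provider builtin_providers[] = {
    { "mlx5", mlx5_probe },
    { "rxe", rxe_probe },
};

/* Return the first built-in provider that claims dev_name, or NULL. */
static const struct provider *find_provider(const char *dev_name)
{
    assert(dev_name != NULL);
    for (size_t i = 0; i < sizeof builtin_providers / sizeof builtin_providers[0]; i++)
        if (builtin_providers[i].probe(dev_name))
            return &builtin_providers[i];
    return NULL;
}
```

The trade-off is that adding a provider means rebuilding libibverbs, which matters less when all providers live in the same source tree anyway.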

@fwyzard (Author) commented May 6, 2025

Mhm, OK, I don't have any objections to that - it just goes beyond what I can easily implement :-)

@jgunthorpe (Member) commented

Maybe you can just use the existing RDMAV_DRIVERS environment variable? If you specify an absolute path in it, libibverbs will load the provider from that file. We don't actually need the config files.
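A sketch of how a driver list such as the value of RDMAV_DRIVERS might be split and classified. The ':' and ';' separators are an assumption about the format and should be checked against libibverbs itself; the "absolute path means load that file" rule follows the suggestion above. split_driver_list() and MAX_ENTRY are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRY 256

/* Split a driver list on ':' and ';' (assumed separators), copying up to max
 * entries into names[] and setting absolute[i] to 1 when entry i is an
 * absolute path (load that file) or 0 for a bare provider name. Empty fields
 * are skipped. Returns the number of entries stored, or -1 on allocation
 * failure. */
static int split_driver_list(const char *list, char names[][MAX_ENTRY],
                             int absolute[], int max)
{
    char *copy, *saveptr = NULL, *entry;
    int count = 0;

    assert(list != NULL && max >= 0);
    copy = strdup(list); /* strtok_r() modifies its input */
    if (copy == NULL)
        return -1;

    for (entry = strtok_r(copy, ":;", &saveptr);
         entry != NULL && count < max;
         entry = strtok_r(NULL, ":;", &saveptr)) {
        /* Truncate overly long entries rather than overflow names[]. */
        strncpy(names[count], entry, MAX_ENTRY - 1);
        names[count][MAX_ENTRY - 1] = '\0';
        absolute[count] = entry[0] == '/';
        count++;
    }
    free(copy);
    return count;
}
```

A caller would pass it the result of getenv("RDMAV_DRIVERS") after checking that it is not NULL.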

@fwyzard (Author) commented May 6, 2025

Thanks for the suggestion, I will try it.
