
GPU Direct Storage Not Patching NVME #1829

@schmaustech

Description


What happened:

In the NIC cluster policy there are two environment variables that should enable GPUDirect Storage (GDS):

ENABLE_NFSRDMA, which should enable GDS for both NVMe and NFS.

In some cases, though, a system has only NVMe drives and the customer wants GDS for NFS only, so this option was created:

ENABLE_NFSRDMA_NO_NVME, which enables GDS for NFS only.

In 25.7, setting ENABLE_NFSRDMA does not patch the nvme driver, so GDS does not work for NVMe.

What you expected to happen:

If the nvme driver is patched properly, we should see the following:

cat /proc/kallsyms | grep nvfs_dma
0000000000000000 t nvme_v1_unregister_nvfs_dma_ops [nvme]
0000000000000000 t nvme_v1_register_nvfs_dma_ops [nvme]
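
The check above can be scripted so that it fails loudly when either symbol is absent. The following is a minimal sketch, not part of the operator: the function name `check_nvfs_patch` and its optional file argument are hypothetical, and it assumes the usual `/proc/kallsyms` layout of address, type, symbol name, and module.

```shell
#!/bin/sh
# Hypothetical helper: verify that the nvme module exposes both nvfs DMA
# registration symbols shown in the expected output above.
# Usage: check_nvfs_patch [kallsyms_file]   (defaults to /proc/kallsyms)
check_nvfs_patch() {
    kallsyms="${1:-/proc/kallsyms}"
    missing=0
    for sym in nvme_v1_register_nvfs_dma_ops nvme_v1_unregister_nvfs_dma_ops; do
        # Field 3 is the symbol name, field 4 the owning module.
        if awk -v s="$sym" \
            '$3 == s && $4 == "[nvme]" { found = 1 } END { exit !found }' \
            "$kallsyms"; then
            echo "found: $sym"
        else
            echo "missing: $sym"
            missing=1
        fi
    done
    # Returns 0 only when both symbols are present.
    return "$missing"
}
```

Run on the affected node, `check_nvfs_patch && echo "nvme is patched for GDS"` would print the confirmation only when both symbols are present. In the situation reported here, both symbols should show up as missing.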

How to reproduce it (as minimally and precisely as possible):

Install 25.7 on OpenShift and set ENABLE_NFSRDMA in the NIC cluster policy.

Anything else we need to know?:

Logs:
Will provide if needed.

Environment:
Dell R760xa
Mellanox CX7 or BF3
OpenShift 4.19

Metadata

Labels: bug
