Problem with RDMAvsTcpBenchmark of DiSNI over SoftRoCE #37

@faramir

Hello,

After a long and hard (and unsuccessful) attempt to compile Soft-iWARP, I found the RXE module (kernel: 4.9.0-8-amd64) and Soft-RoCE, which work without any problems (sudo apt-get -t stretch-backports install rdma-core ibverbs-providers ibverbs-utils libibverbs-dev librdmacm-dev).

I've successfully compiled and installed DiSNI on the virtual machines (Debian on VirtualBox).

The basic, simple example (com.ibm.disni.examples.SendRecv*) works like a charm.
However, I have a problem with com.ibm.disni.benchmarks.RDMAvsTcpBenchmark*.

Sometimes it finishes as it should, sometimes it hangs. I've added some diagnostic output to the classes, and the point at which it hangs varies: sometimes after 2000 iterations of the RDMA loop, sometimes after 1942 iterations, and so on.
I have also changed the value that is being sent. The client sends the current iteration number, and the server sends the negated iteration number. The received value is not always the next expected value (e.g. 0, 1, 1, 3, 3, 4, ... or -1, -2, -4, -5, -5, -6, -7, -8, -9, ...).
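
The check described above can be sketched as follows. This is only an illustration of the comparison, not code from the benchmark: the class name SequenceCheck, the method mismatches and its parameters are hypothetical, and the arrays contain only the values quoted above (the elided "..." parts are not filled in). It assumes each side expects an arithmetic sequence starting at a known first value.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceCheck {
    /**
     * Returns the indices i at which received[i] differs from the
     * expected value first + i * step.
     */
    static List<Integer> mismatches(int[] received, int first, int step) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < received.length; i++) {
            int expected = first + i * step;
            if (received[i] != expected) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Values quoted in the report; non-negative ones come from the client,
        // negative ones from the server.
        int[] sentByClient = {0, 1, 1, 3, 3, 4};
        int[] sentByServer = {-1, -2, -4, -5, -5, -6, -7, -8, -9};
        System.out.println(mismatches(sentByClient, 0, 1));   // prints [2, 4, 5]
        System.out.println(mismatches(sentByServer, -1, -1)); // prints [2, 3]
    }
}
```

Logging the mismatching indices together with the iteration at which the loop hangs may help to tell whether the two symptoms are related.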

I don't have a clue where to look for the cause of the problem, so I'm unable to find a solution.

Best regards,
Marek

---cut---

server-101$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer -a 192.168.56.101 -k 3000
client-102$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient -a 192.168.56.101 -k 3000

...

jstack output (GC and other JVM threads omitted) on the server:
"main" #1 prio=5 os_prio=0 tid=0x00007f577800a800 nid=0x773 waiting on condition [0x00007f577fa75000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f5c061f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.runRDMA(RDMAvsTcpBenchmarkServer.java:116)  # clientEndpoint.getWcEvents().take();
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.launch(RDMAvsTcpBenchmarkServer.java:159)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.main(RDMAvsTcpBenchmarkServer.java:48)
"Thread-1" #10 prio=5 os_prio=0 tid=0x00007f5778189000 nid=0x784 runnable [0x00007f57575bc000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCqEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaVerbsNat.getCqEvent(RdmaVerbsNat.java:165)
        at com.ibm.disni.verbs.IbvCompChannel.getCqEvent(IbvCompChannel.java:77)
        at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:120)
        at java.lang.Thread.run(Thread.java:748)
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f57781f2800 nid=0x783 runnable [0x00007f575c1c8000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCmEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaCmNat.getCmEvent(RdmaCmNat.java:193)
        at com.ibm.disni.verbs.RdmaEventChannel.getCmEvent(RdmaEventChannel.java:75)
        at com.ibm.disni.RdmaCmProcessor.run(RdmaCmProcessor.java:68)
        at java.lang.Thread.run(Thread.java:748)

on the client:
"main" #1 prio=5 os_prio=0 tid=0x00007f274000a800 nid=0x748 waiting on condition [0x00007f2749ece000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f5c24ab0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.runRDMA(RDMAvsTcpBenchmarkClient.java:115)  # endpoint.getWcEvents().take();
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.launch(RDMAvsTcpBenchmarkClient.java:156)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.main(RDMAvsTcpBenchmarkClient.java:46)
"Thread-1" #10 prio=5 os_prio=0 tid=0x00007f2740209800 nid=0x759 runnable [0x00007f26f9bf4000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCqEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaVerbsNat.getCqEvent(RdmaVerbsNat.java:165)
        at com.ibm.disni.verbs.IbvCompChannel.getCqEvent(IbvCompChannel.java:77)
        at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:120)
        at java.lang.Thread.run(Thread.java:748)
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f2740202000 nid=0x758 runnable [0x00007f26f9cf5000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCmEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaCmNat.getCmEvent(RdmaCmNat.java:193)
        at com.ibm.disni.verbs.RdmaEventChannel.getCmEvent(RdmaEventChannel.java:75)
        at com.ibm.disni.RdmaCmProcessor.run(RdmaCmProcessor.java:68)
        at java.lang.Thread.run(Thread.java:748)
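
In both dumps the main thread is parked in ArrayBlockingQueue.take(), which waits indefinitely if the expected completion never arrives. A possible way to make the hang easier to diagnose is a bounded wait that fails with the iteration number. The sketch below is self-contained and does not use DiSNI; BoundedWait, takeOrFail, the String element type (standing in for the work-completion event) and the 100 ms timeout are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedWait {
    /**
     * Waits up to timeoutMs for the next element; throws instead of
     * blocking forever, so the failing iteration shows up in the log.
     */
    static <T> T takeOrFail(BlockingQueue<T> queue, long timeoutMs, int iteration)
            throws InterruptedException {
        T event = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (event == null) {
            throw new IllegalStateException(
                "no completion within " + timeoutMs + " ms at iteration " + iteration);
        }
        return event;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the queue returned by getWcEvents() in the benchmark.
        BlockingQueue<String> completions = new ArrayBlockingQueue<>(4);
        completions.put("wc-0");
        System.out.println(takeOrFail(completions, 100, 0)); // completion available
        try {
            takeOrFail(completions, 100, 1); // queue is empty: times out
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the benchmark, the equivalent change would be to call poll with a timeout in place of take() in runRDMA on both sides; the timeout value would need to be chosen well above the expected round-trip time.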
