# SparkRDMA Shuffle Manager Plugin
SparkRDMA is a high-performance shuffle manager plugin for Apache Spark that uses RDMA (instead of TCP) to perform the shuffle phase of Spark jobs.

This open-source project is developed, maintained and supported by [Mellanox Technologies](http://www.mellanox.com).

## Performance results
Example performance speedup for HiBench TeraSort:

![HiBench TeraSort runtime (seconds): standard Spark vs. SparkRDMA](https://user-images.githubusercontent.com/20062725/28947340-30d45c6a-7864-11e7-96ea-ca3cf505ce7a.png)

In this example, TeraSort runs 1.41x faster with SparkRDMA than with standard Spark (the chart shows runtime in seconds).

Testbed:

* 175 GB workload
* 15 Workers, each with 2x Intel Xeon E5-2697 v3 @ 2.60GHz (28 cores per Worker), 256 GB RAM and non-flash storage (HDD)
* Mellanox ConnectX-4 network adapter with a 100GbE RoCE fabric, connected through a Mellanox Spectrum switch

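
To put the 1.41x TeraSort figure in terms of runtime: writing $T_{\text{TCP}}$ for the standard Spark runtime and $T_{\text{RDMA}}$ for the SparkRDMA runtime (notation introduced here, not taken from the chart), the speedup fixes their ratio regardless of the absolute values:

```math
S = \frac{T_{\text{TCP}}}{T_{\text{RDMA}}} = 1.41
\;\Longrightarrow\;
T_{\text{RDMA}} = \frac{T_{\text{TCP}}}{1.41} \approx 0.709\,T_{\text{TCP}}
```

In other words, on this testbed the SparkRDMA run takes roughly 29% less time than the standard Spark run.
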
## Runtime requirements
* Apache Spark 2.0.0 (support for more versions is planned)
* Java 8
* libdisni 1.2
* An RDMA-capable network, e.g. RoCE or InfiniBand

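
Before building, the Java 8 and RDMA network requirements above can be checked on each node. The sketch below is a set of hypothetical POSIX shell helpers, not part of SparkRDMA: `is_java8` matches the `1.8.x` version string that Java 8 reports, `java_version` extracts that string from `java -version` (which writes to stderr), and `has_rdma_device` assumes a Linux host, where RDMA devices (RoCE as well as InfiniBand) are listed under `/sys/class/infiniband`.

```shell
#!/bin/sh
# Hypothetical preflight helpers (not part of SparkRDMA).

# Succeeds if the version string belongs to Java 8, which reports "1.8.x",
# e.g. "1.8.0_292".
is_java8() {
  case "$1" in
    1.8|1.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the quoted version from `java -version`, e.g. 1.8.0_292 from
#   openjdk version "1.8.0_292"
# `java -version` writes to stderr, hence the 2>&1.
java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds if the directory (default: /sys/class/infiniband on Linux)
# contains at least one RDMA device entry, e.g. mlx5_0.
has_rdma_device() {
  rdma_dir="${1:-/sys/class/infiniband}"
  [ -d "$rdma_dir" ] && [ -n "$(ls -A "$rdma_dir" 2>/dev/null)" ]
}

# Example:
# v=$(java_version)
# is_java8 "$v" && echo "Java 8 OK ($v)" || echo "Java 8 not found (got: $v)"
# has_rdma_device && echo "RDMA device found" || echo "no RDMA device"
```

A plain pattern match is used instead of a numeric version comparison because Java 8 is the only version listed.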
## Build
* Building the SparkRDMA plugin requires [Apache Maven](http://maven.apache.org/) and Java 8.

1. Clone [SparkRDMA](https://github.com/Mellanox/SparkRDMA).

2. Build the plugin:

```
mvn -DskipTests package
```

3. Clone [DiSNI](https://github.com/zrlio/disni) and check out the commit used for libdisni 1.2:

```
git clone https://github.com/zrlio/disni.git
cd disni
git checkout -b v1.2 247fe8abe54c90b450d2a4b0679e59cfa83205f6
```

4. Compile and install only libdisni (the DiSNI jars are already included in the SparkRDMA plugin):

```
cd libdisni
sh autoprepare.sh
./configure --with-jdk=/path/to/java8/jdk
make
make install
```

5. libdisni **must** be installed on every Spark Master and Worker.

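
Because step 5 has to be repeated on every node, it is worth verifying. The helper below is a hypothetical sketch that assumes `make install` used the default autotools prefix (`/usr/local`), so the library should end up as `/usr/local/lib/libdisni.so`; pass a different directory if you ran `./configure` with `--prefix`. The commented loop runs the same test over SSH for a hypothetical `hosts.txt` that lists one Master or Worker hostname per line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if libdisni.so exists in the given directory.
# Assumes the default autotools install prefix, i.e. /usr/local/lib.
has_libdisni() {
  libdir="${1:-/usr/local/lib}"
  [ -e "$libdir/libdisni.so" ]
}

# Example over SSH (hosts.txt is hypothetical: one hostname per line).
# `ssh -n` stops ssh from consuming the rest of hosts.txt on stdin.
# while read -r host; do
#   if ssh -n "$host" test -e /usr/local/lib/libdisni.so; then
#     echo "$host: libdisni found"
#   else
#     echo "$host: libdisni MISSING"
#   fi
# done < hosts.txt
```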
## Configuration
* Provide Spark with the location of the SparkRDMA plugin jar through the `spark.driver.extraClassPath` and `spark.executor.extraClassPath` options. In standalone mode, these can be set either in `spark-defaults.conf` or in any runtime configuration file. In client mode, they **must** be set in `spark-defaults.conf`:

```
spark.driver.extraClassPath /path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar
```
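
If `spark-defaults.conf` is managed by scripts, the two classpath lines can be appended by a small helper. This is a hypothetical POSIX shell sketch, not part of SparkRDMA: `add_sparkrdma_classpath` first checks that the jar built in the Build section exists, then appends both `extraClassPath` entries unless either one is already present.

```shell
#!/bin/sh
# Hypothetical helper: append the SparkRDMA extraClassPath settings to a
# spark-defaults.conf file, after checking that the plugin jar exists.
# Usage: add_sparkrdma_classpath <conf-file> <jar-path>
add_sparkrdma_classpath() {
  conf="$1"
  jar="$2"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar (did 'mvn -DskipTests package' succeed?)" >&2
    return 1
  fi
  # Skip if either setting already exists, so re-running adds no duplicates.
  if grep -Eq '^spark\.(driver|executor)\.extraClassPath' "$conf" 2>/dev/null; then
    echo "extraClassPath already set in $conf; leaving it unchanged" >&2
    return 0
  fi
  printf 'spark.driver.extraClassPath %s\nspark.executor.extraClassPath %s\n' \
    "$jar" "$jar" >> "$conf"
}

# Example (paths are placeholders):
# add_sparkrdma_classpath "$SPARK_HOME/conf/spark-defaults.conf" \
#   /path/to/SparkRDMA/target/spark-rdma-1.0-jar-with-dependencies.jar
```

The helper deliberately does not merge with an existing `extraClassPath` value; if one is already set, add the SparkRDMA jar to it by hand.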
## Running
* To enable the SparkRDMA Shuffle Manager plugin, add the following line to either `spark-defaults.conf` or any runtime configuration file:

```
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
```
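
The shuffle manager line can be managed by a script in the same way. The hypothetical helper `enable_rdma_shuffle` below rewrites a given `spark-defaults.conf` so that it contains exactly one `spark.shuffle.manager` entry pointing at `RdmaShuffleManager`, keeping all other lines. It assumes the usual one-property-per-line format, with the key separated from the value by whitespace, `=` or `:`.

```shell
#!/bin/sh
# Hypothetical helper: make RdmaShuffleManager the shuffle manager in a
# spark-defaults.conf file, replacing any existing spark.shuffle.manager line.
RDMA_SHUFFLE_MANAGER=org.apache.spark.shuffle.rdma.RdmaShuffleManager

enable_rdma_shuffle() {
  conf="$1"
  tmp="$conf.tmp.$$"
  if [ -f "$conf" ]; then
    # Keep every line except an existing spark.shuffle.manager setting.
    # grep -v exits 1 when no lines remain; that is not an error here.
    grep -v '^[[:space:]]*spark\.shuffle\.manager[[:space:]=:]' "$conf" > "$tmp" || true
  else
    : > "$tmp"
  fi
  echo "spark.shuffle.manager $RDMA_SHUFFLE_MANAGER" >> "$tmp"
  mv "$tmp" "$conf"
}

# Example:
# enable_rdma_shuffle "$SPARK_HOME/conf/spark-defaults.conf"
```

To try the plugin for a single job without editing `spark-defaults.conf`, the property can also be passed at submit time with `spark-submit --conf spark.shuffle.manager=org.apache.spark.shuffle.rdma.RdmaShuffleManager`, as long as the `extraClassPath` settings from the Configuration section are in place.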
## Contributions
Pull requests are welcome.
