-
Notifications
You must be signed in to change notification settings - Fork 73
ClassNotFoundException: org.apache.spark.shuffle.rdma.RdmaShuffleManager #36
Description
the spark-defaults.conf of both workers and master are
spark.shuffle.manager org.apache.spark.shuffle.rdma.RdmaShuffleManager
spark.driver.extraClassPath /home/rui/data/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
spark.executor.extraClassPath /home/rui/data/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
and I have placed libdisni.so in /usr/lib.
When I run TeraGen build from spark-terasort,I can run it with --master spark://master:7077 --deploy-mode client,which whole command is
spark-submit --master spark://master:7077 --deploy-mode client --class com.github.ehiggs.spark.terasort.TeraGen /home/rui/software/spark-2.4.0-bin-hadoop2.7/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 1g hdfs://master:9000/data/terasort_in1g
but failed with ClassNotFoundException while using --master spark://master:7077 --deploy-mode cluster,which whole command is
spark-submit --master spark://master:7077 --deploy-mode cluster --class com.github.ehiggs.spark.terasort.TeraGen /home/rui/software/spark-2.4.0-bin-hadoop2.7/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar 1g hdfs://master:9000/data/terasort_in1g
the error info is
Launch Command: "/home/rui/software/jdk1.8.0_212/bin/java" "-cp" "/home/rui/software/spark-2.4.0-bin-hadoop2.7/conf/:/home/rui/software/spark-2.4.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.extraClassPath=/home/rui/data/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar" "-Dspark.driver.supervise=false" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://master:7077" "-Dspark.driver.extraClassPath=/home/rui/data/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar" "-Dspark.jars=file:/home/rui/software/spark-2.4.0-bin-hadoop2.7/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "-Dspark.rpc.askTimeout=10s" "-Dspark.app.name=com.github.ehiggs.spark.terasort.TeraGen" "-Dspark.shuffle.manager=org.apache.spark.shuffle.rdma.RdmaShuffleManager" "org.apache.spark.deploy.worker.DriverWrapper" "spark://[email protected]:43489" "/home/rui/software/spark-2.4.0-bin-hadoop2.7/work/driver-20190930101138-0006/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "com.github.ehiggs.spark.terasort.TeraGen" "5g" "hdfs://master:9000/data/terasort_in5g2"
========================================
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.shuffle.rdma.RdmaShuffleManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:259)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:323)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at com.github.ehiggs.spark.terasort.TeraGen$.main(TeraGen.scala:48)
at com.github.ehiggs.spark.terasort.TeraGen.main(TeraGen.scala)
... 6 more
How should I fix it?
What's more, in client deployment , when I generate 50GB data using teragen and use raw spark with no spark-defaults.conf, the transfer speed between master and slave is about 270MB/s. However,when I change my spark-defaults.conf and replace spark.shuffle.manager to org.apache.spark.shuffle.rdma.RdmaShuffleManager, the speed is also 270MB/s.
Is this because I used hdfs storage but it has nothing to do with spark shuffle?
Can you recommed a workload for me to significantly improve completion time when using spark rdma?
Thanks a lot !