How to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh " #60
RayTsui commented on Oct 17, 2017

I followed the instructions: I downloaded the project, ran build/sbt assembly, and then executed python/run-tests.sh, but it gives me the following output:
List of assembly jars found, the last one will be used:
ls: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/../target/scala-2.12/spark-deep-learning-assembly*.jar: No such file or directory
============= Searching for tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests =============
============= Running the tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests/graph/test_builder.py =============
/usr/local/opt/python/bin/python2.7: No module named nose
However, after the sbt build, what is actually produced is scala-2.11/spark-deep-learning-assembly*.jar, not scala-2.12/spark-deep-learning-assembly*.jar. In addition, I installed python2 at /usr/local/bin/python2, so why does the error come from /usr/local/opt/python/bin/python2.7: No module named nose?
Also, I am not sure how to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh". Can it be executed at the command line? When I try, it gives "sparkdl$: command not found".
allwefantasy commented on Oct 18, 2017

sparkdl$ means your current directory is the spark-deep-learning project. SPARK_HOME is needed by pyspark; SCALA_VERSION and SPARK_VERSION are used to locate the spark-deep-learning-assembly*.jar.
./python/run-tests.sh will set up the environment, find all the .py files in python/tests, and run them one by one.
You should run build/sbt assembly first to make sure the assembly jar is ready, then run SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh
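For reference, a minimal hedged sketch of how a wrapper like run-tests.sh typically turns these variables into a jar path. This is not the project's actual script; the variable names, default values, and scala-2.11 layout here are assumptions based on the error output above.
# Hedged sketch (not the project's actual script): how SCALA_VERSION typically maps to the jar path.
SCALA_VERSION=${SCALA_VERSION:-2.11.8}     # full version, e.g. 2.11.8
SCALA_BINARY_VERSION=${SCALA_VERSION%.*}   # binary version, e.g. 2.11
ASSEMBLY_JAR=$(ls target/scala-$SCALA_BINARY_VERSION/spark-deep-learning-assembly*.jar 2>/dev/null | tail -n 1)
echo "Assembly jar: ${ASSEMBLY_JAR:-not found}"
If the binary version derived this way does not match the directory that sbt actually wrote to, the glob matches nothing and the tests run without the assembly, which is consistent with the "No such file or directory" line above.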
phi-dbq commented on Oct 18, 2017

@RayTsui thank you for reporting the issue! @allwefantasy thank you for helping out!
In addition, we also have some scripts/sbt-plugins we use to facilitate the development process, which we put in #59.
You can try running SPARK_HOME="path/to/your/spark/home/directory" ./bin/totgen.sh, which will generate pyspark (.py2.spark.shell, .py3.spark.shell) and spark-shell (.spark.shell) REPLs.
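A hedged usage sketch of the above, assuming #59 is present in your checkout; the SPARK_HOME path is only an example and the exact behaviour of bin/totgen.sh may differ.
# Assumed invocation (use your own SPARK_HOME):
SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 ./bin/totgen.sh
# Files the script is said to generate in the project root:
ls -a .py2.spark.shell .py3.spark.shell .spark.shell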
RayTsui commented on Oct 18, 2017

@allwefantasy Thanks a lot for your answer. Regarding the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh", I have a few doubts:
1. Is the value of each setting fixed and common to all environments, or do I need to set the values based on my own environment? I installed Spark via "brew install apache-spark" instead of downloading a Spark distribution bundled with Hadoop (e.g., spark-2.1.1-bin-hadoop2.7). Are the Scala and Spark version numbers also based on my environment?
2. Do I need to set the environment variables "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1" in ~/.bash_profile, or do I run the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh" directly at the prompt?
After some tentative attempts, I still run into the errors above.
If you have any suggestions, it would help me a lot.
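An aside on the Homebrew question above: a hedged way to find SPARK_HOME for a "brew install apache-spark" setup, assuming the formula keeps the Spark layout under its libexec directory (paths depend on your Homebrew installation and Spark version).
# Assumption: Homebrew's apache-spark formula places the Spark home under libexec.
SPARK_HOME="$(brew --prefix apache-spark)/libexec"
echo "$SPARK_HOME"
ls "$SPARK_HOME/bin/spark-submit"   # sanity check that this really is a Spark home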
RayTsui commented on Oct 18, 2017

@phi-dbq Thanks a lot for your response. I will try what you suggested and give feedback.

allwefantasy commented on Oct 19, 2017

1. Make sure the dependencies in the following list are installed:
# This file should list any python package dependencies.
coverage>=4.4.1
h5py>=2.7.0
keras==2.0.4 # NOTE: this package has only been tested with keras 2.0.4 and may not work with other releases
nose>=1.3.7 # for testing
numpy>=1.11.2
pillow>=4.1.1,<4.2
pygments>=2.2.0
tensorflow==1.3.0
pandas>=0.19.1
six>=1.10.0
kafka-python>=1.3.5
tensorflowonspark>=1.0.5
tensorflow-tensorboard>=0.1.6
Or you can just run this command to install them all:
pip2 install -r python/requirements.txt
2. Keep PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 unchanged. As I mentioned, these variables are only used to locate the assembly jar. The only one you should set for your environment is SPARK_HOME. I suggest you do not configure them in .bashrc, since that may have side effects on your other programs.
Then run the following steps:
step 1:
build/sbt assembly
You should then find spark-deep-learning-assembly-0.1.0-spark2.1.jar in your-project/target/scala-2.11.
step 2:
SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh
Also, you can specify a single test file to run instead of all of them, which takes almost 30 minutes (see the sketch below).
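A hedged sketch of both points above. The dependency check targets the python2 interpreter the tests will use; the single-file invocation is an assumption about how run-tests.sh handles arguments and is not confirmed here, and the test path is only an example.
# Check that the python2 used by the tests can actually see the test dependencies:
python2 -c "import nose, keras, tensorflow; print('test deps ok')"
# Presumed single-file run (assumes run-tests.sh forwards a test file argument to the test runner):
SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 \
  ./python/run-tests.sh python/tests/graph/test_builder.py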
RayTsui commented on Oct 20, 2017

@allwefantasy
Hi, I really appreciate your explanation. I understood it and tried again, and it made a lot of progress; at least the unit tests now produce a coverage report:
Name Stmts Miss Cover
sparkdl/graph/__init__.py 0 0 100%
sparkdl/graph/utils.py 81 64 21%
sparkdl/image/__init__.py 0 0 100%
sparkdl/image/imageIO.py 94 66 30%
sparkdl/transformers/__init__.py 0 0 100%
sparkdl/transformers/keras_utils.py 13 7 46%
sparkdl/transformers/param.py 46 26 43%
TOTAL 234 163 30%
But there is still an error:
ModuleNotFoundError: No module named 'tensorframes'
I guess that tensorframes officially supports 64-bit Linux, but right now I use macOS; is that the issue?
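One hedged way to narrow this down, not part of the project's documented workflow: confirm that the assembly jar the test script looks for actually exists, since tensorframes appears to be pulled in through that jar in this setup.
# If this glob matches nothing, the tests run without the assembly jar,
# which would explain the missing tensorframes module:
ls target/scala-2.11/spark-deep-learning-assembly*.jar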
thunterdb commented on Oct 20, 2017

Hello @RayTsui, I have no problem using OSX for development purposes.
Can you first run:
build/sbt clean
followed by:
build/sbt assembly
You should see a line that says:
[info] Including: tensorframes-0.2.9-s_2.11.jar
This indicates that tensorframes is properly included in the assembly jar, and that your problem is rather that the proper assembly cannot be found.
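If you want to double-check the built jar itself, a hedged verification; the jar name follows the naming reported elsewhere in this thread and may differ on your machine.
# Inspect the built assembly for tensorframes content:
jar tf target/scala-2.11/spark-deep-learning-assembly-0.1.0-spark2.1.jar | grep -i tensorframes | head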
RayTsui commented on Oct 23, 2017

@thunterdb Thanks a lot for your suggestions. I ran the commands, and yes, I can see
[info] Including: tensorframes-0.2.8-s_2.11.jar
As you said, my issue is
"List of assembly jars found, the last one will be used:
ls: $DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar: No such file or directory"
I suppose that all related jars are packaged into spark-deep-learning-assembly*.jar, but my spark-deep-learning-master-assembly-0.1.0-spark2.1.jar is generated at
"$DIR/spark-deep-learning-master/target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar"
instead of
"$DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar". I tried modifying that part of run-tests.sh, but it does not work.
Do you know how to locate the spark-deep-learning-master-assembly-0.1.0-spark2.1.jar?
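A hedged way to diagnose the mismatch described above. The explanation is an assumption: if the sbt build derives the artifact name from the checkout directory, a directory called spark-deep-learning-master produces spark-deep-learning-master-assembly-*.jar, which the spark-deep-learning-assembly*.jar glob in run-tests.sh can never match.
# Find whatever assembly jar the build actually produced:
find . -name "*assembly*.jar"
# Possible workaround, assuming the extra "-master" in the jar name comes from the
# checkout directory name: rename the directory so the artifact matches the glob, then rebuild.
cd .. && mv spark-deep-learning-master spark-deep-learning && cd spark-deep-learning
build/sbt clean assembly
ls target/scala-2.11/spark-deep-learning-assembly*.jar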