
How to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh " #60

Description

RayTsui (Author):

I followed the instructions: I downloaded the project, ran build/sbt assembly, and then executed python/run-tests.sh, but it gives me the following output:

List of assembly jars found, the last one will be used:
ls: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/../target/scala-2.12/spark-deep-learning-assembly*.jar: No such file or directory

============= Searching for tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests =============
============= Running the tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests/graph/test_builder.py =============
/usr/local/opt/python/bin/python2.7: No module named nose

Actually, the sbt build produces scala-2.11/spark-deep-learning-assembly*.jar rather than scala-2.12/spark-deep-learning-assembly*.jar. In addition, I installed python2 at /usr/local/bin/python2, so why does the error refer to /usr/local/opt/python/bin/python2.7: No module named nose?
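
A quick sanity check for the nose part is to compare the python2 that is first on PATH with the interpreter named in the error message (the paths below are the ones from the output above; these are just standard commands, nothing specific to this project):

  # which python2 does the shell find, and can it import nose?
  which python2
  python2 -c "import nose; print(nose.__file__)"

  # the interpreter named in the error may be a different one:
  /usr/local/opt/python/bin/python2.7 -c "import nose"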

Activity

RayTsui (Author) commented on Oct 17, 2017

Actually, I am not sure how to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh". Can it be executed at the command line? When I paste it as-is, it gives "sparkdl$: command not found".

allwefantasy commented on Oct 18, 2017

sparkdl$ means your current directory is the spark-deep-learning project; it is the shell prompt, not part of the command. SPARK_HOME is needed by pyspark, and SCALA_VERSION and SPARK_VERSION are used to locate the spark-deep-learning-assembly*.jar.

./python/run-tests.sh will set up the environment, find all the .py files in python/tests, and run them one by one.

You should run build/sbt assembly first to make sure the assembly jar is ready, then run SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh (see the combined sketch below).
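
Putting the two steps together, a minimal sketch (the SPARK_HOME path is just the example used in this thread; adjust it to wherever your Spark 2.1.1 distribution lives):

  cd /path/to/spark-deep-learning      # this is what the "sparkdl$" prompt stands for
  build/sbt assembly                   # builds target/scala-2.11/spark-deep-learning-assembly*.jar
  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 \
    PYSPARK_PYTHON=python2 \
    SCALA_VERSION=2.11.8 \
    SPARK_VERSION=2.1.1 \
    ./python/run-tests.sh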

phi-dbq (Contributor) commented on Oct 18, 2017

@RayTsui thank you for reporting the issue!
@allwefantasy thank you for helping out!
In addition, we also have some scripts/sbt-plugins we use to facilitate the development process, which we put in #59.
You can try running SPARK_HOME="path/to/your/spark/home/directory" ./bin/totgen.sh, which will generate pyspark (.py2.spark.shell, .py3.spark.shell) and spark-shell (.spark.shell) REPLs.
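
For example (the SPARK_HOME value is just the path used earlier in this thread, and the generated file names are the ones listed above):

  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 ./bin/totgen.sh
  ls -a    # should now include .py2.spark.shell, .py3.spark.shell and .spark.shell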

RayTsui (Author) commented on Oct 18, 2017

@allwefantasy Thanks a lot for your answer. Regarding the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh", I have a few doubts:

  1. Are the values of these variables fixed and common to all environments, or do I need to set them based on my own environment? I installed Spark via "brew install apache-spark" instead of downloading a prebuilt distribution such as spark-2.1.1-bin-hadoop2.7. Should the Scala and Spark version numbers also come from my environment?

  2. Do I need to set the environment variables "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1" in ~/.bash_profile, or do I run the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh" directly at the prompt? (The two alternatives are sketched at the end of this comment.)

  3. After a tentative attempt, I still ran into the errors above.

If you have any suggestions, they will help me a lot.
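
To make question 2 concrete, these are the two alternatives I am asking about (values copied from the command above):

  # (a) one-off, only for this invocation:
  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 \
    SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh

  # (b) exported for every shell, e.g. in ~/.bash_profile:
  export SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7
  export PYSPARK_PYTHON=python2
  export SCALA_VERSION=2.11.8
  export SPARK_VERSION=2.1.1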

RayTsui (Author) commented on Oct 18, 2017

@phi-dbq Thanks a lot for your response. I will try what you suggested and give feedback.

allwefantasy commented on Oct 19, 2017

  1. Make sure that the dependencies in the following list are installed:
# This file should list any python package dependencies.
coverage>=4.4.1
h5py>=2.7.0
keras==2.0.4 # NOTE: this package has only been tested with keras 2.0.4 and may not work with other releases
nose>=1.3.7  # for testing
numpy>=1.11.2
pillow>=4.1.1,<4.2
pygments>=2.2.0
tensorflow==1.3.0
pandas>=0.19.1
six>=1.10.0
kafka-python>=1.3.5
tensorflowonspark>=1.0.5
tensorflow-tensorboard>=0.1.6

Or you can just run this command to install them all:

 pip2 install -r python/requirements.txt
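
One possible cause of the earlier "No module named nose" error is that pip2 and the python2 picked up by run-tests.sh are different interpreters; a quick way to compare them (standard commands, nothing specific to this project):

  # confirm pip2 and python2 refer to the same installation
  pip2 --version
  python2 -c "import sys; print(sys.executable)"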

  2. Just keep PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 unchanged. As I mentioned, SCALA_VERSION and SPARK_VERSION are only used to locate the assembly jar. The only variable you need to adapt is SPARK_HOME. I suggest that you do not configure them in .bashrc, which may have side effects on your other programs.
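
For reference, the lookup works roughly the way the error message in the original report suggests: the Scala major version is derived from SCALA_VERSION and used in a glob under target/. A sketch of the idea (not the exact code in python/run-tests.sh):

  # e.g. SCALA_VERSION=2.11.8 -> scala-2.11
  scala_major=${SCALA_VERSION%.*}
  ls ./target/scala-${scala_major}/spark-deep-learning-assembly*.jar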

  3. Run the commands in the following steps:

step 1:

      build/sbt assembly

Then you should find spark-deep-learning-assembly-0.1.0-spark2.1.jar in your-project/target/scala-2.11.

step 2:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh 

Also, you can specify a target file to run instead of running all the files, which takes almost 30 minutes. Like this:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh  /Users/allwefantasy/CSDNWorkSpace/spark-deep-learning/python/tests/transformers/tf_image_test.py

RayTsui (Author) commented on Oct 20, 2017

@allwefantasy Hi, I really appreciate your explanation. I understood it, tried again, and made a lot of progress; at least the unit tests now report coverage:
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
sparkdl/graph/__init__.py                 0      0   100%
sparkdl/graph/utils.py                   81     64    21%
sparkdl/image/__init__.py                 0      0   100%
sparkdl/image/imageIO.py                 94     66    30%
sparkdl/transformers/__init__.py          0      0   100%
sparkdl/transformers/keras_utils.py      13      7    46%
sparkdl/transformers/param.py            46     26    43%
---------------------------------------------------------
TOTAL                                   234    163    30%

But there is still an error, as follows:

ModuleNotFoundError: No module named 'tensorframes'

I guess that tensorframes officially supports 64-bit Linux, but right now I am using macOS; is that the issue?

thunterdb (Contributor) commented on Oct 20, 2017

Hello @RayTsui, I have no problem using OSX for development purposes.
Can you first run:

build/sbt clean

followed by:

build/sbt assembly

You should see a line that says:
[info] Including: tensorframes-0.2.9-s_2.11.jar
This indicates that tensorframes is properly included in the assembly jar, and that your problem is rather that the assembly jar cannot be found.
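
If you want to double-check the contents of the assembly after the build, one option (just the standard jar tool, nothing specific to this project) is to list the jar and look for tensorframes entries:

  jar tf target/scala-2.11/spark-deep-learning-assembly*.jar | grep -i tensorframes | head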

RayTsui (Author) commented on Oct 23, 2017

@thunterdb Thanks a lot for your suggestions. I ran the commands, and yes, I can see
[info] Including: tensorframes-0.2.8-s_2.11.jar
And as you said, my issue is about
"List of assembly jars found, the last one will be used:
ls: $DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar: No such file or directory"

I suppose that all related jars are packaged into spark-deep-learning-assembly*.jar, but my spark-deep-learning-master-assembly-0.1.0-spark2.1.jar is generated at the path
"$DIR/spark-deep-learning-master/target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar"
instead of
"$DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar". I tried to modify the relevant part of the run-tests.sh file, but it did not work.

Do you know how to make the script locate the spark-deep-learning-master-assembly-0.1.0-spark2.1.jar? (A possible workaround is sketched below.)
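
A workaround I am considering, on the assumption that the "-master" in the jar name simply comes from the name of my unzipped directory while the script's glob expects spark-deep-learning-assembly*.jar, is one of the following (not verified, just a guess):

  # option 1: give the checkout the expected name and rebuild
  mv spark-deep-learning-master spark-deep-learning
  cd spark-deep-learning && build/sbt assembly

  # option 2: copy the existing jar to a name the glob will match
  cp target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar \
     target/scala-2.11/spark-deep-learning-assembly-0.1.0-spark2.1.jar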
