
How to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh " #60

Description

RayTsui (Author):

I followed the instructions: I downloaded the project, ran build/sbt assembly, and then executed python/run-tests.sh, but it gives me the following output:

List of assembly jars found, the last one will be used:
ls: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/../target/scala-2.12/spark-deep-learning-assembly*.jar: No such file or directory

============= Searching for tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests =============
============= Running the tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests/graph/test_builder.py =============
/usr/local/opt/python/bin/python2.7: No module named nose

Actually, the sbt build produces scala-2.11/spark-deep-learning-assembly*.jar rather than scala-2.12/spark-deep-learning-assembly*.jar. In addition, I installed python2 at /usr/local/bin/python2, so why does the error refer to /usr/local/opt/python/bin/python2.7: No module named nose?
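
A quick sanity check for the nose part is to compare the python2 that is first on PATH with the interpreter named in the error message (the paths below are the ones from the output above; these are just standard commands, nothing specific to this project):

  # which python2 does the shell find, and can it import nose?
  which python2
  python2 -c "import nose; print(nose.__file__)"

  # the interpreter named in the error may be a different one:
  /usr/local/opt/python/bin/python2.7 -c "import nose"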

Activity

RayTsui (Author) commented on Oct 17, 2017

Actually, I am not sure how to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh". Can it be executed at the command line? When I paste it as-is, it gives "sparkdl$: command not found".

allwefantasy commented on Oct 18, 2017

sparkdl$ means your current directory is the spark-deep-learning project; it is the shell prompt, not part of the command. SPARK_HOME is needed by pyspark, and SCALA_VERSION and SPARK_VERSION are used to locate the spark-deep-learning-assembly*.jar.

./python/run-tests.sh will set up the environment, find all the .py files in python/tests, and run them one by one.

You should run build/sbt assembly first to make sure the assembly jar is ready, then run SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh (see the combined sketch below).
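
Putting the two steps together, a minimal sketch (the SPARK_HOME path is just the example used in this thread; adjust it to wherever your Spark 2.1.1 distribution lives):

  cd /path/to/spark-deep-learning      # this is what the "sparkdl$" prompt stands for
  build/sbt assembly                   # builds target/scala-2.11/spark-deep-learning-assembly*.jar
  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 \
    PYSPARK_PYTHON=python2 \
    SCALA_VERSION=2.11.8 \
    SPARK_VERSION=2.1.1 \
    ./python/run-tests.sh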

phi-dbq (Contributor) commented on Oct 18, 2017

@RayTsui thank you for reporting the issue!
@allwefantasy thank you for helping out!
In addition, we also have some scripts/sbt-plugins we use to facilitate the development process, which we put in #59.
You can try running SPARK_HOME="path/to/your/spark/home/directory" ./bin/totgen.sh, which will generate pyspark (.py2.spark.shell, .py3.spark.shell) and spark-shell (.spark.shell) REPLs.
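
For example (the SPARK_HOME value is just the path used earlier in this thread, and the generated file names are the ones listed above):

  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 ./bin/totgen.sh
  ls -a    # should now include .py2.spark.shell, .py3.spark.shell and .spark.shell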

RayTsui (Author) commented on Oct 18, 2017

@allwefantasy Thanks a lot for your answer. Regarding the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh", I have a few doubts:

  1. Are the values of these variables fixed and common to all environments, or do I need to set them based on my own environment? I installed Spark via "brew install apache-spark" instead of downloading a prebuilt distribution such as spark-2.1.1-bin-hadoop2.7. Should the Scala and Spark version numbers also come from my environment?

  2. Do I need to set the environment variables "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1" in ~/.bash_profile, or do I run the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh" directly at the prompt? (The two alternatives are sketched at the end of this comment.)

  3. After a tentative attempt, I still ran into the errors above.

If you have any suggestions, they will help me a lot.
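
To make question 2 concrete, these are the two alternatives I am asking about (values copied from the command above):

  # (a) one-off, only for this invocation:
  SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 \
    SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh

  # (b) exported for every shell, e.g. in ~/.bash_profile:
  export SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7
  export PYSPARK_PYTHON=python2
  export SCALA_VERSION=2.11.8
  export SPARK_VERSION=2.1.1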

RayTsui (Author) commented on Oct 18, 2017

@phi-dbq Thanks a lot for your response. I will try what you suggested and give feedback.

allwefantasy commented on Oct 19, 2017

  1. Make sure that the dependencies in the following list are installed:
# This file should list any python package dependencies.
coverage>=4.4.1
h5py>=2.7.0
keras==2.0.4 # NOTE: this package has only been tested with keras 2.0.4 and may not work with other releases
nose>=1.3.7  # for testing
numpy>=1.11.2
pillow>=4.1.1,<4.2
pygments>=2.2.0
tensorflow==1.3.0
pandas>=0.19.1
six>=1.10.0
kafka-python>=1.3.5
tensorflowonspark>=1.0.5
tensorflow-tensorboard>=0.1.6

Or you can just run this command to install them all:

 pip2 install -r python/requirements.txt
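
One possible cause of the earlier "No module named nose" error is that pip2 and the python2 picked up by run-tests.sh are different interpreters; a quick way to compare them (standard commands, nothing specific to this project):

  # confirm pip2 and python2 refer to the same installation
  pip2 --version
  python2 -c "import sys; print(sys.executable)"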

  2. Just keep PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 unchanged. As I mentioned, SCALA_VERSION and SPARK_VERSION are only used to locate the assembly jar. The only variable you need to adapt is SPARK_HOME. I suggest that you do not configure them in .bashrc, which may have side effects on your other programs.
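
For reference, the lookup works roughly the way the error message in the original report suggests: the Scala major version is derived from SCALA_VERSION and used in a glob under target/. A sketch of the idea (not the exact code in python/run-tests.sh):

  # e.g. SCALA_VERSION=2.11.8 -> scala-2.11
  scala_major=${SCALA_VERSION%.*}
  ls ./target/scala-${scala_major}/spark-deep-learning-assembly*.jar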

  3. Run the commands in the following steps:

step 1:

      build/sbt assembly

Then you should find spark-deep-learning-assembly-0.1.0-spark2.1.jar in your-project/target/scala-2.11.

step 2:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh 

Also, you can specify a target file to run instead of running all the files, which takes almost 30 minutes. Like this:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh  /Users/allwefantasy/CSDNWorkSpace/spark-deep-learning/python/tests/transformers/tf_image_test.py

RayTsui (Author) commented on Oct 20, 2017

@allwefantasy Hi, I really appreciate your explanation. I understood it, tried again, and made a lot of progress; at least the unit tests now report coverage:
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
sparkdl/graph/__init__.py                 0      0   100%
sparkdl/graph/utils.py                   81     64    21%
sparkdl/image/__init__.py                 0      0   100%
sparkdl/image/imageIO.py                 94     66    30%
sparkdl/transformers/__init__.py          0      0   100%
sparkdl/transformers/keras_utils.py      13      7    46%
sparkdl/transformers/param.py            46     26    43%
---------------------------------------------------------
TOTAL                                   234    163    30%

But there is still an error, as follows:

ModuleNotFoundError: No module named 'tensorframes'

I guess that tensorframes officially supports 64-bit Linux, but right now I am using macOS; is that the issue?

thunterdb (Contributor) commented on Oct 20, 2017

Hello @RayTsui, I have no problem using OSX for development purposes.
Can you first run:

build/sbt clean

followed by:

build/sbt assembly

You should see a line that says:
[info] Including: tensorframes-0.2.9-s_2.11.jar
This indicates that tensorframes is properly included in the assembly jar, and that your problem is rather that the assembly jar cannot be found.
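
If you want to double-check the contents of the assembly after the build, one option (just the standard jar tool, nothing specific to this project) is to list the jar and look for tensorframes entries:

  jar tf target/scala-2.11/spark-deep-learning-assembly*.jar | grep -i tensorframes | head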

RayTsui (Author) commented on Oct 23, 2017

@thunterdb Thanks a lot for your suggestions. I ran the commands, and yes, I can see
[info] Including: tensorframes-0.2.8-s_2.11.jar
And as you said, my issue is about
"List of assembly jars found, the last one will be used:
ls: $DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar: No such file or directory"

I suppose that all related jars are packaged into spark-deep-learning-assembly*.jar, but my spark-deep-learning-master-assembly-0.1.0-spark2.1.jar is generated at the path
"$DIR/spark-deep-learning-master/target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar"
instead of
"$DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar". I tried to modify the relevant part of the run-tests.sh file, but it did not work.

Do you know how to make the script locate the spark-deep-learning-master-assembly-0.1.0-spark2.1.jar? (A possible workaround is sketched below.)
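
A workaround I am considering, on the assumption that the "-master" in the jar name simply comes from the name of my unzipped directory while the script's glob expects spark-deep-learning-assembly*.jar, is one of the following (not verified, just a guess):

  # option 1: give the checkout the expected name and rebuild
  mv spark-deep-learning-master spark-deep-learning
  cd spark-deep-learning && build/sbt assembly

  # option 2: copy the existing jar to a name the glob will match
  cp target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar \
     target/scala-2.11/spark-deep-learning-assembly-0.1.0-spark2.1.jar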
