
Testing PySpark

In order to run PySpark tests, you should build Spark itself first via Maven or SBT. For example,

build/mvn -DskipTests clean package
build/sbt -Phive clean package

After that, the PySpark test cases can be run using python/run-tests. For example,

python/run-tests --python-executable=python3

Note that you may need to set the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES if you are running the tests on macOS.
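On macOS, the two steps can be combined as in the sketch below. The fork-safety variable works around Objective-C runtime crashes in forked worker processes; the final command is the same run-tests invocation shown above (commented out here, since it requires a built Spark tree):

```shell
# macOS only: avoid Objective-C runtime crashes when test workers fork
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

# Then run the suite as usual (requires Spark to be built first):
# python/run-tests --python-executable=python3
```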

Please see the guidance on how to build Spark, run tests for a module, or run individual tests.

Running Individual PySpark Tests

You can run a specific test using python/run-tests, for example, as below:

python/run-tests --testnames pyspark.sql.tests.test_arrow

Please refer to Testing PySpark for more details.

breakpoint() Support in PySpark Tests

To debug a specific test, you can add breakpoint() in the test code and run the test with python/run-tests as usual. The script will stop at the breakpoint() line and open an interactive pdb debugging session.

Running Tests using GitHub Actions

You can run the full PySpark tests by using GitHub Actions in your own forked GitHub repository with a few clicks. Please refer to Running tests in your forked repository using GitHub Actions for more details.

Running Tests for Spark Connect

Running Tests for Python Client

In order to test changes in the Protobuf definitions, for example, at spark/sql/connect/common/src/main/protobuf/spark/connect, you should first regenerate the Python Protobuf client by running dev/connect-gen-protos.sh.

Running PySpark Shell with Python Client

The command below automatically starts a Spark Connect server locally and creates a Spark Connect client connected to it.

bin/pyspark --remote "local[*]"
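The same connection can also be made programmatically. A minimal sketch, assuming a pyspark installation with Spark Connect support (Spark 3.4+); the import guard lets the snippet degrade gracefully when pyspark is not installed:

```python
try:
    from pyspark.sql import SparkSession

    # Start a local Spark Connect server and connect a client to it,
    # mirroring `bin/pyspark --remote "local[*]"`.
    spark = SparkSession.builder.remote("local[*]").getOrCreate()
    print(spark.range(3).count())
except ImportError:
    print("pyspark is not installed; build Spark and install pyspark first")
```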