[SPARK-52561][PYTHON][INFRA] Upgrade the minimum version of Python to 3.10 #51259
base: master
Conversation
We would need to fix more places, listed below; this can be done in a separate PR. A sketch for locating the remaining references follows the list.
.github/workflows/build_infra_images_cache.yml: - name: Build and push (PySpark with Python 3.9)
.github/workflows/build_infra_images_cache.yml: - name: Image digest (PySpark with Python 3.9)
.github/workflows/build_python_3.9.yml:name: "Build / Python-only (master, Python 3.9)"
dev/create-release/spark-rm/Dockerfile:# Install Python 3.9
dev/infra/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-309/Dockerfile:LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark with Python 3.09"
dev/spark-test-image/python-309/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-309/Dockerfile:# Python deps for Spark Connect
dev/spark-test-image/python-309/Dockerfile:# Install Python 3.9 packages
dev/spark-test-image/python-minimum/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-minimum/Dockerfile:# Install Python 3.9 packages
dev/spark-test-image/python-ps-minimum/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-ps-minimum/Dockerfile:# Install Python 3.9 packages
docs/index.md:Spark runs on Java 17/21, Scala 2.13, Python 3.9+, and R 3.5+ (Deprecated).
docs/rdd-programming-guide.md:Spark {{site.SPARK_VERSION}} works with Python 3.9+. It can use the standard CPython interpreter,
python/docs/source/development/contributing.rst: # Python 3.9+ is required
python/docs/source/development/contributing.rst:With Python 3.9+, pip can be used as below to install and set up the development environment.
python/docs/source/getting_started/install.rst:Python 3.9 and above.
python/docs/source/tutorial/pandas_on_spark/typehints.rst:With Python 3.9+, you can specify the type hints by using pandas instances as follows:
python/packaging/classic/setup.py: "Programming Language :: Python :: 3.9",
python/packaging/client/setup.py: "Programming Language :: Python :: 3.9",
python/packaging/connect/setup.py: "Programming Language :: Python :: 3.9",
python/pyspark/cloudpickle/cloudpickle.py: # "nogil" Python: modified attributes from 3.9
python/pyspark/pandas/typedef/typehints.py:# TODO: Remove this variadic-generic hack by tuple once ww drop Python up to 3.9.
python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py: # SPARK-30921: We should not pushdown predicates of PythonUDFs through Aggregate.
python/pyspark/sql/udf.py: # Note: Python 3.9.15, Pandas 1.5.2 and PyArrow 10.0.1 are used.
python/run-tests: echo "Python versions prior to 3.9 are not supported."
.github/workflows/build_and_test.yml: python3.9 ./dev/structured_logging_style.py
.github/workflows/build_and_test.yml: python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0'
.github/workflows/build_and_test.yml: python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
.github/workflows/build_and_test.yml: run: python3.9 -m pip list
.github/workflows/build_and_test.yml: run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
.github/workflows/build_and_test.yml: python3.9 -m pip install 'protobuf==4.25.1' 'mypy-protobuf==3.3.0'
.github/workflows/build_and_test.yml: run: if test -f ./dev/connect-check-protos.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/connect-check-protos.py; fi
.github/workflows/build_and_test.yml: PYSPARK_DRIVER_PYTHON: python3.9
.github/workflows/build_and_test.yml: PYSPARK_PYTHON: python3.9
.github/workflows/build_and_test.yml: python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5'
.github/workflows/build_and_test.yml: python3.9 -m pip install ipython_genutils # See SPARK-38517
.github/workflows/build_and_test.yml: python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly<6.0.0'
.github/workflows/build_and_test.yml: python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
.github/workflows/build_and_test.yml: run: python3.9 -m pip list
.github/workflows/build_and_test.yml: # We need this link to make sure `python3` points to `python3.9` which contains the prerequisite packages.
.github/workflows/build_and_test.yml: ln -s "$(which python3.9)" "/usr/local/bin/python3"
.github/workflows/build_and_test.yml: pyspark_modules=`cd dev && python3.9 -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
.github/workflows/build_infra_images_cache.yml: - 'dev/spark-test-image/python-309/Dockerfile'
.github/workflows/build_infra_images_cache.yml: if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
.github/workflows/build_infra_images_cache.yml: id: docker_build_pyspark_python_309
.github/workflows/build_infra_images_cache.yml: context: ./dev/spark-test-image/python-309/
.github/workflows/build_infra_images_cache.yml: tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}-static
.github/workflows/build_infra_images_cache.yml: cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}
.github/workflows/build_infra_images_cache.yml: cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }},mode=max
.github/workflows/build_infra_images_cache.yml: if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
.github/workflows/build_infra_images_cache.yml: run: echo ${{ steps.docker_build_pyspark_python_309.outputs.digest }}
.github/workflows/build_python_3.9.yml: "PYSPARK_IMAGE_TO_TEST": "python-309",
.github/workflows/build_python_3.9.yml: "PYTHON_TO_TEST": "python3.9"
.github/workflows/build_python_minimum.yml: "PYTHON_TO_TEST": "python3.9"
.github/workflows/build_python_ps_minimum.yml: "PYTHON_TO_TEST": "python3.9"
README.md:| | [](https://github.com/apache/spark/actions/workflows/build_python_3.9.yml) |
dev/create-release/spark-rm/Dockerfile: python3.9 python3.9-distutils \
dev/create-release/spark-rm/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/create-release/spark-rm/Dockerfile: python3.9 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/create-release/spark-rm/Dockerfile: python3.9 -m pip install torcheval && \
dev/create-release/spark-rm/Dockerfile: python3.9 -m pip cache purge
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip list
dev/create-release/spark-rm/Dockerfile:RUN ln -s "$(which python3.9)" "/usr/local/bin/python"
dev/create-release/spark-rm/Dockerfile:RUN ln -s "$(which python3.9)" "/usr/local/bin/python3"
dev/infra/Dockerfile: python3.9 python3.9-distutils \
dev/infra/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/infra/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/infra/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/infra/Dockerfile: python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/infra/Dockerfile: python3.9 -m pip install torcheval && \
dev/infra/Dockerfile: python3.9 -m pip cache purge
dev/spark-test-image-util/docs/run-in-container:# We need this link to make sure `python3` points to `python3.9` which contains the prerequisite packages.
dev/spark-test-image-util/docs/run-in-container:ln -s "$(which python3.9)" "/usr/local/bin/python3"
dev/spark-test-image/python-309/Dockerfile: libpython3-dev \
dev/spark-test-image/python-309/Dockerfile: python3.9 \
dev/spark-test-image/python-309/Dockerfile: python3.9-distutils \
dev/spark-test-image/python-309/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-309/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/spark-test-image/python-309/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-309/Dockerfile: python3.9 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/spark-test-image/python-309/Dockerfile: python3.9 -m pip install torcheval && \
dev/spark-test-image/python-309/Dockerfile: python3.9 -m pip cache purge
dev/spark-test-image/python-minimum/Dockerfile: python3.9 \
dev/spark-test-image/python-minimum/Dockerfile: python3.9-distutils \
dev/spark-test-image/python-minimum/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-minimum/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-minimum/Dockerfile: python3.9 -m pip cache purge
dev/spark-test-image/python-ps-minimum/Dockerfile: python3.9 \
dev/spark-test-image/python-ps-minimum/Dockerfile: python3.9-distutils \
dev/spark-test-image/python-ps-minimum/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-ps-minimum/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-ps-minimum/Dockerfile: python3.9 -m pip cache purge
python/docs/source/development/contributing.rst: conda create --name pyspark-dev-env python=3.9
python/docs/source/getting_started/install.rst: conda install -c conda-forge pyspark # can also add "python=3.9 some_package [etc.]" here
python/packaging/classic/setup.py: python_requires=">=3.9",
python/packaging/client/setup.py: python_requires=">=3.9",
python/packaging/connect/setup.py: python_requires=">=3.9",
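As a rough aid for the follow-up PR, something like the following could enumerate the lingering 3.9 references. This is a minimal sketch, not part of this PR; the search pattern and skipped directories are assumptions, and a pattern this narrow will miss bare "3.9" version strings (e.g. in docs prose), so treat its output as a starting point:

# find_py39_refs.py - hypothetical helper, not part of this change.
# Walks the repository and prints lines that still mention Python 3.9
# (or the python-309 image name), so they can be updated to 3.10.
import os
import re

# Heuristic pattern for lingering references (assumption; tune as needed).
PATTERN = re.compile(r"python\s*3\.9|python-309", re.IGNORECASE)
SKIP_DIRS = {".git", "node_modules", "target", "build"}

for root, dirs, files in os.walk("."):
    # Prune directories we never want to scan.
    dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, 1):
                    if PATTERN.search(line):
                        print(f"{path}:{lineno}: {line.rstrip()}")
        except OSError:
            pass  # unreadable file (e.g. permission issue); skip it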
Spark in yarn-client mode seems to be failing.
Converted to draft since the jobs get stuck.
Thank you for working on this.
What changes were proposed in this pull request?
Upgrade the minimum supported version of Python to 3.10.
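For illustration only, the user-visible effect amounts to a guard like the sketch below. The actual check lives in the python/run-tests shell script and in the setup.py python_requires lines quoted in the review comment above; this Python rendering is an assumption, not the exact diff:

import sys

# Minimal sketch of the version floor this PR raises (illustrative).
MIN_PYTHON = (3, 10)
if sys.version_info < MIN_PYTHON:
    raise RuntimeError(
        "Python versions prior to 3.10 are not supported; "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )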
Why are the changes needed?
Python 3.9 is reaching its end of life (EOL).
Does this PR introduce any user-facing change?
Yes, documentation changes.
How was this patch tested?
Tested via the PR builder with the upgraded image.
Was this patch authored or co-authored using generative AI tooling?
No