
[SPARK-52561][PYTHON][INFRA] Upgrade the minimum version of Python to 3.10 #51259


Draft
zhengruifeng wants to merge 7 commits into master

Conversation

zhengruifeng (Contributor)

What changes were proposed in this pull request?

Upgrade the minimum version of Python to 3.10
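A minimal, hypothetical sketch of the kind of change such a bump involves: the packaging metadata (e.g. the `python_requires=">=3.9"` entries in the setup.py files listed in the review below) moves to `">=3.10"`, and any runtime guard follows suit. The guard below is an illustration in that spirit, not the actual Spark code; the constant name and error message are assumptions.

```python
import sys

# Hypothetical minimum-version guard, similar in spirit to the checks
# PySpark performs at startup; the real location and wording differ.
MIN_PYTHON = (3, 10)

if sys.version_info < MIN_PYTHON:
    raise RuntimeError(
        "Python %d.%d or above is required; found %s"
        % (MIN_PYTHON + (sys.version.split()[0],))
    )
```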

Why are the changes needed?

Python 3.9 is reaching its end of life (EOL).

Does this PR introduce any user-facing change?

Yes, documentation changes.

How was this patch tested?

PR builder with the upgraded image.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon HyukjinKwon (Member) left a comment


We would need to fix more places, listed below; this can be done in a separate PR. (A sketch for regenerating this listing follows the list.)

.github/workflows/build_infra_images_cache.yml:      - name: Build and push (PySpark with Python 3.9)
.github/workflows/build_infra_images_cache.yml:      - name: Image digest (PySpark with Python 3.9)
.github/workflows/build_python_3.9.yml:name: "Build / Python-only (master, Python 3.9)"
dev/create-release/spark-rm/Dockerfile:# Install Python 3.9
dev/infra/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-309/Dockerfile:LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark with Python 3.09"
dev/spark-test-image/python-309/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-309/Dockerfile:# Python deps for Spark Connect
dev/spark-test-image/python-309/Dockerfile:# Install Python 3.9 packages
dev/spark-test-image/python-minimum/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-minimum/Dockerfile:# Install Python 3.9 packages
dev/spark-test-image/python-ps-minimum/Dockerfile:# Install Python 3.9
dev/spark-test-image/python-ps-minimum/Dockerfile:# Install Python 3.9 packages
docs/index.md:Spark runs on Java 17/21, Scala 2.13, Python 3.9+, and R 3.5+ (Deprecated).
docs/rdd-programming-guide.md:Spark {{site.SPARK_VERSION}} works with Python 3.9+. It can use the standard CPython interpreter,
python/docs/source/development/contributing.rst:    # Python 3.9+ is required
python/docs/source/development/contributing.rst:With Python 3.9+, pip can be used as below to install and set up the development environment.
python/docs/source/getting_started/install.rst:Python 3.9 and above.
python/docs/source/tutorial/pandas_on_spark/typehints.rst:With Python 3.9+, you can specify the type hints by using pandas instances as follows:
python/packaging/classic/setup.py:            "Programming Language :: Python :: 3.9",
python/packaging/client/setup.py:            "Programming Language :: Python :: 3.9",
python/packaging/connect/setup.py:            "Programming Language :: Python :: 3.9",
python/pyspark/cloudpickle/cloudpickle.py:        # "nogil" Python: modified attributes from 3.9
python/pyspark/pandas/typedef/typehints.py:# TODO: Remove this variadic-generic hack by tuple once ww drop Python up to 3.9.
python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py:        # SPARK-30921: We should not pushdown predicates of PythonUDFs through Aggregate.
python/pyspark/sql/udf.py:    # Note: Python 3.9.15, Pandas 1.5.2 and PyArrow 10.0.1 are used.
python/run-tests:  echo "Python versions prior to 3.9 are not supported."
.github/workflows/build_and_test.yml:            python3.9 ./dev/structured_logging_style.py
.github/workflows/build_and_test.yml:        python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0'
.github/workflows/build_and_test.yml:        python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
.github/workflows/build_and_test.yml:      run: python3.9 -m pip list
.github/workflows/build_and_test.yml:      run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
.github/workflows/build_and_test.yml:        python3.9 -m pip install 'protobuf==4.25.1' 'mypy-protobuf==3.3.0'
.github/workflows/build_and_test.yml:      run: if test -f ./dev/connect-check-protos.py; then PATH=$PATH:$HOME/buf/bin PYTHON_EXECUTABLE=python3.9 ./dev/connect-check-protos.py; fi
.github/workflows/build_and_test.yml:      PYSPARK_DRIVER_PYTHON: python3.9
.github/workflows/build_and_test.yml:      PYSPARK_PYTHON: python3.9
.github/workflows/build_and_test.yml:        python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5'
.github/workflows/build_and_test.yml:        python3.9 -m pip install ipython_genutils # See SPARK-38517
.github/workflows/build_and_test.yml:        python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly<6.0.0'
.github/workflows/build_and_test.yml:        python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
.github/workflows/build_and_test.yml:      run: python3.9 -m pip list
.github/workflows/build_and_test.yml:        # We need this link to make sure `python3` points to `python3.9` which contains the prerequisite packages.
.github/workflows/build_and_test.yml:        ln -s "$(which python3.9)" "/usr/local/bin/python3"
.github/workflows/build_and_test.yml:          pyspark_modules=`cd dev && python3.9 -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
.github/workflows/build_infra_images_cache.yml:    - 'dev/spark-test-image/python-309/Dockerfile'
.github/workflows/build_infra_images_cache.yml:        if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
.github/workflows/build_infra_images_cache.yml:        id: docker_build_pyspark_python_309
.github/workflows/build_infra_images_cache.yml:          context: ./dev/spark-test-image/python-309/
.github/workflows/build_infra_images_cache.yml:          tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}-static
.github/workflows/build_infra_images_cache.yml:          cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}
.github/workflows/build_infra_images_cache.yml:          cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }},mode=max
.github/workflows/build_infra_images_cache.yml:        if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
.github/workflows/build_infra_images_cache.yml:        run: echo ${{ steps.docker_build_pyspark_python_309.outputs.digest }}
.github/workflows/build_python_3.9.yml:          "PYSPARK_IMAGE_TO_TEST": "python-309",
.github/workflows/build_python_3.9.yml:          "PYTHON_TO_TEST": "python3.9"
.github/workflows/build_python_minimum.yml:          "PYTHON_TO_TEST": "python3.9"
.github/workflows/build_python_ps_minimum.yml:          "PYTHON_TO_TEST": "python3.9"
README.md:|            | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.9.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.9.yml)                             |
dev/create-release/spark-rm/Dockerfile:    python3.9 python3.9-distutils \
dev/create-release/spark-rm/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/create-release/spark-rm/Dockerfile:    python3.9 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/create-release/spark-rm/Dockerfile:    python3.9 -m pip install torcheval && \
dev/create-release/spark-rm/Dockerfile:    python3.9 -m pip cache purge
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \
dev/create-release/spark-rm/Dockerfile:RUN python3.9 -m pip list
dev/create-release/spark-rm/Dockerfile:RUN ln -s "$(which python3.9)" "/usr/local/bin/python"
dev/create-release/spark-rm/Dockerfile:RUN ln -s "$(which python3.9)" "/usr/local/bin/python3"
dev/infra/Dockerfile:    python3.9 python3.9-distutils \
dev/infra/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/infra/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/infra/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/infra/Dockerfile:    python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/infra/Dockerfile:    python3.9 -m pip install torcheval && \
dev/infra/Dockerfile:    python3.9 -m pip cache purge
dev/spark-test-image-util/docs/run-in-container:# We need this link to make sure `python3` points to `python3.9` which contains the prerequisite packages.
dev/spark-test-image-util/docs/run-in-container:ln -s "$(which python3.9)" "/usr/local/bin/python3"
dev/spark-test-image/python-309/Dockerfile:    libpython3-dev \
dev/spark-test-image/python-309/Dockerfile:    python3.9 \
dev/spark-test-image/python-309/Dockerfile:    python3.9-distutils \
dev/spark-test-image/python-309/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-309/Dockerfile:RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
dev/spark-test-image/python-309/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-309/Dockerfile:    python3.9 -m pip install 'torch<2.6.0' torchvision --index-url https://download.pytorch.org/whl/cpu && \
dev/spark-test-image/python-309/Dockerfile:    python3.9 -m pip install torcheval && \
dev/spark-test-image/python-309/Dockerfile:    python3.9 -m pip cache purge
dev/spark-test-image/python-minimum/Dockerfile:    python3.9 \
dev/spark-test-image/python-minimum/Dockerfile:    python3.9-distutils \
dev/spark-test-image/python-minimum/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-minimum/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-minimum/Dockerfile:    python3.9 -m pip cache purge
dev/spark-test-image/python-ps-minimum/Dockerfile:    python3.9 \
dev/spark-test-image/python-ps-minimum/Dockerfile:    python3.9-distutils \
dev/spark-test-image/python-ps-minimum/Dockerfile:RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
dev/spark-test-image/python-ps-minimum/Dockerfile:RUN python3.9 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
dev/spark-test-image/python-ps-minimum/Dockerfile:    python3.9 -m pip cache purge
python/docs/source/development/contributing.rst:    conda create --name pyspark-dev-env python=3.9
python/docs/source/getting_started/install.rst:    conda install -c conda-forge pyspark  # can also add "python=3.9 some_package [etc.]" here
python/packaging/classic/setup.py:        python_requires=">=3.9",
python/packaging/client/setup.py:        python_requires=">=3.9",
python/packaging/connect/setup.py:        python_requires=">=3.9",
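
A minimal sketch, assuming the command is run from the repository root with `git` on the PATH, of how a listing like the one above can be regenerated; the exact search pattern behind the original list is an assumption.

```python
import subprocess

# Grep the repo for remaining Python 3.9 references (patterns are a guess;
# the original listing may have been produced with a different query).
result = subprocess.run(
    ["git", "grep", "-I",
     "-e", "python3\\.9", "-e", "Python 3\\.9", "-e", "python-309"],
    capture_output=True,
    text=True,
)
print(result.stdout)
```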

@xinrong-meng (Member)

Spark in yarn-client mode seems to be failing.

@zhengruifeng zhengruifeng marked this pull request as draft June 25, 2025 03:17
@zhengruifeng (Contributor, Author)

Converted to draft since the jobs get stuck.

@dongjoon-hyun dongjoon-hyun (Member) left a comment


Thank you for working on this.
