[SPARK-51647][INFRA] Add a job to guard REPLs: spark-sql and spark-shell #50423


Closed
wants to merge 16 commits

Conversation

zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Mar 27, 2025

What changes were proposed in this pull request?

Add a job to guard REPLs:

  1. spark-sql
  2. spark-shell
  3. spark-shell on connect
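The kind of smoke check such a job could run can be sketched as follows. This is a hedged illustration, not the PR's actual workflow: the helper name `check_repl_log` and the stubbed log contents are assumptions, modeled on the spark-sql step quoted later in the diff.

```shell
# Sketch of a REPL smoke check: pipe a query into the shell, capture the
# output, and pass only if the expected result appears in the log.
set -eu

check_repl_log() {
  # Pass only if the expected literal appears in the captured REPL output.
  grep -qF "$2" "$1"
}

# In CI the log would come from something like:
#   echo 'SELECT 512 + 512;' | ./bin/spark-sql > spark-sql-repl.log 2>&1
# Here we stub the log so the sketch stays self-contained.
printf 'spark-sql> SELECT 512 + 512;\n1024\nTime taken: 0.5 s\n' > spark-sql-repl.log
check_repl_log spark-sql-repl.log 1024 && echo "spark-sql REPL OK"
```

The same pattern would apply to spark-shell and spark-shell on Connect, with a Scala expression piped in instead of SQL.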

Why are the changes needed?

When the Spark shell is broken, almost all developers are blocked.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI.

https://github.com/zhengruifeng/spark/actions/runs/14120484007/job/39559698116

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the INFRA label Mar 27, 2025
@zhengruifeng zhengruifeng force-pushed the infra_repl branch 3 times, most recently from fe0ccc8 to 51074c5 Compare March 27, 2025 10:08
@zhengruifeng zhengruifeng changed the title [WIP][INFRA] Add a job to guide spark shell [SPARK-51647][INFRA] Add a job to guide REPLs: spark-sql and spark-shell Mar 28, 2025
@zhengruifeng zhengruifeng marked this pull request as ready for review March 28, 2025 02:09
@zhengruifeng zhengruifeng requested review from HyukjinKwon, dongjoon-hyun, LuciferYang and panbingkun and removed request for HyukjinKwon March 28, 2025 02:09
@zhengruifeng
Contributor Author

zhengruifeng commented Mar 28, 2025

The Python commands work locally but never work in GitHub Actions, and I am not sure how to fix that:

echo 'spark.range(512).union(spark.range(512)).count()' | ./bin/pyspark

echo 'spark.range(512).union(spark.range(512)).count()' | ./bin/pyspark --remote local

So I am giving up on those for now and starting with the non-Python shells.
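One plausible cause, offered as an assumption rather than something established in this thread: GitHub Actions steps run without a controlling TTY, which interactive REPLs may require. Notably, the spark-sql step in this PR already works around this by wrapping the shell in util-linux `script`, which allocates a pseudo-terminal. A minimal illustration of that wrapper:

```shell
# util-linux `script -q -e -c CMD FILE` runs CMD inside a pseudo-terminal,
# echoes its output to stdout, and exits with CMD's status (-e). Writing the
# typescript to /dev/null discards it. This mirrors the PR's
#   shell: 'script -q -e -c "bash {0}"'
# trick for the spark-sql step.
out=$(script -q -e -c "echo from-a-pty" /dev/null)
echo "$out"
```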

    python3.11 -m pip list
- name: Build Spark
  run: |
    ./build/sbt -Phive -Phive-thriftserver clean package
Contributor


Ultimately, we still need Maven to compile and package the Spark client. Although the artifact produced by sbt package can also be used for testing, a healthy sbt build does not necessarily mean the client packaged by Maven is also free of issues.

Contributor


So I believe verifying the output of dev/make-distribution.sh would be more rigorous and convincing.

Contributor Author

@zhengruifeng zhengruifeng Mar 28, 2025


My original thought was to protect contributors' daily development.
If the shell built by sbt breaks (which has happened multiple times), it is very hard to figure out the offending commit.

I think we can also test Maven here if necessary.

  uses: actions/setup-python@v5
  with:
    python-version: '3.11'
- name: Install dependencies for PySpark
Contributor


If we are only checking spark-shell and spark-sql, there is no need to install these Python dependencies.

Contributor Author


We can remove it, since PySpark doesn't work here yet.

@zhengruifeng zhengruifeng changed the title [SPARK-51647][INFRA] Add a job to guide REPLs: spark-sql and spark-shell [SPARK-51647][INFRA] Add a job to guard REPLs: spark-sql and spark-shell Mar 28, 2025
- name: Spark SQL
  shell: 'script -q -e -c "bash {0}"'
  run: |
    echo 'SELECT 512 + 512;' | ./bin/spark-sql 2>&1 > spark-sql-repl.log
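One shell subtlety worth noting about the redirection above, as a general point about shell semantics rather than a claim about the step's intent: `2>&1 > file` duplicates stderr onto the original stdout before stdout is redirected, so only stdout lands in the log, whereas `> file 2>&1` captures both streams.

```shell
# Demonstrates redirection ordering: `2>&1 > file` leaves stderr on the
# original stdout, while `> file 2>&1` captures both streams in the file.
demo() { echo to-stdout; echo to-stderr >&2; }

demo 2>&1 > only-stdout.log   # stderr escapes to the original stdout
demo > both.log 2>&1          # both streams captured

grep -c . only-stdout.log   # prints 1
grep -c . both.log          # prints 2
```

If the job is meant to grep the log for errors as well as results, the `> file 2>&1` form would be the one that captures everything.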
Member


FYI, org.apache.spark.sql.hive.thriftserver.CliSuite


We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 10, 2025
@github-actions github-actions bot closed this Jul 11, 2025
3 participants