[SPARK-51647][INFRA] Add a job to guard REPLs: spark-sql and spark-shell #50423
Conversation
The Python commands work locally but never work in GitHub Actions, and I'm not sure how to fix that. So I'll give up on PySpark for now and start with the non-Python shells.
    python3.11 -m pip list
- name: Build Spark
  run: |
    ./build/sbt -Phive -Phive-thriftserver clean package
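
For context, a minimal sketch of the kind of non-interactive PySpark probe that could have been attempted; the PR's actual Python commands are not shown above, so the step name and the piped expression below are assumptions, modeled on the spark-sql step later in this diff:

- name: PySpark  # hypothetical step, not part of the final PR
  shell: 'script -q -e -c "bash {0}"'
  run: |
    # Pipe a trivial job into the Python REPL; with this redirection order,
    # stderr stays on the CI console and stdout goes to the log file.
    echo 'spark.range(1024).count()' | ./bin/pyspark 2>&1 > pyspark-repl.log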
Ultimately, we still need Maven to compile and package the Spark client. Although the artifacts produced by sbt package can be used for testing, a passing sbt build does not necessarily mean that the Maven-packaged client is also healthy and free of issues.
So I believe that verifying the output of dev/make-distribution.sh would be more rigorous and convincing.
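
For illustration, a sketch of what a distribution-based build step could look like; the flags below are common make-distribution.sh options, not taken from this PR:

- name: Build Spark distribution  # hypothetical alternative to the sbt step
  run: |
    # Maven-backed packaging, closer to what releases actually ship
    ./dev/make-distribution.sh --name ci-test --tgz -Phive -Phive-thriftserver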
My original thought was to protect contributors' daily development: when the shell built by sbt is broken (which has happened multiple times), it is very hard to figure out the offending commit. I think we can also test Maven here if necessary.
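
If Maven coverage were added, the build step could plausibly be swapped as follows; this is a sketch assuming the standard ./build/mvn wrapper, not something proposed in this PR:

- name: Build Spark with Maven  # hypothetical variant
  run: |
    ./build/mvn -Phive -Phive-thriftserver -DskipTests clean package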
  uses: actions/setup-python@v5
  with:
    python-version: '3.11'
- name: Install dependencies for PySpark
If we are only checking spark-shell and spark-sql, there is no need to install these Python dependencies.
We can remove it, since PySpark doesn't work in this job anyway.
- name: Spark SQL
  shell: 'script -q -e -c "bash {0}"'
  run: |
    echo 'SELECT 512 + 512;' | ./bin/spark-sql 2>&1 > spark-sql-repl.log
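
Since the PR title also guards spark-shell, a matching step would plausibly look like the following; this is a sketch modeled on the spark-sql step above, not copied from the diff. The script -q -e -c wrapper allocates a pseudo-TTY so the REPL behaves as if interactive, and -e propagates the inner command's exit code so a failure fails the job:

- name: Spark Shell  # hypothetical counterpart to the Spark SQL step
  shell: 'script -q -e -c "bash {0}"'
  run: |
    # Pipe a trivial expression into the Scala REPL and capture stdout.
    echo 'spark.range(512).count()' | ./bin/spark-shell 2>&1 > spark-shell-repl.log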
FYI, org.apache.spark.sql.hive.thriftserver.CliSuite already exercises the spark-sql CLI in a similar way.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Add a job to guard the REPLs: spark-sql and spark-shell.
Why are the changes needed?
When the Spark shell is broken, almost all developers are blocked.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI: https://github.com/zhengruifeng/spark/actions/runs/14120484007/job/39559698116
Was this patch authored or co-authored using generative AI tooling?
No.