[SPARK-51338][INFRA] Add automated CI build for connect-examples #50187
base: master
**`.github/workflows/build_and_test.yml`**
```diff
@@ -92,6 +92,7 @@ jobs:
           pyspark_pandas_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark-pandas')))"`
           pyspark=`./dev/is-changed.py -m $pyspark_modules`
           pandas=`./dev/is-changed.py -m $pyspark_pandas_modules`
+          connect_examples=`./dev/is-changed.py -m "connect-examples"`
           if [[ "${{ github.repository }}" != 'apache/spark' ]]; then
             yarn=`./dev/is-changed.py -m yarn`
             kubernetes=`./dev/is-changed.py -m kubernetes`
```
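For `./dev/is-changed.py -m "connect-examples"` to resolve, a module with that name has to be registered in `dev/sparktestsupport/modules.py`. A minimal, hypothetical registration is sketched below; the field values are illustrative and may differ from what this PR actually adds.

```python
# Hypothetical sketch: registering a "connect-examples" module so that
# `./dev/is-changed.py -m "connect-examples"` can resolve the name.
from sparktestsupport.modules import Module  # class defined in dev/sparktestsupport

connect_examples = Module(
    name="connect-examples",
    dependencies=[],
    # Any change under connect-examples/ marks the module as changed.
    source_file_regexes=["connect-examples/"],
)
```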
```diff
@@ -127,6 +128,7 @@ jobs:
             \"k8s-integration-tests\" : \"$kubernetes\",
             \"buf\" : \"$buf\",
             \"ui\" : \"$ui\",
+            \"connect-examples\": \"$connect_examples\"
           }"
           echo $precondition # For debugging
           # Remove `\n` to avoid "Invalid format" error
```
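For context, this JSON string is what downstream jobs gate on via `fromJson(needs.precondition.outputs.required)`. A standalone sketch of the pattern, with step names and values that are illustrative assumptions rather than part of this diff:

```yaml
# Sketch of the precondition pattern: one job computes a JSON string of
# required modules, exports it as a job output, and later jobs gate on it.
name: precondition-sketch
on: workflow_dispatch
jobs:
  precondition:
    runs-on: ubuntu-latest
    outputs:
      required: ${{ steps.check.outputs.required }}
    steps:
      - name: Compute required modules
        id: check
        run: |
          precondition="{\"connect-examples\": \"true\"}"
          # Strip newlines to avoid an "Invalid format" error, as the diff notes.
          echo "required=$(echo "$precondition" | tr -d '\n')" >> "$GITHUB_OUTPUT"
  connect-examples-build:
    needs: precondition
    if: fromJson(needs.precondition.outputs.required).connect-examples == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "connect-examples changed; building"
```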
```diff
@@ -1290,3 +1292,35 @@ jobs:
           cd ui-test
           npm install --save-dev
           node --experimental-vm-modules node_modules/.bin/jest
+
+  connect-examples-build:
+    name: "Build modules: server-library-example"
+    needs: precondition
+    if: fromJson(needs.precondition.outputs.required).connect-examples == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Spark repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          repository: apache/spark
+          ref: ${{ inputs.branch }}
+
+      - name: Sync the current branch with the latest in Apache Spark
+        if: github.repository != 'apache/spark'
+        run: |
+          echo "APACHE_SPARK_REF=$(git rev-parse HEAD)" >> $GITHUB_ENV
+          git fetch https://github.com/$GITHUB_REPOSITORY.git ${GITHUB_REF#refs/heads/}
+          git -c user.name='Apache Spark Test Account' -c user.email='[email protected]' merge --no-commit --progress --squash FETCH_HEAD
+          git -c user.name='Apache Spark Test Account' -c user.email='[email protected]' commit -m "Merged commit" --allow-empty
+
+      - name: Set up Java
+        uses: actions/setup-java@v4
+        with:
+          distribution: zulu
+          java-version: ${{ inputs.java }}
+
+      - name: Build server-library-example
+        run: |
+          cd connect-examples/server-library-example
+          mvn clean package
```
**Review comment:** Can we use […]
**`connect-examples/server-library-example/pom.xml`**
```diff
@@ -36,7 +36,8 @@
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <scala.binary>2.13</scala.binary>
     <scala.version>2.13.15</scala.version>
-    <protobuf.version>3.25.4</protobuf.version>
-    <spark.version>4.0.0-preview2</spark.version>
+    <protobuf.version>4.29.3</protobuf.version>
+    <spark.version>4.1.0-SNAPSHOT</spark.version>
+    <connect.guava.version>33.4.0-jre</connect.guava.version>
   </properties>
 </project>
```
**@LuciferYang:** The parent of this project should inherit from Spark's parent `pom.xml`, and the project version should be consistent with Spark's version. Otherwise, the release script currently seems unable to auto-change the project version to the official version during the release process (4.1.0-SNAPSHOT -> 4.1.0).

**@vicennial:** What if we update the release script and add a rule/command to auto-change the project version in this pom file as well? That way we can satisfy both continuous build compatibility with Spark and stay somewhat independent (modulo the dependency on the ASF snapshot repo). I'd like to avoid inheriting the parent pom, as that would lead to the project pulling in Spark's default shading rules, version definitions, etc. In this specific case that wouldn't be favourable, since the project is intended to demonstrate extension development using a minimal set of dependencies (spark-sql-api, spark-connect-client, etc.).

**@LuciferYang:** If feasible, it's certainly OK. However, I have a few questions regarding this: […]

**@LuciferYang:** @vicennial Is there any progress on this PR? I think it would be best if we could resolve this issue in Spark 4.0.

**@LuciferYang:** @vicennial Is there any progress on this hypothetical plan? Or can we remove this example module from branch-4.0 first?

**@vicennial:** Thanks for the questions, @LuciferYang. I had been AFK last week, back now.

1. Some (but not all) dependencies need to be consistent, such as the protobuf version. These would need to be updated as the Spark Connect code and dependencies evolve.
2. Since we've decided to add CI tests, I think it would make sense to add a final check at release time as well.
3. I am not opposed to a separate branch/repository, and I could see it working, but I must admit that I do not know the implications or pros/cons of creating a separate repository under the ASF. Perhaps the more seasoned committers may know; any ideas @hvanhovell / @HyukjinKwon / @cloud-fan?

**@LuciferYang:** After some consideration, if this project does not want to inherit Spark's parent […] Another possible approach is to configure the ASF snapshot repository, but in this case the project will not obtain a timely snapshot, but rather a nightly build.

**@vicennial:** Thanks for the suggestions @LuciferYang, I am exploring the first option atm.
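A sketch of the release-script rule vicennial proposes above. The script location, variable name, and exact `sed` expression are assumptions for illustration, not part of this PR:

```sh
# Hypothetical addition to the release scripts (e.g. somewhere under
# dev/create-release): rewrite the example's <spark.version> when cutting
# a release, e.g. 4.1.0-SNAPSHOT -> 4.1.0.
RELEASE_VERSION="4.1.0"  # assumed to be provided by the release tooling
sed -i -e "s|<spark.version>.*</spark.version>|<spark.version>${RELEASE_VERSION}</spark.version>|" \
  connect-examples/server-library-example/pom.xml
```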
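For the alternative LuciferYang mentions, resolving `4.1.0-SNAPSHOT` artifacts would require pointing the example's pom at the ASF snapshots repository. A minimal sketch, with the caveat from the thread that this serves nightly builds rather than per-commit snapshots:

```xml
<!-- Sketch: resolve Spark 4.1.0-SNAPSHOT artifacts from the ASF snapshots
     repository (nightly builds, not timely per-commit snapshots). -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
```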
**Review comment:** If we want to run this with Java 21 (and the other scheduled builds), we would need to add this in https://github.com/apache/spark/blob/master/.github/workflows/build_java21.yml too, e.g., `"connect_examples": "true"`.