CI: Use one Java version for PR checks by ajantha-bhat · Pull Request #16945 · apache/iceberg

ajantha-bhat · 2026-06-24T07:44:42Z

Summary

This updates the PR JVM matrix for the Java, Spark, Flink, Hive, Kafka Connect, and Delta conversion workflows.

For ordinary pull requests, these workflows now run on JDK 17 only. For non-PR runs and PRs labeled full-ci, they continue to run on both JDK 17 and JDK 21.

The JVM policy is expressed inline in each workflow matrix, so this avoids adding a separate planner workflow/job while keeping the behavior easy to inspect in the workflow file.

Rationale

JDK 17 is the lower supported Java baseline and the project bytecode target, so it gives the most useful compatibility signal for regular PR validation while reducing GitHub Actions runner usage.

This does not remove JDK 21 coverage. Main branch/release/tag runs and full-ci PRs still use both supported JVMs.

Testing

actionlint .github/workflows/java-ci.yml .github/workflows/spark-ci.yml .github/workflows/flink-ci.yml .github/workflows/hive-ci.yml .github/workflows/kafka-connect-ci.yml .github/workflows/delta-conversion-ci.yml
zizmor --min-severity medium --min-confidence medium --no-progress .github/workflows/java-ci.yml .github/workflows/spark-ci.yml .github/workflows/flink-ci.yml .github/workflows/hive-ci.yml .github/workflows/kafka-connect-ci.yml .github/workflows/delta-conversion-ci.yml
YAML parse for the updated workflow files
git diff --check HEAD~1..HEAD

AI Disclosure

Model: GPT-5 Codex
Platform/Tool: Codex
Human Oversight: partially reviewed
Prompt Summary: Update Iceberg CI workflows to use one JVM for regular PR checks while preserving full JVM coverage for full CI paths.

Baunsgaard · 2026-06-24T13:41:28Z

Hi @ajantha-bhat,

Inspired by your comments on the mailing lists I took the liberty to split your suggestion into two, and only do the reduction in Java versions to run Ci on. see #16951 .

This makes the change more maintainable while achieving similar gains.

When i look at the current Pr from you that already have split some of the logic, it still feels a bit too verbose, and mixes many parts that could be simpler.

Let me know what you think?

Baunsgaard

First of all, thanks for driving this suggestion from mailing list @ajantha-bhat, and splitting the implementation into multiple PRs to reduce complexity.

Benefits for your suggestions is that centralizing the JVM policy in one place is a real plus, and it lines up with the discussion you have had on the dev-list discussion on cutting the GitHub Actions usage. Two things I'd like to discuss, both in the spirit of that thread's goals (lower runner consumption + shorter queue time):

The plan job costs a runner per workflow. Each uses: ./.github/workflows/jvm-ci-plan.yml runs as its own ubuntu-24.04 job just to emit a JSON string, and every test job needs: plan, so fan-out waits on it. The thread's motivation is the ASF shared-runner ceiling and queue pressure, so adding N short-lived-but-still-scheduled jobs plus a serialization gate works a little against that. The JVM matrix can be derived for free at workflow-setup time from the github context (event + labels), no runner required e.g. in my pr #16951. Would it be worth computing it inline to avoid the extra jobs, even if the expression is repeated per workflow?
Keep a JDK 21 canary on PRs? In the discussion thread the suggested shape was "PRs: ... mostly one JVM, plus targeted canaries", this PR currently drops JDK 21 from PRs entirely. Could we keep one JDK 21 canary on the latest engine version (e.g. Spark 4.1 / Scala 2.13) so there's still a pre-merge JDK 21 signal at ~2 jobs?

Either way I'm glad to close #16951 if people are more in favour of this centralized approach suggested here. I just took a look earlier and thought another approach was perhaps more easily understandable and maintainable. I just wanted to suggest something without indirection and the extra job count that came up.

ajantha-bhat · 2026-06-24T15:11:27Z

@Baunsgaard: My thinking with the reusable planner was to avoid copying the same JVM policy into every workflow and risking drift between Java/Spark/Flink/Hive/Kafka/Delta.

In terms of raw runner minutes, the planner jobs should be tiny compared to the JDK 21 jobs this removes. This PR adds a few short plan jobs, but removes many heavy test/build jobs from the default PR path, so the net runner-minute savings should still be large.

That said, I agree the plan job is not free: it is still a scheduled runner job and it gates fan-out in each workflow. Since the original problem is ASF shared-runner pressure and queueing, I can see why avoiding even small extra jobs is attractive.

So I think the trade-off is:

reusable planner: less duplication and one JVM policy to maintain
inline expression: no extra jobs and no additional fan-out gate

I am gonna update this PR to go with inline expressions instead of a single JVM policy to make it more faster.

Baunsgaard · 2026-06-24T15:15:43Z

Thanks @ajantha-bhat

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Generated-by: GPT-5 Codex

ajantha-bhat · 2026-06-24T15:20:31Z

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Good point! I actually thought about doing exactly that, but I still worry it might be too expensive for PR runs. Usually, if a build passes on one JDK, it passes on the others—I can't recall a time we had a version-specific failure on a PR. Since we have the safeguard on main, I'd vote to keep it lean for now.

Let's see what the rest of the group thinks. If we have the luxury of extra resources and want that extra peace of mind, we can always turn it on later!

ajantha-bhat · 2026-06-24T15:22:29Z

-          ./gradlew -DsparkVersions=${{ matrix.spark }} -DscalaVersion=${{ matrix.scala }} -DflinkVersions= -DkafkaVersions= \
-            $projects -Pquick=true -x javadoc -DtestParallelism=auto
+          ./gradlew -DsparkVersions="${SPARK_VERSION}" -DscalaVersion="${SCALA_VERSION}" -DflinkVersions= -DkafkaVersions= \
+            "${projects[@]}" -Pquick=true -x javadoc -DtestParallelism=auto


found this using actionlint

"${projects[@]}" is Bash array expansion. Before: projects=":spark-extension:check :spark-runtime:check" ./gradlew ... $projects ... That relies on unquoted word splitting so the string becomes two Gradle arguments. actionlint runs ShellCheck, and ShellCheck flags this as unsafe: unquoted variables can accidentally split/glob. After: projects=( ":spark-extension:check" ":spark-runtime:check" ) ./gradlew ... "${projects[@]}" ... "${projects[@]}" expands each array item as its own safely quoted argument. So Gradle still receives: :spark-extension:check :spark-runtime:check but without unsafe word splitting. So behavior is the same, just cleaner and actionlint-safe.

you could consider not having this as part of the PR, to keep it self contained, and open another one with this change. Just to keep a single responsibility for each PR.

Baunsgaard · 2026-06-24T15:28:17Z

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Good point! I actually thought about doing exactly that, but I still worry it might be too expensive for PR runs. Usually, if a build passes on one JDK, it passes on the others—I can't recall a time we had a version-specific failure on a PR. Since we have the safeguard on main, I'd vote to keep it lean for now.

Let's see what the rest of the group thinks. If we have the luxury of extra resources and want that extra peace of mind, we can always turn it on later!

I would still suggest to have that one Java21 run. The difference in CI usage is minimal (12.5%), and it is better to have some of the coverage than none. I have seen now many times that something breaks in other versions if not tested.

manuzhang · 2026-06-25T02:09:40Z

      max-parallel: 15
      matrix:
-        jvm: [17, 21]
+        jvm: ${{ fromJson((github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'full-ci') == false) && '[17]' || '[17,21]') }}


I'm not sure whether it's easy to read and maintain with this format. Is there a way to separate configs from logic?

If you see my initial commit before amending it,
f840149

It had clean planner job but it comes with the queue overhead.
GitHub Actions matrix limitations force a trade-off. Since we intentionally removed the planner job to avoid queue overhead, the inline expression is the smallest no-extra-job version.

If you have any other suggestions, it is welcome.

Might be worth trying. Can we keep matrix.jvm and use matrix.exclude to just keep the first option for pull_request?

There are some options if you want:

Option 1: across lines with the '>-' prepend. It does not move the logic out but it splits the line at least.

matrix: # PRs run JDK 17 only; push/tag and 'full-ci'-labeled PRs run 17 + 21. jvm: >- ${{ fromJson( (github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'full-ci')) && '[17]' || '[17,21]' ) }}

Option 2: Use exclude:

matrix: jvm: [17, 21] # On PRs without the 'full-ci' label, drop JDK 21. exclude: - jvm: >- ${{ (github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'full-ci')) && 21 || 0 }}

Alternatively as @ajantha-bhat says, the original PR setup the planner job, but it actively increase the number of actions run. I lean towards what it is currently with a oneliner, but Option 1 where the statement is split seems fine too.

exclude is more confusing IMO (option 2), hence I didn't use it initially.

I can go with option 1 if at all required. It is just a line separation.
For me it looks good as it is since we don't have other options.

ajantha-bhat · 2026-06-25T13:59:30Z

@kevinjqliu, @nastra: Can we please take a look at this?

manuzhang · 2026-06-26T07:25:12Z

      max-parallel: 15
      matrix:
-        jvm: [17, 21]
+        jvm: ${{ fromJson((github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'full-ci') == false) && '[17]' || '[17,21]') }}


Will add full-ci label ever trigger java 21 CI given the default PR activity types are opened, synchronize and reopened?

As mentioned in the mailing list. There will be follow up PR for full-ci handling and remaining things.
Draft branch for full-ci handling can be seen here
ajantha-bhat@82f4591

It handles these cases too.

github-actions Bot added the INFRA label Jun 24, 2026

ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch from f840149 to 8968e52 Compare June 24, 2026 08:04

This was referenced Jun 24, 2026

CI: Select Spark PR matrix by changed version #16946

Open

CI: Select Flink PR matrix by changed version #16947

Open

ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch 2 times, most recently from 1655715 to 06419a6 Compare June 24, 2026 08:47

ajantha-bhat requested review from kevinjqliu and manuzhang June 24, 2026 13:29

Baunsgaard mentioned this pull request Jun 24, 2026

CI: Reduce Java 21 Spark matrix on pull requests #16951

Closed

Baunsgaard reviewed Jun 24, 2026

View reviewed changes

ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch from 06419a6 to 23bb6fb Compare June 24, 2026 15:13

CI: Use one Java version for PR checks

f381826

Generated-by: GPT-5 Codex

ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch from 23bb6fb to f381826 Compare June 24, 2026 15:17

ajantha-bhat commented Jun 24, 2026

View reviewed changes

Baunsgaard approved these changes Jun 24, 2026

View reviewed changes

manuzhang reviewed Jun 25, 2026

View reviewed changes

nssalian requested a review from nastra June 25, 2026 23:31

ajantha-bhat mentioned this pull request Jun 26, 2026

CI: Select PR matrices incrementally #16566

Draft

manuzhang reviewed Jun 26, 2026

View reviewed changes

Uh oh!

Conversation

ajantha-bhat commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Rationale

Testing

Uh oh!

Baunsgaard commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Baunsgaard left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ajantha-bhat commented Jun 24, 2026

Uh oh!

Baunsgaard commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajantha-bhat commented Jun 24, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Baunsgaard commented Jun 24, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ajantha-bhat commented Jun 25, 2026

Uh oh!

manuzhang Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ajantha-bhat commented Jun 24, 2026 •

edited

Loading

Baunsgaard commented Jun 24, 2026 •

edited

Loading

Baunsgaard left a comment •

edited

Loading

Baunsgaard commented Jun 24, 2026 •

edited

Loading

manuzhang Jun 26, 2026 •

edited

Loading