Skip to content

CI: Use one Java version for PR checks#16945

Open
ajantha-bhat wants to merge 1 commit into
apache:mainfrom
ajantha-bhat:codex/ci-java-pr-jdk17
Open

CI: Use one Java version for PR checks#16945
ajantha-bhat wants to merge 1 commit into
apache:mainfrom
ajantha-bhat:codex/ci-java-pr-jdk17

Conversation

@ajantha-bhat

@ajantha-bhat ajantha-bhat commented Jun 24, 2026

Copy link
Copy Markdown
Member

Summary

This updates the PR JVM matrix for the Java, Spark, Flink, Hive, Kafka Connect, and Delta conversion workflows.

For ordinary pull requests, these workflows now run on JDK 17 only. For non-PR runs and PRs labeled full-ci, they continue to run on both JDK 17 and JDK 21.

The JVM policy is expressed inline in each workflow matrix, so this avoids adding a separate planner workflow/job while keeping the behavior easy to inspect in the workflow file.

Rationale

JDK 17 is the lower supported Java baseline and the project bytecode target, so it gives the most useful compatibility signal for regular PR validation while reducing GitHub Actions runner usage.

This does not remove JDK 21 coverage. Main branch/release/tag runs and full-ci PRs still use both supported JVMs.

Testing

  • actionlint .github/workflows/java-ci.yml .github/workflows/spark-ci.yml .github/workflows/flink-ci.yml .github/workflows/hive-ci.yml .github/workflows/kafka-connect-ci.yml .github/workflows/delta-conversion-ci.yml
  • zizmor --min-severity medium --min-confidence medium --no-progress .github/workflows/java-ci.yml .github/workflows/spark-ci.yml .github/workflows/flink-ci.yml .github/workflows/hive-ci.yml .github/workflows/kafka-connect-ci.yml .github/workflows/delta-conversion-ci.yml
  • YAML parse for the updated workflow files
  • git diff --check HEAD~1..HEAD

AI Disclosure

  • Model: GPT-5 Codex
  • Platform/Tool: Codex
  • Human Oversight: partially reviewed
  • Prompt Summary: Update Iceberg CI workflows to use one JVM for regular PR checks while preserving full JVM coverage for full CI paths.

@Baunsgaard

Baunsgaard commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Hi @ajantha-bhat,

Inspired by your comments on the mailing lists I took the liberty to split your suggestion into two, and only do the reduction in Java versions to run Ci on. see #16951 .

This makes the change more maintainable while achieving similar gains.

When i look at the current Pr from you that already have split some of the logic, it still feels a bit too verbose, and mixes many parts that could be simpler.

Let me know what you think?

@Baunsgaard Baunsgaard left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks for driving this suggestion from mailing list @ajantha-bhat, and splitting the implementation into multiple PRs to reduce complexity.

Benefits for your suggestions is that centralizing the JVM policy in one place is a real plus, and it lines up with the discussion you have had on the dev-list discussion on cutting the GitHub Actions usage. Two things I'd like to discuss, both in the spirit of that thread's goals (lower runner consumption + shorter queue time):

  1. The plan job costs a runner per workflow. Each uses: ./.github/workflows/jvm-ci-plan.yml runs as its own ubuntu-24.04 job just to emit a JSON string, and every test job needs: plan, so fan-out waits on it. The thread's motivation is the ASF shared-runner ceiling and queue pressure, so adding N short-lived-but-still-scheduled jobs plus a serialization gate works a little against that. The JVM matrix can be derived for free at workflow-setup time from the github context (event + labels), no runner required e.g. in my pr #16951. Would it be worth computing it inline to avoid the extra jobs, even if the expression is repeated per workflow?

  2. Keep a JDK 21 canary on PRs? In the discussion thread the suggested shape was "PRs: ... mostly one JVM, plus targeted canaries", this PR currently drops JDK 21 from PRs entirely. Could we keep one JDK 21 canary on the latest engine version (e.g. Spark 4.1 / Scala 2.13) so there's still a pre-merge JDK 21 signal at ~2 jobs?

Either way I'm glad to close #16951 if people are more in favour of this centralized approach suggested here. I just took a look earlier and thought another approach was perhaps more easily understandable and maintainable. I just wanted to suggest something without indirection and the extra job count that came up.

@ajantha-bhat

Copy link
Copy Markdown
Member Author

@Baunsgaard: My thinking with the reusable planner was to avoid copying the same JVM policy into every workflow and risking drift between Java/Spark/Flink/Hive/Kafka/Delta.

In terms of raw runner minutes, the planner jobs should be tiny compared to the JDK 21 jobs this removes. This PR adds a few short plan jobs, but removes many heavy test/build jobs from the default PR path, so the net runner-minute savings should still be large.

That said, I agree the plan job is not free: it is still a scheduled runner job and it gates fan-out in each workflow. Since the original problem is ASF shared-runner pressure and queueing, I can see why avoiding even small extra jobs is attractive.

So I think the trade-off is:

  • reusable planner: less duplication and one JVM policy to maintain
  • inline expression: no extra jobs and no additional fan-out gate

I am gonna update this PR to go with inline expressions instead of a single JVM policy to make it more faster.

@ajantha-bhat ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch from 06419a6 to 23bb6fb Compare June 24, 2026 15:13
@Baunsgaard

Baunsgaard commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Thanks @ajantha-bhat

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Generated-by: GPT-5 Codex
@ajantha-bhat ajantha-bhat force-pushed the codex/ci-java-pr-jdk17 branch from 23bb6fb to f381826 Compare June 24, 2026 15:17
@ajantha-bhat

Copy link
Copy Markdown
Member Author

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Good point! I actually thought about doing exactly that, but I still worry it might be too expensive for PR runs. Usually, if a build passes on one JDK, it passes on the others—I can't recall a time we had a version-specific failure on a PR. Since we have the safeguard on main, I'd vote to keep it lean for now.

Let's see what the rest of the group thinks. If we have the luxury of extra resources and want that extra peace of mind, we can always turn it on later!

./gradlew -DsparkVersions=${{ matrix.spark }} -DscalaVersion=${{ matrix.scala }} -DflinkVersions= -DkafkaVersions= \
$projects -Pquick=true -x javadoc -DtestParallelism=auto
./gradlew -DsparkVersions="${SPARK_VERSION}" -DscalaVersion="${SCALA_VERSION}" -DflinkVersions= -DkafkaVersions= \
"${projects[@]}" -Pquick=true -x javadoc -DtestParallelism=auto

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found this using actionlint

"${projects[@]}" is Bash array expansion.

Before:

projects=":spark-extension:check :spark-runtime:check"
./gradlew ... $projects ...
That relies on unquoted word splitting so the string becomes two Gradle arguments. actionlint runs ShellCheck, and ShellCheck flags this as unsafe: unquoted variables can accidentally split/glob.

After:

projects=(
  ":spark-extension:check"
  ":spark-runtime:check"
)

./gradlew ... "${projects[@]}" ...
"${projects[@]}" expands each array item as its own safely quoted argument. So Gradle still receives:

:spark-extension:check
:spark-runtime:check
but without unsafe word splitting.

So behavior is the same, just cleaner and actionlint-safe.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could consider not having this as part of the PR, to keep it self contained, and open another one with this change. Just to keep a single responsibility for each PR.

@Baunsgaard

Copy link
Copy Markdown
Contributor

I like this code much better already! The one thing missing in my opinion is to still run at least one JVM 21 version. This would still nearly cut the CI in half, but still give some indication on the PR if Java 21 is broken!

Good point! I actually thought about doing exactly that, but I still worry it might be too expensive for PR runs. Usually, if a build passes on one JDK, it passes on the others—I can't recall a time we had a version-specific failure on a PR. Since we have the safeguard on main, I'd vote to keep it lean for now.

Let's see what the rest of the group thinks. If we have the luxury of extra resources and want that extra peace of mind, we can always turn it on later!

I would still suggest to have that one Java21 run. The difference in CI usage is minimal (12.5%), and it is better to have some of the coverage than none. I have seen now many times that something breaks in other versions if not tested.

max-parallel: 15
matrix:
jvm: [17, 21]
jvm: ${{ fromJson((github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'full-ci') == false) && '[17]' || '[17,21]') }}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it's easy to read and maintain with this format. Is there a way to separate configs from logic?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see my initial commit before amending it,
f840149

It had clean planner job but it comes with the queue overhead.
GitHub Actions matrix limitations force a trade-off. Since we intentionally removed the planner job to avoid queue overhead, the inline expression is the smallest no-extra-job version.

If you have any other suggestions, it is welcome.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth trying. Can we keep matrix.jvm and use matrix.exclude to just keep the first option for pull_request?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some options if you want:

Option 1: across lines with the '>-' prepend. It does not move the logic out but it splits the line at least.

  matrix:
    # PRs run JDK 17 only; push/tag and 'full-ci'-labeled PRs run 17 + 21.
    jvm: >-
      ${{ fromJson(
        (github.event_name == 'pull_request'
          && !contains(github.event.pull_request.labels.*.name, 'full-ci'))
        && '[17]'
        || '[17,21]'
      ) }}

Option 2: Use exclude:

  matrix:
    jvm: [17, 21]
    # On PRs without the 'full-ci' label, drop JDK 21.
    exclude:
      - jvm: >-
          ${{ (github.event_name == 'pull_request'
            && !contains(github.event.pull_request.labels.*.name, 'full-ci'))
            && 21 || 0 }}

Alternatively as @ajantha-bhat says, the original PR setup the planner job, but it actively increase the number of actions run. I lean towards what it is currently with a oneliner, but Option 1 where the statement is split seems fine too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exclude is more confusing IMO (option 2), hence I didn't use it initially.

I can go with option 1 if at all required. It is just a line separation.
For me it looks good as it is since we don't have other options.

@ajantha-bhat

Copy link
Copy Markdown
Member Author

@kevinjqliu, @nastra: Can we please take a look at this?

max-parallel: 15
matrix:
jvm: [17, 21]
jvm: ${{ fromJson((github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'full-ci') == false) && '[17]' || '[17,21]') }}

@manuzhang manuzhang Jun 26, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add full-ci label ever trigger java 21 CI given the default PR activity types are opened, synchronize and reopened?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the mailing list. There will be follow up PR for full-ci handling and remaining things.
Draft branch for full-ci handling can be seen here
ajantha-bhat@82f4591

It handles these cases too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants