@atheo89 atheo89 commented Jul 23, 2025

Description

Sync changes from rhds:main to rhds:rhoai-2.24

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

mtchoum1 and others added 30 commits July 17, 2025 13:35
[RHOAIENG-26264] commit*.env references updates automatically at 11:00 UTC using cron job
For Python packages that contain native bits, the Pipfile.lock will
only contain artifact hashes for the architecture that `pipenv lock`
was run against, along with the source archive hash. So when
installing on a different architecture, pip will attempt to compile
from the source archive, and will therefore need the appropriate
development files for the native dependencies used by the programs
it is compiling.

In this case, the h5py Python package is needed in the tensorflow
images, and to compile the native shared object files that it contains
from source, the libhdf5.so file from hdf5-devel is needed. The
compiled object files will be dynamically linked to the .so files from
hdf5. Technically, hdf5-devel is only needed at compile time and hdf5
at runtime, but since the compilation is only done on _some_
architectures, there isn't a dedicated build stage for these Python
packages, so to keep the changes minimal, the -devel package is
left in place.

On the architecture that the Pipfile.lock was generated on (x86_64),
the native bits are downloaded precompiled as before. This makes
things a little weird, as on x86_64 we'll have .so files that are
precompiled and link to other .so files downloaded from PyPI, whereas
on aarch64 we'll have .so files that were compiled as part of the
build and linked to other .so files from hdf5 and other RPMs from the
system.

For the tensorflow CUDA images, when trying to build them on aarch64,
the `pip install` stage fails because nvidia-nccl-cu12 2.21.5, the
version requested by tensorflow 2.18, is not available for that
architecture. Since it's a proprietary package, there also isn't a
source distribution, so it can't just be compiled at installation time.

Updating to tensorflow 2.19.0 pulls in a newer
nvidia-nccl-cu12 (2.23.4), which does have wheels available for both
x86_64 and aarch64 on PyPI.
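The two commit messages above describe three possible outcomes when installing a package on a given architecture: a matching wheel is used, the source archive is compiled, or the install fails. A minimal Python sketch of that decision follows. It is an illustration only: `pick_install_source` is a hypothetical helper, the artifact file names (including the h5py version) are made up for the example, and the platform check is a deliberate simplification of the tag matching that pip really performs.

```python
def pick_install_source(artifacts: list[str], machine: str) -> str:
    """Classify how an installer could obtain a package on `machine`.

    Simplified assumption: a Linux wheel matches when one of its platform
    tags ends with the machine name; real pip uses full tag matching.
    """
    for name in artifacts:
        if not name.endswith(".whl"):
            continue
        # Wheel names end in "-{python}-{abi}-{platform}.whl"; the platform
        # part may hold several dot-separated tags.
        platform_tags = name[: -len(".whl")].split("-")[-1].split(".")
        for tag in platform_tags:
            if tag == "any" or ("linux" in tag and tag.endswith("_" + machine)):
                return "wheel"
    if any(name.endswith((".tar.gz", ".zip")) for name in artifacts):
        return "sdist"  # compiled locally, so -devel packages are needed
    return "unavailable"  # no matching wheel and no source archive


# Illustrative artifact lists (hypothetical lock-file contents).
H5PY = [
    "h5py-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "h5py-3.13.0.tar.gz",
]
NCCL = ["nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl"]

for machine in ("x86_64", "aarch64"):
    print(machine, pick_install_source(H5PY, machine), pick_install_source(NCCL, machine))
```

With these example lists, h5py falls back to the source archive on aarch64, while the NCCL package has neither a matching wheel nor a source archive there, which corresponds to the failing `pip install` stage described above.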
…ahub-io#1414)

This commit introduces tests for ROCm-enabled workbench images on OpenShift. These tests verify that the images can be deployed successfully on a cluster with AMD GPUs and that both PyTorch and TensorFlow can correctly detect the available accelerator.

To support the testing of large accelerator images, the following changes were made:
- The pod readiness timeout in the test framework has been increased to 10 minutes to allow sufficient time for image pulling.
- The `ImageDeployment` utility was updated to allow for configurable timeouts.
- Existing CUDA tests were updated to use this new configurable timeout.
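The idea of a configurable readiness timeout can be sketched as a small polling helper. This is not the repository's `ImageDeployment` code: `wait_until`, its parameters, and the 600-second default (chosen here to mirror the 10 minutes mentioned above) are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 600.0,  # assumed default: 10 minutes, as in the commit message
    interval: float = 1.0,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller that deploys a large accelerator image could pass a longer `timeout`, while tests for small images keep a shorter one.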
…tahub-io#1414)

The best fix is to make the SocketProxy more robust so that it doesn't crash when a connection attempt fails. By catching the expected BrokenPipeError, the proxy can discard the failed connection and continue listening for the next attempt from the Wait.until loop. This turns the test from a "hope it works" scenario into a reliable polling check.
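The pattern described here, catching the expected error for one connection and carrying on with the next, might look roughly like the sketch below. It does not reproduce the actual SocketProxy: `serve_connections` and its `forward` callback are hypothetical names, and connections are represented by any object with a `close()` method so the loop can be shown without real sockets.

```python
from typing import Callable, Iterable, Protocol


class Closeable(Protocol):
    def close(self) -> None: ...


def serve_connections(
    incoming: Iterable[Closeable],
    forward: Callable[[Closeable], None],
) -> tuple[int, int]:
    """Forward each accepted connection, dropping those whose pipe breaks.

    Returns (served, dropped). A BrokenPipeError from one connection does
    not stop the loop, so a later polling attempt can still succeed.
    """
    served = dropped = 0
    for conn in incoming:
        try:
            forward(conn)
            served += 1
        except BrokenPipeError:
            # Expected while the target is not ready yet: discard and go on.
            dropped += 1
        finally:
            conn.close()
    return served, dropped
```

Only BrokenPipeError is swallowed in this sketch; any other exception still propagates, so unexpected failures remain visible.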
…atahub-io#1412)

Previously, the linux/s390x build would fail to install Podman if Podman was not yet in the GitHub Actions cache.

Generalize the non-native architecture build process by using `tonistiigi/binfmt` to install QEMU handlers. This enables building container images for `linux/s390x` and `linux/ppc64le` on amd64 runners.

The Podman installation step is now also performed for these new platforms. This replaces the previous approach that used `docker/setup-qemu-action` and only supported `s390x`.
…mysql-connector-python

This is a follow-up to the previous PR, which bumped this version in
the Pipfiles only. This change updates the manifests of the relevant
images so that the new version is properly picked up by the UI etc.

This is a follow-up to the recent image update, propagating the upgrade
into the image manifest metadata as well.
…o b728be3

Image created from 'https://github.com/opendatahub-io/notebooks?rev=93039d467b1015fad749387ec637e2b2a8f81dec'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-pipeline-runtime-datascience-cpu-py311-ubi9

chore(deps): update odh-pipeline-runtime-datascience-cpu-py311-ubi9 to b728be3
…nflux nudging (opendatahub-io#1424)

* Update the params-latest.env with the correct Python 3.12 images

* Update the commit-latest.env with the correct Python 3.12 hashes
…c3935

Image created from 'https://github.com/opendatahub-io/notebooks?rev=3d2444838c5031d9a8b8c5fadcfe6dbfa3815d1e'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-pipeline-runtime-minimal-cpu-py312-ubi9

chore(deps): update odh-pipeline-runtime-minimal-cpu-py312-ubi9 to efc3935
…o 5d53c5f

Image created from 'https://github.com/opendatahub-io/notebooks?rev=3d2444838c5031d9a8b8c5fadcfe6dbfa3815d1e'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…208eb2

Image created from 'https://github.com/opendatahub-io/notebooks?rev=3d2444838c5031d9a8b8c5fadcfe6dbfa3815d1e'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
red-hat-konflux bot and others added 26 commits July 23, 2025 04:20
…130158

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-workbench-jupyter-minimal-cuda-py311-ubi9

chore(deps): update odh-workbench-jupyter-minimal-cuda-py311-ubi9 to b9a2972
…mponent-updates/component-update-odh-pipeline-runtime-pytorch-cuda-py311-ubi9

chore(deps): update odh-pipeline-runtime-pytorch-cuda-py311-ubi9 to 9130158
…6ca1971

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…ebe0f2

Image created from 'https://github.com/opendatahub-io/notebooks?rev=2663f3b4a2784044a5df7eb5794e242be83d4d7a'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…485d52c

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-workbench-jupyter-trustyai-cpu-py311-ubi9

chore(deps): update odh-workbench-jupyter-trustyai-cpu-py311-ubi9 to 485d52c
…mponent-updates/component-update-odh-workbench-jupyter-pytorch-cuda-py311-ubi9

chore(deps): update odh-workbench-jupyter-pytorch-cuda-py311-ubi9 to 6ca1971
…mponent-updates/component-update-odh-pipeline-runtime-pytorch-cuda-py312-ubi9

chore(deps): update odh-pipeline-runtime-pytorch-cuda-py312-ubi9 to 2ebe0f2
…to e73f814

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…to 7b6f6f3

Image created from 'https://github.com/opendatahub-io/notebooks?rev=2663f3b4a2784044a5df7eb5794e242be83d4d7a'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…o 5bfdec2

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-workbench-jupyter-tensorflow-cuda-py311-ubi9

chore(deps): update odh-workbench-jupyter-tensorflow-cuda-py311-ubi9 to e73f814
…6f43e71

Image created from 'https://github.com/opendatahub-io/notebooks?rev=2663f3b4a2784044a5df7eb5794e242be83d4d7a'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-workbench-jupyter-tensorflow-cuda-py312-ubi9

chore(deps): update odh-workbench-jupyter-tensorflow-cuda-py312-ubi9 to 7b6f6f3
…mponent-updates/component-update-odh-pipeline-runtime-tensorflow-cuda-py311-ubi9

chore(deps): update odh-pipeline-runtime-tensorflow-cuda-py311-ubi9 to 5bfdec2
…mponent-updates/component-update-odh-workbench-jupyter-pytorch-cuda-py312-ubi9

chore(deps): update odh-workbench-jupyter-pytorch-cuda-py312-ubi9 to 6f43e71
…6d8afa6

Image created from 'https://github.com/opendatahub-io/notebooks?rev=4cdec0a985b3b7c8033561d8e657dd6a25f550e6'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-workbench-jupyter-minimal-rocm-py311-ubi9

chore(deps): update odh-workbench-jupyter-minimal-rocm-py311-ubi9 to 6d8afa6
…4c99152

Image created from 'https://github.com/opendatahub-io/notebooks?rev=2663f3b4a2784044a5df7eb5794e242be83d4d7a'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…6a637f

Image created from 'https://github.com/opendatahub-io/notebooks?rev=2663f3b4a2784044a5df7eb5794e242be83d4d7a'

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
…mponent-updates/component-update-odh-pipeline-runtime-pytorch-rocm-py312-ubi9

chore(deps): update odh-pipeline-runtime-pytorch-rocm-py312-ubi9 to 06a637f
…mponent-updates/component-update-odh-workbench-jupyter-minimal-rocm-py312-ubi9

chore(deps): update odh-workbench-jupyter-minimal-rocm-py312-ubi9 to 4c99152
@openshift-ci openshift-ci bot requested review from dibryant and jiridanek July 23, 2025 09:30

openshift-ci bot commented Jul 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign paulovmr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@moulalis moulalis merged commit 4099309 into red-hat-data-services:rhoai-2.24 Jul 23, 2025
36 of 54 checks passed

8 participants