-
Notifications
You must be signed in to change notification settings - Fork 210
add override for new workbenches #2402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdded two ROCm TensorFlow 3.12 image mappings in workbenches initialization for notebook and pipeline images, updating nbImgsManifestInfo for params-latest.env. No control flow or error handling changes; no public API changes. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm this follows the established pattern and matches the change in as @coderabbitai can surely testify to |
|
Tip For best results, initiate chat on the files or code changes.
This follows the established pattern where new workbench images are first added to the notebooks repository, then the ODH operator is updated with the appropriate overrides to expose them to users. The coordination between these two PRs ensures that the new TensorFlow ROCm capabilities are properly integrated into the OpenDataHub ecosystem. The changes look consistent and well-coordinated! 🐰 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/components/workbenches/workbenches.go (1)
143-145: Don’t swallow client.Get errors in status updateReturning nil on non-NotFound errors hides real failures. Bubble up the error.
Apply:
- if err := rr.Client.Get(ctx, client.ObjectKeyFromObject(&c), &c); err != nil && !k8serr.IsNotFound(err) { - return cs, nil - } + if err := rr.Client.Get(ctx, client.ObjectKeyFromObject(&c), &c); err != nil && !k8serr.IsNotFound(err) { + return cs, err + }
🧹 Nitpick comments (1)
internal/controller/components/workbenches/workbenches.go (1)
68-126: Reduce manual drift in image map generationThis growing literal map is easy to miss entries (e.g., when adding new py versions/accelerators). Consider generating it from a small declarative table (framework × accel × pyVersion × target {jupyter,pipeline}) and deriving key/env names, or move it to a single YAML consumed by both operator and CSV bundling.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
internal/controller/components/workbenches/workbenches.go(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build/push catalog image
- GitHub Check: Run tests and collect coverage on tests/integration
- GitHub Check: Run tests and collect coverage on internal and pkg
- GitHub Check: golangci-lint
| "odh-workbench-jupyter-tensorflow-rocm-py312-ubi9-n": "RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE", | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
ROCm TF py312 workbench image mapping added — LGTM; verify RELATED_IMAGE is wired
Looks correct and consistent with existing patterns. Please confirm the env var RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE is defined in the operator Deployment/CSV (and listed under relatedImages) so disconnected installs resolve it.
Run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Ensure the new RELATED_IMAGE is present somewhere (CSV, kustomize, deployment env, build scripts)
rg -n --hidden --glob '!**/vendor/**' 'RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE'
# Optional: check CSV relatedImages mention
fd -a -t f -i 'clusterserviceversion' | xargs -r rg -n 'relatedImages|RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE'Length of output: 567
Add RELATED_IMAGE to CSV and Deployment
RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE is referenced in workbenches.go but isn’t defined under spec.relatedImages in the CSV (e.g. config/crd/external/operators.coreos.com_clusterserviceversions.yaml around line 4532) nor declared as an env var in the operator Deployment; please add both.
🤖 Prompt for AI Agents
internal/controller/components/workbenches/workbenches.go around lines 100-101:
the code references
RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE but that image
key is missing from the CSV spec.relatedImages and not declared as an env var in
the operator Deployment; add an entry with name
"odh-workbench-jupyter-tensorflow-rocm-py312-ubi9-n" and image key
"RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE" to
spec.relatedImages in
config/crd/external/operators.coreos.com_clusterserviceversions.yaml (around
~line 4532), and add an environment variable with name
RELATED_IMAGE_ODH_WORKBENCH_JUPYTER_TENSORFLOW_ROCM_PY312_IMAGE to the operator
Deployment's container env list (e.g., manager/manager.yaml or the manifest used
for deployment) so the operator can inject the related image value at runtime.
| "odh-pipeline-runtime-tensorflow-rocm-py312-ubi9-n": "RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE", | ||
| // Pipeline Runtime Images - PyTorch CUDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
ROCm TF py312 pipeline runtime mapping added — LGTM; verify RELATED_IMAGE is wired
Matches the py311 entry and naming scheme. Please ensure RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE exists in Deployment/CSV and is included in relatedImages.
Run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --glob '!**/vendor/**' 'RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE'
fd -a -t f -i 'clusterserviceversion' | xargs -r rg -n 'relatedImages|RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE'Length of output: 563
Add ROCm TF Py312 image env var to Deployment and CSV
Ensure RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE is declared in the Deployment’s container env and added under relatedImages in config/crd/external/operators.coreos.com_clusterserviceversions.yaml (≈ line 4532).
🤖 Prompt for AI Agents
In internal/controller/components/workbenches/workbenches.go around lines
117-118, the new ROCm TF Py312 image env var is referenced but not injected into
the Deployment container env; add an env var entry to the container spec with
Name "RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE" so the pod
gets the OLM-injected image variable, and then update
config/crd/external/operators.coreos.com_clusterserviceversions.yaml (around
line ~4532) to add a relatedImages entry mapping
"odh-pipeline-runtime-tensorflow-rocm-py312-ubi9-n" to
"RELATED_IMAGE_ODH_PIPELINE_RUNTIME_TENSORFLOW_ROCM_PY312_IMAGE".
|
@coderabbitai this looks legit, please create GitHub issue for further discussion. Be brief, to the point, avoid acceptance criteria checklists, just present your case and suggested fix. internal/controller/components/workbenches/workbenches.go (1) 143-145: Don’t swallow client.Get errors in status update Returning nil on non-NotFound errors hides real failures. Bubble up the error. Apply: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me first examine the code to understand the context better, then create the GitHub issue. @jiridanek I've created a GitHub issue to track the error handling problem you identified. The issue documents the bug where client.Get errors are being swallowed in the workbenches component, making it harder to debug real failures in production. 🏁 Script executed: Length of output: 422 🏁 Script executed: Length of output: 120 🏁 Script executed: Length of output: 991 @jiridanek I've successfully created GitHub issue #2403 to track the error handling problem you identified. The issue documents how the current code at lines 143-145 swallows client.Get errors by returning The issue includes the exact problem code, your suggested fix, and references back to this PR discussion for context. It's been assigned to you as requested. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2402 +/- ##
==========================================
- Coverage 40.18% 40.18% -0.01%
==========================================
Files 150 150
Lines 12531 12533 +2
==========================================
Hits 5036 5036
- Misses 7074 7076 +2
Partials 421 421 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@jiridanek I've created GitHub issue to track the error handling problem you identified. The issue documents how the current code swallows client.Get errors by returning 🏁 Script executed: Length of output: 968 @jiridanek I've successfully created GitHub issue #2406 to track the error handling problem you identified. The issue documents how the current code at lines 143-145 swallows client.Get errors by returning The issue includes the exact problem code, your suggested fix, and references back to this PR discussion for context. It's been assigned to you as requested. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 968 @jiridanek I've successfully created GitHub issue #2407 to track the error handling problem you identified. The issue documents how the current code at lines 143-145 swallows client.Get errors by returning The issue includes the exact problem code, your suggested fix, and references back to this PR discussion for context. It's been assigned to you as requested. |
|
please follow https://github.com/opendatahub-io/opendatahub-operator/blob/main/CONTRIBUTING.md#pull-requests |
|
why no updates done in ODH https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env but updates RHDS notebooks? @jiridanek |
|
In ODH we're still using konflux nudges to have params-latest.env kept up to date that way. So we don't rely on this env substitution by ODH operator. If it's possible to make these changes in ODH operator also, and have the same mechanism in ODH and rhds, then that's what I'd surely prefer. May I send a PR to ODH, then? |
i understand by setting it in ODH operator, that wont do anything for ODH untill we have konflux build enabled. |
|
Disregard most of what I said in my previous comment. I misunderstood what's to be done. Here's the PR that does what really needs to be done |
|
@riprasad: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Closing in favor of #2465 addressing #2275 (comment) |
Ref: https://issues.redhat.com/browse/RHOAIENG-32708
Summary by CodeRabbit