Conversation

Collaborator

@ggordaniRed ggordaniRed commented Nov 19, 2025

Creating workflow for NNO dashboard

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Network Operator CI Dashboard alongside the existing GPU Operator Dashboard for comprehensive test reporting across operator and OpenShift versions
    • Implemented per-operator change detection in workflows to independently generate dashboards when respective data changes
    • Enhanced dashboards with per-test-flavor categorization and reporting for more granular test result visibility

Guy Gordani and others added 7 commits November 5, 2025 12:34
Adding NNO dashboard module to generate an HTML page for the NVIDIA Network Operator
- Extract test flavors from job names (DOCA4, Bare Metal, etc.)
- Structure data by actual OCP version instead of infrastructure type
- Update templates to display test flavors as subsections
- Fix merge logic to preserve test_flavors field

Fixes GitHub Actions workflow failure on PR 67673
- Make test_flavor an optional field in TestResult dataclass
- Include test_flavor in to_dict() only when set
- Fixes TypeError when merging NNO results with test_flavor field

Resolves: TypeError: TestResult.__init__() got an unexpected keyword argument 'test_flavor'
- Properly distinguish RDMA Legacy SR-IOV vs RDMA Shared Device
- Detect GPU-enabled tests (with GPU / without GPU)
- Support Hosted, Bare Metal, and DOCA4 infrastructure types
- Better formatting: 'DOCA4 - RDMA Legacy SR-IOV with GPU' instead of separate parts

Test flavors now supported:
- RDMA Legacy SR-IOV (with/without GPU)
- RDMA Shared Device (with/without GPU)
- Bare Metal E2E
- Hosted configurations
- DOCA4 configurations
- Add is_valid_ocp_version() to validate OpenShift version keys
- Filter out infrastructure types: doca4, bare-metal, hosted, Unknown
- Only show actual OCP versions (4.17.16, 4.16, etc.) in TOC and sections
- Log filtered keys for debugging

This fixes the issue where 'doca4', 'bare-metal', and 'Unknown'
appeared as OpenShift versions instead of being properly organized
under the actual OCP version.
- When OCP version cannot be determined (infrastructure types without valid version), skip the result instead of creating 'Unknown' entries
- Add validation to ensure OCP version starts with digit and contains dots
- Improve logging to show resolution path for infrastructure types
- This prevents 'doca4', 'bare-metal', 'hosted', and 'Unknown' from appearing as OCP versions in the data

The dashboard now only processes actual OpenShift versions like '4.17.16'.

openshift-ci bot commented Nov 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggordaniRed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ggordaniRed ggordaniRed requested review from TomerNewman and removed request for vtruhpon and wabouhamad November 19, 2025 14:17
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

This PR extends the GPU Operator dashboard infrastructure to support Network Operator (NNO) dashboards. It adds NNO-specific CI data fetching, HTML dashboard generation, HTML templates, workflow orchestration with change detection, and corresponding documentation, while extending the GPU operator's data model to support per-flavor test results.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Workflow orchestration**<br>`.github/workflows/generate_matrix_page.yaml` | Adds Network Operator dashboard paths (JSON and HTML outputs), installs NNO dependencies, fetches NNO CI data, detects changes in NNO JSON files, and conditionally generates both GPU and NNO dashboards based on respective change flags. |
| **GPU operator data model extension**<br>`workflows/gpu_operator_dashboard/fetch_ci_data.py` | Extends TestResult dataclass with optional test_flavor field; augments merge_ocp_version_results to initialize and merge per-flavor test results and job history links into test_flavors map. |
| **Network Operator documentation**<br>`workflows/nno_dashboard/README.md` | Introduces comprehensive documentation covering dashboard purpose, architecture (reusing GPU operator core logic), supported test patterns, usage instructions for fetch_ci_data and generate_ci_dashboard CLIs, testing, GitHub Actions integration, troubleshooting, and example data structures. |
| **NNO CI data fetching**<br>`workflows/nno_dashboard/fetch_ci_data.py` | Introduces new module with NNO-specific regex patterns, PR/build processing logic (process_closed_prs, fetch_filtered_files, extract_build_components, build_files_lookup, process_single_build, fetch_pr_files, filter_network_finished_files), test flavor extraction from job names, per-PR orchestration (process_tests_for_pr), and CLI entry point (main). |
| **NNO dashboard generation**<br>`workflows/nno_dashboard/generate_ci_dashboard.py` | Introduces new module for HTML dashboard generation; adds OCP version validation (is_valid_ocp_version), test flavor HTML section building (build_test_flavors_sections), full dashboard assembly (generate_test_matrix), and CLI entry point (main), reusing GPU operator dashboard core helpers. |
| **NNO dependencies**<br>`workflows/nno_dashboard/requirements.txt` | Specifies 12 pinned Python dependencies including pydantic, requests, semver, typing extensions, and urllib3. |
| **NNO HTML templates**<br>`workflows/nno_dashboard/templates/{footer,header,main_table,test_flavor_section}.html` | Adds HTML template fragments: header with document structure and sortTable JavaScript function, main_table with per-OCP version blocks and placeholders, test_flavor_section with flavor-keyed table structure, and footer with last-updated timestamp and document closure. |

Sequence Diagram

```mermaid
sequenceDiagram
    actor WF as Workflow
    participant UD as Change Detection
    participant GF as GPU fetch_ci_data
    participant NF as NNO fetch_ci_data
    participant GG as GPU generate_dashboard
    participant NG as NNO generate_dashboard

    WF->>UD: Detect GPU JSON changes
    WF->>UD: Detect NNO JSON changes
    alt GPU_CHANGED
        WF->>GF: Fetch GPU CI data
        GF-->>WF: GPU JSON output
        WF->>GG: Generate GPU dashboard
        GG-->>WF: GPU HTML output
    end
    alt NNO_CHANGED
        WF->>NF: Fetch NNO CI data
        NF-->>WF: NNO JSON output
        WF->>NG: Generate NNO dashboard
        NG-->>WF: NNO HTML output
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Dense new modules: workflows/nno_dashboard/fetch_ci_data.py contains 9 new functions with intricate build processing, file filtering, and data transformation logic; generate_ci_dashboard.py introduces 4 new functions with template rendering and OCP version validation
  • Data structure extensions: GPU operator model extended with test_flavor field and per-flavor merge logic in merge_ocp_version_results, requiring verification of merge correctness
  • Multi-file scope: Changes span workflow, data models, new NNO-specific modules, templates, and dependencies
  • Heterogeneous logic: PR/build extraction, dual-build handling, test flavor derivation, and HTML generation require separate reasoning for each component

Areas requiring extra attention:

  • PR history processing logic in process_closed_prs and edge cases in fetch_filtered_files (pagination, error handling)
  • Regex patterns (TEST_RESULT_PATH_REGEX, OCP_FULL_VERSION) correctness for NNO artifact paths
  • Dual-build mapping logic in build_files_lookup and process_single_build to prevent data loss or duplication
  • Test flavor extraction heuristics in extract_test_flavor_from_job_name for accuracy (a rough, illustrative sketch follows this list)
  • Integration point between GPU operator data model and NNO-specific usage (test_flavor field, per-flavor merge)
  • Template placeholder injection and HTML rendering in generate_test_matrix
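
A rough, illustrative sketch of what a flavor-extraction heuristic could look like, following up on the test-flavor bullet above. The token patterns and labels below are invented for illustration only; the real extract_test_flavor_from_job_name in workflows/nno_dashboard/fetch_ci_data.py may tokenize job names differently.

```python
import re
from typing import Optional

# Hypothetical token-to-label mapping; the actual implementation may use different tokens.
_FLAVOR_TOKENS = [
    (re.compile(r"rdma-legacy-sriov", re.IGNORECASE), "RDMA Legacy SR-IOV"),
    (re.compile(r"rdma-shared", re.IGNORECASE), "RDMA Shared Device"),
    (re.compile(r"bare-?metal", re.IGNORECASE), "Bare Metal E2E"),
    (re.compile(r"hosted", re.IGNORECASE), "Hosted"),
]


def extract_test_flavor(job_name: str) -> Optional[str]:
    """Derive a display label like 'DOCA4 - RDMA Legacy SR-IOV with GPU' from a job name."""
    prefix = "DOCA4 - " if "doca4" in job_name.lower() else ""
    suffix = " with GPU" if "gpu" in job_name.lower() else ""
    for pattern, label in _FLAVOR_TOKENS:
        if pattern.search(job_name):
            return f"{prefix}{label}{suffix}"
    # Fall back to a bare DOCA4 label when no flavor token matches.
    return prefix.rstrip(" -") or None
```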

Possibly related PRs

Suggested labels

needs-ok-to-test

Suggested reviewers

  • empovit
  • fabiendupont
  • wabouhamad

Poem

🐰 A Network Operator joins the stage,

CI dashboards turn a golden page,

Test flavors bloom in HTML light,

Side by side with GPU might,

Per-PR builds dance in parallel flight! 🚀

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Adding nno dashboard' is vague and uses non-descriptive terms that don't clearly convey the scope or main purpose of the changeset. | Improve the title to be more specific about what the NNO dashboard does or accomplishes, such as 'Add Network Operator dashboard generation workflow' or 'Implement NNO CI dashboard with fetch and generation steps'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 82.35% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
workflows/nno_dashboard/README.md (1)

37-39: Tighten README accuracy and address minor markdown/doc nits

A few small adjustments would make this README more robust:

  • Merged vs closed PRs (Lines 61–63, 89–90): The implementation in process_closed_prs uses the GitHub API with state="closed" and does not filter on merged_at, so it will process all closed PRs (merged and unmerged). Either:

    • Update the wording to “closed PRs / closed pull requests”, or
    • Adjust the code to explicitly filter for merged PRs and then keep the current wording.
      Keeping docs aligned with behavior will avoid confusion when debugging runs (a minimal merged-PR filter sketch follows these notes).
  • Fenced code block language (Lines 37–39): The example GCS URL block is just:

    https://gcsweb-...

    Adding a language (e.g. text) will satisfy markdownlint and improve consistency:

    ```text
    https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/...
    ```

  • Brittle regex line reference (Line 122): Referring to “regex in fetch_ci_data.py line 36–40” is likely to drift as the file evolves. Suggest rephrasing to something like “see TEST_RESULT_PATH_REGEX in fetch_ci_data.py” without line numbers.

  • Hyphenation nit (Line 76): If you care about LanguageTool’s style suggestion, you could change “Network Operator specific templates” to “Network‑Operator‑specific templates”, but that’s purely stylistic.

Also, using a gcsweb example URL instead of the Prow UI matches how the rest of this repo prefers GCS navigation. Based on learnings.

Also applies to: 61-63, 76-77, 89-90, 121-123
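
If the code route from the first nitpick above is preferred, a minimal filter could look like the sketch below. It assumes standard GitHub REST pull-request objects, where merged PRs carry a non-null merged_at timestamp; how process_closed_prs pages through results is not shown here.

```python
from typing import Any, Dict, Iterable, List


def keep_only_merged(closed_prs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce /pulls?state=closed results to merged PRs only.

    GitHub returns both merged and unmerged closed PRs; merged ones have a
    non-null "merged_at" field, so filtering on it avoids an extra API call per PR.
    """
    return [pr for pr in closed_prs if pr.get("merged_at")]
```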

workflows/gpu_operator_dashboard/fetch_ci_data.py (1)

87-108: test_flavor and test_flavors merging look correct; consider generalizing comments and tightening docs

The new test_flavor field on TestResult and the test_flavors merging in merge_ocp_version_results are implemented in a backward-compatible way:

  • to_dict() only emits "test_flavor" when set, so older JSON remains valid and TestResult(**item) still works for legacy entries.
  • The merge_ocp_version_results initializer plus update(existing_version_data) preserves any preexisting fields while ensuring notes, bundle_tests, release_tests, job_history_links, and test_flavors always exist.
  • The flavor merge correctly reuses merge_release_tests and normalizes job_history_links from set/list to a sorted list per flavor.

Two small polish suggestions:

  • Line 95/105: the inline comments call this “NNO-specific”, but now the GPU dashboard’s core model supports flavors generically. Renaming the comment to something like “optional test configuration flavor” would better reflect its shared nature.
  • Lines 573–578 and 614–637: the docstring and type hints for merge_ocp_version_results still describe only bundle/release tests and don’t mention test_flavors or that job_history_links may be sets or lists. Updating those descriptions will help future readers understand the full shape of new_version_data / existing_version_data.

Also applies to: 573-639
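
For reference, a heavily condensed sketch of the two properties called out above (conditional emission of test_flavor and normalization of per-flavor job_history_links). The class and helper are simplified stand-ins, not the actual TestResult or merge_ocp_version_results.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class TestResultSketch:
    """Simplified stand-in for TestResult with the optional test_flavor field."""
    status: str
    test_flavor: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        data: Dict[str, Any] = {"status": self.status}
        if self.test_flavor:  # emitted only when set, so legacy JSON round-trips unchanged
            data["test_flavor"] = self.test_flavor
        return data


def merge_flavor_links(flavor_entry: Dict[str, Any], new_links: Iterable[str]) -> None:
    """Normalize a flavor's job_history_links from set/list input to a sorted list."""
    links = set(flavor_entry.get("job_history_links", []))
    links.update(new_links)
    flavor_entry["job_history_links"] = sorted(links)
```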

workflows/nno_dashboard/fetch_ci_data.py (2)

61-106: Avoid catching bare Exception in fetch_filtered_files to prevent masking real errors

In fetch_filtered_files (Lines 61–106), the while True loop wraps the GCS call with:

try:
    response_data = http_get_json(...)
    ...
except Exception as e:
    logger.debug(f"PR #{pr_number} not found in {repo} or error occurred: {e}")
    break

This will silently treat any unexpected error (authentication issues, JSON decoding bugs, API changes, etc.) as “PR not found”, which can make real failures hard to diagnose.

Consider tightening this logic, for example:

  • Catch only the specific exception(s) that http_get_json is expected to raise when a PR’s artifacts truly don’t exist, and keep the message at debug in that case.
  • For other exceptions, either let them propagate or log at warning/error so issues surface quickly.

This keeps the “try both repos” behavior for missing artifacts while avoiding accidental suppression of genuine problems.
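
One way the narrower handling could look, sketched under the assumption that a missing PR surfaces as an HTTP 404 raised as requests.HTTPError; the real http_get_json helper may wrap errors differently, so the except clause would need to match whatever it actually raises.

```python
import logging
from typing import Any, Dict, Optional

import requests

logger = logging.getLogger(__name__)


def fetch_page_or_none(url: str, params: Dict[str, Any]) -> Optional[Any]:
    """Treat a 404 as 'PR artifacts not found'; let other failures surface loudly."""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.HTTPError as e:
        if e.response is not None and e.response.status_code == 404:
            logger.debug("No artifacts at %s: %s", url, e)
            return None
        logger.warning("Unexpected HTTP error for %s: %s", url, e)
        raise
```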


109-138: Small lint/clarity cleanups (unused variable, no-op f-string) in NNO fetcher

There are a couple of low-impact issues that are easy to tidy up:

  • Unused repo unpacking (Lines 132–135 and 293–295): extract_build_components and filter_network_finished_files unpack repo but never use it. To satisfy linters and clarify intent, you can prefix it with _ or drop it from the unpacked tuple if you don’t need it:

    _repo, pr_number, job_name, build_id = extract_build_components(path)
  • No-op f-string (Line 289): logger.debug(f" Skipping - not nested or top-level") doesn’t interpolate anything. You can safely remove the f prefix:

    logger.debug("  Skipping - not nested or top-level")

These are purely cosmetic but will keep Ruff and other tools quiet.

Also applies to: 263-315

workflows/nno_dashboard/templates/test_flavor_section.html (1)

1-14: Template structure looks good; ensure {ocp_key} and {flavor_id} are safe for use in HTML IDs

The section markup is clear and matches the per-flavor data model (label + table of OCP vs Network Operator versions). One thing to watch for is that table-{ocp_key}-{flavor_id} produces valid and unique id attributes:

  • If ocp_key or flavor_id can contain spaces or special characters, consider normalizing them (e.g., lowercasing and replacing non-alphanumerics with -) before substitution.
  • This will also make it easier to target these tables from any JavaScript or CSS you add later.

Otherwise, the template looks ready to integrate with the NNO dashboard generator.
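
If normalization turns out to be needed, a small helper on the generator side could handle it. This is only a sketch with an invented name (html_id_slug); nothing like it is required by the current templates.

```python
import re


def html_id_slug(value: str) -> str:
    """Normalize a label such as 'DOCA4 - RDMA Legacy SR-IOV with GPU' into a safe HTML id fragment."""
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", value).strip("-").lower()
    return slug or "unknown"


# e.g. f"table-{html_id_slug(ocp_key)}-{html_id_slug(flavor_id)}"
# -> "table-4-17-16-doca4-rdma-legacy-sr-iov-with-gpu"
```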

workflows/nno_dashboard/templates/header.html (1)

1-36: Sorting script works but could be hardened and cleaned up

This should work for your current tables, but a couple of optional tweaks would make it more robust and user‑friendly:

  • Clear asc/desc classes from all header cells before toggling on the clicked one, so only one column is visually marked as sorted at a time.
  • For numeric columns, consider explicitly coercing to numbers (e.g. var a = Number(cellA); var b = Number(cellB); and checking Number.isNaN) instead of relying on isNaN + implicit subtraction, which can behave oddly with formatted values.

These are non‑blocking polish items.

.github/workflows/generate_matrix_page.yaml (1)

35-41: JSON-only change detection may skip publishing template/generator updates

The new GPU_CHANGED / NNO_CHANGED flags are based solely on git diff of the JSON files. On pull_request_target runs this means:

  • If a PR only changes HTML templates, CSS, or the Python generator code (but the JSON content stays identical), the dashboards will not be regenerated, so gh‑pages keeps the old HTML until a later run where the JSON actually changes.
  • workflow_dispatch still forces regeneration, but that requires someone to remember to trigger it after non‑data changes.

If you want template / code changes to be published automatically on merge, consider one of:

  • Always regenerating HTML on pull_request_target and relying on git diff in the gh‑pages checkout to avoid redundant commits, or
  • Expanding the change detection to also look at relevant source paths (e.g. workflows/**/generate_ci_dashboard.py, workflows/**/templates/**, styles.css) instead of only the JSON.

Functionally the NNO wiring (env vars, fetch, and generate steps) looks correct; this is mainly about long‑term behavior of the gating logic.

Also applies to: 71-75, 79-88, 91-129

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6917f2e and cfd3e99.

📒 Files selected for processing (10)
  • .github/workflows/generate_matrix_page.yaml (2 hunks)
  • workflows/gpu_operator_dashboard/fetch_ci_data.py (3 hunks)
  • workflows/nno_dashboard/README.md (1 hunks)
  • workflows/nno_dashboard/fetch_ci_data.py (1 hunks)
  • workflows/nno_dashboard/generate_ci_dashboard.py (1 hunks)
  • workflows/nno_dashboard/requirements.txt (1 hunks)
  • workflows/nno_dashboard/templates/footer.html (1 hunks)
  • workflows/nno_dashboard/templates/header.html (1 hunks)
  • workflows/nno_dashboard/templates/main_table.html (1 hunks)
  • workflows/nno_dashboard/templates/test_flavor_section.html (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-27T18:08:59.287Z
Learnt from: empovit
Repo: rh-ecosystem-edge/nvidia-ci PR: 291
File: workflows/gpu-operator-versions/nvidia_gpu_operator.py:58-58
Timestamp: 2025-08-27T18:08:59.287Z
Learning: In the nvidia-ci repository, the GH_AUTH_TOKEN environment variable is set using base64 encoding of the GitHub token: `export GH_AUTH_TOKEN=$(echo ${{ secrets.GITHUB_TOKEN }} | base64)` and this works successfully with GitHub Container Registry (GHCR) Bearer authentication in workflows/gpu-operator-versions/nvidia_gpu_operator.py.

Applied to files:

  • .github/workflows/generate_matrix_page.yaml
📚 Learning: 2025-11-09T13:12:43.631Z
Learnt from: empovit
Repo: rh-ecosystem-edge/nvidia-ci PR: 359
File: workflows/gpu_operator_versions/update_versions.py:100-106
Timestamp: 2025-11-09T13:12:43.631Z
Learning: In the nvidia-ci repository's `workflows/gpu_operator_versions/update_versions.py`, the `gpu_releases` variable is intentionally limited to the latest 2 GPU operator versions (via `get_latest_versions(list(gpu_versions.keys()), 2)` in main()). For maintenance OCP versions with pinned GPU operators, if the pinned version is not among the latest 2 releases, the test is intentionally skipped with a warning rather than falling back to test with any available version. This is because testing with outdated GPU operators would fail in their infrastructure, and they prefer to notice missing tests rather than deal with failed test runs, especially since GitHub organization repositories don't send notification emails about failed workflows.

Applied to files:

  • workflows/gpu_operator_dashboard/fetch_ci_data.py
  • workflows/nno_dashboard/README.md
📚 Learning: 2025-08-31T15:44:50.325Z
Learnt from: empovit
Repo: rh-ecosystem-edge/nvidia-ci PR: 295
File: workflows/gpu_operator_dashboard/fetch_ci_data.py:61-64
Timestamp: 2025-08-31T15:44:50.325Z
Learning: In the nvidia-ci project, gcsweb URLs are preferred over prow.ci.openshift.org URLs for the GPU operator dashboard because gcsweb allows easier navigation up the directory structure, while Prow UI requires switching to "Artifacts" first.

Applied to files:

  • workflows/nno_dashboard/README.md
🧬 Code graph analysis (2)
workflows/nno_dashboard/fetch_ci_data.py (1)
workflows/gpu_operator_dashboard/fetch_ci_data.py (12)
  • http_get_json (48-52)
  • fetch_gcs_file_content (55-64)
  • build_prow_job_url (67-69)
  • TestResultKey (75-84)
  • TestResult (88-136)
  • merge_bundle_tests (479-509)
  • merge_release_tests (517-570)
  • merge_ocp_version_results (573-639)
  • merge_and_save_results (642-666)
  • int_or_none (674-680)
  • build_key (121-124)
  • has_exact_versions (126-136)
workflows/nno_dashboard/generate_ci_dashboard.py (2)
workflows/common/templates.py (1)
  • load_template (9-37)
workflows/gpu_operator_dashboard/generate_ci_dashboard.py (5)
  • has_valid_semantic_versions (15-43)
  • build_catalog_table_rows (89-169)
  • build_notes (172-187)
  • build_toc (190-201)
  • build_bundle_info (204-244)
🪛 LanguageTool
workflows/nno_dashboard/README.md

[grammar] ~76-~76: Use a hyphen to join words.
Context: ...te_ci_dashboard` - Uses Network Operator specific templates (in `templates/` dire...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
workflows/nno_dashboard/README.md

37-37: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.5)
workflows/nno_dashboard/fetch_ci_data.py

1-1: Shebang is present but file is not executable

(EXE001)


101-101: Do not catch blind exception: Exception

(BLE001)


168-168: Unpacked variable repo is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


289-289: f-string without any placeholders

Remove extraneous f prefix

(F541)


293-293: Unpacked variable repo is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

workflows/nno_dashboard/generate_ci_dashboard.py

1-1: Shebang is present but file is not executable

(EXE001)


64-64: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (4)
workflows/nno_dashboard/fetch_ci_data.py (1)

387-461: Confirm strict OCP version gating via has_exact_versions() is intentional

In process_tests_for_pr (Lines 387–461, 478–495):

  • You derive result from process_single_build, then immediately do:

    actual_ocp_version = result.ocp_full_version if result.has_exact_versions() else None
    if not actual_ocp_version:
        continue

    and later further validate that actual_ocp_version starts with a digit and contains a dot.

  • Compared to the GPU workflow, this is stricter: GPU bundle tests are still keyed by the OCP version parsed from the path even when has_exact_versions() is false, whereas here any build that fails has_exact_versions() is skipped entirely (both bundle and release), unless it falls into the special-case correction for legacy “doca4/bare-metal/hosted” tokens.

This is reasonable if you explicitly want to drop all runs where either OCP or operator version strings are not “exact” and parseable, but it does mean:

  • Any future changes in version formatting that break has_exact_versions() will quietly exclude those runs from the NNO dashboard.
  • Bundle-only jobs with missing/bad ocp.version / operator.version files will be skipped even though the path-derived OCP version may still be usable.

If this stricter behavior is deliberate, consider adding a short comment above this block to document that design choice. Otherwise, you might want to:

  • Use ocp_version from the regex as a fallback key for bundle tests, or
  • Log a warning when has_exact_versions() fails so it’s visible when data is being dropped.

Also applies to: 478-495
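
If the warning/fallback option is preferred, the resolution could be factored into a small helper along these lines. The names result, has_exact_versions, ocp_full_version, and the path-derived version come from the discussion above; the helper itself is only a sketch, not code from the PR.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_ocp_version(result, path_ocp_version: Optional[str], build_id: str) -> Optional[str]:
    """Prefer the exact OCP version; otherwise fall back to the path-derived one.

    Returning None tells the caller to skip the build, and the warnings make
    dropped data visible in the logs.
    """
    if result.has_exact_versions():
        return result.ocp_full_version
    if path_ocp_version:
        logger.warning("Build %s: inexact versions, falling back to path-derived OCP %s",
                       build_id, path_ocp_version)
        return path_ocp_version
    logger.warning("Build %s: no usable OCP version, skipping", build_id)
    return None
```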

workflows/nno_dashboard/requirements.txt (1)

1-11: Dependencies are already well-aligned across workflows; no changes needed.

Verification confirms that the NNO dashboard's pinned versions match perfectly with the GPU operator dashboard and are consistent with other workflow dashboards. All shared dependencies (requests, semver, urllib3) use identical versions across all workflow files, and pydantic versions (2.11.1) are identical where both files use them. No alignment issues found.

workflows/nno_dashboard/templates/main_table.html (1)

1-9: Template structure and placeholders look consistent with generator usage

The anchor id, container markup, and {notes}, {test_flavors_sections}, {bundle_info} placeholders line up with how generate_test_matrix fills this template, so this fragment should integrate cleanly with the TOC and per‑OCP blocks.

workflows/nno_dashboard/templates/footer.html (1)

1-6: Footer wiring for last-updated timestamp looks good

{LAST_UPDATED} is correctly isolated in a dedicated container and matches the replacement logic in generate_test_matrix, with proper closing tags.

Comment on lines +39 to +67
```python
def is_valid_ocp_version(version_key: str) -> bool:
    """
    Check if a version key is a valid OpenShift version.
    Valid: "4.17.16", "4.16", "4.15.0"
    Invalid: "doca4", "bare-metal", "hosted", "Unknown"
    """
    # Filter out known infrastructure types
    invalid_keys = ["doca4", "bare-metal", "hosted", "unknown"]
    if version_key.lower() in invalid_keys:
        return False

    # Valid OCP versions start with a digit and contain dots
    if not version_key or not version_key[0].isdigit():
        return False

    # Check if it looks like a semantic version (X.Y or X.Y.Z)
    parts = version_key.split('.')
    if len(parts) < 2:
        return False

    try:
        # Try to parse first two parts as numbers
        int(parts[0])
        int(parts[1])
        return True
    except (ValueError, IndexError):
        return False
```

Contributor


⚠️ Potential issue | 🟠 Major

Fix OCP version ordering and consider small lint cleanups

  1. OCP versions are sorted lexicographically, not numerically

Currently:

valid_ocp_keys = [key for key in ocp_data.keys() if is_valid_ocp_version(key)]
sorted_ocp_keys = sorted(valid_ocp_keys, reverse=True)

Because this is a plain string sort, versions like "4.9" will sort after "4.10" in an unintuitive order. To get a proper numerical/semantic ordering without extra dependencies, you can sort by the numeric components:

-    valid_ocp_keys = [key for key in ocp_data.keys() if is_valid_ocp_version(key)]
-    sorted_ocp_keys = sorted(valid_ocp_keys, reverse=True)
+    valid_ocp_keys = [key for key in ocp_data.keys() if is_valid_ocp_version(key)]
+    sorted_ocp_keys = sorted(
+        valid_ocp_keys,
+        key=lambda v: tuple(int(p) for p in v.split(".")[:3]),
+        reverse=True,
+    )

This keeps behavior aligned with how OCP versions are usually presented in dashboards.

  2. Optional: address Ruff hints (non-functional)
  • If you don’t intend to run this file directly, you can drop the shebang line to silence EXE001, since the workflow uses python -m workflows.nno_dashboard.generate_ci_dashboard anyway.
  • The return True inside the try block in is_valid_ocp_version is fine as-is; if you want to satisfy TRY300, you could move it into an else: after the except, but that’s purely stylistic.

Also applies to: 115-133
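
As a quick sanity check of the suggested numeric key (assuming every dot-separated component that survives is_valid_ocp_version is numeric):

```python
valid_ocp_keys = ["4.9", "4.10", "4.17.16", "4.16"]

sorted_ocp_keys = sorted(
    valid_ocp_keys,
    key=lambda v: tuple(int(p) for p in v.split(".")[:3]),
    reverse=True,
)

print(sorted_ocp_keys)  # ['4.17.16', '4.16', '4.10', '4.9']
# A plain reverse string sort would instead give ['4.9', '4.17.16', '4.16', '4.10'].
```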

🧰 Tools
🪛 Ruff (0.14.5)

64-64: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In workflows/nno_dashboard/generate_ci_dashboard.py around lines 39-67 (and
similarly 115-133), is_valid_ocp_version and subsequent sorting use string-based
ordering which misorders versions like "4.9" vs "4.10"; update the sort to sort
by numeric version components instead of plain strings by splitting the version
on '.', converting segments to ints (pad shorter versions with zeros or sort by
tuple of ints) and use that tuple as the sort key with reverse=True for
descending order; optionally remove the shebang on top if the file is not
executed directly to silence EXE001 and, if desired for style only, move the
successful return out of the try into an else to satisfy TRY300.

# The functions just reference the field name, they don't care about the actual operator type.


def is_valid_ocp_version(version_key: str) -> bool:
Member


I think you can use the code we already have because we are doing the same thing, and instead of gpu_operator_dashboard we can have something more generic like dashboard or something similar.

Member


There seems to be a lot of code duplication. Let's try to either make the dashboard generation code flexible enough to accommodate both the NNO and the GPU operator, or refactor the common code, reuse it for NNO/GPU, and implement just the customization.

# GPU Operator dashboard paths
echo "DASHBOARD_DATA_FILEPATH=${DASHBOARD_OUTPUT_DIR}/gpu_operator_matrix.json" >> "$GITHUB_ENV"
echo "DASHBOARD_HTML_FILEPATH=${DASHBOARD_OUTPUT_DIR}/gpu_operator_matrix.html" >> "$GITHUB_ENV"
# Network Operator dashboard paths
Member


Why do we want to have the NNO dashboard updated in the same workflow as the GPU operator? But even then, I think it would be better to separate the NNO steps from the GPU ones, e.g. "Set GPU operator env vars", "Set NNO env vars", "Fetch GPU operator CI results", "Fetch NNO CI results", etc.

"bundle_tests": [],
"release_tests": [],
"job_history_links": [],
"test_flavors": {}
Member


What is the meaning of test flavors? It looks like you want to incorporate them into "bundle_tests"/"release_tests" instead of having a separate section. E.g.

| OpenShift | GPU Operator | Network Operator |
| --- | --- | --- |
| 4.19.17 | 25.10.1 | 25.3 (eth), 25.3 (infiniband) |

Let's think what would be the best way to organize the data.
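
Purely as a strawman for this discussion, the per-OCP entry could fold flavors into release_tests rather than keeping a parallel test_flavors map. All keys and values below are invented for illustration and mirror the example row above.

```python
# Strawman only: field names are illustrative, not taken from the PR.
ocp_entry_example = {
    "4.19.17": {
        "notes": [],
        "bundle_tests": [],
        "release_tests": [
            {"operator": "gpu-operator", "version": "25.10.1", "status": "SUCCESS"},
            {"operator": "network-operator", "version": "25.3", "flavor": "eth", "status": "SUCCESS"},
            {"operator": "network-operator", "version": "25.3", "flavor": "infiniband", "status": "SUCCESS"},
        ],
        "job_history_links": [],
    }
}
```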

Member


Also, this doesn't seem to belong in "gpu_operator_dashboard". We'll need to change the directory structure. Maybe keep the shared code separately, and the operator-specific code in the respective directories.

@@ -0,0 +1,531 @@
#!/usr/bin/env python
Member


I'm not sure why we need to change both gpu_operator_dashboard/fetch_ci_data.py and nno_dashboard/fetch_ci_data.py. Either one script can handle both operators, or they are completely independent and the GPU flow remains intact. It looks like there's a lot of shared code, and we should try to avoid duplication. Essentially, fetching test results should not be very different for NNO than for GPU.

