# Adding nno dashboard #366
```diff
@@ -92,15 +92,20 @@ class TestResult:
     test_status: str
     prow_job_url: str
     job_timestamp: str
+    test_flavor: Optional[str] = None  # NNO-specific: test configuration flavor

     def to_dict(self) -> Dict[str, Any]:
-        return {
+        result = {
             OCP_FULL_VERSION: self.ocp_full_version,
             GPU_OPERATOR_VERSION: self.gpu_operator_version,
             "test_status": self.test_status,
             "prow_job_url": self.prow_job_url,
             "job_timestamp": self.job_timestamp,
         }
+        # Include test_flavor only if it's set (NNO-specific)
+        if self.test_flavor is not None:
+            result["test_flavor"] = self.test_flavor
+        return result

     def composite_key(self) -> TestResultKey:
         repo, pr_number, job_name, build_id = extract_build_components(self.prow_job_url)
```
```diff
@@ -571,8 +576,15 @@ def merge_ocp_version_results(
     bundle_result_limit: Optional[int] = None
 ) -> Dict[str, Any]:
     """Merge results for a single OCP version."""
-    # Initialize the structure
-    merged_version_data = {"notes": [], "bundle_tests": [], "release_tests": [], "job_history_links": []}
+    # Initialize the structure with all possible fields
+    merged_version_data = {
+        "notes": [],
+        "bundle_tests": [],
+        "release_tests": [],
+        "job_history_links": [],
+        "test_flavors": {}
+    }
     # Update with existing data (preserves any additional fields)
     merged_version_data.update(existing_version_data)

     # Merge bundle tests with limit
```

> **Member:** What is the meaning of test flavors? It looks like you want to incorporate them into "bundle_tests"/"release_tests" instead of having a separate section, e.g.:
>
> Let's think what would be the best way to organize the data.

> **Member:** Also, this doesn't seem to belong in "gpu_operator_dashboard". We'll need to change the directory structure. Maybe keep the shared code separately, operator-specific code in the respective directories.
```diff
@@ -599,6 +611,31 @@ def merge_ocp_version_results(
     # Convert back to sorted list for JSON serialization
     merged_version_data["job_history_links"] = sorted(list(all_job_history_links))

+    # Merge test_flavors (NNO-specific) if present
+    new_test_flavors = new_version_data.get("test_flavors", {})
+    existing_test_flavors = merged_version_data.get("test_flavors", {})
+
+    # Merge test flavors by combining results for each flavor
+    for flavor_name, flavor_data in new_test_flavors.items():
+        if flavor_name not in existing_test_flavors:
+            existing_test_flavors[flavor_name] = {"results": [], "job_history_links": set()}
+
+        # Merge results for this flavor (using same logic as release_tests)
+        new_flavor_results = flavor_data.get("results", [])
+        existing_flavor_results = existing_test_flavors[flavor_name].get("results", [])
+        existing_test_flavors[flavor_name]["results"] = merge_release_tests(
+            new_flavor_results, existing_flavor_results
+        )
+
+        # Merge job history links for this flavor
+        new_flavor_links = flavor_data.get("job_history_links", set())
+        existing_flavor_links = existing_test_flavors[flavor_name].get("job_history_links", set())
+        all_flavor_links = set(existing_flavor_links if isinstance(existing_flavor_links, (set, list)) else [])
+        all_flavor_links.update(new_flavor_links)
+        existing_test_flavors[flavor_name]["job_history_links"] = sorted(list(all_flavor_links))
+
+    merged_version_data["test_flavors"] = existing_test_flavors
+
     return merged_version_data
```
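The merge semantics added here are: per-flavor result lists are merged with the same logic as `release_tests`, and per-flavor job-history links are union-ed and sorted. A runnable sketch of those semantics, where `merge_release_tests` is a simplified stand-in (dedupe by `prow_job_url`, newest wins) rather than the real function from `fetch_ci_data.py`:

```python
# Simplified stand-in for merge_release_tests: dedupe by prow_job_url,
# letting the new result overwrite the existing one.
def merge_release_tests(new_results, existing_results):
    merged = {r["prow_job_url"]: r for r in existing_results}
    merged.update({r["prow_job_url"]: r for r in new_results})
    return list(merged.values())

def merge_test_flavors(new_flavors, existing_flavors):
    for name, data in new_flavors.items():
        slot = existing_flavors.setdefault(name, {"results": [], "job_history_links": []})
        # Merge results for this flavor
        slot["results"] = merge_release_tests(data.get("results", []), slot.get("results", []))
        # Union and sort the job-history links
        links = set(slot.get("job_history_links", []))
        links.update(data.get("job_history_links", []))
        slot["job_history_links"] = sorted(links)
    return existing_flavors

existing = {"rdma": {"results": [{"prow_job_url": "u1", "test_status": "FAILURE"}],
                     "job_history_links": ["l1"]}}
new = {"rdma": {"results": [{"prow_job_url": "u1", "test_status": "SUCCESS"}],
                "job_history_links": ["l2"]}}
merged = merge_test_flavors(new, existing)
```

In this toy run, the rerun of `u1` replaces the earlier failure and both history links survive, which matches the intent of the diff above.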
New file: `@@ -0,0 +1,136 @@`
# NVIDIA Network Operator Dashboard Workflow

This workflow generates an HTML dashboard showing NVIDIA Network Operator test results across different operator versions and OpenShift versions. It fetches test data from CI systems and creates visual reports for tracking test status over time.

## Overview

The dashboard workflow:
- Fetches test results from Google Cloud Storage based on pull request data
- Supports various network operator test patterns, including:
  - `nvidia-network-operator-legacy-sriov-rdma`
  - `nvidia-network-operator-e2e`
  - DOCA-based tests (e.g., `doca4-nvidia-network-operator-*`)
- Merges new results with existing baseline data
- Generates HTML dashboard reports
- Automatically deploys updates to GitHub Pages

## Architecture

This dashboard **reuses** the GPU Operator Dashboard code and overrides only the operator-specific parts:
- ✅ Imports all core logic from `workflows.gpu_operator_dashboard.fetch_ci_data`
- ✅ Overrides only the Network Operator specifics:
  - Regex patterns to match network operator job names
  - Artifact paths (`network-operator-e2e/artifacts/`)
  - Version field names (`network_operator_version` vs. `gpu_operator_version`)
- ✅ Maintains a clean, DRY codebase with minimal duplication

This design makes maintenance easier: bug fixes in the core logic automatically benefit both dashboards.
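The reuse-and-override design described above can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual module contents: the function and regex names are assumptions.

```python
import re

# Illustrative operator-specific patterns; the real module defines its own.
GPU_JOB_RE = re.compile(r"nvidia-gpu-operator")
NNO_JOB_RE = re.compile(r"nvidia-network-operator")

def find_operator_jobs(job_names, job_re=GPU_JOB_RE):
    """Shared core logic, parameterized on the operator-specific regex."""
    return [name for name in job_names if job_re.search(name)]

# The NNO dashboard calls the shared function and swaps in only its own regex:
jobs = [
    "pull-ci-...-nvidia-gpu-operator-e2e",
    "pull-ci-...-nvidia-network-operator-e2e",
]
nno_jobs = find_operator_jobs(jobs, job_re=NNO_JOB_RE)
```

Because only the parameters differ, a fix to the shared function is picked up by both dashboards without any duplicated code.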
## Supported Test Patterns

The dashboard recognizes the following test job patterns:
- `pull-ci-rh-ecosystem-edge-nvidia-ci-main-{version}-nvidia-network-operator-legacy-sriov-rdma`
- `pull-ci-rh-ecosystem-edge-nvidia-ci-main-{version}-nvidia-network-operator-e2e`
- `rehearse-{id}-pull-ci-rh-ecosystem-edge-nvidia-ci-main-doca4-nvidia-network-operator-*`

Example URL that will be processed:
```
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/67673/rehearse-67673-pull-ci-rh-ecosystem-edge-nvidia-ci-main-doca4-nvidia-network-operator-legacy-sriov-rdma/1961127149603655680/
```

## Usage

### Prerequisites

```console
pip install -r workflows/nno_dashboard/requirements.txt
```

**Important:** Before running `fetch_ci_data.py`, create the baseline data file and initialize it with an empty JSON object if it doesn't exist:

```console
echo '{}' > nno_data.json
```

### Fetch CI Data

```console
# Process a specific PR
python -m workflows.nno_dashboard.fetch_ci_data --pr_number "123" --baseline_data_filepath nno_data.json --merged_data_filepath nno_data.json

# Process all merged PRs, limited to the 100 most recent (default)
python -m workflows.nno_dashboard.fetch_ci_data --pr_number "all" --baseline_data_filepath nno_data.json --merged_data_filepath nno_data.json

# Process with a bundle result limit (keep only the last 50 bundle tests per version)
python -m workflows.nno_dashboard.fetch_ci_data --pr_number "all" --baseline_data_filepath nno_data.json --merged_data_filepath nno_data.json --bundle_result_limit 50
```
### Generate Dashboard

```console
python -m workflows.nno_dashboard.generate_ci_dashboard --dashboard_data_filepath nno_data.json --dashboard_html_filepath nno_dashboard.html
```

The dashboard generator also **reuses** the GPU Operator dashboard code:
- Imports all HTML generation logic from `workflows.gpu_operator_dashboard.generate_ci_dashboard`
- Uses Network Operator specific templates (in the `templates/` directory)
- Only aliases `NETWORK_OPERATOR_VERSION` as `GPU_OPERATOR_VERSION` for compatibility

### Running Tests

First, make sure `pytest` is installed. Then run:

```console
python -m pytest workflows/nno_dashboard/tests/ -v
```

## GitHub Actions Integration

- **Automatic**: Processes merged pull requests to update the dashboard with new test results and deploys to GitHub Pages
- **Manual**: Can be triggered via GitHub Actions workflow dispatch

## Data Structure

The fetched data follows this structure:

```json
{
  "doca4": {
    "notes": [],
    "bundle_tests": [
      {
        "ocp_full_version": "4.16.0",
        "network_operator_version": "24.10.0",
        "test_status": "SUCCESS",
        "prow_job_url": "https://...",
        "job_timestamp": "1234567890"
      }
    ],
    "release_tests": [...],
    "job_history_links": [
      "https://prow.ci.openshift.org/job-history/gs/test-platform-results/pr-logs/directory/..."
    ]
  }
}
```
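A consumer of this file only needs the standard library. The following toy walk over a document with the structure above counts passing bundle tests per OCP version prefix; the inlined JSON is a trimmed example, not real CI data:

```python
import json

# Trimmed example document following the structure described above.
data = json.loads("""
{
  "doca4": {
    "notes": [],
    "bundle_tests": [
      {"ocp_full_version": "4.16.0", "network_operator_version": "24.10.0",
       "test_status": "SUCCESS", "prow_job_url": "https://...", "job_timestamp": "1234567890"}
    ],
    "release_tests": [],
    "job_history_links": []
  }
}
""")

# Top-level keys are OCP version prefixes (e.g. "doca4"); count SUCCESS bundle tests.
passing = {
    prefix: sum(t["test_status"] == "SUCCESS" for t in version_data["bundle_tests"])
    for prefix, version_data in data.items()
}
```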
## Troubleshooting

### No data being fetched

1. Verify the PR number exists and has network operator test runs
2. Check that the job names match the expected patterns (see the regex in `fetch_ci_data.py`, lines 36-40)
3. Ensure the test artifacts contain the required files:
   - `finished.json`
   - `network-operator-e2e/artifacts/ocp.version`
   - `network-operator-e2e/artifacts/operator.version`

### Regex pattern not matching

The regex pattern is designed to match:
- Repository: `rh-ecosystem-edge_nvidia-ci` or `openshift_release` (for rehearse jobs)
- OCP version prefix: can be `doca4`, `nno1`, or another custom prefix
- Job suffix: must contain `nvidia-network-operator` followed by the test type

If your job names don't match, you may need to adjust the `TEST_RESULT_PATH_REGEX` pattern in `fetch_ci_data.py`.
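To make the matching rules above concrete, here is an illustrative pattern that accepts the job names listed under "Supported Test Patterns". It is a sketch, not the actual `TEST_RESULT_PATH_REGEX` (which also matches the repository portion of the path):

```python
import re

# Illustrative job-name pattern: optional rehearse prefix, the fixed pull-ci
# prefix, an OCP version token (e.g. "doca4"), then the NNO test-type suffix.
JOB_NAME_RE = re.compile(
    r"^(?:rehearse-\d+-)?"
    r"pull-ci-rh-ecosystem-edge-nvidia-ci-main-"
    r"(?P<version>[a-z0-9.]+)-"
    r"nvidia-network-operator-(?P<test_type>[a-z0-9-]+)$"
)

m = JOB_NAME_RE.match(
    "rehearse-67673-pull-ci-rh-ecosystem-edge-nvidia-ci-main-"
    "doca4-nvidia-network-operator-legacy-sriov-rdma"
)
```

The named groups then give you the version prefix (`doca4`) and the test type (`legacy-sriov-rdma`) for bucketing results.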
> **Member:** Why do we want to have the NNO dashboard updated in the same workflow as the GPU operator? But even then, I think it would be better to separate the NNO steps from the GPU ones, e.g. "Set GPU operator env vars", "Set NNO env vars", "Fetch GPU operator CI results", "Fetch NNO CI results", etc.