@ombhojane

What this PR does / why we need it:

This PR adds Helm integration tests to the GitHub Actions workflow to verify that the Kubeflow Trainer Helm chart lints, installs, and upgrades correctly in a Kind cluster.

The implementation includes:

  • Chart linting and validation using chart-testing (ct)
  • Automated dependency resolution for JobSet integration
  • Kind cluster setup for realistic testing environment
  • Combined lint-and-install testing as recommended in code review
  • Comprehensive verification of deployed components
  • Upgrade testing capabilities
  • Proper cleanup procedures
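For local reproduction, the combined lint-and-install flow could be sketched as a small shell function. This is an illustration, not the workflow from this PR: it assumes `kind` and `ct` are installed, that the chart lives in `charts/kubeflow-trainer`, and that the config is the `.github/ct.yaml` file this PR adds. The cluster name `helm-test` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: chart-testing's combined lint + install against a throwaway Kind cluster.
# Assumptions (not from the PR): chart path, cluster name, DRY_RUN switch.

CHART_DIR="${CHART_DIR:-charts/kubeflow-trainer}"   # assumed chart location
CT_CONFIG="${CT_CONFIG:-.github/ct.yaml}"
KIND_CLUSTER="${KIND_CLUSTER:-helm-test}"           # hypothetical cluster name

# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

chart_lint_and_install() {
  local status=0
  # The install half of lint-and-install needs a live cluster.
  run kind create cluster --name "$KIND_CLUSTER" || return 1
  # ct lints the chart, then installs it and runs `helm test` against it.
  run ct lint-and-install --config "$CT_CONFIG" --charts "$CHART_DIR" || status=$?
  # Always delete the cluster, even when lint or install failed.
  run kind delete cluster --name "$KIND_CLUSTER"
  return "$status"
}
```

`chart_lint_and_install` runs everything for real, while `DRY_RUN=1 chart_lint_and_install` only prints the commands. Deleting the cluster unconditionally keeps a failed run from leaking a Kind cluster.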

Additionally, this PR fixes a critical issue with the JobSet dependency in Chart.yaml where the OCI registry path was incorrect, preventing successful chart installations.
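The shorter path is correct because, for an `oci://` dependency in Chart.yaml, Helm treats `repository` as the registry path and appends the dependency `name` to build the chart reference. The sketch below mirrors that join; the helper names `oci_chart_ref` and `verify_jobset_dependency` and the chart directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: how an OCI dependency reference is composed from Chart.yaml fields.
# Helm joins `repository` and `name`, so `repository` must stop at the parent path.

# Hypothetical helper: build "<repository>/<name>" for an oci:// dependency.
oci_chart_ref() {
  local repository="${1%/}"   # drop one trailing slash, if present
  local name="$2"
  printf '%s/%s\n' "$repository" "$name"
}

# Resolve the chart's dependencies and list their status.
# Needs helm with OCI support and network access; the chart path is assumed.
verify_jobset_dependency() {
  local chart_dir="${1:-charts/kubeflow-trainer}"
  helm dependency update "$chart_dir" || return 1
  helm dependency list "$chart_dir"
}
```

With the old value, the same join yields `oci://registry.k8s.io/jobset/charts/jobset/jobset`, which explains why the dependency could not be resolved.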

Which issue(s) this PR fixes:

Fixes #2577

Technical Details

Key Changes Made:

  1. Fixed JobSet Dependency: Corrected OCI registry path from oci://registry.k8s.io/jobset/charts/jobset to oci://registry.k8s.io/jobset/charts
  2. Implemented ct lint-and-install: Addressed review feedback from @andreyvelich to use the combined testing approach
  3. Added Chart Testing Configuration: Created .github/ct.yaml with proper timeout and dependency settings
  4. Enhanced Verification: Added comprehensive checks for pods, services, CRDs, and component health
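A chart-testing config with a longer install timeout could look roughly like the following, written as a shell function so it can be generated in CI or locally. The values are assumptions, not the contents of the PR's `.github/ct.yaml`: the `master` target branch, the `charts` directory, the 600s timeout, and the two relaxed checks are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: generate a chart-testing (ct) config. All values are assumptions.
write_ct_config() {
  local out="${1:-.github/ct.yaml}"
  mkdir -p "$(dirname "$out")"
  cat <<'EOF' >"$out"
# Branch ct diffs against to find changed charts (assumed default branch).
target-branch: master
# Directory containing the chart(s) (assumed layout).
chart-dirs:
  - charts
# Allow time for image pulls and for the Trainer and JobSet pods to start.
helm-extra-args: --timeout 600s
# Checks that are often relaxed for CI-only installs (illustrative choice).
validate-maintainers: false
check-version-increment: false
EOF
}
```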

Workflow Features:

  • Automatically detects target branch for PR and push events
  • Sets up Kind cluster with fallback to project's make targets
  • Runs dependency updates and template validation
  • Performs installation testing with proper timeout handling
  • Verifies both Trainer and JobSet components are deployed correctly
  • Includes upgrade testing capabilities
  • Ensures proper cleanup of test resources
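Target-branch detection can rely on the default GitHub Actions environment variables: `GITHUB_BASE_REF` is set only for `pull_request` and `pull_request_target` events, and `GITHUB_REF_NAME` holds the short branch name on `push`. A minimal sketch; the helper name and the `master` fallback for local runs are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: choose the branch ct should diff against.
detect_target_branch() {
  if [ -n "${GITHUB_BASE_REF:-}" ]; then
    # pull_request / pull_request_target: the PR's base branch.
    printf '%s\n' "$GITHUB_BASE_REF"
  elif [ -n "${GITHUB_REF_NAME:-}" ]; then
    # push: the branch that was pushed.
    printf '%s\n' "$GITHUB_REF_NAME"
  else
    # Local run outside Actions (assumed default branch).
    printf '%s\n' "master"
  fi
}
```

The result can then be passed to ct, for example `ct lint-and-install --config .github/ct.yaml --target-branch "$(detect_target_branch)"`.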

Testing Approach:

  • Lint Phase: Validates chart structure, dependencies, and templating
  • Install Phase: Tests actual deployment in Kind cluster environment
  • Verification Phase: Confirms all expected components are running
  • Upgrade Phase: Tests chart upgrade scenarios
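Outside of ct, the four phases map onto plain Helm and kubectl commands roughly as follows. This is a sketch under assumptions: the release name, namespace, and chart path are hypothetical, the Trainer and JobSet label selectors are the ones quoted later in the review, and placing JobSet pods in the release namespace (rather than `jobset-system`) is a guess for the subchart case.

```shell
#!/usr/bin/env bash
# Sketch: lint -> install -> verify -> upgrade with plain helm/kubectl.
RELEASE="${RELEASE:-kubeflow-trainer}"               # hypothetical release name
NAMESPACE="${NAMESPACE:-kubeflow-system}"
CHART_DIR="${CHART_DIR:-charts/kubeflow-trainer}"    # assumed chart location
JOBSET_NAMESPACE="${JOBSET_NAMESPACE:-$NAMESPACE}"   # assumption for the subchart case

# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

helm_test_phases() {
  # Lint: dependencies, chart structure, and template rendering.
  run helm dependency update "$CHART_DIR" || return 1
  run helm lint "$CHART_DIR" || return 1
  run helm template "$RELEASE" "$CHART_DIR" --namespace "$NAMESPACE" || return 1
  # Install: --wait blocks until resources are ready or the timeout expires.
  run helm install "$RELEASE" "$CHART_DIR" --namespace "$NAMESPACE" \
    --create-namespace --wait --timeout 10m || return 1
  # Verify: Trainer and JobSet pods reach the Ready condition.
  run kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=trainer \
    -n "$NAMESPACE" --timeout=300s || return 1
  run kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=jobset \
    -n "$JOBSET_NAMESPACE" --timeout=300s || return 1
  # Upgrade: re-apply the chart in place.
  run helm upgrade "$RELEASE" "$CHART_DIR" --namespace "$NAMESPACE" --wait --timeout 10m
}
```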

This addresses the requirements of issue #2577 and incorporates feedback from the previous PR review, so that changes to the Trainer Helm chart are linted and install-tested in CI.

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kramaranya
Contributor

kramaranya commented Aug 7, 2025

Thank you for your first contribution @ombhojane! 🎉
Could you please sign your commits?

@ombhojane force-pushed the helm-integration-tests branch from 81f7fc6 to 6c7a173 on August 8, 2025 04:07
@ombhojane
Author

Hey @kramaranya
I've signed the commits & pushed the changes:

@andreyvelich
Member

/ok-to-test

@coveralls

coveralls commented Aug 8, 2025

Pull Request Test Coverage Report for Build 18153337284

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 55.174%

Totals
Change from base Build 18132966300: 0.0%
Covered Lines: 1093
Relevant Lines: 1981

💛 - Coveralls

@ombhojane
Author

Hey @andreyvelich,
I've resolved the issues raised while the actions were running.
Please review.

Member

@andreyvelich left a comment

/ok-to-test

@ombhojane Can you please sign your commits?

@andreyvelich
Member

/retest

Member

@andreyvelich left a comment

Thanks @ombhojane!
I left my initial comments.

@ombhojane
Author

Thanks for the detailed review, I'm going through the comments and resolving them.

@ombhojane force-pushed the helm-integration-tests branch from b0c7f9f to a6407b7 on August 11, 2025 15:33
ombhojane and others added 9 commits August 12, 2025 01:34
…tall commands compatible with Debian image (kubeflow#2528)

Signed-off-by: Debabrata47 <[email protected]>
Signed-off-by: ombhojane <[email protected]>
@ombhojane force-pushed the helm-integration-tests branch from a68cbfc to 6fa4375 on August 11, 2025 20:04
@ombhojane
Author

Hey @andreyvelich,
can you please review the code?

Member

@andreyvelich left a comment

/cc @kubeflow/kubeflow-trainer-team @astefanutti

# Wait for JobSet to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=jobset --timeout=300s -n jobset-system
- name: Create test values file for chart testing
Member

Can we leverage the e2e-setup-cluster.sh script to deploy the Helm chart?

I think the steps are similar, and we should just deploy it with Helm charts instead of Kustomize manifests.

Contributor

I agree with @andreyvelich this should ideally not be duplicated.

Author

resolved!

Author

leveraged e2e-setup-cluster-helm.sh

Member

@ombhojane As I can see, you still build and load images as part of your e2e-setup-cluster-helm.sh script.
I suggest that you just add a condition here for whether you want to deploy Kubeflow Trainer with Kustomize or Helm charts:

echo "Deploy Kubeflow Trainer control plane"
E2E_MANIFESTS_DIR="artifacts/e2e/manifests"
mkdir -p "${E2E_MANIFESTS_DIR}"
cat <<EOF >"${E2E_MANIFESTS_DIR}/kustomization.yaml"
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../manifests/overlays/manager
images:
  - name: "${CONTROLLER_MANAGER_CI_IMAGE_NAME}"
    newTag: "${CONTROLLER_MANAGER_CI_IMAGE_TAG}"
EOF

In that case, you don't need to maintain a separate script just for Helm deployment.
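The suggested condition could look roughly like the branch below inside the setup script. A sketch under assumptions: `DEPLOYMENT_METHOD` is the variable used in the workflow snippet suggested later in this review, the `kubectl apply --server-side -k` call stands in for however the existing script applies the generated overlay, and the release name, chart path, and especially the Helm value keys `image.repository` and `image.tag` are hypothetical and must be checked against the chart's values.yaml.

```shell
#!/usr/bin/env bash
# Sketch: one setup script, two deployment paths selected by DEPLOYMENT_METHOD.

# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

deploy_trainer() {
  local method="${DEPLOYMENT_METHOD:-kustomize}"   # default keeps the current behaviour
  case "$method" in
    kustomize)
      # Existing path: apply the generated overlay (apply flags are an assumption).
      run kubectl apply --server-side -k "${E2E_MANIFESTS_DIR:-artifacts/e2e/manifests}"
      ;;
    helm)
      if [ -z "${CONTROLLER_MANAGER_CI_IMAGE_NAME:-}" ] || [ -z "${CONTROLLER_MANAGER_CI_IMAGE_TAG:-}" ]; then
        echo "CONTROLLER_MANAGER_CI_IMAGE_NAME and CONTROLLER_MANAGER_CI_IMAGE_TAG must be set" >&2
        return 1
      fi
      # New path: install the chart with the CI-built image.
      # HYPOTHETICAL value keys: confirm against the chart's values.yaml.
      run helm upgrade --install kubeflow-trainer charts/kubeflow-trainer \
        --namespace kubeflow-system --create-namespace \
        --set "image.repository=${CONTROLLER_MANAGER_CI_IMAGE_NAME}" \
        --set "image.tag=${CONTROLLER_MANAGER_CI_IMAGE_TAG}" \
        --wait --timeout 10m
      ;;
    *)
      printf 'unknown DEPLOYMENT_METHOD: %s (expected kustomize or helm)\n' "$method" >&2
      return 1
      ;;
  esac
}
```

Building and loading the CI image into Kind stays shared between both branches, which is the duplication the review asks to avoid.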

@google-oss-prow

@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: kubeflow/kubeflow-trainer-team.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kubeflow/kubeflow-trainer-team @astefanutti

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ombhojane
Author

sure, I'll enhance the flow as suggested

Comment on lines +34 to +40
- name: Setup cluster and deploy with Helm
run: |
make test-e2e-setup-cluster-helm K8S_VERSION=1.32.0
- name: Run E2E smoke tests
run: |
NAMESPACE="kubeflow-system"
Member

After you modify the e2e setup script you should be able to just run:

      - name: Setup cluster
        run: |
          make test-e2e-setup-cluster K8S_VERSION=${{ matrix.kubernetes-version }} DEPLOYMENT_METHOD=helm

      - name: Run e2e with Go
        run: |
          make test-e2e || (kubectl logs -n kubeflow-system -l app.kubernetes.io/name=trainer && exit 1)
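The `make test-e2e || (kubectl logs ... && exit 1)` pattern dumps controller logs only when the tests fail. A variant that always preserves the test command's own exit status, even if log collection fails, could be sketched as follows; the function names are hypothetical, and the namespace and label are taken from the snippet above. Note that `kubectl logs` with a label selector shows only the last 10 lines per container by default, so `--tail=-1` is passed to get the full log.

```shell
#!/usr/bin/env bash
# Sketch: run the e2e suite and dump controller logs only on failure.

dump_logs() {
  # With -l, kubectl logs defaults to --tail=10; -1 shows everything.
  kubectl logs -n kubeflow-system -l app.kubernetes.io/name=trainer --tail=-1 || true
}

run_e2e_with_logs() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    dump_logs
  fi
  # Propagate the test command's exit code, not kubectl's.
  return "$status"
}
```

Usage in the workflow step would be `run_e2e_with_logs make test-e2e`.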

Development

Successfully merging this pull request may close these issues.

Add Helm integration tests to GitHub actions workflow

6 participants