
feat(ingest): improve CI batches #14239

Merged
hsheth2 merged 6 commits into master from ingestion-batches
Jul 28, 2025

hsheth2 (Collaborator) commented Jul 26, 2025

  • Splits our ingestion tests across 6 batches instead of 3. The goal is ~12-15 minutes per batch.
  • Only runs the expensive integration tests for Python 3.11. Lint + unit tests continue to run for both Python 3.9 and 3.11.
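
As a rough illustration of the two bullets above, the resulting set of CI jobs could be enumerated as follows. This is only a sketch: the `build_jobs` helper and the job names are hypothetical, and the real matrix lives in the GitHub Actions workflow, which is not shown in this conversation.

```python
from typing import List, Tuple

PYTHON_VERSIONS = ["3.9", "3.11"]   # lint + unit tests run on both versions
INTEGRATION_PYTHON = "3.11"         # expensive integration tests run here only
NUM_INTEGRATION_BATCHES = 6         # up from 3 before this PR


def build_jobs() -> List[Tuple[str, str]]:
    """Return (python_version, job_name) pairs. Job names are hypothetical."""
    jobs: List[Tuple[str, str]] = []
    for version in PYTHON_VERSIONS:
        jobs.append((version, "lint"))
        jobs.append((version, "unit"))
    # Integration batches are only scheduled for a single Python version.
    for batch in range(NUM_INTEGRATION_BATCHES):
        jobs.append((INTEGRATION_PYTHON, f"integration_batch_{batch}"))
    return jobs


if __name__ == "__main__":
    for version, name in build_jobs():
        print(f"python {version}: {name}")
```

Under these assumptions the integration batches are scheduled once rather than once per Python version, which is where the CI time saving comes from.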

github-actions bot added the ingestion and devops labels Jul 26, 2025

codecov bot commented Jul 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Test Analytics upload error: Unsupported file format


datahub-cyborg bot added the needs-review label Jul 26, 2025
@@ -63,6 +63,9 @@ markers =
integration_batch_0: mark tests to run in batch 0 of integration tests. This is done mainly for parallelization in CI. Batch 0 is the default batch.
integration_batch_1: mark tests to run in batch 1 of integration tests
integration_batch_2: mark tests to run in batch 2 of integration tests
integration_batch_3: mark tests to run in batch 3 of integration tests (mostly powerbi)
Contributor

Is it true there are two mostly powerbi batches?

Collaborator Author

Yeah. Somehow the M query parser is extremely slow, so the powerbi tests alone take ~24 minutes, and I split them into two batches.
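
The arithmetic behind that split, as a small sketch. The ~24 minute figure and the 12-15 minute target come from this PR; the `batches_needed` helper is hypothetical, assumes the tests can be divided evenly, and is not part of the change.

```python
import math


def batches_needed(total_minutes: float, max_minutes_per_batch: float) -> int:
    """Smallest number of equal batches that keeps each batch within the limit."""
    return math.ceil(total_minutes / max_minutes_per_batch)


powerbi_minutes = 24   # approximate runtime of the powerbi tests quoted above
target_max = 15        # upper end of the 12-15 minute goal

n = batches_needed(powerbi_minutes, target_max)
print(n, powerbi_minutes / n)  # 2 batches of about 12 minutes each
```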

datahub-cyborg bot added the pending-submitter-merge label and removed the needs-review label Jul 28, 2025

hsheth2 (Collaborator, Author) commented Jul 28, 2025

🤖 PR Nanny Status: ✅ All Checks Passing

Last updated: 2025-07-28 17:37:44 UTC / 2025-07-28 10:37:44 PDT

I'm monitoring this PR and all critical checks are currently passing.

I'll automatically retry any critical checks that fail in the future.



🚫 [actionlint] reported by reviewdog 🐶
property "extra_pip_extras" is not defined in object type {extra_pip_constraints: string; extra_pip_requirements: string; python-version: number} [expression]

run: ./gradlew -Pextra_pip_requirements='${{ matrix.extra_pip_requirements }}' -Pextra_pip_constraints='${{ matrix.extra_pip_constraints }}' -Pextra_pip_extras='${{ matrix.extra_pip_extras }}' :metadata-ingestion-modules:airflow-plugin:build

hsheth2 (Collaborator, Author) commented Jul 28, 2025

Airflow tests broke due to an unrelated PR; reverting to fix that here #14248

hsheth2 merged commit da9121d into master Jul 28, 2025
71 of 75 checks passed
hsheth2 deleted the ingestion-batches branch July 28, 2025 23:34
Labels
devops, ingestion, pending-submitter-merge

2 participants