feat(tap): Utilize Joblib to run parallel streams during sync_all#2295
Draft
BuzzCutNorman wants to merge 8 commits into
CodSpeed Performance Report
Merging #2295 will not alter performance.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

| Coverage Diff | main | #2295 | +/- |
|---|---|---|---|
| Coverage | 89.18% | 88.84% | -0.34% |
| Files | 54 | 54 | |
| Lines | 4788 | 4822 | +34 |
| Branches | 936 | 944 | +8 |
| Hits | 4270 | 4284 | +14 |
| Misses | 361 | 378 | +17 |
| Partials | 157 | 160 | +3 |
Overview:
This PR attempts to use Joblib to let `sync_all` run streams in parallel. A new Tap class method, `sync_one`, was introduced to give the parallel loop a per-stream target. A new property, `max_parallelism`, takes an integer value that is passed to the `n_jobs` argument of `parallel_config`. The default value of `max_parallelism` is `None`, and a tap only attempts a parallel run when `max_parallelism` is set. The `TAP_MAX_PARALLELISM_CONFIG` capability was added to the Tap class so that a `max_parallelism` value can be passed to a tap via meltano.yml.

Examples:
Comments:
I need help with ideas for writing pytest tests that cover these changes. Also, if you run pytest with parallelism enabled, many tests fail, especially the mapper tests. They seem to receive only the state message and nothing else.
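One possible explanation for the mapper failures, offered as an unverified hypothesis: if a test captures tap output by redirecting `sys.stdout` in the test process, messages written inside a process-based worker (joblib's default loky backend) go to that worker's own stdout rather than the redirected buffer, so only messages the parent writes itself, such as a state message, get captured. This sketch uses a hypothetical `emit` function in place of the tap's message writer and captures output with `contextlib.redirect_stdout`.

```python
import io
from contextlib import redirect_stdout

from joblib import Parallel, delayed


def emit(stream_name):
    # Hypothetical stand-in for a worker writing a Singer message to stdout.
    print(f'{{"type": "RECORD", "stream": "{stream_name}"}}')


def capture(backend, streams=("users", "orders")):
    """Return everything written to the (redirected) sys.stdout of this process."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        # Written by the parent process, like a state message emitted before dispatch.
        print('{"type": "STATE"}')
        Parallel(n_jobs=2, backend=backend)(delayed(emit)(s) for s in streams)
    return buf.getvalue()
```

`capture("threading")` contains the STATE line and both RECORD lines, because threads share the redirected `sys.stdout`. With `capture("loky")`, the RECORD lines would be expected to reach the worker processes' real stdout instead, leaving only the STATE line in the returned string, which resembles the failure described above.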
Resources:
loky backend: joblib/joblib#1017

📚 Documentation preview 📚: https://meltano-sdk--2295.org.readthedocs.build/en/2295/