
feat(tap): Utilize Joblib to run parallel streams during sync_all#2295

Draft
BuzzCutNorman wants to merge 8 commits into meltano:main from BuzzCutNorman:feat-tap-parallel-streams


@BuzzCutNorman (Contributor) commented Mar 7, 2024

Overview:
This PR attempts to use Joblib to let sync_all run streams in parallel. A new Tap class method, sync_one, was introduced to serve as the target that the parallel loop calls for each stream. A new property, max_parallelism, takes an integer value that is passed to the n_jobs argument of joblib's parallel_config. The default value of max_parallelism is None, and a tap only attempts a parallel run when max_parallelism is set to a value other than None, 0, or 1. The TAP_MAX_PARALLELISM_CONFIG capability was added to the Tap class so that a max_parallelism value can be passed to a tap via meltano.yml.
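The control flow described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in this PR: `ToyTap`, its `streams` dict, and the record lists are made up, and the standard library's `ThreadPoolExecutor` stands in for joblib's `Parallel` with `parallel_config(n_jobs=...)` so the sketch runs without extra dependencies.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class ToyTap:
    """Hypothetical stand-in for a Tap; this is not the singer_sdk Tap class."""

    def __init__(self, streams: dict, max_parallelism: Optional[int] = None):
        self.streams = streams  # stream name -> list of records
        self.max_parallelism = max_parallelism

    def sync_one(self, stream_name: str) -> tuple:
        # Per-stream unit of work: the target each parallel worker calls.
        records = self.streams[stream_name]
        return stream_name, len(records)

    def sync_all(self) -> list:
        n = self.max_parallelism
        # None, 0 and 1 fall back to a plain sequential loop.
        if n in (None, 0, 1):
            return [self.sync_one(name) for name in self.streams]
        # Mimic joblib's convention that -1 means "all CPU cores".
        # Other negative values are not handled in this sketch and make
        # ThreadPoolExecutor raise ValueError.
        workers = (os.cpu_count() or 1) if n == -1 else n
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in input order even if streams finish out of order.
            return list(pool.map(self.sync_one, self.streams))


tap = ToyTap({"users": [{"id": 1}], "orders": [{"id": 1}, {"id": 2}]}, max_parallelism=2)
print(tap.sync_all())  # [('users', 1), ('orders', 2)]
```

Keeping sync_one as a self-contained per-stream method is what makes it easy to hand to a parallel loop: the loop only needs a callable and an iterable of stream names.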

Examples:

max_parallelism: -1  # use all CPU cores
max_parallelism: null  # null, 0, and 1 all disable parallel runs
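The example values could be normalized before they reach joblib with something like the helper below. `resolve_max_parallelism` is hypothetical and not part of the SDK; it assumes joblib's documented `n_jobs` convention, in which negative values count back from the CPU count (`-1` is all cores, `-2` is all but one).

```python
import os
from typing import Optional


def resolve_max_parallelism(value: Optional[int], cpu_count: Optional[int] = None) -> Optional[int]:
    """Return a worker count, or None when streams should sync sequentially.

    Hypothetical helper: the config key comes from this PR, the logic is a sketch.
    """
    if value in (None, 0, 1):
        return None  # null, 0 and 1 all mean "no parallel run"
    cpus = cpu_count or os.cpu_count() or 1
    if value < 0:
        # joblib-style negative n_jobs: n_cpus + 1 + value, clamped to 1 here.
        value = max(cpus + 1 + value, 1)
    return None if value == 1 else value


print(resolve_max_parallelism(-2, cpu_count=8))  # 7
```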

Comments:
I would appreciate ideas on how to write pytest tests that cover these changes. Also, when parallelism is enabled, many tests fail under pytest, especially the mapper tests. They seem to receive only the state message and nothing else.
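One possible test idea, sketched under assumptions: run the same streams sequentially and in parallel, capture the emitted lines, and compare them ignoring order. `emit_stream`, `run`, and the toy `STREAMS` data are hypothetical. Threads are used here so that `contextlib.redirect_stdout`, which swaps the process-wide `sys.stdout`, still sees the workers' output. If the tap runs on joblib's default process-based backend, anything written inside worker processes would not land in a `StringIO` installed in the parent process; that may be worth checking against the mapper failures described above, though it is a guess and has not been verified against this PR.

```python
import io
import json
import sys
from concurrent.futures import ThreadPoolExecutor
from contextlib import redirect_stdout

# Hypothetical toy streams; a real test would use the tap's own streams.
STREAMS = {
    "users": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 10}],
}


def emit_stream(name: str) -> None:
    # Emit each Singer-style RECORD message with a single write() call
    # so that lines from different threads are not split apart.
    for record in STREAMS[name]:
        msg = {"type": "RECORD", "stream": name, "record": record}
        sys.stdout.write(json.dumps(msg, sort_keys=True) + "\n")


def run(parallel: bool) -> list:
    buf = io.StringIO()
    with redirect_stdout(buf):
        if parallel:
            with ThreadPoolExecutor(max_workers=2) as pool:
                list(pool.map(emit_stream, STREAMS))
        else:
            for name in STREAMS:
                emit_stream(name)
    return buf.getvalue().splitlines()


def test_parallel_matches_sequential() -> None:
    # Cross-stream order may differ in parallel mode, so compare sorted lines.
    assert sorted(run(parallel=False)) == sorted(run(parallel=True))
```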

Resources:


📚 Documentation preview 📚: https://meltano-sdk--2295.org.readthedocs.build/en/2295/


codspeed-hq Bot commented Mar 7, 2024

CodSpeed Performance Report

Merging #2295 will not alter performance

Comparing BuzzCutNorman:feat-tap-parallel-streams (3323b45) with main (b6fa56a)

Summary

✅ 6 untouched benchmarks


codecov Bot commented Mar 7, 2024

Codecov Report

Attention: Patch coverage is 46.51163% with 23 lines in your changes missing coverage. Please review.

Project coverage is 88.84%. Comparing base (9d0c08b) to head (73342d4).

| Files | Patch % | Lines |
|---|---|---|
| singer_sdk/tap_base.py | 45.23% | 19 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2295      +/-   ##
==========================================
- Coverage   89.18%   88.84%   -0.34%     
==========================================
  Files          54       54              
  Lines        4788     4822      +34     
  Branches      936      944       +8     
==========================================
+ Hits         4270     4284      +14     
- Misses        361      378      +17     
- Partials      157      160       +3     


