
feat: Introduce function in Tap to convert input catalog to streams #3084

Closed
rafalkrupinski wants to merge 4 commits into meltano:main from oeklo:feat/streams-from-catalog

@rafalkrupinski
Contributor

@rafalkrupinski rafalkrupinski commented Jun 9, 2025

Following https://meltano.slack.com/archives/C06A1MD6A6L/p1749119815375359

Introducing Tap.load_streams_from_catalog(), which should reconstruct list[Stream] from the input catalog.

This makes it clear whether the Tap is expected to perform discovery or is run with a catalog.

By default, for compatibility, the new function calls Tap.load_streams(), so it doesn't introduce a breaking change.
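
The backward-compatible default described above can be sketched roughly as follows. This is a minimal illustration that uses stand-in classes (`Stream` and `SketchTap` are hypothetical, not the real `singer_sdk` classes), and it only assumes what the description states: the new method falls back to `load_streams()` unless a tap overrides it.

```python
from __future__ import annotations

from typing import Iterable


class Stream:
    """Stand-in for a stream object; only the name matters here."""

    def __init__(self, name: str) -> None:
        self.name = name


class SketchTap:
    """Hypothetical stand-in for Tap, not the real singer_sdk class."""

    def load_streams(self) -> list[Stream]:
        # Discovery path: a real tap would inspect its source here.
        return [Stream("users"), Stream("groups")]

    def load_streams_from_catalog(self) -> Iterable[Stream]:
        # Default behaviour described in the PR: fall back to discovery,
        # so taps that do not override this method behave as before.
        return self.load_streams()
```

A tap that can rebuild its streams from the catalog alone would override `load_streams_from_catalog()`; every other tap keeps its current behaviour.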


This is a rather trivial function, but it has some logic, so I could use a hint about what kind of testing is expected.

Summary by Sourcery

Add a new load_streams_from_catalog method to centralize catalog‐based stream reconstruction and update both the core Tap class and its template to use it without breaking existing behavior.

New Features:

  • Introduce Tap.load_streams_from_catalog to reconstruct streams from an input catalog
  • Update the streams property to delegate stream loading to the new method when a catalog is provided
  • Embed a default load_streams_from_catalog stub in the tap template to yield streams by catalog entries

Enhancements:

  • Maintain backward compatibility by having load_streams_from_catalog default to calling load_streams()

📚 Documentation preview 📚: https://meltano-sdk--3084.org.readthedocs.build/en/3084/

@sourcery-ai
Contributor

sourcery-ai Bot commented Jun 9, 2025

Reviewer's Guide

This PR adds a new method, load_streams_from_catalog(), to distinguish between discovery and catalog-based execution. It refactors the streams property to use this method when an input catalog is provided, and updates the tap template to include a default load_streams_from_catalog implementation.

Sequence Diagram: Tap.streams Property Logic

sequenceDiagram
    participant C as Client
    participant T as Tap
    participant S as Stream

    C->>T: Access streams property
    T->>T: Check if _streams is initialized
    alt "_streams is already initialized"
        T-->>C: Return _streams
    else "_streams is None"
        T->>T: Get input_catalog
        alt "input_catalog is None (Discovery Mode)"
            T->>T: Call load_streams()
            loop "for each discovered stream"
                T->>T: Add stream to _streams
            end
        else "input_catalog is Present (Catalog Mode)"
            T->>T: Call load_streams_from_catalog()
            loop "for each stream from catalog"
                T->>S: stream.apply_catalog(input_catalog)
                S-->>T: Catalog applied
                T->>T: Add stream to _streams
            end
        end
        T-->>C: Return _streams
    end
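
The branching in the sequence diagram can be sketched in Python as follows. This is a simplified illustration with stand-in classes, not the code from the PR: `SketchTap`, the `catalog_applied` flag, and the `discovery_calls` counter are hypothetical and exist only so the two paths can be observed.

```python
from __future__ import annotations

from typing import Iterable


class Stream:
    """Stand-in stream that records whether a catalog was applied."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.catalog_applied = False

    def apply_catalog(self, catalog: dict) -> None:
        # The real method applies catalog overrides; here we only record the call.
        self.catalog_applied = True


class SketchTap:
    """Hypothetical stand-in for Tap that mirrors the diagram's branches."""

    def __init__(self, input_catalog: dict | None = None) -> None:
        self.input_catalog = input_catalog
        self._streams: dict[str, Stream] | None = None
        self.discovery_calls = 0  # illustration only

    def load_streams(self) -> list[Stream]:
        self.discovery_calls += 1
        return [Stream("users"), Stream("groups")]

    def load_streams_from_catalog(self) -> Iterable[Stream]:
        # Backward-compatible default: delegate to discovery.
        return self.load_streams()

    @property
    def streams(self) -> dict[str, Stream]:
        if self._streams is not None:
            return self._streams  # already initialized

        streams: dict[str, Stream] = {}
        if self.input_catalog is None:
            # Discovery mode: no catalog to apply.
            for stream in self.load_streams():
                streams[stream.name] = stream
        else:
            # Catalog mode: build streams, then apply the catalog to each.
            for stream in self.load_streams_from_catalog():
                stream.apply_catalog(self.input_catalog)
                streams[stream.name] = stream

        self._streams = streams
        return streams
```

A test along these lines could assert that the catalog is applied only in catalog mode and that repeated access to `streams` returns the cached mapping.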

Class Diagram: Template Tap Inheritance and Method Override

classDiagram
    class Stream {
        // Base Stream class from singer_sdk
    }
    class CookiecutterStream {
        // Represents specific streams like UsersStream, GroupsStream
    }
    CookiecutterStream --|> Stream

    class singer_sdk_Tap {
        +load_streams() : list~Stream~
        +load_streams_from_catalog() : Iterable~Stream~
    }
    class TemplateTap {
        +input_catalog : Catalog
        +load_streams_from_catalog() : Iterable~CookiecutterStream~ // Overridden
    }
    TemplateTap --|> singer_sdk_Tap
    TemplateTap ..> CookiecutterStream : creates
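
A template-style override that yields streams based on the input catalog keys might look roughly like this. It is a sketch under assumptions: the catalog is modelled as a plain dict keyed by stream name, `STREAM_TYPES` and `TemplateTapSketch` are hypothetical names, and skipping unknown entries is one possible choice rather than the template's confirmed behaviour.

```python
from __future__ import annotations

from typing import Iterator


class Stream:
    """Stand-in base class for the template's streams."""

    name = ""


class UsersStream(Stream):
    name = "users"


class GroupsStream(Stream):
    name = "groups"


# Hypothetical registry mapping catalog keys to stream classes.
STREAM_TYPES = {cls.name: cls for cls in (UsersStream, GroupsStream)}


class TemplateTapSketch:
    """Hypothetical tap that rebuilds streams from catalog keys only."""

    def __init__(self, input_catalog: dict[str, dict]) -> None:
        self.input_catalog = input_catalog

    def load_streams_from_catalog(self) -> Iterator[Stream]:
        for stream_id in self.input_catalog:
            stream_type = STREAM_TYPES.get(stream_id)
            if stream_type is None:
                # Unknown entries are skipped in this sketch; a real tap
                # might prefer to raise an error instead.
                continue
            yield stream_type()
```

Because only the entries present in the catalog are instantiated, no discovery call is needed in this mode.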

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Import and type annotation support for the new method | • Added Iterable import under TYPE_CHECKING • Updated load_streams_from_catalog signature to return Iterable[Stream] | singer_sdk/tap_base.py |
| Introduce load_streams_from_catalog in Tap base | • Added method stub that by default calls load_streams() • Provided docstring explaining backward compatibility | singer_sdk/tap_base.py |
| Refactor streams() property to use new method | • Removed unconditional empty dict initialization • Added conditional branch: if no catalog, load via load_streams(); else, use load_streams_from_catalog() and apply catalog to each stream | singer_sdk/tap_base.py |
| Update cookiecutter tap template with default override | • Added load_streams_from_catalog override yielding streams based on input_catalog keys • Included basic implementation for 'groups' and 'users' streams | cookiecutter/tap-template/{{cookiecutter.tap_id}}/{{cookiecutter.library_name}}/tap.py |


@codecov

codecov Bot commented Jun 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.83%. Comparing base (fa02124) to head (e336f31).
⚠️ Report is 301 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3084   +/-   ##
=======================================
  Coverage   92.83%   92.83%           
=======================================
  Files          63       63           
  Lines        5427     5430    +3     
  Branches      672      672           
=======================================
+ Hits         5038     5041    +3     
  Misses        281      281           
  Partials      108      108           
Flag Coverage Δ
core 78.98% <100.00%> (+0.01%) ⬆️
end-to-end 78.04% <100.00%> (+0.01%) ⬆️
optional-components 43.31% <14.28%> (-0.01%) ⬇️

Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey @rafalkrupinski - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good


Comment thread singer_sdk/tap_base.py
@codspeed-hq

codspeed-hq Bot commented Jun 9, 2025

CodSpeed Performance Report

Merging #3084 will not alter performance

Comparing oeklo:feat/streams-from-catalog (e336f31) with main (fa02124)

Summary

✅ 8 untouched benchmarks

@rafalkrupinski rafalkrupinski force-pushed the feat/streams-from-catalog branch 3 times, most recently from d106420 to f8f68dd Compare June 9, 2025 18:06
@rafalkrupinski
Contributor Author

Cookiecutter check seems to be failing on code that I didn't touch.
I've no idea what "semantic PR" check wants, it's already semantic

@edgarrmondragon edgarrmondragon changed the title introduce function in Tap to convert input catalog to streams feat: Introduce function in Tap to convert input catalog to streams Jun 9, 2025
@edgarrmondragon
Collaborator

Cookiecutter check seems to be failing on code that I didn't touch. I've no idea what "semantic PR" check wants, it's already semantic

I think it's just complaining about some whitespace.

Collaborator

@edgarrmondragon edgarrmondragon left a comment


hey @rafalkrupinski, thanks for the PR!

Would it make sense to start using the new method for SQL taps, for example?

@rafalkrupinski rafalkrupinski force-pushed the feat/streams-from-catalog branch from c0c88c3 to e6ad82b Compare June 10, 2025 07:55
@rafalkrupinski
Contributor Author

Would it make sense to start using the new method for SQL taps, for example?

That depends on whether a tap can recreate a stream instance from the catalog entry. At first glance, SQLTap can, since it includes sufficient metadata (table and schema name) in the catalog.

I'll take a look
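
As a rough illustration of the idea, recovering a table reference from a catalog entry could look like the sketch below. The entry layout is an assumption and is not taken from this PR: a `table_name` field, a stream-level metadata item with an empty breadcrumb, and a `schema-name` key. The helper name `table_ref_from_entry` is hypothetical.

```python
from __future__ import annotations


def table_ref_from_entry(entry: dict) -> tuple[str | None, str]:
    """Return (schema_name, table_name) from a catalog entry dict.

    Assumes a Singer-style entry; the key names are assumptions here.
    """
    # Fall back to the stream ID when no explicit table name is present.
    table_name = entry.get("table_name") or entry["tap_stream_id"]

    schema_name = None
    for item in entry.get("metadata", []):
        # Stream-level metadata is assumed to use an empty breadcrumb.
        if item.get("breadcrumb") == []:
            schema_name = item.get("metadata", {}).get("schema-name")
            break

    return schema_name, table_name
```

If both values can be recovered this way, a stream instance could in principle be constructed without running discovery against the database.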

Rafał Krupiński and others added 4 commits July 30, 2025 08:24
…am objects, without discovery.

This clarifies whether the tap should perform discovery or use the input catalog, the two being mutually exclusive.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…er.library_name}}/tap.py

Co-authored-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
…er.library_name}}/tap.py

Co-authored-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
@rafalkrupinski rafalkrupinski force-pushed the feat/streams-from-catalog branch from 56aee84 to e336f31 Compare July 30, 2025 06:24
@read-the-docs-community

Documentation build overview

📚 Meltano SDK | 🛠️ build #29025657 (e336f31) | 🔍 preview

Files changed

Comparing with latest (fa02124...e336f31)

Show files (3) | 3 modified | 0 added | 0 deleted
File Status
genindex.html 📝 modified
classes/singer_sdk.SQLTap.html 📝 modified
classes/singer_sdk.Tap.html 📝 modified

@edgarrmondragon
Collaborator

Ugh, since this PR is coming from an org GitHub is not letting me push changes 😮‍💨

@rafalkrupinski
Contributor Author

I'm sorry, I won't be able to continue working on this. Would you like to take over, or should we just close it?

@ReubenFrankel
Contributor

FWIW, I'm still interested in this

@edgarrmondragon
Collaborator

FWIW, I'm still interested in this

@ReubenFrankel do you wanna take over from the HEAD of this PR?

@rafalkrupinski
Contributor Author

@edgarrmondragon should I transfer the repo to you?

@edgarrmondragon
Collaborator

3 participants