Skip to content

refactor: Fix MRO of schema stream attribute#3621

Draft
edgarrmondragon wants to merge 1 commit into
mainfrom
refactor/schema-type
Draft

refactor: Fix MRO of schema stream attribute#3621
edgarrmondragon wants to merge 1 commit into
mainfrom
refactor/schema-type

Conversation

@edgarrmondragon
Copy link
Copy Markdown
Collaborator

@edgarrmondragon edgarrmondragon commented May 6, 2026

Summary by Sourcery

Refactor stream schema handling to use a descriptor-based StreamSchema on the base Stream class and fix method resolution order so schema access works consistently for instances and subclasses.

New Features:

  • Introduce a class-level schema descriptor on Stream backed by StreamSchema and a fallback instance schema descriptor.

Bug Fixes:

  • Ensure the schema attribute on streams correctly resolves via MRO and falls back to instance-level schema when no class-level schema is defined.
  • Prevent accidental mutation of stream schema by making the schema descriptor read-only on instances and raising an error if no schema source is configured.

Enhancements:

  • Add type-safe overloads and Self typing to the StreamSchema descriptor for better static type checking.
  • Improve error messages when schemas are missing for streams or when no schema source is configured.
  • Simplify SQLStream and filesystem stream schema access by relying on the unified descriptor mechanism instead of custom properties.

@edgarrmondragon edgarrmondragon self-assigned this May 6, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 6, 2026

Reviewer's Guide

Refactors how stream schemas are exposed by turning schema into a descriptor on Stream with proper MRO behavior, adds a fallback descriptor for instance-level schemas, tightens schema source handling, and simplifies SQL and filesystem streams to rely on the new mechanism.

Class diagram for the refactored stream schema descriptor system

classDiagram
    direction LR

    class SchemaSource~_TKey~ {
        <<abstract>>
        +preprocess_schema(raw_schema: Schema) Schema
        +get_schema(key: _TKey, key_properties: Sequence~str~) Schema
    }

    class StreamSchema~_TKey~ {
        -schema_source: SchemaSource~_TKey~ | None
        -key: _TKey | None
        +__init__(schema_source: SchemaSource~_TKey~ | None = None, key: _TKey | None = None)
        +__get__(obj: Stream | None, objtype: type~Stream~ | None = None) Schema | StreamSchema~_TKey~
        +__set__(obj: Stream, _value: object) void
        +get_stream_schema(stream: Stream, stream_class: type~Stream~) Schema
    }

    class _InstanceSchemaDescriptor {
        <<descriptor>>
        +__init__() 
        +get_stream_schema(stream: Stream, stream_class: type~Stream~) Schema
    }

    class Stream {
        <<abstract>>
        +schema: StreamSchema~any~ $class
        -_schema: dict | None
        +__init__(tap: Tap, schema: dict | None, name: str)
        +primary_keys: Sequence~str~
        +schema_filepath: Path | Traversable | None
    }

    class SQLStream {
        <<abstract>>
        +connector_class: type~SQLConnector~
        +supports_nulls_first: bool
        +__init__(tap: Tap, catalog_entry: CatalogEntry, name: str)
        +metadata: MetadataMapping
        +tap_stream_id: str
    }

    class FilesystemStream {
        +filesystem: AbstractFileSystem
        -_schema: dict | None
        +__init__(tap: Tap, name: str, filesystem: AbstractFileSystem)
        +_get_full_schema() dict~str, any~
    }

    class DiscoveryError {
    }

    StreamSchema~_TKey~ --> SchemaSource~_TKey~ : uses
    _InstanceSchemaDescriptor --|> StreamSchema~str~ : extends
    Stream "1" --> "1" StreamSchema~any~ : schema
    SQLStream --|> Stream : extends
    FilesystemStream --|> Stream : extends

    StreamSchema~_TKey~ ..> DiscoveryError : may raise
    _InstanceSchemaDescriptor ..> DiscoveryError : may raise
Loading

File-Level Changes

Change Details Files
Introduce a proper descriptor-based StreamSchema interface with overloads, read-only behavior, and explicit handling when no schema source is configured.
  • Add Self import with Python version compatibility handling.
  • Make SchemaSource.schema parameter optional in the descriptor initializer.
  • Add overloads to StreamSchema.__get__ to distinguish class vs instance access and return types.
  • Update __get__ to handle class-level access by returning the descriptor and to call get_stream_schema for instances with a defaulted objtype.
  • Introduce __set__ on StreamSchema that always raises AttributeError to enforce read-only instance schema.
  • Update get_stream_schema to raise DiscoveryError when no schema source is configured instead of assuming one exists.
singer_sdk/schema/source.py
Provide a fallback schema descriptor on the base Stream class to restore expected MRO behavior and centralize instance _schema access.
  • Add _InstanceSchemaDescriptor subclass of StreamSchema that uses the instance _schema attribute and raises DiscoveryError if it is missing.
  • Attach schema: ClassVar[StreamSchema[Any]] = _InstanceSchemaDescriptor() on Stream with updated class-level docstring explaining usage and configuration options.
  • Remove the old @property schema implementation on Stream that directly exposed _schema and raised DiscoveryError there.
  • Import StreamSchema into the core stream module for type clarity.
singer_sdk/schema/source.py
singer_sdk/streams/core.py
Align SQL and filesystem streams with the new schema descriptor model, removing redundant schema caching and properties.
  • Remove _cached_schema and the cached schema property from SQLStream, relying on the base Stream.schema descriptor instead.
  • Change SQLStream.__init__ to pass schema=CatalogEntry.from_dict(catalog_entry).schema into the base Stream constructor rather than self.schema to avoid recursion and descriptor issues.
  • In FilesystemStream, stop overriding schema and instead set self._schema in __init__ using _get_full_schema, letting the base descriptor expose it.
  • Remove now-unneeded cached property and override of schema in the filesystem stream implementation.
singer_sdk/sql/stream.py
singer_sdk/contrib/filesystem/stream.py

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented May 6, 2026

Documentation build overview

📚 Meltano SDK | 🛠️ Build #32587714 | 📁 Comparing ec5aa1e against latest (5cd8e86)

  🔍 Preview build  

3 files changed
± genindex.html
± classes/singer_sdk.Stream.html
± classes/singer_sdk.sql.SQLStream.html

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 69.23077% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.83%. Comparing base (8a4f294) to head (ec5aa1e).

Files with missing lines Patch % Lines
singer_sdk/schema/source.py 63.63% 6 Missing and 2 partials ⚠️

❌ Your patch check has failed because the patch coverage (69.23%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3621      +/-   ##
==========================================
- Coverage   93.86%   93.83%   -0.03%     
==========================================
  Files          73       73              
  Lines        5947     5955       +8     
  Branches      729      732       +3     
==========================================
+ Hits         5582     5588       +6     
  Misses        271      271              
- Partials       94       96       +2     
Flag Coverage Δ
core 82.38% <61.53%> (+0.02%) ⬆️
end-to-end 75.21% <61.53%> (-0.07%) ⬇️
optional-components 42.88% <50.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 6, 2026

Merging this PR will not alter performance

✅ 8 untouched benchmarks


Comparing refactor/schema-type (ec5aa1e) with main (8a4f294)

Open in CodSpeed

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
@edgarrmondragon edgarrmondragon force-pushed the refactor/schema-type branch from 137b41f to ec5aa1e Compare May 7, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant