Skip to content

refactor: Insert then delete#3453

Draft
edgarrmondragon wants to merge 1 commit into
sdk-log-based-hard-deletefrom
sdk-log-based-hard-delete-upsert
Draft

refactor: Insert then delete#3453
edgarrmondragon wants to merge 1 commit into
sdk-log-based-hard-deletefrom
sdk-log-based-hard-delete-upsert

Conversation

@edgarrmondragon
Copy link
Copy Markdown
Collaborator

@edgarrmondragon edgarrmondragon commented Jan 15, 2026

Summary by Sourcery

Refactor SQL hard-delete handling to upsert then delete based on the soft delete column for LOG_BASED replication.

New Features:

  • Introduce a generic bulk_upsert_records method on SQLSink for database-specific bulk upsert implementations using the table schema.

Bug Fixes:

  • Ensure hard delete processing works even when records already exist by upserting before deleting rows marked as deleted.

Enhancements:

  • Simplify SQLSink hard delete flow by removing record-splitting logic and delegating deletion to a connector-level method using the soft delete column.
  • Replace key-based deletion with a connector method that deletes all rows where the soft delete column is set, improving consistency and reducing required inputs.

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Jan 15, 2026

Reviewer's Guide

Refactors SQL hard delete handling to use an upsert-then-delete workflow for LOG_BASED replication, simplifying deletion logic to rely on the _sdc_deleted_at column and adding a generic bulk_upsert_records implementation using SQLite-style INSERT OR REPLACE.

Sequence diagram for hard_delete upsert-then-delete workflow in SQLSink.process_batch

sequenceDiagram
    actor Tap
    participant SQLSink
    participant SQLConnector
    participant Database

    Tap->>SQLSink: process_batch(context)
    SQLSink->>SQLSink: config.hard_delete == True

    SQLSink->>SQLSink: bulk_upsert_records(full_table_name, schema, records)
    SQLSink->>SQLSink: conform_schema(schema)
    SQLSink->>SQLSink: conform_record(record) * for each record
    SQLSink->>SQLConnector: _connect()
    SQLConnector-->>SQLSink: Connection
    SQLSink->>Database: INSERT OR REPLACE INTO full_table_name VALUES(records)
    Database-->>SQLSink: rowcount

    SQLSink->>SQLSink: hard_delete_records()
    SQLSink->>SQLConnector: delete_where_sdc_deleted_at_is_not_null(full_table_name, sdc_deleted_at_column)
    SQLConnector->>SQLConnector: _connect()
    SQLConnector->>Database: DELETE FROM full_table_name WHERE sdc_deleted_at_column IS NOT NULL
    Database-->>SQLConnector: rowcount
    SQLConnector-->>SQLSink: rows_deleted

    SQLSink-->>Tap: batch processed
Loading

Class diagram for updated SQLSink and SQLConnector hard delete and upsert behavior

classDiagram
    class SQLSink {
        +connector_class: type~_C~
        +connector: SQLConnector
        +full_table_name: str | FullyQualifiedName
        +schema: dict
        +soft_delete_column_name: str
        +key_properties: list~str~
        +process_batch(context: dict) void
        +bulk_insert_records(full_table_name: str | FullyQualifiedName, schema: dict, records: Iterable~dict~) int
        +bulk_upsert_records(full_table_name: str | FullyQualifiedName, schema: dict, records: Iterable~dict~) int
        +hard_delete_records() int
        +conform_schema(schema: dict) dict
        +conform_record(record: dict) dict
        +conform_name(name: str, kind: str) str
    }

    class SQLConnector {
        +delete_where_sdc_deleted_at_is_not_null(full_table_name: str | FullyQualifiedName, sdc_deleted_at_column: str) int
        +_connect() Connection
    }

    SQLSink --> SQLConnector: uses connector

    note for SQLSink "When config.hard_delete is true, process_batch calls bulk_upsert_records then hard_delete_records"

    note for SQLConnector "hard_delete_records in SQLSink delegates to delete_where_sdc_deleted_at_is_not_null, which deletes rows where _sdc_deleted_at is not null"
Loading

File-Level Changes

IS NOT NULL statement.
  • Update SQLSink.hard_delete_records to call the new connector method using the conformed soft delete column name.
  • Change Details Files
    Refactor hard-delete processing in SQLSink to upsert all records first and then delete rows with _sdc_deleted_at set, instead of splitting records into separate insert/delete paths.
    • Change process_batch to call bulk_upsert_records followed by hard_delete_records when hard_delete is enabled, removing conditional branching on key_properties and record splitting.
    • Remove the _split_records_for_hard_delete helper and its logic for classifying records into delete vs insert sets based on the soft delete column.
    • Simplify hard_delete_records to perform a single delete based on the soft delete column being not null, no longer requiring key_properties or per-record key values.
    singer_sdk/sql/sink.py
    Introduce a default bulk_upsert_records implementation on SQLSink using SQLite INSERT OR REPLACE semantics for hard-delete workflows.
    • Add bulk_upsert_records method that conforms schema and records, fills missing properties with None, and builds a parameterized INSERT OR REPLACE statement.
    • Execute the upsert SQL in a transaction using the connector connection and return affected row count.
    • Log the generated upsert SQL for observability.
    singer_sdk/sql/sink.py
    Replace generic delete-by-key connector API with a specialized delete_where_sdc_deleted_at_is_not_null method that deletes all rows flagged as soft-deleted.
    • Remove delete_by_key, including its dynamic OR-of-AND primary-key condition builder and bound parameter construction.
    • Add delete_where_sdc_deleted_at_is_not_null that takes a conformed soft delete column name and issues a single DELETE ... WHERE
    singer_sdk/sql/connector.py
    singer_sdk/sql/sink.py
    Minor style and snapshot updates to align with linting and new behavior.
    • Annotate SQLSink class with noqa: PLR0904 to suppress too-many-public-methods lint warnings.
    • Update SQLite singer.log snapshot to reflect the new insert-then-delete behavior and logging output.
    singer_sdk/sql/sink.py
    tests/packages/snapshots/test_target_sqlite/test_sqlite_activate_version/singer.log

    Possibly linked issues

    • #feat: hard_delete capability does not consider the presence of non-null _sdc_deleted_at values generated during LOG_BASED replication: PR implements hard_delete for log-based replication by upserting then deleting rows with non-null _sdc_deleted_at.

    Tips and commands

    Interacting with Sourcery

    • Trigger a new review: Comment @sourcery-ai review on the pull request.
    • Continue discussions: Reply directly to Sourcery's review comments.
    • Generate a GitHub issue from a review comment: Ask Sourcery to create an
      issue from a review comment by replying to it. You can also reply to a
      review comment with @sourcery-ai issue to create an issue from it.
    • Generate a pull request title: Write @sourcery-ai anywhere in the pull
      request title to generate a title at any time. You can also comment
      @sourcery-ai title on the pull request to (re-)generate the title at any time.
    • Generate a pull request summary: Write @sourcery-ai summary anywhere in
      the pull request body to generate a PR summary at any time exactly where you
      want it. You can also comment @sourcery-ai summary on the pull request to
      (re-)generate the summary at any time.
    • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
      request to (re-)generate the reviewer's guide at any time.
    • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
      pull request to resolve all Sourcery comments. Useful if you've already
      addressed all the comments and don't want to see them anymore.
    • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
      request to dismiss all existing Sourcery reviews. Especially useful if you
      want to start fresh with a new review - don't forget to comment
      @sourcery-ai review to trigger a new review!

    Customizing Your Experience

    Access your dashboard to:

    • Enable or disable review features such as the Sourcery-generated pull request
      summary, the reviewer's guide, and others.
    • Change the review language.
    • Add, remove or edit custom review instructions.
    • Adjust other review settings.

    Getting Help

    @read-the-docs-community
    Copy link
    Copy Markdown

    Documentation build overview

    📚 Meltano SDK | 🛠️ Build #31022323 | 📁 Comparing 576e114 against latest (49c0370)


    🔍 Preview build

    Show files changed (75 files in total): 📝 75 modified | ➕ 0 added | ➖ 0 deleted
    File Status
    capabilities.html 📝 modified
    genindex.html 📝 modified
    testing.html 📝 modified
    typing.html 📝 modified
    classes/singer_sdk.BatchSink.html 📝 modified
    classes/singer_sdk.GraphQLStream.html 📝 modified
    classes/singer_sdk.InlineMapper.html 📝 modified
    classes/singer_sdk.RESTStream.html 📝 modified
    classes/singer_sdk.RecordSink.html 📝 modified
    classes/singer_sdk.Sink.html 📝 modified
    classes/singer_sdk.Stream.html 📝 modified
    classes/singer_sdk.Tap.html 📝 modified
    classes/singer_sdk.Target.html 📝 modified
    classes/singer_sdk.authenticators.APIAuthenticatorBase.html 📝 modified
    classes/singer_sdk.authenticators.APIKeyAuthenticator.html 📝 modified
    classes/singer_sdk.authenticators.BasicAuthenticator.html 📝 modified
    classes/singer_sdk.authenticators.BearerTokenAuthenticator.html 📝 modified
    classes/singer_sdk.authenticators.OAuthAuthenticator.html 📝 modified
    classes/singer_sdk.authenticators.OAuthJWTAuthenticator.html 📝 modified
    classes/singer_sdk.authenticators.SimpleAuthenticator.html 📝 modified
    classes/singer_sdk.batch.BaseBatcher.html 📝 modified
    classes/singer_sdk.batch.JSONLinesBatcher.html 📝 modified
    classes/singer_sdk.exceptions.ConfigValidationError.html 📝 modified
    classes/singer_sdk.exceptions.FatalAPIError.html 📝 modified
    classes/singer_sdk.exceptions.InvalidStreamSortException.html 📝 modified
    classes/singer_sdk.exceptions.MapExpressionError.html 📝 modified
    classes/singer_sdk.exceptions.MaxRecordsLimitException.html 📝 modified
    classes/singer_sdk.exceptions.RecordsWithoutSchemaException.html 📝 modified
    classes/singer_sdk.exceptions.RetriableAPIError.html 📝 modified
    classes/singer_sdk.exceptions.StreamMapConfigError.html 📝 modified
    classes/singer_sdk.exceptions.TapStreamConnectionFailure.html 📝 modified
    classes/singer_sdk.exceptions.TooManyRecordsException.html 📝 modified
    classes/singer_sdk.pagination.BaseAPIPaginator.html 📝 modified
    classes/singer_sdk.pagination.BaseHATEOASPaginator.html 📝 modified
    classes/singer_sdk.pagination.BaseOffsetPaginator.html 📝 modified
    classes/singer_sdk.pagination.BasePageNumberPaginator.html 📝 modified
    classes/singer_sdk.pagination.HeaderLinkPaginator.html 📝 modified
    classes/singer_sdk.pagination.JSONPathPaginator.html 📝 modified
    classes/singer_sdk.pagination.LegacyPaginatedStreamProtocol.html 📝 modified
    classes/singer_sdk.pagination.LegacyStreamPaginator.html 📝 modified
    classes/singer_sdk.pagination.SimpleHeaderPaginator.html 📝 modified
    classes/singer_sdk.pagination.SinglePagePaginator.html 📝 modified
    classes/singer_sdk.sql.SQLConnector.html 📝 modified
    classes/singer_sdk.sql.SQLSink.html 📝 modified
    classes/singer_sdk.sql.SQLStream.html 📝 modified
    classes/singer_sdk.sql.SQLTap.html 📝 modified
    classes/singer_sdk.sql.SQLTarget.html 📝 modified
    classes/singer_sdk.sql.connector.JSONSchemaToSQL.html 📝 modified
    classes/singer_sdk.sql.connector.SQLToJSONSchema.html 📝 modified
    classes/typing/singer_sdk.typing.ArrayType.html 📝 modified
    classes/typing/singer_sdk.typing.BooleanType.html 📝 modified
    classes/typing/singer_sdk.typing.Constant.html 📝 modified
    classes/typing/singer_sdk.typing.CustomType.html 📝 modified
    classes/typing/singer_sdk.typing.DateTimeType.html 📝 modified
    classes/typing/singer_sdk.typing.DateType.html 📝 modified
    classes/typing/singer_sdk.typing.DecimalType.html 📝 modified
    classes/typing/singer_sdk.typing.DiscriminatedUnion.html 📝 modified
    classes/typing/singer_sdk.typing.DurationType.html 📝 modified
    classes/typing/singer_sdk.typing.EmailType.html 📝 modified
    classes/typing/singer_sdk.typing.HostnameType.html 📝 modified
    classes/typing/singer_sdk.typing.IPv4Type.html 📝 modified
    classes/typing/singer_sdk.typing.IPv6Type.html 📝 modified
    classes/typing/singer_sdk.typing.IntegerType.html 📝 modified
    classes/typing/singer_sdk.typing.JSONPointerType.html 📝 modified
    classes/typing/singer_sdk.typing.ObjectType.html 📝 modified
    classes/typing/singer_sdk.typing.OneOf.html 📝 modified
    classes/typing/singer_sdk.typing.PropertiesList.html 📝 modified
    classes/typing/singer_sdk.typing.Property.html 📝 modified
    classes/typing/singer_sdk.typing.RegexType.html 📝 modified
    classes/typing/singer_sdk.typing.RelativeJSONPointerType.html 📝 modified
    classes/typing/singer_sdk.typing.StringType.html 📝 modified
    classes/typing/singer_sdk.typing.TimeType.html 📝 modified
    classes/typing/singer_sdk.typing.URITemplateType.html 📝 modified
    classes/typing/singer_sdk.typing.URIType.html 📝 modified
    classes/typing/singer_sdk.typing.UUIDType.html 📝 modified

    @codecov
    Copy link
    Copy Markdown

    codecov Bot commented Jan 15, 2026

    Codecov Report

    ❌ Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.
    ✅ Project coverage is 94.13%. Comparing base (051b3d5) to head (576e114).

    Files with missing lines Patch % Lines
    singer_sdk/sql/sink.py 90.00% 1 Missing and 1 partial ⚠️

    ❌ Your patch check has failed because the patch coverage (91.30%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

    Additional details and impacted files
    @@                      Coverage Diff                      @@
    ##           sdk-log-based-hard-delete    #3453      +/-   ##
    =============================================================
    - Coverage                      94.17%   94.13%   -0.05%     
    =============================================================
      Files                             70       70              
      Lines                           5820     5811       -9     
      Branches                         724      719       -5     
    =============================================================
    - Hits                            5481     5470      -11     
    - Misses                           236      237       +1     
    - Partials                         103      104       +1     
    Flag Coverage Δ
    core 81.67% <17.39%> (+0.12%) ⬆️
    end-to-end 76.45% <91.30%> (-0.08%) ⬇️
    optional-components 43.36% <17.39%> (+0.06%) ⬆️

    Flags with carried forward coverage won't be shown. Click here to find out more.

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    🚀 New features to boost your workflow:
    • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

    @codspeed-hq
    Copy link
    Copy Markdown

    codspeed-hq Bot commented Jan 15, 2026

    Merging this PR will not alter performance

    ✅ 8 untouched benchmarks


    Comparing sdk-log-based-hard-delete-upsert (576e114) with sdk-log-based-hard-delete (051b3d5)

    Open in CodSpeed

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant