feat(targets): Use the presence of _sdc_deleted_at to flag records for deletion in hard-delete mode #3450

Draft
edgarrmondragon wants to merge 1 commit into main from sdk-log-based-hard-delete

Conversation

@edgarrmondragon edgarrmondragon commented Jan 14, 2026

Related

Summary by Sourcery

Implement hard-delete handling for log-based replication using the _sdc_deleted_at flag and expose a key-based delete API for SQL connectors.

New Features:

  • Add support in SQL sinks for hard deleting records marked with _sdc_deleted_at during LOG_BASED replication when hard_delete is enabled.
  • Introduce a connector-level delete_by_key method to delete rows by primary key across SQL targets.

Enhancements:

  • Preserve _sdc_deleted_at metadata on records when hard_delete is enabled so it can be used to drive hard deletes.

Tests:

  • Add SQLite target tests covering hard delete and soft delete behavior for log-based replication, including composite primary key scenarios.

@edgarrmondragon edgarrmondragon added this to the v0.54 milestone Jan 14, 2026
@edgarrmondragon edgarrmondragon added the SQL (Support for SQL taps and targets), Type/Target (Singer targets), and Incremental Replication (State, replication keys, etc.) labels Jan 14, 2026
@edgarrmondragon edgarrmondragon self-assigned this Jan 14, 2026
sourcery-ai Bot commented Jan 14, 2026

Reviewer's Guide

Implements hard-delete handling for LOG_BASED replication by treating records with _sdc_deleted_at as delete events when hard_delete=True, wiring this through the SQL sink/connector and adding SQLite tests including composite keys and soft-delete behavior.

Sequence diagram for hard-delete handling in SQL sink batch processing

sequenceDiagram
    actor TapProcess
    participant SQLSink
    participant SQLConnector
    participant Database

    TapProcess->>SQLSink: process_batch(context)
    SQLSink->>SQLSink: records = context["records"]
    SQLSink->>SQLSink: _split_records_for_hard_delete(records)
    SQLSink-->>SQLSink: records_to_delete, records_to_insert

    alt hard_delete enabled and key_properties set and records_to_delete not empty
        SQLSink->>SQLSink: hard_delete_records(records_to_delete)
        SQLSink->>SQLConnector: delete_by_key(full_table_name, key_columns, key_values)
        SQLConnector->>SQLConnector: build DELETE statement and bind values
        SQLConnector->>Database: execute DELETE ... WHERE (key_conditions)
        Database-->>SQLConnector: rowcount
        SQLConnector-->>SQLSink: number_deleted
        SQLSink->>SQLSink: log deletion count
    end

    alt records_to_insert not empty
        SQLSink->>SQLSink: bulk_insert_records(full_table_name, schema, records_to_insert)
        SQLSink->>Database: INSERT rows
        Database-->>SQLSink: insert result
    end

    TapProcess-->>TapProcess: batch complete
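
The flow in the diagram above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SDK's implementation: `split_records_for_hard_delete`, `process_batch`, and the `delete_rows` / `insert_rows` callables are hypothetical stand-ins for the sink and connector methods named in the diagram, and the soft-delete column is assumed to be `_sdc_deleted_at`.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Sequence

SOFT_DELETE_COLUMN = "_sdc_deleted_at"  # assumed column name, taken from the PR title


def split_records_for_hard_delete(
    records: Iterable[dict],
) -> tuple[list[dict], list[dict]]:
    """Return (records_to_delete, records_to_insert)."""
    to_delete: list[dict] = []
    to_insert: list[dict] = []
    for record in records:
        # A non-null _sdc_deleted_at marks the record as a delete event.
        if record.get(SOFT_DELETE_COLUMN) is not None:
            to_delete.append(record)
        else:
            to_insert.append(record)
    return to_delete, to_insert


def process_batch(
    records: Sequence[dict],
    *,
    hard_delete: bool,
    key_properties: Sequence[str],
    delete_rows: Callable[[list[dict]], int],  # hypothetical connector hook
    insert_rows: Callable[[list[dict]], None],  # hypothetical bulk insert hook
) -> int:
    """Hypothetical batch handler; returns the number of rows deleted."""
    if not (hard_delete and key_properties):
        # Without hard_delete or key properties, every record is inserted.
        insert_rows(list(records))
        return 0

    to_delete, to_insert = split_records_for_hard_delete(records)
    deleted = 0
    if to_delete:
        # Only the key values are needed to identify the rows to remove.
        keys = [{k: r[k] for k in key_properties} for r in to_delete]
        deleted = delete_rows(keys)
    if to_insert:
        insert_rows(to_insert)
    return deleted
```

Passing the delete and insert steps in as callables keeps the branching logic testable without a database; in the PR those roles are played by `hard_delete_records` and `bulk_insert_records`.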

Updated class diagram for SQL sink hard-delete support

classDiagram
    class Sink {
        config: dict
        _remove_sdc_metadata_from_record(record: dict) void
    }

    class SQLSink {
        config: dict
        key_properties: list~str~
        full_table_name: str
        schema: dict
        soft_delete_column_name: str
        logger
        connector: SQLConnector
        process_batch(context: dict) void
        bulk_insert_records(full_table_name: str, schema: dict, records: Iterable~dict~) void
        _split_records_for_hard_delete(records: Iterable~dict~) tuple~list~dict~~ list~dict~~
        hard_delete_records(records: Sequence~dict~) int
        conform_name(name: str, object_type: str) str
        conform_record(record: dict) dict
    }

    class SQLConnector {
        delete_by_key(full_table_name: str, key_columns: Sequence~str~, key_values: Sequence~dict~) int
        _connect() Connection
    }

    Sink <|-- SQLSink
    SQLSink *-- SQLConnector

    %% Metadata handling behavior
    class RecordMetadataBehavior {
        +handles_sdc_deleted_at_based_on_hard_delete_flag
    }

    Sink ..> RecordMetadataBehavior
    SQLSink ..> RecordMetadataBehavior
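
The metadata behavior noted in the class diagram can be illustrated with a small function. This is a sketch only: `remove_sdc_metadata` is a hypothetical name, and it filters on the `_sdc_` prefix and returns a copy for simplicity, whereas the SDK's `_remove_sdc_metadata_from_record` may work from a fixed list of fields and modify the record in place.

```python
def remove_sdc_metadata(record: dict, *, hard_delete: bool) -> dict:
    """Return a copy of ``record`` without ``_sdc_*`` metadata fields.

    When ``hard_delete`` is true, ``_sdc_deleted_at`` is kept so that a
    later step can still recognize the record as a delete event.
    """
    kept = {}
    for name, value in record.items():
        is_metadata = name.startswith("_sdc_")  # assumption: prefix-based match
        keep_delete_flag = hard_delete and name == "_sdc_deleted_at"
        if is_metadata and not keep_delete_flag:
            continue  # strip every other _sdc_* field
        kept[name] = value
    return kept


record = {
    "id": 1,
    "_sdc_deleted_at": "2026-01-14T00:00:00Z",
    "_sdc_batched_at": "2026-01-14T00:05:00Z",
}
with_flag = remove_sdc_metadata(record, hard_delete=True)
without_flag = remove_sdc_metadata(record, hard_delete=False)
```

Here `with_flag` retains `id` and `_sdc_deleted_at`, while `without_flag` retains only `id`.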

File-Level Changes

Change: Route LOG_BASED delete events to either hard deletes or inserts based on hard_delete and presence of _sdc_deleted_at.
Details:
  • Modify SQLSink.process_batch to branch when hard_delete is enabled and key properties are present, splitting records into those to delete and those to insert.
  • Introduce _split_records_for_hard_delete to conform records, detect non-null _sdc_deleted_at using the conformed soft-delete column name, and return separate delete/insert lists.
  • Call hard_delete_records with conformed records to delete and bulk_insert_records with remaining records, preserving existing insert behavior when hard_delete is disabled or no keys are defined.
Files: singer_sdk/sql/sink.py

Change: Provide a connector-level primitive to delete rows by primary key for use by sinks performing hard deletes.
Details:
  • Add SQLConnector.delete_by_key which builds a parameterized DELETE ... WHERE (k1=...) OR (k1=... AND k2=...) statement over the provided key columns and values.
  • Execute the delete in a transaction using SQLAlchemy text bindings and return the number of rows deleted.
  • Wire SQLSink.hard_delete_records to call connector.delete_by_key with the sink’s full_table_name and key_properties.
Files: singer_sdk/sql/connector.py, singer_sdk/sql/sink.py

Change: Preserve _sdc_deleted_at metadata when hard delete is enabled so that delete events can be detected downstream.
Details:
  • Update _remove_sdc_metadata_from_record to skip removal of _sdc_deleted_at when hard_delete is true, while still stripping other _sdc_* metadata fields.
  • Document in the method docstring that _sdc_deleted_at is kept for LOG_BASED hard deletes.
Files: singer_sdk/sinks/core.py

Change: Add end-to-end SQLite target tests covering hard-delete and soft-delete behavior for LOG_BASED replication, including composite keys.
Details:
  • Add test_sqlite_hard_delete_log_based to verify that records with _sdc_deleted_at are physically removed when hard_delete=True.
  • Add test_sqlite_soft_delete_log_based to confirm that when hard_delete=False, records with _sdc_deleted_at are inserted as normal and _sdc_deleted_at is stripped.
  • Add test_sqlite_hard_delete_composite_key to validate hard delete behavior when the target table has a composite primary key.
Files: tests/packages/test_target_sqlite.py
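
To make the key-based delete concrete, here is a rough sketch of a parameterized delete over a composite primary key. It deliberately uses the standard-library `sqlite3` module instead of SQLAlchemy text bindings so that it runs without extra dependencies; the standalone `delete_by_key` function, the `orders` table, and its columns are illustrative assumptions and do not reproduce the signature or SQL emitted by `SQLConnector.delete_by_key`.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Sequence


def delete_by_key(
    conn: sqlite3.Connection,
    table: str,
    key_columns: Sequence[str],
    key_values: Sequence[dict],
) -> int:
    """Delete rows matching any of the given key dicts; return the row count."""
    if not key_values:
        return 0
    # One parenthesized AND-group per key, joined with OR. Identifiers are
    # interpolated and must come from trusted config; values are bound.
    group = "(" + " AND ".join(f'"{col}" = ?' for col in key_columns) + ")"
    where = " OR ".join([group] * len(key_values))
    params = [row[col] for row in key_values for col in key_columns]
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(f'DELETE FROM "{table}" WHERE {where}', params)
    return cursor.rowcount


# Hypothetical table with a composite primary key (region, id).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (region TEXT, id INTEGER, total REAL, "
    "PRIMARY KEY (region, id))"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("eu", 1, 10.0), ("eu", 2, 20.0), ("us", 1, 30.0)],
)
deleted = delete_by_key(
    conn,
    "orders",
    ["region", "id"],
    [{"region": "eu", "id": 1}, {"region": "us", "id": 1}],
)
remaining = conn.execute("SELECT region, id FROM orders").fetchall()
```

Binding the key values as parameters keeps record data out of the SQL text. A real connector would likely also need to chunk very large key lists to stay under the database's bound-parameter limit.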

Assessment against linked issues

Issue Objective Addressed Explanation
#3444 Implement hard_delete behavior for LOG_BASED replication so that records with non-null _sdc_deleted_at are physically deleted instead of inserted/soft-deleted.
#3444 Apply the LOG_BASED hard_delete behavior at the generic SQL sink/connector level so all SQL-based targets using the SDK can benefit without overriding base classes.
#3444 Add automated tests verifying hard_delete behavior for LOG_BASED replication, including composite primary keys and default (non-hard) delete behavior.

Possibly linked issues


@edgarrmondragon edgarrmondragon changed the title from "feat(sql): Use the presence of _sdc_deleted_at to flag records for deletion in hard-delete mode" to "feat(targets): Use the presence of _sdc_deleted_at to flag records for deletion in hard-delete mode" Jan 14, 2026
read-the-docs-community Bot commented Jan 14, 2026

Documentation build overview

📚 Meltano SDK | 🛠️ Build #31008530 | 📁 Comparing 051b3d5 against latest (86954b7)


🔍 Preview build

Show files changed (3 files in total): 📝 3 modified | ➕ 0 added | ➖ 0 deleted
File Status
genindex.html 📝 modified
classes/singer_sdk.sql.SQLConnector.html 📝 modified
classes/singer_sdk.sql.SQLSink.html 📝 modified

codspeed-hq Bot commented Jan 14, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing sdk-log-based-hard-delete (051b3d5) with main (52b0908) [1]

Summary

✅ 8 untouched benchmarks

Footnotes

  1. No successful run was found on main (86954b7) during the generation of this report, so 52b0908 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

codecov Bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.17%. Comparing base (86954b7) to head (051b3d5).
⚠️ Report is 218 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3450      +/-   ##
==========================================
+ Coverage   94.14%   94.17%   +0.03%     
==========================================
  Files          70       70              
  Lines        5785     5820      +35     
  Branches      716      724       +8     
==========================================
+ Hits         5446     5481      +35     
  Misses        236      236              
  Partials      103      103              
Flag Coverage Δ
core 81.54% <10.52%> (-0.45%) ⬇️
end-to-end 76.52% <100.00%> (+0.14%) ⬆️
optional-components 43.29% <10.52%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@edgarrmondragon edgarrmondragon force-pushed the sdk-log-based-hard-delete branch from 86d110c to f06af41 Compare January 14, 2026 20:50
…for deletion in hard-delete mode

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>

Development

Successfully merging this pull request may close these issues.

feat: hard_delete capability does not consider the presence of non-null _sdc_deleted_at values generated during LOG_BASED replication
