Description
Background
Currently, we filter out _fivetran_deleted rows at the staging layer. Because these rows never flow downstream, incremental models cannot see them and so cannot handle the deletions properly.
For example, in cases where a transaction is deleted in the source, the deletion does not propagate to the final models because the _fivetran_deleted rows are removed early in the pipeline.
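The behavior described above can be reproduced in a small, self-contained sketch. It uses SQLite in place of the warehouse and `insert or replace` in place of an incremental merge, both of which are assumptions made for illustration. The table names (`source_transactions`, `final_transactions`) are hypothetical and are not models from this package.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table source_transactions (
        transaction_id integer primary key,
        amount real,
        _fivetran_deleted integer  -- 0 = live, 1 = soft-deleted
    );
    create table final_transactions (
        transaction_id integer primary key,
        amount real
    );
""")


def run_incremental(conn):
    """Upsert into the final table, filtering deleted rows at 'staging'."""
    conn.execute("""
        insert or replace into final_transactions (transaction_id, amount)
        select transaction_id, amount
        from source_transactions
        where _fivetran_deleted = 0  -- current behavior: early filter
    """)


# First run: two live transactions.
conn.executemany(
    "insert into source_transactions values (?, ?, ?)",
    [(1, 10.0, 0), (2, 25.0, 0)],
)
run_incremental(conn)

# Transaction 2 is soft-deleted in the source, then the model runs again.
conn.execute(
    "update source_transactions set _fivetran_deleted = 1 where transaction_id = 2"
)
run_incremental(conn)

stale_ids = [
    row[0]
    for row in conn.execute(
        "select transaction_id from final_transactions order by transaction_id"
    )
]
print(stale_ids)  # [1, 2]: the deleted transaction is still present
```

Because the flagged row is dropped before the upsert, nothing ever tells the final table that transaction 2 should go away. Only a full rebuild would remove it.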
Proposed Solution
Update the _fivetran_deleted filtering strategy to defer the removal of these rows to downstream transformations. This change would allow incremental models to process deletions correctly while preserving the ability to exclude deleted rows in the final outputs.
To do
- Requires updating all models downstream of staging to ensure _fivetran_deleted rows are handled appropriately.
- Needs validation to ensure no unintended side effects, such as retaining deleted rows in final outputs.
Steps to Implement
- Remove the _fivetran_deleted filtering logic from staging models.
- Update downstream models to explicitly filter _fivetran_deleted rows where necessary.
- Write tests to confirm that deleted rows are processed correctly in incremental and full-refresh scenarios.
Additional Context
This change is proposed as an alternative solution to address incremental data quality issues, particularly for users who cannot schedule full-refreshes.
Open Questions
- What performance implications might this change have on larger datasets?