NOTICKET (feat): Add performance optimization for snowflake external tables #339
Add performance optimization for Snowflake external tables by limiting the scanned partitions based on ldts
Description
Snowflake has some weaknesses in its external table implementation: it only prunes partitions when the filter on the partition column uses literals (see https://community.snowflake.com/s/article/Pruning-is-not-happening-subquery for details).
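To illustrate why a literal matters, here is a conceptual model, not Snowflake's actual planner. If the cutoff is a constant, the planner can compare it against each partition's ldts range before execution and skip partitions entirely. If the cutoff comes from a subquery, its value is unknown at compile time, so every partition must be scanned. The Partition type, the partitions_to_scan function and the example paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    """One external-table partition, e.g. a folder of parquet files (hypothetical model)."""
    path: str
    min_ldts: date
    max_ldts: date


def partitions_to_scan(partitions: List[Partition], cutoff: Optional[date]) -> List[Partition]:
    """Return the partitions that may contain rows with ldts >= cutoff.

    cutoff=None models a filter whose value is only known at execution time
    (e.g. a subquery): nothing can be pruned up front, so all partitions are kept.
    """
    if cutoff is None:
        return list(partitions)
    # With a literal cutoff, any partition whose newest ldts is older can be skipped.
    return [p for p in partitions if p.max_ldts >= cutoff]


parts = [
    Partition("ldts=2024-01", date(2024, 1, 1), date(2024, 1, 31)),
    Partition("ldts=2024-02", date(2024, 2, 1), date(2024, 2, 29)),
    Partition("ldts=2024-03", date(2024, 3, 1), date(2024, 3, 31)),
]

# Literal cutoff: only the March partition is scanned.
recent = partitions_to_scan(parts, date(2024, 3, 10))
# Subquery-style cutoff (unknown at compile time): all three partitions are scanned.
everything = partitions_to_scan(parts, None)
```

As the number of parquet files grows, the "unknown cutoff" case scans proportionally more data, which matches the slowdown described below.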
Issue #335 already solved our most critical problem, the runtime of satellites, but links, hubs and tracking satellites have also been getting slower and slower (from seconds to several minutes) as the number of parquet files in our data lake grows.
This PR addresses the problem by adding a new parameter, datavault4dbt.max_days_for_late_arriving_data, and filtering all staging tables on ldts to the last x days configured there. With this feature our dbt run went from 30 minutes to just 5 minutes. It should have no other consequences as long as (1) you run dbt more often than every max_days_for_late_arriving_data days and (2) none of your data sources deliver data later than max_days_for_late_arriving_data. To recreate the whole vault, remove the parameter or set it to a sufficiently high value.
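A minimal sketch of the filtering logic, written in Python rather than the Jinja/SQL used by the package, and assuming the staging column is named ldts. The function name ldts_filter and the exact SQL it renders are hypothetical, not the code in this PR. The key point is that the cutoff is computed before the query is sent and rendered as a string literal, so Snowflake sees a constant it can prune on; an unset parameter produces no filter, i.e. a full reload.

```python
from datetime import date, timedelta
from typing import Optional


def ldts_filter(max_days: Optional[int], today: date) -> str:
    """Render a WHERE clause limiting staging rows to the last max_days days (hypothetical helper).

    max_days=None models an unset max_days_for_late_arriving_data: no filter,
    so the whole vault can be rebuilt.
    """
    if max_days is None:
        return ""
    if max_days < 0:
        raise ValueError("max_days must be non-negative")
    cutoff = today - timedelta(days=max_days)
    # Rendered as a literal (not a subquery), so partition pruning can apply.
    return f"WHERE ldts >= '{cutoff.isoformat()}'::timestamp"


# Example: with a 3-day window on 2024-03-10 the cutoff literal is 2024-03-07.
clause = ldts_filter(3, date(2024, 3, 10))
```

This also shows the trade-off from the description: any row whose ldts falls before the cutoff is ignored, so runs must be more frequent than the window and sources must not deliver data later than it.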
Type of change
How Has This Been Tested?
Test Configuration:
Checklist: