Squisher23

Add performance optimization for Snowflake external tables by limiting the partitions based on ldts

Description

Snowflake's external table implementation has some weaknesses: it only prunes partitions when the filter on the partition column uses literals (see https://community.snowflake.com/s/article/Pruning-is-not-happening-subquery for details).
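To illustrate the difference between the two filter shapes, here is a minimal sketch in Python that renders both predicates as SQL text. The column and table names (`load_date`, `stage_customer`) are hypothetical and are not taken from this PR; the sketch only shows the idea of computing the bound outside the query and passing it in as a literal.

```python
from datetime import date


def subquery_filter(partition_col: str, source_table: str) -> str:
    # Bound is computed inside the query. Per the linked article, this shape
    # does not get partition pruning on external tables.
    return f"{partition_col} >= (SELECT MAX({partition_col}) FROM {source_table})"


def literal_filter(partition_col: str, cutoff: date) -> str:
    # Bound is computed beforehand and rendered as a plain literal, which is
    # the shape the linked article says can be pruned.
    return f"{partition_col} >= '{cutoff.isoformat()}'"


# Hypothetical names, for illustration only.
print(subquery_filter("load_date", "stage_customer"))
print(literal_filter("load_date", date(2024, 1, 31)))
```

In a dbt project the literal would typically be rendered at compile time, so the query that reaches Snowflake contains only the constant.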

Issue #335 already solved our most critical problem, the runtime of satellites, but links, hubs and tracking satellites have also been getting slower and slower (from seconds to several minutes) as the number of Parquet files in our data lake grows.

This PR tries to solve the problem by adding a new parameter, datavault4dbt.max_days_for_late_arriving_data, and filtering all staging tables to the last x days, where x is the configured value. Adding this feature sped up our dbt run from 30 minutes to just 5 minutes. It should have no other consequences as long as the interval between your dbt runs is shorter than max_days_for_late_arriving_data and none of your data sources delivers data later than max_days_for_late_arriving_data. To recreate the whole vault, the parameter needs to be deleted or set to a sufficiently high value.
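The behaviour described above can be sketched roughly as follows. This is not the macro code from this PR; it is a small Python model of the parameter's semantics, assuming the filter is applied on a load-date column (here called `ldts`) and that an unset parameter means "no filter". The helper names `staging_filter` and `is_safe` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def staging_filter(ldts_col: str, max_days: Optional[int], today: date) -> str:
    # Parameter deleted/unset: no filter, so the whole vault can be recreated.
    if max_days is None:
        return ""
    # Otherwise restrict staging to the last max_days days using a literal.
    cutoff = today - timedelta(days=max_days)
    return f"WHERE {ldts_col} >= '{cutoff.isoformat()}'"


def is_safe(max_days: int, days_between_runs: int, max_source_delay_days: int) -> bool:
    # The two conditions stated in the description: dbt runs more often than
    # max_days, and no source delivers data later than max_days.
    return days_between_runs < max_days and max_source_delay_days <= max_days


print(staging_filter("ldts", 7, date(2024, 1, 31)))
print(is_safe(max_days=7, days_between_runs=1, max_source_delay_days=3))
```

A check like `is_safe` is only a way to reason about the setting; the PR itself leaves it to the user to pick a value that covers both the run interval and the latest-arriving source.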

Type of change

Please delete options that are not relevant.

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

  • Tests you ran

Test Configuration:

  • datavault4dbt-Version: 1.98
  • dbt-Version: core, 1.92
  • dbt-adapter-Version: dbt-snowflake 1.91

Checklist:

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation or included information that needs updates (e.g. in the Wiki) -> I don't know how to do that
