Description
The pipeline step Source_NewDataToCopy calls the stored procedure dvtosql.source_GetNewDataToCopy. The stored procedure selects the database tables to copy based on the tables listed in the JobInfo_*.info files that are created in the storage account folder deltalake/conversionresults. An *.info file is selected only if the queuetime inside it is later than the time the last pipeline run ended, which is stored in the lastdatetimemarker column of the target database table dvtosql._controltableforcopy.
The issue I see is that additional *.info files are created while the pipeline is running, and the queuetime within those files is earlier than the time the pipeline completes. The database tables listed in those new *.info files are therefore not picked up during the next pipeline run, and the changes in those tables never make it into the downstream target database until the table is modified again within FnO.
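To make the gap concrete, here is a minimal sketch of the selection rule as I understand it. It is not the actual stored procedure: the function name, the file names, and all timestamps are hypothetical, and the only thing taken from the pipeline is the rule "queuetime must be later than the last run's end time".

```python
from datetime import datetime, timezone


def select_new_info_files(info_files, last_marker):
    """Hypothetical stand-in for the filter: keep files queued after the marker."""
    return [f["name"] for f in info_files if f["queuetime"] > last_marker]


def utc(hour, minute):
    # Illustrative timestamps only; the date is arbitrary.
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Marker left behind by an earlier run (hypothetical).
marker_before_run1 = utc(9, 0)

# Run 1 starts at 10:00 and sees one file queued at 09:30.
files_at_run1_start = [{"name": "JobInfo_A.info", "queuetime": utc(9, 30)}]
run1 = select_new_info_files(files_at_run1_start, marker_before_run1)

# While run 1 is still in progress, a second file is queued at 10:05.
files_at_run2_start = files_at_run1_start + [
    {"name": "JobInfo_B.info", "queuetime": utc(10, 5)}
]

# Run 1 ends at 10:20 and that end time becomes the new marker.
marker_after_run1 = utc(10, 20)
run2 = select_new_info_files(files_at_run2_start, marker_after_run1)

print(run1)  # ['JobInfo_A.info']
print(run2)  # [] because JobInfo_B.info was queued before the new marker
```

In this sketch JobInfo_B.info is seen by neither run: it did not exist when run 1 selected its files, and its queuetime is already older than the marker when run 2 starts.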
Has anyone using this pipeline come across this issue? I can provide additional information if needed.
As an example:
Table dvtosql._controltableforcopy shows that the last time the pipeline completed was 2024-11-11 12:58:13 UTC.
During the next pipeline run, the stored procedure dvtosql.source_GetNewDataToCopy should return the database tables within the following 3 *.info files. (Note that these times are local, UTC-5.)
However, the contents of the first info file show QueueTime = 2024-11-11 12:48:34 UTC, which corresponds to the previous info file, created at 7:48 AM (12:48 UTC). Since this QueueTime is earlier than the lastdatetimemarker in _controltableforcopy, the info file is not parsed and the tables within it are not loaded into the downstream target database.
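The timestamps in this example can be checked with a few lines of Python. This is only an arithmetic check of the values quoted above, assuming the comparison is a strict "later than"; the variable names are mine and are not taken from the pipeline.

```python
from datetime import datetime, timedelta, timezone

# Local time zone used for the file listing (UTC-5).
local_tz = timezone(timedelta(hours=-5))

# Values quoted in this report.
last_marker = datetime(2024, 11, 11, 12, 58, 13, tzinfo=timezone.utc)
queue_time = datetime(2024, 11, 11, 12, 48, 34, tzinfo=timezone.utc)
previous_file_local = datetime(2024, 11, 11, 7, 48, tzinfo=local_tz)

# 7:48 AM local converts to 12:48 UTC, matching the QueueTime minute.
previous_file_utc = previous_file_local.astimezone(timezone.utc)

# Assumed filter: a file is selected only if it was queued after the marker.
is_selected = queue_time > last_marker

# How far the QueueTime falls behind the marker.
gap = last_marker - queue_time

print(previous_file_utc.strftime("%H:%M"))  # 12:48
print(is_selected)                          # False
print(gap)                                  # 0:09:39
```

The QueueTime is 9 minutes 39 seconds older than the marker, so under that comparison the file is skipped.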
Thank you for any insight into this.
Seth


