DVLinkToSQL_IncrementalCopy Pipeline Issue #368

@sethbs

Description

The pipeline step Source_NewDataToCopy calls the stored procedure dvtosql.source_GetNewDataToCopy. The stored procedure selects the database tables to copy based on the tables listed in the JobInfo_*.info files that are created in the storage account folder deltalake/conversionresults. An *.info file is selected only if the queuetime inside it is greater than the time the last pipeline run ended, which is stored in the lastdatetimemarker column of the target database table dvtosql._controltableforcopy.

The issue I see is that additional *.info files are created between the time the pipeline starts and the time it ends, and the queuetime inside those files is earlier than the time the pipeline completes. The database tables listed in those new *.info files are therefore not picked up during the next pipeline run, and the changes in those tables never make it into the downstream target database until the tables are modified again within FnO.
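The timing gap described above can be sketched in Python. This is only a model of the selection rule as described in this issue, not the actual stored procedure: the `InfoFile` type, the `created` field, the file names, and all timestamps are hypothetical, and the sketch assumes a strict greater-than comparison against a marker set to the run's end time.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class InfoFile(NamedTuple):
    name: str
    queuetime: datetime  # value stored inside the .info file
    created: datetime    # when the file appeared in the folder (hypothetical field)


def utc(hour, minute):
    """Build a UTC timestamp on an arbitrary illustrative day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


def select_info_files(visible_files, last_marker):
    """Model of the rule: keep files whose queuetime is after the marker."""
    return [f.name for f in visible_files if f.queuetime > last_marker]


files = [
    InfoFile("JobInfo_A.info", queuetime=utc(9, 40), created=utc(9, 55)),
    # Created while run 1 is in progress, with a queuetime before run 1 ends.
    InfoFile("JobInfo_B.info", queuetime=utc(9, 58), created=utc(10, 5)),
    InfoFile("JobInfo_C.info", queuetime=utc(10, 18), created=utc(10, 25)),
]

previous_marker = utc(9, 30)
run1_start, run1_end = utc(10, 0), utc(10, 15)
run2_start = utc(11, 0)

# Run 1 only sees files that already exist when the step executes.
run1 = select_info_files([f for f in files if f.created <= run1_start], previous_marker)

# The marker is then advanced to the end of run 1, and run 2 filters against it.
run2 = select_info_files([f for f in files if f.created <= run2_start], run1_end)

missed = [f.name for f in files if f.name not in run1 + run2]
print("run 1:", run1)
print("run 2:", run2)
print("never selected:", missed)
```

In this model, `JobInfo_B.info` is too late for the first run and too early for the second, which matches the behavior reported here.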

Has anyone using this pipeline come across this issue? I can provide additional information if needed.

As an example:

The table dvtosql._controltableforcopy shows that the last pipeline run completed at 2024-11-11 12:58:13 UTC.

Image

During the next pipeline run, the stored procedure dvtosql.source_GetNewDataToCopy should return the database tables listed in the following 3 *.info files. (Note that these times are local, UTC-5.)

Image

However, the first info file contains QueueTime = 2024-11-11 12:48:34 UTC, which corresponds to the creation time of the previous info file, 7:48 AM local (12:48 UTC). Because this QueueTime is earlier than the lastdatetimemarker in _controltableforcopy, the info file is not parsed and the tables listed in it are not loaded into the downstream target database.
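The comparison in this example can be checked with a few lines of Python. The two timestamps and the UTC-5 offset come from the example above; the variable names are illustrative, and the strict greater-than test is an assumption based on the description of the stored procedure.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the example in this issue.
last_marker = datetime(2024, 11, 11, 12, 58, 13, tzinfo=timezone.utc)
queue_time = datetime(2024, 11, 11, 12, 48, 34, tzinfo=timezone.utc)

# Local time zone used in the file listing (UTC-5).
local_tz = timezone(timedelta(hours=-5))
queue_time_local = queue_time.astimezone(local_tz)

# Assumed selection rule: the file is parsed only if QueueTime > lastdatetimemarker.
is_selected = queue_time > last_marker
gap = last_marker - queue_time

print(queue_time_local.strftime("%H:%M"))  # 07:48
print(is_selected)                         # False
print(gap)                                 # 0:09:39
```

The QueueTime falls 9 minutes 39 seconds before the marker, so under this rule the file is skipped.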

Image

Thank you for any insight into this.

Seth
