
UPDATE COLUMNS FROM: bump _row_last_updated_at_version for matched rows #418

@jerryjch


Summary

UPDATE COLUMNS FROM commits a new dataset version but does not advance
_row_last_updated_at_version on rows whose column values were rewritten.

ALTER TABLE ... UPDATE COLUMNS ... FROM (UpdateColumnsBackfill in lance-spark)
commits through the Lance CommitBuilder and Transaction as a normal Update with
UpdateMode.RewriteColumns. The table version increases, but the per-row
change-data metadata column _row_last_updated_at_version can keep its pre-commit
value (for example, still equal to _row_created_at_version), even though data in
the updated columns changed.

Expected behavior

Per the Lance row lineage and change-data feed (CDF) docs,
_row_last_updated_at_version is the dataset version at which the row was last
modified. If a write creates a new dataset version and changes visible row data
for matched rows, those rows should get _row_last_updated_at_version set to that
new version. _row_created_at_version should stay at the version where the row
first appeared.
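
The expected rule can be written down as a small toy model. This is plain Python, not the Lance or lance-spark API: `Row`, `apply_update_columns`, and the dict-based table are hypothetical names used only for illustration. It assumes a row counts as matched when its id appears in the source, and that the commit advances the dataset version by exactly one.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Row:
    """Toy stand-in for one row plus its two row-lineage columns."""
    value: str
    created_at_version: int       # models _row_created_at_version
    last_updated_at_version: int  # models _row_last_updated_at_version


def apply_update_columns(
    table: Dict[int, Row], version: int, source: Dict[int, str]
) -> Tuple[Dict[int, Row], int]:
    """Model of the expected outcome (hypothetical helper, not a Lance call).

    Every commit yields a new dataset version. Rows whose id is in `source`
    get the new value and have last_updated_at_version set to that version.
    All other rows, and created_at_version on every row, are left untouched.
    """
    new_version = version + 1  # assumption: one commit, one version step
    updated: Dict[int, Row] = {}
    for row_id, row in table.items():
        if row_id in source:
            updated[row_id] = replace(
                row,
                value=source[row_id],
                last_updated_at_version=new_version,
            )
        else:
            updated[row_id] = row
    return updated, new_version


# Illustrative values only: three rows inserted at a version called v0.
v0 = 1
table = {i: Row(f"old-{i}", v0, v0) for i in (1, 2, 3)}

# The source touches only id 2.
table_after, v1 = apply_update_columns(table, v0, {2: "new-2"})
```

In this model only id 2 ends up with `last_updated_at_version == v1`, while ids 1 and 3 compare equal to their pre-update rows.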

Actual behavior

After UPDATE COLUMNS FROM, rows that had columns rewritten can still show the
same _row_last_updated_at_version as before, while the dataset version has moved
forward on commit.

Reproduction

  1. Create a Lance table with stable row IDs enabled (enable_stable_row_ids).
  2. Insert several rows (e.g. id 1, 2, 3) so CDF columns exist; note dataset version V0 and
    _row_created_at_version / _row_last_updated_at_version for each row.
  3. Run ALTER TABLE ... UPDATE COLUMNS ... FROM with a source view that updates only one row
    (e.g. id = 2); leave id 1 and 3 unchanged in the source.
  4. Read _row_last_updated_at_version for id = 2: it may still equal the pre-update value (for
    example, the same as _row_created_at_version) even though the dataset version advanced past V0.
  5. Check id 1 and 3: their _row_last_updated_at_version should remain unchanged.
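
The checks in steps 4 and 5 can be expressed as one comparison over before/after snapshots of the two lineage columns. The sketch below is plain Python with hypothetical names (`Snapshot`, `find_violations`); it does not read from Lance or Spark, and the version numbers are illustrative placeholders rather than output captured from a real run.

```python
from typing import Dict, List, Set, Tuple

# Row id -> (_row_created_at_version, _row_last_updated_at_version)
Snapshot = Dict[int, Tuple[int, int]]


def find_violations(
    before: Snapshot, after: Snapshot, matched_ids: Set[int], new_version: int
) -> List[str]:
    """Return one message per row that breaks the expected behavior."""
    problems: List[str] = []
    for row_id in sorted(before):
        created_before, updated_before = before[row_id]
        created_after, updated_after = after[row_id]
        # created_at must never move, matched or not.
        if created_after != created_before:
            problems.append(f"id={row_id}: created_at changed")
        if row_id in matched_ids:
            # Step 4: matched rows should carry the new dataset version.
            if updated_after != new_version:
                problems.append(f"id={row_id}: last_updated_at not bumped")
        elif updated_after != updated_before:
            # Step 5: unmatched rows should not be bumped.
            problems.append(f"id={row_id}: last_updated_at bumped unexpectedly")
    return problems


# Illustrative numbers: V0 = 1 and the update commits version 2.
before = {1: (1, 1), 2: (1, 1), 3: (1, 1)}
reported = {1: (1, 1), 2: (1, 1), 3: (1, 1)}  # shape of the behavior described here
expected = {1: (1, 1), 2: (1, 2), 3: (1, 1)}  # shape of the desired behavior

reported_problems = find_violations(before, reported, {2}, 2)
expected_problems = find_violations(before, expected, {2}, 2)
```

Keeping the unmatched-row check in the same helper means a fix that bumps every row in a rewritten fragment would also be flagged, not just the missing bump on id 2.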

Note

This ticket is on top of the following:
