Inconsistent semantics after setDT #4783

@OfekShilon

(This is a cleanup and improvement of some of the #4589 discussion.)
Take this code:

d1 <- data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6))
d2 <- d1
setDT(d2)  # At this point d2 is a shallow copy of d1, pointing to the same columns

Do modifications to d2 affect d1? We could live with either 'yes' or 'no', but the answer is 'sometimes':

d2[, b:=3:7]         # (1) impacts only d2
d2[, c:=4:8]         # (2) impacts only d2
d2[!is.na(a), b:=5:9] # (3) impacts both
d2[, b:=30]          # (4) impacts both

In cases 1 and 2, d2 'plunks' the full new columns into itself and d1 isn't affected. In cases 3 and 4, it seems that an in-place assignment optimization kicks in (address(d2$b) is unchanged), so there is no copy-on-write, and the data still pointed to by d1 is overwritten.
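To make the four cases easier to check on other data.table versions, here is a minimal sketch of a reproduction harness. It assumes each case is meant to start from a freshly built d1/d2 pair (the original listing does not say so explicitly). The helper name `check_case` and the result labels are hypothetical, and no expected output is shown because the result may depend on the installed version.

```r
library(data.table)

# Hypothetical helper: run one modification against a freshly built d1/d2
# pair and report (a) whether d1$b changed and (b) whether d2$b kept its
# memory address (i.e. was modified in place rather than replaced).
check_case <- function(modify) {
  d1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))
  d2 <- d1
  setDT(d2)
  b_expected  <- c(2, 3, 4, 5, 6)   # independent reference values for d1$b
  addr_before <- address(d2$b)
  modify(d2)                        # data.tables are passed by reference
  c(d1_changed   = !identical(d1$b, b_expected),
    same_address = identical(address(d2$b), addr_before))
}

# Assumption: each case is evaluated independently, not one after another.
results <- list(
  case1 = check_case(function(dt) dt[, b := 3:7]),
  case2 = check_case(function(dt) dt[, c := 4:8]),
  case3 = check_case(function(dt) dt[!is.na(a), b := 5:9]),
  case4 = check_case(function(dt) dt[, b := 30])
)

# One row per case, one column per check.
print(do.call(rbind, results))
```

If the behaviour described above holds, the rows where `same_address` is TRUE should be the same rows where `d1_changed` is TRUE.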

These semantic discrepancies make the otherwise great setDT unusable for us except in the most trivial scripts.

# Output of sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default

> packageVersion("data.table")
[1] ‘1.13.0’
