(This is a cleanup and improvement of some of the #4589 discussion.)
Take this code:
> d1 <- data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6))
> d2 <- d1
> setDT(d2) # At this point d2 is a shallow copy of d1, pointing to the same columns
Do modifications to d2 impact d1? We could live with either 'yes' or 'no', but the answer is 'sometimes':
d2[, b:=3:7] # (1) impacts only d2
d2[, c:=4:8] # (2) impacts only d2
d2[!is.na(a), b:=5:9] # (3) impacts both
d2[, b:=30] # (4) impacts both
In cases 1 & 2, d2 'plunks' the full columns into itself and d1 isn't affected. In cases 3 & 4 it seems that the operation-in-place optimization kicks in (address(d2$b) is unchanged), so there is no copy-on-write and the data still pointed to by d1 is overwritten.
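The address observation can be checked directly. The following is a minimal sketch, not a definitive diagnosis: it uses data.table's `address()` on case (4) only, and the variable name `addr_before` is introduced here purely for illustration. What it prints may differ between data.table versions, so no expected output is shown.

```r
library(data.table)

d1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))
d2 <- d1
setDT(d2)

# Do d1 and d2 refer to the same underlying vector for column b?
address(d1$b) == address(d2$b)

# Record where d2's column b lives, then run case (4) from above.
addr_before <- address(d2$b)
d2[, b := 30]

# TRUE here means the existing vector was overwritten in place
# rather than replaced by a new one.
address(d2$b) == addr_before

# Inspect whether d1 was affected by the assignment on d2.
d1$b
```

Swapping in the expression from case (1), (2) or (3) for the `:=` line gives the same check for the other cases.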
These semantic discrepancies make (the otherwise great) setDT unusable to us except in the most trivial scripts.
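For comparison, one way to sidestep the ambiguity, at the cost of duplicating the data, is to take an explicit deep copy before converting. This is only a sketch of that workaround using data.table's `copy()`; it does not address the underlying inconsistency in `setDT`.

```r
library(data.table)

d1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))

# copy() duplicates the column vectors, so d2 shares nothing with d1.
d2 <- copy(d1)
setDT(d2)

# The same four modifications as above, now applied to independent data.
d2[, b := 3:7]
d2[, c := 4:8]
d2[!is.na(a), b := 5:9]
d2[, b := 30]

# d1 keeps its original columns and values.
d1
```

The trade-off is that the memory saving that makes `setDT` attractive in the first place is lost.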
# Output of sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
> packageVersion("data.table")
[1] ‘1.13.0’