Currently, when calling forderv(retGrp=TRUE), the extra attributes are set directly on the integer vector, that is, on the index itself.
d = data.table(x=c(2L,2:1))
#setindexv(d, "x") ## if you are on #4386
setattr(d, "index", setattr(integer(), "__x", forderv(d, "x", retGrp=TRUE)))
I would like to propose wrapping the index in a list and setting the extra attributes on the list rather than on the index, or alternatively keeping the index and the former attributes as elements of the same list.
Currently:
str(attr(attr(d, "index"), "__x"))
# int [1:3] 3 1 2
# - attr(*, "starts")= int [1:2] 1 2
# - attr(*, "maxgrpn")= int 2
Proposed, either of the following; the latter seems easier to operate on:
str(attr(attr(d, "index"), "__x"))
#List of 1
# $ : int [1:3] 3 1 2
# - attr(*, "starts")= int [1:2] 1 2
# - attr(*, "maxgrpn")= int 2
str(attr(attr(d, "index"), "__x"))
#List of 3
# $ index : int [1:3] 3 1 2
# $ starts : int [1:2] 1 2
# $ maxgrpn: int 2
The motivation for changing this internal structure is that, AFAIU, we currently cannot shallow-duplicate the index integer vector alone. That adds a relatively big overhead when we want to re-use the index but do not want the extra attributes on it.
The exact use case for that is described in 1606046
For an index of length 1e8, we already lose 0.258s just to re-use an existing index.
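To illustrate the difference, here is a minimal base R sketch. It is not data.table code: both structures are built by hand to mirror the str() output above, and the variable names (idx_current, idx_proposed, bare_current, bare_proposed) are made up for this example. It assumes R's usual copy-on-modify behaviour, under which dropping attributes from a shared vector duplicates it, while extracting a list element does not.

```r
# Current layout: "starts" and "maxgrpn" are attributes on the index vector
idx_current = structure(c(3L, 1L, 2L), starts = 1:2, maxgrpn = 2L)

# Proposed layout (second variant): index and metadata are list elements
idx_proposed = list(index = c(3L, 1L, 2L), starts = 1:2, maxgrpn = 2L)

# Current layout: a bare index requires stripping the attributes,
# which duplicates the whole integer vector under copy-on-modify
bare_current = idx_current
attributes(bare_current) = NULL

# Proposed layout: the bare index is already a separate element,
# so extracting it does not duplicate the underlying vector
bare_proposed = idx_proposed$index

# Both routes yield the same attribute-free index
stopifnot(identical(bare_current, bare_proposed))
stopifnot(is.null(attributes(bare_proposed)))
```

With the list layout, the grouping information stays available via idx_proposed$starts and idx_proposed$maxgrpn, so nothing is lost compared to the attribute-based layout.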