Description
As already mentioned in multiple issues and over email/Slack, we need automated tests that will be able to track performance regressions.
This issue is meant to define the scope.
A related and useful project is planned in conbench. Once it is working, I think we should use it. Unfortunately, that does not seem likely to happen anytime soon, or even in the more distant future.
Anyway, keeping the scope minimal should make it easier to eventually move to conbench later on.
Another piece of related work is my old macrobenchmarking project.
There is also the recent draft PR #4517.
Scope
Dimensions by which we will track timings:
- environment (to allow looking up the hardware configuration)
- R version
- git SHA of data.table (to look up date and version)
- benchmark script (probably fixed to `benchmark.Rraw`)
- query
- version of a query (in case we modify an existing query for some reason)
- description
Dimensions that, for now, I propose not to include in scope:
- operating system
- Linux kernel version
- compiler
- compiler version
- R compilation flags
- data.table compilation flags
- metrics (memory usage, etc.)
- `datatable.optimize` option
- number of threads (?)
- multiple runs of a single query (?)
Challenges
Storing timings
In our current infrastructure we do not have any process that appends artifacts (timings, in the context of CB). Each CB run has to store its results somewhere and re-use them later on; one possible layout is sketched after the list below.
- store timings in a CSV file for simplicity (?)
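A minimal sketch of CSV-based storage, assuming one flat file; the column names and the `record_timing()` helper are illustrative, not an existing data.table API:

```r
library(data.table)

# Append one timing observation to a flat CSV store (hypothetical layout).
record_timing = function(file, env, r_ver, sha, script, query, query_ver, secs) {
  row = data.table(
    timestamp     = format(Sys.time(), tz = "UTC"),
    environment   = env,        # lookup key for the hardware configuration
    r_version     = r_ver,
    git_sha       = sha,        # lookup key for data.table date/version
    script        = script,     # probably fixed to benchmark.Rraw
    query         = query,
    query_version = query_ver,
    seconds       = secs
  )
  # fwrite(append=TRUE) skips the header, so the file grows one row per run
  fwrite(row, file, append = file.exists(file))
}
```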
Signalling a regression
- Should we compare only to the previous timings, or to an average of timings from a longer period (last month, etc.)?
- What is the tolerance threshold? For cheap queries, 5-25% variance will be common.
- If the threshold is exceeded, we may want to run the benchmark a few times and take an average before signalling a regression. Then we need a second threshold for such an average; a sketch of this two-threshold check follows this list.
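A minimal sketch of the two-threshold check, assuming the CSV layout sketched above; the trailing-window size and the 1.25/1.10 multipliers are placeholders, not proposals:

```r
library(data.table)

# Flag a regression only if a suspicious first run is confirmed by re-runs.
# run_query is a function returning elapsed seconds for one execution.
is_regression = function(file, q, run_query,
                         single_tol = 1.25, avg_tol = 1.10, n_reruns = 5L) {
  history  = fread(file)[query == q]
  baseline = mean(tail(history$seconds, 30L))  # trailing window, not just the previous run
  if (run_query() <= baseline * single_tol) return(FALSE)
  # exceeded the loose per-run threshold: average a few re-runs and
  # hold that average to the tighter threshold before signalling
  mean(replicate(n_reruns, run_query())) > baseline * avg_tol
}
```

For example, `is_regression("timings.csv", "q1", function() system.time(DT[, .N, by = grp])[["elapsed"]])`.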
Environment
To reduce the number of false regression signals, we need to use private, dedicated infrastructure.
Having a dedicated machine may not be feasible, so we need a mechanism for signalling to Jenkins (or another orchestration process) that a particular machine is in use in exclusive mode; one possible mechanism is sketched below.
- We could use the same machine that runs db-benchmark.
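One minimal sketch of such exclusive-use signalling is a lock directory (`mkdir` is atomic on POSIX); the lock path and timeouts are assumptions, and a production setup would more likely use flock at the orchestration level:

```r
# Acquire an exclusive machine-wide lock by creating a directory; dir.create
# fails if it already exists, so only one process can hold the lock at a time.
acquire_lock = function(path = "/tmp/data.table-cb.lock",
                        timeout_secs = 3600, poll_secs = 30) {
  deadline = Sys.time() + timeout_secs
  while (!dir.create(path, showWarnings = FALSE)) {
    if (Sys.time() > deadline)
      stop("could not acquire benchmark lock within timeout")
    Sys.sleep(poll_secs)  # another job (e.g. db-benchmark) holds the machine
  }
  invisible(TRUE)
}

release_lock = function(path = "/tmp/data.table-cb.lock")
  unlink(path, recursive = TRUE)
```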
Pipeline
In the most likely case of not having a dedicated machine, CB may end up being queued for a long while (up to multiple days). Therefore it makes sense to have it in a separate pipeline rather than in our data.table GLCI. Such a CB pipeline could be scheduled to run daily or weekly instead of running on each commit.
- We could eventually move publishing artifacts (package, website, docker images) to a dedicated daily pipeline. This is not strictly related to CB, but it would make it easier to publish CB results as well.
Versioning
- Should all elements of CB be included as part of the data.table project? Or a separate project?
- test script: `inst/tests/benchmark.Rraw`
- new function `benchmark()` meant to be used like `test()`, and `benchmark.data.table()` to be used like `test.data.table()`; a rough sketch follows this list
- any extra orchestration could be in `.ci/`
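A rough sketch of what `benchmark()` might look like, by analogy with `test()`; the signature is an assumption, not an existing data.table function:

```r
# Time an expression identified by a test-style number; a real implementation
# would append the result to the timings store rather than just printing it.
benchmark = function(num, code, reps = 1L) {
  code = substitute(code)
  env  = parent.frame()
  secs = vapply(seq_len(reps),
                function(i) system.time(eval(code, env))[["elapsed"]],
                numeric(1))
  cat(sprintf("benchmark %s: %.3fs (best of %d)\n", format(num), min(secs), reps))
  invisible(min(secs))
}
```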
Example test cases
- `[[` on a list column by group (#4646: [[ by group takes forever (24 hours +) with v1.13.0 vs 4 seconds with v1.12.8)
- joining in a loop, order of different types (#3928: join operation almost 2 times slower)
- simple access `DT[10L]`, `DT[, 3L]` (#3735: Selecting from data.table by row is very slow)
- use of `.SD` for many columns (#3797: add timing test for many .SD cols)
- calling `setDT` in a loop (#4476: setDT could be much simpler)
- multithreaded function calls by group `DT[, uniqueN(a), by=b]`, should stress the new throttle feature (#4484: throttle threads for iterated small data tasks)
- more cases in the existing benchmark.Rraw file
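To illustrate the intended shape of such cases, here is a sketch of the first one (#4646) using the hypothetical `benchmark()` helper from above; the data sizes are arbitrary:

```r
library(data.table)

# list column accessed with [[ inside by-group evaluation (cf. #4646)
n  = 1e5L
DT = data.table(grp = sample(1e4L, n, TRUE), lst = as.list(seq_len(n)))
benchmark(1.1, DT[, lst[[1L]], by = grp], reps = 3L)
```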