
Continuous Benchmarking #4687

@jangorecki

Description


As already mentioned in multiple issues and over email/slack, we need automated tests that will be able to track performance regressions.
This issue is meant to define the scope.

A related and useful project is planned in conbench. Once it is working, I think we should use it. Unfortunately that does not seem likely to happen anytime soon, or even in the more distant future.
Anyway, keeping the scope minimal should make it easier to eventually move to conbench later on.
Another related work is my old macrobenchmarking project.
There is also the recent draft PR #4517.


Scope

Dimensions by which we will track timings (a sketch of the resulting record layout follows this list):

  • environment (allows looking up the hardware configuration)
  • R version
  • git sha of data.table (allows looking up date and version)
  • benchmark script (probably fixed to benchmark.Rraw)
  • query
  • version of a query (in case we modify an existing query for some reason)
  • description
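For concreteness, a minimal sketch of what one stored timings record could look like, assuming the dimensions above plus a timestamp and the measured time; all column names and values here are illustrative assumptions, not a finalized schema:

```r
library(data.table)

# One illustrative timings record; every field below maps to a dimension
# from the list above (names and values are assumptions, not a schema).
timings_record = data.table(
  environment   = "cb-machine-1",              # hypothetical key into a hardware lookup
  r_version     = as.character(getRversion()),
  git_sha       = "0123abc",                   # data.table commit under test
  script        = "benchmark.Rraw",
  query         = "DT[, sum(v1), by=id1]",
  query_version = 1L,                          # bumped whenever the query is modified
  description   = "sum by group",
  timestamp     = Sys.time(),                  # when the run happened
  time_s        = 0.42                         # elapsed seconds (placeholder value)
)
```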

Dimensions that I propose to leave out of scope for now

  • operating system
  • linux kernel version
  • compiler
  • compiler version
  • R compilation flags
  • data.table compilation flags
  • metrics (memory usage, etc.)
  • datatable.optimize option
  • number of threads (?)
  • multiple runs of single query (?)

Challenges

Store timings

In the current infrastructure we do not have any process that appends artifacts (timings, in the context of CB). Each CB run has to store its results somewhere and re-use them later on (a sketch of the append step follows the bullet below).

  • timings storage in csv for simplicity (?)
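A minimal sketch of that append step, assuming a single flat timings.csv and the record layout sketched earlier; the file name and location are assumptions:

```r
library(data.table)

# Append this run's rows to the shared CSV; write the header only when
# the file does not exist yet (fwrite with append=TRUE skips the header).
store_timings = function(new_timings, file = "timings.csv") {
  fwrite(new_timings, file, append = file.exists(file))
}
```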

Signalling a regression

  • Should we compare only to the previous timing, or to an average over a longer period (the last month, etc.)?
  • What is the tolerance threshold? For cheap queries a 5-25% variance will be common.
  • In case of exceeding the threshold we may want to run the benchmark a few times and take an average before signalling a regression. Then we need a second threshold for such an average (see the sketch after this list).
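To make the comparison concrete, a sketch of one possible check: flag a query when its latest timing exceeds the trailing average from a chosen window by the tolerance factor. The 30-day window, the 1.25 factor, and the column names all follow the record sketch above and are assumptions, not decided values:

```r
library(data.table)

check_regression = function(history, tolerance = 1.25, window_days = 30) {
  # restrict to the chosen trailing window and order runs by time
  recent = history[timestamp >= max(timestamp) - as.difftime(window_days, units = "days")]
  setorder(recent, timestamp)
  stats = recent[, .(
    baseline = mean(head(time_s, -1L)),  # average of earlier runs in the window
    latest   = last(time_s)              # most recent timing
  ), by = .(query, query_version)]
  # rows returned here are candidate regressions to re-run and confirm
  stats[latest > tolerance * baseline]
}
```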

Environment

To reduce the number of false regression signals we need to use private, dedicated infrastructure.
Having a dedicated machine may not be feasible, so we need a mechanism for signalling to jenkins (or another orchestration process) that a particular machine is in use in exclusive mode (a sketch follows below).
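As an illustration of such a mechanism, a lock-file handshake sketch; the lock path is an assumption, and a real setup would more likely use the orchestrator's own locking facilities:

```r
# dir.create() fails (returns FALSE) when the lock already exists, which
# makes check-and-acquire effectively atomic, unlike file.exists() + write.
acquire_machine = function(lock = "/var/lock/cb-exclusive") {
  suppressWarnings(dir.create(lock))
}

release_machine = function(lock = "/var/lock/cb-exclusive") {
  unlink(lock, recursive = TRUE)
}
```

The benchmark job would acquire the lock before starting and release it at the end; other jobs scheduled on the same machine would wait or skip while the lock exists.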

Pipeline

In the most likely case of not having a dedicated machine, CB may end up being queued for a long while (up to multiple days). Therefore it makes sense to have it in a separate pipeline rather than in our data.table GLCI. Such a CB pipeline could be scheduled to run daily or weekly instead of on each commit.

  • We could eventually move publishing artifacts (package, website, docker images) to a dedicated daily pipeline. This is not strictly related to CB but would make it easier to publish CB results as well.

Versioning

  • Should all elements of CB be included as part of the data.table project, or live in a separate project?
    • test script inst/tests/benchmark.Rraw
    • a new function benchmark() meant to be used like test(), and benchmark.data.table() to be used like test.data.table() (a hypothetical sketch follows this list)
    • any extra orchestration could be in .ci/
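Purely as a discussion aid, a hypothetical shape of benchmark(); the signature, return value, and behaviour are all assumptions, nothing here is implemented:

```r
# Hypothetical benchmark(), mirroring the numbering convention of test();
# it times an expression and returns one row in the sketched record layout.
benchmark = function(num, query, description = "", query_version = 1L) {
  expr = substitute(query)
  elapsed = system.time(eval.parent(expr))[["elapsed"]]
  data.table::data.table(
    num           = num,
    query         = paste(deparse(expr), collapse = " "),
    query_version = query_version,
    description   = description,
    timestamp     = Sys.time(),
    time_s        = elapsed
  )
}
```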

Example test cases
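For illustration only, what an entry in inst/tests/benchmark.Rraw could look like if the hypothetical benchmark() above were adopted; the data, queries, and numbering are placeholders, the actual cases remain to be defined:

```r
# Placeholder data and cases, assuming the hypothetical benchmark() sketch above.
set.seed(108)
DT = data.table::data.table(id1 = sample(100L, 1e7L, TRUE), v1 = runif(1e7))

benchmark(1.1, DT[, sum(v1), by = id1], description = "sum by low-cardinality group")
benchmark(1.2, DT[order(id1)],          description = "order by integer column")
```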
