Skip to content

Perf release gate #9068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
48 changes: 48 additions & 0 deletions .gitlab/benchmarks/bp-runner.fail-on-breach.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Thresholds set based on guidance in https://datadoghq.atlassian.net/wiki/x/LgI1LgE#How-to-choose-thresholds-for-pre-release-gates%3F

experiments:
- name: Run SLO breach check
steps:
- name: SLO breach check
run: fail_on_breach
# https://datadoghq.atlassian.net/wiki/x/LgI1LgE#How-to-choose-a-warning-range-for-pre-release-gates%3F
warning_range: 10
# File spec
# https://datadoghq.atlassian.net/wiki/x/LgI1LgE#Specification
# Measurements
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario
scenarios:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the results of https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-java/-/jobs/1031460223, I suggest to update the SLOs to the following values:


          # Standard macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fonly-tracing&trendsType=scenario
          - name: normal_operation/only-tracing
            thresholds:
              - agg_http_req_duration_p50 < 2.36 ms
              - agg_http_req_duration_p99 < 7.89 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fotel-latest&trendsType=scenario
          - name: normal_operation/otel-latest
            thresholds:
              - agg_http_req_duration_p50 < 2.5 ms
              - agg_http_req_duration_p99 < 10 ms

          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fonly-tracing&trendsType=scenario
          - name: high_load/only-tracing
            thresholds:
              - throughput > 1100.0 op/s
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fotel-latest&trendsType=scenario
          - name: high_load/otel-latest
            thresholds:
              - throughput > 1100.0 op/s

          # Startup macrobenchmarks
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Atracing%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aappsec%3AGlobalTracer&trendsType=scenario
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aiast%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:(tracing|appsec|iast):GlobalTracer"
            thresholds:
              - execution_time < 280 ms
          # https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aprofiling%3AGlobalTracer&trendsType=scenario
          - name: "startup:petclinic:profiling:GlobalTracer"
            thresholds:
              - execution_time < 420 ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another tweak
image

# Note that thresholds there are choosen based the confidence interval with a 10% adjustment.

# Standard macrobenchmarks
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fonly-tracing&trendsType=scenario
- name: normal_operation/only-tracing
thresholds:
- agg_http_req_duration_p50 < 2.6 ms
- agg_http_req_duration_p99 < 8.5 ms
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=normal_operation%2Fotel-latest&trendsType=scenario
- name: normal_operation/otel-latest
thresholds:
- agg_http_req_duration_p50 < 2.5 ms
- agg_http_req_duration_p99 < 10 ms

# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fonly-tracing&trendsType=scenario
- name: high_load/only-tracing
thresholds:
- throughput > 1100.0 op/s
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=high_load%2Fotel-latest&trendsType=scenario
- name: high_load/otel-latest
thresholds:
- throughput > 1100.0 op/s

# Startup macrobenchmarks
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Atracing%3AGlobalTracer&trendsType=scenario
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aappsec%3AGlobalTracer&trendsType=scenario
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aiast%3AGlobalTracer&trendsType=scenario
- name: "startup:petclinic:(tracing|appsec|iast):GlobalTracer"
thresholds:
- execution_time < 280 ms
# https://benchmarking.us1.prod.dog/trends?projectId=4&branch=master&trendsTab=per_scenario&scenario=startup%3Apetclinic%3Aprofiling%3AGlobalTracer&trendsType=scenario
- name: "startup:petclinic:profiling:GlobalTracer"
thresholds:
- execution_time < 420 ms
68 changes: 68 additions & 0 deletions .gitlab/macrobenchmarks.yml
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
include:
project: 'DataDog/benchmarking-platform-tools'
file: 'images/templates/gitlab/notify-slo-breaches.template.yml'
ref: '925e0a3e7dd628885f6fc69cdaea5c8cc9e212bc'

.macrobenchmarks:
stage: macrobenchmarks
rules:
Expand Down Expand Up @@ -68,3 +73,66 @@ otel-latest:
BP_BENCHMARKS_CONFIGURATION: otel-latest
TRACER_OPTS: -javaagent:/app/otel-java-agent.jar -Ddd.env=otel-latest -Ddd.service=bp-java-petclinic
JAVA_OPTS: -javaagent:/app/memcheck/stability-testing-memwatch.jar -Xmx128M


check-slo-breaches:
stage: macrobenchmarks
interruptible: true
tags: ["arch:amd64"]
image: registry.ddbuild.io/images/benchmarking-platform-tools-ubuntu:latest
when: on_success
needs:
- job: baseline
artifacts: true
- job: only-tracing
artifacts: true
- job: otel-latest
artifacts: true
- job: benchmarks-startup
artifacts: true
- job: benchmarks-load
artifacts: true
- job: benchmarks-dacapo
artifacts: true
script:
# macrobenchmarks are located here, files are already in "converted" format
- export ARTIFACTS_DIR="$(pwd)/platform/artifacts/" && mkdir -p "${ARTIFACTS_DIR}"

# Need to move the artifacts the benchmarks-* job
- |
export BENCHMARKS_ARTIFACTS_DIR="$(pwd)/reports" && mkdir -p "${BENCHMARKS_ARTIFACTS_DIR}"
for benchmarkType in startup load dacapo; do
find "$BENCHMARKS_ARTIFACTS_DIR/$benchmarkType" -name "benchmark-baseline.json" -o -name "benchmark-candidate.json" | while read file; do
relpath="${file#$BENCHMARKS_ARTIFACTS_DIR/$benchmarkType/}"
prefix="${relpath%/benchmark-*}" # Remove the trailing /benchmark-(baseline|candidate).json
prefix="${prefix#./}" # Remove any leading ./
prefix="${prefix//\//-}" # Replace / with -
case "$file" in
*benchmark-baseline.json) type="baseline" ;;
*benchmark-candidate.json) type="candidate" ;;
esac
echo "Moving $file to $ARTIFACTS_DIR/${type}-${benchmarkType}-${prefix}.converted.json"
cp "$file" "$ARTIFACTS_DIR/${type}-${benchmarkType}-${prefix}.converted.json"
done
done
- ls -lah "$ARTIFACTS_DIR"
- bp-runner .gitlab/benchmarks/bp-runner.fail-on-breach.yml
artifacts:
name: "artifacts"
when: always
paths:
- platform/artifacts/
expire_in: 1 week
variables:
UPSTREAM_PROJECT_ID: $CI_PROJECT_ID # The ID of the current project. This ID is unique across all projects on the GitLab instance.
UPSTREAM_PROJECT_NAME: $CI_PROJECT_NAME # "dd-trace-java"
UPSTREAM_BRANCH: $CI_COMMIT_REF_NAME # The branch or tag name for which project is built.
UPSTREAM_COMMIT_SHA: $CI_COMMIT_SHA # The commit revision the project is built for.

notify-slo-breaches:
extends: .notify-slo-breaches
stage: macrobenchmarks
needs: ["check-slo-breaches"]
when: always
variables:
CHANNEL: "apm-release-platform"