Tune k8s DogStatsD workload to match observed telemetry by GeorgeHahn · Pull Request #1842 · DataDog/lading

GeorgeHahn · 2026-03-27T17:32:21Z

Switch the uds_dogstatsd_to_api experiment from high-cardinality UDS load (100 MiB/s, 1k-10k contexts) to a tuned UDP workload matching specific telemetry targets:

- Switch DogStatsD from UDS to UDP with bind_address 0.0.0.0
- Fix UDP generator to bind 0.0.0.0
- Split into two UDP generators: metrics-only (300 B/s, 200 contexts) and events (600 B/s capacity, ~1 event/flush via 4% event weight)
- Add file_gen logrotate for ~300 logs/flush at 1500 B/s
- Redirect agent logs endpoint to lading-intake via DD_LOGS_CONFIG
- Add DogStatsD UDP k8s Service for cross-pod routing
- Deploy agent before lading (listener before sender)
- Agent discovers log files dynamically via inotify
- Load local lading image into kind cluster
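
The "listener before sender" ordering can be sketched as a small shell helper. This is an illustration only, not the code in experiment.sh: the function name, the `KUBECTL` override, and the 300s timeout are assumptions. It uses only `kubectl apply -f` and `kubectl wait --for=condition=Ready`.

```shell
# Assumption: KUBECTL may be overridden (for example, for dry runs).
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: apply the listener (agent) manifest, wait until the
# pods matching the selector are Ready, then apply the sender (lading) manifest.
deploy_listener_then_sender() {
    agent_manifest=$1
    agent_selector=$2
    lading_manifest=$3
    $KUBECTL apply -f "$agent_manifest" || return 1
    # UDP is connectionless, so datagrams sent before the agent listens are lost.
    $KUBECTL wait --for=condition=Ready pod -l "$agent_selector" --timeout=300s || return 1
    $KUBECTL apply -f "$lading_manifest"
}
```

A call might look like `deploy_listener_then_sender manifests/agent.yaml app.kubernetes.io/name=datadog-agent-deployment manifests/lading.yaml`, where `manifests/agent.yaml` is a placeholder name. Note that `kubectl wait` can fail immediately if no pod matches the selector yet, so a real script may need a short retry around it.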

Tested memory limits (targeting ~70% utilization):

Container	Limit	Peak	Utilization
agent	175 MB	127 MB	72.5%
trace-agent	55 MB	33 MB	59.6%
system-probe	385 MB	288 MB	74.9%
process-agent	70 MB	41 MB	59.2%
TOTAL	685 MB	489 MB	71.4%
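
As a quick sanity check, the utilization column is peak divided by limit. The sketch below recomputes it from the rounded MB figures in the table; `utilization_pct` is a hypothetical helper name. Recomputing per-container values this way gives results a few tenths off the table (for example 72.6 rather than 72.5 for the agent), presumably because the table was derived from unrounded peak measurements.

```shell
# utilization_pct PEAK LIMIT: print PEAK as a percentage of LIMIT, one decimal.
utilization_pct() {
    awk -v peak="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", 100 * peak / limit }'
}

# Rounded figures from the table, in MB.
total_limit=$((175 + 55 + 385 + 70))   # sum of per-container limits
total_peak=$((127 + 33 + 288 + 41))    # sum of per-container peaks

utilization_pct "$total_peak" "$total_limit"   # prints 71.4
```

The summed limit (685) is the value passed as `--total-limit` in the example invocation.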

Example invocation:

./k8s/uds_dogstatsd_to_api/experiment.sh \
    --total-limit 685 --agent-memory 175 --trace-memory 55 \
    --sysprobe-memory 385 --process-memory 70 \
    --duration 600 --tags "purpose:smp-experiment"

Switch the uds_dogstatsd_to_api experiment from high-cardinality UDS load (100 MiB/s, 1k-10k contexts) to a tuned UDP workload matching specific telemetry targets:

- Switch DogStatsD from UDS to UDP with bind_address 0.0.0.0
- Use lading 0.31.2 bind_addr config (was 127.0.0.1, broke k8s routing)
- Split into two UDP generators: metrics-only (300 B/s, 200 contexts) and events (600 B/s capacity, ~1 event/flush via 4% event weight)
- Add file_gen logrotate for ~300 logs/flush at 1500 B/s
- Redirect agent logs endpoint to lading-intake via DD_LOGS_CONFIG
- Add DogStatsD UDP k8s Service for cross-pod routing
- Deploy agent before lading (listener before sender)
- Agent discovers log files dynamically via inotify
- Re-enable NPM

Validated memory limits (~70% utilization):

| Container     | Limit  | Peak   | Utilization |
|---------------|--------|--------|-------------|
| agent         | 175 MB | 127 MB | 72.5%       |
| trace-agent   | 55 MB  | 33 MB  | 59.6%       |
| system-probe  | 385 MB | 288 MB | 74.9%       |
| process-agent | 70 MB  | 41 MB  | 59.2%       |
| TOTAL         | 685 MB | 489 MB | 71.4%       |

Example invocation:

./k8s/uds_dogstatsd_to_api/experiment.sh \
    --total-limit 685 --agent-memory 175 --trace-memory 55 \
    --sysprobe-memory 385 --process-memory 70 \
    --duration 600 --tags "purpose:smp-experiment"

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e37ab7ebc


chatgpt-codex-connector · 2026-03-27T18:40:48Z

k8s/uds_dogstatsd_to_api/manifests/lading.yaml

                 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
-          path: "/var/run/datadog/dsd.socket"
+          addr: "datadog-dogstatsd:8125"
+          bind_addr: "0.0.0.0:0"


Remove unsupported bind_addr from UDP workload manifest

The new UDP generator config includes bind_addr, but in this commit the UDP config type still uses #[serde(deny_unknown_fields)] and does not define a bind_addr field (lading/src/generator/udp.rs). That causes lading config deserialization to fail at startup, so the lading pod never runs this workload successfully.


chatgpt-codex-connector · 2026-03-27T18:40:48Z

k8s/uds_dogstatsd_to_api/experiment.sh

+echo "========================================"
+AGENT_POD=$(kubectl get pods -l app.kubernetes.io/name=datadog-agent-deployment -o jsonpath='{.items[0].metadata.name}')
+if [ -n "$AGENT_POD" ]; then
+    python3 "$SCRIPT_DIR/analyze_telemetry.py" --expvar "$AGENT_POD" "$DURATION"


Stop calling missing analyze_telemetry.py in experiment

This command unconditionally invokes analyze_telemetry.py, but that script is not present in k8s/uds_dogstatsd_to_api (only analyze_memory.py exists in the repo tree). Because experiment.sh runs with set -e, the missing file makes the script exit non-zero every run after monitoring, so the experiment workflow cannot complete cleanly.
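
One way to keep a `set -e` script from aborting on an absent optional script is to guard the call with a file test. The following is a minimal sketch of that idea, not the fix adopted in the PR; `run_if_present` is a hypothetical helper name.

```shell
# Hypothetical helper: run SCRIPT with INTERPRETER only when the file exists.
# A missing file produces a warning on stderr and a zero exit status, so a
# caller running under `set -e` keeps going.
run_if_present() {
    interpreter=$1
    script=$2
    shift 2
    if [ -f "$script" ]; then
        "$interpreter" "$script" "$@"
    else
        echo "skipping: $script not found" >&2
    fi
}

# Intended call site, mirroring the line quoted in the review:
# run_if_present python3 "$SCRIPT_DIR/analyze_telemetry.py" --expvar "$AGENT_POD" "$DURATION"
```

The guard only covers the missing file. If the script exists but fails, its non-zero status still propagates, which is usually the desired behavior for a real analysis error.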


GeorgeHahn force-pushed the hahn/tune-udp-dogstatsd-workload branch from 217e935 to 8e37ab7 Compare March 27, 2026 17:35

GeorgeHahn marked this pull request as ready for review March 27, 2026 18:37

GeorgeHahn requested a review from a team as a code owner March 27, 2026 18:37

chatgpt-codex-connector bot reviewed Mar 27, 2026

GeorgeHahn wants to merge 1 commit into blt/introduce_a_k8s_based_lading_example from hahn/tune-udp-dogstatsd-workload
