Tune k8s DogStatsD workload to match observed telemetry #1842

Open: GeorgeHahn wants to merge 1 commit into blt/introduce_a_k8s_based_lading_example from hahn/tune-udp-dogstatsd-workload

GeorgeHahn commented Mar 27, 2026

Switch the uds_dogstatsd_to_api experiment from high-cardinality UDS load (100 MiB/s, 1k-10k contexts) to a tuned UDP workload matching specific telemetry targets:

  • Switch DogStatsD from UDS to UDP with bind_address 0.0.0.0
  • Fix UDP generator to bind 0.0.0.0
  • Split into two UDP generators: metrics-only (300 B/s, 200 contexts) and events (600 B/s capacity, ~1 event/flush via 4% event weight)
  • Add file_gen logrotate for ~300 logs/flush at 1500 B/s
  • Redirect agent logs endpoint to lading-intake via DD_LOGS_CONFIG
  • Add DogStatsD UDP k8s Service for cross-pod routing
  • Deploy agent before lading (listener before sender)
  • Agent discovers log files dynamically via inotify
  • Load local lading image into kind cluster
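
The bind-address change in the list above can be pictured with a small sketch. A UDP socket bound to 127.0.0.1 uses loopback as its source address, so its datagrams generally cannot reach another pod, whereas binding to 0.0.0.0 with port 0 lets the kernel pick a routable source address and a free port. This is illustrative Python, not lading's generator: the function name `send_dogstatsd` and the metric name are hypothetical, and a loopback listener stands in for the agent's DogStatsD port.

```python
import socket


def send_dogstatsd(payload: bytes, target: tuple[str, int], bind_host: str = "0.0.0.0") -> int:
    """Send one datagram from a socket bound to bind_host on an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the kernel for any free source port, like "0.0.0.0:0".
        sender.bind((bind_host, 0))
        return sender.sendto(payload, target)


if __name__ == "__main__":
    # Stand-in listener on loopback. In the experiment this role is played by
    # the agent's DogStatsD UDP port, reached through the k8s Service.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.settimeout(2.0)
        # Hypothetical counter in DogStatsD line format: name:value|type|#tags
        sent = send_dogstatsd(
            b"example.requests:1|c|#purpose:smp-experiment",
            listener.getsockname(),
        )
        data, _ = listener.recvfrom(1024)
        print(sent, data.decode())
```

In a cluster the target would be the Service name and port rather than a loopback address; the loopback listener is used here only so the sketch runs on a single machine.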

Tested memory limits (targeting ~70% utilization):

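| Container     | Limit  | Peak   | Utilization |
|---------------|--------|--------|-------------|
| agent         | 175 MB | 127 MB | 72.5%       |
| trace-agent   | 55 MB  | 33 MB  | 59.6%       |
| system-probe  | 385 MB | 288 MB | 74.9%       |
| process-agent | 70 MB  | 41 MB  | 59.2%       |
| TOTAL         | 685 MB | 489 MB | 71.4%       |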
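
As a sanity check on the table above, utilization is peak divided by limit, and the TOTAL row is the sum of the per-container rows. The following Python sketch recomputes both from the whole-MB figures shown; the names `CONTAINERS` and `utilization` are hypothetical. Because the peaks here are rounded to whole MB, the per-container percentages it prints can differ from the table by a few tenths of a percent, which suggests the table was derived from unrounded measurements.

```python
# Values copied from the table above, in MB: (limit, peak).
CONTAINERS = {
    "agent": (175, 127),
    "trace-agent": (55, 33),
    "system-probe": (385, 288),
    "process-agent": (70, 41),
}


def utilization(limit_mb: float, peak_mb: float) -> float:
    """Return peak memory as a percentage of the configured limit."""
    return 100.0 * peak_mb / limit_mb


# The TOTAL row is the column-wise sum of the four containers.
total_limit = sum(limit for limit, _ in CONTAINERS.values())
total_peak = sum(peak for _, peak in CONTAINERS.values())

if __name__ == "__main__":
    for name, (limit, peak) in CONTAINERS.items():
        print(f"{name:14s} {utilization(limit, peak):5.1f}%")
    print(f"{'TOTAL':14s} {utilization(total_limit, total_peak):5.1f}%")
```

The totals match the `--total-limit 685` value passed in the example invocation.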

Example invocation:

./k8s/uds_dogstatsd_to_api/experiment.sh \
    --total-limit 685 --agent-memory 175 --trace-memory 55 \
    --sysprobe-memory 385 --process-memory 70 \
    --duration 600 --tags "purpose:smp-experiment"

Switch the uds_dogstatsd_to_api experiment from high-cardinality UDS load
(100 MiB/s, 1k-10k contexts) to a tuned UDP workload matching specific
telemetry targets:

- Switch DogStatsD from UDS to UDP with bind_address 0.0.0.0
- Use lading 0.31.2 bind_addr config (was 127.0.0.1, broke k8s routing)
- Split into two UDP generators: metrics-only (300 B/s, 200 contexts)
  and events (600 B/s capacity, ~1 event/flush via 4% event weight)
- Add file_gen logrotate for ~300 logs/flush at 1500 B/s
- Redirect agent logs endpoint to lading-intake via DD_LOGS_CONFIG
- Add DogStatsD UDP k8s Service for cross-pod routing
- Deploy agent before lading (listener before sender)
- Agent discovers log files dynamically via inotify
- Re-enable NPM

Validated memory limits (~70% utilization):

  | Container     | Limit  | Peak   | Utilization |
  |---------------|--------|--------|-------------|
  | agent         | 175 MB | 127 MB | 72.5%       |
  | trace-agent   | 55 MB  | 33 MB  | 59.6%       |
  | system-probe  | 385 MB | 288 MB | 74.9%       |
  | process-agent | 70 MB  | 41 MB  | 59.2%       |
  | TOTAL         | 685 MB | 489 MB | 71.4%       |

Example invocation:

  ./k8s/uds_dogstatsd_to_api/experiment.sh \
      --total-limit 685 --agent-memory 175 --trace-memory 55 \
      --sysprobe-memory 385 --process-memory 70 \
      --duration 600 --tags "purpose:smp-experiment"
GeorgeHahn force-pushed the hahn/tune-udp-dogstatsd-workload branch from 217e935 to 8e37ab7 on March 27, 2026 17:35
GeorgeHahn marked this pull request as ready for review March 27, 2026 18:37
GeorgeHahn requested a review from a team as a code owner March 27, 2026 18:37

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e37ab7ebc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
path: "/var/run/datadog/dsd.socket"
addr: "datadog-dogstatsd:8125"
bind_addr: "0.0.0.0:0"

P1: Remove unsupported bind_addr from UDP workload manifest

The new UDP generator config includes bind_addr, but in this commit the UDP config type still uses #[serde(deny_unknown_fields)] and does not define a bind_addr field (lading/src/generator/udp.rs). That causes lading config deserialization to fail at startup, so the lading pod never runs this workload successfully.
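
The failure mode described here can be illustrated outside Rust. The sketch below is a Python analog of strict deserialization, not lading's code: `UdpConfig`, its `bytes_per_second` field, and `load_strict` are hypothetical names chosen for the example. It shows how a loader that rejects undeclared keys turns an extra `bind_addr` entry into a hard startup error instead of silently ignoring it.

```python
from dataclasses import dataclass, fields


@dataclass
class UdpConfig:
    """Hypothetical stand-in for a UDP generator config; not lading's real type."""

    addr: str
    bytes_per_second: int


def load_strict(cls, raw: dict):
    """Build cls from raw, rejecting any key that cls does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return cls(**raw)


if __name__ == "__main__":
    manifest = {
        "addr": "datadog-dogstatsd:8125",
        "bytes_per_second": 300,
        "bind_addr": "0.0.0.0:0",  # not declared on UdpConfig, so it is rejected
    }
    try:
        load_strict(UdpConfig, manifest)
    except ValueError as err:
        print(f"config rejected: {err}")
```

Under this model there are two ways out: drop the key from the manifest, or run a build whose config type declares the field.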

echo "========================================"
AGENT_POD=$(kubectl get pods -l app.kubernetes.io/name=datadog-agent-deployment -o jsonpath='{.items[0].metadata.name}')
if [ -n "$AGENT_POD" ]; then
python3 "$SCRIPT_DIR/analyze_telemetry.py" --expvar "$AGENT_POD" "$DURATION"

P1: Stop calling missing analyze_telemetry.py in experiment

This command unconditionally invokes analyze_telemetry.py, but that script is not present in k8s/uds_dogstatsd_to_api (only analyze_memory.py exists in the repo tree). Because experiment.sh runs with set -e, the missing file makes the script exit non-zero on every run once monitoring finishes, so the experiment workflow cannot complete cleanly.
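
One possible mitigation, besides adding the script to the commit, is to make the analysis step optional so a missing file is reported and skipped rather than aborting the run. The sketch below shows that guard in Python; `run_optional_analysis` is a hypothetical helper and is not part of the repository. In experiment.sh itself the equivalent would be a file-existence test around the `python3` call.

```python
import subprocess
import sys
from pathlib import Path


def run_optional_analysis(script: Path, args: list[str]) -> bool:
    """Run script with the current interpreter if it exists; return whether it ran."""
    if not script.is_file():
        # Report and continue instead of failing the whole experiment.
        print(f"skipping analysis: {script} not found", file=sys.stderr)
        return False
    # check=True still surfaces real failures inside the analysis script.
    subprocess.run([sys.executable, str(script), *args], check=True)
    return True
```

Skipping silently would hide the problem, so the guard prints to stderr. Errors raised by a script that does exist still propagate.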
