Tune k8s DogStatsD workload to match observed telemetry #1842
GeorgeHahn wants to merge 1 commit into blt/introduce_a_k8s_based_lading_example
Switch the uds_dogstatsd_to_api experiment from high-cardinality UDS load
(100 MiB/s, 1k-10k contexts) to a tuned UDP workload matching specific
telemetry targets:
- Switch DogStatsD from UDS to UDP with bind_address 0.0.0.0
- Use lading 0.31.2 bind_addr config (was 127.0.0.1, broke k8s routing)
- Split into two UDP generators: metrics-only (300 B/s, 200 contexts)
and events (600 B/s capacity, ~1 event/flush via 4% event weight)
- Add file_gen logrotate for ~300 logs/flush at 1500 B/s
- Redirect agent logs endpoint to lading-intake via DD_LOGS_CONFIG
- Add DogStatsD UDP k8s Service for cross-pod routing
- Deploy agent before lading (listener before sender)
- Agent discovers log files dynamically via inotify
- Re-enable NPM
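A quick way to confirm that the new UDP Service actually routes to the agent is to send a single hand-built DogStatsD datagram from another pod. The sketch below is only an illustration, not part of this PR: the metric name `smoke.test` is hypothetical, and the default host and port are taken from the `datadog-dogstatsd:8125` address used in the workload manifest. It relies on the plain-text DogStatsD datagram format `name:value|type|#tag1,tag2`.

```python
import socket


def format_metric(name, value, metric_type, tags=()):
    """Render one DogStatsD datagram: name:value|type|#tag1,tag2."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(tags)
    return line.encode("utf-8")


def send_metric(payload, host="datadog-dogstatsd", port=8125):
    """Send one datagram over UDP.

    The default host and port mirror the Service address in this PR's
    manifest; adjust them if the Service is named differently.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Example (run from a pod that can resolve the Service name):
# send_metric(format_metric("smoke.test", 1, "g", ["purpose:smp-experiment"]))
```

UDP is fire-and-forget, so a successful `sendto` does not prove delivery; checking the agent's own DogStatsD counters afterwards is still needed.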
Validated memory limits (~70% utilization):
| Container | Limit | Peak | Utilization |
|---------------|--------|--------|-------------|
| agent | 175 MB | 127 MB | 72.5% |
| trace-agent | 55 MB | 33 MB | 59.6% |
| system-probe | 385 MB | 288 MB | 74.9% |
| process-agent | 70 MB | 41 MB | 59.2% |
| TOTAL | 685 MB | 489 MB | 71.4% |
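The totals in the table can be cross-checked with a few lines of Python. This is a sketch for sanity-checking only; the numbers are copied from the table, and the helper name `utilization` is made up for illustration. Per-container percentages recomputed this way may differ from the table by a few tenths of a percent, presumably because the reported peaks are rounded to whole megabytes.

```python
# (limit MB, reported peak MB) per container, copied from the table.
LIMITS = {
    "agent": (175, 127),
    "trace-agent": (55, 33),
    "system-probe": (385, 288),
    "process-agent": (70, 41),
}


def utilization(limit_mb, peak_mb):
    """Peak memory as a percentage of the configured limit."""
    return 100.0 * peak_mb / limit_mb


total_limit = sum(limit for limit, _ in LIMITS.values())
total_peak = sum(peak for _, peak in LIMITS.values())

for name, (limit, peak) in LIMITS.items():
    print(f"{name:14s} {limit:4d} MB {peak:4d} MB {utilization(limit, peak):5.1f}%")
print(f"{'TOTAL':14s} {total_limit:4d} MB {total_peak:4d} MB "
      f"{utilization(total_limit, total_peak):5.1f}%")
```

The per-container limits sum to the value passed as `--total-limit` in the example invocation.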
Example invocation:
./k8s/uds_dogstatsd_to_api/experiment.sh \
--total-limit 685 --agent-memory 175 --trace-memory 55 \
--sysprobe-memory 385 --process-memory 70 \
--duration 600 --tags "purpose:smp-experiment"
217e935 to 8e37ab7
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e37ab7ebc
59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
path: "/var/run/datadog/dsd.socket"
addr: "datadog-dogstatsd:8125"
bind_addr: "0.0.0.0:0"
Remove unsupported bind_addr from UDP workload manifest
The new UDP generator config includes bind_addr, but in this commit the UDP config type still uses #[serde(deny_unknown_fields)] and does not define a bind_addr field (lading/src/generator/udp.rs). That causes lading config deserialization to fail at startup, so the lading pod never runs this workload successfully.
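The failure mode described here, strict deserialization rejecting a key the config type does not declare, can be illustrated with a small Python analogue. This is not lading's code: `UdpConfig` is a hypothetical stand-in with a single field, and `parse_strict` only mimics the effect of serde's `deny_unknown_fields`.

```python
from dataclasses import dataclass, fields


@dataclass
class UdpConfig:
    # Hypothetical stand-in; lading's real UDP config type has other fields.
    addr: str


def parse_strict(cls, raw):
    """Build cls from a dict, rejecting any key cls does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return cls(**raw)


# Accepted: only declared fields are present.
ok = parse_strict(UdpConfig, {"addr": "datadog-dogstatsd:8125"})

# Rejected: bind_addr is not declared on this stand-in type.
try:
    parse_strict(UdpConfig, {"addr": "datadog-dogstatsd:8125", "bind_addr": "0.0.0.0:0"})
except ValueError as err:
    rejected = str(err)
```

Under this kind of strictness, a manifest written for a newer config schema fails at load time rather than silently ignoring the extra key, which is why the manifest and the lading version have to be kept in step.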
echo "========================================"
AGENT_POD=$(kubectl get pods -l app.kubernetes.io/name=datadog-agent-deployment -o jsonpath='{.items[0].metadata.name}')
if [ -n "$AGENT_POD" ]; then
python3 "$SCRIPT_DIR/analyze_telemetry.py" --expvar "$AGENT_POD" "$DURATION"
Stop calling missing analyze_telemetry.py in experiment
This command unconditionally invokes analyze_telemetry.py, but that script is not present in k8s/uds_dogstatsd_to_api (only analyze_memory.py exists in the repo tree). Because experiment.sh runs with set -e, the missing file makes the script exit non-zero every run after monitoring, so the experiment workflow cannot complete cleanly.