[CXP-3401][agent] skip broken e2e windows language detection test + clean up by gengnamstyle · Pull Request #48833 · DataDog/datadog-agent

gengnamstyle · 2026-04-03T00:18:37Z

Changes

- skips TestLanguageDetectionWindows because the e2e test is blocked by a gRPC stream cycling bug (repeated connection and disconnection) after the MSI reinstall: s.UpdateEnv() performs a full Windows Installer (MSI) install to "reset" the environment for each new test. This is a testing-infrastructure issue that leaves the core agent unable to pick up languages (https://datadoghq.atlassian.net/browse/CXP-3410). I manually validated that language detection still works; again, the failure comes from the test setup, not the agent.

Running the Python script:

❯ sshpass -p "<PASSWORD>" ssh -o StrictHostKeyChecking=no <SERVER> 'Set-Content -Path C:\sleep.py -Value "import time; time.sleep(600)"; Start-Process -FilePath "C:\Program Files\Datadog\Datadog Agent\embedded3\python.exe" -ArgumentList "C:\sleep.py" -WindowStyle Hidden; Start-Sleep 60; Get-Process python'

The python language then appears in the workload list:

❯ sshpass -p "<PASSWORD>" ssh -o StrictHostKeyChecking=no <SERVER> '& "C:\Program Files\Datadog\Datadog Agent\bin\agent.exe" workload-list --json' | grep -o '"Language":{[^}]*}' | sort | uniq -c | sort -rn
     96 "Language":{}
      1 "Language":{"Name":"python"}
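The grep | sort | uniq -c pipeline above tallies `"Language":{...}` fragments as raw text. A minimal Go sketch of the same tally on parsed JSON follows. It assumes only what the output above shows (objects under a `Language` key that may carry a `Name` field) and walks the whole document generically, because the real workload-list schema is not described here. `countLanguages` and the sample payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countLanguages decodes workload-list style JSON and tallies every object
// found under a "Language" key, keyed by its "Name" ("" for empty objects).
// It walks the document generically because the exact schema is assumed unknown.
func countLanguages(raw []byte) (map[string]int, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	var walk func(v any)
	walk = func(v any) {
		switch t := v.(type) {
		case map[string]any:
			for k, child := range t {
				if k == "Language" {
					if lang, ok := child.(map[string]any); ok {
						name, _ := lang["Name"].(string)
						counts[name]++
					}
				}
				walk(child)
			}
		case []any:
			for _, child := range t {
				walk(child)
			}
		}
	}
	walk(doc)
	return counts, nil
}

func main() {
	// Hypothetical, heavily trimmed sample; real output has many more fields.
	sample := []byte(`{"Entities":{"process":[
		{"Pid":1,"Language":{}},
		{"Pid":2,"Language":{"Name":"python"}}]}}`)
	counts, err := countLanguages(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

Unlike the grep regex, which relies on the compact `{...}` formatting, this counts the objects regardless of key order or whitespace in the JSON.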

- removes the chocolatey dependency: uses the agent's embedded Python (embedded3/python.exe) instead of installing Python via chocolatey, which is flaky due to rate limits (see "Vendor diskspd instead of installing from chocolatey" #48688)
- makes TestLanguageDetectionWindows less flaky:
  - removes PID matching and instead checks for any process with language=python
  - uses a persistent SSH session to start Python via remoteHost.Start() so the process survives across SSH commands
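The PID change can be illustrated with a small Go sketch. It uses a hypothetical `process` record (not the real workloadmeta type) and made-up PIDs. Because the SSH session wraps commands in PowerShell, the PID the test records can differ from the one workloadmeta reports (for example, it may belong to the wrapper), so an exact PID lookup can miss while a check for any process with language=python still succeeds.

```go
package main

import "fmt"

// process is a hypothetical, simplified stand-in for a workloadmeta process
// entity; it is not the real agent type.
type process struct {
	PID      int
	Language string // "" when no language was detected
}

// hasLanguageForPID mirrors the old style of assertion: the language must be
// reported for one specific PID that the test recorded.
func hasLanguageForPID(procs []process, pid int, lang string) bool {
	for _, p := range procs {
		if p.PID == pid && p.Language == lang {
			return true
		}
	}
	return false
}

// hasAnyLanguage mirrors the relaxed assertion: any process reporting the
// language is enough, whichever PID the test observed.
func hasAnyLanguage(procs []process, lang string) bool {
	for _, p := range procs {
		if p.Language == lang {
			return true
		}
	}
	return false
}

func main() {
	// Made-up PIDs: 4100 plays the wrapper PID the test recorded, 4212 the
	// python.exe process that workloadmeta tagged with a language.
	procs := []process{{PID: 4100}, {PID: 4212, Language: "python"}}
	fmt.Println(hasLanguageForPID(procs, 4100, "python")) // false
	fmt.Println(hasAnyLanguage(procs, "python"))          // true
}
```

The trade-off is that the relaxed check no longer proves that the specific process the test started was detected; any Python process on the host would satisfy it.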

agent-platform-auto-pr · 2026-04-03T00:47:19Z

Files inventory check summary

File checks results against ancestor 1ed74f0b:

Results for datadog-agent_7.79.0~devel.git.413.1f8ef2c.pipeline.106014820-1_amd64.deb:

No change detected

### What does this PR do?

Chocolatey was removed in #48688, but a later PR updated a test to use it again, so the test breaks because of the conflict. This adds chocolatey back to buy time for properly fixing the test (#48833).

Remove PID matching from TestLanguageDetectionWindows and check for any process with language=python instead. The SSH session wraps commands in PowerShell, so the PID from Get-CimInstance differs from the one in workloadmeta, leading to false failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cit-pr-commenter-54b7da · 2026-04-06T15:53:01Z

Regression Detector

Regression Detector Results

Run ID: 822073d9-4c5d-4318-a7d0-261f73d50c02

Baseline: 3e0a1a6
Comparison: eaeaa61

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	-2.76	[-5.78, +0.26]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_metrics_logs	memory utilization	+1.42	[+1.18, +1.66]	1	Logs bounds checks dashboard
➖	otlp_ingest_metrics	memory utilization	+0.55	[+0.39, +0.71]	1	Logs
➖	ddot_logs	memory utilization	+0.35	[+0.29, +0.42]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.29	[+0.26, +0.33]	1	Logs bounds checks dashboard
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.11	[+0.05, +0.17]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	+0.04	[-0.41, +0.49]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	+0.00	[-0.39, +0.40]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.00	[-0.20, +0.21]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.01	[-0.12, +0.11]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.01	[-0.21, +0.19]	1	Logs
➖	otlp_ingest_logs	memory utilization	-0.01	[-0.11, +0.09]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.05	[-0.16, +0.06]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	-0.09	[-0.26, +0.09]	1	Logs
➖	docker_containers_memory	memory utilization	-0.10	[-0.18, -0.02]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.11	[-0.56, +0.35]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	-0.18	[-0.32, -0.05]	1	Logs
➖	quality_gate_logs	% cpu utilization	-0.20	[-1.86, +1.45]	1	Logs bounds checks dashboard
➖	file_tree	memory utilization	-0.20	[-0.26, -0.14]	1	Logs
➖	quality_gate_idle	memory utilization	-0.30	[-0.35, -0.26]	1	Logs bounds checks dashboard
➖	tcp_syslog_to_blackhole	ingress throughput	-0.32	[-0.48, -0.17]	1	Logs
➖	ddot_metrics	memory utilization	-0.34	[-0.53, -0.15]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	-0.39	[-0.62, -0.17]	1	Logs
➖	docker_containers_cpu	% cpu utilization	-2.76	[-5.78, +0.26]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	719 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	273.56MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	720 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.18GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.23GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.20GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.21GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
✅	quality_gate_idle	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	172.48MiB ≤ 181MiB	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	490.85MiB ≤ 550MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	3 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	202.07MiB ≤ 220MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	343.56 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	418.37MiB ≤ 475MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we consider a change in performance a "regression" (a change worth investigating further) if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that, if our statistical model is accurate, there is at least a 90.00% chance of a difference in performance between the baseline and comparison variants.
- Its configuration does not mark it "erratic".
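The three criteria combine into a single boolean rule. Below is a minimal Go sketch of that rule as the report states it; the `experiment` type and `isRegression` function are hypothetical names, not part of the Regression Detector, and the real tool may compute or round values differently.

```go
package main

import (
	"fmt"
	"math"
)

// experiment is one row of the report: the estimated Δ mean %, the bounds of
// its 90% confidence interval, and whether the experiment is marked erratic.
type experiment struct {
	Name         string
	DeltaMeanPct float64
	CILow        float64
	CIHigh       float64
	Erratic      bool
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report (5.00%).
const effectSizeTolerance = 5.0

// isRegression returns true only when all three criteria hold: the effect is
// large enough, the confidence interval excludes zero, and the experiment is
// not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.DeltaMeanPct) >= effectSizeTolerance
	excludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && excludesZero && !e.Erratic
}

func main() {
	rows := []experiment{
		// Two rows taken from the tables above.
		{"quality_gate_metrics_logs", 1.42, 1.18, 1.66, false},
		{"docker_containers_cpu", -2.76, -5.78, 0.26, true},
		// Hypothetical row, not from this report, that would be flagged.
		{"hypothetical_experiment", -6.00, -7.50, -4.50, false},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.Name, isRegression(r))
	}
}
```

Both real rows come out as non-regressions: quality_gate_metrics_logs is below the 5.00% tolerance, and docker_containers_cpu is both erratic and has a confidence interval that contains zero.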

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.

…ctor missing from Windows build (#48830)

Backport 3fc1dc8 from #48819.

___

## Summary

- Fixes the remote process collector missing from the Windows build, restoring language detection
- Adds a Windows E2E test for language detection via the `remote_process_collector` to the existing `windowsTestSuite`
- Includes the cherry-picked commit from #48833 to skip the broken e2e test

## What's broken

PR #46219 split the workloadmeta catalog into `trivy` / `!trivy` variants but only included `remoteprocesscollector` in the `trivy`-gated file (`options.go`). Since `trivy` is in `LINUX_ONLY_TAGS`, Windows always uses the `!trivy` build (`options_nosbom.go`), which is missing the remote process collector. This has broken language detection on Windows since March 20. More details in the Jira ticket: https://datadoghq.atlassian.net/browse/CXP-3401

## Evidence

- [Diff that introduced the regression](20aa50f): `options_nosbom.go` was created without `remoteprocesscollector`
- [LINUX_ONLY_TAGS includes trivy](https://github.com/DataDog/datadog-agent/blob/main/tasks/build_tags.py): confirms Windows never gets the `trivy` build
- Validated on a Windows Server EC2 instance running agent 7.78.0-rc.5:
  - `agent workload-list --json` returns `{"Entities":{}}`
  - Agent logs show no `remote-process-collector` among the workloadmeta collector candidates
  - Config confirms `language_detection.enabled: true`

## Why the test lives in `tests/process/` instead of `tests/language-detection/`

The test is added to the existing `windowsTestSuite` in `tests/process/windows_test.go` to reuse the Windows EC2 instance already provisioned by the `new-e2e-process-windows` CI job; the choice is about efficiency rather than logical organization. Placing it in `tests/language-detection/` would require a separate Windows CI job and an additional Windows instance, adding ~10 min of extra CI time.

## Test plan

- [x] E2E test confirms the regression (fails against 7.78.0-rc.5 with `{"Entities":{}}`) (details in Jira comment: https://datadoghq.atlassian.net/browse/CXP-3401?focusedCommentId=3134649)
- [x] Tested manually, since the e2e test is blocked by a testing issue

Ran the Python script:

```
❯ sshpass -p "<PASSWORD>" ssh -o StrictHostKeyChecking=no <SERVER> 'Set-Content -Path C:\sleep.py -Value "import time; time.sleep(600)"; Start-Process -FilePath "C:\Program Files\Datadog\Datadog Agent\embedded3\python.exe" -ArgumentList "C:\sleep.py" -WindowStyle Hidden; Start-Sleep 60; Get-Process python'
```

The python language then appears in the workload list:

```
❯ sshpass -p "<PASSWORD>" ssh -o StrictHostKeyChecking=no <SERVER> '& "C:\Program Files\Datadog\Datadog Agent\bin\agent.exe" workload-list --json' | grep -o '"Language":{[^}]*}' | sort | uniq -c | sort -rn
     96 "Language":{}
      1 "Language":{"Name":"python"}
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: gengnamstyle <matthew.geng@datadoghq.com>
Co-authored-by: ali.benabdallah <ali.benabdallah@datadoghq.com>

gengnamstyle added and then removed the backport/7.78.x label Apr 3, 2026

dd-octo-sts bot added the internal and team/container-experiences labels Apr 3, 2026

github-actions bot added the short review label Apr 3, 2026

gengnamstyle added the changelog/no-changelog and qa/done labels Apr 3, 2026

gengnamstyle force-pushed the matthew.geng/CXP-3401-reduce-windows-e2e-flakiness branch from 9fe16e9 to 7988d07 April 3, 2026 03:42

gengnamstyle self-assigned this Apr 3, 2026

gengnamstyle force-pushed the matthew.geng/CXP-3401-reduce-windows-e2e-flakiness branch from 7988d07 to 1903884 April 3, 2026 05:50

KevinFairise2 mentioned this pull request Apr 3, 2026

Add back chocolatey ... #48836

Merged

gengnamstyle force-pushed the matthew.geng/CXP-3401-reduce-windows-e2e-flakiness branch 3 times, most recently from 2f2e240 to 22cc244 April 3, 2026 18:21

gengnamstyle force-pushed the matthew.geng/CXP-3401-reduce-windows-e2e-flakiness branch from 22cc244 to 1f8ef2c April 3, 2026 19:21

gengnamstyle marked this pull request as ready for review April 3, 2026 20:31

gengnamstyle requested a review from a team as a code owner April 3, 2026 20:31

gengnamstyle changed the title from "[CXP-3401] Reduce flakiness in Windows language detection e2e test" to "[CXP-3401][agent] skip broken e2e windows language detection test" Apr 3, 2026

gengnamstyle changed the title from "[CXP-3401][agent] skip broken e2e windows language detection test" to "[CXP-3401][agent] skip broken e2e windows language detection test + clean up" Apr 3, 2026

brmenchl approved these changes Apr 6, 2026


gh-worker-dd-mergequeue-cf854d bot merged commit eaeaa61 into main Apr 6, 2026
265 checks passed

gh-worker-dd-mergequeue-cf854d bot deleted the matthew.geng/CXP-3401-reduce-windows-e2e-flakiness branch April 6, 2026 15:04

github-actions bot added this to the 7.79.0 milestone Apr 6, 2026

gengnamstyle mentioned this pull request Apr 6, 2026

[Backport 7.78.x] [CXP-3401][agent][windows] Fix remote process collector missing from Windows build #48830

Merged


gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main from matthew.geng/CXP-3401-reduce-windows-e2e-flakiness
