
remote feature #48691

Draft
BarFinsdd wants to merge 62 commits into main from bar.fins/remote-feature

Conversation

@BarFinsdd
Contributor

What does this PR do?

Copy of bar.fins/add-dependency-map-logic-same-host-services.

Motivation

Describe how you validated your changes

Additional Notes

BarFinsdd and others added 30 commits March 30, 2026 11:03
Build a cross-platform port-to-PID map from connections to resolve
the destination service context for IntraHost connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
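A minimal Go sketch of the connection-based port-to-PID idea from the commit above, assuming a hypothetical simplified `Conn` type (not the agent's real connection struct). The server side of an intra-host connection owns the port that the client side sees as its remote port, so looking up the client's remote port yields the destination PID. Note that building the map from every connection also records ephemeral client ports, which a later commit in this PR calls out as inaccurate.

```go
package main

import "fmt"

// Conn is a hypothetical, simplified connection record; the real
// agent types carry many more fields.
type Conn struct {
	PID        int32
	LocalPort  uint16
	RemotePort uint16
	IntraHost  bool
}

// buildPortToPID maps each local port to the PID that owns it.
// Assumption: when two connections share a local port, the first one wins.
func buildPortToPID(conns []Conn) map[uint16]int32 {
	m := make(map[uint16]int32, len(conns))
	for _, c := range conns {
		if _, seen := m[c.LocalPort]; !seen {
			m[c.LocalPort] = c.PID
		}
	}
	return m
}

// destinationPID resolves the server-side PID of an intra-host connection
// by looking up its remote port in the port-to-PID map.
func destinationPID(c Conn, portToPID map[uint16]int32) (int32, bool) {
	if !c.IntraHost {
		return 0, false
	}
	pid, ok := portToPID[c.RemotePort]
	return pid, ok
}

func main() {
	conns := []Conn{
		{PID: 100, LocalPort: 50000, RemotePort: 8080, IntraHost: true}, // client side
		{PID: 200, LocalPort: 8080, RemotePort: 50000, IntraHost: true}, // server side
	}
	portToPID := buildPortToPID(conns)
	if pid, ok := destinationPID(conns[0], portToPID); ok {
		fmt.Println("destination PID:", pid)
	}
}
```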
Build a listening port-to-PID map using OS-level APIs (portlist.Poller)
on Linux and Windows, with a fallback to connections-based mapping.
For IntraHost connections, resolve the destination service context and
store it as RemoteServiceTagsIdx on the Connection proto field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tagger process tags (service, env, version, tracer metadata) to
remote service tags for same-host connections, alongside existing
process_context tags from the service extractor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cache IIS-specific tags (sitename, app_pool, subsite, service, env, version)
from ETW HTTP service events and expose them via a system-probe endpoint so
the process-agent can enrich remote service tags on same-host connections.

Entries have a 2-minute TTL and are evicted on read.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lution

Expose process cache tags from system-probe via /process_cache_tags endpoint
for PID-based tag lookup. For same-host connections, try IIS ETW cache tags
first; fall back to process_context, tagger, and process cache tags by
destination PID. Limit IIS cache to 1024 entries and only cache local
connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
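A minimal Go sketch of the lookup order described above: IIS ETW cache tags first, then the PID-based sources. The `tagSource` type and `resolveRemoteTags` function are hypothetical, and combining the PID-based sources as a deduplicated union (rather than stopping at the first non-empty one) is an assumption based on the earlier commit that adds tagger tags alongside `process_context` tags.

```go
package main

import "fmt"

// tagSource is a hypothetical lookup keyed by destination PID.
type tagSource func(pid int32) []string

// resolveRemoteTags returns IIS tags when present; otherwise it combines
// the PID-based sources in order, dropping duplicate tags.
func resolveRemoteTags(iisTags []string, pid int32, sources ...tagSource) []string {
	if len(iisTags) > 0 {
		return iisTags
	}
	seen := make(map[string]struct{})
	var out []string
	for _, src := range sources {
		for _, tag := range src(pid) {
			if _, dup := seen[tag]; dup {
				continue
			}
			seen[tag] = struct{}{}
			out = append(out, tag)
		}
	}
	return out
}

func main() {
	processContext := func(int32) []string { return []string{"process_context:web"} }
	tagger := func(int32) []string { return []string{"service:web", "env:prod"} }
	fmt.Println(resolveRemoteTags(nil, 4242, processContext, tagger))
	fmt.Println(resolveRemoteTags([]string{"sitename:default"}, 4242, processContext, tagger))
}
```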
…lution

Move getListeningPortToPIDMap to net_portmap.go (linux || windows) to
eliminate duplication. Add getRemoteProcessTags as a platform-specific
function: Windows uses process cache from system-probe, Linux uses the
tagger. Skip remote tag resolution for containerized connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exercise the platform-specific remote process tag resolution path
(getRemoteProcessTags) which was previously untested. The new sub-test
provides both procCacheTags (Windows) and processTagProvider (Linux) so
it works cross-platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remote service tag resolution depends on platform-specific functions
(getListeningPortToPIDMap, getRemoteProcessTags) that are not
implemented on macOS. Skip the entire test on non-Linux/Windows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix unsafe type assertion in process_cache.GetAllPIDTags (panic risk)
- Use singleton portlist.Poller with cached results instead of creating
  a new Poller on every collection cycle
- Remove inaccurate fallback that mapped ephemeral ports to PIDs; pass
  portToPID from Run() into batchConnections
- Add context.WithTimeout to HTTP fetches in net_windows.go
- Replace full-map sweep in storeIISTagsCache with single-entry eviction
  to avoid latency spikes in the ETW callback path
- Make GetIISTagsCache read-only (skip expired entries without deleting)
- Replace fmt.Sprintf with strconv in buildIISTags hot path
- Remove Windows 'nul' artifact from .gitignore
- Add unit tests for GetAllPIDTags and IIS cache internals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
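Two of the fixes above follow common Go idioms. A minimal sketch, assuming a hypothetical untyped cache of type `map[int32]any` (the real process cache layout is not shown in this PR): the comma-ok type assertion skips unexpected values instead of panicking, and `strconv` builds a numeric tag without `fmt.Sprintf`. Both `pidTags` and the `pid:` tag in `pidTag` are illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
)

// pidTags converts a hypothetical untyped cache into typed tags.
// The comma-ok assertion skips values that are not []string; a bare
// v.([]string) would panic on them.
func pidTags(cache map[int32]any) map[int32][]string {
	out := make(map[int32][]string, len(cache))
	for pid, v := range cache {
		tags, ok := v.([]string)
		if !ok {
			continue
		}
		out[pid] = tags
	}
	return out
}

// pidTag builds a hypothetical "pid:<n>" tag with strconv, which avoids
// fmt.Sprintf's format parsing on a hot path.
func pidTag(pid int32) string {
	return "pid:" + strconv.FormatInt(int64(pid), 10)
}

func main() {
	cache := map[int32]any{1234: []string{"service:web"}, 5678: 42}
	fmt.Println(pidTags(cache))
	fmt.Println(pidTag(1234))
}
```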
…inux

Instead of using the tagger (which requires system-probe's discovery
module and only works for APM-instrumented processes), read DD_SERVICE
directly from /proc/<pid>/environ so that any process with DD_SERVICE
set in its environment is correctly tagged in same-host connection data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
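On Linux, `/proc/<pid>/environ` holds the process's initial environment as NUL-separated `KEY=VALUE` entries, so extracting `DD_SERVICE` is a split-and-match. A minimal Go sketch; the helper names are hypothetical, and it emits a `service:` tag, which matches what a later commit in this PR switches to. Reading another user's `environ` requires sufficient privileges, so read errors are treated as "no tag".

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strconv"
)

// serviceFromEnviron extracts DD_SERVICE from the raw contents of
// /proc/<pid>/environ, whose entries are NUL-separated KEY=VALUE pairs.
func serviceFromEnviron(environ []byte) (string, bool) {
	prefix := []byte("DD_SERVICE=")
	for _, entry := range bytes.Split(environ, []byte{0}) {
		if bytes.HasPrefix(entry, prefix) {
			return string(entry[len(prefix):]), true
		}
	}
	return "", false
}

// remoteServiceTag is a hypothetical helper that returns a "service:<name>"
// tag for pid, or false if environ is unreadable or DD_SERVICE is unset.
func remoteServiceTag(pid int) (string, bool) {
	data, err := os.ReadFile("/proc/" + strconv.Itoa(pid) + "/environ")
	if err != nil {
		return "", false
	}
	svc, ok := serviceFromEnviron(data)
	if !ok || svc == "" {
		return "", false
	}
	return "service:" + svc, true
}

func main() {
	if tag, ok := remoteServiceTag(os.Getpid()); ok {
		fmt.Println(tag)
	} else {
		fmt.Println("DD_SERVICE not set (or /proc unavailable) for this process")
	}
}
```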
Split the "PID fallback with remote process tags" test:
- Common test (net_test.go): asserts service extractor's process_context
  tag which works on all platforms
- net_linux_test.go: tests getRemoteProcessTags reading /proc/environ
- net_windows_test.go: tests getRemoteProcessTags using procCacheTags

This fixes CI failure where the Linux /proc-based implementation cannot
return env:prod/service:web tags that only exist in the test's mock
processTagProvider callback (which Linux ignores).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IIS tags are Windows-only, so move all IIS-related subtests (IIS match,
containerized guard, IntraHost guard, no-match) to the Windows test file.
Common net_test.go retains only platform-agnostic PID fallback tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…inux

getRemoteProcessTags on Linux reads DD_SERVICE from /proc/<pid>/environ
and now returns a service: tag instead of process_context:, which was
a duplicate of what serviceExtractor.GetServiceContext already provides
via cmdline parsing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Linux and Windows Python HTTP server e2e tests for process_context remote tags
- Use WMI process creation on Windows for detached Python servers that survive SSH cleanup
- Use keep-alive connections on both platforms for reliable process context resolution
- Make getConnectionStats generic with variadic requiredTagPrefixes
- Fix IIS batch2 to not require per-connection IIS tags (cumulative FakeIntake includes expired cache entries)
- Merge identical usm-iis.yaml and usm-python.yaml into single usm.yaml
- Add service readiness assertions in deploy helpers
- Add warm-up and tolerance for agent startup race conditions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
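The "variadic requiredTagPrefixes" change lets one helper serve both the IIS and the Python e2e tests. A minimal Go sketch of such a filter; `hasAllTagPrefixes` is a hypothetical name, and the real `getConnectionStats` signature is not shown in this PR. A connection qualifies only if every required prefix matches at least one of its tags, and calling it with no prefixes accepts everything.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAllTagPrefixes reports whether, for every required prefix, at least
// one tag starts with it. With no prefixes it returns true.
func hasAllTagPrefixes(tags []string, requiredTagPrefixes ...string) bool {
	for _, prefix := range requiredTagPrefixes {
		found := false
		for _, tag := range tags {
			if strings.HasPrefix(tag, prefix) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	tags := []string{"process_context:python", "service:web"}
	fmt.Println(hasAllTagPrefixes(tags, "process_context:"))
	fmt.Println(hasAllTagPrefixes(tags, "sitename:", "app_pool:"))
}
```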
… agent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix storeIISTagsCache to evict oldest entry when at capacity instead
  of silently dropping new entries
- Add TestHTTPRemoteTagsWindowsSuite to Windows e2e CI
- Enable process_service_inference in USM test config for Linux
- Refactor IIS test to verify sites sequentially with single-port helpers
- Re-comment deployWindowsBinaries/deployLinuxBinaries for CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
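A minimal Go sketch of "evict the oldest entry when at capacity", using `container/list` to track insertion order; `boundedCache` is a hypothetical name and the capacity in the usage is arbitrary (an earlier commit caps the real IIS cache at 1024 entries). Eviction is O(1) and touches a single entry, in the spirit of the earlier change that avoided full-map sweeps in the ETW callback path. This sketch has no locking; a shared cache would need a mutex.

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedCache keeps at most capacity entries; when full, storing a new key
// evicts the oldest inserted entry rather than dropping the new one.
type boundedCache struct {
	capacity int
	order    *list.List               // front = oldest insertion
	items    map[string]*list.Element // key -> element holding *cacheItem
}

type cacheItem struct {
	key  string
	tags []string
}

func newBoundedCache(capacity int) *boundedCache {
	return &boundedCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *boundedCache) store(key string, tags []string) {
	if c.capacity <= 0 {
		return
	}
	if el, ok := c.items[key]; ok {
		// Refresh an existing key: update its tags and treat it as newest.
		el.Value.(*cacheItem).tags = tags
		c.order.MoveToBack(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Front()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheItem).key)
	}
	c.items[key] = c.order.PushBack(&cacheItem{key: key, tags: tags})
}

func (c *boundedCache) get(key string) ([]string, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	return el.Value.(*cacheItem).tags, true
}

func main() {
	c := newBoundedCache(2)
	c.store("a", []string{"sitename:a"})
	c.store("b", []string{"sitename:b"})
	c.store("c", []string{"sitename:c"}) // evicts "a", the oldest entry
	_, okA := c.get("a")
	_, okC := c.get("c")
	fmt.Println(okA, okC)
}
```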
In CI the agent starts before IIS is installed, so system-probe's IIS
ETW provider never initializes. Add an explicit agent restart after IIS
and site creation to ensure the ETW tag cache is populated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
portlist.Poller runs as dd-agent and cannot read /proc/<pid>/fd/ for
processes owned by other users, so listening ports are unresolved.
Supplement the map from intra-host connection entries provided by
system-probe (which runs as root and has the correct server-side PIDs).

Also cleans up temporary diagnostic logs and switches Python test to
use urllib for HTTP requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
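A minimal Go sketch of supplementing the poller's port map with server-side intra-host entries from system-probe. The `spConn` type and its `Incoming` flag are hypothetical stand-ins for however system-probe marks the accepting side. Keeping existing poller entries and only filling gaps is an assumption of this sketch, not something the commit states.

```go
package main

import "fmt"

// spConn is a hypothetical, simplified view of an intra-host connection
// reported by system-probe. Incoming marks the server (accepting) side.
type spConn struct {
	PID       int32
	LocalPort uint16
	Incoming  bool
	IntraHost bool
}

// supplementPortToPID fills gaps in portToPID (for example, ports the
// poller could not attribute because it lacks access to other users'
// sockets) using server-side intra-host entries.
func supplementPortToPID(portToPID map[uint16]int32, conns []spConn) {
	for _, c := range conns {
		if !c.IntraHost || !c.Incoming {
			continue // client-side entries would map ephemeral ports
		}
		if _, exists := portToPID[c.LocalPort]; !exists {
			portToPID[c.LocalPort] = c.PID
		}
	}
}

func main() {
	portToPID := map[uint16]int32{22: 1} // e.g. what the poller could see
	supplementPortToPID(portToPID, []spConn{
		{PID: 300, LocalPort: 8080, Incoming: true, IntraHost: true},
		{PID: 500, LocalPort: 50000, Incoming: false, IntraHost: true},
	})
	fmt.Println(portToPID)
}
```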
…agent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go treats *_windows_test.go as Windows-only. Rename to
ec2_1host_wkit_python_test.go so it compiles on Linux CI runners.
Also comment out deployLinuxBinaries for CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BarFinsdd and others added 17 commits March 30, 2026 11:03
The Windows network tracer retains closed connections for up to 2
minutes (ClientStateExpiry). With the previous 2-minute cache TTL,
the final connection report could miss IIS tags when the cache
expired at the same time as the connection state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Helps diagnose IIS keep-alive settings that affect ETW cache TTL
requirements for remote service tag resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… support

Extract RemoteServiceTagsIdx enrichment logic into a shared
remoteservice.Resolver package used by both the process-agent
(net.go) and the system-probe direct sender (sender_linux.go).
This ensures intra-host connections get remote service tags
regardless of which data path is active.

Additional changes:
- Add CREATE_NO_WINDOW and 30s timeout to PowerShell exec
- Delegate buildIISTags to DynamicTags() to avoid duplication
- Move platform-specific fetches into fetchRemoteServiceData()
- Add concurrency safety documentation to getListeningPortToPIDMap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cy-map-logic-same-host-services

# Conflicts:
#	go.sum
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remote service tag enrichment now only applies to intra-host,
non-containerized, TCP connections. The portToPID map is also
filtered to TCP-only entries at all build sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
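A minimal Go sketch of the two filters described above: the eligibility check (intra-host, non-containerized, TCP) and a TCP-only port map. All types here are hypothetical simplifications. Filtering the map by protocol matters because a UDP socket bound to the same port number as a TCP server would otherwise be able to overwrite the TCP owner's PID.

```go
package main

import "fmt"

// Protocol is a hypothetical transport enum.
type Protocol int

const (
	TCP Protocol = iota
	UDP
)

// connInfo is a hypothetical simplification of a connection record.
type connInfo struct {
	Protocol      Protocol
	IntraHost     bool
	Containerized bool
}

// eligibleForRemoteServiceTags applies the three conditions from the commit.
func eligibleForRemoteServiceTags(c connInfo) bool {
	return c.IntraHost && !c.Containerized && c.Protocol == TCP
}

// listener is a hypothetical listening-socket record.
type listener struct {
	Port     uint16
	PID      int32
	Protocol Protocol
}

// tcpPortToPID builds the port map from TCP listeners only.
func tcpPortToPID(ls []listener) map[uint16]int32 {
	m := make(map[uint16]int32)
	for _, l := range ls {
		if l.Protocol != TCP {
			continue
		}
		m[l.Port] = l.PID
	}
	return m
}

func main() {
	fmt.Println(eligibleForRemoteServiceTags(connInfo{Protocol: TCP, IntraHost: true}))
	fmt.Println(eligibleForRemoteServiceTags(connInfo{Protocol: UDP, IntraHost: true}))
	fmt.Println(tcpPortToPID([]listener{{Port: 53, PID: 10, Protocol: UDP}, {Port: 8080, PID: 20, Protocol: TCP}}))
}
```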
…ce-tags-feature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BarFinsdd self-assigned this Mar 31, 2026
BarFinsdd added changelog/no-changelog (No changelog entry needed), team/universal-service-monitoring (The USM team), and qa/done (QA done before merge and regressions are covered by tests) labels Mar 31, 2026
dd-octo-sts bot added the internal (Identify a non-fork PR) label Mar 31, 2026
github-actions bot added the long review (PR is complex, plan time to review it) label Mar 31, 2026
@agent-platform-auto-pr
Contributor

Go Package Import Differences

Baseline: 8f77c3b
Comparison: 9d3de01

| binary | os | arch | change | added packages |
|---|---|---|---|---|
| agent | linux | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| agent | linux | arm64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| agent | windows | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| agent | darwin | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| agent | darwin | arm64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| iot-agent | linux | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| iot-agent | linux | arm64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| heroku-agent | linux | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| process-agent | linux | amd64 | +4, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice, +github.com/DataDog/datadog-agent/pkg/util/port/portlist, +go4.org/mem, +hash/maphash |
| process-agent | linux | arm64 | +4, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice, +github.com/DataDog/datadog-agent/pkg/util/port/portlist, +go4.org/mem, +hash/maphash |
| process-agent | windows | amd64 | +3, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice, +github.com/DataDog/datadog-agent/pkg/util/port/portlist, +golang.org/x/sys/cpu |
| process-agent | darwin | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| process-agent | darwin | arm64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| heroku-process-agent | linux | amd64 | +4, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice, +github.com/DataDog/datadog-agent/pkg/util/port/portlist, +go4.org/mem, +hash/maphash |
| system-probe | linux | amd64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |
| system-probe | linux | arm64 | +1, -0 | +github.com/DataDog/datadog-agent/pkg/network/remoteservice |

@agent-platform-auto-pr
Contributor

Files inventory check summary

File checks results against ancestor 8f77c3b8:

Results for datadog-agent_7.79.0~devel.git.369.9d3de01.pipeline.105298105-1_amd64.deb:

No change detected

@agent-platform-auto-pr
Contributor

Static quality checks

❌ Please find below the results from static quality gates
Comparison made with ancestor 8f77c3b
📊 Static Quality Gates Dashboard
🔗 SQG Job

Error

| Quality gate | Change | Size in MiB (prev → curr → max) |
|---|---|---|
| iot_agent_deb_amd64 (on disk) | +8.06 KiB (0.02% increase) | 43.285 → 43.293 → 43.290 |
| iot_agent_rpm_amd64 (on disk) | +8.06 KiB (0.02% increase) | 43.286 → 43.294 → 43.290 |
| iot_agent_suse_amd64 (on disk) | +8.06 KiB (0.02% increase) | 43.286 → 43.294 → 43.290 |

Gate failure full details

| Quality gate | Error type | Error message |
|---|---|---|
| iot_agent_deb_amd64 | StaticQualityGateFailed | static_quality_gate_iot_agent_deb_amd64 failed! Disk size 43.3 MB exceeds limit of 43.3 MB by 3.3 KB |
| iot_agent_rpm_amd64 | StaticQualityGateFailed | static_quality_gate_iot_agent_rpm_amd64 failed! Disk size 43.3 MB exceeds limit of 43.3 MB by 3.8 KB |
| iot_agent_suse_amd64 | StaticQualityGateFailed | static_quality_gate_iot_agent_suse_amd64 failed! Disk size 43.3 MB exceeds limit of 43.3 MB by 3.8 KB |

Static quality gates prevent this PR from merging!
You can check the static quality gates Confluence page for guidance. We also have a toolbox page that lists tools useful for debugging the size increase.

Successful checks

Info

| Quality gate | Change | Size in MiB (prev → curr → max) |
|---|---|---|
| agent_deb_amd64 | +56.75 KiB (0.01% increase) | 753.077 → 753.132 → 753.380 |
| agent_deb_amd64_fips | +52.72 KiB (0.01% increase) | 710.016 → 710.067 → 713.900 |
| agent_heroku_amd64 | +8.06 KiB (0.00% increase) | 313.321 → 313.329 → 320.580 |
| agent_msi | +82.0 KiB (0.01% increase) | 604.863 → 604.943 → 651.440 |
| agent_rpm_amd64 | +56.75 KiB (0.01% increase) | 753.060 → 753.116 → 753.350 |
| agent_rpm_amd64_fips | +52.72 KiB (0.01% increase) | 710.000 → 710.051 → 713.880 |
| agent_rpm_arm64 | +48.78 KiB (0.01% increase) | 731.482 → 731.530 → 735.290 |
| agent_rpm_arm64_fips | +48.75 KiB (0.01% increase) | 691.443 → 691.490 → 696.840 |
| agent_suse_amd64 | +56.75 KiB (0.01% increase) | 753.060 → 753.116 → 753.350 |
| agent_suse_amd64_fips | +52.72 KiB (0.01% increase) | 710.000 → 710.051 → 713.880 |
| agent_suse_arm64 | +48.78 KiB (0.01% increase) | 731.482 → 731.530 → 735.290 |
| agent_suse_arm64_fips | +48.75 KiB (0.01% increase) | 691.443 → 691.490 → 696.840 |
| docker_agent_amd64 | +56.74 KiB (0.01% increase) | 813.379 → 813.435 → 815.700 |
| docker_agent_arm64 | +48.78 KiB (0.01% increase) | 816.572 → 816.619 → 821.970 |
| docker_agent_jmx_amd64 | +56.75 KiB (0.01% increase) | 1004.295 → 1004.350 → 1006.580 |
| docker_agent_jmx_arm64 | +48.78 KiB (0.00% increase) | 996.266 → 996.313 → 1001.570 |
| docker_cluster_agent_amd64 | +4.07 KiB (0.00% increase) | 203.945 → 203.949 → 206.270 |
| iot_agent_deb_arm64 | +8.06 KiB (0.02% increase) | 40.332 → 40.340 → 40.920 |
| iot_agent_deb_armhf | +8.05 KiB (0.02% increase) | 41.080 → 41.088 → 41.100 |
9 successful checks with minimal change (< 2 KiB)
| Quality gate | Current size |
|---|---|
| docker_cluster_agent_arm64 | 218.419 MiB |
| docker_cws_instrumentation_amd64 | 7.142 MiB |
| docker_cws_instrumentation_arm64 | 6.689 MiB |
| docker_dogstatsd_amd64 | 39.234 MiB |
| docker_dogstatsd_arm64 | 37.445 MiB |
| dogstatsd_deb_amd64 | 29.878 MiB |
| dogstatsd_deb_arm64 | 28.030 MiB |
| dogstatsd_rpm_amd64 | 29.878 MiB |
| dogstatsd_suse_amd64 | 29.878 MiB |
On-wire sizes (compressed)
| Quality gate | Change | Size in MiB (prev → curr → max) |
|---|---|---|
| iot_agent_deb_amd64 | neutral | 11.403 → 12.040 |
| iot_agent_rpm_amd64 | neutral | 11.420 → 12.060 |
| iot_agent_suse_amd64 | neutral | 11.420 → 12.060 |
| agent_deb_amd64 | +23.52 KiB (0.01% increase) | 174.748 → 174.771 → 178.360 |
| agent_deb_amd64_fips | +19.61 KiB (0.01% increase) | 165.366 → 165.385 → 172.790 |
| agent_heroku_amd64 | neutral | 75.004 → 79.970 |
| agent_msi | +28.0 KiB (0.02% increase) | 138.387 → 138.414 → 146.220 |
| agent_rpm_amd64 | +20.8 KiB (0.01% increase) | 177.620 → 177.640 → 181.830 |
| agent_rpm_amd64_fips | +25.12 KiB (0.01% increase) | 167.700 → 167.724 → 173.370 |
| agent_rpm_arm64 | neutral | 159.568 → 163.060 |
| agent_rpm_arm64_fips | +10.12 KiB (0.01% increase) | 151.413 → 151.422 → 156.170 |
| agent_suse_amd64 | +20.8 KiB (0.01% increase) | 177.620 → 177.640 → 181.830 |
| agent_suse_amd64_fips | +25.12 KiB (0.01% increase) | 167.700 → 167.724 → 173.370 |
| agent_suse_arm64 | neutral | 159.568 → 163.060 |
| agent_suse_arm64_fips | +10.12 KiB (0.01% increase) | 151.413 → 151.422 → 156.170 |
| docker_agent_amd64 | +28.55 KiB (0.01% increase) | 268.196 → 268.224 → 272.480 |
| docker_agent_arm64 | +26.03 KiB (0.01% increase) | 255.391 → 255.416 → 261.060 |
| docker_agent_jmx_amd64 | +14.81 KiB (0.00% increase) | 336.845 → 336.859 → 341.100 |
| docker_agent_jmx_arm64 | +25.86 KiB (0.01% increase) | 320.027 → 320.052 → 325.620 |
| docker_cluster_agent_amd64 | neutral | 71.375 → 72.920 |
| docker_cluster_agent_arm64 | +6.44 KiB (0.01% increase) | 67.011 → 67.017 → 68.220 |
| docker_cws_instrumentation_amd64 | neutral | 2.999 → 3.330 |
| docker_cws_instrumentation_arm64 | neutral | 2.729 → 3.090 |
| docker_dogstatsd_amd64 | neutral | 15.174 → 15.820 |
| docker_dogstatsd_arm64 | neutral | 14.487 → 14.830 |
| dogstatsd_deb_amd64 | -2.56 KiB (0.03% reduction) | 7.896 → 7.893 → 8.790 |
| dogstatsd_deb_arm64 | neutral | 6.778 → 7.710 |
| dogstatsd_rpm_amd64 | neutral | 7.903 → 8.800 |
| dogstatsd_suse_amd64 | neutral | 7.903 → 8.800 |
| iot_agent_deb_arm64 | +2.29 KiB (0.02% increase) | 9.705 → 9.707 → 10.450 |
| iot_agent_deb_armhf | neutral | 9.944 → 10.620 |

For neutral rows, only the current and maximum sizes are reported (curr → max).


Labels

- changelog/no-changelog: No changelog entry needed
- component/system-probe
- internal: Identify a non-fork PR
- long review: PR is complex, plan time to review it
- qa/done: QA done before merge and regressions are covered by tests
- team/agent-build
- team/cloud-network-monitoring
- team/universal-service-monitoring: The USM team
- team/windows-products
