Commit f887181

[NDM] Add core check for NCM (#39735)
### What does this PR do?

Adds the agent core check/integration for the NCM (Network Config Management) feature owned by NDM.

This PR adds:
* A new core check `network_config_management`, plus its defaults
* Refined shared logic between the core check and traps/syslog-based retrieval
* Tests

You can refer to this [documentation](https://datadoghq.atlassian.net/wiki/spaces/II/pages/5367792210/NCM+Architecture+Overview) for details regarding the vision for the agent-based architecture.

**Core check flow**
1. The check is scheduled and configured according to the `conf.yaml` (example below in the QA section)
2. For the device, retrieval (default interval 15m) during the check only grabs the running config (hardcoded for Cisco devices currently)
3. The check finishes by submitting to EvP

Many tasks are left open, including:
* Additional support for Telnet
* Refining SSH support / configurations
* Validation, parsing, refinement of config output post-retrieval (incl. sensitive data scrubbing, etc.)
* etc.

These will be addressed in upcoming PRs, as this initial contribution has grown large :^D

### Motivation

### Describe how you validated your changes

**VALIDATION OUTPUT**
```
cisco@qa-agent:~$ sudo -u dd-agent -- datadog-agent check network_config_management
{"namespace":"zoe_ncm_test","integration":"","configs":[{"device_id":"zoe_ncm_test:10.10.1.1","device_ip":"10.10.1.1","config_type":"running","timestamp":1755274594,"tags":["device_ip:10.10.1.1"],"content":"\r\n\r\n\r\nBuilding configuration...\r\n\r\n \r\nCurrent configuration : 3144 bytes\r\n!\r\n! Last configuration change at 20:53:27 UTC Thu Aug 14 2025\r\n!\r\nversion 15.9\r\nservice timestamps debug datetime msec\r\nservice timestamps log datetime msec\r\nno service password-encryption\r\n!\r\nhostname qa-device\r\n!\r\nboot-start-marker\r\nboot-end-marker\r\n!\r\n!\r\n!\r\nno aaa new-model\r\n!\r\n!\r\n!\r\nmmi polling-interval 60\r\nno mmi auto-configure\r\nno mmi pvc\r\nmmi snmp-timeout 180\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\n!\r\nip ...EDITED FOR THE SAKE OF CONCISENESS"}],"collect_timestamp":1755274594}

  Running Checks
  ==============

    network_config_management
    -------------------------
      Instance ID: network_config_management:zoe_ncm_test:51fd8431b479a10a [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network_config_management.d/conf.yaml
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      ndmconfig: Last Run: 1, Total: 1
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1.089s
      Last Execution Date : 2025-08-15 16:16:34 UTC (1755274594000)
      Last Successful Execution Date : 2025-08-15 16:16:34 UTC (1755274594000)

  Metadata
  ========
    config.hash: network_config_management:zoe_ncm_test:51fd8431b479a10a
    config.provider: file
    config.source: /etc/datadog-agent/conf.d/network_config_management.d/conf.yaml
```

QA steps/how to validate
* Pull in the `ndm-tools/cml-qa` repo and follow the instructions for setting up the CLI ([link](https://github.com/DataDog/ndm-tools/tree/main/cml-qa))
```
python cml_template_generator.py \
  --deb-url <INSERT THE DEB FROM CI/CD BUILD> \
  --api-key <INSERT YOUR KEY> \
  --site "datad0g.com" \
  --title "<YOUR NAME/TITLE> NCM Agent QA"
```
* Go to the NDM-hosted CML ([docs](https://datadoghq.atlassian.net/wiki/spaces/II/pages/5061640281/Getting+Started+with+Cisco+Modeling+Labs+CML))
* Press Import
* Pull in the YAML that gets created, then start the lab
* <to fill in more details to prep the IOS device for SSH, etc.>

<img width="451" height="312" alt="image" src="https://github.com/user-attachments/assets/73f72efe-3677-4faa-a50e-41337db5fba3" />

`/etc/datadog-agent/conf.d/network_config_management.d/conf.yaml`
```
init_config:
instances:
  - ip_address: "10.10.1.1"
    namespace: "zoe_ncm_test"
    auth:
      username: "cisco"
      password: "cisco"
      ssh_ciphers: [aes256-ctr, aes192-ctr, aes128-ctr]
      ssh_key_exchanges: [diffie-hellman-group14-sha1, diffie-hellman-group-exchange-sha1]
      ssh_host_key_algorithms: [ssh-rsa]
```

`iosv-0/0` steps to enable SSH on the network device
```
qa-device#enable
qa-device#conf t
Enter configuration commands, one per line. End with CNTL/Z.
qa-device(config)#username cisco privilege 15 secret cisco
qa-device(config)#line vty 0 4
qa-device(config-line)#login local
qa-device(config-line)#transport input ssh
qa-device(config-line)#exit
qa-device(config)#ip domain-name lab.local
qa-device(config)#crypto key generate rsa modulus 2048
The name for the keys will be: qa-device.lab.local
% The key modulus size is 2048 bits
% Generating 2048 bit RSA keys, keys will be non-exportable...
[OK] (elapsed time was 3 seconds)
qa-device(config)#ip ssh version 2
qa-device(config)#end
qa-device#write memory
```

### Possible Drawbacks / Trade-offs

### Additional Notes
1 parent 43a1768 commit f887181
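
The validation output in the commit message shows the JSON shape that the check submits. As a rough illustration only, the sketch below rebuilds that shape with local Go types. The type and function names (`deviceConfig`, `payload`, `buildPayload`, `encode`) are assumptions made for this example and are not the agent's `ncmreport` types; only the JSON field names, the `<namespace>:<ip>` device ID, and the `device_ip:<ip>` tag come from the commit itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deviceConfig is a hypothetical mirror of one "configs" entry in the
// validation output. It is not the agent's ncmreport.NetworkDeviceConfig.
type deviceConfig struct {
	DeviceID   string   `json:"device_id"`
	DeviceIP   string   `json:"device_ip"`
	ConfigType string   `json:"config_type"`
	Timestamp  int64    `json:"timestamp"`
	Tags       []string `json:"tags"`
	Content    string   `json:"content"`
}

// payload is a hypothetical mirror of the top-level JSON object.
type payload struct {
	Namespace        string         `json:"namespace"`
	Integration      string         `json:"integration"`
	Configs          []deviceConfig `json:"configs"`
	CollectTimestamp int64          `json:"collect_timestamp"`
}

// buildPayload assembles a single-config payload. The device ID is
// "<namespace>:<ip>" and the only tag is "device_ip:<ip>", as in Run.
func buildPayload(namespace, ip, configType, content string, ts int64) payload {
	return payload{
		Namespace:   namespace,
		Integration: "",
		Configs: []deviceConfig{{
			DeviceID:   namespace + ":" + ip,
			DeviceIP:   ip,
			ConfigType: configType,
			Timestamp:  ts,
			Tags:       []string{"device_ip:" + ip},
			Content:    content,
		}},
		CollectTimestamp: ts,
	}
}

// encode serializes the payload to JSON in struct field order.
func encode(p payload) (string, error) {
	out, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	p := buildPayload("zoe_ncm_test", "10.10.1.1", "running", "hostname qa-device", 1755274594)
	s, err := encode(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Reusing one timestamp for both `timestamp` and `collect_timestamp` matches the sample output, where both fields carry the collection time; a real payload may carry a different per-config timestamp.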

File tree

17 files changed: +1656 -0 lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -448,6 +448,7 @@
 /pkg/collector/corechecks/embed/apm/ @DataDog/agent-apm
 /pkg/collector/corechecks/embed/process/ @DataDog/container-experiences
 /pkg/collector/corechecks/gpu/ @DataDog/ebpf-platform
+/pkg/collector/corechecks/networkconfigmanagement/ @DataDog/network-device-monitoring
 /pkg/collector/corechecks/network-devices/ @DataDog/ndm-integrations
 /pkg/collector/corechecks/orchestrator/ @DataDog/container-app
 /pkg/collector/corechecks/net/ @DataDog/agent-runtimes
@@ -608,6 +609,7 @@
 /pkg/network/tracer/*_windows*.go @DataDog/windows-products
 /pkg/network/usm/ @DataDog/universal-service-monitoring
 /pkg/network/usm/tests/*_windows*.go @DataDog/windows-products
+/pkg/networkconfigmanagement/ @DataDog/network-device-monitoring
 /pkg/ebpf/ @DataDog/ebpf-platform
 /pkg/ebpf/map_cleaner*.go @DataDog/universal-service-monitoring
 /pkg/compliance/ @DataDog/agent-cspm

comp/forwarder/eventplatform/component.go

Lines changed: 3 additions & 0 deletions
@@ -25,6 +25,9 @@ const (
 	// EventTypeNetworkPath is the event type for network devices Network Path data
 	EventTypeNetworkPath = "network-path"
 
+	// EventTypeNetworkConfigManagement is the event type for network device configuration management
+	EventTypeNetworkConfigManagement = "ndmconfig"
+
 	// EventTypeContainerLifecycle represents a container lifecycle event
 	EventTypeContainerLifecycle = "container-lifecycle"
 	// EventTypeContainerImages represents a container images event
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+// Unless explicitly stated otherwise all files in this repository are licensed
+// under the Apache License Version 2.0.
+// This product includes software developed at Datadog (https://www.datadoghq.com/).
+// Copyright 2025-present Datadog, Inc.
+
+//go:build ncm
+
+// Package networkconfigmanagement defines the agent core check for retrieving network device configurations
+package networkconfigmanagement
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/benbjohnson/clock"
+
+	"github.com/DataDog/datadog-agent/comp/core/autodiscovery/integration"
+	"github.com/DataDog/datadog-agent/comp/core/config"
+	"github.com/DataDog/datadog-agent/pkg/aggregator/sender"
+	"github.com/DataDog/datadog-agent/pkg/collector/check"
+	core "github.com/DataDog/datadog-agent/pkg/collector/corechecks"
+	ncmconfig "github.com/DataDog/datadog-agent/pkg/networkconfigmanagement/config"
+	ncmremote "github.com/DataDog/datadog-agent/pkg/networkconfigmanagement/remote"
+	ncmreport "github.com/DataDog/datadog-agent/pkg/networkconfigmanagement/report"
+	ncmsender "github.com/DataDog/datadog-agent/pkg/networkconfigmanagement/sender"
+	"github.com/DataDog/datadog-agent/pkg/util/log"
+	"github.com/DataDog/datadog-agent/pkg/util/option"
+)
+
+// CheckName is the name of the check
+const CheckName = "network_config_management"
+
+// Check is the main struct for the network configuration management check
+type Check struct {
+	core.CheckBase
+	checkContext *ncmconfig.NcmCheckContext
+	sender       *ncmsender.NCMSender
+	agentConfig  config.Component
+	remoteClient ncmremote.Client
+	clock        clock.Clock
+}
+
+// Run executes the check to retrieve network device configurations from a device
+func (c *Check) Run() error {
+	var checkErr error
+	var configs []ncmreport.NetworkDeviceConfig
+
+	checkErr = c.remoteClient.Connect()
+	if checkErr != nil {
+		log.Errorf("unable to connect to remote device %s: %s", c.checkContext.Device.IPAddress, checkErr)
+		return checkErr
+	}
+	defer func() {
+		if c.remoteClient != nil {
+			_ = c.remoteClient.Close()
+		}
+	}()
+
+	// TODO: validate the running config to make sure it's valid, extract other information from it, etc.
+	runningConfig, checkErr := c.remoteClient.RetrieveRunningConfig()
+	if checkErr != nil {
+		return checkErr
+	}
+
+	deviceID := fmt.Sprintf("%s:%s", c.checkContext.Namespace, c.checkContext.Device.IPAddress)
+	tags := []string{
+		"device_ip:" + c.checkContext.Device.IPAddress,
+	}
+	runningTimestamp, checkErr := ncmreport.RetrieveTimestampFromConfig(runningConfig)
+	if checkErr != nil {
+		log.Warnf("unable to extract last change timestamp from running config for %s, using agent collection ts: %s", deviceID, checkErr)
+		runningTimestamp = c.clock.Now().Unix()
+	}
+	configs = append(configs, ncmreport.ToNetworkDeviceConfig(deviceID, c.checkContext.Device.IPAddress, ncmreport.RUNNING, runningTimestamp, tags, runningConfig))
+
+	// TODO: validate the startup config to make sure it's valid, extract other information from it, etc.
+	startupConfig, checkErr := c.remoteClient.RetrieveStartupConfig()
+	if checkErr != nil {
+		// If the startup config cannot be retrieved, log a warning but continue
+		log.Warnf("unable to retrieve startup config for %s, will not send: %s", deviceID, checkErr)
+	} else {
+		startupTimestamp, checkErr := ncmreport.RetrieveTimestampFromConfig(startupConfig)
+		if checkErr != nil {
+			log.Warnf("unable to extract last change timestamp from startup config for %s, using agent collection ts: %s", deviceID, checkErr)
+			startupTimestamp = c.clock.Now().Unix()
+		}
+		// add the startup config to the payload if it was retrieved successfully
+		configs = append(configs, ncmreport.ToNetworkDeviceConfig(deviceID, c.checkContext.Device.IPAddress, ncmreport.STARTUP, startupTimestamp, tags, startupConfig))
+	}
+
+	checkErr = c.sender.SendNCMConfig(ncmreport.ToNCMPayload(c.checkContext.Namespace, "", configs, c.clock.Now().Unix()))
+	if checkErr != nil {
+		return checkErr
+	}
+
+	// TODO: Send any metrics as well
+	//c.sender.SendNCMMetrics()
+
+	c.sender.Commit()
+	return nil
+}
+
+// Configure sets up the check with the provided configuration and sender manager
+func (c *Check) Configure(senderManager sender.SenderManager, integrationConfigDigest uint64, rawInstance integration.Data, rawInitConfig integration.Data, source string) error {
+	var err error
+
+	// Load/parse the configuration for the device instance
+	c.checkContext, err = ncmconfig.NewNcmCheckContext(rawInstance, rawInitConfig)
+	if err != nil {
+		return fmt.Errorf("build config failed: %s", err)
+	}
+
+	// Must be called before v.CommonConfigure
+	c.BuildID(integrationConfigDigest, rawInstance, rawInitConfig)
+	err = c.CommonConfigure(senderManager, rawInitConfig, rawInstance, source)
+	if err != nil {
+		return fmt.Errorf("common configure failed: %s", err)
+	}
+
+	// Initialize the Sender
+	s, err := c.GetSender()
+	if err != nil {
+		return err
+	}
+	ncmSender := ncmsender.NewNCMSender(s, c.checkContext.Namespace)
+	c.sender = ncmSender
+
+	// TODO: add check to see the device's credentials type (SSH/Telnet) and create appropriate client factory
+	c.remoteClient = ncmremote.NewSSHClient(c.checkContext.Device)
+
+	// Initialize the clock
+	c.clock = clock.New()
+
+	return nil
+}
+
+// Interval returns the interval at which the check should run (default 15 minutes for now)
+func (c *Check) Interval() time.Duration {
+	return c.checkContext.MinCollectionInterval
+}
+
+// Factory creates a new check factory
+func Factory(agentConfig config.Component) option.Option[func() check.Check] {
+	return option.New(func() check.Check {
+		return newCheck(agentConfig)
+	})
+}
+
+// newCheck creates a new instance of the Check with the provided agent configuration
+func newCheck(agentConfig config.Component) check.Check {
+	return &Check{
+		CheckBase:   core.NewCheckBase(CheckName),
+		agentConfig: agentConfig,
+	}
+}
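
The control flow of `Run` above (running config required, startup config best-effort, collection time used when no last-change timestamp can be extracted) can be exercised without a device. The following is a minimal, stdlib-only sketch under stated assumptions: `remoteClient`, `fakeClient`, `lastChangeUnix`, `timestampOrNow`, and `collect` are hypothetical names for this example, and the regex-based parsing is only one guess at how a timestamp could be read from the IOS banner line shown in the validation output. The actual `ncmreport.RetrieveTimestampFromConfig` is not part of this diff and may work differently.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// remoteClient is a hypothetical stand-in for the ncmremote.Client calls used in Run.
type remoteClient interface {
	Connect() error
	RetrieveRunningConfig() (string, error)
	RetrieveStartupConfig() (string, error)
	Close() error
}

// fakeClient returns canned configs so the flow can run without a device.
type fakeClient struct {
	running      string
	startup      string
	startupFails bool
}

func (f *fakeClient) Connect() error                         { return nil }
func (f *fakeClient) Close() error                           { return nil }
func (f *fakeClient) RetrieveRunningConfig() (string, error) { return f.running, nil }
func (f *fakeClient) RetrieveStartupConfig() (string, error) {
	if f.startupFails {
		return "", errors.New("startup config unavailable")
	}
	return f.startup, nil
}

// lastChangeRe matches the banner line seen in the validation output, e.g.
// "Last configuration change at 20:53:27 UTC Thu Aug 14 2025".
var lastChangeRe = regexp.MustCompile(`Last configuration change at (\d{2}:\d{2}:\d{2} \w+ \w{3} \w{3} \d{1,2} \d{4})`)

// lastChangeUnix is a hypothetical parser for that banner line.
func lastChangeUnix(cfg string) (int64, error) {
	m := lastChangeRe.FindStringSubmatch(cfg)
	if m == nil {
		return 0, errors.New("no last-change line found")
	}
	t, err := time.Parse("15:04:05 MST Mon Jan 2 2006", m[1])
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

// collected records which config was gathered and the timestamp attached to it.
type collected struct {
	ConfigType string
	Timestamp  int64
}

// timestampOrNow falls back to the collection time when parsing fails.
func timestampOrNow(cfg string, nowUnix func() int64) int64 {
	ts, err := lastChangeUnix(cfg)
	if err != nil {
		return nowUnix()
	}
	return ts
}

// collect follows the order of operations in Run: a running-config failure
// aborts, while a startup-config failure is skipped.
func collect(c remoteClient, nowUnix func() int64) ([]collected, error) {
	if err := c.Connect(); err != nil {
		return nil, err
	}
	defer func() { _ = c.Close() }()

	running, err := c.RetrieveRunningConfig()
	if err != nil {
		return nil, err
	}
	out := []collected{{ConfigType: "running", Timestamp: timestampOrNow(running, nowUnix)}}

	if startup, err := c.RetrieveStartupConfig(); err == nil {
		out = append(out, collected{ConfigType: "startup", Timestamp: timestampOrNow(startup, nowUnix)})
	}
	return out, nil
}

func main() {
	nowUnix := func() int64 { return time.Now().Unix() }
	client := &fakeClient{
		running:      "! Last configuration change at 20:53:27 UTC Thu Aug 14 2025\r\nhostname qa-device",
		startupFails: true,
	}
	got, err := collect(client, nowUnix)
	if err != nil {
		panic(err)
	}
	for _, c := range got {
		fmt.Println(c.ConfigType, c.Timestamp)
	}
}
```

Passing the time source in as a function plays the same role as the `clock.Clock` field on `Check`: a test can pin "now" to a fixed value and assert on the fallback path.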
