Remove docs and unit tests surrounding running agent as non-root #8931

naemono · 2025-11-24T21:36:07Z

Since v8.16, Elastic Agent no longer needs a separate DaemonSet to adjust directory permissions when running as non-root. This PR removes or adjusts the documentation to make this clearer, and also removes the corresponding e2e test.

Testing

Applied `fleet-kubernetes-integration.yaml` in the `default` namespace; everything came up without issues (see screenshot).
Verified that the volume uses hostPath:

      volumes:
      - hostPath:
          path: /var/lib/elastic-agent/default/elastic-agent/state
          type: DirectoryOrCreate
        name: agent-data

This will be paired with a documentation PR as well, fyi.

Signed-off-by: Michael Montgomery <[email protected]>

prodsecmachine · 2025-11-24T21:36:19Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues



Copilot

Pull request overview

This pull request removes obsolete documentation and testing infrastructure for running Elastic Agent as non-root, as this functionality is now natively supported in Elastic Agent v8.16.0 and later without requiring special DaemonSet configuration.

Key Changes:

Removed the dedicated non-root recipe file and its associated end-to-end test
Simplified test helper code by removing non-root specific DaemonSet mutation logic
Updated main recipe file to document that runAsUser: 0 is no longer required since v8.16.0

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


File	Description
test/e2e/test/helper/yaml.go	Removed `maybeMutateForAgentNonRootTests` function and fleet outputs configuration logic that was only needed for non-root testing; simplified DaemonSet handling
test/e2e/agent/recipes_test.go	Removed `TestFleetKubernetesNonRootIntegrationRecipe` test function that validated the non-root configuration
config/recipes/elastic-agent/fleet-kubernetes-integration.yaml	Added explanatory comments indicating that `runAsUser: 0` is no longer needed since v8.16.0
config/recipes/elastic-agent/fleet-kubernetes-integration-nonroot.yaml	Deleted entire recipe file that demonstrated non-root setup with DaemonSet for permission management
config/recipes/elastic-agent/README.asciidoc	Removed documentation section about the non-root integration recipe



pkoutsovasilis

Thanks for this cleanup @naemono!

Just want to confirm my understanding: elastic-agent >= 8.16 does have the capability to run as non-root, but for this to be effective there are some requirements:

The agent container needs the CAP_SETPCAP and CAP_CHOWN capabilities. These are enabled by default in all major container runtimes, but a user may drop them.
securityContext.allowPrivilegeEscalation should be true. IIRC this is also the default for most container runtimes, but a user may still set it to false.
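As a rough sketch of what stating these requirements explicitly could look like (instead of relying on runtime defaults), the Agent manifest below adds both capabilities and keeps privilege escalation allowed on the agent container. It assumes ECK's Agent container is named `agent` and that UID 1000 is the non-root user in the Elastic Agent image; the `version` value is illustrative, and the Fleet references from the existing recipe (`kibanaRef`, `fleetServerRef`, `policyID`) are omitted.

```yaml
# Sketch only: non-root Elastic Agent (>= 8.16) under ECK.
# Assumptions: container name "agent", UID 1000; Fleet references from the
# recipe (kibanaRef, fleetServerRef, policyID) are omitted for brevity.
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 8.16.0
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            runAsUser: 1000
            runAsNonRoot: true
            # Keep this true: "false" sets no_new_privs, and the kernel then
            # ignores the file capabilities on the elastic-agent binary.
            allowPrivilegeEscalation: true
            capabilities:
              # Added explicitly instead of relying on container runtime defaults.
              add:
              - CHOWN     # take ownership of the hostPath state directory
              - SETPCAP   # pass capabilities on to spawned components
```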

This raises a few questions in my head:

Should the operator explicitly set these capabilities? Even though they're defaults in most runtimes, should we be explicit about requiring them rather than relying on runtime defaults? This would make the requirements clearer and prevent issues if someone is running with a more restrictive runtime configuration.
Init container pattern vs continuous elevated capabilities: I'm trying to think through the security implications. Even though we can run as non-root with >= 8.16, we still need to run the agent container continuously with elevated capabilities. I'm wondering if there's still value in the init container approach for some users - where only the init container runs with elevated capabilities to set up permissions, and then everything is dropped from the main agent container.
How does this work with OpenShift? OCP has stricter default Security Context Constraints. Will the agent work out of the box on OCP with these capability requirements, or will users need to create custom SCCs?

Do you think there are security-conscious users who would prefer the init container pattern (short-lived elevated privileges) over a continuously running agent container with CAP_NET_RAW and CAP_CHOWN? Or am I overthinking this? WDYT @naemono?
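For comparison, the init container pattern could look roughly like the sketch below: a short-lived init container running as root fixes ownership of the `agent-data` hostPath volume, and the long-running agent container drops all capabilities and disallows privilege escalation. The init container name, the `busybox` image, UID/GID 1000, and the mount path `/usr/share/elastic-agent/state` are assumptions for illustration, not taken from the deleted recipe.

```yaml
# Sketch only: init container pattern for a non-root Elastic Agent under ECK.
# Hypothetical/assumed: init container name, busybox image, UID/GID 1000,
# and the path where ECK mounts the agent-data volume.
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 8.16.0
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        initContainers:
        - name: chown-agent-state          # hypothetical name
          image: busybox:1.36              # any image that provides chown
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elastic-agent/state"]
          securityContext:
            runAsUser: 0                   # root only while this container runs
          volumeMounts:
          - name: agent-data               # hostPath volume created by ECK
            mountPath: /usr/share/elastic-agent/state
        containers:
        - name: agent
          securityContext:
            runAsUser: 1000
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
```

Whether the agent container can really drop every capability in this layout depends on which integrations it runs.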

pebrc · 2025-12-02T14:59:04Z

> Do you think there are security-conscious users who would prefer the init container pattern (short-lived elevated privileges) over a continuously running agent container with CAP_NET_RAW and CAP_CHOWN? Or am I overthinking this?

I think this is a valid point, and I think the init container pattern does have advantages over running the main container with elevated privileges. You mentioned that CAP_SETPCAP and privilege escalation are required; that sounds like quite far-reaching privileges to me.

naemono · 2025-12-02T15:22:41Z

> Do you think there are security-conscious users who would prefer the init container pattern (short-lived elevated privileges) over a continuously running agent container with CAP_NET_RAW and CAP_CHOWN? Or am I overthinking this?
>
> I think this is a valid point, and I think the init container pattern does have advantages over running the main container with elevated privileges. You mentioned that CAP_SETPCAP and privilege escalation are required; that sounds like quite far-reaching privileges to me.

I don't disagree with what is being said here. I struggled with how to present this so that it was "clear" to the user which approaches one could take with Agent, without making it a convoluted mess. Unfortunately, given the following points:

> The agent container needs CAP_SETPCAP and CAP_CHOWN capabilities - these are enabled by default in all major container runtimes but a user can/may drop them.
> securityContext.allowPrivilegeEscalation should be true - IIRC this is also the default for most container runtimes but still a user can/may set this as false.

Things become even more difficult to document for Agent. I see some options:

Leave docs as-is. Unfortunately, this does not even point out the 8.16 changes that many users may want to use, since they are a greatly simplified approach and handle the "defaults" for many or most users.
Present the 8.16+ option alongside the other options, while keeping the required versions clear for each approach: (ECK >= 2.10.0 and Agent < 7.14.0), (any ECK version and Agent 8.16+), etc.
Potentially other options...

Any preferences for an approach to bring this documentation up to date, @pebrc @pkoutsovasilis?

barkbay · 2025-12-04T07:53:04Z

> The agent container needs CAP_SETPCAP and CAP_CHOWN capabilities [...]
> [...]
> I'm wondering if there's still value in the init container approach for some users - where only the init container runs with elevated capabilities to set up permissions, and then everything is dropped from the main agent container.

Sorry I'm a bit out of my depth here, and this is maybe off topic, but could you help me understand how running an init container could help avoid the need for CAP_SETPCAP?

pebrc · 2025-12-04T09:15:26Z

> The agent container needs CAP_SETPCAP and CAP_CHOWN capabilities [...]
> [...]
> I'm wondering if there's still value in the init container approach for some users - where only the init container runs with elevated capabilities to set up permissions, and then everything is dropped from the main agent container.
>
> Sorry I'm a bit out of my depth here, and this is maybe off topic, but could you help me understand how running an init container could help avoid the need for CAP_SETPCAP?

I think you might be right that Agent's main container needs CAP_SETPCAP regardless of the chowning happening in an init container because it needs to be able to setup its Beats child processes correctly. Maybe @pkoutsovasilis knows more here.

pkoutsovasilis · 2025-12-05T10:25:08Z

> Sorry I'm a bit out of my depth here, and this is maybe off topic, but could you help me understand how running an init container could help avoid the need for CAP_SETPCAP?

The main point I'm trying to make is whether there's value from a security perspective for users who prefer an init container running as root to perform chowning, instead of a long-running elastic-agent container with securityContext.allowPrivilegeEscalation: true. My focus here is more on the privilege escalation setting rather than specific capabilities, but let me explain the capabilities mechanism as well:

When transitioning from a rootful process (the container runtime) to a non-root process (elastic-agent), the effective capabilities are not maintained. As documented in the Linux capabilities man page under "Effect of user ID changes on capabilities", when the effective user ID changes from 0 to nonzero, all capabilities are cleared from the effective set. However, a rootless process can elevate its capabilities if they are part of its bounding set (controlled by the container's dropped and added capabilities) and privilege escalation is allowed (securityContext.allowPrivilegeEscalation: true). This is what elastic-agent does when running as non-root: it elevates its effective capabilities to those in its bounding set.
- Note that to raise capabilities in the effective set, they must first be in the permitted set, which elastic-agent achieves by setting file capabilities.
After elevating the capabilities in its effective set, and because elastic-agent spawns subprocesses, it needs to modify both its own inheritable and ambient sets. Modifying the ambient set requires CAP_SETPCAP, which happens in elastic-agent's code. This is necessary because inheritable capabilities are not generally preserved across execve when running as non-root, so such applications should use ambient capabilities to allow child processes to inherit them.
- I think there's an assumption here that most probably holds true currently but may change: you assume that only the elastic-agent process with elevated privileges will perform all chowning, but this could easily change if an agent component, spawned as a subprocess, needs to do the same. To complicate matters further, elastic-agent sometimes spawns subprocesses of its own binary that might need to perform chowning in the near future. In contrast, an init container approach could eliminate the need for CAP_SETPCAP, because path ownership can be adjusted there without code changes. Again, these two capabilities, CAP_SETPCAP and CAP_CHOWN, are recommended for a rootless elastic-agent, not mandatory.
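For reference, capabilities(7) specifies how the kernel computes a thread's capability sets across execve(2). Writing $P$ for the sets before the call, $P'$ for the sets after, and $F$ for the executed file's capability sets, the rules for a non-root thread (leaving out the man page's special cases for UID 0) are as follows, where a file counts as privileged if it has file capabilities or set-user-ID/set-group-ID bits:

```latex
\begin{aligned}
P'_{\mathrm{amb}}  &= \begin{cases} \emptyset & \text{if the file is privileged} \\ P_{\mathrm{amb}} & \text{otherwise} \end{cases} \\
P'_{\mathrm{perm}} &= (P_{\mathrm{inh}} \cap F_{\mathrm{inh}}) \cup (F_{\mathrm{perm}} \cap P_{\mathrm{bnd}}) \cup P'_{\mathrm{amb}} \\
P'_{\mathrm{eff}}  &= \begin{cases} P'_{\mathrm{perm}} & \text{if } F_{\mathrm{eff}} \text{ is set} \\ P'_{\mathrm{amb}} & \text{otherwise} \end{cases} \\
P'_{\mathrm{inh}}  &= P_{\mathrm{inh}} \\
P'_{\mathrm{bnd}}  &= P_{\mathrm{bnd}}
\end{aligned}
```

Two consequences line up with the explanation above. A child exec'd from a binary without file capabilities has empty $F_{\mathrm{inh}}$ and $F_{\mathrm{perm}}$, so $P'_{\mathrm{perm}}$ reduces to $P'_{\mathrm{amb}}$: only ambient capabilities carry over, even though $P_{\mathrm{inh}}$ itself is kept. And when allowPrivilegeEscalation: false sets no_new_privs, file capabilities are not honored on execve, so the $F_{\mathrm{perm}} \cap P_{\mathrm{bnd}}$ term cannot grant the elastic-agent binary anything.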

@barkbay does the above help?

> I think you might be right that Agent's main container needs CAP_SETPCAP regardless of the chowning happening in an init container because it needs to be able to setup its Beats child processes correctly. Maybe @pkoutsovasilis knows more here.

@pebrc I don't think this is correct; actually, for versions prior to 8.16 there was no way for a rootless execution to elevate even its own capabilities. The same holds true for any rootless invocation that has securityContext.allowPrivilegeEscalation: false. As a matter of fact, elastic-agent has no issue running with securityContext.allowPrivilegeEscalation: false and all capabilities dropped if the state path is an EmptyDir and the user is running components that don't need any further capabilities. Also, AFAIK the component spec of elastic-agent has no field that captures which capabilities are needed by components, so the current approach is that components get whatever capabilities elastic-agent has.
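A sketch of the configuration described here, with no privilege escalation, all capabilities dropped, and the state on an EmptyDir. It assumes that a user-supplied volume named `agent-data` replaces ECK's default hostPath volume and that UID 1000 is the non-root user; agent state then lives only as long as the Pod.

```yaml
# Sketch only: non-root Elastic Agent with no privilege escalation and no
# capabilities. Assumes a user-supplied "agent-data" volume overrides ECK's
# default hostPath; agent state is lost when the Pod is deleted.
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 8.16.0
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            runAsUser: 1000
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
        volumes:
        - name: agent-data
          emptyDir: {}
```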

Also, I may be missing something, but CAP_SETPCAP isn't particularly dangerous as a permission to have:

> CAP_SETPCAP
> If file capabilities are supported (i.e., since Linux 2.6.24): add any capability from the calling thread's bounding set to its inheritable set; drop capabilities from the bounding set (via prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.

pebrc · 2025-12-05T10:44:49Z

@pkoutsovasilis what is your recommendation? Should we keep the recipe around because there is value in having instructions on how to run Agent without privilege escalation?

pkoutsovasilis · 2025-12-05T11:03:33Z

> @pkoutsovasilis what is your recommendation? Should we keep the recipe around because there is value in having instructions on how to run Agent without privilege escalation?

Yes, this would be my proposal. Let's extend our current documentation to capture that elastic-agent can run rootless without an init container, but this requires securityContext.allowPrivilegeEscalation: true and the CAP_CHOWN capability if the user needs the state persisted (we can debate whether CAP_SETPCAP is required; I would leave it in to guarantee that every subprocess gets the capabilities). Then adjust the existing documentation to point out that there is another way to run a rootless elastic-agent without privilege escalation.

Remove docs and unit tests surrounding running agent as non-root

e9f61e1

Signed-off-by: Michael Montgomery <[email protected]>

botelastic bot added the triage label Nov 24, 2025

Remove recipe

c26e9fd

Signed-off-by: Michael Montgomery <[email protected]>

naemono requested review from barkbay, Copilot, kvalliyurnatt, pebrc, pkoutsovasilis and rhr323 and removed request for pebrc November 24, 2025 21:57

naemono added >enhancement Enhancement of existing functionality >docs Documentation labels Nov 24, 2025

Copilot started reviewing on behalf of naemono November 24, 2025 21:58

botelastic bot removed the triage label Nov 24, 2025

naemono marked this pull request as ready for review November 24, 2025 21:58

naemono changed the title ~~WIP: Remove docs and unit tests surrounding running agent as non-root~~ Remove docs and unit tests surrounding running agent as non-root Nov 24, 2025

Copilot finished reviewing on behalf of naemono November 24, 2025 21:59

Copilot AI reviewed Nov 24, 2025


config/recipes/elastic-agent/fleet-kubernetes-integration.yaml (outdated, resolved)

config/recipes/elastic-agent/fleet-kubernetes-integration.yaml (outdated, resolved)

naemono added 3 commits November 26, 2025 09:53

Add additional notes to the example and add back the recipe.

4c26d80

Signed-off-by: Michael Montgomery <[email protected]>

Adjust wording.

dbd92ee

Signed-off-by: Michael Montgomery <[email protected]>

Merge branch 'main' into update-agent-run-as-root-8.16

89615f9

pkoutsovasilis reviewed Dec 2, 2025
