[Bug]: Ambient capabilities are not applied as expected #3293

@saku3

Bug Description

Ambient capabilities are not applied consistently: identical runs of the same container end up with different ambient sets.

The root cause lies in the implementation of drop_privileges for ambient capabilities.

https://github.com/youki-dev/youki/blob/main/crates/libcontainer/src/capabilities.rs#L155

    if let Some(ambient) = cs.ambient() {
        // check specifically for ambient, as those might not always be available
        if let Err(e) = syscall.set_capability(CapSet::Ambient, &to_set(ambient)) {
            tracing::warn!("failed to set ambient capabilities: {}", e);
        }
    }

In youki

syscall.set_capability (called in the code above) iterates over a HashSet, whose iteration order is not fixed, and applies the capabilities one at a time.
If applying one capability fails, the function returns immediately.
Every capability that comes later in the iteration order is therefore skipped, so the set of capabilities that actually gets applied varies from run to run.

In runc

Each error is logged as a warning inside the loop, so the loop always runs to completion and every capability that can be raised is raised.

https://github.com/opencontainers/runc/blob/59a5ff14a2c1f6beb74982a9c03e31c5fb49859d/libcontainer/capabilities/capabilities.go#L142C1-L147C3

	for _, a := range ambs {
		err := capability.SetAmbient(true, a)
		if err != nil {
			logrus.Warnf("can't raise ambient capability %s: %v", capToStr(a), err)
		}
	}
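The difference between the two strategies can be sketched in Rust as follows. This is only an illustration, not youki's actual code: `raise_ambient`, `apply_fail_fast`, and `apply_all` are hypothetical names, the `allowed` set is a simplified stand-in for the kernel's check, and a slice is used in place of a HashSet so that one particular iteration order can be fixed.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for raising one ambient capability.
/// Assumption: anything outside `allowed` fails, mimicking "operation not permitted".
fn raise_ambient(cap: &str, allowed: &HashSet<&str>) -> Result<(), String> {
    if allowed.contains(cap) {
        Ok(())
    } else {
        Err(format!("operation not permitted: {}", cap))
    }
}

/// Fail-fast: stops at the first error, so the result depends on iteration order.
fn apply_fail_fast<'a>(caps: &[&'a str], allowed: &HashSet<&str>) -> Vec<&'a str> {
    let mut applied = Vec::new();
    for &cap in caps {
        if raise_ambient(cap, allowed).is_err() {
            break; // every capability after this one is skipped
        }
        applied.push(cap);
    }
    applied
}

/// Continue-on-error: logs the failure and keeps going, as the runc loop does.
fn apply_all<'a>(caps: &[&'a str], allowed: &HashSet<&str>) -> Vec<&'a str> {
    let mut applied = Vec::new();
    for &cap in caps {
        match raise_ambient(cap, allowed) {
            Ok(()) => applied.push(cap),
            Err(e) => eprintln!("can't raise ambient capability: {}", e),
        }
    }
    applied
}
```

With an order in which the failing capability comes second, `apply_fail_fast` applies only the first capability, while `apply_all` applies every capability in `allowed` no matter where the failing one sits.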

Steps to Reproduce

  1. Prepare config.json

The ambient set includes a capability (CAP_SYSLOG) that is not in the permitted and inheritable sets.

    "args": [
      "sh", "-c", "grep '^CapAmb' /proc/self/status"
    ],

...
    "capabilities": {
      "bounding": [
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_KILL"
      ],
      "effective": [
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_KILL"
      ],
      "inheritable": [
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_KILL"
      ],
      "permitted": [
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_KILL"
      ],
      "ambient": [
        "CAP_NET_BIND_SERVICE",
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_SYSLOG"    ←←←←←←←←←←← operation not permitted 
      ]
    },
  2. Run youki multiple times

    $ youki run -b tutorial/ container
    CapAmb: 0000000020000420
    $ youki run -b tutorial/ container
    CapAmb: 0000000020000420
    $ youki run -b tutorial/ container
    CapAmb: 0000000000000000
    $ youki run -b tutorial/ container
    CapAmb: 0000000020000020
    $ youki run -b tutorial/ container
    CapAmb: 0000000020000420

Tracing the prctl calls with strace makes it clearer which capabilities are applied before the failure:

    strace -f -e trace=prctl youki run -b tutorial/ container

Expectation

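In the case above, CapAmb should always contain the three capabilities that can be raised, regardless of where CAP_SYSLOG falls in the iteration order. The individual bits are:

    CAP_NET_BIND_SERVICE: 0x0000000000000400
    CAP_AUDIT_WRITE:      0x0000000020000000
    CAP_KILL:             0x0000000000000020

so every run should print:

    $ youki run -b tutorial/ container
    CapAmb: 0000000020000420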
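To check the arithmetic, and to see which capabilities are missing from the other observed values, the masks can be built and decoded with a small helper. This is a minimal sketch: `CAPS`, `mask_of`, and `names_in` are hypothetical names introduced here, and the table only covers the four capabilities used in this report (bit numbers as defined in linux/capability.h).

```rust
/// Bit numbers for the capabilities that appear in this report.
static CAPS: [(&str, u32); 4] = [
    ("CAP_KILL", 5),
    ("CAP_NET_BIND_SERVICE", 10),
    ("CAP_AUDIT_WRITE", 29),
    ("CAP_SYSLOG", 34),
];

/// Build the bitmask that /proc/self/status would show for the given names.
/// Names that are not in CAPS are ignored.
fn mask_of(names: &[&str]) -> u64 {
    CAPS.iter()
        .filter(|(name, _)| names.contains(name))
        .fold(0u64, |mask, (_, bit)| mask | (1u64 << *bit))
}

/// Decode a CapAmb value back into capability names (in CAPS order).
fn names_in(mask: u64) -> Vec<&'static str> {
    CAPS.iter()
        .filter(|(_, bit)| (mask & (1u64 << *bit)) != 0)
        .map(|(name, _)| *name)
        .collect()
}
```

For example, `mask_of(&["CAP_NET_BIND_SERVICE", "CAP_AUDIT_WRITE", "CAP_KILL"])` gives 0x20000420, and `names_in(0x20000020)` shows that the fourth run above applied CAP_KILL and CAP_AUDIT_WRITE but skipped CAP_NET_BIND_SERVICE.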

System and Setup Info

No response

Additional Context

related: #3210
