Description
What happened:
Similar to #174 but specific to pod identity associations: we're observing that the expected AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE env var is absent when a service account and pod are created within a short window. Typically we'll experience something like this:
- Programmatically create a pod identity association, service account annotated with IAM role, and pod in short succession
- The pod comes up, but AWS operations error with:
  An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: No OpenIDConnect provider found in your account for https://oidc.eks...
- Examining the pod env, note that AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE is missing but AWS_WEB_IDENTITY_TOKEN_FILE is set
- Restart pod
- Note that AWS_WEB_IDENTITY_TOKEN_FILE is now replaced with AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE and pod operates as expected.
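To make the env check in the steps above repeatable, here is a minimal sketch of a helper that reports which credential mechanism a pod was mutated for. The function name `classify_credential_env` is hypothetical and not part of any AWS tooling. It only looks at env var names, and it assumes the two token-file variables named in this issue are the distinguishing signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a pod's credential mutation from the names
# of its container env vars (space- or newline-separated).
classify_credential_env() {
  # Pad with spaces so each name can be matched as a whole token.
  local names=" $* "
  names=${names//$'\n'/ }
  case "$names" in
    *" AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE "*) echo "pod-identity" ;;
    *" AWS_WEB_IDENTITY_TOKEN_FILE "*)            echo "irsa" ;;
    *)                                            echo "none" ;;
  esac
}

# Example against the pod from the reproduction steps (requires cluster access):
#   classify_credential_env "$(kubectl get pod test-pod -n default \
#     -o jsonpath='{.spec.containers[*].env[*].name}')"
```

In the failing case described here, the helper would report `irsa` for the first pod and `pod-identity` after the restart.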
What you expected to happen:
If all the prerequisites are satisfied, pods should get the correct pod identity association mutation regardless of timing.
How to reproduce it (as minimally and precisely as possible):
- Create an IAM role with the correct pod identity association trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "pods.eks.amazonaws.com"
},
"Action": [
"sts:TagSession",
"sts:AssumeRole"
]
}
]
}
- Create an EKS cluster, enabling the EKS Pod Identity Agent add-on.
- Run aws eks update-kubeconfig --name my-cluster
- Run:
$ aws eks create-pod-identity-association \
--cluster-name my-cluster \
--namespace default \
--service-account test-sa \
--role-arn arn:aws:iam::111111111111:role/test-role && \
sleep 0.75s && \
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
namespace: default
name: test-sa
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/test-role
---
apiVersion: v1
kind: Pod
metadata:
name: test-pod
namespace: default
spec:
serviceAccountName: test-sa
containers:
- name: test
image: amazon/aws-cli:latest
imagePullPolicy: IfNotPresent
command:
- aws
- sts
- get-caller-identity
EOF
When waiting ~750ms or less between creating the association and submitting the SA, I consistently get the incorrect AWS_WEB_IDENTITY_TOKEN_FILE. Above ~1s seems to be reliably sufficient to get the correct mutation.
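Until the webhook handles this race itself, a client-side workaround might look like the sketch below: recreate the pod and re-check its mutation a bounded number of times. This is only an illustration, not a recommended fix. The function names (`retry_until_ok`, `pod_has_pod_identity_env`, `recreate_and_check`) and the manifest file name are hypothetical, and it assumes that deleting and recreating the pod is acceptable in your pipeline.

```shell
#!/usr/bin/env bash
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0; gives up after MAX_ATTEMPTS tries.
retry_until_ok() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max_attempts )); then
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Succeeds only if some container in the pod carries the Pod Identity
# token env var, i.e. the webhook applied the expected mutation.
pod_has_pod_identity_env() {
  local pod=$1 namespace=$2
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.containers[*].env[*].name}' \
    | grep -qw AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
}

# Mutation happens at admission, so a wrongly mutated pod has to be
# deleted and resubmitted before checking again.
recreate_and_check() {
  local manifest=$1 pod=$2 namespace=$3
  kubectl delete pod "$pod" -n "$namespace" --ignore-not-found --wait=true
  kubectl apply -f "$manifest"
  pod_has_pod_identity_env "$pod" "$namespace"
}

# Example (requires cluster access; pod.yaml is a hypothetical manifest file):
#   retry_until_ok 5 1 recreate_and_check pod.yaml test-pod default
```

The retry count and one-second delay in the example are guesses based on the ~1s threshold observed above, not documented guarantees.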
Anything else we need to know?:
I'm wondering if something like #236 and/or #252 should be applied to the FileConfig, to allow the cache some time to catch up or to provide a fallback in case of cache miss. The scenario in which a serviceaccount and a pod are created in a short timeframe is common with CI/CD and infrastructure-as-code.
Environment:
- AWS Region: us-east-2
- EKS Platform version: eks.6
- Kubernetes version: 1.32
- Webhook Version: ¯\_(ツ)_/¯ whatever EKS is running under the hood