[BUG]: EKS provider doesn't work via IAM role #161

@nikdvy

Description of the bug

Thank you, Keep team! It looks like a really great, growing project!

So, the problem is:

I have an IAM role with all the needed permissions and a k8s (EKS) serviceaccount annotated with that role, but Keep still cannot access the cluster.

Logs I get from the keep-backend side when using the k8s serviceaccount annotated with the IAM role:

{"worker_type": "uvicorn", "asctime": "2025-05-02 06:11:22,676", "message": "Error validating Kubernetes API scopes", "levelname": "ERROR", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "c24a71dcef056bccdacae15d379d54f9", "otelSpanID": "3179b997e36e1734", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 18, "module": "eks_provider", "exc_info": "Traceback (most recent call last):\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 600, in __generate_client\n    cluster_info = eks_client.describe_cluster(\n                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/botocore/client.py\", line 569, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/botocore/client.py\", line 1023, in _make_api_call\n    raise error_class(parsed_response, operation_name)\nbotocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 243, in validate_scopes\n    k8s_client = self.client  # This will initialize connection to cluster\n                 ^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 329, in client\n    self._client = self.__generate_client()\n                   ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 634, in __generate_client\n    raise ProviderException(f\"Failed to generate EKS client: {e}\")\nkeep.exceptions.provider_exception.ProviderException: Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid."}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:11:22,679", "message": "Completed scope validation", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "c24a71dcef056bccdacae15d379d54f9", "otelSpanID": "3179b997e36e1734", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 18, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:11:22,682", "message": "Failed to validate mandatory provider scopes", "levelname": "WARNING", "name": "keep.providers.providers_service", "filename": "providers_service.py", "otelTraceID": "c24a71dcef056bccdacae15d379d54f9", "otelSpanID": "3179b997e36e1734", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 18, "module": "providers_service", "validated_scopes": {"eks:DescribeCluster": "An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.", "eks:ListClusters": "An error occurred (UnrecognizedClientException) when calling the ListClusters operation: The security token included in the request is invalid.", "pods:delete": "Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.", "deployments:scale": "Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.", "pods:list": "Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.", "pods:get": "Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.", "pods:logs": "Failed to generate EKS client: An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid."}}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:11:22,683", "message": "Failed to validate mandatory provider scopes, returning 412", "levelname": "ERROR", "name": "keep.api.routes.providers", "filename": "providers.py", "otelTraceID": "c24a71dcef056bccdacae15d379d54f9", "otelSpanID": "3179b997e36e1734", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 18, "module": "providers", "provider_id": "eks", "provider_type": "eks", "tenant_id": "keep"}
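
To narrow down the `UnrecognizedClientException` above, it may help to check which credential source is visible inside the keep-backend pod. The sketch below is not part of Keep. It assumes that the EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into pods that run under an annotated serviceaccount, and that static `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` values, when present, are picked up before the web identity token by the default boto3 credential chain. The function name `describe_credential_source` is hypothetical.

```python
import os
from typing import Mapping


def describe_credential_source(env: Mapping[str, str]) -> str:
    """Guess which credential source the default AWS SDK chain would use.

    Only environment variables are inspected; credentials passed
    explicitly to a client (an assumption about what a provider might do)
    are not visible here.
    """
    has_static_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(
        env.get("AWS_SECRET_ACCESS_KEY")
    )
    has_web_identity = bool(env.get("AWS_ROLE_ARN")) and bool(
        env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    )
    # Static keys in the environment are consulted before the web identity token.
    if has_static_keys:
        return "static-keys"
    if has_web_identity:
        return "web-identity"
    return "none"


if __name__ == "__main__":
    # Run inside the pod, e.g. via `kubectl exec`, to see what the process sees.
    print(describe_credential_source(os.environ))
```

If this reports `web-identity` in the pod while the provider still fails, the role credentials are probably available and the invalid token may come from whatever values the provider passes to the client explicitly.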

And here is the log I get when connecting with the same permissions but using an IAM user key pair:

{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,350", "message": "Installing provider", "levelname": "INFO", "name": "keep.providers.providers_service", "filename": "providers_service.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "providers_service", "provider_id": "eks", "provider_type": "eks", "tenant_id": "keep"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,351", "message": "Starting EKS API permissions validation", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,356", "message": "Validating eks:ListClusters permission", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,425", "message": "eks:ListClusters permission validated successfully", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,425", "message": "Validating eks:DescribeCluster permission", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,561", "message": "eks:DescribeCluster permission validated successfully", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,561", "message": "Starting Kubernetes API permissions validation", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,746", "message": "Error validating Kubernetes API scopes", "levelname": "ERROR", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider", "exc_info": "Traceback (most recent call last):\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 626, in __generate_client\n    \"users\": [{\"name\": \"aws_user\", \"user\": {\"token\": self.__get_token()}}],\n                                                     ^^^^^^^^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 639, in __get_token\n    from awscli.customizations.eks.get_token import STSClientFactory, TokenGenerator\nModuleNotFoundError: No module named 'awscli'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 243, in validate_scopes\n    k8s_client = self.client  # This will initialize connection to cluster\n                 ^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 329, in client\n    self._client = self.__generate_client()\n                   ^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/venv/lib/python3.11/site-packages/keep/providers/eks_provider/eks_provider.py\", line 634, in __generate_client\n    raise ProviderException(f\"Failed to generate EKS client: {e}\")\nkeep.exceptions.provider_exception.ProviderException: Failed to generate EKS client: No module named 'awscli'"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,748", "message": "Completed scope validation", "levelname": "INFO", "name": "eks", "filename": "eks_provider.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "eks_provider"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,750", "message": "Validated provider scopes", "levelname": "INFO", "name": "keep.providers.providers_service", "filename": "providers_service.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "providers_service", "validated_scopes": {"eks:DescribeCluster": true, "eks:ListClusters": true, "pods:delete": "Failed to generate EKS client: No module named 'awscli'", "deployments:scale": "Failed to generate EKS client: No module named 'awscli'", "pods:list": "Failed to generate EKS client: No module named 'awscli'", "pods:get": "Failed to generate EKS client: No module named 'awscli'", "pods:logs": "Failed to generate EKS client: No module named 'awscli'"}}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,752", "message": "Writing secret", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,789", "message": "Secret created/updated successfully", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "7466f0274a16c46dea7013b0be8a584f", "otelSpanID": "7bbeada4b8f1709f", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "MainThread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,974", "message": "Getting secret", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "746de70d5dd86c5f2b8775301337d14e", "otelSpanID": "b1b65c9d1aaecd56", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "AnyIO worker thread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:33,992", "message": "Got secret successfully", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "746de70d5dd86c5f2b8775301337d14e", "otelSpanID": "b1b65c9d1aaecd56", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "AnyIO worker thread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:42,835", "message": "Getting secret", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "f3ea5035001949cabeb4d713e59d2531", "otelSpanID": "847c41bf2285f9c4", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "AnyIO worker thread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
{"worker_type": "uvicorn", "asctime": "2025-05-02 06:30:42,846", "message": "Got secret successfully", "levelname": "INFO", "name": "keep.secretmanager.secretmanager", "filename": "kubernetessecretmanager.py", "otelTraceID": "f3ea5035001949cabeb4d713e59d2531", "otelSpanID": "847c41bf2285f9c4", "otelTraceSampled": true, "otelServiceName": "keep-api", "threadName": "AnyIO worker thread", "process": 27, "module": "kubernetessecretmanager", "secret_name": "keep-eks-8c7069ea88c94b51bbe1759b6b170485"}
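
In the key pair case the EKS API calls succeed, and the Kubernetes step fails only because `__get_token` imports `awscli`, which is not installed in the image. As far as I understand the token format used by aws-iam-authenticator, the bearer token is the prefix `k8s-aws-v1.` followed by the base64url-encoded presigned STS `GetCallerIdentity` URL (signed with the cluster name in the `x-k8s-aws-id` header), with the `=` padding stripped. The sketch below covers only that encoding step, with no dependency on `awscli`. Creating the presigned URL is not shown, and the function names are hypothetical rather than Keep's API.

```python
import base64

TOKEN_PREFIX = "k8s-aws-v1."


def encode_eks_token(presigned_url: str) -> str:
    """Wrap a presigned STS GetCallerIdentity URL as an EKS bearer token."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode("utf-8")).decode("utf-8")
    # The padding is stripped because '=' is not accepted in the token.
    return TOKEN_PREFIX + encoded.rstrip("=")


def decode_eks_token(token: str) -> str:
    """Recover the presigned URL from a token, useful when debugging."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not an EKS bearer token")
    body = token[len(TOKEN_PREFIX):]
    # Restore the padding that was stripped during encoding.
    padded = body + "=" * (-len(body) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Decoding a token produced elsewhere (for example by `aws eks get-token`) shows which STS endpoint and which credentials were used to sign it.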

Steps To Reproduce

  1. Create an IAM role (with a permission policy and a trust policy) that has the managed policies arn:aws:iam::aws:policy/AmazonEKSServicePolicy and arn:aws:iam::aws:policy/AmazonEKSClusterPolicy attached, plus the custom policy below:
{
    "Statement": [
        {
            "Action": [
                "eks:DescribeCluster",
                "eks:ListClusters"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:eks:us-east-1:<your-acc-id>:cluster/*",
            "Sid": "EksClusterReadOnly"
        },
        {
            "Action": [
                "eks:AccessKubernetesApi"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:eks:us-east-1:<your-acc-id>:cluster/*",
            "Sid": "EksK8sApiReadOnly"
        },
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "ec2:Describe*",
                "ec2:GetSecurityGroupsForVpc",
                "elasticloadbalancing:Describe*",
                "iam:ListAttachedRolePolicies",
                "kms:DescribeKey",
                "logs:DescribeLogStreams"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AwsEKSReadOnlyResources"
        }
    ],
    "Version": "2012-10-17"
}
  2. Create a k8s serviceaccount annotated with this role:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keep
  namespace: monitoring
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<your-acc-id>:role/keep
  3. Set it as the serviceaccount for the keep services in the OSS helm chart:
serviceAccount:
  create: false
  annotations: {}
  name: "keep"
<...>
frontend:
<...>
  serviceAccount:
    create: false
    annotations: {}
    name: "keep"
<etc>
  4. Run the release
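
The first step mentions a trust policy without showing it. For reference, a trust policy for an IAM role used through a serviceaccount usually looks like the output of the sketch below. This is a minimal sketch, not the exact policy from my account: the OIDC provider host in the example (`oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE`) and the account ID are placeholders, and the function name `build_irsa_trust_policy` is hypothetical. The namespace and serviceaccount name match the manifest in the steps above.

```python
import json


def build_irsa_trust_policy(
    account_id: str, oidc_provider: str, namespace: str, service_account: str
) -> dict:
    """Build a trust policy that lets one serviceaccount assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # The cluster's OIDC provider, registered in IAM.
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Restrict the role to a single namespace/serviceaccount pair.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_irsa_trust_policy(
        account_id="123456789012",  # placeholder account ID
        oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # placeholder
        namespace="monitoring",
        service_account="keep",
    )
    print(json.dumps(policy, indent=4))
```

If the `sub` condition does not match `system:serviceaccount:monitoring:keep`, the role cannot be assumed from the pod, so it is worth comparing this against the real trust policy when reproducing.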

Additional Information

Using an IAM user key pair is not secure enough (for GDPR, ISO 27001, etc.) because the credentials are never rotated.

More context in slack: URL

Labels: bug
