Description
What happened:
Metrics-server 0.7.1 logs "Failed to scrape node" for up to 2 minutes after a Karpenter node is terminated or created.
What you expected to happen:
Metrics-server should not log errors for an EKS node that is not ready to be scraped, either because it is already in the Shutting-down/Terminated state or because it was just created.
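One way to read this request is as a pre-scrape filter on the Node object. The sketch below is illustrative Python, not metrics-server code (metrics-server is written in Go). `should_scrape` is a hypothetical helper, the sample nodes and their timestamp are made up, and it assumes a node is only worth scraping when it has no `metadata.deletionTimestamp` and its `Ready` condition is `True`.

```python
from typing import Any, Dict


def should_scrape(node: Dict[str, Any]) -> bool:
    """Return True only for nodes that look ready to serve kubelet metrics.

    `node` is a Node object as a plain dict, for example parsed from
    `kubectl get node <name> -o json`. Hypothetical helper for illustration.
    """
    metadata = node.get("metadata", {})
    # A node that is being deleted carries a deletionTimestamp.
    if metadata.get("deletionTimestamp"):
        return False
    conditions = node.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    # No Ready condition reported yet, e.g. a freshly registered node.
    return False


# Made-up sample nodes: newly created, healthy, and terminating.
new_node = {"metadata": {"name": "a"}, "status": {"conditions": []}}
ready_node = {
    "metadata": {"name": "b"},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}
terminating_node = {
    "metadata": {"name": "c", "deletionTimestamp": "2025-08-19T16:30:00Z"},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}

print([should_scrape(n) for n in (new_node, ready_node, terminating_node)])
# prints [False, True, False]
```

Whether such a filter should suppress the scrape itself or only lower the log level is a design question for the maintainers; the sketch only shows the condition being asked for.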
Anything else we need to know?:
Example: Node ip-10-108-180-80.us-east-2.compute.internal was shut down, but metrics-server kept logging errors for it every 15 seconds (--metric-resolution=15s) for about 2 minutes.
See example log:
{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get "
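To quantify the window, the `ts` field (epoch milliseconds) of the first and last entry for a node can be compared. The sketch below uses a hypothetical Python helper, `failure_windows`; it assumes each log line is valid JSON, and the two sample lines reuse the first and last timestamps from the example with the `err` text shortened.

```python
import json
from typing import Dict, Iterable, List, Tuple


def failure_windows(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map node name -> (number of failures, seconds between first and last)."""
    seen: Dict[str, List[float]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if not entry.get("msg", "").startswith("Failed to scrape node"):
            continue
        seen.setdefault(entry["node"]["name"], []).append(entry["ts"])
    return {
        name: (len(ts), round((max(ts) - min(ts)) / 1000.0, 1))
        for name, ts in seen.items()
    }


# First and last timestamps from the example log; err shortened for brevity.
sample = [
    '{"ts":1755621037144.3723,"msg":"Failed to scrape node, timeout to access kubelet",'
    '"node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},'
    '"err":"context deadline exceeded"}',
    '{"ts":1755621172130.848,"msg":"Failed to scrape node, timeout to access kubelet",'
    '"node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},'
    '"err":"context deadline exceeded"}',
]

print(failure_windows(sample))
# prints {'ip-10-108-180-80.us-east-2.compute.internal': (2, 135.0)}
```

Fed all ten example entries, the span stays at about 135 seconds, i.e. slightly over the two minutes mentioned above.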
Environment:
- Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS
- Container Network Setup (flannel, calico, etc.): calico, coredns, ebs-csi-controller, r53-external-dns, karpenter
- Kubernetes version (use kubectl version): 1.31
- Metrics Server manifest
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
    app: ${metrics_server_label_app}
    owner: ${metrics_server_label_owner}
    department: ${metrics_server_label_department}
    team: ${metrics_server_label_team}
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
        app: ${metrics_server_label_app}
        owner: ${metrics_server_label_owner}
        department: ${metrics_server_label_department}
        team: ${metrics_server_label_team}
    spec:
      tolerations:
      - key: "apps"
        operator: "Equal"
        value: "corecomponents"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: In
                values:
                - ondemand_components
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --logging-format=json
        image: ${repo}:${image_tag}
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
- Kubelet config:
cat /etc/kubernetes/kubelet/config.json
{
  "address": "0.0.0.0",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/pki/ca.crt"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "cgroupDriver": "systemd",
  "cgroupRoot": "/",
  "clusterDNS": [
    "172.20.0.10"
  ],
  "clusterDomain": "cluster.local",
  "containerRuntimeEndpoint": "unix:///run/containerd/containerd.sock",
  "evictionHard": {
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
  },
  "featureGates": {
    "RotateKubeletServerCertificate": true
  },
  "hairpinMode": "hairpin-veth",
  "kubeReserved": {
    "cpu": "90m",
    "ephemeral-storage": "1Gi",
    "memory": "893Mi"
  },
  "kubeReservedCgroup": "/runtime",
  "logging": {
    "verbosity": 2
  },
  "maxPods": 58,
  "protectKernelDefaults": true,
  "providerID": "aws:///us-east-2a/i-077c3432936e48858",
  "readOnlyPort": 0,
  "serializeImagePulls": false,
  "serverTLSBootstrap": true,
  "systemReservedCgroup": "/system",
  "tlsCipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
    "TLS_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_256_GCM_SHA384"
  ],
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1"
}
- Metrics server logs:
{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621042104.129,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-162-175.us-east-2.compute.internal"},"err":"Get \"https://10.108.162.175:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621042162.523,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621742166.749,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621757185.5354,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621762140.344,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"err":"Get \"https://10.108.183.190:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621772153.691,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621787177.745,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621802095.0845,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621817115.6096,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621832201.8843,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621882168.9373,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"err":"Get \"https://10.108.178.221:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621892168.5889,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621897133.5642,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"err":"Get \"https://10.108.166.232:10250/metrics/resource\": dial tcp 10.108.166.232:10250: connect: connection refused"}
{"ts":1755621907144.9084,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621922085.854,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.166.232:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621952125.9766,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621967104.593,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621997106.5833,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622012187.4453,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622182161.2349,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": dial tcp 10.108.54.32:10250: connect: connection refused"}
{"ts":1755622207109.7744,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622222112.0771,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
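The log mixes three distinct error texts: "context deadline exceeded", "tls: internal error" and "connection refused". A rough way to tally them per node is sketched below in Python. `classify`, `tally` and the category names are hypothetical, the matching is plain substring checks on the `err` field, and the sample reuses three of the entries for ip-10-108-54-32 with the inner quotes escaped so that each line parses as JSON.

```python
import json
from collections import Counter
from typing import Iterable


def classify(err: str) -> str:
    """Bucket a scrape error string by the substrings seen in the logs."""
    if "context deadline exceeded" in err:
        return "timeout"
    if "tls: internal error" in err:
        return "tls_internal_error"
    if "connection refused" in err:
        return "connection_refused"
    return "other"


def tally(lines: Iterable[str]) -> Counter:
    """Count (node name, error category) pairs across JSON log lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        key = (entry["node"]["name"], classify(entry.get("err", "")))
        counts[key] += 1
    return counts


# Raw strings keep the \" escapes exactly as the JSON logger writes them.
sample = [
    r'{"ts":1755621042162.523,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": remote error: tls: internal error"}',
    r'{"ts":1755622182161.2349,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": dial tcp 10.108.54.32:10250: connect: connection refused"}',
    r'{"ts":1755622207109.7744,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}',
]

counts = tally(sample)
for (name, kind), n in sorted(counts.items()):
    print(name, kind, n)
```

Grouping the entries this way may help separate nodes that are still starting up from nodes that are going away, since the same node can show different errors at different points in its lifetime.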
- Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2022-07-29T08:29:06Z
  Resource Version:    1171407047
  UID:                 1c1f0d23-9bbb-4c3e-bb99-21a7b34a0115
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2025-08-18T12:38:24Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:
/kind bug