"Failed to scrape node" errors after a node is terminated or newly created #1704

@jstefankowski

What happened:
Metrics-server 0.7.1 logs "Failed to scrape node" errors for up to 2 minutes after a Karpenter node is terminated or created.

What you expected to happen:
Metrics-server should not log errors when an EKS node is not ready to be scraped, either because it is already in the Shutting-down/Terminated state or because it was just created.
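
To make the expected behavior concrete, here is a minimal sketch of the kind of check I have in mind. It is not metrics-server code: `NodeState`, `should_log_scrape_failure` and the 120-second grace period are hypothetical names and values chosen for illustration (the grace period just mirrors the "up to 2 minutes" observed above).

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical, simplified view of a node; not a metrics-server type."""
    name: str
    ready: bool           # the node's Ready condition is True
    being_deleted: bool   # the node object is being deleted / instance is shutting down
    age_seconds: float    # time since the node object was created


def should_log_scrape_failure(node: NodeState, startup_grace_seconds: float = 120.0) -> bool:
    """Return True only when a scrape failure is worth reporting as an error.

    Assumption: failures on nodes that are terminating, not Ready, or younger
    than the grace period are expected and should not be logged as errors.
    """
    if node.being_deleted:
        return False
    if not node.ready:
        return False
    if node.age_seconds < startup_grace_seconds:
        return False
    return True


if __name__ == "__main__":
    terminating = NodeState("ip-10-108-180-80.us-east-2.compute.internal", False, True, 86400.0)
    print(should_log_scrape_failure(terminating))  # terminating node: stay quiet
```

With something like this, an error would still be logged for a long-running Ready node whose kubelet stops answering, which is the case that actually needs attention.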

Anything else we need to know?:
Example: node ip-10-108-180-80.us-east-2.compute.internal was shut down, but metrics-server kept logging errors for 2 minutes, once every 15 seconds (--metric-resolution=15s).

See example log:
{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get "
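
The "2 minutes, every 15 seconds" figure can be checked from the `ts` fields of the example log above. The snippet below is a small sketch that assumes `ts` is milliseconds since the Unix epoch (which is how the values read); `error_window` is a helper name made up for this example.

```python
# "ts" values copied from the ten example log lines above
# (assumed to be milliseconds since the Unix epoch).
TS_MS = [
    1755621037144.3723,
    1755621052196.1824,
    1755621067156.2139,
    1755621082187.7395,
    1755621097153.6978,
    1755621112118.9421,
    1755621127189.2578,
    1755621142189.1562,
    1755621157130.2832,
    1755621172130.848,
]


def error_window(ts_ms):
    """Return (total span in seconds, list of gaps in seconds between consecutive errors)."""
    ordered = sorted(ts_ms)
    gaps = [(later - earlier) / 1000.0 for earlier, later in zip(ordered, ordered[1:])]
    span = (ordered[-1] - ordered[0]) / 1000.0
    return span, gaps


span_s, gaps_s = error_window(TS_MS)

if __name__ == "__main__":
    print(f"errors: {len(TS_MS)}, window: {span_s:.1f}s")
    print(f"gap between errors: min {min(gaps_s):.2f}s, max {max(gaps_s):.2f}s")
```

For this node the errors span roughly 135 seconds, with each gap close to the 15-second --metric-resolution.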

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
    EKS

  • Container Network Setup (flannel, calico, etc.):
    calico, coredns, ebs-csi-controller, r53-external-dns, karpenter

  • Kubernetes version (use kubectl version):
    1.31

  • Metrics Server manifest
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
        rbac.authorization.k8s.io/aggregate-to-edit: "true"
        rbac.authorization.k8s.io/aggregate-to-view: "true"
      name: system:aggregated-metrics-reader
    rules:
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes/metrics
      verbs:
      - get
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      - nodes/stats
      - namespaces
      - configmaps
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server-auth-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server:system:auth-delegator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: https
      selector:
        k8s-app: metrics-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        k8s-app: metrics-server
        app: ${metrics_server_label_app}
        owner: ${metrics_server_label_owner}
        department: ${metrics_server_label_department}
        team: ${metrics_server_label_team}
      name: metrics-server
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      template:
        metadata:
          labels:
            k8s-app: metrics-server
            app: ${metrics_server_label_app}
            owner: ${metrics_server_label_owner}
            department: ${metrics_server_label_department}
            team: ${metrics_server_label_team}
        spec:
          tolerations:
          - key: "apps"
            operator: "Equal"
            value: "corecomponents"
            effect: "NoSchedule"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node.kubernetes.io/lifecycle
                    operator: In
                    values:
                    - ondemand_components
          containers:
          - args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --logging-format=json
            image: ${repo}:${image_tag}
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /livez
                port: https
                scheme: HTTPS
              periodSeconds: 10
            name: metrics-server
            ports:
            - containerPort: 4443
              name: https
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /readyz
                port: https
                scheme: HTTPS
              initialDelaySeconds: 20
              periodSeconds: 10
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
            volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
          nodeSelector:
            kubernetes.io/os: linux
          hostNetwork: true
          priorityClassName: system-cluster-critical
          serviceAccountName: metrics-server
          volumes:
          - emptyDir: {}
            name: tmp-dir
    ---
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        k8s-app: metrics-server
      name: v1beta1.metrics.k8s.io
    spec:
      group: metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: metrics-server
        namespace: kube-system
      version: v1beta1
      versionPriority: 100

  • Kubelet config:
    cat /etc/kubernetes/kubelet/config.json
    {
      "address": "0.0.0.0",
      "authentication": {
        "x509": {
          "clientCAFile": "/etc/kubernetes/pki/ca.crt"
        },
        "webhook": {
          "enabled": true,
          "cacheTTL": "2m0s"
        },
        "anonymous": {
          "enabled": false
        }
      },
      "authorization": {
        "mode": "Webhook",
        "webhook": {
          "cacheAuthorizedTTL": "5m0s",
          "cacheUnauthorizedTTL": "30s"
        }
      },
      "cgroupDriver": "systemd",
      "cgroupRoot": "/",
      "clusterDNS": [
        "172.20.0.10"
      ],
      "clusterDomain": "cluster.local",
      "containerRuntimeEndpoint": "unix:///run/containerd/containerd.sock",
      "evictionHard": {
        "memory.available": "100Mi",
        "nodefs.available": "10%",
        "nodefs.inodesFree": "5%"
      },
      "featureGates": {
        "RotateKubeletServerCertificate": true
      },
      "hairpinMode": "hairpin-veth",
      "kubeReserved": {
        "cpu": "90m",
        "ephemeral-storage": "1Gi",
        "memory": "893Mi"
      },
      "kubeReservedCgroup": "/runtime",
      "logging": {
        "verbosity": 2
      },
      "maxPods": 58,
      "protectKernelDefaults": true,
      "providerID": "aws:///us-east-2a/i-077c3432936e48858",
      "readOnlyPort": 0,
      "serializeImagePulls": false,
      "serverTLSBootstrap": true,
      "systemReservedCgroup": "/system",
      "tlsCipherSuites": [
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
        "TLS_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_256_GCM_SHA384"
      ],
      "kind": "KubeletConfiguration",
      "apiVersion": "kubelet.config.k8s.io/v1beta1"
    }

  • Metrics server logs:

{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621042104.129,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-162-175.us-east-2.compute.internal"},"err":"Get \"https://10.108.162.175:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621042162.523,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621742166.749,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621757185.5354,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621762140.344,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"err":"Get \"https://10.108.183.190:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621772153.691,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621787177.745,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621802095.0845,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621817115.6096,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621832201.8843,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621882168.9373,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"err":"Get \"https://10.108.178.221:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621892168.5889,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621897133.5642,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"err":"Get \"https://10.108.166.232:10250/metrics/resource\": dial tcp 10.108.166.232:10250: connect: connection refused"}
{"ts":1755621907144.9084,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621922085.854,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.166.232:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621952125.9766,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621967104.593,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621997106.5833,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622012187.4453,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622182161.2349,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": dial tcp 10.108.54.32:10250: connect: connection refused"}
{"ts":1755622207109.7744,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622222112.0771,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
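
The logs above mix three kinds of failure: timeouts, "tls: internal error" and "connection refused". To see which node hits which one, the lines can be grouped per node. The snippet below is a rough sketch: it assumes one JSON object per line with the `\"` escapes intact, and `classify`, `summarize` and the category names are made up for this example.

```python
import json
from collections import Counter, defaultdict

# Substrings taken from the "err" fields in the logs above, mapped to
# category names invented for this example.
ERROR_KINDS = {
    "context deadline exceeded": "timeout",
    "tls: internal error": "tls_internal_error",
    "connection refused": "connection_refused",
}


def classify(err):
    """Map an "err" string to one of the categories above, or "other"."""
    for needle, kind in ERROR_KINDS.items():
        if needle in err:
            return kind
    return "other"


def summarize(lines):
    """Count scrape failures per node and per error category."""
    summary = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if not entry.get("msg", "").startswith("Failed to scrape node"):
            continue
        summary[entry["node"]["name"]][classify(entry.get("err", ""))] += 1
    return {node: dict(counts) for node, counts in summary.items()}


# Three of the log lines above, all for the same node.
SAMPLE = [
    r'{"ts":1755621042162.523,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": remote error: tls: internal error"}',
    r'{"ts":1755622182161.2349,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": dial tcp 10.108.54.32:10250: connect: connection refused"}',
    r'{"ts":1755622207109.7744,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}',
]

if __name__ == "__main__":
    for node, counts in summarize(SAMPLE).items():
        print(node, counts)
```

In the sample, ip-10-108-54-32 shows a TLS internal error early on and then connection refused followed by timeouts about 19 minutes later, so a single short-lived node can produce all three error kinds.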

  • Status of Metrics API:
    kubectl describe apiservice v1beta1.metrics.k8s.io

    Name: v1beta1.metrics.k8s.io
    Namespace:
    Labels: k8s-app=metrics-server
    Annotations:
    API Version: apiregistration.k8s.io/v1
    Kind: APIService
    Metadata:
      Creation Timestamp: 2022-07-29T08:29:06Z
      Resource Version: 1171407047
      UID: 1c1f0d23-9bbb-4c3e-bb99-21a7b34a0115
    Spec:
      Group: metrics.k8s.io
      Group Priority Minimum: 100
      Insecure Skip TLS Verify: true
      Service:
        Name: metrics-server
        Namespace: kube-system
        Port: 443
      Version: v1beta1
      Version Priority: 100
    Status:
      Conditions:
        Last Transition Time: 2025-08-18T12:38:24Z
        Message: all checks passed
        Reason: Passed
        Status: True
        Type: Available
    Events:

/kind bug
