Description
What would you like to be added:
I would like to add a few metrics that expose the CPU/memory size of EC2 instance types, taken from the Spot Ocean / Karpenter labels on a k8s node. This is different from the node capacity status metric (kube_node_status_capacity). See the "why" section below.
```yaml
groupVersionKind:
  group: ""
  kind: Node
  version: v1
labelsFromPath:
  node: [metadata, name]
metrics:
  - name: node_ec2_instance_total
    help: Total memory of the ec2 instance backing the k8s node in MiB.
    commonLabels:
      provisioner: "ocean"
      resource: "memory"
    each:
      type: Gauge
      gauge:
        path: [metadata, labels, "aws.spot.io/instance-memory"] # OR karpenter.k8s.aws/instance-memory
```

Why is this needed:
It seems the KSM community wants to limit custom metrics to custom resources only. Users who want new metrics for native k8s resources are advised to open an issue like this one.
For example, the node's memory capacity is the EC2 instance's memory size minus things like UEFI/BIOS reservations, DMA, alignment holes, kernel reservations, etc. In practice, on Bottlerocket we have seen this gap be about 0.5 GiB, but we would like to be able to observe it directly.
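To make the gap concrete: the provisioner label reports the instance's memory (in MiB, per the help text above), while the node's memory capacity is a byte quantity. The Go sketch below just does that subtraction. It assumes the label value is a plain integer MiB string and that the capacity has already been converted to bytes; `memoryGapBytes` and the sample numbers are hypothetical, not measurements.

```go
package main

import (
	"fmt"
	"strconv"
)

// mib is the number of bytes in one mebibyte.
const mib int64 = 1 << 20

// memoryGapBytes returns how many bytes of the instance's memory do not show
// up in the node's reported memory capacity. Assumptions (hypothetical):
// instanceMemoryLabel is a plain integer MiB string such as "8192", and
// capacityBytes is the node's memory capacity already converted to bytes.
func memoryGapBytes(instanceMemoryLabel string, capacityBytes int64) (int64, error) {
	instanceMiB, err := strconv.ParseInt(instanceMemoryLabel, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse instance memory label %q: %w", instanceMemoryLabel, err)
	}
	return instanceMiB*mib - capacityBytes, nil
}

func main() {
	// Hypothetical node: 8192 MiB instance, 7680 MiB (7.5 GiB) reported capacity.
	gap, err := memoryGapBytes("8192", 7680*mib)
	if err != nil {
		panic(err)
	}
	fmt.Printf("gap: %d bytes (%.2f GiB)\n", gap, float64(gap)/float64(1<<30))
}
```

With these made-up inputs the program prints a gap of 536870912 bytes (0.50 GiB). Having the instance size as a metric is exactly what would let this subtraction happen in a query instead of by hand.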
Describe the solution you'd like:
I can put up a PR to collect these metrics if we agree this is an acceptable metric to add. However, this metric feels quite niche, and the list of supported labels would quickly become unmaintainable. I think the real solution is to allow custom metrics on native resources, but as noted earlier, that has been decided against.
Alternatively, a possible compromise is to extend the kube_node_labels pattern with configuration that directly exposes the values of specific, configured labels as metrics. This could be written generically so that it applies to all native resource types.
```yaml
nodeLabelsAsMetrics:
  - name: instance_memory
    path: aws.spot.io/instance-memory
    commonLabels:
      provisioner: "ocean"
      resource: "memory"
  - name: instance_memory
    path: karpenter.k8s.aws/instance-memory
    commonLabels:
      provisioner: "karpenter"
      resource: "memory"
```

This would generate:

```
kube_node_labels_instance_memory{node="abc",provisioner="ocean",resource="memory"} 8192
kube_node_labels_instance_memory{node="xyz",provisioner="karpenter",resource="memory"} 4096
```
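The proposed label-to-metric translation is small enough to sketch. The Go program below is a minimal, hypothetical illustration of what a nodeLabelsAsMetrics option could do; it is not existing KSM code, and `LabelMetric`, `Node`, and `renderLabelMetrics` are made-up names. It assumes label values are plain numbers and skips any that do not parse. Given the two config entries from the proposal and two sample nodes, it prints the two example lines.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// LabelMetric is a hypothetical config entry mirroring the proposed
// nodeLabelsAsMetrics YAML: expose the value of label Path as a gauge.
type LabelMetric struct {
	Name         string
	Path         string
	CommonLabels map[string]string
}

// Node is a minimal stand-in for a Kubernetes Node (name and labels only).
type Node struct {
	Name   string
	Labels map[string]string
}

// escaper applies Prometheus text-format escaping to label values.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// renderLabelMetrics emits one exposition line per (node, config) pair whose
// label exists and parses as a number; all other pairs are skipped.
func renderLabelMetrics(nodes []Node, configs []LabelMetric) []string {
	var lines []string
	for _, n := range nodes {
		for _, c := range configs {
			raw, ok := n.Labels[c.Path]
			if !ok {
				continue // this node was not labeled by this provisioner
			}
			value, err := strconv.ParseFloat(raw, 64)
			if err != nil {
				continue // non-numeric label values cannot become gauge samples
			}
			// Sort common label names so the output is deterministic.
			keys := make([]string, 0, len(c.CommonLabels))
			for k := range c.CommonLabels {
				keys = append(keys, k)
			}
			sort.Strings(keys)
			pairs := []string{fmt.Sprintf(`node="%s"`, escaper.Replace(n.Name))}
			for _, k := range keys {
				pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, escaper.Replace(c.CommonLabels[k])))
			}
			lines = append(lines, fmt.Sprintf("kube_node_labels_%s{%s} %s",
				c.Name, strings.Join(pairs, ","), strconv.FormatFloat(value, 'f', -1, 64)))
		}
	}
	return lines
}

func main() {
	configs := []LabelMetric{
		{Name: "instance_memory", Path: "aws.spot.io/instance-memory",
			CommonLabels: map[string]string{"provisioner": "ocean", "resource": "memory"}},
		{Name: "instance_memory", Path: "karpenter.k8s.aws/instance-memory",
			CommonLabels: map[string]string{"provisioner": "karpenter", "resource": "memory"}},
	}
	nodes := []Node{
		{Name: "abc", Labels: map[string]string{"aws.spot.io/instance-memory": "8192"}},
		{Name: "xyz", Labels: map[string]string{"karpenter.k8s.aws/instance-memory": "4096"}},
	}
	for _, line := range renderLabelMetrics(nodes, configs) {
		fmt.Println(line)
	}
}
```

One design point this makes visible: both provisioners can share the metric name instance_memory because commonLabels keep the series distinct, and a node carrying neither label simply produces no series.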
This would add a third mechanism to KSM, but I think it restores much of the functionality lost with that earlier decision, and it would unblock the community from having to upstream every specific metric.
Looking forward to discussion/thoughts!
Additional context:
I have also tried to get this information into Prometheus and/or Grafana, but those efforts failed as well, because you cannot easily convert a metric label value into a metric value.