docs: Add best practices for metrics #2528
Merged
6 commits
3736c9a docs: Add best practices for metrics (mrueg)
cacc980 Update docs/design/metrics-best-practices.md (mrueg)
7677b41 Update docs/design/metrics-best-practices.md (mrueg)
690b962 Include more comments from the review (mrueg)
7ac9968 Update docs/design/metrics-best-practices.md (mrueg)
dcfaae9 Add some info about 1:1/1:n relationships (mrueg)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # Kube-State-Metrics - Timeseries best practices | ||
|
|
||
| --- | ||
|
|
||
| Author: Manuel Rüger (<[email protected]>) | ||
|
|
||
| Date: October 17th 2024 | ||
|
|
||
| --- | ||
|
|
||
| ## Introduction | ||
|
|
||
| Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics. | ||
| This document provides guidelines intended to create a good user experience when using these metrics. | ||
|
|
||
| Please be aware that this document was introduced at a later stage of the project, so some existing metrics might not follow these best practices. | ||
| Feel encouraged to report such metrics and to open a pull request that improves them. | ||
|
|
||
| ## General best practices | ||
|
|
||
| We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling. | ||
|
|
||
| ## Best practices for kube-state-metrics | ||
|
|
||
| ### Avoid pre-computation | ||
|
|
||
| kube-state-metrics should expose metrics at the level of individual objects and avoid any sort of pre-computation unless it is required, for example because of high object cardinality. | ||
| We prefer not to add metrics that can be derived from existing raw metrics. For example, we would not want to expose a metric called `kube_pod_total`, as it can be computed with `count(kube_pod_info)`. | ||
| This way kube-state-metrics gives users full control over how they use the metrics and the flexibility to do their own computation. | ||
|
|
||
| ### Static object properties | ||
|
|
||
| An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes. | ||
| This includes properties like name, namespace, uid etc. that have a 1:1 relationship with the object. | ||
| It is a good practice to group those together into an `_info` metric. | ||
| If there is a 1:n relationship (e.g. a list of ports), it should be exposed in a separate metric so that the `_info` metric is not repeated once per list entry. | ||
|
|
||
| ### Dynamic object properties | ||
|
|
||
| An object can also have a dynamic set of properties, which are usually part of the status field. | ||
| These change during the lifecycle of the object. | ||
| For example, a pod can be in different states such as "Pending", "Running", etc. | ||
| These should be part of a "State Set" that includes labels that identify the object as well as the dynamic property. | ||
|
|
||
| ### Linked properties | ||
|
|
||
| If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric. | ||
|
|
||
| ### Optional properties | ||
|
|
||
| Some Kubernetes objects have optional fields. If an optional field is not set, its label should still be exposed, ideally with an empty string as its value. | ||
|
|
||
| ### Timestamps | ||
|
|
||
| Timestamps such as creation time or modification time should be exposed as the metric value. The metric name should end with `_timestamp_seconds`, and the value is represented in [UNIX epoch seconds](https://en.wikipedia.org/wiki/Unix_time). | ||
|
|
||
| ### Cardinality | ||
|
|
||
| Some object properties can cause cardinality issues if they can take many different values, or if they are combined with other properties that also change frequently. | ||
| In this case it is better to limit the number of values that kube-state-metrics exposes by allowing only a few of them and using a default for all others. | ||
| For example, if a Kubernetes object has a status field with an error message that can change a lot, it is better to expose a boolean `error="true"` label when there is an error. | ||
| If some error messages are worth exposing, those can be exposed as-is, with a default value provided for any other message. | ||
|
|
||
| ## Stability | ||
|
|
||
| We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics. | ||
| Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API. | ||
| They can change at any time and should be used with caution. | ||
| They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes. | ||
|
|
||
| Stable metrics are considered frozen with the exception of new labels being added. | ||
| A stable metric, or a label on a stable metric, can be deprecated in release Major.Minor and will be removed in release Major.Minor+2 at the earliest. | ||
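
To make the guidelines in the document above more concrete, here is a minimal sketch of how they might shape the exposed series for a single object. Everything in it is an assumption made for illustration: the `demo_widget_*` metric names, the `owner`, `port_name`, `port_number` and `reason` labels, the allow-list of reasons and the `Other` default do not come from kube-state-metrics, and the helper only mimics the Prometheus text format rather than using the project's own code.

```python
from datetime import datetime

# Hypothetical values, chosen only for this illustration.
PHASES = ("Pending", "Running", "Succeeded", "Failed")
ALLOWED_REASONS = {"ImagePullBackOff", "CrashLoopBackOff"}


def fmt(name, labels, value):
    """Render one sample roughly in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"


def render(obj):
    """Return the sample lines for one object, following the guidelines."""
    ident = {"namespace": obj["namespace"], "name": obj["name"]}
    lines = []

    # Static 1:1 properties are grouped into one _info metric.
    # The optional "owner" field is still exposed, as an empty string if unset.
    info = {**ident, "uid": obj["uid"], "owner": obj.get("owner") or ""}
    lines.append(fmt("demo_widget_info", info, 1))

    # 1:n relationship (ports) goes into a separate metric; linked
    # properties (port name and number) stay together in the same sample.
    for port in obj["ports"]:
        labels = {**ident, "port_name": port["name"], "port_number": port["number"]}
        lines.append(fmt("demo_widget_port", labels, 1))

    # Dynamic property as a state set: one series per possible state,
    # with value 1 for the current state and 0 for the others.
    for phase in PHASES:
        labels = {**ident, "phase": phase}
        lines.append(fmt("demo_widget_status_phase", labels, int(obj["phase"] == phase)))

    # Timestamp exposed as the value, in UNIX epoch seconds.
    created = datetime.fromisoformat(obj["created"])
    lines.append(fmt("demo_widget_creation_timestamp_seconds", ident, int(created.timestamp())))

    # Cardinality: only allow-listed reasons are exposed, others get a default.
    reason = obj.get("error_reason")
    if reason:
        exposed = reason if reason in ALLOWED_REASONS else "Other"
        lines.append(fmt("demo_widget_status_reason", {**ident, "reason": exposed}, 1))

    return lines
```

Nothing here is pre-computed across objects: counts and ratios are left to the query side, in the spirit of `count(kube_pod_info)`.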
empty string and not having the label is the same thing in Prometheus, so why do this?
This might be treated differently by other monitoring systems, so rather stay explicit here and ensure we do it the same way everywhere.
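For context on the point raised above: in the Prometheus data model, a label with an empty value is considered equivalent to a label that does not exist. A minimal sketch of that rule follows; the helper names are made up for illustration and are not part of any Prometheus client library, and other monitoring systems may not apply the same rule.

```python
def normalize(labels):
    """Drop labels with an empty value, mirroring the Prometheus rule."""
    return {k: v for k, v in labels.items() if v != ""}


def same_series(a, b):
    """Two label sets identify the same series if they match after normalization."""
    return normalize(a) == normalize(b)


# An explicit empty "owner" label and a missing "owner" label compare equal.
explicit = {"pod": "web-0", "owner": ""}
missing = {"pod": "web-0"}
identical = same_series(explicit, missing)
```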
Feel free to chime in on #2528 (comment)