Commit d1e7996

fix(controller): correct metrics handling (#174)
Signed-off-by: Oliver Bähler <[email protected]>
1 parent 66ad37c commit d1e7996

10 files changed: +47 -226 lines changed

README.md

Lines changed: 1 addition & 3 deletions
@@ -27,9 +27,7 @@ This Operators introduces the concept of [SopsProviders](./docs/usage.md#provide
 
 With this option an Kubernetes users may manage their own keys and [`SopsSecrets`](./docs/usage.md#sopssecrets). The implementation of `SopsSecrets` allows them to be applied to the Kubernetes API with sops encryption-meta. The entire decryption happens within the cluster. So a `SopsSecret` is applied the way it's stored eg. in git.
 
-
-![Sops Operator](./docs/assets/sops-operator.gif)
-
+![Sops Operator](./docs/assets/sops-operator.drawio.png)
 
 ## Documentation
 

charts/sops-operator/README.md

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ The following Values are available for this chart.
 | monitoring.enabled | bool | `false` | Enable Monitoring of the Operator |
 | monitoring.rules.annotations | object | `{}` | Assign additional Annotations |
 | monitoring.rules.enabled | bool | `true` | Enable deployment of PrometheusRules |
-| monitoring.rules.groups | list | `[{"name":"SopsAlerts","rules":[{"alert":"ProviderNotReady","annotations":{"description":"Secret {{ $labels.name }} has been in a NotReady state for over 15 minutes.","summary":"Provider {{ $labels.name }} is not ready"},"expr":"sops_provider_condition{status=\"NotReady\"} == 1","for":"15m","labels":{"severity":"warning"}},{"alert":"SecretNotReady","annotations":{"description":"Secret {{ $labels.name }} in {{ $labels.namespace }} has been in a NotReady state for over 15 minutes.","summary":"Secret {{ $labels.name }} in {{ $labels.namespace }} is not ready"},"expr":"sops_secret_condition{status=\"NotReady\"} == 1","for":"15m","labels":{"severity":"warning"}}]}]` | Prometheus Groups for the rule |
+| monitoring.rules.groups | list | `[{"name":"SopsAlerts","rules":[{"alert":"ProviderNotReady","annotations":{"description":"Secret {{ $labels.name }} has been in a NotReady state for over 15 minutes.","summary":"Provider {{ $labels.name }} is not ready"},"expr":"sops_provider_condition{status=\"NotReady\"} == 1","for":"15m","labels":{"severity":"warning"}},{"alert":"SecretNotReady","annotations":{"description":"Secret {{ $labels.name }} in {{ $labels.namespace }} has been in a NotReady state for over 15 minutes.","summary":"Secret {{ $labels.name }} in {{ $labels.namespace }} is not ready"},"expr":"sops_secret_condition{status=\"NotReady\"} == 1","for":"15m","labels":{"severity":"warning"}},{"alert":"GlobalSecretNotReady","annotations":{"description":"Global Secret {{ $labels.name }} has been in a NotReady state for over 15 minutes.","summary":"Global Secret {{ $labels.name }} is not ready"},"expr":"sops_global_secret_condition{status=\"NotReady\"} == 1","for":"15m","labels":{"severity":"warning"}}]}]` | Prometheus Groups for the rule |
 | monitoring.rules.labels | object | `{}` | Assign additional labels |
 | monitoring.rules.namespace | string | `""` | Install the rules into a different Namespace, as the monitoring stack one (default: the release one) |
 | monitoring.serviceMonitor.annotations | object | `{}` | Assign additional Annotations |

charts/sops-operator/values.yaml

Lines changed: 8 additions & 0 deletions
@@ -199,6 +199,14 @@ monitoring:
             annotations:
               summary: "Secret {{ $labels.name }} in {{ $labels.namespace }} is not ready"
               description: "Secret {{ $labels.name }} in {{ $labels.namespace }} has been in a NotReady state for over 15 minutes."
+          - alert: GlobalSecretNotReady
+            expr: sops_global_secret_condition{status="NotReady"} == 1
+            for: 15m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Global Secret {{ $labels.name }} is not ready"
+              description: "Global Secret {{ $labels.name }} has been in a NotReady state for over 15 minutes."
 
   # ServiceMonitor
   serviceMonitor:

docs/assets/sops-operator.drawio

Lines changed: 0 additions & 204 deletions
This file was deleted.
934 KB

docs/assets/sops-operator.gif

-573 KB
Binary file not shown.

docs/monitoring.md

Lines changed: 14 additions & 4 deletions
@@ -3,10 +3,20 @@
 Via the `/metrics` endpoint and the dedicated port you can scrape Prometheus Metrics. Amongst the standard [Kubebuilder Metrics](https://book-v1.book.kubebuilder.io/beyond_basics/controller_metrics) we provide metrics, to give you oversight of what's currently working and what's broken. This way you can always be informed, when something is not working as expected. Our custom metrics are prefixed with `sops_`:
 
 ```shell
-sops_provider_condition{name="default-onboarding",status="NotReady"} 0
-sops_provider_condition{name="default-onboarding",status="Ready"} 1
-sops_secret_condition{name="dev-onboarding",namespace="secret-namespace",status="NotReady"} 0
-sops_secret_condition{name="dev-onboarding",namespace="secret-namespace",status="Ready"} 1
+# HELP sops_provider_condition The current condition status of a Provider.
+# TYPE sops_provider_condition gauge
+sops_provider_condition{name="sample-provider",status="NotReady"} 0
+sops_provider_condition{name="sample-provider",status="Ready"} 1
+
+# HELP sops_secret_condition The current condition status of a Secret.
+# TYPE sops_secret_condition gauge
+sops_secret_condition{name="secret-key-1",namespace="default",status="NotReady"} 0
+sops_secret_condition{name="secret-key-1",namespace="default",status="Ready"} 1
+
+# HELP sops_global_secret_condition The current condition status of a Global Secret.
+# TYPE sops_global_secret_condition gauge
+sops_global_secret_condition{name="global-secret-key-1",status="NotReady"} 1
+sops_global_secret_condition{name="global-secret-key-1",status="Ready"} 0
 ```
 
 The Helm-Chart comes with a [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitor) and [PrometheusRules](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.PrometheusRule)
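
To illustrate how the `GlobalSecretNotReady` rule in this commit relates to the exposed series, here is a minimal sketch in plain Go that applies the same selection (`status="NotReady"` with value `1`) to a few sample lines. The `notReady` helper and the `sample` constant are hypothetical and exist only for this illustration; the parsing is deliberately naive and is not a full parser for the Prometheus exposition format.

```go
package main

import (
	"fmt"
	"strings"
)

// sample repeats a few series from the documentation example in this commit.
const sample = `# HELP sops_global_secret_condition The current condition status of a Global Secret.
# TYPE sops_global_secret_condition gauge
sops_global_secret_condition{name="global-secret-key-1",status="NotReady"} 1
sops_global_secret_condition{name="global-secret-key-1",status="Ready"} 0
sops_provider_condition{name="sample-provider",status="NotReady"} 0
sops_provider_condition{name="sample-provider",status="Ready"} 1`

// notReady (hypothetical helper) returns the name label of every series of
// the given metric whose status label is "NotReady" and whose value is 1.
func notReady(text, metric string) []string {
	var names []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, metric+"{") {
			continue // comments, blank lines and other metrics
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := line[len(metric)+1 : end]
		value := strings.TrimSpace(line[end+1:])
		if value != "1" || !strings.Contains(labels, `status="NotReady"`) {
			continue
		}
		for _, pair := range strings.Split(labels, ",") {
			if strings.HasPrefix(pair, `name="`) {
				name := strings.TrimPrefix(pair, `name="`)
				names = append(names, strings.TrimSuffix(name, `"`))
			}
		}
	}
	return names
}

func main() {
	fmt.Println(notReady(sample, "sops_global_secret_condition"))
	fmt.Println(notReady(sample, "sops_provider_condition"))
}
```

In a real setup Prometheus evaluates the expression itself, and the rule's `for: 15m` clause additionally requires the match to hold for 15 minutes before the alert fires.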

internal/controllers/sopsprovider_controller.go

Lines changed: 19 additions & 9 deletions
@@ -6,6 +6,7 @@ package controllers
 import (
 	"context"
 	"errors"
+	"fmt"
 
 	"github.com/go-logr/logr"
 	sopsv1alpha1 "github.com/peak-scale/sops-operator/api/v1alpha1"
@@ -23,7 +24,6 @@ import (
 	ctrl "sigs.k8s.io/controller-runtime"
 	"sigs.k8s.io/controller-runtime/pkg/builder"
 	"sigs.k8s.io/controller-runtime/pkg/client"
-	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
 	"sigs.k8s.io/controller-runtime/pkg/event"
 	"sigs.k8s.io/controller-runtime/pkg/handler"
 	"sigs.k8s.io/controller-runtime/pkg/predicate"
@@ -102,26 +102,36 @@ func (r *SopsProviderReconciler) Reconcile(ctx context.Context, req ctrl.Request
 		return reconcile.Result{}, nil
 	}
 
+	defer func() {
+		r.Metrics.RecordProviderCondition(instance)
+	}()
+
 	reconcileErr := r.reconcile(ctx, log, instance)
 	if reconcileErr != nil {
 		instance.Status.Condition = meta.NewNotReadyCondition(instance, reconcileErr.Error())
 	}
 
-	r.Metrics.RecordProviderCondition(instance)
-
 	// Always Post Status
-	err := retry.RetryOnConflict(retry.DefaultBackoff, func() (err error) {
-		log.V(10).Info("updating", "status", instance.Status)
-		_, err = controllerutil.CreateOrUpdate(ctx, r.Client, instance.DeepCopy(), func() error {
-			return r.Client.Status().Update(ctx, instance, &client.SubResourceUpdateOptions{})
-		})
+	err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		current := &sopsv1alpha1.SopsProvider{}
+		if err := r.Get(ctx, client.ObjectKeyFromObject(instance), current); err != nil {
+			return fmt.Errorf("failed to refetch instance before update: %w", err)
+		}
+
+		current.Status = instance.Status
 
-		return
+		log.V(7).Info("updating status", "status", current.Status)
+
+		return r.Client.Status().Update(ctx, current)
 	})
 	if err != nil {
 		return ctrl.Result{}, err
 	}
 
+	if reconcileErr != nil {
+		return ctrl.Result{}, reconcileErr
+	}
+
 	return ctrl.Result{}, nil
 }
 

internal/controllers/sopssecret_controller.go

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ func (r *SopsSecretReconciler) Reconcile(ctx context.Context, req ctrl.Request)
 	}
 
 	defer func() {
-		r.Metrics.DeleteSecretCondition(instance)
+		r.Metrics.RecordSecretCondition(instance)
 	}()
 
 	// Main Reconciler
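
Both controllers now record the condition from a deferred closure. A deferred closure runs when the surrounding function returns and reads the object at that point, so it sees whatever condition the reconcile ended with, on every return path. The following sketch shows that behaviour with hypothetical `resource` and `recorder` types; it is an illustration of Go's `defer` semantics, not the operator's code.

```go
package main

import "fmt"

// resource is a hypothetical object carrying a single condition string.
type resource struct{ Condition string }

// recorder keeps the last recorded condition per resource name.
type recorder struct{ series map[string]string }

// record stores the condition the resource has at call time.
func (r *recorder) record(name string, res *resource) { r.series[name] = res.Condition }

// reconcile registers the recording first, then mutates the status.
// The deferred closure reads res only when reconcile returns, so it
// records the final condition on both the error and the success path.
func reconcile(rec *recorder, name string, res *resource, fail bool) error {
	defer func() { rec.record(name, res) }()

	if fail {
		res.Condition = "NotReady"
		return fmt.Errorf("reconcile of %s failed", name)
	}
	res.Condition = "Ready"
	return nil
}

func main() {
	rec := &recorder{series: map[string]string{}}

	_ = reconcile(rec, "secret-key-1", &resource{}, true)
	fmt.Println(rec.series["secret-key-1"])

	_ = reconcile(rec, "secret-key-1", &resource{}, false)
	fmt.Println(rec.series["secret-key-1"])
}
```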

internal/metrics/recorder.go

Lines changed: 3 additions & 4 deletions
@@ -7,7 +7,6 @@ import (
 	sopsv1alpha1 "github.com/peak-scale/sops-operator/api/v1alpha1"
 	"github.com/peak-scale/sops-operator/internal/meta"
 	"github.com/prometheus/client_golang/prometheus"
-	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	crtlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
 )
 
@@ -67,7 +66,7 @@ func (r *Recorder) Collectors() []prometheus.Collector {
 func (r *Recorder) RecordProviderCondition(provider *sopsv1alpha1.SopsProvider) {
 	for _, status := range []string{meta.ReadyCondition, meta.NotReadyCondition} {
 		var value float64
-		if provider.Status.Condition.Status == metav1.ConditionTrue {
+		if provider.Status.Condition.Type == status {
 			value = 1
 		}
 
@@ -79,7 +78,7 @@ func (r *Recorder) RecordProviderCondition(provider *sopsv1alpha1.SopsProvider)
 func (r *Recorder) RecordSecretCondition(secret *sopsv1alpha1.SopsSecret) {
 	for _, status := range []string{meta.ReadyCondition, meta.NotReadyCondition} {
 		var value float64
-		if secret.Status.Condition.Status == metav1.ConditionTrue {
+		if secret.Status.Condition.Type == status {
 			value = 1
 		}
 
@@ -91,7 +90,7 @@ func (r *Recorder) RecordSecretCondition(secret *sopsv1alpha1.SopsSecret) {
 func (r *Recorder) RecordGlobalSecretCondition(secret *sopsv1alpha1.GlobalSopsSecret) {
 	for _, status := range []string{meta.ReadyCondition, meta.NotReadyCondition} {
 		var value float64
-		if secret.Status.Condition.Status == metav1.ConditionTrue {
+		if secret.Status.Condition.Type == status {
 			value = 1
 		}
 
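
The recorder emits one series per status label for each resource. The old check did not depend on the loop variable, so both series always received the same value; the new check sets only the series whose label equals the condition type. A minimal sketch of the difference, using plain maps in place of a Prometheus `GaugeVec`: the `condition` struct, the `seriesOld` and `seriesNew` helpers, and the string constants are hypothetical, and the example assumes a NotReady condition carries the status `"False"`.

```go
package main

import "fmt"

// Hypothetical stand-ins for meta.ReadyCondition and meta.NotReadyCondition.
const (
	readyCondition    = "Ready"
	notReadyCondition = "NotReady"
)

// condition mirrors only the two fields the recorder reads.
type condition struct {
	Type   string // "Ready" or "NotReady"
	Status string // "True" or "False" (assumed values)
}

// seriesOld mimics the pre-fix check: the value ignores the loop variable,
// so both series always carry the same value.
func seriesOld(c condition) map[string]float64 {
	out := map[string]float64{}
	for _, status := range []string{readyCondition, notReadyCondition} {
		var value float64
		if c.Status == "True" {
			value = 1
		}
		out[status] = value
	}
	return out
}

// seriesNew mimics the post-fix check: only the series whose label equals
// the condition type is set to 1.
func seriesNew(c condition) map[string]float64 {
	out := map[string]float64{}
	for _, status := range []string{readyCondition, notReadyCondition} {
		var value float64
		if c.Type == status {
			value = 1
		}
		out[status] = value
	}
	return out
}

func main() {
	failing := condition{Type: notReadyCondition, Status: "False"}
	fmt.Println("old:", seriesOld(failing))
	fmt.Println("new:", seriesNew(failing))
}
```

Under that assumption the old logic never raises the `NotReady` series to 1, so an alert expression such as `sops_provider_condition{status="NotReady"} == 1` could not match.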
