@@ -8,45 +8,15 @@ The mechanism relies on Linkerd’s traffic-splitting functionality by providing
an operator to alter the backend services' weights in real time depending on
their readiness.
- ## Failover criteria
+ ## Table of contents
- The failover criteria is readiness failures on the targeted Pods. This is
- directly reflected on the Endpoints pointing to those Pods: only when Pods are
- ready, does the `addresses` field of the relevant Endpoints get populated.
-
- ## Services declaration
-
- The primitive used to declare the services to fail over is Linkerd's
- `TrafficSplit` CRD. The `spec.service` field contains the service name addressed
- by clients, and the `spec.backends` fields contain all the possible services
- that apex service might be served by. The service to be considered as primary is
- declared in the `failover.linkerd.io/primary-service` annotation. Those backend
- services can be located in the current cluster or they can point to mirror
- services backed by services in other clusters (through Linkerd's multicluster
- functionality).
-
- ## Operator
-
- Linkerd-failover is an operator to be installed in the local cluster (there
- where the clients consuming the service live), whose responsibility is to watch
- over the state of the Endpoints that are associated to the backends of the
- `TrafficSplit`, reacting to the failover criteria explained above.
-
- ## Failover logic
-
- The following describes the logic used to change the `TrafficSplit` weights:
-
- - Whenever the primary backend is ready, all the weight is set to it, setting
- the weights for all the secondary backends to zero.
- - Whenever the primary backend is not ready, the following rules apply only if
- there is at least one secondary backend that is ready:
- - The primary backend’s weight is set to zero
- - The weight is distributed equally among all the secondary backends that
- are ready
- - Whenever a secondary backend changes its readiness, the weight is
- redistributed among all the secondary backends that are ready
- - Whenever both the primary and secondaries are all unavailable, the connection
- will fail at the client-side, as expected.
+ - [Requirements](#requirements)
+ - [Configuration](#configuration)
+ - [Installation](#installation)
+ - [Example](#example)
+ - [Implementation details](#implementation-details)
+   - [Failover criteria](#failover-criteria)
+   - [Failover logic](#failover-logic)
## Requirements
@@ -60,9 +30,13 @@ The following Helm values are available:
- `selector`: determines which `TrafficSplit` instances to consider for
  failover. It defaults to `failover.linkerd.io/controlled-by={{.Release.Name}}`
  (the value refers to the release name used in `helm install`).
+ - `logLevel`, `logFormat`: for configuring the operator's logging.
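
As a minimal sketch of overriding these values (the release name, namespace, and
value choices are illustrative, and the `linkerd-edge` chart repo from the
Installation section below is assumed to already be configured):

```console
helm upgrade --install linkerd-failover -n linkerd-failover --create-namespace \
  --devel \
  --set selector="failover.linkerd.io/controlled-by=linkerd-failover" \
  --set logLevel=debug \
  linkerd-edge/linkerd-failover
```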
## Installation
+ The SMI extension and the operator are to be installed in the local cluster
+ (where the clients consuming the service are located).
+
Linkerd-smi installation:
``` console
@@ -74,21 +48,28 @@ helm install linkerd-smi -n linkerd-smi --create-namespace linkerd-smi/linkerd-s
Linkerd-failover installation:
``` console
- helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd/linkerd-failover
- ```
-
- ### Running locally for testing
+ # In case you haven't added the linkerd-edge repo already
+ helm repo add linkerd-edge https://helm.linkerd.io/edge
+ helm repo up
- ``` console
- cargo run
+ helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd-edge/linkerd-failover
```
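
To confirm the operator came up, a generic check (the namespace is the one used in
the install command above):

```console
kubectl get pods -n linkerd-failover
```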
## Example
The following `TrafficSplit` serves as the initial state for a failover setup.
+
+ Clients should send requests to the apex service `sample-svc`. The primary
+ service that will serve these requests is declared through the
+ `failover.linkerd.io/primary-service` annotation, `sample-svc` in this case.
+
When `sample-svc` starts failing, the weights will be switched over to the other
backends.
+ Note that the failover services can be located in the local cluster, or they can
+ point to mirror services backed by services in other clusters (through Linkerd's
+ multicluster functionality).
+
``` yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
@@ -97,7 +78,7 @@ metadata:
  annotations:
    failover.linkerd.io/primary-service: sample-svc
  labels:
-     app.kubernetes.io/managed-by: linkerd-failover
+     failover.linkerd.io/controlled-by: linkerd-failover
spec:
  service: sample-svc
  backends:
@@ -112,3 +93,28 @@ spec:
    - service: sample-svc-asia1
      weight: 0
```
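
For the multicluster case mentioned above, a backend such as `sample-svc-asia1`
would typically be a mirror service created by Linkerd's multicluster extension.
A rough sketch of how such a mirror comes to exist (cluster names, contexts, and
the namespace are illustrative; check the Linkerd multicluster docs for the exact
flags):

```console
# Link the remote cluster "asia1" to the local cluster
linkerd --context=asia1 multicluster link --cluster-name asia1 \
  | kubectl --context=local apply -f -

# Export the service on the remote cluster so a sample-svc-asia1 mirror appears locally
kubectl --context=asia1 -n sample label svc/sample-svc mirror.linkerd.io/exported=true
```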
+
+ ## Implementation details
+
+ ### Failover criteria
+
+ The failover criterion is a readiness failure on the targeted Pods. This is
+ directly reflected in the Endpoints object associated with those Pods: only when
+ Pods are ready does the `addresses` field of the relevant Endpoints get
+ populated.
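
For instance, an Endpoints object for one of the backends might look roughly like
this while its Pods are ready (the service name is taken from the example above;
the IP and port are illustrative):

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: sample-svc-asia1
subsets:
  - addresses:            # populated only while the backing Pods are ready
      - ip: 10.42.0.15
    ports:
      - port: 8080
```

When the Pods fail their readiness probes, their IPs typically move to
`notReadyAddresses` and `addresses` empties out, which is the signal the operator
reacts to.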
+
+ ### Failover logic
+
+ The following describes the logic used to change the `TrafficSplit` weights:
+
+ - Whenever the primary backend is ready, all the weight is set to it, setting
+   the weights for all the secondary backends to zero.
+ - Whenever the primary backend is not ready, the following rules apply only if
+   there is at least one secondary backend that is ready:
+   - The primary backend’s weight is set to zero.
+   - The weight is distributed equally among all the secondary backends that
+     are ready.
+   - Whenever a secondary backend changes its readiness, the weight is
+     redistributed among all the secondary backends that are ready.
+ - Whenever both the primary and secondaries are unavailable, the connection will
+   fail on the client side, as expected.
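
Applied to the example above, if `sample-svc` (the primary) stopped being ready
while `sample-svc-asia1` were the only ready secondary, the operator would rewrite
the `TrafficSplit` roughly as follows (the absolute weight values are an
assumption; what matters is the zero weight on the primary and the equal split
among the ready secondaries):

```yaml
spec:
  service: sample-svc
  backends:
    - service: sample-svc        # primary, not ready
      weight: 0
    - service: sample-svc-asia1  # only ready secondary, receives all the weight
      weight: 1
```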