changes/2025-09-12_introduce-metrics-interface/background.md (+75 −7)
## Issues and Alternatives

Crypto Tools (CT) publishes software libraries. The latest
versions of these libraries have no logging or metrics publishing,
either to a local application or to an observability service like AWS CloudWatch.

Because CT publishes client-side encryption libraries, emitting metrics must be done
carefully so as to avoid accidentally
[leaking](https://github.com/aws/aws-encryption-sdk-python/pull/105/files)
any information related to the plaintext, which could lead to a
loss of customer trust.
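
As a purely illustrative sketch of this constraint (all names hypothetical, not a
committed CT API), the following contrasts metric data that is safe to emit with
data that must never leave the library:

```java
// Illustrative only (hypothetical names): the kind of data a CT metric may
// safely carry versus data that must never leave the library.
final class MetricExamples {
    // A metric is a name plus a numeric value; nothing plaintext-derived.
    record Metric(String name, double value) {}

    // Safe: operation counts and timings reveal nothing about the plaintext.
    static Metric kmsCallCount(long calls) {
        return new Metric("KmsCallCount", calls);
    }

    static Metric decryptLatencyMs(double millis) {
        return new Metric("DecryptLatencyMs", millis);
    }

    // Never acceptable: plaintext, plaintext data keys, or values derived
    // from them must not appear in metric names, values, or dimensions.
}
```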

A popular feature request has been for in-depth insights into CT libraries. Many customers
ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS)
and ask follow-up questions about cache performance.

CT offers solutions to reduce network calls to AWS KMS through the Caching CMM and the
AWS KMS Hierarchical Keyring.
Today, however, there is no CT solution that lets customers extract the performance
metrics they are looking for.
This can lead to frustrating debugging sessions and escalations that
could have been resolved with additional information.

Recent customer demand has allowed CT to re-evaluate client-side metrics to offer
a better customer experience.
### Issue 1: What will be the default behavior?

As a publisher of client-side encryption libraries, CT should be as cautious as possible.
Customers of CT libraries should be in the driver's seat and determine for
themselves whether their application could benefit from emitting metrics.
Making that decision for customers can erode customer trust.

For CT to be comfortable with offering metrics, CT must ensure that
emitting them does not affect the availability of the library's consumer.

#### Opt-In (recommended)

By not emitting metrics by default, existing customer workflows do not change.

This allows customers to test how their applications behave when they start to emit
metrics. Customers can then ask for updates to the implementations
CT provides, or they can implement their own interfaces that are fine-tuned
to their use cases.
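
A minimal sketch of what opt-in could look like, assuming a hypothetical
`MetricsPublisher` interface and a hypothetical `ClientConfig` type (neither is a
committed CT API): metrics stay off unless the customer explicitly supplies a publisher.

```java
// Hypothetical opt-in wiring; all names are illustrative, not a committed CT API.
interface MetricsPublisher {
    void publish(String name, double value);
}

final class ClientConfig {
    private final MetricsPublisher metricsPublisher; // null => emit nothing

    private ClientConfig(MetricsPublisher metricsPublisher) {
        this.metricsPublisher = metricsPublisher;
    }

    // Default: no metrics, so existing customer workflows are unchanged.
    static ClientConfig defaults() {
        return new ClientConfig(null);
    }

    // Opt-in: the customer decides to emit metrics by supplying a publisher.
    static ClientConfig withMetrics(MetricsPublisher publisher) {
        return new ClientConfig(publisher);
    }

    boolean metricsEnabled() {
        return metricsPublisher != null;
    }
}
```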
#### Always

This option implies that CT guarantees that the availability of an application
will not change. That may be a bold implication, but it is ultimately how the
customer will feel: they get no choice in the matter, and may opt not to upgrade.
Going from never emitting metrics to always emitting them tells customers
that their application, no matter its use case, will always benefit from metrics.
Without letting customers make that choice, CT loses hard-earned customer trust.

This also forces customers to make a choice: start collecting metrics and pick up
the additional updates CT provides, or get stuck on a version of the library that
will eventually become unsupported.

Additionally, requiring customers to start emitting metrics
almost certainly guarantees a breaking change across supported libraries.

### Issue 2: Should Data Plane APIs fail if metrics fail to publish?

#### No (recommended)

Metrics publishing must not impact application availability.

CT should take a fail-open approach when metrics fail to publish.
This will prevent metric publishing issues from impacting the
core functionality of the application.

CT can consider this a two-way door: initially, CT will not attempt to retry
publishing failed metrics, and can add this functionality later on.
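
A minimal sketch of this fail-open behavior, reusing the hypothetical
`MetricsPublisher` from the Opt-In sketch above: publishing failures are logged
and dropped instead of being propagated to the data-plane call.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Fail-open sketch: a metrics failure must never fail the data-plane operation.
// MetricsPublisher is the hypothetical interface from the Opt-In sketch above.
final class SafeMetrics {
    private static final Logger LOG = Logger.getLogger(SafeMetrics.class.getName());

    private final MetricsPublisher publisher; // may be null when metrics are off

    SafeMetrics(MetricsPublisher publisher) {
        this.publisher = publisher;
    }

    void tryPublish(String name, double value) {
        if (publisher == null) {
            return; // metrics are opt-in; nothing to do
        }
        try {
            publisher.publish(name, value);
        } catch (RuntimeException e) {
            // Fail open: record the problem locally and move on. No retry for
            // now; retries can be added later (a two-way door).
            LOG.log(Level.WARNING, "Dropping metric " + name, e);
        }
    }
}
```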
#### Yes

This will become a problem for the libraries and will undoubtedly result
in customer friction and falling adoption rates.
Failing operations because metrics were not published leaves the availability
of the application resting on the implementation of the metrics interface.
This should not be the case; metrics should aid the customer's application,
not restrict it.

### Issue 3: How will customers interact with the libraries to emit metrics?

#### Provide an Interface

Keeping in line with the rest of CT features, a well-defined interface with
out-of-the-box implementations should satisfy the feature request.

Out-of-the-box implementations should cover publishing metrics to an
existing observability service like AWS CloudWatch and to the local file system.
These implementations should also offer customers a guide to implementing their own.
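
To make the shape of such an interface concrete, here is a minimal sketch with
hypothetical names (not a committed CT API): the `MetricsPublisher` interface from
the earlier sketches plus a local-file implementation of the kind the
out-of-the-box set could include; a CloudWatch-backed publisher would implement
the same interface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

// Hypothetical interface (repeated from the Opt-In sketch under Issue 1).
interface MetricsPublisher {
    void publish(String name, double value);
}

// Example out-of-the-box implementation: append each metric to a local file.
// A CloudWatch-backed publisher would implement the same interface.
final class FileMetricsPublisher implements MetricsPublisher {
    private final Path target;

    FileMetricsPublisher(Path target) {
        this.target = target;
    }

    @Override
    public void publish(String name, double value) {
        String line = Instant.now() + " " + name + "=" + value + System.lineSeparator();
        try {
            Files.write(target, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            // Surface the failure; the fail-open wrapper from Issue 2 ensures
            // it does not affect the data-plane call.
            throw new RuntimeException("Failed to write metric " + name, e);
        }
    }
}
```

Tying the sketches together, a customer could then opt in with something like
`ClientConfig.withMetrics(new FileMetricsPublisher(Path.of("ct-metrics.log")))`,
while the default configuration continues to emit nothing.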
0 commit comments