Commit a46914a: Add readme
1 parent a980c0e

File tree: 1 file changed (+60, -33 lines)


libs/exponential-histogram/README.md

Lines changed: 60 additions & 33 deletions
This library provides an implementation of merging and analysis algorithms for exponential histograms based on the [OpenTelemetry definition](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram). It is designed as a complementary tool to the OpenTelemetry SDK, focusing specifically on efficient histogram merging and accurate percentile estimation.

## Overview

The library implements a sparse storage approach: only populated buckets consume memory and count towards the bucket limit. This differs from the OpenTelemetry implementation, which uses dense storage. While dense storage allows O(1) insertion of individual values, the sparse representation requires O(log m) time, where m is the bucket capacity. In return, the sparse representation enables more compact storage and a simple merging algorithm whose runtime is linear in the number of populated buckets. The library also provides an array-backed sparse storage, ensuring cache efficiency.
6+
7+
The sparse storage approach offers significant advantages for [distributions with fewer distinct values](#distributions-with-few-distinct-values) than the bucket count, allowing the library to achieve near-exact representation of such distributions. This makes it suitable not only for exponential histograms but also as a universal solution for handling explicit bucket histograms.
8+
9+
## Merging Algorithm
10+
11+
The merging algorithm works similarly to the merge-step of merge sort.
12+
We simultaneously walk through the buckets of both histograms in order, merging them on the fly as needed.
13+
If the total number of buckets in the end would exceed the bucket limit, we scale down as needed.
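
This merge step can be sketched as follows. This is an illustrative Python sketch, not the library's actual API; it assumes sparse buckets are kept as `(index, count)` pairs sorted by index, and that both histograms have already been brought to the same scale:

```python
def merge_buckets(a, b):
    # a, b: sorted lists of (bucket_index, count) pairs at the same scale.
    # Walk both lists simultaneously, like the merge step of merge sort,
    # summing counts when both histograms populate the same bucket.
    merged = []
    i = j = 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i][0] < b[j][0]):
            merged.append(a[i])
            i += 1
        elif i >= len(a) or b[j][0] < a[i][0]:
            merged.append(b[j])
            j += 1
        else:
            merged.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
    return merged
```

The runtime is linear in the number of populated buckets of both inputs, which is what makes the sparse representation attractive for merging.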

Before we merge the buckets, we need to take care of the special zero bucket and bring both histograms to the same scale.

For the zero bucket, we merge the zero thresholds of both histograms and collapse any buckets overlapping the resulting threshold into the new zero bucket.

To bring both histograms to the same scale, we can make adjustments in both directions: we can increase or decrease the scale of a histogram as needed.

See the [upscaling section](#upscaling) for details on how upscaling works. Upscaling prevents the precision of a result merged from many histograms from being dragged down to the lowest scale of a potentially misconfigured input histogram. For example, if a histogram is recorded with too low a zero threshold, this can result in a degraded scale when using dense histogram storage, even if the histogram contains only two points.

### Upscaling

In general, we assume that all values in a bucket lie on a single point: the point of least relative error. This is the point `x` in the bucket such that:

```
(x - l) / l = (u - x) / u
```

Where `l` is the lower bucket boundary and `u` is the upper bucket boundary.
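
Solving this equation for `x` yields `x = 2lu / (l + u)`, the harmonic mean of the boundaries. A quick numeric check (illustrative helper name, not the library's API):

```python
def point_of_least_relative_error(lower: float, upper: float) -> float:
    # Solving (x - l) / l = (u - x) / u for x yields x = 2lu / (l + u).
    return 2.0 * lower * upper / (lower + upper)

# For the bucket (1, 2], the relative error towards both boundaries matches:
x = point_of_least_relative_error(1.0, 2.0)  # 4/3
assert abs((x - 1.0) / 1.0 - (2.0 - x) / 2.0) < 1e-12  # both sides equal 1/3
```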

This assumption allows us to increase the scale of histograms without increasing the bucket count: buckets are simply mapped to the buckets in the new scale that contain the point of least relative error of the original buckets.

This can introduce a small error, as the original center might be moved slightly. Therefore, we ensure that upscaling happens at most once, so that errors do not add up. The more we upscale, the smaller the error: a higher scale means smaller buckets, which in turn gives a better fit around the original point of least relative error.
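
The index mapping described above can be sketched as follows, assuming the OpenTelemetry base-2 indexing, where bucket `i` at scale `s` covers the interval `(2^(i * 2^-s), 2^((i+1) * 2^-s)]`. The helper names are hypothetical, not the library's actual API:

```python
import math

def point_of_least_relative_error(lower: float, upper: float) -> float:
    # Solution of (x - l) / l = (u - x) / u.
    return 2.0 * lower * upper / (lower + upper)

def bucket_bounds(index: int, scale: int):
    # Bucket `index` at `scale` covers (base^index, base^(index + 1)]
    # with base = 2^(2^-scale), per the OpenTelemetry definition.
    base = 2.0 ** (2.0 ** -scale)
    return base ** index, base ** (index + 1)

def upscale_index(index: int, scale: int, new_scale: int) -> int:
    # Map the bucket to the bucket at the higher scale that contains
    # its point of least relative error.
    lower, upper = bucket_bounds(index, scale)
    point = point_of_least_relative_error(lower, upper)
    return math.ceil(math.log2(point) * 2.0 ** new_scale) - 1
```

For example, bucket 0 at scale 0 covers (1, 2]; its point of least relative error, 4/3, falls into bucket 0 at scale 1, which covers (1, 1.414].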

## Distributions with few distinct values

The sparse storage model only requires memory linear in the number of populated buckets, while dense storage needs to cover the entire index range between the smallest and largest populated bucket.

This offers significant benefits for distributions with few distinct values: if we have at least as many buckets as there are distinct values to store, we can represent the distribution almost exactly. This is achieved by simply keeping the scale at the maximum supported value, so that the buckets are as small as possible. At the time of writing, the maximum scale is 38, so the relative distance between the lower and upper boundary of a bucket is 2^(2^-38).

This is best explained with a concrete example: if we store, for example, a duration value of 10^15 nanoseconds (roughly 11.5 days), this value will be stored in a bucket that guarantees a relative error of at most 2^(2^-38) - 1, which is about 2.5 microseconds in this case.
As long as the number of inserted values is lower than the bucket count, we are guaranteed that no downscaling happens: in contrast to dense storage, the scale does not depend on the spread between the smallest and largest bucket index.
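
The numbers in this example can be verified directly:

```python
# Worked check of the example above: relative bucket width at the
# maximum scale of 38, applied to a duration of 10^15 nanoseconds.
max_scale = 38
relative_width = 2.0 ** (2.0 ** -max_scale) - 1.0  # (upper - lower) / lower
error_ns = 1e15 * relative_width
# error_ns is roughly 2500 ns, i.e. about 2.5 microseconds
```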

### Handling Explicit Bucket Histograms

We can make use of this property to convert [explicit bucket histograms](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram) to exponential ones, again by assuming that all values in a bucket lie on a single point:

* For each explicit bucket, we take its point of least relative error and add it to the corresponding exponential histogram bucket with the corresponding count
* The open lower and upper buckets, whose boundaries extend to infinity, need special treatment, but these are not useful for percentile estimates anyway

This gives us a great solution for universally dealing with histograms:
When merging exponential histograms generated from explicit ones, the result is exact as long as the number of distinct buckets from the original explicit bucket histograms does not exceed the exponential histogram bucket count. As a result, the computed percentiles are exact up to the error of the original conversion.
In addition, this allows us to compute percentiles on mixed explicit bucket histograms, or even mix them with exponential ones, by just using the exponential histogram algorithms.
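
Putting the pieces together, the conversion of the finite buckets can be sketched as follows. This is a hypothetical helper assuming base-2 OpenTelemetry indexing; the open-ended buckets are omitted, as noted above:

```python
import math

def explicit_to_exponential(boundaries, counts, scale=38):
    # boundaries: [b0, b1, ..., bn]; counts[i] counts values in (b_i, b_{i+1}].
    # Each bucket's count is placed at its point of least relative error.
    result = {}
    for lower, upper, count in zip(boundaries, boundaries[1:], counts):
        point = 2.0 * lower * upper / (lower + upper)
        # Exponential bucket index containing `point` at the given scale.
        index = math.ceil(math.log2(point) * 2.0 ** scale) - 1
        result[index] = result.get(index, 0) + count
    return result
```

At scale 38 the resulting exponential buckets are so narrow that each explicit bucket maps to its own exponential bucket, which is what makes the later merges exact.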
