Implement a novel consistent hashing algorithm with replication #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open: aneubeck wants to merge 21 commits into main from aneubeck/sampling
Commits
- 0d83453 core implementation
- 89f8ad4 Update README.md
- b03f0b7 Apply suggestion from @Copilot
- 0dbf1c9 finish proof
- 953d9bd Merge branch 'aneubeck/sampling' of https://github.com/github/rust-ge…
- a5eb91e Update README.md
- 220624d Update README.md
- fc69a9e Update README.md
- 90259e9 Update README.md
- 8480ea3 Replace key with hasher traits
- 0baaafc Update lib.rs
- 0dcb137 Update README.md
- d4b8410 Update crates/consistent-hashing/README.md
- 0935ea0 Update crates/consistent-hashing/README.md
- 99c69f3 Update crates/consistent-hashing/README.md
- 496f539 Update crates/consistent-hashing/README.md
- 23f3080 add benchmark
- 5d52237 remove second vector
- f6e29f7 Update README.md
- d20f9b6 Update performance.rs
- 1dde97c make linter happy

All commits authored by aneubeck.
crates/consistent-hashing/Cargo.toml (new file)

```toml
[package]
name = "consistent-hashing"
version = "0.1.0"
edition = "2021"
description = "Constant time consistent hashing algorithms."
repository = "https://github.com/github/rust-gems"
license = "MIT"
keywords = ["probabilistic", "algorithm", "consistent hashing", "jump hashing", "rendezvous hashing"]
categories = ["algorithms", "data-structures", "mathematics", "science"]

[lib]
crate-type = ["lib", "staticlib"]
bench = false

[dependencies]

[dev-dependencies]
```
crates/consistent-hashing/README.md (new file)
# Consistent Hashing

Consistent hashing maps keys to a changing set of nodes (shards, servers) so that when nodes join or leave, only a small fraction of keys move. It is used in distributed caches, databases, object stores, and load balancers to achieve scalability and high availability with minimal data reshuffling.

Common algorithms
- [Consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) (hash ring with virtual nodes)
- [Rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing)
- [Jump consistent hash](https://arxiv.org/pdf/1406.2294)
- [Maglev hashing](https://research.google/pubs/pub44824)
- [AnchorHash: A Scalable Consistent Hash](https://arxiv.org/abs/1812.09674)
- [DXHash](https://arxiv.org/abs/2107.07930)
- [JumpBackHash](https://arxiv.org/abs/2403.18682)
## Complexity summary

`N` is the number of nodes and `R` is the number of replicas.

| Algorithm | Lookup per key (no replication) | Node add/remove | Memory | Lookup with replication |
|-------------------------|---------------------|-----------------|--------|-------------------------|
| Hash ring (with vnodes) | O(log N): binary search over N points; O(1) with specialized structures | O(log N) | O(N) | O(log N + R): take next R distinct successors |
| Rendezvous | O(N): max score | O(1) | O(N) node list | O(N log R): pick top R scores |
| Jump consistent hash | O(log N) expected | 0 | O(1) | O(R log N) |
| AnchorHash | O(1) expected | O(1) | O(N) | Not native |
| DXHash | O(1) expected | O(1) | O(N) | Not native |
| JumpBackHash | O(1) expected | 0 | O(1) | Not native |
| **ConsistentChooseK** | **O(1) expected** | **0** | **O(1)** | **O(R^2)**; **O(R log R)** using a heap |
Replication of keys
- Hash ring: replicate by walking clockwise to the next R distinct nodes. Virtual nodes help spread replicas more evenly. Replicas are not independently distributed.
- Rendezvous hashing: replicate by selecting the top R nodes by score for the key. This naturally yields R distinct owners and supports weights.
- Jump consistent hash: the base function doesn't support replication, but the math can easily be modified to support consistent replication.
- JumpBackHash and variants: the jump consistent hash trick for replication doesn't carry over, due to the additional state these algorithms introduce.
- ConsistentChooseK: faster and more memory efficient than all of the solutions above.

Why replication matters
- Tolerates node failures and maintenance without data unavailability.
- Distributes read/write load across multiple owners, reducing hotspots.
- Enables fast recovery and higher tail-latency resilience.
## ConsistentChooseK algorithm

The following functions summarize the core algorithmic innovation as a minimal Rust excerpt.
`n` is the number of nodes and `k` is the number of desired replicas.
The chosen nodes are returned as distinct integers in the range `0..n`.

```rust
fn consistent_choose_k<Key: Copy>(key: Key, k: usize, n: usize) -> Vec<usize> {
    // At each step, select the maximum candidate and tighten the bound `n` to it.
    (0..k)
        .rev()
        .scan(n, |n, k| { *n = consistent_choose_max(key, k + 1, *n); Some(*n) })
        .collect()
}

fn consistent_choose_max<Key: Copy>(key: Key, k: usize, n: usize) -> usize {
    (0..k).map(|k| consistent_hash(key, k, n - k) + k).max().expect("k > 0")
}

fn consistent_hash<Key>(key: Key, i: usize, n: usize) -> usize {
    // Compute the i-th independent consistent hash for `key` and `n` nodes.
    unimplemented!()
}
```
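For illustration, here is a minimal sketch of how the `consistent_hash` stub could be instantiated: the classic jump consistent hash of Lamping and Veach, with the replica index `i` mixed into the key to derive distinct hash functions. This stand-in runs in `O(log n)` expected time per call rather than the `O(1)` assumed below, and the mixing multiplier is an arbitrary odd constant chosen for this sketch; it is not the crate's actual implementation.

```rust
/// Jump consistent hash (Lamping & Veach, 2014): maps `key` to a bucket in
/// `0..n`; when `n` grows to `n + 1`, a key moves only with probability `1/(n+1)`.
fn jump_consistent_hash(mut key: u64, n: usize) -> usize {
    assert!(n > 0);
    let (mut b, mut j) = (-1i64, 0i64);
    while j < n as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as usize
}

/// The i-th consistent hash for `key`, derived by mixing `i` into the key
/// (distinct per `i`, though not rigorously independent).
fn consistent_hash(key: u64, i: usize, n: usize) -> usize {
    jump_consistent_hash(key ^ (i as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15), n)
}
```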
`consistent_choose_k` makes `k` calls to `consistent_choose_max`, which in turn evaluates `consistent_hash` up to `k` times. In total, `consistent_hash` is called `k * (k + 1) / 2` times. Utilizing an `O(1)` solution for `consistent_hash` thus leads to an `O(k^2)` runtime.
This runtime can be further improved by replacing the max operation with a heap in which popped elements are revalidated against the updated arguments `n` and `k`.
With this optimization, the complexity reduces to `O(k log k)`.
With some probabilistic bucketing strategy, it should be possible to reduce the expected runtime to `O(k)`.
For small `k`, however, neither optimization is likely to improve actual performance.
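A hypothetical sketch of that heap variant follows, reusing the `consistent_hash` sketch above. It relies only on consistency: when the range shrinks, a hash value either stays put or decreases, so every stored heap entry is an upper bound that can be lazily revalidated. The name `consistent_choose_k_heap` and its structure are illustrative, not the crate's API.

```rust
use std::collections::BinaryHeap;

/// Hypothetical O(k log k) variant of `consistent_choose_k` (requires k <= n).
fn consistent_choose_k_heap(key: u64, k: usize, mut n: usize) -> Vec<usize> {
    // Max-heap of (candidate value, hash index). Each stored value is an upper
    // bound on that candidate's current value under the tightened bound `n`.
    let mut heap: BinaryHeap<(usize, usize)> = (0..k)
        .map(|i| (consistent_hash(key, i, n - i) + i, i))
        .collect();
    let mut result = Vec::with_capacity(k);
    let mut remaining = k; // replicas still to be selected
    while remaining > 0 {
        let (v, i) = heap.pop().expect("k <= n guarantees enough candidates");
        if i >= remaining {
            continue; // this candidate index dropped out of the current round
        }
        if v >= n {
            // Stale upper bound: recompute under the current bound and retry.
            heap.push((consistent_hash(key, i, n - i) + i, i));
            continue;
        }
        // Valid and >= all other upper bounds, hence the true maximum.
        result.push(v);
        n = v;
        remaining -= 1;
        if i < remaining {
            // The index stays in play for the next round under the new bound.
            heap.push((consistent_hash(key, i, n - i) + i, i));
        }
    }
    result
}
```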
The next section proves the correctness of this algorithm.
## N-Choose-R replication
We define consistent `n-choose-k` replication as follows:

1. For a given number `n` of nodes, choose a set `S` of `k` distinct nodes.
2. For a given `key`, the chosen set must be uniformly distributed over all possible sets of size `k`.
3. When `n` increases by one, at most one node in the chosen set changes,
4. and a change happens with probability `k/(n+1)`.
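These properties are proven below; as a sanity check, properties 1 and 3 can also be spot-checked numerically. The following is a hypothetical test harness, assuming the `consistent_choose_k` excerpt above with its stub replaced by the jump-hash sketch:

```rust
fn main() {
    let k = 3;
    for key in 0..1_000u64 {
        for n in k..50 {
            let s1 = consistent_choose_k(key, k, n);
            let s2 = consistent_choose_k(key, k, n + 1);
            // Property 1: exactly k distinct nodes in 0..n (strictly decreasing).
            assert!(s1.len() == k && s1[0] < n);
            assert!(s1.windows(2).all(|w| w[0] > w[1]));
            // Property 3: at most one node changes when n grows by one.
            assert!(s1.iter().filter(|x| !s2.contains(*x)).count() <= 1);
        }
    }
    println!("properties 1 and 3 hold on the sampled grid");
}
```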
In the remainder of this section we prove that the `consistent_choose_k` algorithm satisfies these properties.

Let us define `M(k, n) := consistent_choose_max(_, k, n)` and `S(k, n) := consistent_choose_k(_, k, n)` as shorthands for some arbitrary fixed `key`.
We assume that `consistent_hash(key, i, n)` evaluates, for each index `i`, an independent consistent hash function.
Since `M(k, n) < n` and `S(k, n) = {M(k, n)} ∪ S(k - 1, M(k, n))` for `k > 1`, `S(k, n)` constructs a strictly monotonically decreasing sequence. The sequence outputs exactly `k` elements, which therefore must all be distinct; this proves property 1 for `k <= n`.

Properties 2, 3, and 4 can be proven via induction as follows.
`k = 1`: We expect `consistent_hash` to return a single uniformly distributed node index which is consistent in `n`, i.e. the hash value changes with probability `1/(n+1)` when `n` increments by one. In our implementation, we use an `O(1)` implementation of the jump-hash algorithm. For `k = 1`, `consistent_choose_k(key, 1, n)` reduces to a single call to `consistent_choose_max(key, 1, n)`, which in turn calls `consistent_hash(key, 0, n)`. I.e. `consistent_choose_k` inherits all the desired properties from `consistent_hash` for `k = 1` and all `n >= 1`.
`k -> k+1`: `M(k+1, n+1) = M(k+1, n)` iff `M(k, n+1) < n` and `consistent_hash(_, k, n+1-k) < n - k`. The probability of the former is `(n+1-k)/(n+1)` by induction, and of the latter `(n-k)/(n+1-k)` by the assumption that `consistent_hash` is a proper consistent hash function. Since these two events are independent, the probability that the maximum changes is `1 - (n+1-k)/(n+1) * (n-k)/(n+1-k) = 1 - (n-k)/(n+1) = (k+1)/(n+1)`, proving property 4.
Property 3 is trivially satisfied if `S(k+1, n+1) = S(k+1, n)`. So, we focus on the case where `S(k+1, n+1) != S(k+1, n)`, which implies that `n ∈ S(k+1, n+1)` as its largest element.
We know that `S(k+1, n) = {m} ∪ S(k, m)` for some `m` by definition, and `S(k, n) = S(k, u) ∖ {v} ∪ {w}` by induction for some `u`, `v`, and `w`. Thus far we have `S(k+1, n+1) = {n} ∪ S(k, n) = {n} ∪ S(k, u) ∖ {v} ∪ {w}`.

If `u = m`, then `S(k+1, n) = {m} ∪ S(k, m) ∖ {v} ∪ {w}` and `S(k+1, n+1) = {n} ∪ S(k, n) = {n} ∪ S(k, m) ∖ {v} ∪ {w}`, and the two differ exactly in the elements `m` and `n`, proving property 3.
If `u ≠ m`, then `consistent_hash(_, k, n) = m`, since that is the only way the largest values in `S(k+1, n)` and `S(k, n)` can differ. In this case, `m ∉ S(k+1, n+1)`, since `n` (and not `m`) is the largest element of `S(k+1, n+1)`. Furthermore, `S(k, n) = S(k, m)`, since `consistent_hash(_, i, n) < m` for all `i < k` (otherwise there would be a contradiction).
Putting it together leads to `S(k+1, n+1) = {n} ∪ S(k, m)` and `S(k+1, n) = {m} ∪ S(k, m)`, which differ exactly in the elements `n` and `m`. This concludes the proof.