
Conversation

@ZhuYuJin

This is a draft implementation that removes the global mutex in HashMapBackend.
This PR tries to avoid locking the whole hashmap backend while updating or inserting new payloads, because our online service performs reads and writes at the same time. We need to preserve read performance while writes are in progress.

The changes are based on these ideas:

  1. I introduce an atomic variable in Payload to implement a row lock.
  2. I introduce a mutex in Partition to protect the insertion of a new payload. Besides, parallel_flat_hash_map ensures the atomicity of adding/removing/updating an entry in the flat hash map.

uint64_t access_count;
};
ValuePtr value;
std::atomic<int8_t> lck;
Collaborator:

Technically speaking, I have nothing against this change in general. But we need to think carefully about whether it breaks the MultiProcessHashmap. Maybe it is better to split the code.

Collaborator:

What is more relevant: despite using int8_t here, the memory usage should still increase by 8 bytes due to alignment. Can we un-align the data structure without loss of performance?

phmap::parallel_flat_hash_map<Key, Payload, phmap::priv::hash_default_hash<Key>,
phmap::priv::hash_default_eq<Key>,
phmap::priv::Allocator<phmap::priv::Pair<const Key, Payload>>,
4, std::shared_mutex> entries;
Collaborator:

Yes, it will avoid breaking the updates, but partly for reasons that I wanted to avoid.

What parallel_flat_hash_map does with this configuration is split the hash table into 4 independent pieces.

The submap is this thing in their code:

struct Inner : public Lockable

Upon query, each call ends up with a structure like this:

typename Lockable::UniqueLock m;
auto res = this->find_or_prepare_insert(k, m);

which in turn executes:

Inner& inner = sets_[subidx(hashval)];
auto&  set   = inner.set_;
mutexlock    = std::move(typename Lockable::UniqueLock(inner));

In other words: it will select one of the 4 independent submaps and FULLY lock it, so no parallelism is possible within that submap. In the very BEST case, when you have 4 keys that subidx hashes to different sub-hashmaps, you can accommodate up to 4 threads. Arguably, that is better than now. But you need to be aware that the worst case still exists. Hence the reason why we didn't use their parallelized hashmap implementation yet.

Collaborator:

@yingcanw We can take over this part of the change without much trouble. In my new streamlined hashmap we will use a similar method to speed things up.

const Payload& payload{it->second}; \
\
/* Stash pointer and reference in map. */ \
std::unique_lock lock(part.read_write_lck); \
Collaborator:

@yingcanw What worries me about this is mostly the strain under heavy load. We would need to measure whether it kills insert speed. The new hashmap will fix this one.


template <typename Key>
size_t HashMapBackend<Key>::size(const std::string& table_name) const {
const std::shared_lock lock(read_write_guard_);
Collaborator:

The changes below are fine under the assumption that we take over the above changes.

/* Race-conditions here are deliberately ignored because insignificant in practice. */ \
__VA_ARGS__; \
while (payload.lck.load(std::memory_order_relaxed) != 0); \
std::copy_n(payload.value, part.value_size, &values[(k - keys) * value_stride]); \
Collaborator:

I think there is a logical error here.

What if payload.value changes rapidly? The lck does not guarantee that it won't: during the copy_n, payload.value might change or become invalid.

lock.unlock(); \
} \
\
if (payload.lck.load(std::memory_order_relaxed) != -1) { \
Collaborator:

This looks like it could work, but it is a little fuzzy. Can you clarify the intended lifecycle of lck? Which value should follow which?
