Reduce lock contention in Shuffler PostBox #687
Conversation
I'll look at the implementation in a bit, but one comment on the strategy here. As you say, the goal of our lock here is to prevent a chunk's buffer from being extracted while it's being spilled. There are a few ways to accomplish this:
When I wrote up the issue, I imagined a variant of 1: what if we have just the single lock, but release it while doing the actual spill operation?

```python
# thread 1: spilling
with lock:
    chunk_ids = postbox.search(...)
# release the lock to perform the spill
# This is currently unsafe, because someone could extract
# the buffer mid-spill
for chunk_id in chunk_ids:
    postbox.spill(chunk_id)
with lock:
    postbox.update_state(buffers)
```

As written, my pseudocode isn't safe: we can't have some other thread come along and try to extract one of the buffers being spilled in that for loop, because the extraction would race with the spill running outside the lock:

```python
# thread 2: extract
with lock:
    buffer = postbox.extract(chunk_id)
```

If we wanted to do something like this, we'd need to make the Postbox state a bit more complicated. I think it'd need some kind of map of chunk IDs to their spill state. Then we'd need to figure out what we want the extract call to do when it encounters a chunk that's mid-spill.
@TomAugspurger I think I agree with your take. Initially I was thinking about an atomic state held in the PostBox. This way, we could make PostBoxes truly lock-free, but it felt like the implementation would be a bit involved, since we'd have to do a bunch of CAS operations on the atomic.

This depends on the extract method. When we want to extract outputs, we call one extraction method; a second method was used only for spilling earlier, and this PR removes it. The reasoning was that we first query a postbox for which chunks have device data, and then come back to spill each of those.
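A CAS-based variant of that idea could be sketched as follows. This is only an illustration of the state machine, not the actual implementation: `AtomicState` simulates `std::atomic`'s compare-exchange with a Python lock, and all names here are hypothetical.

```python
import threading

# Hypothetical per-chunk states for a lock-free PostBox.
AVAILABLE, SPILLING, EXTRACTED = 0, 1, 2

class AtomicState:
    """Stand-in for std::atomic<int>; a real implementation would use
    hardware CAS rather than a Python lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_exchange(self, expected, desired):
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False

    def load(self):
        with self._lock:
            return self._value

def try_begin_spill(state):
    # The spiller claims the chunk; fails if it was already extracted.
    return state.compare_exchange(AVAILABLE, SPILLING)

def end_spill(state):
    # Publish the spilled buffer and release the claim.
    state.compare_exchange(SPILLING, AVAILABLE)

def try_extract(state):
    # Extraction only succeeds when no spill is in flight; a caller
    # seeing False must retry or give up.
    return state.compare_exchange(AVAILABLE, EXTRACTED)
```

The "bunch of CAS" cost shows up in `try_extract`: every extraction attempt that races with a spill has to spin or back off until the spiller transitions the chunk back to `AVAILABLE`.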
Thanks for the overview. LMK if you need any help with profiling / benchmarking. The setup in #674 used query 4 from the cudf-polars TPC-H benchmarks at SF 1K on an H100.
@TomAugspurger Can you run a profile with this PR?

Sure.
A couple of things I notice:

I'm not sure at this point whether any of these is a problem. Maybe it'd be worth writing up a simpler reproducer. I guess https://gist.github.com/TomAugspurger/d5b0d3b0e5765e448aa07a4fcc706171#file-slow_spill-py might cover this a bit, but I'm not sure.
@TomAugspurger Looking at the profile, it does seem like there are mutex lock regions corresponding to the spill regions. I suspect these could be insert/extract calls. I think one problem with this impl is…


This PR introduces the following changes to the Shuffler `PostBox`:

- Currently, `PostBox` has an internal mutex and, to prevent extractions during spilling, it is also protected by an external mutex in the Shuffler. IINM, the idea there is only to block extraction while still allowing insertions during spilling.
- The keys are now passed during postbox construction, so they can be populated during initialization. This allows us to remove the class-level mutex in the `PostBox`. A consequence is that the emptiness test has become more complicated; it is now handled by an atomic counter.
- Chunks were previously stored in an `unordered_map<ChunkID, Chunk>` map. This PR changes this to use `vector<Chunk>`, because we will no longer be querying by `ChunkID` values.

Todo:

- profiling
Closes #674
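As a toy model of the design described above: keys fixed at construction, one container plus one mutex per key, and a counter for the emptiness test. This is illustrative Python, not the actual C++; the method names and the lock standing in for the atomic counter are assumptions.

```python
import threading

class PostBox:
    """Sketch: per-key locking with an O(1) emptiness test."""

    def __init__(self, keys):
        # Populating the keys up front means the top-level dict is never
        # mutated afterwards, so no class-level mutex is needed: threads
        # only contend on the per-key locks.
        self._slots = {k: [] for k in keys}              # per-key chunk list
        self._locks = {k: threading.Lock() for k in keys}
        self._count = 0                                   # total chunks
        self._count_lock = threading.Lock()               # stands in for std::atomic

    def insert(self, key, chunk):
        with self._locks[key]:
            self._slots[key].append(chunk)
        with self._count_lock:
            self._count += 1

    def extract(self, key):
        # Take everything stored under one key.
        with self._locks[key]:
            chunks, self._slots[key] = self._slots[key], []
        with self._count_lock:
            self._count -= len(chunks)
        return chunks

    def empty(self):
        # O(1) via the counter, instead of scanning every per-key
        # container (the part the description says became more involved).
        with self._count_lock:
            return self._count == 0
```

Insertions and extractions on different keys no longer touch a shared mutex, which is the contention the PR is trying to remove; the counter is the only shared state left on the hot path.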