-
Notifications
You must be signed in to change notification settings - Fork 22
Description
As discussed in #657 (comment), spill threads can spend a significant proportion of their time waiting for a lock while some other thread does its spilling:
At the moment, that lock protects both
- spilling:
rapidsmpf/cpp/src/shuffler/shuffler.cpp
Lines 818 to 819 in 1df992a
std::lock_guard<std::mutex> lock(ready_postbox_spilling_mutex_); spilled = postbox_spilling(br_, comm_->logger(), ready_postbox_, spill_need); - extraction:
rapidsmpf/cpp/src/shuffler/shuffler.cpp
Line 771 in 1df992a
std::unique_lock<std::mutex> lock(ready_postbox_spilling_mutex_);
As discussed in #657, the actual act of spilling buffers (allocating host memory, doing the host to device transfer) can take a substantial amount of time. If the lock is only there to protect attempting to spill the same buffer multiple times, we might be able to refactor the code split postbox_spilling into two distinct phases:
- A phase to determine which set of buffers to spill, in order to reach some target amount of bytes spilled
- A phase to actually spill that set of buffers identified in phase one.
That kind of spilling should only require a lock for phase 1.
However, because the same lock is used for extraction, things might be harder. Threads doing an extract (with the lock) might rely on some invariant like a buffer either being in device memory or host memory, but not in the process of being spilled. We might need to introduce a new "spilling" state, but I'm hazy on the details at this point.
This is related to the broader themes around spilling performance discussed in github.com//issues/657. As we improve the performance of spilling, lock contention ought to go down since the spill thread will spend less time actually spilling, and so will spend less time holding the lock with today's implementation.
cc @nirandaperera.