Reduce lock contention in spilling #674

@TomAugspurger

As discussed in #657 (comment), spill threads can spend a significant proportion of their time waiting for a lock while some other thread does its spilling:

Image

At the moment, that lock protects both spilling and extraction.

As discussed in #657, the actual act of spilling buffers (allocating host memory, doing the device-to-host transfer) can take a substantial amount of time. If the lock is only there to prevent multiple threads from spilling the same buffer, we might be able to refactor the code to split postbox_spilling into two distinct phases:

  1. A phase to determine which set of buffers to spill, in order to reach some target amount of bytes spilled
  2. A phase to actually spill that set of buffers identified in phase one.

That kind of spilling should only require a lock for phase 1.
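
The two phases could look roughly like the Python sketch below. It is only an illustration of the idea, not the current implementation: `Buffer`, `Postbox`, `select_for_spilling`, and the `claimed` flag are hypothetical names, and the device-to-host copy is replaced by a flag flip.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """Hypothetical stand-in for a spillable buffer."""
    nbytes: int
    on_device: bool = True
    claimed: bool = False  # set in phase 1 so no other thread picks this buffer


@dataclass
class Postbox:
    """Hypothetical container; not the real postbox API."""
    buffers: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def select_for_spilling(self, target_bytes):
        """Phase 1: under the lock, claim buffers until the target is reached."""
        selected, total = [], 0
        with self.lock:
            for buf in self.buffers:
                if total >= target_bytes:
                    break
                if buf.on_device and not buf.claimed:
                    buf.claimed = True
                    selected.append(buf)
                    total += buf.nbytes
        return selected

    def spill(self, target_bytes):
        """Phase 2: spill the claimed buffers without holding the lock."""
        spilled = 0
        for buf in self.select_for_spilling(target_bytes):
            # Placeholder for the host allocation and device-to-host copy.
            buf.on_device = False
            buf.claimed = False
            spilled += buf.nbytes
        return spilled
```

Because each buffer is claimed while the lock is held, two threads calling `spill` concurrently never select the same buffer, and the slow copy runs outside the critical section. The sketch deliberately ignores extraction, which shares the lock and is where the difficulty lies.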

However, because the same lock is used for extraction, things might be harder. Threads doing an extract (with the lock) might rely on some invariant like a buffer either being in device memory or host memory, but not in the process of being spilled. We might need to introduce a new "spilling" state, but I'm hazy on the details at this point.
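
One possible shape for such a "spilling" state is sketched below in Python. Everything here is an assumption made for illustration: the `State` enum, `begin_spill`, `finish_spill`, and the choice to make `extract` wait for an in-flight spill are hypothetical and do not reflect the existing code.

```python
import threading
from enum import Enum, auto


class State(Enum):
    DEVICE = auto()
    SPILLING = auto()  # hypothetical in-flight state
    HOST = auto()


class Buffer:
    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.state = State.DEVICE


class Postbox:
    def __init__(self):
        self._cond = threading.Condition()
        self._buffers = {}

    def insert(self, key, buf):
        with self._cond:
            self._buffers[key] = buf

    def begin_spill(self, key):
        """Mark a device buffer as SPILLING; return None if it is not eligible."""
        with self._cond:
            buf = self._buffers.get(key)
            if buf is None or buf.state is not State.DEVICE:
                return None
            buf.state = State.SPILLING
            return buf

    def finish_spill(self, buf):
        """Called after the copy, which runs without the lock held."""
        with self._cond:
            buf.state = State.HOST
            self._cond.notify_all()

    def extract(self, key):
        """Remove and return a buffer, waiting out any in-flight spill."""
        with self._cond:
            buf = self._buffers[key]
            self._cond.wait_for(lambda: buf.state is not State.SPILLING)
            return self._buffers.pop(key)
```

With this design an extracting thread still only ever observes a buffer in device or host memory, at the cost of blocking while a spill of that particular buffer completes. Whether waiting is preferable to cancelling the spill is an open question.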

This is related to the broader themes around spilling performance discussed in #657. As we improve the performance of spilling, lock contention ought to go down, since the spill thread will spend less time actually spilling and therefore less time holding the lock with today's implementation.

cc @nirandaperera.
