Add RCU bindings #143
There are some APIs, like `for_each_process`, that want you to hold an RCU read lock when calling them. We should support this, at minimum. We probably want to also have full support for RCU-protected data, but maybe we can put that on hold.
Upstream RCU docs: https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt
The rough idea of RCU is to implement a reader-writer-lock-ish pattern where readers are low-overhead, under the assumption that all readers are short-lived. That is a valid assumption for kernels, where a "reader" is the execution of a single syscall/interrupt; anything that persists state beyond that counts as a "writer". In particular, readers should not be slowed down when a writer is trying to do an update, so ideally reads should proceed concurrently with it. The way this is done is by having writers allocate and fill in a new structure, atomically update the pointer to it, synchronize memory and wait for existing readers to complete, and then deallocate the old structure. (This pointer can be a single pointer to some single structure, one of the pointers in a linked list, or something similar.) That way, existing readers see a consistent view, either the old or the new version, and you're guaranteed that once you've finished synchronizing, any readers still in progress can only be seeing the new version.
(RCU does nothing to protect concurrent writers from each other. They should use a conventional mutex, or something similar.)
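To make that update cycle concrete, here is a minimal userspace sketch. The global reader counter is a toy stand-in for the kernel's RCU machinery (an assumption for illustration, not how the kernel implements it), and its `synchronize_rcu` waits for every active reader, which is stricter than real RCU but still safe. `Config`, `CURRENT`, `WRITER_LOCK`, `read_version` and `update_version` are hypothetical names; the mutex plays the role of the conventional writer lock.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::sync::Mutex;

// Toy stand-in for the kernel's RCU state (an assumption, not the real
// implementation): the number of threads inside a read-side critical section.
static READERS: AtomicUsize = AtomicUsize::new(0);

fn rcu_read_lock() {
    READERS.fetch_add(1, Ordering::SeqCst);
}

fn rcu_read_unlock() {
    READERS.fetch_sub(1, Ordering::SeqCst);
}

// Waits until no reader is active. Real RCU only waits for readers that were
// already running when it was called; this toy version is stricter but safe.
fn synchronize_rcu() {
    while READERS.load(Ordering::SeqCst) != 0 {
        std::thread::yield_now();
    }
}

// Hypothetical RCU-protected structure.
struct Config {
    version: u32,
}

// The RCU-protected pointer, plus a conventional mutex that serializes writers.
static CURRENT: AtomicPtr<Config> = AtomicPtr::new(ptr::null_mut());
static WRITER_LOCK: Mutex<()> = Mutex::new(());

// Reader: short-lived and non-blocking; sees either the old or the new version.
fn read_version() -> Option<u32> {
    rcu_read_lock();
    let p = CURRENT.load(Ordering::SeqCst);
    // The pointee cannot be freed before rcu_read_unlock(), so this is valid.
    let v = unsafe { p.as_ref() }.map(|c| c.version);
    rcu_read_unlock();
    v
}

// Writer: allocate and fill in a new copy, publish it with one atomic swap,
// wait for existing readers, then free the old copy.
fn update_version(version: u32) {
    let _writer = WRITER_LOCK.lock().unwrap();
    let new = Box::into_raw(Box::new(Config { version }));
    let old = CURRENT.swap(new, Ordering::SeqCst);
    synchronize_rcu();
    if !old.is_null() {
        // No reader can still be looking at `old` at this point.
        drop(unsafe { Box::from_raw(old) });
    }
}
```

Readers never touch the mutex and never wait; only the writer pays for the grace period.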
The basic RCU API in Rusty pseudocode is:
```rust
extern "C" {
    fn rcu_read_lock();
    fn rcu_read_unlock();
    fn synchronize_rcu();
}

// Pseudocode only: Rust extern fns can't be generic, and in C these two are macros.
// fn rcu_assign_pointer<T>(p: &mut *mut T, v: *mut T);
// fn rcu_dereference<T>(p: &*mut T) -> *mut T;
```
(Note that the latter two are macros in C, into which `p` is syntactically passed by reference. If we want to bind them directly, we probably want to expose C helpers that take `void **` and handle the type safety in the Rust wrapper.)
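A sketch of that split, under assumptions: `helper_rcu_assign_pointer` and `helper_rcu_dereference` are hypothetical names for the untyped C helpers, stubbed out here with Rust atomics so the example runs in userspace (`Release`/`Acquire` is stronger than what the kernel macros promise). The generic `RcuPtr<T>` wrapper, also hypothetical, is the only place that casts between `*mut T` and `*mut c_void`.

```rust
use std::cell::UnsafeCell;
use std::ffi::c_void;
use std::marker::PhantomData;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical untyped helpers. In a kernel build these would be tiny C
// functions wrapping the rcu_assign_pointer()/rcu_dereference() macros;
// here they are userspace stand-ins built on atomics.
unsafe fn helper_rcu_assign_pointer(p: *mut *mut c_void, v: *mut c_void) {
    // AtomicPtr<T> has the same in-memory representation as *mut T.
    unsafe { (*(p as *const AtomicPtr<c_void>)).store(v, Ordering::Release) }
}

unsafe fn helper_rcu_dereference(p: *mut *mut c_void) -> *mut c_void {
    unsafe { (*(p as *const AtomicPtr<c_void>)).load(Ordering::Acquire) }
}

// Typed Rust wrapper: the type safety lives here, not in the C helpers.
pub struct RcuPtr<T> {
    slot: UnsafeCell<*mut c_void>,
    _type: PhantomData<*mut T>,
}

impl<T> RcuPtr<T> {
    pub fn new(initial: *mut T) -> Self {
        RcuPtr { slot: UnsafeCell::new(initial.cast()), _type: PhantomData }
    }

    /// Safety: the caller must serialize writers and keep the previous
    /// pointee alive until a grace period has elapsed.
    pub unsafe fn assign(&self, v: *mut T) {
        unsafe { helper_rcu_assign_pointer(self.slot.get(), v.cast()) }
    }

    /// Safety: the caller must be inside a read-side critical section and
    /// must not use the result after leaving it.
    pub unsafe fn dereference(&self) -> *mut T {
        unsafe { helper_rcu_dereference(self.slot.get()).cast() }
    }
}
```

The `unsafe` on `assign` and `dereference` is where the RCU rules would be documented; a safe layer on top could then remove it.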
`rcu_read_lock` and `rcu_read_unlock` take no context info; they begin and end an RCU read-side critical section that applies to all RCU use in the kernel. Between them you may call `rcu_dereference` on anything. (That said, there are two other RCU "flavors" in the kernel, `rcu_read_lock_bh` etc. and `rcu_read_lock_sched` etc. They appear to have been consolidated last year, but I think from the API they should still be treated as separate? In any case, the other flavors are rarely used and we can probably get away with ignoring them for now.)
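Since neither `rcu_read_lock` nor `synchronize_rcu` takes an argument, one read-side critical section covers any number of unrelated RCU-protected pointers, and every writer waits for all readers regardless of what they read. A minimal sketch, again assuming a toy global reader counter in place of the kernel's implementation; `TIMEOUT_MS`, `HOSTNAME`, `publish` and `snapshot` are hypothetical.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// Toy stand-in for the kernel's global RCU state: one counter for everything.
static READERS: AtomicUsize = AtomicUsize::new(0);

fn rcu_read_lock() {
    READERS.fetch_add(1, Ordering::SeqCst);
}

fn rcu_read_unlock() {
    READERS.fetch_sub(1, Ordering::SeqCst);
}

fn synchronize_rcu() {
    while READERS.load(Ordering::SeqCst) != 0 {
        std::thread::yield_now();
    }
}

// Two unrelated, hypothetical RCU-protected pointers.
static TIMEOUT_MS: AtomicPtr<u64> = AtomicPtr::new(ptr::null_mut());
static HOSTNAME: AtomicPtr<String> = AtomicPtr::new(ptr::null_mut());

// Writer for any slot (writers are assumed to be serialized by the caller).
fn publish<T>(slot: &AtomicPtr<T>, v: T) {
    let old = slot.swap(Box::into_raw(Box::new(v)), Ordering::SeqCst);
    // No argument: this waits for readers of *every* RCU pointer.
    synchronize_rcu();
    if !old.is_null() {
        drop(unsafe { Box::from_raw(old) });
    }
}

// One critical section, no context passed, covering both pointers.
fn snapshot() -> Option<(u64, String)> {
    rcu_read_lock();
    let t = unsafe { TIMEOUT_MS.load(Ordering::SeqCst).as_ref() }.copied();
    let h = unsafe { HOSTNAME.load(Ordering::SeqCst).as_ref() }.cloned();
    rcu_read_unlock();
    Some((t?, h?))
}
```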
The rules as I understand them are:
- You can only read an RCU pointer between `rcu_read_lock` and `rcu_read_unlock` (aka "within a read-side critical section"), and you can only read it with `rcu_dereference` (though on every architecture besides Alpha that compiles down to little more than an ordinary load). `rcu_dereference` doesn't actually do the dereferencing itself; it simply gives you a pointer you can safely dereference until you call `rcu_read_unlock`.
- You cannot block in a read-side critical section.
- You can only write an RCU pointer with `rcu_assign_pointer`, though you can do so at any point without advance notice.
- You must keep the old object that an `rcu_assign_pointer` replaced valid until you've called `synchronize_rcu`.
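The first rule can at least be checked at runtime, similar in spirit to the kernel's lockdep-based checking under `CONFIG_PROVE_RCU`. In this sketch a per-thread nesting counter is a toy stand-in for the kernel's bookkeeping, `rcu_read_lock_held` is named after the kernel helper but reimplemented here, and `read_len` is hypothetical. It also shows the usual reader pattern: copy what you need out of the pointee before `rcu_read_unlock`.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicPtr, Ordering};

thread_local! {
    // Toy per-thread record of read-side critical-section nesting.
    static DEPTH: Cell<u32> = Cell::new(0);
}

fn rcu_read_lock() {
    DEPTH.with(|d| d.set(d.get() + 1));
}

fn rcu_read_unlock() {
    DEPTH.with(|d| d.set(d.get() - 1));
}

fn rcu_read_lock_held() -> bool {
    DEPTH.with(|d| d.get() > 0)
}

// Returns the pointer without dereferencing it, and panics in this sketch if
// called outside a read-side critical section.
fn rcu_dereference<T>(p: &AtomicPtr<T>) -> *mut T {
    assert!(rcu_read_lock_held(), "rcu_dereference outside a read-side critical section");
    p.load(Ordering::Acquire)
}

fn read_len(p: &AtomicPtr<String>) -> usize {
    rcu_read_lock();
    // Copy out what we need while the pointee is still guaranteed valid.
    let len = unsafe { (*rcu_dereference(p)).len() };
    rcu_read_unlock();
    // Only the copied value is used after unlocking.
    len
}
```

A real binding would probably only pay for this check in debug builds, as the kernel does.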
You may nest or overlap read-side critical sections, though. It's a reader lock: you can have more than one at a time, and writers can't do anything to invalidate the data you're reading until all reader locks are dropped. The only difference with RCU is that `synchronize_rcu` only waits for reader locks that already existed when it started; any reader locks taken after that don't affect it (those new reads are guaranteed to be ordered after any preceding `rcu_assign_pointer`s).
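Nesting can be sketched with a per-thread depth counter so that only the outermost `rcu_read_unlock` actually ends the critical section. This is a toy userspace model (an assumption; the kernel tracks nesting differently depending on its preemption configuration), and `active_readers` is a hypothetical inspection helper. The toy `synchronize_rcu` still waits for every active reader rather than only the pre-existing ones.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy global state: threads currently inside a read-side critical section.
static READERS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Toy per-thread nesting depth.
    static DEPTH: Cell<u32> = Cell::new(0);
}

fn rcu_read_lock() {
    DEPTH.with(|d| {
        if d.get() == 0 {
            // Outermost lock: this thread becomes a reader.
            READERS.fetch_add(1, Ordering::SeqCst);
        }
        d.set(d.get() + 1);
    });
}

fn rcu_read_unlock() {
    DEPTH.with(|d| {
        let depth = d.get() - 1;
        d.set(depth);
        if depth == 0 {
            // Outermost unlock: only now can a waiting writer proceed.
            READERS.fetch_sub(1, Ordering::SeqCst);
        }
    });
}

fn active_readers() -> usize {
    READERS.load(Ordering::SeqCst)
}

fn synchronize_rcu() {
    while active_readers() != 0 {
        std::thread::yield_now();
    }
}
```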
I think the rough way to handle this is to:
- create an `Rcu<T>` pointer type;
- have an empty `RcuRead` object whose constructor calls `rcu_read_lock()`, whose destructor calls `rcu_read_unlock()`, and which is required to read an `Rcu<T>`;
- have ... some sort of operation to assign to an `Rcu<T>` that keeps the old value alive. Perhaps it returns an `RcuDropGuard<T>` object that holds on to a pointer to the old object, and that object's destructor calls `synchronize_rcu()` and then frees it?
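A minimal userspace sketch of those three types, under stated assumptions: a global reader counter stands in for the kernel's RCU state, `synchronize_rcu` waits for every active reader, and `read` ties the returned reference to the borrow of the `RcuRead`. The method names `new`, `read` and `assign` are hypothetical. Because `assign` uses an atomic swap, plain replacements don't race, but a read-modify-write update would still need the separate writer mutex.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// Toy stand-in for the kernel's RCU state: threads inside a read-side section.
static READERS: AtomicUsize = AtomicUsize::new(0);

// Toy grace period: stricter than real RCU (waits for *all* readers).
fn synchronize_rcu() {
    while READERS.load(Ordering::SeqCst) != 0 {
        std::thread::yield_now();
    }
}

/// Empty guard: creating it stands in for rcu_read_lock(), dropping it for
/// rcu_read_unlock(). The raw-pointer marker makes it !Send and !Sync, so it
/// is dropped on the thread that created it.
pub struct RcuRead {
    _not_send: PhantomData<*const ()>,
}

impl RcuRead {
    pub fn new() -> Self {
        READERS.fetch_add(1, Ordering::SeqCst);
        RcuRead { _not_send: PhantomData }
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READERS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// RCU-protected, owning pointer to a T.
pub struct Rcu<T> {
    ptr: AtomicPtr<T>,
    _owns: PhantomData<Box<T>>,
}

// Other threads get &T, and old values may be freed on another thread.
unsafe impl<T: Send + Sync> Sync for Rcu<T> {}

impl<T> Rcu<T> {
    pub fn new(value: T) -> Self {
        Rcu { ptr: AtomicPtr::new(Box::into_raw(Box::new(value))), _owns: PhantomData }
    }

    /// Reading requires an RcuRead, and the result cannot outlive it.
    pub fn read<'g>(&'g self, _guard: &'g RcuRead) -> &'g T {
        // Never null, and not freed while `_guard` is alive.
        unsafe { &*self.ptr.load(Ordering::SeqCst) }
    }

    /// Publishes `value`; the old value stays alive inside the returned guard.
    #[must_use = "dropping the guard waits for readers and frees the old value"]
    pub fn assign(&self, value: T) -> RcuDropGuard<T> {
        let old = self.ptr.swap(Box::into_raw(Box::new(value)), Ordering::SeqCst);
        RcuDropGuard { old }
    }
}

impl<T> Drop for Rcu<T> {
    fn drop(&mut self) {
        // `&mut self` proves no `read` borrows are outstanding.
        drop(unsafe { Box::from_raw(*self.ptr.get_mut()) });
    }
}

/// Holds the replaced value; its destructor is the only thing that frees it.
pub struct RcuDropGuard<T> {
    old: *mut T,
}

impl<T> Drop for RcuDropGuard<T> {
    fn drop(&mut self) {
        synchronize_rcu();
        drop(unsafe { Box::from_raw(self.old) });
    }
}
```

Dropping an `RcuDropGuard` while the same thread holds an `RcuRead` would hang in `synchronize_rcu`, which is the deadlock case discussed in the issue, not a memory-safety problem.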
I think using RAII / destructors here is memory-safe. The worst that can happen is that you deadlock (if you forget an `RcuRead`) or you keep the old value alive forever (if you forget an `RcuDropGuard<T>`). The unsafety in `std::thread::scoped::JoinGuard` was that there were operations that were unsafe to do before the `JoinGuard` was dropped and safe after (like deallocating data used by the thread). There are no operations that are enabled by either an `RcuRead` or an `RcuDropGuard<T>` going out of scope. (For this reason, the `RcuDropGuard<T>` itself needs to handle deallocating the old object, to guarantee that `synchronize_rcu()` was in fact called.)
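The leak argument can be checked with a small userspace sketch. A toy reader counter stands in for the RCU state, and the hypothetical `Tracked` type counts how many old values were actually freed. Forgetting an `RcuDropGuard<T>` just leaks the old value; forgetting an `RcuRead` leaves the reader count raised, so every later `synchronize_rcu` would wait forever (the deadlock case), but nothing is left dangling.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy stand-in for RCU state, plus a counter of freed old values.
static READERS: AtomicUsize = AtomicUsize::new(0);
static FREED: AtomicUsize = AtomicUsize::new(0);

fn synchronize_rcu() {
    while READERS.load(Ordering::SeqCst) != 0 {
        std::thread::yield_now();
    }
}

pub struct RcuRead {
    _private: (),
}

impl RcuRead {
    pub fn new() -> Self {
        READERS.fetch_add(1, Ordering::SeqCst);
        RcuRead { _private: () }
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READERS.fetch_sub(1, Ordering::SeqCst);
    }
}

// Hypothetical payload that records when it is freed.
pub struct Tracked(pub u32);

impl Drop for Tracked {
    fn drop(&mut self) {
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

// Owns the replaced value. The grace period and the free happen together in
// Drop, so there is no way to free the value without synchronize_rcu().
pub struct RcuDropGuard<T> {
    old: Option<Box<T>>,
}

impl<T> RcuDropGuard<T> {
    pub fn new(old: Box<T>) -> Self {
        RcuDropGuard { old: Some(old) }
    }
}

impl<T> Drop for RcuDropGuard<T> {
    fn drop(&mut self) {
        synchronize_rcu();
        drop(self.old.take());
    }
}

pub fn readers() -> usize {
    READERS.load(Ordering::SeqCst)
}

pub fn freed() -> usize {
    FREED.load(Ordering::SeqCst)
}
```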
I'm a little less sure about making sure you don't hold the result of reading an `Rcu<T>` after the originating `RcuRead` goes out of scope / after you call `rcu_read_unlock`. Can we use lifetimes here, by giving you a pointer that's constrained to the lifetime of the `RcuRead`? Or do we need to insist on making you pass a callback/lambda?
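Both options can be expressed with safe signatures. In this sketch (a toy reader counter again stands in for RCU; `RcuCell`, `read` and `with` are hypothetical names, and the cell has no writer so the focus stays on reading), `read` ties the returned reference to the `RcuRead` borrow, while `with` keeps the guard inside the call and hands the closure a reference of an arbitrary lifetime, so the closure's result cannot contain it.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy stand-in for the kernel's RCU state.
static READERS: AtomicUsize = AtomicUsize::new(0);

pub struct RcuRead {
    _not_send: PhantomData<*const ()>,
}

impl RcuRead {
    pub fn new() -> Self {
        READERS.fetch_add(1, Ordering::SeqCst);
        RcuRead { _not_send: PhantomData }
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READERS.fetch_sub(1, Ordering::SeqCst);
    }
}

// Hypothetical read-only RCU cell.
pub struct RcuCell<T> {
    value: Box<T>,
}

impl<T> RcuCell<T> {
    pub fn new(value: T) -> Self {
        RcuCell { value: Box::new(value) }
    }

    // Option 1: the result borrows from the guard and cannot outlive it.
    pub fn read<'g>(&'g self, _guard: &'g RcuRead) -> &'g T {
        &self.value
    }

    // Option 2: the guard lives inside the call; `f` must accept `&T` of any
    // lifetime, so its return value cannot smuggle that reference out.
    pub fn with<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        let guard = RcuRead::new();
        f(self.read(&guard))
    }
}

pub fn readers() -> usize {
    READERS.load(Ordering::SeqCst)
}
```

With `read`, assigning `cell.read(&g)` to a variable declared outside the guard's scope and using it after that scope ends should be rejected by the borrow checker (E0597, `g` does not live long enough). The closure form needs no lifetimes from callers, at the cost of opening a separate (nestable) critical section per call.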
For the short term, I'd like to at least introduce `RcuRead` and have it be a required parameter for binding things like `for_each_process`, and we can handle doing RCU pointer reads and writes in Rust later. (That also lets us defer figuring out how to integrate the mutex needed to avoid concurrent writes.)
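A sketch of what "required parameter" could look like: the binding takes `&RcuRead` purely as proof that the caller is inside a read-side critical section. `Task`, `TASKS` and this `for_each_process` signature are hypothetical userspace stand-ins (in C, `for_each_process` is a macro over the task list, so a real binding would likely need a helper), and the toy `RcuRead` only counts readers.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy stand-in for the kernel's RCU state.
static READERS: AtomicUsize = AtomicUsize::new(0);

/// Proof token: while one exists, this thread is in a read-side critical section.
pub struct RcuRead {
    _not_send: PhantomData<*const ()>,
}

impl RcuRead {
    pub fn new() -> Self {
        READERS.fetch_add(1, Ordering::SeqCst);
        RcuRead { _not_send: PhantomData }
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READERS.fetch_sub(1, Ordering::SeqCst);
    }
}

// Hypothetical stand-ins for task_struct and the kernel's task list.
pub struct Task {
    pub pid: i32,
    pub comm: &'static str,
}

static TASKS: [Task; 3] = [
    Task { pid: 1, comm: "init" },
    Task { pid: 2, comm: "kthreadd" },
    Task { pid: 42, comm: "bash" },
];

/// Hypothetical binding: it cannot be called without an `RcuRead`, and the
/// closure's `&Task` cannot escape the call.
pub fn for_each_process(_rcu: &RcuRead, mut f: impl FnMut(&Task)) {
    for task in TASKS.iter() {
        f(task);
    }
}
```

Callers take the guard once and pass it in; nothing about RCU pointer reads or writes needs to exist yet.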
More interesting RCU docs:
- More description of things that RCU does and does not guarantee: https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
- An RCU home page, more or less: http://www.rdrop.com/~paulmck/RCU/
- "RCU's first-ever CVE, and how I lived to tell the tale": https://youtu.be/hZX1aokdNiY http://www.rdrop.com/~paulmck/RCU/cve.2019.01.23e.pdf (tl;dr: use-after-free because they locked the wrong RCU flavor, but the solution is a bit more complicated than that)