This repository was archived by the owner on Mar 7, 2021. It is now read-only.

Add RCU bindings #143

Open
@geofft


There are some APIs like for_each_process that want you to hold an RCU read lock when calling them. We should support this, at minimum. We probably want to also have full support for RCU-protected data, but maybe we can put that on hold.

Upstream RCU docs: https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt

The rough idea of RCU is to implement a reader-writer-lock-like pattern where readers are low-overhead, under the assumption that all readers are short-lived. That is a valid assumption for kernels, where a "reader" is the execution of a single syscall or interrupt; anything that persists state beyond this counts as a "writer". In particular, readers should not be slowed down while a writer is trying to do an update, so ideally reads happen concurrently with the update. Writers achieve this by allocating and filling in a new structure, atomically updating the pointer to it, synchronizing memory and waiting for existing readers to complete, and then deallocating the old structure. (This pointer can be a single pointer to some single structure, or one of the pointers in a linked list, or something similar.) That way, existing readers see a consistent view, either the old or the new version, and you're guaranteed that once you've finished synchronizing, any readers still in progress see only the new version.

(Multiple writers are not protected by RCU in any way. They should use a conventional mutex, or something.)
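To make the update sequence concrete, here is a minimal user-space sketch of it. This is not how the kernel implements RCU: `ToyRcu` is a hypothetical name, and a single shared reader counter stands in for real grace-period tracking (so, unlike real RCU, a steady stream of new readers can delay the writer). It only illustrates the order of steps: allocate the new version, swap the pointer, wait for readers, free the old version.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::thread;

/// Toy stand-in for an RCU-protected pointer (illustration only).
pub struct ToyRcu<T: Send + Sync> {
    ptr: AtomicPtr<T>,
    readers: AtomicUsize,
}

impl<T: Send + Sync> ToyRcu<T> {
    pub fn new(value: T) -> Self {
        ToyRcu {
            ptr: AtomicPtr::new(Box::into_raw(Box::new(value))),
            readers: AtomicUsize::new(0),
        }
    }

    /// Reader: enter, look at whichever version is current, leave.
    /// Calling `update` from inside `f` would spin forever.
    pub fn read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        self.readers.fetch_add(1, Ordering::SeqCst); // plays the role of rcu_read_lock
        let current = self.ptr.load(Ordering::SeqCst); // plays the role of rcu_dereference
        // SAFETY: a writer frees a version only after it has seen zero readers.
        let result = f(unsafe { &*current });
        self.readers.fetch_sub(1, Ordering::SeqCst); // plays the role of rcu_read_unlock
        result
    }

    /// Writer: publish a new version, wait for readers, free the old one.
    /// Real RCU code would serialize writers with a separate lock.
    pub fn update(&self, value: T) {
        let new = Box::into_raw(Box::new(value)); // allocate and fill in
        let old = self.ptr.swap(new, Ordering::SeqCst); // plays the role of rcu_assign_pointer
        while self.readers.load(Ordering::SeqCst) != 0 {
            thread::yield_now(); // crude substitute for synchronize_rcu
        }
        // SAFETY: every reader that could have loaded `old` has left.
        drop(unsafe { Box::from_raw(old) });
    }
}

impl<T: Send + Sync> Drop for ToyRcu<T> {
    fn drop(&mut self) {
        let current = *self.ptr.get_mut();
        // SAFETY: `&mut self` means no reader or writer is still active.
        drop(unsafe { Box::from_raw(current) });
    }
}
```

A reader calls `rcu.read(|v| *v)` and always sees either the old or the new value in full, never a partially written one.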

The basic RCU API in Rusty pseudocode is:

extern "C" {
    fn rcu_read_lock();
    fn rcu_read_unlock();
    fn synchronize_rcu();
    // Pseudocode: real extern blocks cannot declare generic functions.
    fn rcu_assign_pointer<T: Sized>(p: *mut *mut T, v: *mut T);
    fn rcu_dereference<T: Sized>(p: *const *const T) -> *const T;
}

(Note the latter two in C are macros, into which p is syntactically passed by reference. If we want to bind them directly, we probably want to expose C helpers that take void ** and handle the type safety in the Rust wrapper.)
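A rough sketch of what that split could look like follows. The helper names `rcu_assign_pointer_helper` and `rcu_dereference_helper` are hypothetical; in a real module they would be written in C around the macros and declared in an `extern "C"` block. Here they are replaced by user-space stand-ins built on atomics so the example can run outside the kernel.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicPtr, Ordering};

// Stand-ins for the hypothetical C helpers, which might look like:
//   void rcu_assign_pointer_helper(void **p, void *v) { rcu_assign_pointer(*p, v); }
//   void *rcu_dereference_helper(void **p) { return rcu_dereference(*p); }
unsafe extern "C" fn rcu_assign_pointer_helper(p: *mut *mut c_void, v: *mut c_void) {
    // SAFETY: the caller promises `p` points to a valid, aligned pointer slot.
    let slot: &AtomicPtr<c_void> = unsafe { &*(p as *const AtomicPtr<c_void>) };
    slot.store(v, Ordering::Release); // approximates the publish ordering
}

unsafe extern "C" fn rcu_dereference_helper(p: *mut *mut c_void) -> *mut c_void {
    // SAFETY: the caller promises `p` points to a valid, aligned pointer slot.
    let slot: &AtomicPtr<c_void> = unsafe { &*(p as *const AtomicPtr<c_void>) };
    slot.load(Ordering::Acquire)
}

/// Typed wrapper: the generic parameter exists only on the Rust side.
///
/// # Safety
/// `p` must point to a live pointer slot, and the old target must stay
/// valid until a grace period has elapsed.
pub unsafe fn rcu_assign_pointer<T>(p: *mut *mut T, v: *mut T) {
    unsafe { rcu_assign_pointer_helper(p.cast(), v.cast()) }
}

/// Typed wrapper around the untyped load.
///
/// # Safety
/// `p` must point to a live pointer slot, and the caller must be inside a
/// read-side critical section for as long as it uses the result.
pub unsafe fn rcu_dereference<T>(p: *const *const T) -> *const T {
    unsafe { rcu_dereference_helper(p as *mut *mut c_void) as *const T }
}
```

The wrappers stay `unsafe` because nothing at this layer enforces the read-side critical section or the grace period; that is left to a higher-level type.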

rcu_read_lock and rcu_read_unlock take no context info; they create a read-side critical section that covers any RCU use in the kernel. Between those you may call rcu_dereference on anything. (That said, there are two other RCU "flavors" in the kernel, rcu_read_lock_bh etc. and rcu_read_lock_sched etc. They appear to have been consolidated last year, but I think that from the API's point of view they should still be treated as separate? In any case, the other flavors are rarely used and we can probably get away with ignoring them for now.)

The rules as I understand them are:

  • You can only read an RCU pointer between rcu_read_lock and rcu_read_unlock (aka "within a read-side critical section"), and you can only read it with rcu_dereference (though it compiles out on all architectures besides Alpha). rcu_dereference doesn't actually do the dereferencing itself; it simply gives you a pointer you can safely dereference until rcu_read_unlock.
  • You cannot block in a read-side critical section.
  • You can only write an RCU pointer with rcu_assign_pointer, though you can do so at any point without advance notice.
  • You must keep the old object pointed to by an rcu_assign_pointer valid until you've called synchronize_rcu.

You may nest or overlap read-side critical sections, though. It's a reader lock: you can have more than one of those, and writers can't do anything to invalidate the data you're reading until all reader locks are dropped. The only difference with RCU is that synchronize_rcu waits only for the reader locks that already exist when it starts; reader locks taken after that point don't delay it (those new reads are guaranteed to be ordered after any previous rcu_assign_pointer calls).

I think the rough way to handle this is to

  • create an Rcu<T> pointer type
  • have an empty RcuRead object whose constructor calls rcu_read_lock(), whose destructor calls rcu_read_unlock(), and which is required to read an Rcu<T>
  • have ... some sort of operation to assign to an Rcu<T> that keeps the old value alive. Perhaps it returns an RcuDropGuard<T> object that holds on to a pointer to the old object, and that object's destructor calls synchronize_rcu() and then frees it?
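The three pieces above could fit together roughly as follows. This is a sketch under stated assumptions, not a proposed final API: `rcu_read_lock`, `rcu_read_unlock` and `synchronize_rcu` are replaced by user-space mocks built on a global reader count, the method names `read` and `assign` are made up for illustration, and writers are assumed to be serialized by the caller.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering::SeqCst};
use std::thread;

// Mocks for the kernel functions (assumption: a global count models readers).
static READERS: AtomicUsize = AtomicUsize::new(0);
fn rcu_read_lock() {
    READERS.fetch_add(1, SeqCst);
}
fn rcu_read_unlock() {
    READERS.fetch_sub(1, SeqCst);
}
fn synchronize_rcu() {
    while READERS.load(SeqCst) != 0 {
        thread::yield_now();
    }
}

/// Empty token: creating it enters a read-side critical section.
pub struct RcuRead {
    _not_send: PhantomData<*mut ()>, // keep the token on the thread that locked
}
impl RcuRead {
    pub fn new() -> Self {
        rcu_read_lock();
        RcuRead { _not_send: PhantomData }
    }
}
impl Drop for RcuRead {
    fn drop(&mut self) {
        rcu_read_unlock();
    }
}

pub struct Rcu<T: Send + Sync> {
    ptr: AtomicPtr<T>,
}

/// Owns the previous value; dropping it waits for readers, then frees it.
pub struct RcuDropGuard<T> {
    old: *mut T,
}
impl<T> Drop for RcuDropGuard<T> {
    fn drop(&mut self) {
        synchronize_rcu();
        // SAFETY: every reader that could have seen `old` has finished.
        drop(unsafe { Box::from_raw(self.old) });
    }
}

impl<T: Send + Sync> Rcu<T> {
    pub fn new(value: T) -> Self {
        Rcu { ptr: AtomicPtr::new(Box::into_raw(Box::new(value))) }
    }
    /// Reading requires proof that a read-side critical section is open.
    pub fn read<'a>(&'a self, _guard: &'a RcuRead) -> &'a T {
        // SAFETY: a replaced value is freed only after all readers are done.
        unsafe { &*self.ptr.load(SeqCst) }
    }
    /// Publishes `value`; the returned guard keeps the old value alive.
    pub fn assign(&self, value: T) -> RcuDropGuard<T> {
        let new = Box::into_raw(Box::new(value));
        RcuDropGuard { old: self.ptr.swap(new, SeqCst) }
    }
}

impl<T: Send + Sync> Drop for Rcu<T> {
    fn drop(&mut self) {
        // SAFETY: `&mut self` means no reader can still borrow the value.
        drop(unsafe { Box::from_raw(*self.ptr.get_mut()) });
    }
}
```

With these mocks, dropping an `RcuDropGuard` while the same thread still holds an `RcuRead` spins forever, which models the deadlock case rather than any memory unsafety.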

I think it's memory-safe that we're using RAII / destructors here. The worst that can happen is that you deadlock (if you forget an RcuRead) or you keep the old value alive forever (if you forget RcuDropGuard<T>). The unsafety in std::thread::scoped::JoinGuard was that there were operations that were unsafe to do before the JoinGuard was dropped and safe after (like, deallocate data used by the thread). There are no operations that are enabled by either an RcuRead or an RcuDropGuard<T> going out of scope. (The RcuDropGuard<T> itself needs to handle deallocating the old object, for this reason, to guarantee that synchronize_rcu() was in fact called.)
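The "forget" case can be illustrated with a small mock. The counter below is an assumption standing in for the kernel's read-side state; the point is only that safe code can skip a destructor with `std::mem::forget`, and that doing so leaves the critical section open rather than exposing freed memory.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock of the read-side nesting depth (assumption for illustration).
static READ_DEPTH: AtomicUsize = AtomicUsize::new(0);

pub struct RcuRead(());

impl RcuRead {
    pub fn new() -> Self {
        READ_DEPTH.fetch_add(1, Ordering::SeqCst); // stands in for rcu_read_lock()
        RcuRead(())
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READ_DEPTH.fetch_sub(1, Ordering::SeqCst); // stands in for rcu_read_unlock()
    }
}

/// Returns the mock depth after a normal drop and after a leaked guard.
pub fn depth_after_drop_and_leak() -> (usize, usize) {
    drop(RcuRead::new());
    let after_drop = READ_DEPTH.load(Ordering::SeqCst);

    std::mem::forget(RcuRead::new()); // safe code may skip the destructor
    let after_leak = READ_DEPTH.load(Ordering::SeqCst);

    (after_drop, after_leak)
}
```

On a first call this returns `(0, 1)`: the leaked guard leaves the section open, so a writer waiting on it would never finish, but no dangling pointer becomes reachable.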

I'm a little less sure about making sure you don't hold the result of reading an Rcu<T> after the originating RcuRead goes out of scope / after you call rcu_read_unlock. Can we use lifetimes here, by giving you a pointer that's constrained to the lifetime of RcuRead? Or do we need to insist on making you pass a callback/lambda?
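Both options can be sketched side by side. In this sketch `Rcu<T>` simply wraps a `Box<T>` and the lock functions are mocked with a counter, since only the borrow structure matters here; the method names `read` and `with` are hypothetical.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock of the read-side nesting depth (assumption for illustration).
static READ_DEPTH: AtomicUsize = AtomicUsize::new(0);

pub struct RcuRead(PhantomData<*mut ()>);

impl RcuRead {
    pub fn new() -> Self {
        READ_DEPTH.fetch_add(1, Ordering::SeqCst); // stands in for rcu_read_lock()
        RcuRead(PhantomData)
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READ_DEPTH.fetch_sub(1, Ordering::SeqCst); // stands in for rcu_read_unlock()
    }
}

/// Stand-in: a real version would hold an RCU-managed pointer instead.
pub struct Rcu<T>(Box<T>);

impl<T> Rcu<T> {
    /// Option 1: the result borrows the guard, so it cannot outlive it.
    pub fn read<'g>(&'g self, _guard: &'g RcuRead) -> &'g T {
        &self.0
    }

    /// Option 2: the reference exists only for the duration of the callback.
    pub fn with<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        let _guard = RcuRead::new();
        f(&self.0)
    }
}

pub fn demo() -> (u32, u32) {
    let cell = Rcu(Box::new(7u32));

    let via_lifetime = {
        let guard = RcuRead::new();
        let value = *cell.read(&guard); // copy out while the guard is alive
        value
    };

    // The borrow checker rejects letting the reference escape the guard:
    //     let escaped = { let guard = RcuRead::new(); cell.read(&guard) };
    //     error[E0597]: `guard` does not live long enough

    let via_callback = cell.with(|v| *v + 1);
    (via_lifetime, via_callback)
}
```

In this sketch the lifetime approach seems sufficient: moving or forgetting the guard ends the borrow, so the reference is unusable afterwards. The callback form is shown as the fallback in case lifetimes turn out to be too restrictive for real bindings.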

For the short term, I'd like to at least introduce RcuRead and have it be a required parameter for binding things like for_each_process, and we can handle doing RCU pointer reads and writes in Rust later. (Which also lets us defer figuring out how to integrate the mutex needed for avoiding concurrent writes.)
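As a sketch of that short-term step, a binding could take `&RcuRead` purely as evidence that the caller holds the read lock. Everything below is mocked for illustration: `Task`, `MOCK_TASKS` and the closure-based signature of `for_each_process` are assumptions, not the kernel's types or an existing binding.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock of the read-side nesting depth (assumption for illustration).
static READ_DEPTH: AtomicUsize = AtomicUsize::new(0);

pub struct RcuRead(());

impl RcuRead {
    pub fn new() -> Self {
        READ_DEPTH.fetch_add(1, Ordering::SeqCst); // stands in for rcu_read_lock()
        RcuRead(())
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        READ_DEPTH.fetch_sub(1, Ordering::SeqCst); // stands in for rcu_read_unlock()
    }
}

/// Hypothetical stand-in for the kernel's task_struct.
pub struct Task {
    pub pid: i32,
    pub comm: &'static str,
}

// Fake process list so the sketch can run in user space.
static MOCK_TASKS: [Task; 3] = [
    Task { pid: 1, comm: "init" },
    Task { pid: 2, comm: "kthreadd" },
    Task { pid: 3, comm: "bash" },
];

/// Hypothetical safe binding: `_rcu` is unused at run time and exists only
/// to prove that the caller is inside a read-side critical section.
pub fn for_each_process<F: FnMut(&Task)>(_rcu: &RcuRead, mut f: F) {
    debug_assert!(READ_DEPTH.load(Ordering::SeqCst) > 0);
    for task in MOCK_TASKS.iter() {
        f(task);
    }
}

pub fn collect_pids() -> Vec<i32> {
    let guard = RcuRead::new();
    let mut pids = Vec::new();
    for_each_process(&guard, |task| pids.push(task.pid));
    pids
}
```

Because the closure only receives `&Task` for the duration of each call, task references cannot leak out of the iteration, and the guard parameter makes it impossible to call the binding without first taking the read lock.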

More interesting RCU docs:

More description of things that RCU does and does not guarantee: https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

An RCU home page, more or less: http://www.rdrop.com/~paulmck/RCU/

"RCU's first-ever CVE, and how I lived to tell the tale": https://youtu.be/hZX1aokdNiY http://www.rdrop.com/~paulmck/RCU/cve.2019.01.23e.pdf (tl;dr: use-after-free because they locked the wrong RCU flavor, but the solution is a bit more complicated than that)
