Storing an object as &Header, but reading the data past the end of the header

This is related to https://github.com/rust-lang/unsafe-code-guidelines/issues/2, but here the read is not out of bounds of the allocation, the bytes are not being written to by other threads, they are not the bytes of a `&mut Blah`, etc. That is to say, the code is really trying to model a dynamically sized type that, for one reason or another, Rust does not support. (Note that there are a number of custom DST proposals.)

So, I heard that it's UB to have a `&T` and read outside the bounds of that `T`, even if conceptually it's a totally in-bounds read. E.g. `T` here may be a ZST, or it may be a header that sits at the head of a trailing array, or it may be a struct holding the common shared fields of some set of other structs... These are pretty common in unsafe code, as it's a pattern which is both legal and useful in C and C++.
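A minimal sketch of the header-plus-trailing-data shape, for concreteness. The names `Header`, `Packet`, `trailing_byte`, and `demo` are hypothetical and not taken from any of the crates discussed here. The sketch deliberately keeps a raw pointer derived from the whole object; the form this issue asks about would start from a `&Header` instead, and that variant is only described in a comment.

```rust
use std::mem::size_of;

#[repr(C)]
struct Header {
    len: u32,
}

// Hypothetical "full" object: a header immediately followed by its payload.
#[repr(C)]
struct Packet {
    header: Header,
    payload: [u8; 4],
}

/// Reads the `index`-th byte that follows the header.
///
/// Safety: `header` must point at the start of a live object holding a
/// `Header` followed by at least `len` bytes, and the pointer must have been
/// derived from that whole object (not from a reference to just the header).
unsafe fn trailing_byte(header: *const Header, index: usize) -> Option<u8> {
    let len = unsafe { (*header).len } as usize;
    if index >= len {
        return None;
    }
    // Step past the header; with `#[repr(C)]` and a `u8` payload there is no padding here.
    let base = unsafe { (header as *const u8).add(size_of::<Header>()) };
    Some(unsafe { *base.add(index) })
}

fn demo() -> Option<u8> {
    let packet = Packet {
        header: Header { len: 4 },
        payload: [10, 20, 30, 40],
    };
    // Raw pointer derived from the whole `Packet`, so it covers every byte of it.
    let whole: *const Packet = &packet;
    // The questionable form would instead take `&packet.header` (a `&Header`),
    // cast that to a pointer, and read past the end of the header.
    unsafe { trailing_byte(whole as *const Header, 2) }
}
```

Here `demo()` reads the third payload byte through the header pointer.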

It's pretty common in Rust too:

- It's not unheard of for bindings to C APIs to use a `#[repr(C)] struct Foo { _priv: [u8; 0] }`, as this is what bindgen uses. Some of these APIs then go on to use `&Foo` in the Rust code. (This is essentially a workaround for the lack of a stable `extern type`.) This code doesn't read the data, so the only issue would be if we told LLVM it could assume things about the pointer that turn out to be untrue in a situation like cross-lang LTO, probably.

- Similarly, I've seen other FFI code that used a `struct CStr([u8; 0])` for a similar purpose — as a version of `std::ffi::CStr` that you can actually pass to C directly. (I even almost did this for [ffi_support::FfiStr](https://docs.rs/ffi-support/0.4.2/ffi_support/struct.FfiStr.html), but went with a pointer inside so I could easily check for code passing in null).

- `bitvec` has a `BitSlice` type which acts a lot like a slice that magically has bit-level indexing. Internally it's something like `struct BitSlice { _mem: [()] }`, which lets it behave like an unsized type. The "pointer" and length are both specially encoded values that contain both the actual pointer/length as well as bit-level offsets for tracking where within a byte things are. There are a lot of reasons this might be illegal, but I had not thought `mem::size_of_val` returning the wrong value was the actual one.

- `anyhow::Error` internally wraps a `Box<ErrorImpl<()>>`, where `ErrorImpl<T>` contains a vtable, a backtrace, and then the `T`. `ErrorImpl<()>` is used as it behaves as the "common header" for real `ErrorImpl` values. On construction, `Box<ErrorImpl<T>>` is converted to `Box<ErrorImpl<()>>` when stored in the `Error`.

    Whenever a method is called that needs to delegate to the vtable, the `Box<ErrorImpl<()>>` is converted into the right pointer type for the vtable function (one of `&ErrorImpl<()>`, `&mut ErrorImpl<()>`, `Box<ErrorImpl<()>>`) which is called with that pointer. The first thing the vtable function generally does is convert the reference to e.g. `&ErrorImpl<T>`, example: https://github.com/dtolnay/anyhow/blob/99c982128458fecb8d1d7aff9478dd77dac0ee3b/src/error.rs#L538-L545. (I had always kind of thought it wasn't okay to use `Box<T>` here, but I'm surprised that stuff like `&ErrorImpl<()>` to `&ErrorImpl<RealType>` isn't okay either). 

- `wio-rs` contains `VariableSizedBox`, which provides this pattern in library form, and IIUC is mostly intended for the flexible-array-member case. The API attempts to launder pointers to the object, which is... very non-obvious. It seems like it plausibly avoids the issue here, but it's insanely subtle, and if this is the recommended pattern, I suspect it will need a very good nomicon entry. https://github.com/retep998/wio-rs/blob/9bf021178b2d02485f1bd35e6cff41bf52d4a9a2/src/vsb.rs#L98-L113

- I do [something similar](https://github.com/thomcc/arcstr/blob/main/src/arc_str.rs#L725-L728) in `arcstr`, where there's a header and a variable-length segment that trails it. I avoided issues here by luck, as I took great care to avoid ever putting the inner type behind a reference. This was lucky since I wasn't aware of this at all, and did it for other reasons. It was also painful, as it required hard-coding field offsets.

- This isn't to say anything of the numerous C or C++ APIs which expose polymorphism in this way. In C++ this is how single non-virtual inheritance is represented, so it's especially common, although it was common in C too. Additionally, C code with a flexible array member is in tons of places, and not just Windows APIs.
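To make the "common header plus vtable" case from the list concrete, here is a rough sketch of that kind of type erasure. It is not anyhow's actual code: `Erased`, `Handle`, `describe_impl`, and `drop_impl` are hypothetical names, and the sketch assumes `#[repr(C)]` so that `Erased<()>` is a layout prefix of `Erased<T>`. It stores the erased pointer as a `NonNull` and only creates a reference after casting back to the real type, which is one way to avoid ever holding a `&Erased<()>` to a larger object.

```rust
use std::fmt::Debug;
use std::ptr::NonNull;

// `#[repr(C)]` keeps the two function pointers at the same offsets for every `T`.
#[repr(C)]
struct Erased<T> {
    describe: unsafe fn(NonNull<Erased<()>>) -> String,
    drop_box: unsafe fn(NonNull<Erased<()>>),
    value: T,
}

// Safety: `p` must really point to a live `Erased<T>`.
unsafe fn describe_impl<T: Debug>(p: NonNull<Erased<()>>) -> String {
    // Cast the raw pointer back to the real type before making a reference.
    let full: &Erased<T> = unsafe { p.cast::<Erased<T>>().as_ref() };
    format!("{:?}", full.value)
}

// Safety: `p` must have come from `Box::<Erased<T>>::into_raw` and not be freed yet.
unsafe fn drop_impl<T>(p: NonNull<Erased<()>>) {
    drop(unsafe { Box::from_raw(p.cast::<Erased<T>>().as_ptr()) });
}

struct Handle {
    ptr: NonNull<Erased<()>>,
}

impl Handle {
    fn new<T: Debug + 'static>(value: T) -> Self {
        let boxed = Box::new(Erased {
            describe: describe_impl::<T>,
            drop_box: drop_impl::<T>,
            value,
        });
        let raw: *mut Erased<T> = Box::into_raw(boxed);
        Handle { ptr: unsafe { NonNull::new_unchecked(raw) }.cast() }
    }

    fn describe(&self) -> String {
        // Copy the function pointer out through the raw pointer; no `&Erased<()>` is formed.
        let f = unsafe { (*self.ptr.as_ptr()).describe };
        unsafe { f(self.ptr) }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        let f = unsafe { (*self.ptr.as_ptr()).drop_box };
        unsafe { f(self.ptr) }
    }
}
```

For example, `Handle::new(42u32).describe()` formats the stored value with `Debug`. Whether the reference-based version of this pattern is acceptable is exactly the open question here; the sketch only shows the raw-pointer alternative.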

These are just a few off the top of my head; there's a lot of unsafe code that does this. Personally, I had thought it was allowed so long as you don't go past the actual bounds of the allocation. Unfortunately, it makes *some* sense that it's not, though. (Somehow, I don't think I've ever had miri trouble me about it, but it seems like that's just down to luck && coincidence more than anything else.)
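For the header-plus-variable-length-tail cases listed above, a sketch of a heap-allocated version that stays within one allocation and never forms a `&Header` might look like the following. All names (`Header`, `block_layout`, `new_block`, `tail`, `free_block`, `demo`) are hypothetical, and the tail offset is computed with `Layout::extend` rather than hard-coded. This illustrates the raw-pointer approach only; it is not a statement about what the final rules are.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

#[repr(C)]
struct Header {
    len: usize,
}

/// Layout of a `Header` followed by `len` bytes, plus the byte offset of the tail.
fn block_layout(len: usize) -> (Layout, usize) {
    let tail = Layout::array::<u8>(len).expect("length overflow");
    let (layout, offset) = Layout::new::<Header>()
        .extend(tail)
        .expect("layout overflow");
    (layout.pad_to_align(), offset)
}

/// Copies `data` into a fresh allocation and returns a raw pointer to its header.
fn new_block(data: &[u8]) -> *mut Header {
    let (layout, offset) = block_layout(data.len());
    unsafe {
        // The layout is never zero-sized because `Header` itself is not.
        let base = alloc(layout);
        if base.is_null() {
            handle_alloc_error(layout);
        }
        ptr::write(base as *mut Header, Header { len: data.len() });
        ptr::copy_nonoverlapping(data.as_ptr(), base.add(offset), data.len());
        base as *mut Header
    }
}

/// Safety: `block` must come from `new_block`, must not have been freed, and
/// must stay alive for as long as the returned slice is used.
unsafe fn tail<'a>(block: *const Header) -> &'a [u8] {
    unsafe {
        let len = (*block).len;
        let (_, offset) = block_layout(len);
        // The reference covers only the tail bytes, never the header.
        std::slice::from_raw_parts((block as *const u8).add(offset), len)
    }
}

/// Safety: `block` must come from `new_block` and must not be used afterwards.
unsafe fn free_block(block: *mut Header) {
    unsafe {
        let (layout, _) = block_layout((*block).len);
        dealloc(block as *mut u8, layout);
    }
}

fn demo() -> Vec<u8> {
    let block = new_block(b"hello");
    let copy = unsafe { tail(block) }.to_vec();
    unsafe { free_block(block) };
    copy
}
```

`demo()` round-trips the bytes through the allocation and frees it again.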

Anyway, I think if this is UB we should start being way more vocal about it, because it's a totally legal pattern in C and C++, and common.

Storing an object as &Header, but reading the data past the end of the header #256

thomcc commented on Nov 11, 2020
