-
Notifications
You must be signed in to change notification settings - Fork 62
Description
For what it's worth, I think it would be nice if the compiler could do scalar-replacement-of-aggregates (i.e. splitting a structure-typed variable into individual variables for each of the fields, which can then potentially be stored in registers rather than memory) even when a reference to one of the fields escapes.
My response is:
Note that repr(Rust) doesn't save us here; repr(Rust) means the layout is unspecified but offset_of! should still work. I am not sure how to specify a language in a reasonably simple way that allows this.
(I think C may allow this optimization, making container_of-style code illegal in C. But of course tons of code still does it in practice and we all just hope things don't fall apart. I do not consider that an acceptable outcome for Rust.)
and then @comex asks
Well, have you thought about which other optimizations may be impacted by this?
Not sure what you mean -- "this" = having byte-level untyped memory? This is what LLVM does, and to my knowledge all its optimizations are compatible with that model, so I'd say the impact is not big. But I am not an expert in what kinds of optimizations compilers (want to) do. I am coming more from the perspective of how to specify the language to enable optimizations that others tell me are desirable. Given that most languages have an ambiguous or no specification of what exactly constitutes UB, I feel like that's a niche I need to fill. ;)
Being able to separate the fields of a struct even when there are pointers to it is easy in languages with a high-level view of memory, but immensely difficult in C/C++/Rust which allow byte-wise memory manipulations. While it is possible to do so, I think the resulting semantics are inscrutable for programmers. So the real question here is, how much complexity in the spec (with the associated amount of bugs caused by misunderstandings, and complexity in Miri-like tools that aim to reflect the spec) are we willing to accept to enable such optimizations?
Activity
Diggsey commentedon Aug 6, 2020
Why would
offset_of!
break that optimisation? If a reference to a field in a struct escapes, then the provenance of that reference means that the called code cannot access any other fields on the struct, so the compiler could happily put them in registers.RalfJung commentedon Aug 6, 2020
That would use provenance to justify the optimization, which I think is very different from what @comex imagined.
Also, this doesn't work when we talk about a raw pointer to the field instead of a reference.
comex commentedon Aug 6, 2020
Sorry, to clarify:
By "this" I meant
container_of
-type operations, i.e. computing an outer object pointer from an inner object pointer.I was talking about separating the fields of a struct even if there is an escaped pointer to a field – but not pointers to the struct itself, or to any other fields. So the field which had pointers to it would be laid out as normal, but the rest of the struct might not be.
If
container_of
-type operations are forbidden for references, and if the pointer-to-field was computed using references (&foo.bar
not&raw foo.bar
), this optimization should be sound... at least in some cases.At minimum, it should be fully semantics-preserving to allocate stack space for the entire struct, but only write to the field in question, leaving the rest as uninitialized memory. Really that's just a special case of aliasing assumptions combined with dead store elimination.
As a stretch goal, it would also be nice to only allocate stack space for the field, but I suppose there are a few speed bumps. Even if the user wasn't allowed to dereference the result of
container_of
, they could still compute it, and make assumptions about the pointer value itself:I don't know how feasible it is to actually enforce these restrictions, so in practice they might rule out the "only allocate space tor the field" optimization altogether. It might be interesting to consider whether the memory model could be adjusted to free the compiler from having to enforce them. But again, that's a stretch goal; wasting a bit of stack space is usually not a big deal.
RalfJung commentedon Aug 8, 2020
That is what I assumed. This relies on a view of memory not as a chunk of bytes, but as something "tree-like" that is structured according to the types of the data stored in there. C tries to do that. It is a mess.
The question is, how do you forbid them? Languages are not defined by saying with optimizations are allowed, they are defined by a concrete operational semantics that says what happens when you execute the program. To permit optimizations like the one you describe, that operational semantics has to somehow make it such that pointer arithmetic starting at one field of a struct cannot reach other fields of that struct. Provenance can do that, but only when aliasing assumptions come into play -- not when raw pointers are used.
In that case the only avenue left is to make it so that the fields of a struct are not laid out next to each other in some byte-wise fashion, but instead form a tree in memory so that the fields are not next to each other linearly. But then you have to explain why
memcpy
and things like that work, and you end up making a lot of real-world code UB.If the pointer-to-field is raw, there are no aliasing assumptions we can make.
comex commentedon Aug 12, 2020
As I said, this optimization would not be applicable for raw pointers to fields.
RalfJung commentedon Aug 12, 2020
So you'd only do it if there were references to fields, but not if there were any raw pointers that escaped?
Yeah that should work, if we adopt Stacked Borrows or something similar. That would not rely on a different view of memory, so it is a fundamentally very different proposal.
It is in conflict with #134.
[-]What about: optimizations that rely on a higher-level view of memory[/-][+]What about: splitting struct into separate fields despite escaped references (maybe even raw pointers)[/+]&T
rust-lang/rust-memory-model#19Jules-Bertholet commentedon Jul 14, 2023
Varargs potentially wants something like this: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Varargs.20SROA.20sadness
Dante-Broggi commentedon Jul 16, 2023
IIUC, Swift does not have field splitting / SROA optimization for unsafe pointers, but does for inout and shared arguments, because all values are semantically "in registers" and the only ways to semantically ensure a value actually has an address is the withUnsafe*Pointer(to:) family of intrinsic functions, or if the value is in a static variable.
A particular example of this is described in a blog post by Matt Gallagher, where only taking the address of the first element of an array passed to
strcpy
causes the tail of the array to simply not be allocated.Reflecting back to Rust, this would result in new reference types
&?addr T
/&?addr mut T
, where the existing reference types would be&addr T
/&addr mut T
if the default would change in a new edition, where the new reference types could only be convertible to the current reference types in limited ways which would not expose the address, perhaps by an intrinsic with a callback API and not directly to pointers or integers.RalfJung commentedon Jul 16, 2023
With strict subobject provenance, I think the same would be allowed in Rust.
I think you are basically suggesting we should have 2 reference types -- one that is strictly limited to the given type, and one that allows accessing surrounding memory under certain conditions?
digama0 commentedon Jul 16, 2023
It sounds like the proposed non-
&addr
reference type would also not have guaranteed address stability. It is unclear to me whether our memory model can support that.RalfJung commentedon Jul 16, 2023
What exactly does that even mean?
In fact we don't even need 2 types for this, just two different operations for taking a reference -- one that restricts provenance to the subobject, and one that does not.
digama0 commentedon Jul 16, 2023
Something like:
is not guaranteed to succeed, because the semantics of the
&noaddr T as usize
operation entails (possibly) puttingT
somewhere in memory and then taking the address of that, so there is no guarantee when you do it multiple times that you get the same result.CAD97 commentedon Jul 16, 2023
I would initially expect a specific reference to be consistent. But it not necessarily being so would also enable converting
fn(&scalar)
tofn(scalar)
.Plus the original spitball would prevent converting to
usize
at all for the different ref type, IIUC. I don't think it's practical to have a split reference type, though; it would assuredly lead to more "color" style complaints and just restricting provenance should be possible to make sufficient, if cross-provenance pointer ops get treated like cross-alloc.RalfJung commentedon Jul 16, 2023
How does that relate to SROA? Is the point that we must not be able to tell whether two references have the right distance to match the layout of the original struct?
digama0 commentedon Jul 16, 2023
The point is that
&noaddr T
is really only a "logical pointer", it doesn't necessarily have the trappings associated with a pointer in memory. This is what you need to make substantial calling convention changes like SROA as optimizations.I agree re: the high bar of adding another pointer type, though. Having this kind of pervasive non-guaranteed address would be useful for a lot of things, but also problematic for many other things so I'm not sure we want to adopt this semantics for
&T
, and without that it doesn't seem worth it.Jules-Bertholet commentedon Jul 16, 2023
I'm not sure we need something so radical as
&noaddr T
. SROA doesn't need fully nondeterministic addresses, it needs provenance slicing and a loosening of guarantees around offsets. I see no reason whyx as usize == x as usize
should ever not hold.digama0 commentedon Jul 17, 2023
I know that looks a bit odd when written that way, but it's similar to the non-guarantee around
&0 as usize == &0 as usize
. The&noaddr T
is coerced to a&T
before conversion tousize
, and when you do that coercion twice you might get pointers to different locations. (You probably won't, but there is no guarantee of such.)If we wanted to be more explicit about it, we would say that
&noaddr T as usize
is a compile error and then the code looks like&*x as usize == &*x as usize
, which maybe looks more reasonable to be non-guaranteed (or maybe not, I don't know).RalfJung commentedon Jul 17, 2023
When
&noaddr
are created to e.g.x.0
andx.1
, with the offset between those fields being observed in other instances of the same type, then I don't see how we can do SROA without full address non-determinism. Though possibly the non-determinism could occur when the reference is created? Instead of it being a different type,&noaddr <place>
could somehow allow putting the data to a different location. Though with mutable references this all becomes quite tricky as the mutation must be visible through the original place after the lifetime is over.Jules-Bertholet commentedon Jul 18, 2023
At the latest. I think it could even be moved earlier, to when the struct is created.
I don't think it makes sense to have new reference types just for this, either we find a way to retrofit SROA onto existing references, or we don't do it at all. But the set of guarantees I am picturing would look something like this:
offset_of!