Open
Labels
- A-MIR: Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
- A-codegen: Area: Code generation
- A-mir-opt: Area: MIR optimizations
- C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
- T-opsem: Relevant to the opsem team
Description
When creating multiple instances of a small struct, each instance will be allocated separately on the stack even if they are known never to overlap.
Example: the following code generates two `alloca` instructions that are not optimized away by LLVM (Godbolt):
pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}
#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}
#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}
pub fn offsets(buf: [u8; 16]) {
    let w = WithOffset {
        data: &buf,
        offset: 0,
    };
    peek_w(&w);
    use_w(w);
    let w2 = WithOffset {
        data: &buf,
        offset: 1,
    };
    peek_w(&w2);
    use_w(w2);
}
LLVM IR:
; playground::offsets
; Function Attrs: noinline nounwind
define internal fastcc void @playground::offsets(ptr noalias nocapture noundef nonnull readonly align 1 dereferenceable(16) %buf) unnamed_addr #0 {
start:
  %w2 = alloca [16 x i8], align 8
  %w = alloca [16 x i8], align 8
  store ptr %buf, ptr %w, align 8
  %0 = getelementptr inbounds nuw i8, ptr %w, i64 8
  store i64 0, ptr %0, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 0) #88
  store ptr %buf, ptr %w2, align 8
  %1 = getelementptr inbounds nuw i8, ptr %w2, i64 8
  store i64 1, ptr %1, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w2) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 1) #88
  ret void
}
It seems like the calls to `@llvm.lifetime.{start,end}.p0` are missing. If we instead use:
pub fn closures(buf: [u8; 16]) {
    (|| {
        let w = WithOffset {
            data: &buf,
            offset: 0,
        };
        peek_w(&w);
        use_w(w);
    })();
    (|| {
        let w2 = WithOffset {
            data: &buf,
            offset: 1,
        };
        peek_w(&w2);
        use_w(w2);
    })();
}
We do get the lifetime markers, and the second `alloca` is optimized away (see the Godbolt link).
I encountered this when working on memorysafety/rav1d#1402, where this missed optimization results in over 100 bytes of extra stack allocations in a specific function, which slows down the entire binary by ~0.5%.
This might also be related to #138544
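
Besides reading the IR, slot reuse can also be observed at run time by reporting where each instance lives. The following is a minimal sketch, not part of the original report: `peek_addr` and `offsets_addrs` are hypothetical names, and exposing the addresses may itself influence what the optimizer does, so the Godbolt output remains the more reliable check.

```rust
use std::hint::black_box;

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    black_box(w);
}

/// Hypothetical variant of `peek_w` that also reports the address of `w`.
#[inline(never)]
pub fn peek_addr(w: &WithOffset<&[u8; 16]>) -> usize {
    // The reference coerces to a raw pointer, which is then turned into an integer.
    let p: *const WithOffset<&[u8; 16]> = black_box(w);
    p as usize
}

/// Same shape as `offsets` in the issue, but returns the addresses of both instances.
pub fn offsets_addrs(buf: [u8; 16]) -> (usize, usize) {
    let w = WithOffset { data: &buf, offset: 0 };
    let first = peek_addr(&w);
    use_w(w);
    let w2 = WithOffset { data: &buf, offset: 1 };
    let second = peek_addr(&w2);
    use_w(w2);
    (first, second)
}
```

Printing the pair returned by `offsets_addrs([0; 16])` in a release build shows two different addresses when each instance has its own stack slot, and the same address twice when the slot is reused.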
Title changed from "Missed optimization: creating multiple instances of a small struct results don't reuse the stack allocation" to "Missed optimization: multiple instances of a small struct don't reuse the stack allocation"
saethlin commented on May 27, 2025
Can you explain how? That issue is about a missing MIR optimization, and this issue is about an LLVM optimization.
ohadravid commented on May 27, 2025
I thought it might be related to that issue because there's a "missing" `StorageDead` call - but maybe I just got confused 🙂
hanna-kruppe commented on May 27, 2025
I suspect the desired optimization depends on unresolved details of Rust's operational semantics (e.g., rust-lang/unsafe-code-guidelines#188). Does moving out of `w` to call `use_w` have any special significance? In particular, does it guarantee the storage can be deallocated eagerly since nothing appears to re-initialize it? What if `WithOffset` were `Copy`? It's possible that stacked borrows or tree borrows can justify it by saying that the reference that escaped into `peek_w` can't be used anymore after the function returns (but I wouldn't bet on it either). However, it's hard to justify optimizations based on an aliasing model that isn't agreed upon yet.
hanna-kruppe commented on May 27, 2025
For example, Miri (without any flags, at least) doesn't take any issue with this program that stashes a pointer to `x1` in the first call to `peek` and then reads through that pointer in the second call to `peek`:
saethlin commented on May 27, 2025
The discussion in #138544 about storage markers is not related. It is very easy to think you've found a pattern in MIR that can be optimized on with trivial analysis but write a pass that is unsound. I've done it too.
To this specific issue, I think the storage markers are what we want in `-Zmir-opt-level=0`, and yet the assembly seems unchanged. @hanna-kruppe can you check me on that?
hanna-kruppe commented on May 27, 2025
In the MIR I'm looking at, the relevant locals in `offsets` are `_2` and `_9`, both of which are live until right before the `return` in bb4. (`_8` and `_15` are short-lived temporaries that are only live during the `use_w` calls.) This is what I would expect: it matches the surface Rust semantics before any optimizations, but it means they have to go in disjoint stack slots because they have overlapping lifetimes.
saethlin commented on May 27, 2025
Ah! I was looking at the wrong locals. You are right.
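
The overlap matters because a program can still observe the first local after the second one has been created. Below is a minimal sketch of that situation, in the spirit of the pointer-stashing program mentioned earlier in the thread but not a copy of it. `Small`, `STASH`, `peek`, `consume`, and `two_locals` are hypothetical names, and the struct is deliberately `Copy`, so the first local stays initialized after it is passed by value. That Miri accepts this variant is an assumption here, not something verified in the thread.

```rust
use std::cell::Cell;
use std::hint::black_box;

/// Hypothetical stand-in for the small struct in the issue.
#[derive(Clone, Copy)]
pub struct Small {
    pub offset: usize,
}

thread_local! {
    // Pointer stashed by the previous call to `peek`; null when nothing is stashed.
    static STASH: Cell<*const Small> = Cell::new(std::ptr::null());
}

/// Reads `offset` through the previously stashed pointer (if any),
/// then stashes a pointer to `s` for the next call.
#[inline(never)]
pub fn peek(s: &Small) -> Option<usize> {
    STASH.with(|stash| {
        let prev = stash.get();
        let seen = if prev.is_null() {
            None
        } else {
            // SAFETY (assumption of this sketch): the caller keeps the
            // previously peeked local in scope and unmodified.
            Some(unsafe { (*prev).offset })
        };
        stash.set(s as *const Small);
        seen
    })
}

#[inline(never)]
pub fn consume(s: Small) {
    black_box(s);
}

pub fn two_locals() -> Option<usize> {
    // Start from an empty stash so repeated calls never see a stale pointer.
    STASH.with(|stash| stash.set(std::ptr::null()));
    let x1 = Small { offset: 10 };
    peek(&x1);
    consume(x1); // `Small` is `Copy`, so `x1` remains initialized and in scope.
    let x2 = Small { offset: 20 };
    // This call reads `x1` through the stashed pointer while `x2` is live.
    let seen = peek(&x2);
    // Clear the stash before the locals go out of scope.
    STASH.with(|stash| stash.set(std::ptr::null()));
    seen
}
```

If `x1` and `x2` shared a stack slot, the second `peek` would read `x2`'s data instead of `x1`'s, so merging the slots would change the result of `two_locals`. Whether the same argument applies once `x1` has been moved out of (the non-`Copy` case) is the unresolved question raised above.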
ohadravid commented on May 27, 2025
@hanna-kruppe that's a very surprising example 😮
BTW - in the Rav1d case, it is a `Copy` struct.
I think there's something more going on. If I use blocks like this:
There are still two `alloca` calls, but Miri will complain about:
so maybe in this case the semantics are defined?
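
A minimal sketch of what such a block-scoped variant could look like, reusing the definitions from the issue description. The function name `blocks` and the exact shape are assumptions, not the code from the comment. Each instance is confined to its own `{ ... }` block, so its scope ends at the closing brace, and a pointer stashed during `peek_w` would be dangling if it were read in the second block, which is presumably the kind of access Miri rejects.

```rust
use std::hint::black_box;

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    black_box(w);
}

#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    black_box(w);
}

/// Hypothetical block-scoped variant of `offsets`.
pub fn blocks(buf: [u8; 16]) {
    {
        let w = WithOffset { data: &buf, offset: 0 };
        peek_w(&w);
        use_w(w);
    } // The scope of `w` ends here, before `w2` is created.
    {
        let w2 = WithOffset { data: &buf, offset: 1 };
        peek_w(&w2);
        use_w(w2);
    }
}
```

Unlike in `offsets`, the two locals never have overlapping scopes here, so reusing the slot would not be observable by a well-defined program.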