Missed optimization: multiple instances of a small struct don't reuse the stack allocation #141649

Description

@ohadravid
Contributor

When creating multiple instances of a small struct, each instance will be allocated separately on the stack even if they are known never to overlap.

Example: the following code generates two alloca instructions that LLVM does not optimize away:

(Godbolt)

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}

#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}

pub fn offsets(buf: [u8; 16]) {
    let w = WithOffset {
        data: &buf,
        offset: 0,
    };

    peek_w(&w);
    use_w(w);

    let w2 = WithOffset {
        data: &buf,
        offset: 1,
    };

    peek_w(&w2);
    use_w(w2);
}

LLVM IR:

; playground::offsets
; Function Attrs: noinline nounwind
define internal fastcc void @playground::offsets(ptr noalias nocapture noundef nonnull readonly align 1 dereferenceable(16) %buf) unnamed_addr #0 {
start:
  %w2 = alloca [16 x i8], align 8
  %w = alloca [16 x i8], align 8
  store ptr %buf, ptr %w, align 8
  %0 = getelementptr inbounds nuw i8, ptr %w, i64 8
  store i64 0, ptr %0, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 0) #88
  store ptr %buf, ptr %w2, align 8
  %1 = getelementptr inbounds nuw i8, ptr %w2, i64 8
  store i64 1, ptr %1, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w2) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 1) #88
  ret void
}

It seems like the calls to @llvm.lifetime.{start,end}.p0 are missing. If we instead use:

pub fn closures(buf: [u8; 16]) {
    (|| {
        let w = WithOffset {
            data: &buf,
            offset: 0,
        };

        peek_w(&w);
        use_w(w);
    })();

    (|| {
        let w2 = WithOffset {
            data: &buf,
            offset: 1,
        };

        peek_w(&w2);
        use_w(w2);
    })();
}

We do get them and the second alloca is optimized away (see the Godbolt link).
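
The closure variant suggests that giving each instance its own frame or scope is what allows the slot to be reused. A minimal sketch of the same idea with a named, out-of-line helper instead of closures follows. The names `with_offset_at` and `offsets_via_helper` are hypothetical, `use_w` is changed to return a byte so the result can be checked, and the generated code has not been compared against the closure version here.

```rust
use std::hint::black_box;

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    black_box(w);
}

// Assumption: unlike the original `use_w`, this one returns the byte at
// `offset` so that the result is observable.
#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) -> u8 {
    black_box(w.data[w.offset])
}

// Hypothetical helper (not from the issue). As long as it is not inlined,
// `w` lives in this function's own stack frame, which is released on return.
#[inline(never)]
fn with_offset_at(buf: &[u8; 16], offset: usize) -> u8 {
    let w = WithOffset { data: buf, offset };
    peek_w(&w);
    use_w(w)
}

pub fn offsets_via_helper(buf: [u8; 16]) -> (u8, u8) {
    (with_offset_at(&buf, 0), with_offset_at(&buf, 1))
}

fn main() {
    let mut buf = [0u8; 16];
    buf[0] = 10;
    buf[1] = 20;
    println!("{:?}", offsets_via_helper(buf));
}
```

The trade-off is an extra call per instance, so this only makes sense where the stack usage of the caller matters more than the call overhead.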

I encountered this when working on memorysafety/rav1d#1402, where this missed optimization results in over 100 bytes of extra stack allocation in a specific function, which slows down the entire binary by ~0.5%.

This might also be related to #138544

Activity

added
needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
on May 27, 2025
changed the title from "Missed optimization: creating multiple instances of a small struct results don't reuse the stack allocation" to "Missed optimization: multiple instances of a small struct don't reuse the stack allocation" on May 27, 2025
saethlin

saethlin commented on May 27, 2025

@saethlin
Member

> This might also be related to #138544

Can you explain how? That issue is about a missing MIR optimization, and this issue is about an LLVM optimization.

ohadravid

ohadravid commented on May 27, 2025

@ohadravid
ContributorAuthor

I thought it might be related to that issue because there's a "missing" StorageDead call - but maybe I just got confused 🙂

added
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
A-codegen: Area: Code generation
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
and removed
needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
on May 27, 2025
hanna-kruppe

hanna-kruppe commented on May 27, 2025

@hanna-kruppe
Contributor

I suspect the desired optimization depends on unresolved details of Rust's operational semantics (e.g., rust-lang/unsafe-code-guidelines#188). Does moving out of w to call use_w have any special significance? In particular, does it guarantee that the storage can be deallocated eagerly, since nothing appears to re-initialize it? What if WithOffset were Copy? It's possible that Stacked Borrows or Tree Borrows can justify it by saying that the reference that escaped into peek_w can't be used anymore after the function returns (but I wouldn't bet on it). However, it's hard to justify optimizations based on an aliasing model that isn't agreed upon yet.
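
To make the Copy question concrete: if WithOffset is Copy, passing w by value copies it and leaves w usable, so the by-value call alone cannot mark the end of w's storage. The sketch below assumes a hypothetical `#[derive(Clone, Copy)]` on the struct and has both functions return the offset so the behaviour can be checked; `copy_then_peek` is an illustrative name, not from the issue.

```rust
// Assumption: the struct derives `Copy`, unlike the version in the issue.
#[derive(Clone, Copy)]
pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

// Assumption: both functions return the offset so the result is observable.
fn use_w(w: WithOffset<&[u8; 16]>) -> usize {
    w.offset
}

fn peek_w(w: &WithOffset<&[u8; 16]>) -> usize {
    w.offset
}

pub fn copy_then_peek(buf: [u8; 16]) -> (usize, usize) {
    let w = WithOffset { data: &buf, offset: 3 };
    let by_value = use_w(w); // copies `w`; the local is not moved from
    let afterwards = peek_w(&w); // still a valid use of `w`
    (by_value, afterwards)
}

fn main() {
    println!("{:?}", copy_then_peek([0u8; 16]));
}
```

With a non-Copy struct the second use would be rejected by the borrow checker, which is why the move looks like a natural end of life even though the operational semantics may not treat it as one.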

hanna-kruppe

hanna-kruppe commented on May 27, 2025

@hanna-kruppe
Contributor

For example, miri (without any flags, at least) doesn't take any issue with this program that stashes a pointer to x1 in the first call to peek and then reads through that pointer in the second call to peek:

use std::ptr;
use std::cell::Cell;

thread_local! {
    static LAST_X: Cell<*const u32> = const { Cell::new(ptr::null()) };
}

fn peek(x: &u32) -> u32 {
    let last_x: *const u32 = LAST_X.get();
    let result: u32 = if last_x.is_null() {
        *x
    } else {
        unsafe { *last_x }
    };
    LAST_X.set(x);
    result
}


fn main() {
    let x1 = 5;
    dbg!(peek(&x1));
    let x2 = 8;
    dbg!(peek(&x2));
}
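
To spell out what is observable in this program, here is a sketch of the same logic that returns the two results instead of printing them. The `run` wrapper and the resets of `LAST_X` are additions for illustration. With x1 and x2 in separate stack slots, the second call reads x1 through the stashed pointer; if x2 were placed in x1's slot, that read would presumably see x2's value instead, which is the behaviour change the optimization would need to justify.

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    static LAST_X: Cell<*const u32> = const { Cell::new(ptr::null()) };
}

fn peek(x: &u32) -> u32 {
    let last_x: *const u32 = LAST_X.get();
    let result = if last_x.is_null() {
        *x
    } else {
        // Reads through the pointer stashed by the previous call.
        unsafe { *last_x }
    };
    LAST_X.set(x);
    result
}

// Hypothetical wrapper (not from the thread) that returns both results.
fn run() -> (u32, u32) {
    LAST_X.set(ptr::null()); // start from a clean state
    let x1 = 5;
    let first = peek(&x1);
    let x2 = 8;
    let second = peek(&x2); // reads `x1`, which is still in scope
    LAST_X.set(ptr::null()); // do not leave a dangling pointer behind
    (first, second)
}

fn main() {
    println!("{:?}", run());
}
```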
saethlin

saethlin commented on May 27, 2025

@saethlin
Member

The discussion in #138544 about storage markers is not related. It is very easy to think you've found a pattern in MIR that can be optimized on with trivial analysis but write a pass that is unsound. I've done it too.

As for this specific issue, I think the storage markers are what we want in -Zmir-opt-level=0, and yet the assembly seems unchanged. @hanna-kruppe can you check me on that?

hanna-kruppe

hanna-kruppe commented on May 27, 2025

@hanna-kruppe
Contributor

In the MIR I'm looking at, the relevant locals in offsets are _2 and _9, both of which are live until right before the return in bb4. (_8 and _15 are short-lived temporaries that are only live during the use_w calls.) This is what I would expect, since it matches the surface Rust semantics before any optimizations, but it means they have to go in disjoint stack slots because they have overlapping lifetimes.

saethlin

saethlin commented on May 27, 2025

@saethlin
Member

Ah! I was looking at the wrong locals. You are right.

ohadravid

ohadravid commented on May 27, 2025

@ohadravid
ContributorAuthor

@hanna-kruppe that's a very surprising example 😮
BTW - in the Rav1d case, it is a Copy struct.

I think there's something more going on. If I use blocks like this:

pub fn offsets(buf: [u8; 16]) {
    {
        let w = WithOffset {
            data: &buf,
            offset: 0,
        };

        peek_w(&w);
        use_w(w);
    }

    {
        let w2 = WithOffset {
            data: &buf,
            offset: 1,
        };

        peek_w(&w2);
        use_w(w2);
    }
}

There are still two alloca instructions, but Miri will complain about:

fn main() {
    {
        let x1 = 5;
        dbg!(peek(&x1));
    }
    {
        let x2 = 8;
        dbg!(peek(&x2));
    }
}

so maybe in this case the semantics are defined?
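
One way to see the scoping difference between the flat and the block versions without reading MIR is to trace when values go out of scope. This is only a proxy: drop timing and storage markers are different things, but both follow the lexical scope of the `let`. The `Trace` type and the function names below are hypothetical and exist only for this sketch.

```rust
use std::cell::RefCell;

type Log = RefCell<Vec<&'static str>>;

// Hypothetical helper: records its label when it goes out of scope.
struct Trace<'a> {
    label: &'static str,
    log: &'a Log,
}

impl Drop for Trace<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

// Both locals share the function body's scope, so `_a` is still in scope
// while `_b` is created and used.
fn flat(log: &Log) {
    let _a = Trace { label: "end a", log };
    log.borrow_mut().push("use a");
    let _b = Trace { label: "end b", log };
    log.borrow_mut().push("use b");
}

// Each local has its own block, so `_a` goes out of scope before `_b` exists.
fn blocks(log: &Log) {
    {
        let _a = Trace { label: "end a", log };
        log.borrow_mut().push("use a");
    }
    {
        let _b = Trace { label: "end b", log };
        log.borrow_mut().push("use b");
    }
}

fn run(f: fn(&Log)) -> Vec<&'static str> {
    let log = Log::new(Vec::new());
    f(&log);
    log.into_inner()
}

fn main() {
    println!("flat:   {:?}", run(flat));
    println!("blocks: {:?}", run(blocks));
}
```

In the flat version the two scopes overlap, while in the block version they are disjoint, which matches Miri accepting the stashed pointer in the first case and rejecting it in the second.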

Metadata

Assignees: No one assigned

Labels:
    A-MIR: Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
    A-codegen: Area: Code generation
    A-mir-opt: Area: MIR optimizations
    C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
    T-opsem: Relevant to the opsem team

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

Participants: @nikic, @hanna-kruppe, @ohadravid, @saethlin, @scottmcm

Missed optimization: multiple instances of a small struct don't reuse the stack allocation · Issue #141649 · rust-lang/rust