Skip to content

Conversation

@Wilco1
Copy link
Contributor

@Wilco1 Wilco1 commented Sep 15, 2025

Make it clearer that a GOT sequence can only be optimized if all GOT relocations to the symbol are part of a sequence.

Make it clearer that a GOT sequence can only be optimized if all GOT
relocations to the symbol are part of a sequence.
@smithp35
Copy link
Contributor

Can I check what you mean by All GOT relocations? To give an example

// Relocations in sequence, for the sake of the example; out of range
ADRP  x0, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
LDR   x0, [x0 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC
...
// Relocations in sequence, in range
ADRP  x1, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
LDR   x1, [x1 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC
...
// Instructions not in sequence, although relocations could be consecutive
ADRP  x2, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
NOP
LDR   x2, [x1 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC

I think you mean that in each of these three pairs of GOT relocations to symbol then all must be optimised or none. This would avoid the problem observed in llvm/llvm-project#138254 which had the following example:

foo:
        cmp     x0, 0
        bge     .L8
        adrp    x2, :got:b // Optimised to adrp x2, b
.L9:
        ldr     x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
        add     x0, x2, x1
        b       bar
        .p2align 2,,3
.L8:
        stp     x29, x30, [sp, -32]!
        adrp    x2, :got:b // Can't be optimised due to add that is in way.
        add     x0, x0, 1 // although in theory could be out of range of b.
        ldr     x3, [x2, :got_lo12:b]
        add     x0, x3, x0
        mov     x29, sp
        stp     x2, x1, [sp, 16]
        bl      bar
        ldp     x2, x1, [sp, 16]
        ldp     x29, x30, [sp], 32
        b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.

@Wilco1
Copy link
Contributor Author

Wilco1 commented Sep 16, 2025

No - as described, all must be part of a sequence in order to be optimizable since that is proof they are independent. However not all need to be optimized (eg. if out of range). So in your example only the 3rd ADRP is not a valid sequence and thus blocks the optimization for the other 2.

An alternative would be to optimize all or nothing without considering pairs. But then a single ADRP that ends up out of range would block all optimizations, making it unsuitable for the medium/large model. Also keeping the ADRP/LDR as pairs results in better code quality.

@smithp35
Copy link
Contributor

I think the problem observed llvm/llvm-project#138254 could still happen with valid sequences, with at least one out of range. I agree it would be vanishingly unlikely and would be more difficult to check for.

I'd like to have a think about the wording and will make some suggestions. Hopefully tomorrow.

@Wilco1
Copy link
Contributor Author

Wilco1 commented Sep 16, 2025

No it can't happen with only valid sequences since you cannot ever leak the value of the ADRP since the LDR in the next instruction overwrites it. Thus it is impossible to branch into the middle of a sequence.

@smithp35
Copy link
Contributor

OK, I see that in the problem example the destination registers in the ADRP and LDR are different so this would be an invalid sequence even if the relocations were consecutive.

This does mean that it is insufficient to just look at the relocations. For example if I hand edit the LLVM example then all the relocations to b are in sequence, but we'd need to look at the instructions to detect the different destination registers.

I guess it means we'll need to be clear about what we mean by valid sequence.

foo:
        cmp     x0, 0
        bge     .L8
        adrp    x2, :got:b // Optimised to adrp x2, b
.L9:
        ldr     x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
        add     x0, x2, x1
        b       bar
        .p2align 2,,3
.L8:
        stp     x29, x30, [sp, -32]!
        adrp    x2, :got:b // Invalid sequence as destination registers don't match
        ldr     x3, [x2, :got_lo12:b] // but relocations are consecutive.
        add     x0, x0, 1 // reordered by hand 
        add     x0, x3, x0
        mov     x29, sp
        stp     x2, x1, [sp, 16]
        bl      bar
        ldp     x2, x1, [sp, 16]
        ldp     x29, x30, [sp], 32
        b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.

@Wilco1
Copy link
Contributor Author

Wilco1 commented Sep 16, 2025

The basic conditions explained above obviously still apply:

  • The relocations apply to consecutive instructions in the order specified.
  • The relocations use the same symbol.
  • The relocated instructions have the same source and destination register.

I thought the specification was clear enough... Perhaps we need to explain the optimization algorithm in pseudo code step-by-step?

Copy link
Contributor

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through again. I think we've got a set of "base" or "common" conditions on line 1425 for all sequences. Then there's a "specific" conditions for each individual sequence. For the Large GOT indirection, starting at line 1431.

From your comment above about one of the sequences being out of range being OK, then I think we'll need to make sure that "valid" sequence doesn't include symbol is within range of the R_<CLS>_ADR_PREL_PG_HI21 relocation.

I think some examples will help.

- The relocations do not appear separately or in a different order.

In this case each set of relocations is independent and may be optimized. The following sequences are defined:
The following sequences are defined:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 1425 above (can't put the comment on the line). I suggest we insert a "base" before conditions
"if all the following base conditions are true."

- ``symbol`` does not have a ``st_shndx`` of ``SHN_ABS`` or the output is not required to be position independent.
- ``symbol`` is within range of the ``R_<CLS>_ADR_PREL_PG_HI21`` relocation.
- The addend of both relocations is zero.
- All ``R_<CLS>_ADR_GOT_PAGE`` and ``R_<CLS>_LD64_GOT_LO12_NC`` relocations to ``symbol`` are part of a sequence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be combined with the line below and removed from the specific conditions. For example

If the R_<CLS>_ADR_GOT_PAGE and R_<CLS>_LD64_GOT_LO12_NC relocations to symbol are used outside a sequence that satisfies the base conditions then the Large GOT indirection optimization is not legal for symbol.

Examples that prevent the Large GOT indirection optimization for symbol:

::

  // Different destination register
  ADRP  x0, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
  LDR   x1, [x0 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC

  // Instructions not in sequence.
  ADRP  x0, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
  NOP
  LDR   x0, [x0 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC

A linker may avoid creating a GOT entry if no other GOT relocations exist for the symbol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants