Introduce SHF_AARCH64_LARGE section flag #326
This is similar to SHF_X86_64_LARGE and allows custom section names to be marked as LARGE and hence moved away to outer edges of the binary to reduce relocation pressure.
Can you give some more details about how this will be used? Thanks to SHF_EXCLUDE being erroneously defined in SHF_MASKPROC, we've only got 2 processor-specific flags available in the SHF_MASKPROC space, and this will take one of them. We've got to be careful that this is the best use of the flag. If this is going to affect multiple processor architectures and not just AArch64 and x86_64, is a more processor-neutral solution more appropriate, as each processor only has 3 flags available in practice? Ideally SHF_EXCLUDE should be deprecated and recoded, but I can't see that happening with the generic ELF spec in limbo. For known sections to be moved away at link time, a naming convention could be used. You mention custom sections; is this just so that sections with custom names that do not follow the naming convention can be identified as large?
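To make the bit-budget concern concrete, here is a minimal sketch that counts the unallocated bits in the processor-specific range. The constants are the standard ELF values for `SHF_MASKPROC`, `SHF_EXCLUDE`, and `SHF_X86_64_LARGE`; the helper `free_proc_bits` is a hypothetical name used only for illustration.

```python
# Standard ELF section-flag constants.
SHF_MASKPROC = 0xF0000000      # 4 bits reserved for processor-specific flags
SHF_EXCLUDE = 0x80000000       # generic flag that happens to sit inside SHF_MASKPROC
SHF_X86_64_LARGE = 0x10000000  # x86-64 processor-specific flag


def free_proc_bits(taken):
    """Return the SHF_MASKPROC bits that are not listed in `taken` (hypothetical helper)."""
    candidates = [1 << i for i in range(32) if SHF_MASKPROC & (1 << i)]
    return [bit for bit in candidates if bit not in taken]


# Any architecture starts with 3 usable bits once SHF_EXCLUDE is accounted for.
print([hex(b) for b in free_proc_bits([SHF_EXCLUDE])])

# x86-64 has already spent one of them on SHF_X86_64_LARGE.
print([hex(b) for b in free_proc_bits([SHF_EXCLUDE, SHF_X86_64_LARGE])])
```

The same arithmetic explains why a new AArch64 flag would consume one of only two remaining bits, as the comment above states.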
(Some notes on https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models) During my time at Google, we encountered relocation overflow issues with large x86-64 executables built with specific instrumentations like -fprofile-generate and various -fsanitize=, at optimization levels -O1 and even -O3. (Unoptimized -O0 builds with these instrumentations would have exacerbated the problem.) Within the large Bazel monorepo, we aimed to implement toolchain settings to mitigate this relocation overflow pressure. I believe that ld.lld --default-script could offer an elegant solution by allowing us to mark specific data sections from instrumentation passes. Unfortunately, I didn't have the opportunity to deploy this during my tenure at Google. Instead, our approach was to utilize This section flag oriented choice was primarily because this flag has been available for over a decade and some folks disliked a default linker script. If I were designing SHF_X86_64_LARGE, I would have been cautious about allocating a bit from the SHF_MASKPROC range (only 4, or 3 if we exclude SHF_EXCLUDE). Before allocating another bit from SHF_MASKPROC for a different architecture, I would prioritize adding --default-script to the toolchain settings. |
Yes, for example some Nvidia sections such as As Fangrui mentioned, while a default linker script can do the job for these custom sections, marking a section as large and letting the linker push it to the outer edges of the binary to reduce relocation pressure, as it does on x86, is a more elegant and scalable solution, IMHO. nv_fatbin is just one example; there are other custom section names that we would like to handle in this way. I wasn't aware that processor-specific section flag bits are already scarce, so I understand the concerns. In that case I like the idea of making such a flag processor-neutral, although not all targets may need it.
Would it be feasible to use say |
I am not sure whether the runtime would be okay with these sections being renamed.
The two cases I'm most familiar with where the precise section name is used are:
As I understand it, marking a section I can see that adding a flag makes it easier for linkers to do the right thing (without needing a linker script) for sections that cannot conform to the required naming conventions. However I'm not yet convinced it is worth burning 50% of our remaining processor section flag space for it. Given that this should affect any architecture that has needed to define code-models (x86_64, AArch64, RISCV64 etc.) then I think it would be best to try and find an architecture neutral identifier. If we can't use a section flag in the generic or OS space, there may be other less elegant, but acceptable alternatives to marking a section as large. [1] |
Since the section would be extern, one can always refer to it using PIC/PIE addressing without needing a large code model (which currently does not support PIC/PIE). In principle the linker could automatically detect huge sections and sort them differently. This would avoid the majority of scenarios where people run into relocation range issues due to a huge section or array. |
I think that could work for AArch64, although it would likely end up scanning relocations and marking data sections that are "not large" (those with a non-GOT-generating ADRP/ADR/LDR relocation to a symbol defined by the section). What remains are the sections that can be moved to the end of the program. I'm not sure how easily that would generalise to other architectures, though.
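The relocation scan described above could be sketched roughly as follows. This is not how any existing linker implements it: the relocation type names come from the AArch64 ELF ABI, but the input model (a list of `(relocation type, target section)` pairs) and the function name `movable_sections` are assumptions made for illustration.

```python
# Direct PC-relative relocations: a section targeted by one of these must stay
# within range of the code, so it cannot be treated as "large".
SHORT_RANGE_RELOCS = {
    "R_AARCH64_ADR_PREL_PG_HI21",  # ADRP
    "R_AARCH64_ADR_PREL_LO21",     # ADR
    "R_AARCH64_LD_PREL_LO19",      # LDR (literal)
}


def movable_sections(data_sections, relocations):
    """Return the data sections that are never the target of a short-range relocation.

    data_sections: list of section names, in their current order.
    relocations:   iterable of (reloc_type, target_section_name) pairs (hypothetical model).
    """
    pinned = {target for rtype, target in relocations if rtype in SHORT_RANGE_RELOCS}
    return [name for name in data_sections if name not in pinned]


relocs = [
    ("R_AARCH64_ADR_PREL_PG_HI21", ".data"),   # direct ADRP: .data stays put
    ("R_AARCH64_ADR_GOT_PAGE", ".huge_blob"),  # GOT-based access: free to move
]
print(movable_sections([".data", ".huge_blob"], relocs))
```

Sections reached only through the GOT remain candidates for placement at the end of the image.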
I was thinking "any section > 1GB goes at the end" by default, but checking relocations would be even better. Then you could stop worrying about code models and just emit GOT relocations and let the linker deal with it (including optimizing them back to ADRP if in range). |
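A rough sketch of the two ideas in this comment, under stated assumptions: the 1 GiB threshold is taken from the comment, the ±4 GiB reach follows from ADRP's signed 21-bit page immediate, and the names `order_sections` and `can_relax_to_adrp` are hypothetical. A real linker would apply further conditions before relaxing a GOT access.

```python
ONE_GIB = 1 << 30


def order_sections(sections, threshold=ONE_GIB):
    """Stable partition: sections larger than `threshold` are placed last.

    sections: list of (name, size_in_bytes) tuples (hypothetical model).
    """
    regular = [s for s in sections if s[1] <= threshold]
    huge = [s for s in sections if s[1] > threshold]
    return regular + huge


def can_relax_to_adrp(pc, target):
    """True if `target` is reachable from `pc` with a single ADRP.

    ADRP encodes a signed 21-bit count of 4 KiB pages relative to the page of the PC.
    """
    page_delta = (target >> 12) - (pc >> 12)
    return -(1 << 20) <= page_delta < (1 << 20)


layout = order_sections([(".huge_blob", 2 * ONE_GIB), (".data", 4096), (".bss", 65536)])
print([name for name, _ in layout])
print(can_relax_to_adrp(0x400000, 0x400000 + (3 << 30)))  # 3 GiB away
print(can_relax_to_adrp(0x400000, 0x400000 + (5 << 30)))  # 5 GiB away
```

With this ordering, accesses to the regular sections stay short, and only accesses to the sections moved to the end need to keep their GOT indirection when out of range.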
(I am still travelling with limited computer access) I understand and agree that the pushback to a section flag is reasonable. Building on my previous comment,
I strongly recommend that Google tests this before adding an ELF marker mechanism to LLVM/assembler/linker. |
Thanks for the discussion. I agree and am convinced that it's not ideal to burn processor-specific flags for this. That leaves us with a few options:
I am somewhat inclined towards (1). We have looked into (3) and I think it has its limitations (without improving support for linker scripts) -- for example, |
Google actually has this default linker script now precisely for the purpose of aligning program segments to huge page boundaries, but I would say that the usability of it so far has been pretty poor. I should probably read more about linker script features and syntax, but I think the whole "insert after/before" model introduces a lot of program-global dependencies on the existence or non-existence of sections like The .init_array priority numbering scheme comes to mind for me as perhaps possible prior art for controlling section ordering, but it's not great. Mainly, though, I take the point that processor-specific flags are scarce, and we can drop the case for arm flags. I do think it is worth going down the path of an ISA / OS neutral section flag, since after quickly glancing at LLVM ELF.h, they appear less scarce, and the flag encodes a promise that generated code will use general access patterns, i.e. a GOT load in the general case, or whatever pattern is used for a cross-DSO reference.