Commit 8c21467

[sysvabi64] Move TLSDESC resolvers to separate design doc
The TLSDESC resolver functions are not ABI so we can move them out of the sysvabi64 document. Providing some examples that can be used by a dynamic linker is still useful so move this to the design documents section. Add a comment about DTV surplus TLS that permits a dynamic loader to dlopen a DSO with initial-exec TLS. There can be a small number of performance critical shared-libraries that use initial exec TLS, but are expected to be opened via dlopen, particularly by scripting languages like python.
1 parent 670c11d commit 8c21467

2 files changed

+150
-114
lines changed

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
..
2+
Copyright (c) 2023-2025, Arm Limited and its affiliates. All rights reserved.
3+
CC-BY-SA-4.0 AND Apache-Patent-License
4+
See LICENSE file for details
5+
6+
.. _SYSVABI64: https://github.com/ARM-software/abi-aa/tree/main/sysvabi64/sysvabi64.rst
7+
.. _TLSDESC: http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt
8+
9+
Thread Local Storage TLSDESC resolver functions
10+
***********************************************
11+
12+
Preamble
13+
========
14+
15+
Background
16+
----------
17+
18+
The ``R_AARCH64_TLSDESC`` dynamic relocation is platform specific. The
19+
dynamic loader is expected to choose an appropriate resolver function
20+
for the context. This document provides some example resolver
21+
functions.
22+
23+
These examples are for illustrative purposes only. There is no
24+
requirement for any of the following resolver functions to be
25+
implemented.
26+
27+
The ABI requirements for the calling convention of resolver functions are
28+
described in `SYSVABI64`_.
29+
30+
Example Resolver Functions
31+
^^^^^^^^^^^^^^^^^^^^^^^^^^
32+
33+
Due to the restrictions on calling convention, the
34+
resolver routines must be written in assembly language.
35+
36+
Static TLS Specialization:
37+
38+
When the TLS variable is in the static TLS block, the offset from the
39+
thread pointer is fixed at runtime. The dynamic loader can calculate
40+
the offset and place it in the TLS descriptor. All the static TLS
41+
resolver function needs to do is extract the offset and return it.
42+
43+
.. code-block:: asm
44+
45+
_dl_tlsdesc_return:
46+
// x0 contains pointer to struct tlsdesc.
47+
// tlsdesc.argument.value contains offset of variable from TP
48+
ldr x0, [x0, #8]
49+
ret
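
As a rough illustration of the division of work described above, the following C sketch models a loader that precomputes the offset and a resolver that only returns it. It is a model, not a resolver implementation: real resolvers must be written in assembly because of the calling convention, and every name here (``tlsdesc_static_model``, ``model_relocate_static``, ``model_tls_address``) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a TLS descriptor on LP64:
   resolver entry at offset 0, argument at offset 8. */
struct tlsdesc_static_model {
    ptrdiff_t (*entry)(struct tlsdesc_static_model *);
    uintptr_t arg;
};

/* C stand-in for _dl_tlsdesc_return: return the precomputed offset. */
static ptrdiff_t model_tlsdesc_return(struct tlsdesc_static_model *d)
{
    return (ptrdiff_t)d->arg;
}

/* What a loader might do for a variable in the static TLS block:
   store the fixed offset from the thread pointer in the descriptor. */
static void model_relocate_static(struct tlsdesc_static_model *d,
                                  ptrdiff_t offset_from_tp)
{
    d->entry = model_tlsdesc_return;
    d->arg = (uintptr_t)offset_from_tp;
}

/* Caller side: variable address = thread pointer + resolver result.
   The thread pointer is passed in because this is only a model. */
static uintptr_t model_tls_address(uintptr_t tp,
                                   struct tlsdesc_static_model *d)
{
    return tp + (uintptr_t)d->entry(d);
}
```

All of the work happens once in ``model_relocate_static``; each later access costs one load in the resolver and one addition in the caller.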
50+
51+
Dynamic TLS Specialization:
52+
53+
When the TLS variable is defined in dynamic TLS the address of the TLS
54+
variable must be calculated by the resolver function using
55+
``__tls_get_addr``. The resolver function returns the offset from the
56+
thread pointer by subtracting the address of the thread pointer from
57+
the address of the TLS variable. In practice an implementation of the
58+
dynamic TLS resolver contains many platform specific details outside
59+
of the scope of the ABI. An example of how a dynamic resolver might be
60+
implemented can be found in the Dynamic Specialization section of
61+
TLSDESC_.
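
The subtraction described above can be sketched in C as follows. This is a simplified model under stated assumptions: ``thread_model`` and ``model_tls_get_addr`` are hypothetical stand-ins, and the real ``__tls_get_addr`` interface and DTV layout are platform-specific and differ from this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state: a thread pointer and a small table
   mapping module IDs to the base address of that module's TLS block. */
struct thread_model {
    uintptr_t tp;
    uintptr_t dtv[4];
};

/* Stand-in for __tls_get_addr: module TLS block base plus the
   variable's offset within that block. */
static uintptr_t model_tls_get_addr(const struct thread_model *t,
                                    size_t module, size_t offset)
{
    return t->dtv[module] + offset;
}

/* Value a dynamic resolver would return: the address of the variable
   minus the thread pointer, in modular (unsigned) arithmetic. */
static uintptr_t model_dynamic_offset(const struct thread_model *t,
                                      size_t module, size_t offset)
{
    return model_tls_get_addr(t, module, offset) - t->tp;
}
```

Because the caller always adds the thread pointer to the result, returning the difference lets the same call sequence work for both static and dynamic TLS.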
62+
63+
Undefined Weak Symbols
64+
65+
An undefined weak symbol has the value 0. Because the resolver
66+
function returns an offset from the thread pointer, it returns the
67+
negated thread pointer value. Adding that offset to the thread
68+
pointer gives 0, which is the address the caller expects for an
69+
undefined weak symbol.
70+
71+
.. code-block:: asm
72+
73+
__dl_tlsdesc_undefweak:
74+
mrs x0, tpidr_el0
75+
neg x0, x0
76+
ret
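
The cancellation can be checked with a small C model. The function names are hypothetical, and the thread pointer is passed as a parameter because portable C cannot read ``tpidr_el0``.

```c
#include <assert.h>
#include <stdint.h>

/* C stand-in for the undefined weak resolver: negate the thread
   pointer using unsigned (modular) arithmetic, like `neg x0, x0`. */
static uintptr_t model_undefweak_offset(uintptr_t tp)
{
    return (uintptr_t)0 - tp;
}

/* Caller side: add the returned offset to the thread pointer. */
static uintptr_t model_weak_address(uintptr_t tp)
{
    return tp + model_undefweak_offset(tp);
}
```

For any thread pointer value the sum wraps to 0, so the usual null check on the address of a weak symbol works unchanged.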
77+
78+
Lazy resolution of R_AARCH64_TLSDESC
79+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
80+
81+
The TLSDESC_ paper describes an optional mechanism to resolve TLSDESC
82+
calls lazily. Lazy resolution for TLSDESC resolver functions is not
83+
recommended on AArch64. Additional synchronization is required for
84+
each TLSDESC call, which has a significant effect on performance. The
85+
text below describes the additional synchronization that is
86+
needed.
87+
88+
Instead of fully resolving the ``R_AARCH64_TLSDESC`` relocation at
89+
module load time, a lazy resolver function runs on the first TLSDESC
90+
call. The lazy resolver updates the TLS Descriptor with the actual
91+
resolver function and the parameter to the actual resolver
92+
function. In a multi-threaded program, when lazy TLS is in use, the
93+
resolver functions must ensure that the write to the parameter in the
94+
TLS descriptor has completed before reading it.
95+
96+
.. code-block:: asm
97+
98+
// Code to obtain the offset of var from thread pointer.
99+
// Loads the address of the resolver function into x1.
100+
// Places the address of the TLS Descriptor into x0.
101+
adrp x0, :tlsdesc:var
102+
ldr x1, [x0, #:tlsdesc_lo12:var]
103+
add x0, x0, #:tlsdesc_lo12:var
104+
.tlsdesccall var
105+
blr x1 // _dl_tlsdesc_return
106+
107+
// Resolver function
108+
_dl_tlsdesc_return:
109+
// load the parameter from the TLS descriptor. Without
110+
// synchronization this load can read an old value prior
111+
// to the lazy resolver's update to the descriptor completing.
112+
ldr x0, [x0, #8]
113+
ret
114+
115+
The recommended way to ensure synchronization between the lazy
116+
resolver update of the TLS Descriptor and the actual resolver function
117+
accessing the TLS Descriptor is:
118+
119+
* The TLS lazy resolver function uses a store release when updating
120+
the address of the resolver function in the TLS Descriptor.
121+
122+
* The actual resolver function uses a load acquire on the address of the
123+
resolver function, with a destination register of xzr.
124+
125+
Referring to the example above, the code for the resolver function
126+
becomes:
127+
128+
.. code-block:: asm
129+
130+
// Resolver function
131+
_dl_tlsdesc_return:
132+
// Guaranteed to complete after the lazy resolver's store release
133+
// of the address in [x0].
134+
ldar xzr, [x0]
135+
// Access the parameter.
136+
ldr x0, [x0, #8]
137+
ret
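
The same release/acquire pairing can be expressed with C11 atomics, which may help when reasoning about the ordering. This is a sketch only: the type and function names are hypothetical, and unlike the assembly above it inspects the loaded entry value, so the model stays within the C11 memory model's rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor model: entry at offset 0, argument at 8. */
struct tlsdesc_lazy_model {
    _Atomic(uintptr_t) entry; /* address of the current resolver */
    uintptr_t arg;            /* parameter for the actual resolver */
};

/* Lazy resolver side: write the argument first, then publish the new
   entry with a store release so the argument write is ordered before it. */
static void model_publish(struct tlsdesc_lazy_model *d,
                          uintptr_t final_entry, uintptr_t arg)
{
    d->arg = arg;
    atomic_store_explicit(&d->entry, final_entry, memory_order_release);
}

/* Actual resolver side: load acquire on the entry, then read the
   argument. Returns 0 if the descriptor still holds the lazy entry. */
static int model_read_arg(struct tlsdesc_lazy_model *d,
                          uintptr_t lazy_entry, uintptr_t *out)
{
    uintptr_t e = atomic_load_explicit(&d->entry, memory_order_acquire);
    if (e == lazy_entry)
        return 0;
    *out = d->arg;
    return 1;
}
```

A thread that observes the final entry through the acquire load is guaranteed to also observe the argument written before the release store.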

sysvabi64/sysvabi64.rst

Lines changed: 13 additions & 114 deletions
@@ -25,6 +25,7 @@
2525
.. _SYSVABI: https://github.com/ARM-software/abi-aa/releases
2626
.. _ELFTLS: https://www.uclibc.org/docs/tls.pdf
2727
.. _TLSDESC: http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt
28+
.. _TLSDESCRES: https://github.com/ARM-software/abi-aa/tree/main/design-documents/tlsdesc-resolvers.txt
2829

2930
.. role:: c(code)
3031
:language: c
@@ -265,6 +266,8 @@ This document refers to, or is referred to by, the following documents.
265266
+-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
266267
| SYM-VER_ | http://people.redhat.com/drepper/symbol-versioning | GNU Symbol Versioning |
267268
+-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
269+
| TLSDESCRES_ | design-documents/tlsdesc-resolvers | TLSDESC resolver function examples |
270+
+-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
268271

269272
Terms and Abbreviations
270273
-----------------------
@@ -2268,8 +2271,12 @@ thread pointer and places it in a GOT entry. The GOT entry is
22682271
relocated by dynamic relocation ``R_AARCH64_TLS_TPREL64``.
22692272

22702273
A shared-library that contains Initial Exec TLS must have the
2271-
``DF_STATIC_TLS`` dynamic tag set. An attempt to load a shared library
2272-
with ``DF_STATIC_TLS`` via ``dlopen`` will be rejected.
2274+
``DF_STATIC_TLS`` dynamic tag set. In the general case an attempt to
2275+
load a shared library with ``DF_STATIC_TLS`` via ``dlopen`` will be
2276+
rejected. Some dynamic loaders implement a surplus of DTV slots that
2277+
permit a fixed number of ``DF_STATIC_TLS`` modules to be dynamically
2278+
loaded. Whether a DTV surplus is available and how many slots are
2279+
available is implementation defined.
22732280

22742281
Small Code model;
22752282

@@ -2430,7 +2437,7 @@ GOT entry, and the argument for the chosen resolver function in the
24302437
second GOT entry.
24312438

24322439
The AArch64 C and assembler examples are adapted from the AArch32
2433-
TLSDESC_ paper. The C code below represents the TLS Descriptor.
2440+
`TLSDESC`_ paper. The C code below represents the TLS Descriptor.
24342441

24352442
.. code-block:: c
24362443
@@ -2452,6 +2459,9 @@ The TLS resolver functions are not standardized by this ABI as they
24522459
are internal to the dynamic linker. Programs must not directly refer
24532460
to TLS resolver functions.
24542461

2462+
The `TLSDESCRES`_ document contains information on how a platform
2463+
might implement the resolver functions.
2464+
24552465
Calling Convention
24562466
^^^^^^^^^^^^^^^^^^
24572467

@@ -2467,117 +2477,6 @@ TLS resolver functions are not required to save any register added by
24672477
an extension, such as the scalable vector registers or the SVE
24682478
predicate registers. See `GCCML`_ for details.
24692479

2470-
Example Resolver Functions
2471-
^^^^^^^^^^^^^^^^^^^^^^^^^^
2472-
2473-
These examples are for illustrative purposes only. There is no
2474-
requirement for any of the following resolver functions to be
2475-
implemented. Due to the restrictions on calling convention, the
2476-
resolver routines must be written in assembly language.
2477-
2478-
Static TLS Specialization:
2479-
2480-
When the TLS variable is in the static TLS block, the offset from the
2481-
thread pointer is fixed at runtime. The dynamic loader can calculate
2482-
the offset and place it in the TLS descriptor. All the static TLS
2483-
resolver function needs to do is extract the offset and return it.
2484-
2485-
.. code-block:: asm
2486-
2487-
_dl_tlsdesc_return:
2488-
// x0 contains pointer to struct tlsdesc.
2489-
// tlsdesc.argument.value contains offset of variable from TP
2490-
ldr x0, [x0, #8]
2491-
ret
2492-
2493-
Dynamic TLS Specialization:
2494-
2495-
When the TLS variable is defined in dynamic TLS the address of the TLS
2496-
variable must be calculated by the resolver function using
2497-
``__tls_get_addr``. The resolver function returns the offset from the
2498-
thread pointer by subtracting the address of the thread pointer from
2499-
the address of the TLS variable. In practice an implementation of the
2500-
dynamic TLS resolver contains many platform specific details outside
2501-
of the scope of the ABI. An example of how a dynamic resolver might be
2502-
implemented can be found in the Dynamic Specialization section of
2503-
TLSDESC_.
2504-
2505-
Undefined Weak Symbols
2506-
2507-
An undefined weak symbol has the value 0. As the resolver function
2508-
returns an offset from the Thread Pointer, to get a value of 0 when
2509-
added to the Thread Pointer the resolver function returns a negative
2510-
thread pointer value that cancels to 0 when added to the thread
2511-
pointer.
2512-
2513-
.. code-block:: asm
2514-
2515-
__dl_tlsdesc_undefweak:
2516-
mrs x0, tpidr_el0
2517-
neg x0, x0
2518-
ret
2519-
2520-
Lazy resolution of R_AARCH64_TLSDESC
2521-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2522-
2523-
The TLSDESC_ paper describes an optional mechanism to resolve TLSDESC
2524-
calls lazily. Lazy resolution for TLSDESC resolver functions is not
2525-
recommended on AArch64. Additional synchronization is required for
2526-
each TLSDESC call, which has a significant affect on performance. The
2527-
description below describes the additional synchronization that is
2528-
needed.
2529-
2530-
Instead of fully resolving the ``R_AARCH64_TLSDESC`` relocation at
2531-
module load time, a lazy resolver function runs on the first TLSDESC
2532-
call. The lazy resolver updates the TLS Descriptor with the actual
2533-
resolver function and the parameter to the actual resolver
2534-
function. In a multi-threaded program when lazy TLS in use, the
2535-
resolver functions must ensure that the write to the parameter in the
2536-
TLS descriptor has completed before reading it.
2537-
2538-
.. code-block:: asm
2539-
2540-
// Code to obtain the offset of var from thread pointer.
2541-
// Loads the address of the resolver function into x1.
2542-
// Places the address of the TLS Descriptor into x0.
2543-
adrp x0, :tlsdesc:var
2544-
ldr x1, [x0, #:tlsdesc_lo12:var]
2545-
add x0, x0, #:tlsdesc_lo12:var]
2546-
.tlsdesccall var
2547-
blr x1 // _dl_desc_return
2548-
2549-
// Resolver function
2550-
_dl_tlsdesc_return:
2551-
// load the parameter from the TLS descriptor. Without
2552-
// synchronization this load can read an old value prior
2553-
// to the lazy resolvers update to the descriptor completing.
2554-
ldr x0, [x0, #8]
2555-
ret
2556-
2557-
The recommended way to ensure synchronization between the lazy
2558-
resolver update of the TLS Descriptor and the actual resolver function
2559-
accessing the TLS Descriptor is:
2560-
2561-
* The TLS lazy resolver function uses a store release when updating
2562-
the address of the resolver function in the TLS Descriptor.
2563-
2564-
* The actual entry function uses a load acquire on the address of the
2565-
resolver function, with a destination register of xzr.
2566-
2567-
Referring to the example above, the code for the resolver function
2568-
becomes:
2569-
2570-
.. code-block:: asm
2571-
2572-
// Resolver function
2573-
_dl_tlsdesc_return:
2574-
// Guaranteed to complete after the lazy resolvers store release
2575-
// of the address in [x0].
2576-
ldar xzr, [x0]
2577-
// Access the parameter.
2578-
ldr x0, [x0, #8]
2579-
ret
2580-
25812480
Libraries
25822481
=========
25832482
