Commit 8c5dc1b

Author: Dan Cross
Stabilize volatile copy and set functions
Stabilize the `volatile_copy_memory`, `volatile_copy_nonoverlapping_memory` and `volatile_set_memory` intrinsics as `ptr::copy_volatile`, `ptr::copy_nonoverlapping_volatile` and `ptr::write_bytes_volatile`, respectively.

Signed-off-by: Dan Cross <[email protected]>
1 parent a9cbbb2 commit 8c5dc1b

1 file changed, +224 -0 lines changed

text/0000-volatile-copy-and-set.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
- Feature Name: volatile-copy-and-set
- Start Date: 2019-04-17
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#00000](https://github.com/rust-lang/rust/issues/00000)

# Summary
[summary]: #summary

Stabilize the `volatile_copy_memory`, `volatile_copy_nonoverlapping_memory`
and `volatile_set_memory` intrinsics as `ptr::copy_volatile`,
`ptr::copy_nonoverlapping_volatile` and `ptr::write_bytes_volatile`,
respectively.

# Motivation
[motivation]: #motivation

`ptr::read_volatile` and `ptr::write_volatile` were stabilized in RFC
[1467](https://github.com/rust-lang/rfcs/pull/1467). The stated motivation
at the time was that this allowed "volatile access to memory-mapped I/O
in stable code", something that was only previously possible using unstable
intrinsics or "by abusing a bug in the `load` and `store` functions on
atomic types which gives them volatile semantics
([rust-lang/rust#30962](https://github.com/rust-lang/rust/pull/30962))."

At the time, the decision was made not to also provide stable
interfaces for the `volatile_copy_memory` or `volatile_set_memory`
intrinsics, as they were "not used often" nor provided in C.
However, when writing low-level code, it is sometimes also useful
to be able to execute volatile copy and set operations.

For example, when booting x86_64 "application processor" (AP) logical
processors, code copies a sequence of instructions for the AP to
execute into a page in low physical memory, and then sends a startup
inter-processor interrupt (SIPI) to the AP's local interrupt
controller: the target interrupt vector number given in the SIPI is
multiplied by the page size to determine the physical memory address
where the AP should start executing. So a SIPI sent to vector 7 of
an AP causes that processor to begin executing instructions at
physical memory address 0x7000.

That is:

```
extern "C" {
    fn copy_proto_page_to_phys_mem(src: usize, phys: u64);
    fn send_init_ipi(cpu: u32);
    fn send_sipi(cpu: u32, vector: u8);

    static INIT_CODE: *const u8;
    static INIT_CODE_LEN: usize;
}

// A contrived type for illustration; not actually useful.
pub struct SIPIPage {
    // Note that `bytes` is not visible outside of `SIPIPage`.
    bytes: [u8; 4096],
}

impl SIPIPage {
    // Note that the _only_ operation on the `bytes` field
    // of `SIPIPage` is in `new`. The compiler could, in
    // theory, elide the `copy`.
    pub fn new() -> SIPIPage {
        let mut bytes = [0; 4096];
        unsafe {
            core::ptr::copy(INIT_CODE, bytes.as_mut_ptr(), INIT_CODE_LEN);
        }
        SIPIPage { bytes }
    }
}

fn main() {
    let proto_sipi_page = SIPIPage::new();
    let some_core = 2;
    unsafe {
        copy_proto_page_to_phys_mem(&proto_sipi_page as *const _ as usize, 0x7000);
        send_init_ipi(some_core);
        send_sipi(some_core, 7);
    }
}
```

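The address arithmetic described before the listing (SIPI vector number multiplied by the page size) can be sketched in isolation. This is only an illustration: the names `PAGE_SIZE` and `sipi_start_address` are hypothetical and do not appear in any real kernel or in the proposal, and the 4096-byte page size is taken from the `bytes` array in the listing.

```rust
/// Page size assumed for this sketch, matching the 4096-byte
/// `bytes` array used by `SIPIPage`.
const PAGE_SIZE: u64 = 4096;

/// Physical address at which an AP starts executing after receiving
/// a SIPI carrying `vector`: the vector number times the page size.
pub fn sipi_start_address(vector: u8) -> u64 {
    u64::from(vector) * PAGE_SIZE
}

/// Example use: vector 7 corresponds to the address used in the text.
pub fn example_start_address() -> u64 {
    sipi_start_address(7)
}
```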
Obviously, the `SIPIPage` code above is an unlikely way of initializing
the SIPI page, and a real kernel would not do it this way.

However, this code snippet is specifically constructed such that the
sequence of sending IPIs makes no reference to `proto_sipi_page` and,
since the `bytes` field is not accessed outside of `new`, this
illustrates a situation in which the compiler _could_ theoretically
elect to elide the copy.

If this sequence used `core::ptr::copy_volatile`, then the compiler
would know that the copy has an externally visible side-effect
and cannot be elided.

When writing a multi-processor operating system kernel for x86_64 in
Rust, the programmer would copy the instruction text to some address
and write to the local programmable interrupt controller to send a
SIPI to start AP cores, but from the compiler's perspective, it might
appear that the memory holding the AP startup code is never referred
to again. The compiler could potentially choose to elide the copy
entirely, and the AP might start executing junk instructions from
uninitialized memory. In the worst case, this may silently corrupt
kernel state.

Using a volatile copy can inform the compiler that there is an
externally observable side-effect, forcing it to preserve the copy.
Similarly, volatile "write_bytes" allows a program to preserve a
write that has some side-effect (for example, initializing register
state in a device, or clearing a frame buffer).

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Given these operations, one would write, for example, the following:

```
#[no_mangle]
pub unsafe extern "C" fn maybe_called_via_ffi(ptr: *mut u8, len: usize) {
    println!("this function has a side-effect, and it is not just the println!");
    // `SOME_DATA` is assumed to be a `u8` fill value defined elsewhere.
    core::ptr::write_bytes_volatile(ptr, SOME_DATA, len);
}
```

and be assured that the `write_bytes_volatile` call is not elided.

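Until such a function exists, a similar effect can be approximated on stable Rust with a loop over the already-stable `ptr::write_volatile`. The following is a minimal sketch under stated assumptions, not the proposed implementation: the name `write_bytes_volatile_fallback` is hypothetical, and the sketch is restricted to byte-sized elements for simplicity.

```rust
use core::ptr;

/// Hypothetical stand-in for the proposed `ptr::write_bytes_volatile`,
/// restricted to `u8`. Every byte is stored with a volatile write, so
/// the compiler must preserve each store.
///
/// # Safety
/// `dst` must be valid for writes of `count` bytes.
pub unsafe fn write_bytes_volatile_fallback(dst: *mut u8, val: u8, count: usize) {
    for i in 0..count {
        // SAFETY: the caller guarantees that `dst..dst + count` is writable.
        unsafe { ptr::write_volatile(dst.add(i), val) };
    }
}
```

A caller would invoke it as `unsafe { write_bytes_volatile_fallback(ptr, 0, len) }`. This loop issues one byte-sized volatile store per element, whereas a dedicated primitive leaves the choice of store width to the compiler.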
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

`ptr::copy_volatile`, `ptr::copy_nonoverlapping_volatile` and
`ptr::write_bytes_volatile` will work the same way as `ptr::copy`,
`ptr::copy_nonoverlapping` and `ptr::write_bytes` respectively, but
with volatile semantics. As stated in RFC 1467, "the semantics of
a volatile access are already pretty well defined by the C standard."

We further propose enhancing the documentation for these functions
to the same level as that of the existing volatile functions.

Documentation presently refers to LLVM implementation details
to explain the memory model, etc, here:
http://llvm.org/docs/LangRef.html#volatile-memory-accesses.
We propose modifying existing documentation, and writing new
documentation, referring to the memory model in the C standard
instead.

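For readers less familiar with the existing non-volatile functions that the proposed ones mirror, the sketch below shows their calling conventions using only the stable `ptr::copy` and `ptr::write_bytes`. The helper names `shift_right_by_one` and `zero_all` are hypothetical and exist only for this illustration. It assumes that the volatile variants would take the same arguments in the same order.

```rust
use core::ptr;

/// Shifts the first `n` elements of `buf` one slot to the right using
/// `ptr::copy`, which (like C's `memmove`) permits overlapping ranges.
/// Arguments are `(src, dst, count)`, with `count` in elements.
pub fn shift_right_by_one(buf: &mut [u32], n: usize) {
    assert!(n < buf.len());
    let p = buf.as_mut_ptr();
    // SAFETY: both `[0, n)` and `[1, n + 1)` lie within `buf`.
    unsafe { ptr::copy(p, p.add(1), n) };
}

/// Zeroes `buf` using `ptr::write_bytes`. The byte value is written to
/// every byte of `count` elements of `T`, so `count` is in units of
/// `u32` here, not bytes.
pub fn zero_all(buf: &mut [u32]) {
    // SAFETY: pointer and length come from the same slice, and the
    // all-zero bit pattern is a valid `u32`.
    unsafe { ptr::write_bytes(buf.as_mut_ptr(), 0, buf.len()) };
}
```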
# Drawbacks
[drawbacks]: #drawbacks

Volatile semantics are not well defined by the C standard, but
that is out of the scope of this proposal.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The intrinsic operations already exist and have the semantics
required by operating system implementors and others.

There are several alternatives, each with their own drawbacks:

1. Continue using the unstable `core_intrinsics` feature and use the
   existing unstable intrinsics. However, this ties the programmer
   to unstable Rust, which is undesirable in some environments.
2. Use the existing copy and set interfaces without volatile qualifiers
   and hope that the compiler does not elide the relevant calls. While
   probably workable in practice for most scenarios, this could
   lead to surprising behavior if the compiler ever incorporates
   analyses advanced enough to determine that those
   elisions are possible from its perspective. Hope is not a strategy.
3. Use the foreign function interface to call separately written code
   in another language that provides the required semantics. This
   is inelegant and complicates the build process.
4. Hand-code copy and set loops in terms of the existing `write_volatile`
   function. This is inelegant, less likely to perform well, and opens
   up the possibility of bugs. For example, compare:

   ```
   for (i, elem) in some_slice.iter().enumerate() {
       unsafe {
           core::ptr::write_volatile(&mut dest[i], *elem);
       }
   }
   ```
   to,
   ```
   unsafe {
       core::ptr::copy_volatile(some_slice.as_ptr(), dest.as_mut_ptr(), some_slice.len());
   }
   ```

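To make the "possibility of bugs" in the fourth alternative concrete: a hand-coded loop that must tolerate overlapping source and destination, as `ptr::copy` does, has to pick the copy direction correctly. The sketch below is one way to write such a loop on stable Rust. The name `copy_volatile_fallback` is hypothetical, and using a volatile read paired with a volatile write per element is an assumption of this sketch, not something the proposal specifies.

```rust
use core::ptr;

/// Hypothetical hand-rolled stand-in for the proposed `ptr::copy_volatile`.
/// Copies `count` elements from `src` to `dst`; the ranges may overlap.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `count`
/// elements of `T`, and both must be properly aligned.
pub unsafe fn copy_volatile_fallback<T>(src: *const T, dst: *mut T, count: usize) {
    if (dst as *const T) <= src {
        // Destination starts at or before the source: copy front to back.
        for i in 0..count {
            // SAFETY: guaranteed by the caller's contract.
            unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
        }
    } else {
        // Destination starts after the source: copy back to front so each
        // source element is read before it can be overwritten.
        for i in (0..count).rev() {
            // SAFETY: guaranteed by the caller's contract.
            unsafe { ptr::write_volatile(dst.add(i), ptr::read_volatile(src.add(i))) };
        }
    }
}
```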
Finally, RFC 1467 was in mild error in asserting that no analogue
for the proposed intrinsics exists in C. `memcpy`, `memmove` and `memset`
on volatile-qualified data provide this functionality in standard C.
It is important that this proposal not tie the Rust language to specifics
of the LLVM implementation, but Rust also does not yet have a well-defined
memory model. Hence this proposal advocates referring to C's semantics.

# Prior art
[prior-art]: #prior-art

Other languages support volatile-style accesses, notably C and C++.
Interestingly, volatile semantics in those languages are associated with
individual objects, and `volatile` is a type qualifier, not an operation
attribute. In those systems, any number of operations on a
volatile-qualified datum result in volatile memory semantics; since
any identifier used by the standard library is defined to be reserved
for special treatment by the compiler, this means that the standard
`memcpy`, `memmove` and `memset` operations can all be expected to exhibit
volatile semantics if applied to volatile-qualified objects.

# Unresolved questions
[unresolved]: #unresolved-questions

None.

# Future possibilities
[future-possibilities]: #future-possibilities

At some point, a well-defined memory model for Rust may be stabilized that
would widen the design space and permit revisiting these primitives. For
example, "volatile" currently means that a write cannot be elided, but it
also imposes strict ordering semantics with respect to other volatile
accesses. One can envision a memory model rich enough to offer
some way to specify an "unelidable" write, but without ordering
constraints.
