| 1 | +- Feature Name: volatile-copy-and-set |
| 2 | +- Start Date: 2019-04-17 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#00000](https://github.com/rust-lang/rust/issues/00000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Stabilize the `volatile_copy_memory`, `volatile_copy_nonoverlapping_memory` |
| 10 | +and `volatile_set_memory` intrinsics as `ptr::copy_volatile`, |
| 11 | +`ptr::copy_nonoverlapping_volatile` and `ptr::write_bytes_volatile`, |
| 12 | +respectively. |
| 13 | + |
| 14 | +# Motivation |
| 15 | +[motivation]: #motivation |
| 16 | + |
| 17 | +`ptr::read_volatile` and `ptr::write_volatile` were stabilized in RFC |
| 18 | +[1467](https://github.com/rust-lang/rfcs/pull/1467). The stated motivation |
| 19 | +at the time was that this allowed "volatile access to memory-mapped I/O |
| 20 | +in stable code", something that was only previously possible using unstable |
| 21 | +intrinsics or "by abusing a bug in the `load` and `store` functions on |
| 22 | +atomic types which gives them volatile semantics |
| 23 | +([rust-lang/rust#30962](https://github.com/rust-lang/rust/pull/30962))." |
| 24 | + |
| 25 | +At the time, the decision was made not to also provide stable |
| 26 | +interfaces for the `volatile_copy_memory` or `volatile_set_memory` |
| 27 | +intrinsics, as they were "not used often" nor provided in C. |
| 28 | +However, when writing low-level code, it is sometimes also useful |
| 29 | +to be able to execute volatile copy and set operations. |
| 30 | + |
| 31 | +For example, when booting x86_64 "application processor" (AP) logical |
| 32 | +processors, code copies a sequence of instructions for the AP to |
| 33 | +execute into a page in low physical memory, and then sends a startup |
| 34 | +inter-processor interrupt (SIPI) to the AP's local interrupt |
| 35 | +controller: the target interrupt vector number given in the SIPI is |
| 36 | +multiplied by the page size to determine the physical memory address |
| 37 | +where the AP should start executing. So a SIPI sent to vector 7 of |
| 38 | +an AP causes that processor to begin executing instructions at |
| 39 | +physical memory address 0x7000. |
| 40 | + |
| 41 | +That is: |
| 42 | + |
| 43 | +``` |
| 44 | +extern "C" { |
| 45 | + fn copy_proto_page_to_phys_mem(src: usize, phys: u64); |
| 46 | + fn send_init_ipi(cpu: u32); |
| 47 | + fn send_sipi(cpu: u32, vector: u8); |
| 48 | +
|
| 49 | + static INIT_CODE: *const u8; |
| 50 | + static INIT_CODE_LEN: usize; |
| 51 | +} |
| 52 | +
|
| 53 | +// A contrived type for illustration; not actually useful. |
| 54 | +pub struct SIPIPage { |
| 55 | + // Note that `bytes` is not visible outside of `SIPIPage`. |
| 56 | + bytes: [u8; 4096], |
| 57 | +} |
| 58 | +
|
| 59 | +impl SIPIPage { |
| 60 | + // Note that the _only_ operation on the `bytes` field |
| 61 | + // of `SIPIPage` is in `new`. The compiler could, in |
| 62 | + // theory, elide the `copy`. |
| 63 | + pub fn new() -> SIPIPage { |
| 64 | + let mut bytes = [0; 4096]; |
| 65 | + unsafe { |
| 66 | + core::ptr::copy(INIT_CODE, bytes.as_mut_ptr(), INIT_CODE_LEN); |
| 67 | + } |
| 68 | + SIPIPage { bytes } |
| 69 | + } |
| 70 | +} |
| 71 | +
|
| 72 | +fn main() { |
| 73 | + let proto_sipi_page = SIPIPage::new(); |
| 74 | + let some_core = 2; |
| 75 | + unsafe { |
| 76 | + copy_proto_page_to_phys_mem(&proto_sipi_page as *const _ as usize, 0x7000); |
| 77 | + send_init_ipi(some_core); |
| 78 | + send_sipi(some_core, 7); |
| 79 | + } |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +Obviously this is an unlikely way of initializing the SIPI page and |
| 84 | +a real kernel would not do it this way. |
| 85 | + |
| 86 | +However, this code snippet is specifically constructed so that the |
| 87 | +sequence of sending IPIs makes no reference to `proto_sipi_page`, and |
| 88 | +since the `bytes` field is not visible outside of `SIPIPage`, this |
| 89 | +illustrates a situation in which the compiler _could_ theoretically |
| 90 | +elect to elide the copy. |
| 91 | + |
| 92 | +If this sequence used `core::ptr::copy_volatile`, then the compiler |
| 93 | +would know that the copy had some externally visible side-effect |
| 94 | +and could not elide it. |
| 95 | + |
| 96 | +When writing a multi-processor operating system kernel for x86_64 in |
| 97 | +Rust, the programmer would copy the instruction text to some address |
| 98 | +and write to the local programmable interrupt controller to send a |
| 99 | +SIPI to start AP cores, but from the compiler's perspective, it might |
| 100 | +appear that the memory holding the AP startup code is never referred |
| 101 | +to again. The compiler could potentially choose to elide the copy |
| 102 | +entirely, and the AP might start executing junk instructions from |
| 103 | +uninitialized memory. In the worst case, this may silently corrupt |
| 104 | +kernel state. |
| 105 | + |
| 106 | +Using a volatile copy can inform the compiler that there is an |
| 107 | +externally observable side-effect, forcing it to preserve the copy. |
| 108 | +Similarly, volatile "write_bytes" allows a program to preserve a |
| 109 | +write that has some side-effect (for example, initializing register |
| 110 | +state in a device, or clearing a frame buffer). |
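A minimal sketch of what a volatile "write_bytes" preserves, written against stable Rust only. The helper name `volatile_fill` is hypothetical and is not part of this proposal; it issues one `ptr::write_volatile` per byte, which may differ from the access width that an eventual `ptr::write_bytes_volatile` would use.

```rust
use core::ptr;

/// Hypothetical stand-in for the proposed `ptr::write_bytes_volatile`.
/// Stores `val` into each of the `count` bytes starting at `dst`, using
/// one volatile store per byte so that no store may be elided.
///
/// # Safety
/// `dst` must be valid for writes of `count` bytes.
pub unsafe fn volatile_fill(dst: *mut u8, val: u8, count: usize) {
    for i in 0..count {
        // Each iteration is a separate volatile access; the compiler must
        // emit all of them, in order, even if the buffer is never read.
        unsafe { ptr::write_volatile(dst.add(i), val) };
    }
}
```

A caller clearing a (here ordinary, in-memory) buffer would write `unsafe { volatile_fill(buf.as_mut_ptr(), 0, buf.len()) }`.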
| 111 | + |
| 112 | +# Guide-level explanation |
| 113 | +[guide-level-explanation]: #guide-level-explanation |
| 114 | + |
| 115 | +Given these operations, one would write, for example, the following: |
| 116 | + |
| 117 | +``` |
| 118 | +#[no_mangle] |
| 119 | +pub unsafe extern "C" fn maybe_called_via_ffi(ptr: *mut u8, len: usize) { |
| 120 | + println!("this function has a side-effect, and it is not just the println!"); |
| 121 | + core::ptr::write_bytes_volatile(ptr, SOME_DATA, SOME_DATA_LEN); |
| 122 | +} |
| 123 | +``` |
| 124 | + |
| 125 | +and assert that the `write_bytes_volatile` call will not be elided. |
| 126 | + |
| 127 | +# Reference-level explanation |
| 128 | +[reference-level-explanation]: #reference-level-explanation |
| 129 | + |
| 130 | +`ptr::copy_volatile`, `ptr::copy_nonoverlapping_volatile` and |
| 131 | +`ptr::write_bytes_volatile` will work the same way as `ptr::copy`, |
| 132 | +`ptr::copy_nonoverlapping` and `ptr::write_bytes` respectively, but |
| 133 | +with volatile semantics. As stated in RFC 1467, "the semantics of |
| 134 | +a volatile access are already pretty well defined by the C standard." |
| 135 | + |
| 136 | +We further propose enhancing the documentation for these functions |
| 137 | +to the same level as that of the existing volatile functions. |
| 138 | + |
| 139 | +Documentation presently refers to LLVM implementation details |
| 140 | +to explain the memory model, etc., here: |
| 141 | +http://llvm.org/docs/LangRef.html#volatile-memory-accesses. |
| 142 | +We propose modifying existing documentation, and writing new |
| 143 | +documentation, referring to the memory model in the C standard |
| 144 | +instead. |
| 145 | + |
| 146 | +# Drawbacks |
| 147 | +[drawbacks]: #drawbacks |
| 148 | + |
| 149 | +Volatile semantics are not well defined by the C standard, but |
| 150 | +that is out of the scope of this proposal. |
| 151 | + |
| 152 | +# Rationale and alternatives |
| 153 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 154 | + |
| 155 | +The intrinsic operations already exist and have the semantics |
| 156 | +required by operating system implementors and others. |
| 157 | + |
| 158 | +There are several alternatives, each with their own drawbacks: |
| 159 | + |
| 160 | +1. Continue using the unstable `core_intrinsics` feature and use the |
| 161 | + existing unstable intrinsics. However, this ties the programmer |
| 162 | + to unstable Rust, which is undesirable in some environments. |
| 163 | +2. Use the existing copy and set interfaces without volatile qualifiers |
| 164 | + and hope that the compiler does not elide the relevant calls. While |
| 165 | +probably workable in practice for most scenarios, this could |
| 166 | + lead to surprising behavior if the compiler ever incorporates |
| 167 | + sufficiently advanced analyses that allow it to determine that those |
| 168 | + elisions are possible from its perspective. Hope is not a strategy. |
| 169 | +3. Use the foreign function interface to call separately written code |
| 170 | + in another language that provides the required semantics. This |
| 171 | + is inelegant and complicates the build process. |
| 172 | +4. Hand-code copy and set loops in terms of the existing `write_volatile` |
| 173 | + function. This is inelegant, less likely to perform well, and opens |
| 174 | + up the possibility of bugs. For example, compare: |
| 175 | + |
| 176 | + ``` |
| 177 | + for (i, elem) in some_slice.iter().enumerate() { |
| 178 | + unsafe { |
| 179 | + core::ptr::write_volatile(&mut dest[i], *elem); |
| 180 | + } |
| 181 | + } |
| 182 | + ``` |
| 183 | + to, |
| 184 | + ``` |
| 185 | + unsafe { |
| 186 | + core::ptr::copy_volatile(some_slice.as_ptr(), dest.as_mut_ptr(), some_slice.len()); |
| 187 | + } |
| 188 | + ``` |
| 189 | +
|
| 190 | +Finally, RFC 1467 was in mild error in asserting that no analogue |
| 191 | +for the proposed intrinsics exists in C. `memcpy`, `memmove` and `memset` |
| 192 | +on volatile-qualified data provide this functionality in standard C. |
| 193 | +It is important that this proposal not tie the Rust language to specifics |
| 194 | +of the LLVM implementation, but Rust also does not yet have a well-defined |
| 195 | +memory model. Hence this proposal advocates referring to C's semantics. |
| 196 | +
|
| 197 | +# Prior art |
| 198 | +[prior-art]: #prior-art |
| 199 | +
|
| 200 | +Other languages support volatile style accesses, notably C and C++. |
| 201 | +Interestingly, volatile semantics in those languages are associated with |
| 202 | +individual objects, and `volatile` is a type qualifier, not an operation |
| 203 | +attribute. In those systems, any number of operations on a |
| 204 | +volatile-qualified datum result in volatile memory semantics; since |
| 205 | +any identifier used by the standard library is defined to be reserved |
| 206 | +for special treatment by the compiler, this means that the standard |
| 207 | +`memcpy`, `memmove` and `memset` operations can all be expected to exhibit |
| 208 | +volatile semantics if applied to volatile-qualified objects. |
| 209 | +
|
| 210 | +# Unresolved questions |
| 211 | +[unresolved]: #unresolved-questions |
| 212 | +
|
| 213 | +None. |
| 214 | +
|
| 215 | +# Future possibilities |
| 216 | +[future-possibilities]: #future-possibilities |
| 217 | +
|
| 218 | +At some point, a well-defined memory model for Rust may be stabilized that |
| 219 | +would widen the design space and permit revisiting these primitives. For |
| 220 | +example, "volatile" currently means that a write cannot be elided, but it |
| 221 | +also imposes strict ordering semantics with respect to other volatile |
| 222 | +accesses. One can envision a sufficiently rich memory model in which there |
| 223 | +might be some way to specify an "unelidable" write, but without ordering |
| 224 | +constraints. |