rustc_scalable_vector #3838
Conversation
text/3838-repr-scalable.md (Outdated)

> Similarly, a `scalable` repr is introduced to define a scalable vector type.
> `scalable` accepts an integer to determine the minimum number of elements the
> vector contains. For example:
---
Wouldn't it be that the amount of elements in the vector needs to be a whole multiple of the scalable value? Otherwise `scalable(4)` would allow 5 elements, which would need to be represented as `<vscale x 1 x f32>` in LLVM rather than `<vscale x 4 x f32>`.
---
I asked the same question in the previous RFC, unfortunately that never got resolved and the new RFC perpetuates the same confusion. Also see this subthread. @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.
For the question at hand:
Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128 bits for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting `scalable(2)` here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use `scalable(4)` on ARM?
---
I don't know that I follow - the `N` in `repr(scalable(N))` is the `N` in `<vscale x N x f32>`. There's a correct value for that `N` for any given valid type that the user of `repr(scalable)` needs to write. We'd need to get that correct when defining these types in `stdarch`, but because this is an internal-facing thing, that's fine, we can make sure they're correct.

I can only really speak to SVE, but as I understand it, `N` is basically "how many of this type can you fit in the minimum vector length?". For an `f32` on SVE, that's four, because the minimum vector length is 128 bits and an `f32` is 32 bits. `vscale` is the hardware-dependent part that we don't pick: on some processors it'll be one and you just have the 4x `f32`, on some it'll be two and you have 8x `f32`, etc.
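The arithmetic described here can be sketched directly (an illustrative helper, not compiler or stdarch code; the 128-bit minimum is SVE-specific):

```rust
/// Minimum element count `N` for an SVE scalable vector whose elements are
/// `element_bits` wide: the architectural minimum vector length (128 bits
/// for SVE) divided by the element size. This mirrors the `N` in
/// `<vscale x N x f32>`; the function name is hypothetical.
const fn sve_min_elements(element_bits: usize) -> usize {
    const MIN_VECTOR_BITS: usize = 128; // SVE's architectural minimum
    MIN_VECTOR_BITS / element_bits
}

fn main() {
    assert_eq!(sve_min_elements(32), 4);  // f32 -> <vscale x 4 x f32>
    assert_eq!(sve_min_elements(16), 8);  // f16 -> <vscale x 8 x f16>
    assert_eq!(sve_min_elements(8), 16);  // i8  -> <vscale x 16 x i8>
}
```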
---
Which part of my comment is confusing? I even asked a concrete question:
> Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?
It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?
I think you confirmed my theory from above:
- Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
- There is only one desired value for this "minimum size", and it can be computed by the compiler.
I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore. That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)
---
> Which part of my comment is confusing?
GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.
> I asked the same question (#3268 (comment)) in the previous RFC, unfortunately that never got resolved and the new RFC perpetuates the same confusion. Also see #3268 (comment). @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.
Apologies, I tried to capture anything that seemed like it was still relevant but may have missed some threads.
> Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128bit for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting scalable(2) here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?
> It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?
>
> I think you confirmed my theory from above:
>
> - Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
> - There is only one desired value for this "minimum size", and it can be computed by the compiler.
>
> I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore.
I phrased it as "minimum" considering the number of elements that might be present when one includes `vscale` in that calculation. e.g. for a `svint32_t`, the minimum number of elements is four (when you have a 128-bit register and `vscale=1`), but it could be greater than four: it could be eight (when you have a 256-bit register and `vscale=2`) or it could be sixteen (when you have a 512-bit register and `vscale=4`), etc. This wording is clearly confusing though, so I'll try to make it clearer.
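Spelled out as arithmetic (a minimal sketch with a hypothetical helper name, assuming the SVE rule that `vscale` is the register width divided by 128 bits):

```rust
/// Element count of an `svint32_t` for a given SVE register width, per the
/// figures above: the actual count is `vscale * N`, where `N = 4` for a
/// 32-bit element type and `vscale` is hardware-determined.
fn svint32_elements(register_bits: usize) -> usize {
    let vscale = register_bits / 128; // 1 for 128-bit, 2 for 256-bit, ...
    vscale * 4
}

fn main() {
    assert_eq!(svint32_elements(128), 4);
    assert_eq!(svint32_elements(256), 8);
    assert_eq!(svint32_elements(512), 16);
}
```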
In the "Reference-level explanation", I do mention why the compiler doesn't compute `N` (though that should probably be in the "Rationale and alternatives" section):

> `repr(scalable)` expects the number of `elements` to be provided rather than calculating it. This avoids needing to teach the compiler how to calculate the required `element` count, particularly as some of these scalable types can have different element counts. For instance, the predicates used in SVE have different element counts depending on the types they are a predicate for.

I'll expand this in the RFC also, but to elaborate: For SVE, many of the intrinsics take a predicate vector alongside the data vectors, and the predicate vector decides which lanes are on or off for the operation (i.e. which elements in the vector are operated on: all of them, none of them, even-numbered indices, whatever you want). Predicate vectors are in different and smaller registers than the data; these have a bit for every byte in the vector register (for a minimum size of 16 bits). e.g. a `<vscale x 16 x i8>` vector has a `<vscale x 16 x i1>` predicate.
For the non-predicate vectors, you're right that we're basically only going to want to define them with the minimum element count that uses the whole minimum register size. However, for predicate vectors, we want to define types where the `N` matches the number of elements in the non-predicate vector, i.e. a `<vscale x 4 x i1>` to match a `<vscale x 4 x f32>`, `<vscale x 8 x i1>` to match `<vscale x 8 x u16>`, or `<vscale x 16 x i1>` to match `<vscale x 16 x u8>`. Some of the sign-extension intrinsics use types with non-128-bit multiples inside their definitions too, though I'd need to refresh my memory on the specifics for those.

That makes it trickier than just `$min_register_size / $type_size`, which would always give `<vscale x 16 x i1>`. We could do something more complicated that gives the compiler all the information it needs to be able to calculate this in every circumstance, but that's a bunch of extra complexity for types that are going to be defined once in the standard library and then well-tested.
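To make the predicate problem concrete, here is a small sketch (hypothetical helper, not a proposed API) of why the naive division rule breaks down for predicates:

```rust
/// The naive rule: minimum register size divided by element size.
fn naive_n(register_bits: usize, element_bits: usize) -> usize {
    register_bits / element_bits
}

fn main() {
    // For data vectors the naive rule gives the desired answer...
    assert_eq!(naive_n(128, 32), 4); // <vscale x 4 x f32>
    // ...but predicates have 1-bit elements in 16-bit minimum predicate
    // registers, so the rule always yields 16:
    assert_eq!(naive_n(16, 1), 16); // always <vscale x 16 x i1>
    // whereas the predicate for <vscale x 4 x f32> must be <vscale x 4 x i1>:
    // its N comes from the data vector it guards, not from register sizes.
}
```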
> That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)
I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.
It's a bit of a balancing act how much detail to provide about how the scalable vectors work or what rustc's contract with the codegen backend is, and how much to omit as too much detail. Especially when somewhat familiar with these types/extensions, it's easy to miss where we've gone too far one way or the other, so I appreciate the feedback.
---
Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.
---
> GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.
Ah, that's fair. :)
Thanks for clarifying. I guess with predicates the point is that they do indeed not use the entire width of the corresponding register but that's fine -- we only want one bit per element, the hardware has one bit per byte, and deals with the mismatch?
> Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.
You still introduce this number as the "minimum", which is still confusing. It's not just the minimum, it is the "quantum" or whatever you want to call it -- the vector will always have a length that is an integer multiple of this number. (And moreover, it'll be the same integer multiple for all types.) It is formally correct that this number is also the minimum size, but by leading with that you send the reader down the entirely wrong track.
> I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.
I can almost guarantee that some library contributor will one day pick a wrong value for this and they'll have a really bad time if the result is some odd garbage behavior. Even internal interfaces need to be designed with care and tested in an "open source hive mind" project like Rust. The only fundamental difference to external interfaces is that internal interfaces are perma-unstable, so we can more easily adjust them if mistakes have been made. But in terms of documentation, they absolutely need as much care as stable features, and they should get proper testing too.
---
I chatted with @RalfJung about this in Zulip since my last update; I've changed the description of these types so that hopefully it's a lot clearer.

I still propose manually specifying `N`, but have a lot more justification for why I think this is the right choice - still happy to revisit that decision if it isn't a convincing argument - 388170a.
---
I do not think this is the correct approach nor implementation. It is invested in reusing an existing framework in `repr(simd)`, which has many undesirable features.
text/3838-repr-scalable.md (Outdated)

> # Reference-level explanation
> [reference-level-explanation]: #reference-level-explanation
>
> Types annotated with the `#[repr(simd)]` attribute contains either an array
> field or multiple fields to indicate the intended size of the SIMD vector that
> the type represents.
>
> Similarly, a `scalable(N)` representation is introduced to define a scalable
> vector type. `scalable(N)` accepts an integer to determine the minimum number of
> elements the vector contains. For example:
---
We use `repr(C)` because otherwise some hypothetical CStruct repr type would have to be parameterized by every single arrangement of types, in an ordered fashion. But `repr(simd)` does not pair with either `repr(packed)` or `repr(align(N))` coherently, and neither would `repr(scalable)`.

I do not think this should be handled by a new `repr`... modeling it on a still-unstable `repr(simd)`... which, as we have discovered over time with existing reprs, has a million footguns for how they work and get compiled. A lang item seems more to-the-point. I intend to make this true for `repr(simd)` as well anyways.
---
I've changed this to not be a `repr` hint, instead a separate attribute, so there aren't any issues with overlap with other representation hints - f514a3e.
text/3838-repr-scalable.md (Outdated)

> ```rust
> #[repr(simd, scalable(4))]
> pub struct svfloat32_t { _ty: [f32], }
> ```
---
One of the aspects of using these random `repr` attributes is it is misleading. This looks like it is unsized to begin with, but these are non-const `Sized`.
---
I've changed the attribute to no longer be part of the `repr` attribute, and also changed the type of the type marker to no longer be a `!Sized` type, as these types will end up being non-const `Sized`.
text/3838-repr-scalable.md (Outdated)

> Therefore, there is an additional restriction that these types cannot be used in
> the argument or return types of functions unless those functions are annotated
> with the relevant target feature.
---
Would this not mean that these types cannot be used in function pointers, as we have no knowledge of the target features of a function pointer?
---
For the same reason it would also clash with dyn-compatible traits.
---
I think this is similar to the existing checks we have for `__m256` only being allowed in `extern "C"` functions when AVX is available. They can be used in function ptrs just fine, because someone somewhere must have made the unsafe promise that it is okay to call a `#[target_feature]` function since we actually do have the target feature.
---
text/3838-repr-scalable.md (Outdated)

> When a scalable vector is instantiated into a generic function during
> monomorphisation, or a trait method is being implemented for a scalable vector,
> then the relevant target feature will be added to the function.
---
Adding target features to functions based on the types being mentioned in them was also proposed for a different reason in #3525 but that ran into all sorts of complications (I'm not sure off-hand which of these are relevant here).

Unlike that other RFC, what's proposed here isn't even visible in the function signature and not tied to some "if a value of this type exists, the target feature must be enabled, so it's safe" reasoning, which causes new problems. So far the `unsafe` enforcement and manual safety reasoning for target features assumes it's clear which target features a function has, long before monomorphization. For example, consider this program:
```rust
// crate A
fn foo<T: Sized>() -> usize {
    size_of::<T>()
}

// crate B
use crate_a::foo;
fn main() {
    println!("vector size: {}", foo::<svfloat32_t>());
}
```
If the instantiation of `foo` gets the target feature, it needs to be unsafe to call. How can the compiler ensure this? And how is it surfaced to the programmer so they can write the correct `#[cfg]` or `is_aarch64_feature_detected!(...)` condition?
---
Adding target features depending on the concrete instantiation is also fundamentally incompatible with how target features currently work in rustc. Of course the architecture can be changed, but it's a quite non-trivial change, and it has non-trivial ramifications: the MIR inliner crucially relies (for soundness) on knowing the target features of a function without knowing the concrete instance of the generics (inlining a function with more features into a function with fewer features is unsound), so functions that are generic in their target feature set risk serious perf regressions due to the MIR inliner becoming less effective. Making every generic function potentially have target features thus seems to me like a non-starter, both in terms of soundness (that's @hanna-kruppe's issue) and because it completely kills the MIR inliner for generic code (which is the code where we most care about the MIR inliner).
---
> When a scalable vector is instantiated into a generic function during monomorphisation
It's also somewhat unclear what this means -- does instantiating `T` with `fn() -> svfloat32_t` also add the target feature? It pretty much has to, for soundness.

But then what about something like this:
```rust
trait Tr { type Assoc; fn make() -> Self::Assoc; }

fn foo<T: Tr>() {
    size_of_val(&T::make());
}
```
Now if I call `foo` with a type `T` such that `T::Assoc` is `svfloat32_t`, I also need the target feature:
```rust
impl Tr for i32 { type Assoc = svfloat32_t; /* ... */ }

foo::<i32>(); // oopsie, this must be unsafe
```
The target feature is needed any time `svfloat32_t` is in any way reachable via the given trait instance. This is not something the type system can even express, which also means I don't think it can possibly be sound.
---
I've removed these parts of the RFC, instead adding limitations around implementing traits on these types and instantiating generics with these types.
These limitations aren't good and I want to remove them, but you're all absolutely correct that there are lots of very tricky challenges with a type that requires a `target_feature` to be enabled for it to be used, so these restrictions hopefully make the proposal at least nominally feasible, even if I have no intention to try and see it accepted with these restrictions. See 16bb4eb and 9466d26 for these changes.
At the moment, the goal of this RFC is just to provide a rough indication of the direction we'd like to take with scalable vectors to justify an experimental implementation so I can try different things to solve this problem - I've got a couple ideas that I think might work.
text/3838-repr-scalable.md (Outdated)

> For non-predicate scalable vectors, it will be typical that `N` will be
> `$minimum_register_length / $type_size` (e.g. `4` for `f32` or `8` for `f16`
> with a minimum 128-bit register length). In this circumstance, `N` could be
> trivially calculated by the compiler.
---
Note that even this part doesn't really work for RVV, which has the added dimension of LMUL: there's not a single `$minimum_register_length` but several useful sizes of `N` for every element type. See the table at https://llvm.org/docs//RISCV/RISCVVectorExtension.html#mapping-to-llvm-ir-types
---
Expanded on `lmul` and RVV in 388170a.
text/3838-repr-scalable.md
Outdated
compiler always be able to calculate `N` isn't justified given the permanently | ||
unstable nature of the `repr(scalable(N))` attribute and the scalable vector | ||
types defined in `std::arch` are likely to be few in number, automatically | ||
generated and well-tested. |
---
I want there to be a `std::simd` way to use scalable vectors at some point, so we would need a way for the compiler to either generate the correct `N` or be able to correctly handle any `N` (within some architecture-independent bounds, e.g. `N` is always 1, 2, 4, 8, or 16) by translating to some other `N` that the current architecture can handle.
---
I'd like that too, but it's not something I'm trying to enable in this RFC; I just want to lay the groundwork for vendor intrinsics for scalable vectors. How to make these work with Portable SIMD is better as a follow-up.
text/3838-repr-scalable.md (Outdated)

> However, as C also has restriction and scalable vectors are nevertheless used in
> production code, it is unlikely there will be much demand for those restrictions
> to be relaxed.
---
`struct`s containing a scalable SIMD type may actually end up being in large demand due to project-portable-simd: one idea IIRC we've talked about is that we would end up with something kinda like a SIMD version of `ArrayVec`, where we'd have a `struct` that contains both an active length (like `Vec::len`, not to be confused with `vscale`, which is more like `Vec::capacity`) and a scalable SIMD type, so we could do operations where the tail of the SIMD type is not valid but user code wouldn't need `unsafe` since those extra elements are ignored by using auto-generated masks or `setvl`.

This would need `struct`s like so:

```rust
pub struct SimdArrayVec<T> {
    len: usize,
    value: ScalableSimd<T>,
}
```
---
I think there's limited interest in doing this on the LLVM side, so it's a limitation we're stuck with. We will support arrays of scalable vectors and can support structs with only scalable vector members, but that's all LLVM supports at the moment.

Thanks all for the feedback, I got a little bit busy after posting this, so I'll respond as soon as I can.
Co-authored-by: Jamie Cunliffe <[email protected]>

Force-pushed from 5f5f7d2 to 9630780.
Thanks everyone for the feedback, I've pushed a bunch of changes.

Notable changes:

Small changes:

Hopefully these address many of the concerns raised so far. I don't expect it to address all of them; there are still big unresolved questions. My intent with this is just to indicate the rough direction we plan to take so that I can justify an experimental implementation to try and find solutions to some of the issues raised.
> element of the same type (but only if that struct is annotated with
> `#[rustc_scalable_vector]`)
>
> - Cannot be used in arrays
---
you state they can be stored in arrays as an exception, but here you contradict that.
> - Cannot be instantiated into generic functions (see
>   [*Target features*][target-features])
---
Does this also exclude e.g. `size_of::<svfloat32_t>()`?
> - Cannot have trait implementations (see [*Target features*][target-features])
>
>   - Including blanket implementations (i.e. `impl<T> Foo for T` is not a valid
>     candidate for a scalable vector)
---
I guess this is intended to prevent a scalable vector value and types from bleeding into generic code, without relying entirely on post-mono errors? If so, note that they can also be smuggled in via associated types (slight modification of Ralf's earlier example):
```rust
trait Tr { type Assoc; }

fn foo<T: Tr>() {
    size_of::<T::Assoc>();
}
```
> - It may be possible to support scalable vector types without the target feature
>   being enabled by using an indirect ABI similarly to fixed length vectors.
>
>   - This would enable these restrictions to be lifted and for scalable vector
>     types to be the same as fixed length vectors with respect to interactions
>     with the `target_feature` attribute.
>
>     - As with fixed length vectors, it would still be desirable for them to
>       avoid needing to be passed indirectly between annotated functions, but
>       this could be addressed in a follow-up.
>
>   - Experimentation is required to determine if this is feasible.
---
With an indirect ABI it should be possible to make use of scalable vector types without the `sve` feature. You will need to avoid emitting any `vscale` in the LLVM IR and use variable-length alloca/memcpy for any data movement involving such types. You will need to call a helper function to actually read VL to figure out how big the vectors are.

This could be done in rustc, but it's also possible that this support could be implemented directly in LLVM.
---
imo the ABI for determining `vscale` should be like:
```rust
use core::num::NonZeroUsize;
use core::sync::atomic::{AtomicUsize, Ordering};

static VSCALE: AtomicUsize = AtomicUsize::new(0);

#[cold]
fn vscale_slow() -> NonZeroUsize {
    let mut vscale = 1; // default if no features are available
    #[cfg(any(target_arch = "aarch64", target_arch = "arm64ec"))]
    if is_aarch64_feature_detected!("sve") {
        vscale = svcntb() as usize / 16;
    }
    // add other arches here
    VSCALE.store(vscale, Ordering::Relaxed);
    NonZeroUsize::new(vscale).unwrap()
}

#[inline(always)]
pub fn vscale() -> NonZeroUsize {
    NonZeroUsize::new(VSCALE.load(Ordering::Relaxed)).unwrap_or_else(vscale_slow)
}
```
---
Actually I think you can get away with just a function containing `svcntb` if you assume that the presence of an SVE type implies that the `sve` feature is enabled, since that is required to create an instance of such a type in the first place.

Unfortunately this doesn't work for `MaybeUninit<svint32_t>`, since you then have to know the size without the guarantee that an `sve`-feature function has previously been executed. Is `MaybeUninit` expected to support scalable vector types?
---
Then again, this doesn't need to be super optimized since this is for the slow path: we will most likely lint against any use of a scalable vector type in a function without the appropriate features, and consider it a user mistake to do such a thing.
---
> Actually I think you can get away with just a function containing `svcntb` if you assume that the presence of an SVE type implies that the `sve` feature is enabled since that is required to create an instance of such a type in the first place.
No, you can't -- you can make an SVE type even without the sve feature, with something like this (avoiding `zeroed`, `read_unaligned`, etc. since they would require putting SVE types in generics):
```rust
fn zeroed() -> svuint8_t {
    #[repr(C, align(16))] // I'm assuming 16 is enough
    struct Zeros([u8; 2048]);
    const ZEROS: &'static Zeros = &Zeros([0; 2048]); // long enough for max vl
    unsafe { *(ZEROS as *const _ as *const svuint8_t) }
}
```
---
So far Rust-the-language has shied away from anything like that. Most code in practice requires that target features aren't taken away once they were detected, but:

- It's fine for the available target features to increase monotonically over the program's lifetime, e.g., because the code that does the detection and makes the results available is itself written in Rust. You could possibly paper over this by conceptualizing an initial "unknown" state, but then you need some way to disallow code that would depend on SVE availability to run before the box with the Schrödinger's cat is opened. This also has to be respected by code motion by the compiler!
- Unsafe code can very much take features away again (as long as all code involved is carefully written to handle this correctly). This came up in Stabilize target_feature_11 (rust#134090) for example, with the motivating use case of the Linux kernel's selective use of hardfloat and SIMD extensions for particular functions where it's a big win, while avoiding the save/restore cost in the vast majority of userspace<->kernel interactions. This also seems potentially relevant for SVE and RVV.
---
ok, maybe there should be a `feature_fence` intrinsic that tells the compiler not to code-motion over it, as well as telling `std` to reset the feature-detection and vscale caches.
---
If it was that simple, LLVM could also just support `vscale` changing at runtime. Many people (including me) have been bashing their head against this wall over the years. Two major problems, in brief:

- Pretty much no code anywhere, both in compilers and in the code being compiled, is prepared to deal with the meaning of a type shifting during program execution. It's not just that `size_of::<T>()` changes and can't be cached. Nothing that in any way depends on the size can flow across the "fence". That notably also includes every value of the types that got reinterpreted. Even apparently trivial code like `let x = make_scalable_vector(); change_vsize(); let y = x;` is broken. I don't even know how to write a precise language spec for that.
- Nothing in the world of optimizing compilers is really prepared to treat basic operations like "allocate space for a local variable" or loads/stores/arithmetic to have an implicit dependency on some global state that must not be reordered across some sort of fence (especially if the "vscale change" can be hidden inside a function call, rather than being embedded into the static structure of the program representation).
---
I'm not proposing LLVM's idea of `vscale` changes, just that Rust's fallback value changes between the SVE-disabled value (probably `1`; it could also be `None`, so `size_of` or any other size-dependent operation panics and/or aborts) and LLVM's idea of what `vscale` is. The intrinsic would be `unsafe`, with a safety condition that there aren't any scalable vectors (on this thread? needs a thread-local `vscale` cache).
---
This doesn't really solve any of the problems because they also apply to surface Rust, to any formalization of Rust's semantics, and to rustc's MIR.
Supersedes #3268.
Introduces a new attribute, `#[rustc_scalable_vector(N)]`, which can be used to define new scalable vector types, such as those in Arm's Scalable Vector Extension (SVE) or RISC-V's Vector Extension (RVV). `rustc_scalable_vector(N)` is internal compiler infrastructure that will be used only in the standard library to introduce scalable vector types, which can then be stabilised. Only the infrastructure to define these types is introduced in this RFC, not the types or intrinsics that use it.

This RFC has a dependency on #3729, as scalable vectors are necessarily non-const `Sized`.

There are some large unresolved questions in this RFC; its current purpose is to indicate an intended direction for this work, to justify an experimental implementation to help resolve those questions.
Rendered