rustc_scalable_vector #3838

Open — wants to merge 24 commits into master

Conversation

@davidtwco (Member) commented Jul 14, 2025

Supersedes #3268.

Introduces a new attribute, #[rustc_scalable_vector(N)], which can be used to define new scalable vector types, such as those in Arm's Scalable Vector Extension (SVE), or RISC-V's Vector Extension (RVV).

rustc_scalable_vector(N) is internal compiler infrastructure that will be used only in the standard library to introduce scalable vector types, which can then be stabilised. Only the infrastructure to define these types is introduced in this RFC, not the types or intrinsics that use it.

This RFC has a dependency on #3729 as scalable vectors are necessarily non-const Sized.

There are some large unresolved questions in this RFC; its current purpose is to indicate an intended direction for this work, justifying an experimental implementation that will help resolve those questions.

Rendered


Similarly, a `scalable` repr is introduced to define a scalable vector type.
`scalable` accepts an integer to determine the minimum number of elements the
vector contains. For example:
Member:

Wouldn't it be that the number of elements in the vector needs to be a whole multiple of the scalable value? Otherwise scalable(4) would allow 5 elements, which would need to be represented as <vscale x 1 x f32> in LLVM rather than <vscale x 4 x f32>.
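A quick sketch of this constraint (my own arithmetic, not from the RFC): with `<vscale x N x T>`, the runtime element count is always `N × vscale` for some integer `vscale ≥ 1`, so only whole multiples of `N` are representable.

```rust
// Sketch: element counts representable by <vscale x N x T> are exactly
// the whole multiples of N, since vscale is a positive integer.
fn representable(elements: u32, n: u32) -> bool {
    elements > 0 && elements % n == 0
}

fn main() {
    assert!(representable(8, 4));   // vscale = 2
    assert!(!representable(5, 4));  // no integer vscale gives 5 elements
    assert!(representable(5, 1));   // but <vscale x 1 x f32> can, with vscale = 5
}
```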

@RalfJung (Member) commented Jul 14, 2025

I asked the same question in the previous RFC, unfortunately that never got resolved and the new RFC perpetuates the same confusion. Also see this subthread. @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.

For the question at hand:
Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128bit for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting scalable(2) here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

@davidtwco (Member, Author) commented Jul 14, 2025

I don't know that I follow - the N in repr(scalable(N)) is the N in <vscale x N x f32>. There's a correct value for that N for any given valid type that the user of repr(scalable) needs to write. We'd need to get that correct when defining these types in stdarch but because this is an internal-facing thing, that's fine, we can make sure they're correct.

I can only really speak to SVE, but as I understand it, N is basically "how many of this type can you fit in the minimum vector length?". For a f32 on SVE, that's four, because the minimum vector length is 128 bits and a f32 is 32 bits. vscale is the hardware-dependent part that we don't pick, on some processors, it'll be one and you just have the 4x f32, on some it'll be two and you have 8x f32, etc.
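The arithmetic described above can be sketched as follows (my own sketch, assuming SVE's 128-bit minimum vector length):

```rust
// Sketch: for an SVE data vector, N in <vscale x N x T> is the number of
// elements that fit in the minimum-length (vscale = 1) 128-bit register.
fn sve_data_vector_n(element_bits: u32) -> u32 {
    const MIN_VECTOR_BITS: u32 = 128; // SVE minimum vector length
    MIN_VECTOR_BITS / element_bits
}

fn main() {
    assert_eq!(sve_data_vector_n(32), 4);  // f32: <vscale x 4 x f32>
    assert_eq!(sve_data_vector_n(16), 8);  // f16: <vscale x 8 x f16>
    assert_eq!(sve_data_vector_n(8), 16);  // u8:  <vscale x 16 x u8>
}
```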

@RalfJung (Member) commented Jul 14, 2025

Which part of my comment is confusing? I even asked a concrete question:

Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?

I think you confirmed my theory from above:

  • Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
  • There is only one desired value for this "minimum size", and it can be computed by the compiler.

I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore. That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)

@davidtwco (Member, Author) commented Jul 14, 2025

Which part of my comment is confusing?

GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.

I asked the #3268 (comment) in the previous RFC, unfortunately that never got resolved and the new RFC perpetuates the same confusion. Also see #3268 (comment). @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.

Apologies, I tried to capture anything that seemed like it was still relevant but may have missed some threads.

Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128bit for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting scalable(2) here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?

I think you confirmed my theory from above:

Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
There is only one desired value for this "minimum size", and it can be computed by the compiler.
I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore.

I phrased it as "minimum" considering the number of elements that might be present when one includes vscale in that calculation. e.g. for a svint32_t, the minimum number of elements is four (when you have a 128-bit register and vscale=1), but it could be greater than four: it could be eight (when you have a 256-bit register and vscale=2) or it could be sixteen (when you have a 512-bit register and vscale=4), etc. This wording is clearly confusing though, so I'll try to make it clearer.
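The svint32_t example above, as arithmetic (my own sketch):

```rust
// Sketch: the runtime element count is N * vscale, so N (here 4) is the
// minimum count, reached when vscale = 1.
fn element_count(n: u32, vscale: u32) -> u32 {
    n * vscale
}

fn main() {
    assert_eq!(element_count(4, 1), 4);   // 128-bit registers
    assert_eq!(element_count(4, 2), 8);   // 256-bit registers
    assert_eq!(element_count(4, 4), 16);  // 512-bit registers
}
```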

In the "Reference-level explanation", I do mention why the compiler doesn't compute N (though that should probably be in the "Rationale and alternatives" section):

repr(scalable) expects the number of elements to be provided rather than calculating it. This avoids needing to teach the compiler how to calculate the required element count, particularly as some of these scalable types can have different element counts. For instance, the predicates used in SVE have different element counts depending on the types they are a predicate for.

I'll expand this in the RFC also, but to elaborate: for SVE, many of the intrinsics take a predicate vector alongside the data vectors, and the predicate vector decides which lanes are on or off for the operation (i.e. which elements in the vector are operated on: all of them, none of them, even-numbered indices, whatever you want). Predicate vectors live in different, smaller registers than the data; these have a bit for every byte in the vector register (for a minimum size of 16 bits), e.g. a <vscale x 16 x i8> vector has a <vscale x 16 x i1> predicate.

For the non-predicate vectors, you're right that we're basically only going to want to define them with the minimum element count that uses the whole minimum register size. However, for predicate vectors, we want to define types where the N matches the number of elements in the non-predicate vector, i.e. a <vscale x 4 x i1> to match a <vscale x 4 x f32>, <vscale x 8 x i1> to match <vscale x 8 x u16>, or <vscale x 16 x i1> to match <vscale x 16 x u8>. Some of the sign-extension intrinsics use types with non-128-bit multiples inside their definitions too, though I'd need to refresh my memory on the specifics for those.

That makes it trickier than just $min_register_size / $type_size, which would always be <vscale x 16 x i1>. We could do something more complicated that gives the compiler all the information it needs to be able to calculate this in every circumstance, but that's a bunch of extra complexity for types that are going to be defined once in the standard library and then well-tested.
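To make the predicate point concrete (my own sketch of the arithmetic above): the naive formula always yields the same N for an `i1` element, while the N actually wanted mirrors the paired data vector.

```rust
// Sketch: $min_register_size / $type_size for a 1-bit predicate element
// always gives 16 (predicates have one bit per byte of the 128-bit data
// register), but the desired predicate N matches the data vector's N.
fn naive_predicate_n() -> u32 {
    16 / 1 // 16-bit minimum predicate register, 1-bit elements
}

fn paired_predicate_n(data_element_bits: u32) -> u32 {
    128 / data_element_bits // same N as the data vector it masks
}

fn main() {
    assert_eq!(naive_predicate_n(), 16);    // always <vscale x 16 x i1>
    assert_eq!(paired_predicate_n(32), 4);  // <vscale x 4 x i1> for f32
    assert_eq!(paired_predicate_n(16), 8);  // <vscale x 8 x i1> for u16
}
```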

That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)

I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.


It's a bit of a balancing act how much detail to provide about how the scalable vectors work or what rustc's contract with the codegen backend is, and how much to omit as too much detail. Especially when somewhat familiar with these types/extensions, it's easy to miss where we've gone too far one way or the other, so I appreciate the feedback.

@davidtwco (Member, Author):

Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.

@RalfJung (Member) commented Jul 14, 2025

GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.

Ah, that's fair. :)

Thanks for clarifying. I guess with predicates the point is that they do indeed not use the entire width of the corresponding register but that's fine -- we only want one bit per element, the hardware has one bit per byte, and deals with the mismatch?

Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.

You still introduce this number as the "minimum", which is still confusing. It's not just the minimum, it is the "quantum" or whatever you want to call it -- the vector will always have a length that is an integer multiple of this number. (And moreover, it'll be the same integer multiple for all types.) It is formally correct that this number is also the minimum size, but by leading with that you send the reader down the entirely wrong track.

I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.

I can almost guarantee that some library contributor will one day pick a wrong value for this, and they'll have a really bad time if the result is some odd garbage behavior. Even internal interfaces need to be designed with care and tested in an "open source hive mind" project like Rust. The only fundamental difference from external interfaces is that internal interfaces are perma-unstable, so we can more easily adjust them if mistakes have been made. But in terms of documentation, they absolutely need as much care as stable features, and they should get proper testing too.

@davidtwco (Member, Author) commented Aug 7, 2025

I chatted with @RalfJung about this in Zulip since my last update, I've changed the description of these types so that hopefully it's a lot clearer.

I still propose manually specifying N but have a lot more justification for why I think this is the right choice - still happy to revisit that decision if it isn't a convincing argument - 388170a.

@workingjubilee (Member) left a comment

I do not think this is the correct approach nor implementation. It is invested in reusing an existing framework in repr(simd) which has many undesirable features.

Comment on lines 128 to 137
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Types annotated with the `#[repr(simd)]` attribute contain either an array
field or multiple fields to indicate the intended size of the SIMD vector that
the type represents.

Similarly, a `scalable(N)` representation is introduced to define a scalable
vector type. `scalable(N)` accepts an integer to determine the minimum number of
elements the vector contains. For example:
Member:

We use repr(C) because otherwise some hypothetical CStruct repr type would have to be parameterized by every single arrangement of types, in an ordered fashion.

But repr(simd) does not pair with either repr(packed) or repr(align(N)) coherently, and neither would repr(scalable).

I do not think this should be handled by a new repr... modeling it on a still-unstable repr(simd)... which, as we have discovered over time with existing reprs, has a million footguns for how they work and get compiled. A lang item seems more to-the-point. I intend to make this true for repr(simd) as well anyways.

@davidtwco (Member, Author):

I've changed this to not be a repr hint, instead a separate attribute, so there aren't any issues with overlap with other representation hints - f514a3e.


```rust
#[repr(simd, scalable(4))]
pub struct svfloat32_t { _ty: [f32], }
```
@workingjubilee (Member) commented Jul 14, 2025

One of the problems with reusing these repr attributes is that it is misleading. This looks like it is unsized to begin with, but these are unconst Sized.

@davidtwco (Member, Author):

I've changed the attribute to no longer be part of the repr attribute, and also changed the type marker so that it is no longer a `!Sized` type, as these types will end up being non-const `Sized`.

Comment on lines 216 to 218
Therefore, there is an additional restriction that these types cannot be used in
the argument or return types of functions unless those functions are annotated
with the relevant target feature.
Member:

Would this not mean that these types cannot be used in function pointers, as we have no knowledge of the target features of a function pointer?

For the same reason it would also clash with dyn-compatible traits.

Member:

I think this is similar to the existing checks we have for __m256 only being allowed in extern "C" functions when AVX is available. They can be used in function ptrs just fine, because someone somewhere must have made the unsafe promise that it is okay to call a #[target_feature] function since we actually do have the target feature.

@davidtwco (Member, Author):

Clarified the function pointer point in c6d836d and dyn-compatibility in 530378c.

Comment on lines 230 to 232
When a scalable vector is instantiated into a generic function during
monomorphisation, or a trait method is being implemented for a scalable vector,
then the relevant target feature will be added to the function.
@hanna-kruppe commented Jul 14, 2025

Adding target features to functions based on the types being mentioned in them was also proposed for a different reason in #3525 but that ran into all sorts of complications (I'm not sure off-hand which of these are relevant here).

Unlike that other RFC, what's proposed here isn't even visible in the function signature and not tied to some "if a value of this type exists, the target feature must be enabled, so it's safe" reasoning, which causes new problems. So far the unsafe enforcement and manual safety reasoning for target features assumes it's clear which target features a function has, long before monomorphization. For example, consider this program:

```rust
// crate A
fn foo<T: Sized>() -> usize {
    size_of::<T>()
}

// crate B
use crate_a::foo;
fn main() {
    println!("vector size: {}", foo::<svfloat32_t>());
}
```

If the instantiation of foo gets the target feature, it needs to be unsafe to call. How can the compiler ensure this? And how is it surfaced to the programmer so they can write the correct #[cfg] or is_aarch64_feature_detected!(...) condition?

@RalfJung (Member) commented Jul 14, 2025

Adding target features depending on the concrete instantiation is also fundamentally incompatible with how target features currently work in rustc. Of course the architecture can be changed, but it's a quite non-trivial change, and it has non-trivial ramifications: the MIR inliner crucially relies (for soundness) on knowing the target features of a function without knowing the concrete instance of the generics (inlining a function with more features into a function with fewer features is unsound), so functions that are generic in their target feature set risk serious perf regressions due to the MIR inliner becoming less effective. Making every generic function potentially have target-feature thus seems like a non-starter, both in terms of soundness (that's @hanna-kruppe's issue) and because it completely kills the MIR inliner for generic code (which is the code where we most care about the MIR inliner).

Member:

When a scalable vector is instantiated into a generic function during
monomorphisation

It's also somewhat unclear what this means -- does instantiating T with fn() -> svfloat32_t also add the target feature? It pretty much has to, for soundness.

But then what about something like this

```rust
trait Tr { type Assoc; fn make() -> Self::Assoc; }

fn foo<T: Tr>() {
    size_of_val(&T::make());
}
```

Now if I call foo with a type T such that T::Assoc is svfloat32_t, I also need the target feature:

```rust
impl Tr for i32 { type Assoc = svfloat32_t; /* ... */ }

foo::<i32>(); // oopsie, this must be unsafe
```

The target feature is needed any time svfloat32_t is in any way reachable via the given trait instance. This is not something the type system can even express, which also means I don't think it can possibly be sound.

@davidtwco (Member, Author):

I've removed these parts of the RFC, instead adding limitations around implementing traits on these types and instantiating generics with these types.

These limitations aren't good and I want to remove them, but you're all absolutely correct that there's lots of very tricky challenges with a type that requires a target_feature to be enabled for it to be used, so these restrictions hopefully make the proposal at least nominally feasible, even if I have no intention to try and see it accepted with these restrictions. See 16bb4eb and 9466d26 for these changes.

At the moment, the goal of this RFC is just to provide a rough indication of the direction we'd like to take with scalable vectors to justify an experimental implementation so I can try different things to solve this problem - I've got a couple ideas that I think might work.

Comment on lines 304 to 307
For non-predicate scalable vectors, it will be typical that `N` will be
`$minimum_register_length / $type_size` (e.g. `4` for `f32` or `8` for `f16`
with a minimum 128-bit register length). In this circumstance, `N` could be
trivially calculated by the compiler.

Note that even this part doesn't really work for RVV, which has the added dimension of LMUL: there's not a single $minimum_register_length but several useful sizes of N for every element type. See the table at https://llvm.org/docs//RISCV/RISCVVectorExtension.html#mapping-to-llvm-ir-types
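As a sketch of that table (my own summary, assuming LLVM's convention of vscale = VLEN/64 for RVV): for element width `e` and register grouping `LMUL = m`, the mapped type is `<vscale x (64/e)*m x ...>`, so a single element type has several useful values of N.

```rust
// Sketch: RVV's LMUL dimension means N is not a single function of the
// element size; e.g. vint32m1_t, vint32m2_t, vint32m8_t all hold i32.
fn rvv_n(element_bits: u32, lmul: u32) -> u32 {
    (64 / element_bits) * lmul // assumes LLVM's vscale = VLEN / 64
}

fn main() {
    assert_eq!(rvv_n(32, 1), 2);   // vint32m1_t: <vscale x 2 x i32>
    assert_eq!(rvv_n(32, 2), 4);   // vint32m2_t: <vscale x 4 x i32>
    assert_eq!(rvv_n(32, 8), 16);  // vint32m8_t: <vscale x 16 x i32>
}
```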

@davidtwco (Member, Author):

Expanded on lmul and RVV in 388170a.

compiler always be able to calculate `N` isn't justified given the permanently
unstable nature of the `repr(scalable(N))` attribute and the scalable vector
types defined in `std::arch` are likely to be few in number, automatically
generated and well-tested.
Member:

I want there to be a std::simd way to use scalable vectors at some point, so we would need a way for the compiler to either generate the correct N or be able to correctly handle any N (within some architecture-independent bounds, e.g. N is always 1, 2, 4, 8, or 16) by translating to some other N that the current architecture can handle.

@davidtwco (Member, Author):

I'd like that too, but it's not something I'm trying to enable in this RFC; I just want to lay the groundwork for vendor intrinsics for scalable vectors. How to make these work with Portable SIMD is better left as a follow-up.


However, as C also has this restriction and scalable vectors are nevertheless
used in production code, it is unlikely there will be much demand for those
restrictions to be relaxed.
Member:

structs containing a scalable SIMD type may actually end up being in large demand due to project-portable-simd:
One idea IIRC we've talked about is that we would end up with something kinda like a SIMD version of ArrayVec: a struct that contains both an active length (like Vec::len, not to be confused with vscale, which is more like Vec::capacity) and a scalable SIMD type. That way we could do operations where the tail of the SIMD type is not valid, but user code wouldn't need unsafe, since those extra elements are ignored by using auto-generated masks or setvl.

this would need structs like so:

```rust
pub struct SimdArrayVec<T> {
    len: usize,
    value: ScalableSimd<T>,
}
```

@davidtwco (Member, Author):

I think there's limited interest in doing this on the LLVM side, so it's a limitation we're stuck with. We will support arrays of scalable vectors and can support structs with only scalable vector members, but that's all LLVM supports at the moment.

@davidtwco (Member, Author):

Thanks all for the feedback, I got a little bit busy after posting this, so I'll respond as soon as I can.

@davidtwco changed the title from repr(scalable) to rustc_scalable_vector on Aug 7, 2025
@davidtwco (Member, Author) commented Aug 7, 2025

Thanks everyone for the feedback, I've pushed a bunch of changes:

Notable changes:

  • f514a3e renames this from repr(scalable) to rustc_scalable_vector
    • I wasn't aware of objections to repr(simd), and I don't have an especially strong preference as to what we call this; the framing of it as an extension of repr(simd) was largely syntactic and quite shallow, so I've changed it to rustc_scalable_vector
  • 83d0b93 clarifies which types are accepted with the type marker
  • dcb29b4 avoids partially introducing scalable vectors in the guide-level explanation
  • 388170a is a significant change to how we explain scalable vectors and justify the manual specification of N
    • As I said in rustc_scalable_vector #3838 (comment), I'm still open to changing this, but I don't think it's trivial, and I think manually specifying N is likely to be okay; happy to hear if my justification of that point isn't convincing
    • Crucially, it adds ASCII diagrams, and every RFC is better with ASCII diagrams
  • 16bb4eb adds a bunch of restrictions to these types to avoid issues with target_feature
    • I don't like these restrictions and I want to find a way to remove them, so I've made this change so that the RFC isn't just entirely infeasible, but I've got some ideas that I'd like to explore for how to fix the issues here
    • 9466d26 adds another possibility for how we might be able to avoid these restrictions
  • ca4aa06 mentions that these types ought to be considered FFI-safe
  • 5011dc6 adds lots of prior art, reviewing the issues related to repr(simd) and target_feature and their relevance to scalable vectors
  • 6186245 relaxes some of the restrictions about these types' use in compound types following changes in LLVM and describes how tuples of vectors will be defined
  • c6d836d clarifies how the ABI requirements of these types will impact function pointers
  • 530378c clarifies how the ABI requirements of these types will impact dyn compatibility

Small changes:

  • 2a7705a adds asserts into the example of using scalable vectors
  • 8709d8c adds some missing words
  • b96e2b7 clarifies some of the ambiguous parts of the code example and what the intrinsics are doing
  • 4123a3a fixes a typo
  • e5b546c removes trailing whitespaces
  • 494da8a moves the warning about prctl into a subsection
  • 968ae3e removes an incorrect detail from our description of the "no action taken" alternative
  • 9630780 removes a reference to repr(simd) that is no longer relevant

Hopefully these address many of the concerns raised so far. I don't expect them to address all of the concerns; there are still big unresolved questions. My intent with this is just to indicate the rough direction we plan to take so that I can justify an experimental implementation to try and find solutions to some of the issues raised.

element of the same type (but only if that struct is annotated with
`#[rustc_scalable_vector]`)

- Cannot be used in arrays
Member:

you state they can be stored in arrays as an exception, but here you contradict that.

Comment on lines +274 to +275
- Cannot be instantiated into generic functions (see
[*Target features*][target-features])

Does this also exclude e.g. size_of::<svfloat32_t>()?

Comment on lines +277 to +280
- Cannot have trait implementations (see [*Target features*][target-features])

- Including blanket implementations (i.e. `impl<T> Foo for T` is not a valid
candidate for a scalable vector)

I guess this is intended to prevent a scalable vector value and types from bleeding into generic code, without relying entirely on post-mono errors? If so, note that they can also be smuggled in via associated types (slight modification of Ralf's earlier example):

```rust
trait Tr { type Assoc; }

fn foo<T: Tr>() {
    size_of::<T::Assoc>();
}
```

Comment on lines +1158 to +1169
- It may be possible to support scalable vector types without the target feature
being enabled by using an indirect ABI similarly to fixed length vectors.

- This would enable these restrictions to be lifted and for scalable vector
types to be the same as fixed length vectors with respect to interactions
with the `target_feature` attribute.

- As with fixed length vectors, it would still be desirable for them to
avoid needing to be passed indirectly between annotated functions, but
this could be addressed in a follow-up.

- Experimentation is required to determine if this is feasible.
Member:

With an indirect ABI it should be possible to make use of scalable vector types without the sve feature. You will need to avoid emitting any vscale in the LLVM IR and use variable-length alloca/memcpy for any data movement involving such types. You will need to call a helper function to actually read VL to figure out how big the vectors are.

This could be done in rustc, but it's also possible that this support could be implemented directly in LLVM.

Member:

imo the ABI for determining vscale should be like:

```rust
use core::num::NonZeroUsize;
use core::sync::atomic::{AtomicUsize, Ordering};

static VSCALE: AtomicUsize = AtomicUsize::new(0);

#[cold]
fn vscale_slow() -> NonZeroUsize {
    let mut vscale = 1; // default if no features are available
    #[cfg(any(target_arch = "aarch64", target_arch = "arm64ec"))]
    if is_aarch64_feature_detected!("sve") {
        vscale = svcntb() as usize / 16;
    }
    // add other arches here
    VSCALE.store(vscale, Ordering::Relaxed);
    NonZeroUsize::new(vscale).unwrap()
}

#[inline(always)]
pub fn vscale() -> NonZeroUsize {
    NonZeroUsize::new(VSCALE.load(Ordering::Relaxed)).unwrap_or_else(vscale_slow)
}
```

Member:

Actually I think you can get away with just a function containing svcntb, if you assume that the presence of an SVE type implies that the sve feature is enabled, since that is required to create an instance of such a type in the first place.

Unfortunately this doesn't work for MaybeUninit<svint32_t>, since you then have to know the size without the guarantee that an SVE-enabled function has previously been executed. Is MaybeUninit expected to support scalable vector types?

Member:

Then again, this doesn't need to be super optimized since this is for the slow path: we will most likely lint against any use of a scalable vector type in a function without the appropriate features, and consider it a user mistake to do such a thing.

Member:

> Actually I think you can get away with just a function containing svcntb if you assume that the presence of an SVE type implies that the sve feature is enabled since that is required to create an instance of such a type in the first place.

No, you can't: you can make an SVE type even without the sve feature with something like the following (avoiding zeroed, read_unaligned, etc., since they would require putting SVE types in generics):

```rust
fn zeroed() -> svuint8_t {
    #[repr(C, align(16))] // I'm assuming 16 is enough
    struct Zeros([u8; 2048]);
    const ZEROS: &'static Zeros = &Zeros([0; 2048]); // long enough for max VL
    unsafe { *(ZEROS as *const _ as *const svuint8_t) }
}
```

@hanna-kruppe (Aug 8, 2025):

So far Rust-the-language has shied away from anything like that. Most code in practice requires that target features aren't taken away once they were detected, but:

- It's fine for the available target features to increase monotonically over the program's lifetime, e.g., because the code that does the detection and makes the results available is itself written in Rust. You could possibly paper over this by conceptualizing an initial "unknown" state, but then you need some way to disallow code that would depend on SVE availability from running before the box with the Schrödinger's cat is opened. This also has to be respected by code motion in the compiler!
- Unsafe code can very much take features away again (as long as all code involved is carefully written to handle this correctly). This came up in "Stabilize target_feature_11" (rust#134090), for example, with the motivating use case of the Linux kernel's selective use of hardfloat and SIMD extensions for particular functions where they are a big win, while avoiding the save/restore cost in the vast majority of userspace<->kernel interactions. This also seems potentially relevant for SVE and RVV.

Member:

OK, maybe there should be a feature_fence intrinsic that tells the compiler not to move code across it, and also tells std to reset the feature-detection and vscale caches.

@hanna-kruppe (Aug 8, 2025):

If it was that simple, LLVM could also just support vscale changing at runtime. Many people (including me) have been bashing their head against this wall over the years. Two major problems, in brief:

  1. Pretty much no code anywhere, both in compilers and in the code being compiled, is prepared to deal with the meaning of a type shifting during program execution. It's not just that size_of::<T>() changes and can't be cached. Nothing that in any way depends on the size can flow across the "fence". That notably also includes every value of the types that got reinterpreted. Even apparently trivial code like let x = make_scalable_vector(); change_vsize(); let y = x; is broken. I don't even know how to write a precise language spec for that.
  2. Nothing in the world of optimizing compilers is really prepared to treat basic operations like "allocate space for a local variable" or loads/stores/arithmetic to have an implicit dependency on some global state that must not be reordered across some sort of fence (especially if the "vscale change" can be hidden inside a function call, rather than being embedded into the static structure of the program representation).

@programmerjake (Aug 8, 2025):

I'm not proposing LLVM's idea of vscale changes, just that Rust's fallback value changes between the SVE-disabled value (probably 1, though it could also be None so that size_of or any other size-dependent operation panics and/or aborts) and LLVM's idea of what vscale is. The intrinsic would be unsafe, with a safety condition that there aren't any scalable vectors (on this thread? that needs a thread-local vscale cache).


This doesn't really solve any of the problems because they also apply to surface Rust, to any formalization of Rust's semantics, and to rustc's MIR.
