-
Notifications
You must be signed in to change notification settings - Fork 23
Description
API design partially based on discussions with @BartMassey. Revised based on feedback, in particular from @Amanieu.
Proposal
Problem statement
People regularly want to generate random numbers for a variety of use cases. Doing so currently requires using a crate from crates.io
. rand
is the 6th most downloaded crate on crates.io
, and fastrand
is quite popular as well. Many, many people over the years have expressed frustration that random number generation is not available in the standard library (despite the standard library using randomness internally for HashMap
).
There are multiple reasons why we have not historically added this capability. Primarily, there are three capabilities people want, and those capabilities seem to present a "pick any two" constraint:
- Secure random number generation
- Seedable random number generation
- Stability between versions of Rust when seeded
These constraints arise from the possibility of a secure random number generator potentially requiring updates for security reasons. Changing the random number generator would result in different sequences for the same seed.
In addition to that primary constraint, there have also been design difficulties: there are numerous pieces of additional functionality people may want surrounding random number generation, which makes any proposal for it subject to massive scope creep and bikeshed painting. Most notably: users of random numbers may want to represent the state of the RNG explicitly as something they can pass around, or implicitly as global state for simplicity.
This ACP proposes a solution that aims to be as simple as possible, satisfy all the stated constraints on the problem, and allow for future expansion if desired. This ACP proposes a generator that is secure, but not stable across versions of Rust; this allows us to update the secure RNG if any issue or potential improvement arises.
Separately, ACP 394 proposes an RNG that is seedable and guarantees identical seeding across Rust versions, but is not a secure RNG.
Motivating examples or use cases
- Generating randomness for games (e.g. rolling dice).
- Randomly shuffling a list
- Randomly sampling elements from a set
- Random test case generation / fuzz testing
Solution sketch
// In `core::random`:
/// A source of random data
pub trait RandomSource {
/// Fill `buf` with random bytes.
fn gen_bytes(&mut self, buf: &mut [u8]);
}
/// Trait for any type that can be generated via random number generation.
pub trait Random {
fn random(random_source: &mut (impl RandomSource + ?Sized)) -> Self;
}
// In `std::random`:
/// Random source that produces cryptographically secure random numbers at a reasonable speed.
///
/// This generator does not encapsulate any state, and cannot be seeded or replayed. It may vary by target and by Rust version.
#[derive(Default, Copy, Clone, Debug)]
pub struct DefaultRng;
impl RandomSource for DefaultRng { /* ... */ }
/// Generate a random value, uniformly distributed over possible values.
///
/// This uses `DefaultRng`.
pub fn random<R: Random>() -> R {
<R as Random>::random(&mut DefaultRng)
}
The trait Random
will initially be implemented for all iN and uN integer types, isize
/usize
, and bool
, as well as arrays and tuples of such values.
Notably, Random
will not initially be implemented for floating-point values (to initially avoid questions of distribution), strings/paths/etc (to avoid questions of length and character selection), or char
(to avoid questions of what subset of values to generate). We can consider such additions in the future; see the section on future work below.
The random number generator may use OS randomness directly, if available, or use a CSPRNG seeded from OS/hardware randomness.
Alternatives
We could do nothing, and continue to refer people to external crates like rand
and fastrand
.
We could eliminate the convenience functions that implicitly use DefaultRng
, and require all users to pass around a RandomSource directly. However, this would add complexity to many simple use cases.
We could allow gen_bytes
to fill an uninitialized buffer. This would be a more complex interface, however. We could potentially introduce such an interface later, when we've stabilized the types to make it simpler.
We could use Read
to get data from random sources (e.g. trait RandomSource: Read
). This would have the advantage of letting us use read_buf
once we stabilize that. However, this would prevent us from putting RandomSource
in core
.
We could allow gen_bytes
to fail and return a Result
. However, it seems unlikely that most consumers of randomness would be able to handle not having access to it. The higher-level functions will almost certainly want to panic rather than returning a result, for convenience of invocation, and having callers who can deal with an absence of randomness call the low-level interface directly doesn't seem like a sufficiently useful interface to support. We also don't want to introduce a whole family of parallel try_random
/try_random_range
/etc functions. We could always introduce a try_gen_bytes
interface later, if there's a need for one. (In addition, we would not want to use io::Result
here, as that would prevent RandomSource
from living in core
.)
We could rename Random::random
to Random::random_with
or similar, and use random
for a function that calls random_with(DefaultRng)
. This would make it convenient to call, for instance, u32::random()
. This doesn't seem like an important convenience, however, as it's just as easy to call random::<u32>()
(or in most contexts just random()
with type inference).
Future work
We should support randomly shuffling arrays:
impl<T> [T] {
/// Shuffle a slice into a random order, using `DefaultRng`.
pub fn shuffle(&mut self) {
self.shuffle_with(DefaultRng);
}
/// Shuffle a slice into a random order, using the specified random source
pub fn shuffle_with(&mut self, rng: &mut (impl RandomSource + ?Sized));
}
If we need it for performance, we could add functions like gen_u64
or gen_u32
to RandomSource
(with default implementations based on gen_bytes
), to help RNGs that can implement those more efficiently than they can implement gen_bytes
. This is inspired by Hasher
doing something similar. I propose that we initially have just gen_bytes
, and require benchmarks demonstrating a substantial speedup before we consider the additional complexity and interface surface area.
We should support random generation in ranges (e.g. random_range(1..6)
). Providing a correct implementation will steer people away from the most common open-coded implementation (using %
), which introduces bias.
We could support choosing a random item from an iterator. (This can be done more optimally when the iterator implements some additional traits.)
We can implement Random
for many more types, such as NonZero<T>
and Wrapping<T>
.
We should support derive(Random)
for types whose fields all implement Random
. This is relatively straightforward for structs. However, supporting this for enums would require some additional care to handle discriminants: should a random Option<u8>
be 50% None
, or 1/257 None
? The latter is much more difficult to implement correctly.
We could add additional RandomSource
implementations to the standard library, including seeded sources. This would enable purposes such as testing, reproducing the creation of objects whose algorithms require randomness to generate, replaying processes such as fuzz testing, supporting game seeds or replays, and various others. Note that this would not be the same type as DefaultRng
.
We should not allow seeding the DefaultRng
state, as that would affect all users rather than only affecting those that are opting into supporting seeding, and would also preclude designs that don't involve a seed (e.g. obtaining random numbers directly from the OS).
We could consider, in the future, introducing random float generation, or random character generation, or generation of various other types. A careful design could allow using the same mechanisms for this as for random generation in ranges (e.g. a Distribution<T>
trait to sample from). Alternatively, we could leave the full breadth of this to the ecosystem, and just support random ranges directly, as well as some common specific ways to generate floats and characters.
We could provide a trait to fill an existing value rather than creating a new one.
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
Activity
programmerjake commentedon Jun 11, 2024
why can't
Rng
beSync
? if all methods that modify state take&mut self
then beingSync
should be finethe8472 commentedon Jun 11, 2024
So the
Rng
type is only meant for the insecure generation and the free function is meant for secure generation?That means there's no common abstraction for them if one wants to use a seedable RNG in tests and a real one in production.
joshtriplett commentedon Jun 11, 2024
I put that in future work: it's possible we might want to have a common type, or alternatively a common trait and two different types (so that the type system ensures you don't use an insecure RNG where you wanted a secure one). But that is likely to require further design, and I would like to leave that out of the initial design to keep the initial design simple. Past efforts to add an RNG have run into endless design issues and tradeoffs. The smaller the surface area, the fewer of those issues come up.
joshtriplett commentedon Jun 11, 2024
You're entirely right; fixed.
ChrisDenton commentedon Jun 11, 2024
Do we need a more descriptive name than
Rng
? I'm nervous about claiming such a generic name for the insecure version. And if this is to be stable for all time then I think that the rng algorithm becomes an API question.Also shouldn't this have a way to fill a buffer using the system crng? It's less convenient but it's the most basic building block that platforms provide so it may be worth having a cross-platform version exposed.
joshtriplett commentedon Jun 11, 2024
I don't think we should make any guarantees that something is using the "system" RNG, just that it's using some cryptographically secure RNG.
I do agree that "fill this buffer with randomness" is a useful building block, but the goal of this ACP was the simple end-user-targeted interface, since existing libraries already have those building blocks.
joshtriplett commentedon Jun 11, 2024
The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.
ChrisDenton commentedon Jun 11, 2024
Sure, picking an algorithm for the insecure version is what I mean. That would need to be part of the proposal as it's part of the API, no?
BurntSushi commentedon Jun 11, 2024
I am in favor of the general idea here.
A domain question for folks using secure RNG: how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed? I think the API as proposed would be insufficient for that because there's no way to create a secure RNG with a seed.
I have some initial thoughts:
I kinda lean towards this in the affirmative. That is, that we should have type-level separation. The use cases are different enough that being able to say "this type is always going to do 'secure' rng" is likely quite valuable. I'm not sure we need to commit to this in the initial design, but I think we should leave room for it. For example, perhaps by renaming
Rng
toInsecureRng
or whatever.This is a little concerning to me because it feels like a very strong guarantee that will lock us into something for eternity. But, I confess, I am a bit ignorant here, and perhaps this isn't as big of a deal as I think it is. Do other language ecosystems guarantee stability of seeded insecure RNGs?
the8472 commentedon Jun 11, 2024
With a pluggable RNG implementing a trait. In
rand
Rng is a traitI've asked this in #393 (comment) and here the reply #393 (comment)
joshtriplett commentedon Jun 11, 2024
@ChrisDenton wrote:
You're right, I should spell that out more explicitly. My intention was that we initially use the same RNG implementation for both, but if any security issue ever arises that requires us to change the RNG implementation, we'll change the secure one and leave the insecure one.
joshtriplett commentedon Jun 11, 2024
@BurntSushi wrote:
As @the8472 wrote, likely by using a trait.
I think I'm convinced by this, yeah: let's keep the types distinct, and then we can provide a trait for any code that wants to be generic over the RNG. But I still think this should be future work, to avoid increasing the initial surface area we need to get consensus on.
To the extent other languages provide stability guarantees, yes, some other languages/libraries do provide seeded RNGs and guarantee that the same seed produces the same sequence of values. For instance, Python provides an insecure seedable RNG in the
random
module, and separately provides secure random number generation in thesecrets
module which does not support seeding.pitaj commentedon Jun 11, 2024
Should the
random
function take a range, or is that left to future additions? Generating a uniform random value within a range is important especially for secure rng because otherwise people might use%
(introducing bias).67 remaining items