
provide a better random "vector" generator in the aobench benchmark #90

Description

@gnzlbg (Contributor)

@TheIronBorn suggested that the random number generator that we are using for packed vectors in the aobench benchmark is too naive.

It would be great if we could use a better random number generator there for the 128-bit and 256-bit cases and put it behind a feature flag.

Activity

gnzlbg commented on Sep 4, 2018 (Contributor, Author)

@TheIronBorn now that the aobench benchmark matches ISPC performance, it's time to beat it. Would you be willing to look into this, or could you give some hints here about which PRNG would make sense to use for that benchmark?

TheIronBorn commented on Sep 4, 2018 (Contributor)

I have a collection of various SIMD PRNGs here: https://github.com/TheIronBorn/simd_prngs.

What are the requirements for quality/hardware?

gnzlbg commented on Sep 4, 2018 (Contributor, Author)

Currently we use LFSR113, so anything that is no worse quality-wise but has higher throughput would do. Hardware-wise, we use conditional compilation behind the 256bit feature flag (cfg(feature = "256bit")) to select between 128-bit wide and 256-bit wide implementations at compile time, so we can provide two different PRNGs, one for each.
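For reference, a minimal lane-wise sketch of LFSR113 (L'Ecuyer's combined Tausworthe generator) with the lane count picked by the `256bit` feature. This is not the aobench `random.rs` code: plain arrays stand in for packed_simd vectors, and the type name `Lfsr113x` is hypothetical.

```rust
// Minimal sketch (not aobench's random.rs): plain arrays stand in for
// packed_simd vectors, and every lane runs its own LFSR113 instance.

/// Lane count picked at compile time by the `256bit` cargo feature.
#[cfg(feature = "256bit")]
pub const LANES: usize = 8;
#[cfg(not(feature = "256bit"))]
pub const LANES: usize = 4;

/// Hypothetical lane-wise LFSR113 state: components z1..z4 for each lane.
pub struct Lfsr113x {
    lanes: [[u32; 4]; LANES],
}

impl Lfsr113x {
    /// Per lane, seeds must satisfy z1 > 1, z2 > 7, z3 > 15, z4 > 127;
    /// the low bits below those thresholds are ignored by the recurrence.
    pub fn new(lanes: [[u32; 4]; LANES]) -> Self {
        for z in &lanes {
            assert!(z[0] > 1 && z[1] > 7 && z[2] > 15 && z[3] > 127, "invalid LFSR113 seed");
        }
        Lfsr113x { lanes }
    }

    /// Advances every lane by one step and returns the combined 32-bit outputs.
    pub fn next_u32(&mut self) -> [u32; LANES] {
        let mut out = [0u32; LANES];
        for (z, o) in self.lanes.iter_mut().zip(out.iter_mut()) {
            // The four Tausworthe components with L'Ecuyer's LFSR113 parameters.
            z[0] = ((z[0] & 0xFFFF_FFFE) << 18) ^ (((z[0] << 6) ^ z[0]) >> 13);
            z[1] = ((z[1] & 0xFFFF_FFF8) << 2) ^ (((z[1] << 2) ^ z[1]) >> 27);
            z[2] = ((z[2] & 0xFFFF_FFF0) << 7) ^ (((z[2] << 13) ^ z[2]) >> 21);
            z[3] = ((z[3] & 0xFFFF_FF80) << 13) ^ (((z[3] << 3) ^ z[3]) >> 12);
            *o = z[0] ^ z[1] ^ z[2] ^ z[3];
        }
        out
    }

    /// Maps the top 24 bits of each lane's output to an f32 in [0, 1).
    pub fn next_f32(&mut self) -> [f32; LANES] {
        let bits = self.next_u32();
        let mut out = [0f32; LANES];
        for i in 0..LANES {
            out[i] = (bits[i] >> 8) as f32 * (1.0 / 16_777_216.0);
        }
        out
    }
}
```

Each lane gets its own seed here, so the lanes produce independent-looking streams; the 256-bit build simply doubles `LANES`.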

I guess we could technically also expose a 512-bit wide flag, but that is not implemented yet.

Also, we can add the simd_prngs crate as an optional dependency - there is no need to re-implement anything here AFAICT.

TheIronBorn commented on Sep 4, 2018 (Contributor)

Do you know if the ISPC LFSR implementation avoids correlation between random streams?

gnzlbg commented on Sep 4, 2018 (Contributor, Author)

No idea, I adapted it 1:1 into the aobench example here: https://github.com/rust-lang-nursery/packed_simd/blob/master/examples/aobench/src/random.rs

The ISPC documentation mentions the following:

Note that if the same varying seed value is used for all of the program instances (e.g. RNGState state; seed_rng(&state, 1);), then all of the program instances in the gang will see the same sequence of pseudo-random numbers.

So I would think that no, it does not avoid it.
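The ISPC remark can be illustrated with a toy example. This hypothetical sketch uses a 4-lane xorshift32 (Marsaglia) as a stand-in for the real generator; it only shows that identical per-lane seeds keep the lanes in lock-step, while distinct per-lane seeds do not.

```rust
// Toy illustration only: a 4-lane xorshift32 stands in for the real
// generator; none of these names come from aobench or ISPC.
const LANES: usize = 4;

/// Advances every lane by one xorshift32 step and returns the new states.
fn step(state: &mut [u32; LANES]) -> [u32; LANES] {
    for s in state.iter_mut() {
        *s ^= *s << 13;
        *s ^= *s >> 17;
        *s ^= *s << 5;
    }
    *state
}

/// Returns true if, for `n` steps, every lane produced the same value as lane 0.
fn lanes_in_lockstep(mut state: [u32; LANES], n: usize) -> bool {
    (0..n).all(|_| {
        let out = step(&mut state);
        out.iter().all(|&x| x == out[0])
    })
}
```

`lanes_in_lockstep([1; LANES], 1000)` is `true`, whereas `lanes_in_lockstep([1, 2, 3, 4], 1000)` is `false`: xorshift32 is a bijection on non-zero states, so distinct seeds give distinct outputs at every step.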

TheIronBorn commented on Sep 4, 2018 (Contributor)

Sounds like that's a no. So correlation protection would be an improvement we could provide, but that means a narrower selection of PRNGs.

TheIronBorn commented on Sep 4, 2018 (Contributor)

gnzlbg commented on Sep 4, 2018 (Contributor, Author)

So IIRC the current aobench implementation uses a different random seed for each vector lane, so I don't know how much of an improvement "correlation protection" would bring. I also don't know how a higher-quality random number generator would affect the ambient occlusion benchmark; maybe it would generate better images in corner cases?

TheIronBorn commented on Sep 4, 2018 (Contributor)

I don't see it being much of a problem as long as the state space is large enough relative to the stream length.
How many random numbers are used per stream?

TheIronBorn commented on Sep 4, 2018 (Contributor)

Since vector width is the only hardware property we can rely on, AES-NI, vector shuffles, rotations, etc. won't be useful.

gnzlbg commented on Sep 4, 2018 (Contributor, Author)

How many random numbers are used per stream?

So in total we generate ~ width*height*nsubsamples*nsubsamples*ntheta*nphi random numbers, and per stream that number divided by the number of streams.

For the benchmark cases, we generate 800 x 600 images, with 2 sub-samples per pixel in each direction, and 8 x 8 = 64 rays for different thetas and phis; that's 800 * 600 * 2 * 2 * 8 * 8 = 122,880,000 (~123 million) random numbers. When we use 128-bit wide vectors, we have 4 streams (one per f32 lane), so that would be ~123 million / 4 ≈ 31 million random numbers per stream.
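The count is easy to check by evaluating the formula from the previous paragraph directly; the helper names below are illustrative only. For 800 x 600 pixels, 2 sub-samples per direction and 8 x 8 rays, this gives 122,880,000 random numbers per frame, i.e. 30,720,000 per stream with 4 lanes.

```rust
/// Total random numbers per frame, following the formula above:
/// width * height * nsubsamples * nsubsamples * ntheta * nphi.
fn total_randoms(width: u64, height: u64, nsubsamples: u64, ntheta: u64, nphi: u64) -> u64 {
    width * height * nsubsamples * nsubsamples * ntheta * nphi
}

/// Share of that total drawn from each stream (one stream per vector lane).
fn per_stream(total: u64, streams: u64) -> u64 {
    total / streams
}
```

With 256-bit vectors (8 f32 lanes), `per_stream(122_880_000, 8)` halves the per-stream length to 15,360,000.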

Since vector width is the only hardware property we can rely on, AES-NI, vector shuffles, rotations, etc. won't be useful.

This is an artificial limitation that we currently have, but we can lift it: using cfg(target_feature = ) at compile time, we can rely on whatever hardware features we want. However, I don't know which of them are worth it.
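As an illustration of that compile-time route, here is a sketch keyed on `cfg(target_feature = "avx2")`. The constant `PRNG_LANES`, the helper `enabled_features`, and the mapping from features to lane counts are hypothetical; the features are only enabled when the crate is built with, e.g., `-C target-feature=+avx2` or `-C target-cpu=native`.

```rust
// Sketch: compile-time selection on target features. Feature names are real
// `target_feature` values; the lane counts chosen for them are hypothetical.

/// f32 lanes a PRNG backend would use: 8 when the build enables AVX2, else 4.
#[cfg(target_feature = "avx2")]
pub const PRNG_LANES: usize = 8;
#[cfg(not(target_feature = "avx2"))]
pub const PRNG_LANES: usize = 4;

/// Lists the optional instruction sets this binary was compiled to assume.
pub fn enabled_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "aes") {
        found.push("aes");
    }
    found
}
```

Because these are compile-time checks, the resulting binary will only run correctly on CPUs that actually support the enabled features.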

TheIronBorn commented on Sep 4, 2018 (Contributor)

With a stream length of 4 million, the chance of correlation (i.e. of two of the streams overlapping) is ~2e-31 for a period of 2^128.
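The ~2e-31 figure matches the common bound n^2 * L / P for n streams of length L started at random points of a cycle of period P. Assuming that is the estimate used here, with n = 4 and L = 4 million it can be reproduced as follows (`overlap_probability` is a hypothetical helper).

```rust
/// Approximate upper bound on the probability that `n` streams of length
/// `len`, started at random points of one cycle of length `period`, overlap:
/// n^2 * len / period.
fn overlap_probability(n: f64, len: f64, period: f64) -> f64 {
    n * n * len / period
}
```

`overlap_probability(4.0, 4.0e6, 2.0_f64.powi(128))` evaluates to about 1.9e-31. The bound grows linearly with stream length and quadratically with the number of streams, so going from 4 to 8 lanes multiplies it by 4.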

TheIronBorn commented on Sep 11, 2018 (Contributor)

Also: for chaotic PRNGs like Jenkins's smallprng (JSF) or PractRand's SFC, the chance of a cycle shorter than 4 million (≈ 2^22) is ~2^-(state_bits - 22)
(source: http://www.pcg-random.org/posts/random-invertible-mapping-statistics.html)

For Jsf32 (128 bits of state): ~2^-106.
Sfc uses a counter to raise the minimum cycle length to 2^32 (Sfc32) or 2^64 (Sfc64).
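The counter is what gives SFC its guaranteed minimum. It is part of the state and only returns to its starting value after 2^32 steps, so the whole state cannot repeat sooner. Below is a hypothetical lane-wise SFC32 sketch that uses arrays in place of SIMD vectors. The name `Sfc32x4` and the 12-step warm-up are assumptions. It uses only add, xor and shifts, with the rotation emulated by two shifts and an OR, which fits the vector-width-only constraint mentioned earlier.

```rust
// Sketch of a lane-wise SFC32 (PractRand's small fast chaotic generator).
// Only add, xor and shifts are used; the rotation is two shifts and an OR.
const LANES: usize = 4;

pub struct Sfc32x4 {
    a: [u32; LANES],
    b: [u32; LANES],
    c: [u32; LANES],
    counter: [u32; LANES], // period 2^32, hence cycle length >= 2^32
}

impl Sfc32x4 {
    pub fn new(a: [u32; LANES], b: [u32; LANES], c: [u32; LANES]) -> Self {
        let mut rng = Sfc32x4 { a, b, c, counter: [1; LANES] };
        // Warm-up to mix weak seeds; the count of 12 is an assumption here.
        for _ in 0..12 {
            rng.next_u32();
        }
        rng
    }

    /// Advances every lane by one step and returns the 32-bit outputs.
    pub fn next_u32(&mut self) -> [u32; LANES] {
        let mut out = [0u32; LANES];
        for i in 0..LANES {
            let tmp = self.a[i].wrapping_add(self.b[i]).wrapping_add(self.counter[i]);
            self.counter[i] = self.counter[i].wrapping_add(1);
            self.a[i] = self.b[i] ^ (self.b[i] >> 9);
            self.b[i] = self.c[i].wrapping_add(self.c[i] << 3);
            // Rotate left by 21 without a native vector rotate.
            self.c[i] = ((self.c[i] << 21) | (self.c[i] >> 11)).wrapping_add(tmp);
            out[i] = tmp;
        }
        out
    }
}
```

Even an all-zero seed does not get stuck, because the counter keeps feeding fresh input into `tmp`.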

gnzlbg commented on Nov 11, 2018 (Contributor, Author)