Activity
gnzlbg commented on Sep 4, 2018
@TheIronBorn now that the aobench benchmark matches ISPC performance, it's time to beat it. Would you be willing to look into this, or could you give some hints here about which PRNG would make sense to use for that benchmark?
TheIronBorn commented on Sep 4, 2018
I have a collection of various SIMD PRNGs here: https://github.com/TheIronBorn/simd_prngs.
What are the requirements for quality/hardware?
gnzlbg commented on Sep 4, 2018
Currently we use LFSR113, so anything that's not worse quality-wise, but has higher throughput. Hardware-wise, we do conditional compilation behind the 256bit feature flag (cfg(feature = "256bit")) to select between 128-bit wide and 256-bit wide implementations at compile time, so we can provide two different PRNGs, one for each.
I guess we could technically also expose a 512-bit wide flag, but that is not implemented yet.
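A minimal sketch of what selecting the vector width behind that feature flag could look like. Only the 256bit feature name comes from the comment above; the module and constant names (width, VECTOR_BITS, F32_LANES) are hypothetical and are not taken from the aobench example.

```rust
// Hypothetical names throughout; only the `256bit` feature flag itself
// is mentioned in the thread.
#[cfg(feature = "256bit")]
mod width {
    // Selected when the crate is built with `--features 256bit`.
    pub const VECTOR_BITS: usize = 256;
}

#[cfg(not(feature = "256bit"))]
mod width {
    // Default: 128-bit wide vectors.
    pub const VECTOR_BITS: usize = 128;
}

/// Number of f32 lanes per vector, and therefore the number of
/// independent random streams a per-lane PRNG would have to provide.
pub const F32_LANES: usize = width::VECTOR_BITS / 32;

fn main() {
    println!("{}-bit vectors, {} f32 lanes", width::VECTOR_BITS, F32_LANES);
}
```

A different PRNG type could be put in each of the two modules in the same way, so the choice is resolved entirely at compile time.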
Also, we can add the simd_prngs crate as an optional dependency; there is no need to re-implement anything here AFAICT.
TheIronBorn commented on Sep 4, 2018
Do you know if the ISPC LFSR implementation avoids correlation between random streams?
gnzlbg commented on Sep 4, 2018
No idea, I adapted it 1:1 into the aobench example here: https://github.com/rust-lang-nursery/packed_simd/blob/master/examples/aobench/src/random.rs
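The ISPC documentation mentions the following:
"Note that if the same varying seed value is used for all of the program instances (e.g. RNGState state; seed_rng(&state, 1);), then all of the program instances in the gang will see the same sequence of pseudo-random numbers."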
So I would think that no, it does not avoid it.
TheIronBorn commented on Sep 4, 2018
Sounds like that's a no. So correlation protection would be an improvement we could provide, but that means a narrower selection of PRNGs.
TheIronBorn commented on Sep 4, 2018
LFSR113 has 6 BigCrush failures: http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
gnzlbg commented on Sep 4, 2018
So IIRC the current aobench implementation uses a different random seed for each vector lane, so I don't know how much of an improvement "correlation protection" would bring. I also don't know how a higher-quality random number generator would affect the ambient occlusion benchmark; maybe it would generate better images in corner cases?
TheIronBorn commented on Sep 4, 2018
I don't see it being much of a problem as long as we have a large enough state space relative to the stream length.
How many random numbers are used per stream?
TheIronBorn commented on Sep 4, 2018
Since we can only rely on vector width with regard to hardware, AES-NI, vector shuffles, rotations, etc. won't be useful.
gnzlbg commented on Sep 4, 2018
So in total we generate ~ width*height*nsubsamples*nsubsamples*ntheta*nphi random numbers, and per stream that number divided by the number of streams.
For the benchmark cases, we generate 800 x 600 images, with 2 sub-samples per-pixel per-dir, and 8x8 = 64 rays for different thetas and phis, that's: 800 * 600 * 2 * 2 * 8 * 8 ~= 16 million random numbers. When we use 128-bit wide vectors, we have 4 streams of 4 f32s, so that would be 16 million / 4 ~= 4 million random numbers per stream.
This is an artificial limitation that we currently have, but we can lift it. Using cfg(target_feature = ) at compile time we can rely on whatever we want. I don't know what's worth it, however.
TheIronBorn commented on Sep 4, 2018
With a stream length of 4 million, the chance of correlation is ~2e-31 for a period of 2^128
TheIronBorn commented on Sep 11, 2018
Also: for chaotic PRNGs like Jenkins's smallprng (JSF) or PractRand's SFC, the chance of a cycle shorter than 4 million is ~2^-(state_bits - 22): http://www.pcg-random.org/posts/random-invertible-mapping-statistics.html
For Jsf32: ~2^-106
Sfc uses a counter to raise the minimum cycle length to 2^32/64
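The rule of thumb above can be checked with a short sketch. It takes the 22 to be log2 of the stream length rounded up (4 million < 2^22), which is an assumption about how the figure was obtained, and it treats Jsf32 as having 128 bits of state (four 32-bit words). The function name is hypothetical.

```rust
/// Returns the exponent e such that the chance of hitting a cycle shorter
/// than `stream_len` is roughly 2^e, following the rule quoted in the thread:
/// ~2^-(state_bits - log2(stream_len)).
/// ASSUMPTION: log2 is rounded up to a whole number of bits.
/// `stream_len` must be at least 1.
fn short_cycle_log2(state_bits: u32, stream_len: u64) -> i32 {
    // ceil(log2(stream_len)): the bit length of (stream_len - 1).
    let len_bits = (64 - (stream_len - 1).leading_zeros()) as i32;
    len_bits - state_bits as i32
}

fn main() {
    // Jsf32: four 32-bit words of state, stream length of about 4 million.
    let e = short_cycle_log2(128, 4_000_000);
    println!("Jsf32: ~2^{e}");
}
```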
gnzlbg commented on Nov 11, 2018
There is also: https://github.com/termoshtt/rust-sfmt