This repository was archived by the owner on Nov 27, 2022. It is now read-only.
User code vs benchmark monstrosities #9
Open
Description
I was playing with frag iteration and I made 3 versions:
- V1 is a simple loop
time: [407.12 ns 409.32 ns 411.81 ns]
V1
self.0.run(|mut data: ViewMut<Data>| {
    (&mut data).iter().for_each(|data| {
        data.0 *= 2.0;
    })
});
- V2 helps the compiler auto-vectorize
time: [165.88 ns 166.45 ns 167.05 ns]
V2
self.0.run(|mut data: ViewMut<Data>| {
    (&mut data)
        .iter()
        .into_chunk_exact(4)
        .unwrap_or_else(|_| panic!())
        .for_each(|chunk| {
            chunk[0].0 *= 2.0;
            chunk[1].0 *= 2.0;
            chunk[2].0 *= 2.0;
            chunk[3].0 *= 2.0;
        })
});
- V3 uses explicit SSE intrinsics
time: [127.37 ns 129.08 ns 131.26 ns]
V3
use core::arch::x86_64::*;
unsafe {
    let delta = _mm_set1_ps(2.0);
    self.0.run(|mut data: ViewMut<Data>| {
        (&mut data)
            .iter()
            .into_chunk_exact(4)
            .unwrap_or_else(|_| panic!())
            .for_each(|chunk| {
                let simd_chunk = _mm_loadu_ps(chunk as *const _ as *const _);
                // `_mm_mul_ps` returns the product rather than updating its
                // argument, so the returned value is what gets stored.
                let doubled = _mm_mul_ps(simd_chunk, delta);
                _mm_storeu_ps(chunk as *mut _ as *mut _, doubled);
            })
    });
}
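For anyone who wants to compare the three styles outside the ECS, here is a minimal standalone sketch on a plain slice. It is not the benchmark code: `Data` is assumed to be a `#[repr(transparent)]` newtype over `f32`, the function names `double_scalar`, `double_chunked` and `double_sse` are made up for illustration, and the standard library's `chunks_exact_mut` stands in for the chunked iterator used above. Unlike the snippets above, the tail that does not fill a chunk of 4 is handled instead of panicking.

```rust
/// Hypothetical stand-in for the `Data` component (assumed layout: one f32).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Data(pub f32);

/// Simple-loop style: one element at a time.
pub fn double_scalar(data: &mut [Data]) {
    data.iter_mut().for_each(|d| d.0 *= 2.0);
}

/// Chunked style: fixed groups of 4 so the optimizer can see the lanes.
pub fn double_chunked(data: &mut [Data]) {
    let mut chunks = data.chunks_exact_mut(4);
    for chunk in &mut chunks {
        chunk[0].0 *= 2.0;
        chunk[1].0 *= 2.0;
        chunk[2].0 *= 2.0;
        chunk[3].0 *= 2.0;
    }
    // Elements left over when the length is not a multiple of 4.
    for d in chunks.into_remainder() {
        d.0 *= 2.0;
    }
}

/// Explicit-intrinsics style: load 4 floats, multiply, store the product.
#[cfg(target_arch = "x86_64")]
pub fn double_sse(data: &mut [Data]) {
    use core::arch::x86_64::{_mm_loadu_ps, _mm_mul_ps, _mm_set1_ps, _mm_storeu_ps};

    let mut chunks = data.chunks_exact_mut(4);
    // SAFETY: SSE is part of the x86_64 baseline, every chunk holds exactly
    // four contiguous `Data` values, and `Data` is `repr(transparent)` over
    // `f32`, so the pointer is valid for an unaligned 4-lane load and store.
    unsafe {
        let factor = _mm_set1_ps(2.0);
        for chunk in &mut chunks {
            let ptr = chunk.as_mut_ptr() as *mut f32;
            let lanes = _mm_loadu_ps(ptr);
            let doubled = _mm_mul_ps(lanes, factor);
            _mm_storeu_ps(ptr, doubled);
        }
    }
    // Scalar fallback for the tail.
    for d in chunks.into_remainder() {
        d.0 *= 2.0;
    }
}
```

All three functions should produce identical results on the same input, which makes it easy to check that an optimized variant still does the work the simple loop does before comparing timings.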
V2 and V3 are unlikely to be written by many users (if any), and the absolute times are tiny either way.
My question is: should the benchmarks only use code that users would actually write, optimize as much as possible, or land somewhere in between?