CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#…
#2127
| Job | Run time |
|---|---|
| | 3m 9s |
| | 16m 6s |
| | 1h 8m 15s |
| | 3m 5s |
| | 1m 46s |
| | 16m 11s |
| | 14m 24s |
| | 4m 3s |
| | 3m 2s |
| | 5m 43s |
| | 2m 1s |
| | 2m 24s |
| | 2m 0s |
| | 5m 11s |
| | 11m 25s |
| | 5m 48s |
| | 4m 23s |
| | 3m 54s |
| | 7m 1s |
| | 3m 6s |
| | 1m 13s |
| | 19m 23s |
| | 2m 7s |
| | 12m 32s |
| | 4m 35s |
| | 13m 12s |
| | 1m 23s |
| | 6m 54s |
| | 12m 51s |
| | 3m 28s |
| | 20m 52s |
| | 2m 12s |
| | 9m 51s |
| | 3m 56s |
| | 3m 32s |
| | 9m 14s |
| | 4m 44s |
| | 1m 35s |
| | 5m 46s |
| | 6m 32s |
| | 9m 1s |
| | 2m 2s |
| Total | 5h 39m 52s |