This repository was archived by the owner on Jan 24, 2024. It is now read-only.

Commit 7856530

fix conflicts.
2 parents f7383ee + 275c28e commit 7856530

8 files changed, with 139 additions and 118 deletions.

AUTHORS.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+| Github account | name |
+|---|---|
+| chenjiaoAngel | Jiao Chen |
+| cyj1986 | Yujuan Cheng |
+| feifei14119 | Fei Wang |
+| jackyh | Chengjie He |
+| Jayoprell | Xiaocheng Luo |
+| jjsbear | Jingsong Ji |
+| LittleMaer | Yi Zhuang |
+| mengkai94 | Kai Meng |
+| micytw | Michael Wu |
+| pangge | Chaowen Cui |
+| perchbird | Xiaokun Yu |
+| PeterJkPeng | Junyi Peng |
+| qq332982511 | Junjie Liu |
+| Shixiaowei02 | Xiaowei Shi |
+| sogalin | Soga Lin |
+| throneclay | Shuai Zhang |
+| vin-huang | Vin Huang |
+| wgy0804 | Guoya Wang |
+| xklnono | Kailu Xu |
+| xyoungli | Xiaoyang Li |
+| yanan1112 | Yanan Liu |
+| yao-matrix | Weifeng Yao |
+| zdcocnftcp10 | Dachuan Zhao |
+| zhouhuan2009 | Huan Zhou |
+| zoooooooyuan | Yuan Zu |

benchmark/README_CPU.md

Lines changed: 49 additions & 24 deletions
@@ -4,6 +4,9 @@
 
 This time, we only provide benchmark on CPU. In the near future, we will add benchmark on ARM and GPU.
 
+> System: `CentOS 7 in Docker`, for benchmark between Anakin and Tensorflow
+> System: `CentOS 6.3`, for benchmark between Anakin and Paddle
+
 ## Counterpart of anakin :
 
 The counterpart of **`Anakin`** is `Tensorflow 1.8.0`, which installed by Anaconda 4.5.4, run by Python 3.6
@@ -202,55 +205,77 @@ We tested them on single-CPU with different thread numbers.
 4 | 18074 | 118696
 6 | 26607 | 102044
 
-2. **`Anakin`** VS **`PaddlePaddle/Fluid\`**
-
+2. **`Anakin`** VS **`PaddlePaddle/Fluid`**
+We use private dataset and different QPS index in this benchmark.
 ### <span id = '1'>language model in E5-2650 v4 </span>
 
 - Latency (`ms`) of one batch
 
 ThreadNum | Fluid | Anakin
 :---: | :---: | :---: |
-1 | 42.09 | 1.90
-2 | 42.14 | 2.16
-6 | 42.15 | 4.21
-10 | 42.14 | 9.26
-12 | 42.34 | 11.17
+1 | 42.7418 | 1.93589
+2 | 42.7418 | 2.49537
+6 | 42.7734 | 3.14332
+10 | 43.0721 | 4.55329
+12 | 42.8501 | 5.09893
 
 - Throughput (`sentence/s`)
 
 ThreadNum | Fluid | Anakin
 :---: | :---: | :---: |
-1 | 23 | 524
-2 | 47 | 916
-6 | 141 | 1402
-10 | 236 | 1063
-12 | 282 | 1044
+1 | 23 | 504
+2 | 46 | 762
+6 | 134 | 1393
+10 | 218 | 1556
+12 | 260 | 1541
 
 ### <span id = '2'>Chinese_ner model in E5-2650 v4 </span>
 
 - Latency (`ms`) of one batch
 
 ThreadNum | Fluid | Anakin
 :---: | :---: | :---: |
-1 | 0.47 | 0.17
-4 | 0.26 | 0.17
-6 | 0.36 | 0.17
-10 | 0.59 | 0.17
-12 | 0.72 | 0.17
+1 | 0.380475 | 0.17034
+4 | 0.380475 | 0.171143
+6 | 0.380475 | 0.172688
+10 | 0.380475 | 0.173269
+12 | 0.380475 | 0.17668
+
+- Throughput (`sentence/s`)
+
+ThreadNum | Fluid | Anakin
+:---: | :---: | :---: |
+1 | 7844 | 5822
+4 | 7844 | 11377
+6 | 7844 | 29725
+10 | 7844 | 41238
+12 | 7844 | 42790
+
+### <span id = '3'>text_classfication model in E5-2650 v4 </span>
+
+- Latency (`ms`) of one batch
+
+ThreadNum | Fluid | Anakin
+:---: | :---: | :---: |
+1 | 1.48578 | 1.10088
+4 | 1.54025 | 1.11258
+6 | 1.68529 | 1.1257
+10 | 1.9817 | 1.13267
+12 | 2.21864 | 1.1429
 
 - Throughput (`sentence/s`)
 
 ThreadNum | Fluid | Anakin
 :---: | :---: | :---: |
-1 | 2129 | 5819
-4 | 3866 | 11182
-6 | 8095 | 30948
-10 | 8250 | 44093
-12 | 8112 | 47185
+1 | 673 | 901
+4 | 1289 | 1665
+6 | 3458 | 4449
+10 | 4875 | 6183
+12 | 5265 | 6188
 
 ## How to run those Benchmark models?
 
-> 1. You can just run `sh benchmark_tensorflow.sh` and `sh benchmark_anakin.sh`
-> 2. Get the model of caffe or fluid, convert model to anakin model, use net_test_*** to test your model.
+> 1. You can just run `sh benchmark_tensorflow.sh` and `sh benchmark_anakin.sh`
+> 2. Get the model of caffe or fluid, convert model to anakin model, use net_test_*** to test your model.
 
 
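
The latency tables in this README put Fluid and Anakin side by side for each thread count. One way to read them is as a speedup ratio (Fluid latency divided by Anakin latency). The sketch below is only an illustration of that arithmetic: `LatencyRow`, `speedup`, and `language_model_rows` are hypothetical names that do not exist in Anakin, and the numbers are copied from the updated language model latency table.

```cpp
#include <cassert>
#include <vector>

// One row of a latency table: thread count plus both latencies in milliseconds.
struct LatencyRow {
    int threads;
    double fluid_ms;
    double anakin_ms;
};

// Speedup of Anakin over Fluid for one row; values above 1.0 mean Anakin is faster.
double speedup(const LatencyRow& row) {
    assert(row.anakin_ms > 0.0);  // guard against division by zero
    return row.fluid_ms / row.anakin_ms;
}

// Rows copied from the updated "language model in E5-2650 v4" latency table.
std::vector<LatencyRow> language_model_rows() {
    return {
        {1, 42.7418, 1.93589},
        {2, 42.7418, 2.49537},
        {6, 42.7734, 3.14332},
        {10, 43.0721, 4.55329},
        {12, 42.8501, 5.09893},
    };
}
```

Calling `speedup` on each element of `language_model_rows()` gives one ratio per thread count, which makes it easier to see how the gap changes as threads are added.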

benchmark/README_GPU.md

Lines changed: 55 additions & 55 deletions
@@ -35,21 +35,21 @@ We tested them on single-GPU with single-thread.
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 8.8690 | 8.2815
-2 | 15.5344 | 13.9116
-4 | 26.6000 | 21.8747
-8 | 49.8279 | 40.4076
-32 | 188.6270 | 163.7660
+1 | 8.85176 | 8.15362
+2 | 15.6517 | 13.8716
+4 | 26.5303 | 21.8478
+8 | 48.2286 | 40.496
+32 | 183.994 | 163.035
 
 - GPU Memory Used (`MB`)
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 963 | 997
-2 | 965 | 1039
-4 | 991 | 1115
-8 | 1067 | 1269
-32 | 1715 | 2193
+1 | 887 | 648
+2 | 965 | 733
+4 | 991 | 810
+8 | 1067 | 911
+32 | 1715 | 1325
 
 
 ### <span id = '2'>Yolo </span>
@@ -58,100 +58,100 @@ We tested them on single-GPU with single-thread.
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 16.4596| 15.2124
-2 | 26.6347| 25.0442
-4 | 43.3695| 43.5017
-8 | 80.9139 | 80.9880
-32 | 293.8080| 310.8810
+1 | 16.4623| 15.3214
+2 | 26.7082| 25.0305
+4 | 43.2129| 43.4758
+8 | 80.0053 | 80.7645
+32 | 283.352| 311.152
 
 - GPU Memory Used (`MB`)
 
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 1569 | 1775
-2 | 1649 | 1815
-4 | 1709 | 1887
-8 | 1731 | 2031
-32 | 2253 | 2907
+1 | 1226 | 1192
+2 | 1326 | 1269
+4 | 1435 | 1356
+8 | 1563 | 1434
+32 | 2150 | 1633
 
 ### <span id = '3'> Resnet50 </span>
 
 - Latency (`ms`) of different batch
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 4.2459 | 4.1061
-2 | 6.2627 | 6.5159
-4 | 10.1277 | 11.3327
-8 | 17.8209 | 20.6680
-32 | 65.8582 | 77.8858
+1 | 4.26834 | 3.25853
+2 | 6.2811 | 6.12156
+4 | 10.1183 | 10.9219
+8 | 18.1395 | 20.323
+32 | 66.4728 | 83.9934
 
 - GPU Memory Used (`MB`)
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 531 | 503
-2 | 543 | 517
-4 | 583 | 541
-8 | 611 | 589
-32 | 809 | 879
+1 | 932 | 272
+2 | 936 | 318
+4 | 720 | 376
+8 | 697 | 480
+32 | 842 | 835
 
 ### <span id = '4'> Resnet101 </span>
 
 - Latency (`ms`) of different batch
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 7.5562 | 7.0837
-2 | 11.6023 | 11.4079
-4 | 18.3650 | 20.0493
-8 | 32.7632 | 36.0648
-32 | 123.2550 | 135.4880
+1 | 7.58234 | 5.66457
+2 | 11.6014 | 10.9213
+4 | 18.3298 | 19.3987
+8 | 32.6523 | 37.5575
+32 | 123.114 | 149.089
 
 - GPU Memory Used (`MB)`
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 701 | 683
-2 | 713 | 697
-4 | 793 | 721
-8 | 819 | 769
-32 | 1043 | 1059
+1 | 1020 | 420
+2 | 961 | 467
+4 | 943 | 503
+8 | 885 | 606
+32 | 1048 | 1077
 
 ### <span id = '5'> MobileNet V1 </span>
 
 - Latency (`ms`) of different batch
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 45.5156 | 1.3947
-2 | 46.5585 | 2.5483
-4 | 48.4242 | 4.3404
-8 | 52.7957 | 8.1513
-32 | 83.2519 | 31.3178
+1 | 45.2189 | 1.39566
+2 | 46.4538 | 2.50698
+4 | 47.8918 | 4.38727
+8 | 52.3636 | 8.21416
+32 | 83.0503 | 31.33
 
 - GPU Memory Used (`MB`)
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 329 | 283
-2 | 345 | 289
-4 | 371 | 299
-8 | 393 | 319
-32 | 531 | 433
+1 | 516 | 176
+2 | 524 | 166
+4 | 497 | 165
+8 | 508 | 239
+32 | 628 | 388
 
 ### <span id = '6'> MobileNet V2</span>
 
 - Latency (`ms`) of different batch
 
 BatchSize | TensorRT | Anakin
 :---: | :---: | :---: |
-1 | 65.6861 | 2.9842
-2 | 66.6814 | 4.7472
-4 | 69.7114 | 7.4163
-8 | 76.1092 | 12.8779
-32 | 124.9810 | 47.2142
+1 | 65.4277 | 1.80542
+2 | 66.2048 | 3.85568
+4 | 68.8045 | 6.80921
+8 | 75.64 | 12.6038
+32 | 124.09 | 47.6079
 
 - GPU Memory Used (`MB`)
 

saber/funcs/impl/cuda/base/cuda_c/saber_scale.cu

Lines changed: 4 additions & 0 deletions
@@ -47,6 +47,10 @@ SaberStatus SaberScale<NV, AK_FLOAT, AK_FLOAT, AK_FLOAT, \
     auto in_data = inputs[0]->data();
     auto out_data = outputs[0]->mutable_data();
     const int count = inputs[0]->valid_size();
+    if (inputs.size() > 1) {
+        _scale_dim = inputs[1]->valid_size();
+        _inner_dim = count / _scale_dim;
+    }
     if (_scale_dim > 1 || inputs.size() > 1) {
         auto scale_data = inputs.size() > 1 ? inputs[1]->data() : _weight.data();
         auto bias_data = param.bias_term ? _bias.data() : NULL;
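
The added lines recompute `_scale_dim` and `_inner_dim` whenever the scale factors arrive as a second input tensor. The CUDA kernel that consumes them is not part of this diff, so the CPU sketch below only illustrates why both values are needed. It assumes Caffe-style indexing, where element `i` uses scale entry `(i / inner_dim) % scale_dim`. The function name `scale_forward` and that indexing rule are assumptions, not the Saber implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a scale layer (not the Saber CUDA kernel).
// Assumes Caffe-style indexing: element i uses scale[(i / inner_dim) % scale_dim].
// An empty bias vector means "no bias term".
std::vector<float> scale_forward(const std::vector<float>& in,
                                 const std::vector<float>& scale,
                                 const std::vector<float>& bias,
                                 int scale_dim, int inner_dim) {
    assert(scale_dim > 0 && inner_dim > 0);
    assert(scale.size() >= static_cast<std::size_t>(scale_dim));
    assert(bias.empty() || bias.size() >= static_cast<std::size_t>(scale_dim));

    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Map the flat index to the scale entry that covers it.
        const std::size_t s = (i / static_cast<std::size_t>(inner_dim)) %
                              static_cast<std::size_t>(scale_dim);
        const float b = bias.empty() ? 0.0f : bias[s];
        out[i] = in[i] * scale[s] + b;
    }
    return out;
}
```

With `count` input elements and `scale_dim` scale entries taken from the second input, `inner_dim = count / scale_dim` mirrors the computation the commit adds. If these values were left at whatever the weight-based setup produced, a second-input scale of a different size would be indexed with the wrong stride under this scheme.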

test/framework/graph/graph_parser_from_model_test.cpp

Lines changed: 1 addition & 2 deletions
@@ -7,8 +7,7 @@
 using namespace anakin;
 using namespace anakin::graph;
 
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/google_net/googlenet.anakin.bin";
-std::string model_path = "/home/chaowen/anakin_v2/model_v2/yolo/yolo.anakin.bin";
+std::string model_path = "/path/to/name.anakin.bin";
 
 
 TEST(GraphTest, graph_load_model) {
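
This hunk swaps a developer-specific absolute path for a placeholder, so each user still has to edit the source before running the test. One possible alternative, sketched here and not part of this commit, is to read the path from an environment variable and fall back to the placeholder. The helper `resolve_model_path` and the variable name `ANAKIN_MODEL_PATH` are hypothetical; Anakin does not define either. Only `std::getenv` from the C++ standard library is relied on.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the environment variable env_name
// if it is set and non-empty, otherwise returns fallback.
std::string resolve_model_path(const char* env_name, const std::string& fallback) {
    assert(env_name != nullptr);
    const char* value = std::getenv(env_name);
    if (value != nullptr && value[0] != '\0') {
        return std::string(value);
    }
    return fallback;
}
```

A test file could then declare `std::string model_path = resolve_model_path("ANAKIN_MODEL_PATH", "/path/to/name.anakin.bin");`, again treating the variable name as an illustrative choice.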

test/framework/net/net_exec_multi_thread_test.cpp

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ using Target_H = X86;
 using Target = ARM;
 using Target_H = ARM;
 #endif
-std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_lane_v2.anakin.bin";
+std::string model_path = "../benchmark/CNN/models/vgg16.anakin.bin";
 
 #ifdef USE_CUDA
 #if 0

test/framework/net/net_exec_test.cpp

Lines changed: 1 addition & 35 deletions
@@ -16,42 +16,8 @@ using Target_H = ARM;
 
 //#define USE_DIEPSE
 
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/diepsie_light_head.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/diepsie_light_head_base.anakin.bin";
-
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/densebox.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/cnn_seg.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_camera_detector.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_lane_v2.anakin.bin";
-
-// alignment of face
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/net_deploy_stageI.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/net_deploy_stageII.anakin.bin";
-
-// residual 7 patch of face
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/residual_net_7patch_3hc.anakin.bin";
-
-// resnet 50
-// std::string model_path = "/home/cuichaowen/anakin2/anakin2/benchmark/CNN/mobilenet_v2.anakin.bin";
-
 // vgg16
-// std::string model_path = "/home/cuichaowen/anakin2/anakin2/benchmark/CNN/models/vgg16.anakin.bin";
-
-// resnet 101
-// std::string model_path = "/home/cuichaowen/parsing/external_converter_v2/output/ResNet-101.anakin.bin";
-
-// animal
-// std::string model_path = "/home/cuichaowen/parsing/external_converter_v2/output/animal.anakin.bin";
-
-//std::string model_path = "/home/cuichaowen/github_anakin/icode_model/anakin-models/vis/anakin-models/mainbody/mainbody.anakin2.bin";
-
-std::string model_path = "/home/cuichaowen/github_anakin/icode_model/anakin-models/vis/anakin-models/MobileNetSSD/mobilenet-ssd_fluid.anakin2.bin";
+std::string model_path = "../benchmark/CNN/models/vgg16.anakin.bin";
 
 #ifdef USE_CUDA
 #if 1
