fix conflicts.

pangge · pangge · commit 7856530c515f · 2018-07-03T19:13:10.000+08:00
diff --git a/AUTHORS.md b/AUTHORS.md
@@ -0,0 +1,27 @@
+| Github account | name |
+|---|---|
+|   chenjiaoAngel   |   Jiao Chen   |
+|   cyj1986 |   Yujuan Cheng    |
+|   feifei14119 |   Fei Wang    |
+|   jackyh  |   Chengjie He |
+|   Jayoprell   |   Xiaocheng Luo   |
+|   jjsbear |   Jingsong Ji |
+|   LittleMaer  |   Yi Zhuang   |
+|   mengkai94   |   Kai Meng    |
+|   micytw  |   Michael Wu  |
+|   pangge  |   Chaowen Cui |
+|   perchbird   |   Xiaokun Yu  |
+|   PeterJkPeng |   Junyi Peng  |
+|   qq332982511 |   Junjie Liu  |
+|   Shixiaowei02    |   Xiaowei Shi |
+|   sogalin |   Soga Lin    |
+|   throneclay  |   Shuai Zhang |
+|   vin-huang   |   Vin Huang   |
+|   wgy0804 |   Guoya Wang  |
+|   xklnono |   Kailu Xu    |
+|   xyoungli    |   Xiaoyang Li |
+|   yanan1112   |   Yanan Liu   |
+|   yao-matrix  |   Weifeng Yao |
+|   zdcocnftcp10    |   Dachuan Zhao    |
+|   zhouhuan2009    |   Huan Zhou   |
+|   zoooooooyuan    |   Yuan Zu |
diff --git a/benchmark/README_CPU.md b/benchmark/README_CPU.md
@@ -4,6 +4,9 @@
 
 This time, we only provide benchmark on CPU. In the near future, we will add benchmark on ARM and GPU.
 
+> System: `CentOS 7 in Docker`, for benchmark between Anakin and Tensorflow  
+> System: `CentOS 6.3`, for benchmark between Anakin and Paddle
+
 ## Counterpart of anakin  :
 
 The counterpart of **`Anakin`** is `Tensorflow 1.8.0`, which installed by Anaconda 4.5.4, run by Python 3.6
@@ -202,55 +205,77 @@ We tested them on single-CPU with different thread numbers.
     4 | 18074 | 118696
     6 | 26607 | 102044
 
-2. **`Anakin`** VS **`PaddlePaddle/Fluid\`**
-
+2. **`Anakin`** VS **`PaddlePaddle/Fluid`**  
+We use private dataset and different QPS index in this benchmark.
 ### <span id = '1'>language model in E5-2650 v4 </span>
 
 - Latency (`ms`) of one batch
 
     ThreadNum | Fluid | Anakin
     :---: | :---: | :---: |
-    1 | 42.09    | 1.90
-    2 | 42.14    | 2.16
-    6 | 42.15   | 4.21
-    10 | 42.14   | 9.26
-    12 | 42.34   | 11.17
+    1 | 42.7418    | 1.93589
+    2 | 42.7418    | 2.49537
+    6 | 42.7734   | 3.14332
+    10 | 43.0721   | 4.55329
+    12 | 42.8501   | 5.09893
 
 - Throughput (`sentence/s`)
 
     ThreadNum | Fluid | Anakin
     :---: | :---: | :---: |
-    1 | 23 | 524
-    2 | 47 | 916
-    6 | 141 | 1402
-    10 | 236   | 1063
-    12 | 282   | 1044
+    1 | 23 | 504
+    2 | 46 | 762
+    6 | 134 | 1393
+    10 | 218   | 1556
+    12 | 260   | 1541
 
 ### <span id = '2'>Chinese_ner model in E5-2650 v4 </span>
 
 - Latency (`ms`) of one batch
 
     ThreadNum | Fluid | Anakin
     :---: | :---: | :---: |
-    1 | 0.47    | 0.17
-    4 | 0.26    | 0.17
-    6 | 0.36    | 0.17
-    10 | 0.59   | 0.17
-    12 | 0.72   | 0.17
+    1 | 0.380475    | 0.17034
+    4 | 0.380475    | 0.171143
+    6 | 0.380475    | 0.172688
+    10 | 0.380475   | 0.173269
+    12 | 0.380475   | 0.17668
+
+- Throughput (`sentence/s`)
+    
+    ThreadNum | Fluid | Anakin
+    :---: | :---: | :---: |
+    1 | 7844  | 5822
+    4 | 7844  | 11377
+    6 | 7844  | 29725
+    10 | 7844 | 41238
+    12 | 7844  | 42790
+
+### <span id = '3'>text_classfication model in E5-2650 v4 </span>
+
+- Latency (`ms`) of one batch
+
+    ThreadNum | Fluid | Anakin
+    :---: | :---: | :---: |
+    1 | 1.48578    | 1.10088
+    4 | 1.54025    | 1.11258
+    6 | 1.68529    | 1.1257
+    10 | 1.9817    | 1.13267
+    12 | 2.21864   | 1.1429
 
 - Throughput (`sentence/s`)
     
     ThreadNum | Fluid | Anakin
     :---: | :---: | :---: |
-    1 | 2129  | 5819
-    4 | 3866  | 11182
-    6 | 8095  | 30948
-    10 | 8250 | 44093
-    12 | 8112  | 47185
+    1 | 673  | 901
+    4 | 1289  | 1665
+    6 | 3458  | 4449
+    10 | 4875 | 6183
+    12 | 5265 | 6188
 
 ## How to run those Benchmark models?
 
-> 1. You can just run `sh benchmark_tensorflow.sh` and  `sh benchmark_anakin.sh`
-> 2. Get the model of caffe or fluid, convert model to anakin model, use net_test_*** to test your model.
+> 1. You can just run `sh benchmark_tensorflow.sh` and  `sh benchmark_anakin.sh`  
+> 2. Get the model of caffe or fluid, convert model to anakin model, use net_test_*** to test your model.  
 
 
diff --git a/benchmark/README_GPU.md b/benchmark/README_GPU.md
@@ -35,21 +35,21 @@ We tested them on single-GPU with single-thread.
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: |
-    1 | 8.8690 | 8.2815
-    2 | 15.5344 | 13.9116
-    4 | 26.6000 | 21.8747 
-    8 | 49.8279 | 40.4076 
-    32 | 188.6270 | 163.7660 
+    1 | 8.85176 | 8.15362
+    2 | 15.6517 | 13.8716
+    4 | 26.5303 | 21.8478 
+    8 | 48.2286 | 40.496 
+    32 | 183.994 | 163.035 
 
 - GPU Memory Used (`MB`)
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 963 | 997
-    2 | 965 | 1039
-    4 | 991 | 1115
-    8 | 1067 | 1269
-    32 | 1715 | 2193
+    1 | 887 | 648 
+    2 | 965 | 733 
+    4 | 991 | 810 
+    8 | 1067 | 911 
+    32 | 1715 | 1325 
 
     
 ### <span id = '2'>Yolo </span>  
@@ -58,100 +58,100 @@ We tested them on single-GPU with single-thread.
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 16.4596| 15.2124
-    2 | 26.6347| 25.0442 
-    4 | 43.3695| 43.5017
-    8 | 80.9139 | 80.9880
-    32 | 293.8080| 310.8810
+    1 | 16.4623| 15.3214
+    2 | 26.7082| 25.0305 
+    4 | 43.2129| 43.4758
+    8 | 80.0053 | 80.7645
+    32 | 283.352| 311.152
 
 - GPU Memory Used (`MB`)
 
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 1569 | 1775
-    2 | 1649 | 1815
-    4 | 1709 | 1887
-    8 | 1731 | 2031
-    32 | 2253 | 2907
+    1 | 1226  | 1192 
+    2 | 1326  | 1269 
+    4 | 1435  | 1356 
+    8 | 1563  | 1434 
+    32 | 2150  | 1633 
 
 ### <span id = '3'> Resnet50 </span> 
 
 - Latency (`ms`) of different batch  
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 4.2459   |  4.1061 
-    2 |  6.2627  |  6.5159 
-    4 | 10.1277  | 11.3327
-    8 | 17.8209 |   20.6680 
-    32 | 65.8582 | 77.8858
+    1 | 4.26834   |  3.25853 
+    2 |  6.2811  |  6.12156 
+    4 | 10.1183  | 10.9219
+    8 | 18.1395 |   20.323 
+    32 | 66.4728 | 83.9934
 
 - GPU Memory Used (`MB`)
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 531  | 503
-    2 | 543  | 517
-    4 | 583 | 541
-    8 | 611 | 589
-    32 |  809 | 879
+    1 | 932 | 272 
+    2 | 936   | 318 
+    4 | 720  | 376 
+    8 | 697  | 480 
+    32 |  842  | 835 
 
 ### <span id = '4'> Resnet101 </span> 
 
 - Latency (`ms`) of different batch  
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 7.5562 | 7.0837  
-    2 | 11.6023 | 11.4079
-    4 | 18.3650 | 20.0493 
-    8 | 32.7632 | 36.0648
-    32 | 123.2550 | 135.4880
+    1 | 7.58234 | 5.66457  
+    2 | 11.6014 | 10.9213
+    4 | 18.3298 | 19.3987 
+    8 | 32.6523 | 37.5575
+    32 | 123.114 | 149.089
 
 - GPU Memory Used (`MB)`
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 701  | 683
-    2 | 713  | 697
-    4 | 793 | 721
-    8 | 819 | 769
-    32 | 1043 | 1059
+    1 | 1020   | 420 
+    2 | 961   | 467 
+    4 | 943  | 503 
+    8 | 885  | 606 
+    32 | 1048  | 1077 
 
 ###  <span id = '5'> MobileNet V1 </span> 
 
 - Latency (`ms`) of different batch  
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 45.5156  |  1.3947
-    2 |  46.5585  |  2.5483
-    4 | 48.4242  | 4.3404
-    8 |  52.7957 |  8.1513
-    32 | 83.2519 | 31.3178
+    1 | 45.2189  |  1.39566
+    2 |  46.4538  |  2.50698
+    4 | 47.8918  | 4.38727
+    8 |  52.3636 |  8.21416
+    32 | 83.0503 | 31.33
 
 - GPU Memory Used (`MB`)
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 329  | 283
-    2 | 345   | 289
-    4 | 371 | 299
-    8 | 393 | 319
-    32 |  531 | 433
+    1 | 516   | 176 
+    2 | 524    | 166 
+    4 | 497  | 165 
+    8 | 508  | 239 
+    32 |  628  | 388 
 
 ###  <span id = '6'> MobileNet V2</span> 
 
 - Latency (`ms`) of different batch  
 
     BatchSize | TensorRT | Anakin
     :---: | :---: | :---: | 
-    1 | 65.6861 | 2.9842
-    2 | 66.6814 | 4.7472
-    4 | 69.7114 | 7.4163
-    8 | 76.1092 | 12.8779
-    32 | 124.9810 | 47.2142
+    1 | 65.4277 | 1.80542
+    2 | 66.2048 | 3.85568
+    4 | 68.8045 | 6.80921
+    8 | 75.64 | 12.6038
+    32 | 124.09 | 47.6079
 
 - GPU Memory Used (`MB`)
 
diff --git a/saber/funcs/impl/cuda/base/cuda_c/saber_scale.cu b/saber/funcs/impl/cuda/base/cuda_c/saber_scale.cu
@@ -47,6 +47,10 @@ SaberStatus SaberScale<NV, AK_FLOAT, AK_FLOAT, AK_FLOAT, \
     auto in_data = inputs[0]->data();
     auto out_data = outputs[0]->mutable_data();
     const int count = inputs[0]->valid_size();
+    if (inputs.size() > 1) {
+        _scale_dim = inputs[1]->valid_size();
+        _inner_dim = count / _scale_dim;
+    }
     if (_scale_dim > 1 || inputs.size() > 1) {
         auto scale_data = inputs.size() > 1 ? inputs[1]->data() : _weight.data();
         auto bias_data = param.bias_term ? _bias.data() : NULL;
diff --git a/test/framework/graph/graph_parser_from_model_test.cpp b/test/framework/graph/graph_parser_from_model_test.cpp
@@ -7,8 +7,7 @@
 using namespace anakin;
 using namespace anakin::graph;
 
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/google_net/googlenet.anakin.bin";
-std::string model_path = "/home/chaowen/anakin_v2/model_v2/yolo/yolo.anakin.bin";
+std::string model_path = "/path/to/name.anakin.bin";
 
 
 TEST(GraphTest, graph_load_model) {
diff --git a/test/framework/net/net_exec_multi_thread_test.cpp b/test/framework/net/net_exec_multi_thread_test.cpp
@@ -13,7 +13,7 @@ using Target_H = X86;
 using Target = ARM;
 using Target_H = ARM;
 #endif
-std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_lane_v2.anakin.bin";
+std::string model_path = "../benchmark/CNN/models/vgg16.anakin.bin";
 
 #ifdef USE_CUDA
 #if 0
diff --git a/test/framework/net/net_exec_test.cpp b/test/framework/net/net_exec_test.cpp
@@ -16,42 +16,8 @@ using Target_H = ARM;
 
 //#define USE_DIEPSE
 
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/diepsie_light_head.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/diepsie_light_head_base.anakin.bin";
-
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/densebox.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/cnn_seg.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_camera_detector.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/yolo_lane_v2.anakin.bin";
-
-// alignment of face
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/net_deploy_stageI.anakin.bin";
-
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/net_deploy_stageII.anakin.bin";
-
-// residual 7 patch of face
-//std::string model_path = "/home/chaowen/anakin_v2/model_v2/anakin-models/adu/anakin_models/diepsie_light_head/residual_net_7patch_3hc.anakin.bin";
-
-// resnet 50
-// std::string model_path = "/home/cuichaowen/anakin2/anakin2/benchmark/CNN/mobilenet_v2.anakin.bin";
-
 // vgg16
-// std::string model_path = "/home/cuichaowen/anakin2/anakin2/benchmark/CNN/models/vgg16.anakin.bin";
-
-// resnet 101
-// std::string model_path = "/home/cuichaowen/parsing/external_converter_v2/output/ResNet-101.anakin.bin";
-
-// animal
-// std::string model_path = "/home/cuichaowen/parsing/external_converter_v2/output/animal.anakin.bin";
-
-//std::string model_path = "/home/cuichaowen/github_anakin/icode_model/anakin-models/vis/anakin-models/mainbody/mainbody.anakin2.bin";
-
-std::string model_path = "/home/cuichaowen/github_anakin/icode_model/anakin-models/vis/anakin-models/MobileNetSSD/mobilenet-ssd_fluid.anakin2.bin";
+std::string model_path = "../benchmark/CNN/models/vgg16.anakin.bin";
 
 #ifdef USE_CUDA
 #if 1
diff --git a/test/framework/net/net_exec_test_sequence_labeling.cpp b/test/framework/net/net_exec_test_sequence_labeling.cpp