Performance Data

For how to run the benchmarks, refer to benchmark_tools; the one-click benchmark is recommended.

ARM test environment

  • Test models
    • fp32 models
      • mobilenet_v1
      • mobilenet_v2
      • squeezenet_v1.1
      • mnasnet
      • shufflenet_v2
    • int8 models
      • mobilenet_v1
      • mobilenet_v2
  • Test devices (android ndk ndk-r17c)
    • Snapdragon 855
      • xiaomi mi9, snapdragon 855 (sdot instruction enabled)
      • 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz
    • Snapdragon 845
      • xiaomi mi8, snapdragon 845
      • 2.8GHz (four big cores), 1.7GHz (four little cores)
    • Snapdragon 835
      • xiaomi mix2, snapdragon 835
      • 2.45GHz (four big cores), 1.9GHz (four little cores)
    • Kirin 970
      • HUAWEI Mate10
  • Test notes
    • branch: release/v2.6.0
    • warmup=10, repeats=30; the reported value is the average time, in ms
    • With 1 thread, DeviceInfo::Global().SetRunMode is set to LITE_POWER_HIGH; otherwise it is set to LITE_POWER_NO_BIND
    • The model input has dimensions {1, 3, 224, 224}, and every element of the input is 1
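
The measurement protocol in the test notes can be sketched as follows. This is a minimal illustration and not the benchmark tool's code: `select_power_mode`, `make_input`, and `benchmark` are hypothetical helper names, the power modes appear only as string labels, and the timed workload is a stand-in for a real model run.

```python
import time


def select_power_mode(threads: int) -> str:
    """Mirror the rule in the test notes; the returned strings are labels only."""
    return "LITE_POWER_HIGH" if threads == 1 else "LITE_POWER_NO_BIND"


def make_input(shape=(1, 3, 224, 224), value=1.0):
    """Build a flat input buffer in which every element equals `value`."""
    count = 1
    for dim in shape:
        count *= dim
    return [value] * count


def benchmark(run_once, warmup=10, repeats=30):
    """Run `warmup` untimed calls, then return the average of `repeats` timed calls in ms."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / repeats


# Stand-in workload: summing the input replaces the real inference call.
data = make_input()
avg_ms = benchmark(lambda: sum(data))
print(select_power_mode(1), f"{avg_ms:.3f} ms")
```

The warmup runs are excluded from the average so that one-time costs such as cache warm-up do not skew the reported latency.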

ARM test data

fp32 model test data

paddlepaddle model

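| Snapdragon 855 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 35.11 | 20.67 | 11.83 | 30.56 | 18.59 | 10.44 |
| mobilenet_v2 | 26.36 | 15.83 | 9.29 | 21.64 | 13.25 | 7.95 |
| shufflenet_v2 | 4.56 | 3.14 | 2.35 | 4.07 | 2.89 | 2.28 |
| squeezenet_v1.1 | 21.27 | 13.55 | 8.49 | 18.05 | 11.51 | 7.83 |
| mnasnet | 21.40 | 13.18 | 7.63 | 18.84 | 11.40 | 6.80 |

| Snapdragon 845 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 65.56 | 37.17 | 19.65 | 63.23 | 32.98 | 17.68 |
| mobilenet_v2 | 45.89 | 25.20 | 14.39 | 41.03 | 22.94 | 12.98 |
| shufflenet_v2 | 7.31 | 4.66 | 3.27 | 7.08 | 4.71 | 3.41 |
| squeezenet_v1.1 | 36.98 | 22.53 | 13.45 | 34.27 | 20.96 | 12.60 |
| mnasnet | 39.85 | 23.64 | 12.25 | 37.81 | 20.70 | 11.81 |

| Snapdragon 835 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 92.77 | 51.56 | 30.14 | 87.46 | 48.02 | 26.42 |
| mobilenet_v2 | 65.78 | 36.52 | 22.34 | 58.31 | 33.04 | 19.87 |
| shufflenet_v2 | 10.39 | 6.26 | 4.46 | 9.72 | 6.19 | 4.41 |
| squeezenet_v1.1 | 53.59 | 33.16 | 20.13 | 51.56 | 31.81 | 19.10 |
| mnasnet | 57.44 | 32.62 | 19.47 | 54.99 | 30.69 | 17.98 |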
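
To read thread scaling out of these numbers, the sketch below computes speedup and parallel efficiency from the Snapdragon 855 armv8 figures above. The latencies are copied from the table; `speedup` and `efficiency` are hypothetical helper names introduced for illustration.

```python
# Average latency in ms by thread count, copied from the
# Snapdragon 855 armv8 columns of the PaddlePaddle fp32 results.
LATENCY_MS = {
    "mobilenet_v1": {1: 30.56, 2: 18.59, 4: 10.44},
    "shufflenet_v2": {1: 4.07, 2: 2.89, 4: 2.28},
}


def speedup(latency_by_threads, threads):
    """Single-thread latency divided by the latency at `threads` threads."""
    return latency_by_threads[1] / latency_by_threads[threads]


def efficiency(latency_by_threads, threads):
    """Speedup divided by thread count; 1.0 would be perfect linear scaling."""
    return speedup(latency_by_threads, threads) / threads


for model, row in LATENCY_MS.items():
    print(f"{model}: {speedup(row, 4):.2f}x at 4 threads, "
          f"efficiency {efficiency(row, 4):.2f}")
```

On these figures mobilenet_v1 gains roughly 2.9x from four threads while shufflenet_v2 gains roughly 1.8x, so the smaller model benefits less from additional threads.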

caffe model

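| Snapdragon 855 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 32.38 | 18.65 | 10.69 | 30.75 | 18.11 | 9.88 |
| mobilenet_v2 | 29.45 | 17.86 | 10.81 | 26.61 | 16.26 | 9.67 |
| shufflenet_v2 | 5.04 | 3.14 | 2.20 | 4.09 | 2.85 | 2.25 |

| Snapdragon 845 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 65.26 | 35.19 | 19.11 | 61.42 | 33.15 | 17.48 |
| mobilenet_v2 | 55.59 | 31.31 | 17.68 | 51.54 | 29.69 | 16.00 |
| shufflenet_v2 | 7.42 | 4.73 | 3.33 | 7.18 | 4.75 | 3.39 |

| Snapdragon 835 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 95.38 | 52.16 | 30.37 | 92.10 | 46.71 | 26.31 |
| mobilenet_v2 | 82.89 | 45.49 | 28.14 | 74.91 | 41.88 | 25.25 |
| shufflenet_v2 | 10.25 | 6.36 | 4.42 | 9.68 | 6.20 | 4.42 |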

int8 quantized model test data

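| Snapdragon 855 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 37.18 | 21.71 | 11.16 | 14.41 | 8.34 | 4.37 |
| mobilenet_v2 | 27.95 | 16.57 | 8.97 | 13.68 | 8.16 | 4.67 |

| Snapdragon 835 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 61.63 | 32.60 | 16.49 | 57.36 | 29.74 | 15.50 |
| mobilenet_v2 | 47.13 | 25.62 | 13.56 | 41.87 | 22.42 | 11.72 |

| Kirin 970 | armv7 | armv7 | armv7 | armv8 | armv8 | armv8 |
|---|---|---|---|---|---|---|
| threads num | 1 | 2 | 4 | 1 | 2 | 4 |
| mobilenet_v1 | 63.13 | 32.63 | 16.85 | 58.92 | 29.96 | 15.42 |
| mobilenet_v2 | 48.60 | 25.43 | 13.76 | 43.06 | 22.10 | 12.09 |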
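
A quick way to compare the int8 and fp32 results is to divide the fp32 latency by the int8 latency for the same device, architecture, and thread count. The sketch below does this for mobilenet_v1 on Snapdragon 855. It assumes the int8 rows are comparable with the PaddlePaddle fp32 rows, which the source does not state explicitly; `int8_gain` is a hypothetical helper name.

```python
# Snapdragon 855, mobilenet_v1: average latency in ms by thread count,
# copied from the fp32 (PaddlePaddle model) and int8 results.
FP32_MS = {
    "armv7": {1: 35.11, 2: 20.67, 4: 11.83},
    "armv8": {1: 30.56, 2: 18.59, 4: 10.44},
}
INT8_MS = {
    "armv7": {1: 37.18, 2: 21.71, 4: 11.16},
    "armv8": {1: 14.41, 2: 8.34, 4: 4.37},
}


def int8_gain(arch):
    """fp32 latency / int8 latency per thread count; above 1 means int8 is faster."""
    return {t: FP32_MS[arch][t] / INT8_MS[arch][t] for t in FP32_MS[arch]}


for arch in ("armv7", "armv8"):
    gains = ", ".join(f"{t} thread(s): {g:.2f}x" for t, g in int8_gain(arch).items())
    print(f"{arch}: {gains}")
```

On armv8 the int8 model is a little over 2x faster at every thread count, while on armv7 it is roughly on par with fp32. That pattern is consistent with the sdot note in the device list, though the source does not attribute the difference to any specific cause.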

Huawei Kirin NPU test environment

  • Test models
    • fp32 models
      • mobilenet_v1
      • mobilenet_v2
      • squeezenet_v1.1
      • mnasnet
  • Test devices (android ndk ndk-r17c)
    • Kirin 810
      • HUAWEI Nova5, Kirin 810
      • 2xCortex A76 2.27GHz + 6xCortex A55 1.88GHz
    • Kirin 990
      • HUAWEI Mate 30, Kirin 990
      • 2 x Cortex-A76 Based 2.86 GHz + 2 x Cortex-A76 Based 2.09 GHz + 4 x Cortex-A55 1.86 GHz
    • Kirin 990 5G
      • HUAWEI P40, Kirin 990 5G
      • 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz
  • HIAI ddk version: 310 or 320
  • Test notes
    • branch: release/v2.6.1
    • warmup=10, repeats=30; the reported value is the average time, in ms
    • The thread count is 1, and DeviceInfo::Global().SetRunMode is set to LITE_POWER_HIGH
    • The model input has dimensions {1, 3, 224, 224}, and every element of the input is 1

Huawei Kirin NPU test data

paddlepaddle model

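  • ddk 310

| Kirin | 810 | 810 | 990 | 990 | 990 5G | 990 5G |
|---|---|---|---|---|---|---|
| | cpu(ms) | npu(ms) | cpu(ms) | npu(ms) | cpu(ms) | npu(ms) |
| mobilenet_v1 | 41.20 | 12.76 | 31.91 | 4.07 | 33.97 | 3.20 |
| mobilenet_v2 | 29.57 | 12.12 | 22.47 | 5.61 | 23.17 | 3.51 |
| squeezenet | 23.96 | 9.04 | 17.79 | 3.82 | 18.65 | 3.01 |
| mnasnet | 26.47 | 13.62 | 19.54 | 5.17 | 20.34 | 3.32 |

  • ddk 320

| Model | 990 | 990 | 990-5G | 990-5G |
|---|---|---|---|---|
| | cpu(ms) | npu(ms) | cpu(ms) | npu(ms) |
| ssd_mobilenetv1 | 65.67 | 18.21 | 71.8 | 16.6 |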

Note: the NPU figure for ssd_mobilenetv1 is the total time of a run scheduled across both the NPU and the CPU.
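
The NPU benefit on each SoC can be summarized as the ratio of CPU latency to NPU latency. The sketch below applies this to the ddk 310 figures above; `npu_speedup` and `best_soc` are hypothetical helper names, and only two of the four models are included to keep the example short.

```python
# (cpu_ms, npu_ms) per SoC, copied from the ddk 310 results.
DDK310_MS = {
    "mobilenet_v1": {
        "Kirin 810": (41.20, 12.76),
        "Kirin 990": (31.91, 4.07),
        "Kirin 990 5G": (33.97, 3.20),
    },
    "mobilenet_v2": {
        "Kirin 810": (29.57, 12.12),
        "Kirin 990": (22.47, 5.61),
        "Kirin 990 5G": (23.17, 3.51),
    },
}


def npu_speedup(model, soc):
    """CPU latency divided by NPU latency for one model on one SoC."""
    cpu_ms, npu_ms = DDK310_MS[model][soc]
    return cpu_ms / npu_ms


def best_soc(model):
    """SoC with the largest CPU-to-NPU latency ratio for `model`."""
    return max(DDK310_MS[model], key=lambda soc: npu_speedup(model, soc))


for model, socs in DDK310_MS.items():
    for soc in socs:
        print(f"{model} on {soc}: {npu_speedup(model, soc):.2f}x")
    print(f"  largest NPU gain: {best_soc(model)}")
```

The ssd_mobilenetv1 row is left out on purpose: its NPU figure covers mixed NPU and CPU scheduling, so its ratio is not directly comparable with the models that run entirely on the NPU.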