
Detailed Record

Author (Chinese): 王鏡成
Author (English): Wang, Ching-Chen
Thesis Title (Chinese): 基於候選網路搜尋之適用於終端超輕量化且有效率之卷積網路
Thesis Title (English): EfficientNet-eLite: Extremely Lightweight and Efficient CNN Models for Edge Devices by Network Candidate Search
Advisor (Chinese): 邱瀞德
Advisor (English): Chiu, Ching-Te
Committee Members (Chinese): 賴尚宏、范倫達
Committee Members (English): Lai, Shang-Hong; Van, Lan-Da
Degree: Master
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 107062640
Publication Year (ROC calendar): 109
Graduation Academic Year: 109
Language: English
Number of Pages: 71
Keywords (Chinese): 有效率的網路、卷積神經網路縮減、邊緣設備計算、硬體友善的神經網路
Keywords (English): EfficientNet; Scaling down CNN; Edge inference; Hardware-friendly CNN
Abstract (Chinese):
Running convolutional neural network (CNN) inference on edge devices is a very challenging task, because this kind of lightweight hardware was not designed to handle such computationally heavy, highly complex CNNs, which is precisely the burden imposed by recent state-of-the-art CNN models. In this thesis we propose Network Candidate Search (NCS), which aims to reduce the computation and complexity of CNNs while sacrificing as little accuracy as possible, in order to build a family of extremely lightweight yet accurate CNN models.
First, starting from EfficientNet-B0 (the baseline model), we scale its resource usage down in various ways, place the resulting models into a candidate pool, and search the pool to study the trade-off between resource usage and accuracy. However, the search cost is computationally expensive and unaffordable. Therefore, by observing the learning behavior of CNN models during training, we propose an elimination rule that lowers the training cost by continuing to train only the models with potential. Meanwhile, we divide the candidate models fairly into groups, so that models with similar resource usage form one group. In this way, at each level of resource cost we obtain a relatively superior model within the group, called EfficientNet-eLite (extremely lightweight EfficientNet), which achieves better parameter usage and accuracy than previous state-of-the-art CNNs. In particular, compared with MnasNet on the ImageNet dataset, our EfficientNet-eLite 9 uses 1.46x fewer parameters while improving accuracy by 0.56%.
Second, to further reduce the difficulty of CNN inference on edge devices, we propose hardware-friendly CNN models by considering application-specific integrated circuit (ASIC) design concepts. The proposed EfficientNet-eLite networks are adjusted for hardware modularity and then collected into a candidate pool. Through the same network candidate search, we obtain a family of EfficientNet-HF (hardware-friendly EfficientNet) models, and find that CNN models can be both accurate and friendly to ASIC design.
Abstract (English):
Convolutional neural network (CNN) inference on edge devices is a very challenging task, because such lightweight hardware was never designed to carry the heavy computational load of modern state-of-the-art CNN models. In this thesis we propose Network Candidate Search (NCS), which aims to reduce this overhead while trading away as little accuracy as possible, in order to build a family of extremely lightweight yet accurate CNN models.
First, we scale EfficientNet-B0 (the baseline model) down in various ways, collect the resulting CNN models into a candidate pool, and search the pool to study the trade-off between resource usage and performance. However, the search cost is computationally expensive and unaffordable. Therefore, based on observations of learning behavior while training CNN models, we introduce an elimination criterion that mitigates the training cost by continuing the training process only for the most promising models. Meanwhile, we fairly divide the candidate models into groups, so that models with similar resource usage are gathered together. In this way, at each level of resource cost we obtain the relatively best model within its group, yielding a family called EfficientNet-eLite (extremely lightweight EfficientNet), which achieves better parameter usage and accuracy than previous state-of-the-art CNNs. In particular, our EfficientNet-eLite 9 outperforms MnasNet on ImageNet with 1.46x fewer parameters and 0.56% higher accuracy.
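As a rough illustration, the grouping and elimination procedure described above could be sketched as follows. The function names, the per-epoch accuracy callback, and the 50% keep ratio are illustrative assumptions, not the thesis's actual implementation:

```python
# Hypothetical sketch of NCS-style grouping and elimination.
# `train_one_epoch` stands in for real training and returns a
# validation accuracy; all names and ratios are illustrative.

def group_by_resource(candidates, num_groups):
    """Split candidate models into groups of similar parameter count."""
    ordered = sorted(candidates, key=lambda c: c["params"])
    size = -(-len(ordered) // num_groups)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def elimination_tournament(group, train_one_epoch, rounds, keep_ratio=0.5):
    """Repeatedly train every surviving model for one epoch, rank the
    survivors by running average validation accuracy, and drop the rest."""
    history = {c["name"]: [] for c in group}
    survivors = list(group)
    for _ in range(rounds):
        for c in survivors:
            history[c["name"]].append(train_one_epoch(c))
        survivors.sort(
            key=lambda c: sum(history[c["name"]]) / len(history[c["name"]]),
            reverse=True,
        )
        survivors = survivors[:max(1, int(len(survivors) * keep_ratio))]
    return survivors[0]  # relatively best model at this resource level
```

Ranking by the running average of accuracies, rather than the latest epoch alone, is what lets weak candidates be eliminated early without being misled by one noisy epoch.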
Second, to further ease CNN inference on the edge, we propose techniques for building hardware-friendly CNN models by taking application-specific integrated circuit (ASIC) design concepts into account. The resulting EfficientNet-eLite architectures are tuned for hardware modularity and collected into a candidate pool. Applying the proposed NCS to this pool yields a family of relatively superior models, called EfficientNet-HF (hardware-friendly EfficientNet), demonstrating that CNN models can be not only accurate but also friendly to ASIC design.
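One way to picture the hardware-modularity tuning is rounding every layer's channel count to a multiple that maps cleanly onto an ASIC processing array. The sketch below uses the well-known MobileNet-style "make divisible" rule as an illustration; the divisor of 8 and the 10% guard are assumptions, not necessarily the thesis's compound channel rounding scheme:

```python
# Illustrative hardware-friendly channel rounding (MobileNet-style rule);
# the divisor and 10% guard are assumptions, not the thesis's actual scheme.

def round_channels(channels, width_mult=1.0, divisor=8):
    """Scale a channel count, then round to the nearest multiple of
    `divisor`, never shrinking the result by more than 10%."""
    scaled = channels * width_mult
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * scaled:  # guard against rounding too far down
        rounded += divisor
    return rounded
```

For example, a 0.75x width multiplier applied to 32 channels gives 24, which is already a multiple of 8 and so maps onto an 8-wide processing element array without padding.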
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Related Works 7
2.1 Model scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Depth scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Width scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Resolution scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.4 Compound scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 EfficientNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Neural Architecture Search . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Grid Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Hardware-oriented CNN models . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 RGBD Embedded CNN . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 EfficientNet-eLite: Extremely lightweight and efficient CNN models for edge devices
by network candidate search 17
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Inspiration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Network Candidate Search by Elimination Tournament (NCS-ET) . . 19
3.1.3 Define candidate model through scaling . . . . . . . . . . . . . . . . . 20
3.1.4 Elimination Tournament . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Candidate of CNN models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Define candidate pool . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Problem statement from defining candidates . . . . . . . . . . . . . . . 23
3.2.3 Define Scaling coefficient from depth . . . . . . . . . . . . . . . . . . 23
3.3 Grouping method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.1 Grouping motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.2 Inspiration of grouping concept . . . . . . . . . . . . . . . . . . . . . 27
3.3.3 Grouping problem statement . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.4 Proposed Grouping methodology . . . . . . . . . . . . . . . . . . . . 29
3.4 Criteria for elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Hypothesis from Observation . . . . . . . . . . . . . . . . . . . . . . 32
3.4.3 Criteria for elimination: Average Accuracy . . . . . . . . . . . . . 36
3.4.4 Evaluate the hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5 Network candidate search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.1 Overall algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.2 Objective formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Hardware friendly EfficientNet 45
4.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Proposed friendly design into NCS . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Compound channels rounding . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Experimental Results 49
5.1 Implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Dataset: ImageNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Elimination results from each group . . . . . . . . . . . . . . . . . . . . . . . 50
5.4 State-of-the-art models on ImageNet . . . . . . . . . . . . . . . . . . . . . . . 60
6 Conclusion 65
References 67
 
 
 
 