
Detailed Record

Author (Chinese): 張哲豪
Author (English): Chang, Che-Hao
Title (Chinese): 基於卷積神經網路加速器之列組式結構化剪枝
Title (English): Row-Bank Structure Pruning for CNN Inference Accelerator
Advisor (Chinese): 黃稚存
Advisor (English): Huang, Chih-Tsun
Committee (Chinese): 劉靖家、謝明得
Committee (English): Liou, Jing-Jia; Shieh, Ming-Der
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 106062592
Publication Year (ROC): 109 (2020)
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 59
Keywords (Chinese): 結構化剪枝、神經網路加速器
Keywords (English): Structure Pruning; CNN Inference Accelerator
Convolutional neural networks are a foundational technique in machine learning and appear in almost every field that applies it, such as image classification, object detection, face recognition, and autonomous driving. However, CNN inference is realized by computation-intensive models, which leads to long latency and high energy consumption when running on hardware platforms. Sparse neural networks are one way to alleviate this problem: by pruning redundant connections, a large fraction of the weights become zero while the accuracy stays close to that of the original network, allowing the data to be compressed and redundant computation to be skipped.
Previous work has also used pruning to compress models. However, that pruning was performed in a fine-grained manner, scattering the pruned weights and introducing irregularity, which makes accelerators for fine-grained pruned models difficult to design. Our goal is therefore to design an accelerator with regular sparsity and to perform structured pruning matched to the granularity the accelerator is designed for.
In this thesis, we use a row-based-kernel CNN accelerator that can skip the computation of all-zero kernel rows to improve performance. Sparse kernels are compressed into a row-wise data format: the non-zero rows are stored, and the positions of the non-zero rows are encoded and stored alongside them. Because the positions of the non-zero rows are irregular, we fix the position and size of each output and design an accumulation module in the accelerator that first decodes the encoded positions of the non-zero rows, then accumulates the outputs of the processing elements with the values at those positions and writes them back. Because row-wise pruning can produce an unbalanced workload distribution across the architecture, causing extra latency overhead, we propose a structured optimization flow for pruning to improve performance, including load balancing and selecting the sparsity that maximizes utilization.
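A minimal NumPy sketch of this row-wise pruning and compression scheme, assuming an L1-norm row saliency and a simple bitmask encoding of the non-zero row positions (both are illustrative assumptions; the thesis compares saliency rules in Section 5.2.8 and does not spell out this exact encoding here):

import numpy as np

def prune_rows(kernel, sparsity):
    # Zero out the least-salient rows of a KxK kernel (row-wise pruning).
    # Row saliency here is the L1 norm of each row -- an assumed rule.
    k = kernel.shape[0]
    saliency = np.abs(kernel).sum(axis=1)
    n_prune = int(round(sparsity * k))
    victims = np.argsort(saliency)[:n_prune]   # least-salient rows first
    pruned = kernel.copy()
    pruned[victims, :] = 0.0
    return pruned

def compress_rows(kernel):
    # Compress a row-pruned kernel into (position bitmask, non-zero rows).
    # The bitmask lets an accumulation module decode which output rows
    # the stored partial sums belong to.
    nonzero = np.any(kernel != 0.0, axis=1)
    bitmask = int("".join("1" if b else "0" for b in nonzero), 2)
    return bitmask, kernel[nonzero, :]

kernel = np.random.randn(3, 3).astype(np.float32)
pruned = prune_rows(kernel, sparsity=1 / 3)
mask, rows = compress_rows(pruned)
print(f"row mask: {mask:03b}, stored rows: {rows.shape[0]} of 3")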
Convolutional neural networks (CNNs) have become essential components in deep learning applications such as image classification, object detection, face recognition, and autonomous driving. CNN models involve intensive computation, which incurs not only long latency but also high energy consumption. Various techniques have therefore been developed to reduce and optimize the computational complexity of CNNs. Model pruning for sparse neural networks is one such solution: it maximizes the number of zero weights with minimal accuracy drop.
In recent years, many works have explored efficient pruning techniques to compress data and skip redundant computation. Among them, fine-grained (element-wise) pruning achieves the highest model accuracy. However, it produces an irregular data arrangement that makes the design of hardware accelerators challenging.
We develop a pruning scheme that zeros out row-wise kernels and adapt a CNN accelerator to support the resulting compressed models. Our approach, with a row-bank data arrangement and non-zero-row locations for decompression, effectively skips the zero-row computation of 2D filters to improve performance. For 1x1 convolutions, we group kernels along the input channels to increase reuse. The memory buffer for output partial sums is partitioned into memory banks, and with an accumulation module outside the processing element (PE) array, the buffer supports the row-bank data arrangement with high efficiency. The row-bank arrangement also avoids the load imbalance that traditional row-wise pruning methods suffer from. Furthermore, we adjust the sparsity of each layer according to its sensitivity index, which yields higher accuracy than pruning all layers at a fixed sparsity.
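The per-layer sparsity selection can be pictured as a sensitivity sweep. The sketch below is a hypothetical version: it prunes one layer at a time at increasing sparsity and keeps the highest level whose accuracy drop stays within a budget. The evaluate callback, the candidate grid, and the drop budget are assumptions for illustration, not the thesis's exact procedure.

def select_layer_sparsity(layers, evaluate, max_drop=0.01,
                          candidates=(0.3, 0.5, 0.7, 0.9)):
    # For each layer, prune it alone at increasing sparsity and keep the
    # highest level whose accuracy drop stays within the budget.
    baseline = evaluate(None, 0.0)          # accuracy of the unpruned model
    chosen = {}
    for layer in layers:
        chosen[layer] = 0.0
        for s in candidates:                # sweep low -> high sparsity
            if baseline - evaluate(layer, s) <= max_drop:
                chosen[layer] = s           # still within the drop budget
            else:
                break                       # this layer is too sensitive
    return chosen

# Hypothetical demo: each layer's accuracy falls linearly with sparsity,
# and "conv2" is four times more sensitive than "conv1".
demo = select_layer_sparsity(
    ["conv1", "conv2"],
    evaluate=lambda layer, s: 0.76 - (0.02 if layer == "conv2" else 0.005) * s)
print(demo)  # {'conv1': 0.9, 'conv2': 0.5}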
The proposed structure pruning achieves a greater performance-energy improvement with less accuracy drop than state-of-the-art techniques. Our row-bank pruning obtains about a 2× latency improvement over the pruning approach of SkimCaffe.
The energy-delay products (EDPs) of our target hardware accelerator running the proposed structure-pruned models can be as low as 39%, 46%, and 57% of those of the original CNN models on the row-stationary accelerator for ResNet-50, GoogLeNet, and DenseNet-201, respectively, with a small accuracy drop. The experimental results also show that by allowing an accuracy drop of 1%, we can further increase the sparsity of the CNNs, reducing the EDPs to 19%, 25%, and 31% for ResNet-50, GoogLeNet, and DenseNet-201, respectively. The proposed structure pruning approach thus provides an optimized performance and energy trade-off on the CNN hardware accelerator.
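For readers unfamiliar with the metric, the EDP combines energy and delay multiplicatively, so a relative EDP is the product of the energy ratio and the latency ratio. A small illustration follows; the individual factors are made up, and only their 39% product matches the ResNet-50 result above.

# EDP = energy x delay, so a relative EDP is the product of the energy
# ratio and the latency ratio. The two factors below are made up for
# illustration; only their 39% product matches the ResNet-50 result.
def relative_edp(energy_ratio, latency_ratio):
    return energy_ratio * latency_ratio

print(relative_edp(0.78, 0.50))  # 0.39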
Acknowledgements I
Chinese Abstract II
Abstract III
List of Tables IV
List of Figures VI
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contribution 2
Chapter 2 Previous Work 4
2.1 Network Pruning 4
2.2 Eyeriss 5
Chapter 3 Architecture 7
3.1 Architecture 7
3.1.1 PE Structure 8
3.1.2 Accumulation Module 9
3.1.3 SRAM 11
3.2 Data Flow 13
3.2.1 Example of Processing Data Flow 16
3.3 Weight Grouping 23
3.4 Scaling PE Array 25
Chapter 4 Pruning Methodology 26
4.1 Granularity 26
4.2 Encoding and Decoding Non-zero Rows 28
4.3 Pruning Flows 32
Chapter 5 Experiment 38
5.1 Experiment Setup 38
5.2 Experiment Results 39
5.2.1 Latency Comparison between Different Granularities 39
5.2.2 Imbalance Analysis 40
5.2.3 Sensitivity Analysis 43
5.2.4 Other Pruning Methods 44
5.2.5 Pruning Both Convolutional and Fully Connected Layers on ResNet-50 46
5.2.6 Comparison with Eyeriss 48
5.2.7 Scaling PE Array 50
5.2.8 Different Saliency Rules 54
Chapter 6 Conclusion and Future Work 55
6.1 Conclusion 55
6.2 Future Work 56
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, 2015.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
[3] Z. Zhao, P. Zheng, S. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, 2019.
[4] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” CVPR, 2015.
[5] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “DeepDriving: Learning affordance for direct perception in autonomous driving,” in ICCV, Dec 2015, pp. 2722–2730.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, ser. NIPS’12. USA: Curran Associates Inc., 2012, pp. 1097–1105.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.
[8] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CoRR, vol. abs/1409.4842, 2014.
[9] Y. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan 2017.
[10] M. Zhu and S. Gupta, “To prune, or not to prune: exploring the efficacy of pruning for model compression,” arXiv e-prints, p. arXiv:1710.01878, Oct. 2017.
[11] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient inference engine on compressed deep neural network,” CoRR, vol. abs/1602.01528, 2016.
[12] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, “Cambricon-x: An accelerator for sparse neural networks,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct 2016, pp. 1–12.
[13] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, “Cnvlutin: Ineffectual-neuron-free deep neural network computing,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), June 2016, pp. 1–13.
[14] D. Kim, J. Ahn, and S. Yoo, “Zena: Zero-aware neural network accelerator,” IEEE Design Test, vol. 35, no. 1, pp. 39–46, Feb 2018.
[15] A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally, “Scnn: An accelerator for compressed-sparse convolutional neural networks,” in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), June 2017, pp. 27–40.
[16] H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally, “Exploring the regularity of sparse structure in convolutional neural networks,” arXiv e-prints, May 2017.
[17] Y. He, X. Zhang, and J. Sun, “Channel pruning for accelerating very deep neural networks,” ICCV, 2017.
[18] J. Luo, J. Wu, and W. Lin, “Thinet: A filter level pruning method for deep neural network compression,” ICCV, 2017.
[19] Y. LeCun, J. Denker, and S. A. Solla, “Optimal brain damage,” in Advances in Neural Information Processing Systems (NIPS), 1989.
[20] S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections for efficient neural networks,” CoRR, vol. abs/1506.02626, 2015.
[21] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” ICLR, 2017.
[22] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” CoRR, vol. abs/1510.00149, 2015.
[23] J. Park, S. R. Li, W. Wen, H. Li, Y. Chen, and P. Dubey, “Holistic SparseCNN: Forging the trident of accuracy, speed, and size,” CoRR, vol. abs/1608.01409, 2016.
[24] S. R. Li, J. Park, and P. T. P. Tang, “Enabling sparse winograd convolution by native pruning,” CoRR, vol. abs/1702.08597, 2017.
[25] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.