Author (Chinese): 楊哲彰
Author (English): Yang, Che-Chang
Title (Chinese): 基於卷積神經網路加速器之塊狀結構化剪枝
Title (English): Hardware-Aware Block-Based Pruning Method for Convolutional Neural Networks
Advisor (Chinese): 黃稚存
Advisor (English): Huang, Chih-Tsun
Committee Members (Chinese): 謝明得、劉靖家
Committee Members (English): Shieh, Ming-Der; Liou, Jing-Jia
Degree: Master's
Institution: National Tsing Hua University
Department: Computer Science
Student ID: 107062553
Publication Year (ROC): 110 (2021)
Graduation Academic Year: 109
Language: English
Pages: 44
Keywords (Chinese): 神經網路、剪枝、結構化剪枝、加速器
Keywords (English): neural network; pruning; structure pruning; accelerator
In recent years, convolutional neural networks (CNNs) have been applied successfully in many fields of artificial intelligence. Thanks to their powerful feature extraction capability, complex applications such as image classification, object detection, face recognition, and autonomous driving have all become feasible. However, CNN results are obtained through massive amounts of computation, which brings problems of enormous energy consumption and latency.
To speed up neural networks, much research has been devoted to developing neural network accelerators. Eyeriss is an energy-efficient CNN accelerator: its row-stationary dataflow increases data reuse, and its hierarchical design of DRAM, Global Buffer (GLB), and Scratch Pad reduces memory accesses and thereby energy consumption. Even with the help of an accelerator, however, the cost of neural networks remains so large that they are difficult to apply in real-time systems.
Network pruning is one way to address this problem. By removing redundant parameters, most weights are set to zero, so the corresponding computation can be skipped while the original accuracy is maintained, reducing both energy and time consumption.
Previous works have also used pruning to compress models. However, with fine-grained pruning, the high irregularity scatters the weight distribution, making real acceleration hard to achieve; with coarse-grained pruning, the constraints on the weight distribution cause a large accuracy drop, which in turn limits the compression ratio.
This thesis makes three main contributions. First, using Eyeriss as the accelerator and taking its dataflow into account, we propose a granularity between the kernel and the filter, the block; unimportant weights are zeroed block by block, and the remaining blocks are rearranged to skip the zeroed ones, reducing the number of Processing Passes. Second, directly reusing the dense model's scheduling parameters may, for certain parameter combinations, fail to reduce the number of Processing Passes after rearrangement, lowering Processing Element (PE) utilization and weakening the speedup; by taking the pruned result into account during the parameter search, we find the parameters best suited for acceleration at a given sparsity, maximizing the speedup. Third, during the parameter search, many parameter sets may have similar latency but widely different granularities; we can tolerate a slight increase in latency in exchange for a smaller granularity and thus higher accuracy, further improving our results.
Experiments show that our method succeeds remarkably on VGG16, ResNet-50, and DenseNet-201. At a sparsity of 0.3, both ResNet-50 and DenseNet-201 achieve higher accuracy than the original models, with speedups of 6% and 4%, respectively. At a sparsity of 0.5, ResNet-50 reaches 75.82% accuracy with a 45% latency reduction, not only surpassing other methods' accuracy at the same sparsity but also achieving higher accuracy and more speedup than other methods at a sparsity of 0.3.
The considerable success of Convolutional Neural Networks (CNNs) has led to the rapid development of artificial intelligence (AI). Thanks to their powerful feature extraction capability, many advanced applications such as image classification, object detection, face recognition, and autonomous driving have become practical. The superior performance of CNNs comes from vast amounts of computation, resulting in gigantic power consumption and latency that limit their deployment on mobile devices and embedded systems.
Among the many neural network accelerators, MIT Eyeriss stands out for its energy efficiency. It adopts a row-stationary dataflow with a three-level memory hierarchy for data reuse, which reduces power-hungry off-chip DRAM accesses. In addition to hardware acceleration, CNN model reduction also plays an important role in improving the latency of CNN inference. Network pruning is one of the key techniques for this purpose: it improves power consumption and latency without a significant accuracy drop by zeroing redundant or small weights. Many previous works have focused on reducing the computational complexity of neural networks. However, fine-grained pruning is hard to accelerate because of its high irregularity, while coarse-grained pruning constrains the distribution of weights and increases the accuracy drop.
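The magnitude-based pruning idea described above can be sketched in a few lines of NumPy. This is a minimal illustration of zeroing the smallest weights, not the thesis's block-based method; the function name and tensor shapes are assumptions for the example:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries until `sparsity` of them are zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of entries to zero
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    # Keep only weights whose magnitude exceeds the threshold.
    return weights * (np.abs(weights) > threshold)

# Example: prune a random 4x4 kernel to ~50% sparsity.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
pruned = magnitude_prune(w, 0.5)
```

In practice the pruned model is then fine-tuned (retrained) so the surviving weights compensate for the removed ones and the accuracy recovers.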
This thesis makes three contributions. First, we propose a block-based pruning method that matches the pruning granularity to an accelerator with row-stationary dataflow. The size of one block lies between the size of a kernel and that of an entire filter. After zeroing unimportant filter blocks, we merge the remaining blocks into new processing passes and skip the pruned blocks. In addition, pruning is performed and balanced across processing groups, so the number of processing passes is effectively reduced in proportion to the pruning ratio. Second, instead of seeking the shortest latency, we relax the search for the accelerator's scheduling parameters to achieve better pruning efficiency. This pruning-aware scheduling handles particular convolution layers for which the shortest-latency schedule results in a single processing pass. Third, we evaluate the pruning granularity among different scheduling results within a certain latency margin, trading off latency against pruning granularity, which affects the accuracy drop.
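As a rough NumPy illustration of the block granularity idea: group a few input-channel kernels of each filter into a block and zero the blocks with the smallest norms. The block definition, tensor layout, and L2-norm criterion here are illustrative assumptions, not the thesis's exact formulation, which also balances pruning across processing groups and merges the surviving blocks into new processing passes:

```python
import numpy as np

def block_prune(filters, block_kernels, sparsity):
    """Prune a conv weight tensor (out_ch, in_ch, kH, kW) at block granularity.

    A block groups `block_kernels` consecutive input-channel kernels of one
    filter -- a granularity between a single kernel and a whole filter.
    Blocks with the smallest L2 norms are zeroed until `sparsity` is reached.
    """
    out_ch, in_ch, kh, kw = filters.shape
    assert in_ch % block_kernels == 0, "in_ch must divide evenly into blocks"
    n_blocks = in_ch // block_kernels
    # View as (out_ch, n_blocks, block_kernels, kH, kW) and score each block.
    blocks = filters.reshape(out_ch, n_blocks, block_kernels, kh, kw)
    norms = np.linalg.norm(blocks.reshape(out_ch, n_blocks, -1), axis=2)
    k = int(sparsity * norms.size)                 # number of blocks to zero
    if k:
        thr = np.partition(norms.ravel(), k - 1)[k - 1]
        # Broadcast the per-block keep mask over the block's weights.
        blocks = blocks * (norms > thr)[:, :, None, None, None]
    return blocks.reshape(filters.shape)
```

Because whole blocks become zero, the sparsity pattern stays regular enough for the accelerator to skip the pruned blocks instead of testing individual weights.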
Our method achieves significant improvements on ResNet-50, VGG16, and DenseNet-201. On ResNet-50, we reach higher accuracy than other state-of-the-art approaches at sparsities of 0.3, 0.5, and 0.7: our pruned model is 0.57% more accurate than FPGM at a sparsity of 0.3, 1.92% more accurate than ThiNet-50 at a sparsity of 0.5, and 1.65% more accurate than ThiNet-30 at a sparsity of 0.7. The speedup is up to 2.22× with a 2.91% accuracy drop. On VGG16, we achieve a 2.17× speedup with a 1.90% accuracy drop, and on DenseNet-201, a 1.43× speedup with a 1.06% accuracy drop.
Contents
1 Introduction
1.1 Motivation
1.2 Contribution
2 Previous Works
2.1 Network Pruning
2.2 Eyeriss
3 Pruning Granularity
3.1 Granularity
3.2 Latency Tolerance
4 Processing Flow
4.1 Processing Pass Merging
4.1.1 Merging
4.1.2 Pruning Constraint
4.1.3 Estimation
4.1.4 Index and Remainder Handling
4.1.5 Dataflow
4.2 Single Pass Processing Group
4.3 Pruning-Aware Scheduling (PAS)
5 Experiment
5.1 Experiment Setup
5.2 Experiment Results
5.2.1 Effectiveness of PAS
5.2.2 Effectiveness of Latency Tolerance
5.2.3 Other Pruning Methods
5.2.4 Different Networks
5.2.5 DRAM Bandwidth Comparison with NVDLA
6 Conclusion and Future Works
6.1 Conclusion
6.2 Future Works
6.2.1 Sensitivity between Layers
6.2.2 Different Methods to Estimate the Importance of Blocks
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, ser. NIPS’12. USA:
Curran Associates Inc., 2012, pp. 1097–1105.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” CoRR, vol. abs/1409.1556, 2014.
[3] Z. Zhao, P. Zheng, S. Xu, and X. Wu, “Object detection with deep learning: A
review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30,
2019.
[4] Y. Kim, “Convolutional neural networks for sentence classification,” EMNLP, Oct. 2014.
[5] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” ICCV, Dec. 2015, pp. 2722–2730.
[6] Y. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal
of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan 2017.
[7] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
[8] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural
networks for mobile devices,” November 2016.
[9] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning efficient
convolutional networks through network slimming,” ICCV, 2017.
[10] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” ICLR, 2017.
[11] M. Zhu and S. Gupta, “To prune, or not to prune: exploring the efficacy of pruning
for model compression,” arXiv e-prints, p. arXiv:1710.01878, Oct. 2017.
[12] J. Frankle and M. Carbin, “The lottery ticket hypothesis: Finding sparse, trainable neural networks,” ICLR, 2019.
[13] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural
network with pruning, trained quantization and huffman coding,” CoRR, vol.
abs/1510.00149, 2015.
[14] H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally, “Exploring
the granularity of sparsity in convolutional neural networks,” CVPR, 2017.
[15] Y. He, X. Zhang, and J. Sun, “Channel pruning for accelerating very deep neural
networks,” ICCV, 2017.
[16] Y. LeCun, J. Denker, and S. A. Solla, “Optimal brain damage,” in Advances in Neural Information Processing Systems (NIPS), 1989.
[17] S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections
for efficient neural networks,” CoRR, vol. abs/1506.02626, 2015.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.
[19] Y. He, G. Kang, X. Dong, Y. Fu, and Y. Yang, “Soft filter pruning for accelerating deep convolutional neural networks,” IJCAI, 2018.
[20] J. Luo, J. Wu, and W. Lin, “Thinet: A filter level pruning method for deep neural
network compression,” ICCV, 2017.
[21] Y. He, P. Liu, Z. Wang, Z. Hu, and Y. Yang, “Filter pruning via geometric median
for deep convolutional neural networks acceleration,” CVPR, 2019.
[22] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected
convolutional networks,” CVPR, 2017.
[23] NVIDIA, NVDLA Primer, http://nvdla.org/primer.html.