Author (Chinese): 盧允凡
Author (English): Lu, Yun-Fan
Title (Chinese): 基於權重剪枝下的深度學習硬體加速器之工作排程問題
Title (English): Job Scheduling Based on Weight Pruning in a Deep Learning Accelerator
Advisor (Chinese): 黃婷婷
Advisor (English): Hwang, TingTing
Committee members (Chinese): 吳中浩、黃稚存
Committee members (English): Wu, Allen C.-H.; Huang, Chih-Tsun
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 107065507
Year of publication (ROC): 109 (2020)
Graduation academic year: 108
Language: English
Number of pages: 39
Keywords (Chinese): 深度學習加速器、權重剪枝、排程問題、卷積神經網路
Keywords (English): Deep Learning Accelerator, Weight Pruning, Scheduling Problem, Convolutional Neural Network
Abstract (Chinese, translated): Deep Learning (DL) has achieved breakthroughs in many fields, and high-performance computation is key to realizing artificial intelligence applications. Previous work has found that deep neural networks (DNNs) contain many weights that are zero or very close to zero. When designing a deep learning hardware accelerator, removing these weights, a technique known as weight pruning, can substantially improve computation efficiency. However, even for the same neural network model, the parameters differ across applications. These differences lead to different hardware designs and hence different job scheduling requirements. To reduce the time cost of hardware design, it is important to analyze and derive an appropriate job schedule automatically. Building on weight pruning, we formulate an optimization problem for hardware resources, investigate the performance metrics of the hardware architecture, and propose a solution to the resulting job scheduling problem.
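To make the pruning step concrete, the following is a minimal sketch of magnitude-based weight pruning in Python. The threshold value and the function name `prune_weights` are illustrative assumptions; the thesis may use a different pruning criterion.

```python
import numpy as np

def prune_weights(weights: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Zero out weights whose magnitude falls below `threshold`.

    Magnitude-based pruning: weights that are zero or very close to
    zero contribute little to the output, so they can be removed
    (set to zero) and skipped by the accelerator at run time.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: prune a small 3x3 convolution kernel (hypothetical values).
kernel = np.array([[0.40,   -0.0002, 0.00],
                   [0.0001,  0.75,  -0.31],
                   [0.00,    0.0003, 0.22]])
pruned = prune_weights(kernel)
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"sparsity after pruning: {sparsity:.0%}")  # 56% of entries removed
```

Once weights are zeroed out this way, the accelerator can skip the corresponding multiply-accumulate operations, which is the source of the efficiency gain described in the abstract.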
Abstract (English): Deep Learning (DL) has achieved major breakthroughs in many fields, and many innovative DL applications require efficient computation. Previous work has found that DL neural networks contain many zero and near-zero weights. These weights can be deleted, i.e., weight pruning, to improve the computation efficiency of deep neural networks (DNNs). However, the parameters of a neural network model vary from one application to another, which complicates hardware design and job scheduling. Thus, an automated technique to analyze and support the hardware accelerator design flow can be helpful. In this work, we study an optimization problem based on weight pruning, discuss the performance of the hardware design, and propose a solution to a job scheduling problem.
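A common way to attack such a job scheduling problem is a greedy load-balancing heuristic. The sketch below assumes longest-processing-time (LPT) assignment over hypothetical per-job cost estimates; it illustrates the general idea, not necessarily the scheduler proposed in Section 7.1, and the names `schedule_jobs` and `num_pes` are invented for illustration.

```python
import heapq

def schedule_jobs(job_costs: list[int], num_pes: int) -> list[list[int]]:
    """Greedy longest-processing-time (LPT) assignment.

    Assign each job (e.g. the non-zero multiply-accumulate count of
    one output channel after pruning) to the currently least-loaded
    processing element, largest jobs first, so that PE loads stay
    balanced and total latency (the maximum PE load) is reduced.
    """
    # Min-heap of (current load, PE index).
    heap = [(0, pe) for pe in range(num_pes)]
    assignment = [[] for _ in range(num_pes)]
    for job, cost in sorted(enumerate(job_costs), key=lambda jc: -jc[1]):
        load, pe = heapq.heappop(heap)
        assignment[pe].append(job)
        heapq.heappush(heap, (load + cost, pe))
    return assignment

# After pruning, jobs can have very different non-zero weight counts.
costs = [90, 40, 35, 30, 20, 15]        # hypothetical per-job MAC counts
print(schedule_jobs(costs, num_pes=3))  # [[0], [1, 4, 5], [2, 3]]
```

In this example the three processing elements end with loads 90, 75, and 65, so the makespan is set by the single large job rather than by an unlucky pile-up of small ones; this kind of imbalance is exactly what pruning-induced sparsity creates across jobs.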
Contents
1 Introduction
2 Previous Work
3 Motivation
4 CNN Basics
5 Target Architecture
6 Problem Definition
7 Algorithm
  7.1 Load balancing scheduler
  7.2 Computation of a layer
8 Experimental Results
  8.1 Experiment for one layer of the CNN network
  8.2 A trade-off by using unrolling approach
  8.3 Experiment for the whole CNN network
9 Conclusions

