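Author (Chinese): 陳建臣
Author (English): Chen, Jian-Chen
Thesis title (Chinese): 實作晶片網路於RISC-V Rocket Chip 平台
Thesis title (English): A Network-on-Chip Implementation on RISC-V Rocket Chip Platform
Advisor (Chinese): 劉靖家
Advisor (English): Liou, Jing-Jia
Committee members (Chinese): 黃稚存; 呂仁碩
Committee members (English): Huang, Chih-Tsun; Liu, Ren-Shuo
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Student ID: 103061607
Year of publication (ROC calendar): 107 (2018)
Graduation academic year (ROC calendar): 106
Language: English
Number of pages: 81
Keywords (Chinese): Rocket Chip 平台; 晶片網路; 網路介面
Keywords (English): Rocket Chip platform; Network-on-Chip; Chisel; TileLink; network interface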
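In a system-on-chip (SoC), the various modules have different bandwidth requirements. A network-on-chip (NoC) offers high scalability and high bandwidth, and has therefore become an alternative to the traditional bus in SoCs.

The Rocket Chip platform is an open-source SoC generator written in Chisel (a next-generation register-transfer-level hardware description language). At present, Rocket Chip provides only a traditional bus for communication between IPs, so as the number of IPs on the platform grows, users may run into bandwidth and scalability problems. In this thesis we therefore implement a NoC and integrate it into the Rocket Chip platform. We adopt the NoC architecture from the OpenSoC Fabric project and modify it so that it can be integrated with Rocket Chip. Beyond these modifications, the architecture lacks one essential component, so we additionally design a network interface (NI) module. The NI translates messages of the TileLink bus protocol used in Rocket Chip into packets that can travel over the NoC, and it also serves as the communication interface between hardware accelerators and the NoC. Through this interface, IPs that originally use the TileLink bus protocol can be attached to the NoC and communicate with one another over it.

We also design several IPs, including a bridge DMA, a rectified linear unit (ReLU) accelerator, and a max pooling accelerator, integrate them into the NoC-based platform, and use them to verify correctness and performance. Finally, through a set of experiments, we show the performance difference between accelerators that communicate over the TileLink bus and accelerators that communicate over the NoC.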
Network-on-chip (NoC) has emerged as an alternative to traditional bus interconnects because of its scalability and high bandwidth, especially in an SoC with a large number of modules that have different bandwidth requirements.

In this thesis, we implemented a NoC fabric on the Rocket Chip platform, an open-source system-on-chip generator written in Chisel (a next-generation RTL language). Building on the OpenSoC Fabric project, we designed a network interface (NI) that translates the Rocket Chip TileLink bus protocol into packets for NoC transport and also provides a packet interface for hardware accelerator modules.
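The translation step performed by the NI can be illustrated with a minimal software sketch. Everything named here is an assumption made for illustration: the `Request` and `Flit` classes, their fields, and the head/body/tail layout are hypothetical and are not taken from the thesis, whose actual flit format and TileLink message handling are not described in this record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical bus-style write request (not the real TileLink message)."""
    dest: int          # destination node id in the NoC (assumed field)
    address: int       # target address
    data: List[int]    # payload words, at least one


@dataclass
class Flit:
    """Hypothetical flit: one unit of a packet travelling through the NoC."""
    kind: str          # "head", "body", or "tail"
    dest: int
    payload: int


def packetize(req: Request) -> List[Flit]:
    """Split a request into a head flit (address) plus one flit per data word."""
    if not req.data:
        raise ValueError("this sketch requires at least one data word")
    flits = [Flit("head", req.dest, req.address)]
    last = len(req.data) - 1
    for i, word in enumerate(req.data):
        flits.append(Flit("tail" if i == last else "body", req.dest, word))
    return flits


def depacketize(flits: List[Flit]) -> Request:
    """Rebuild the request on the receiving side of the network."""
    if len(flits) < 2 or flits[0].kind != "head" or flits[-1].kind != "tail":
        raise ValueError("malformed packet")
    head, rest = flits[0], flits[1:]
    return Request(head.dest, head.address_or_payload()
                   if False else head.payload, [f.payload for f in rest])
```

A sender-side interface would call `packetize` before injecting flits into the network, and a receiver-side interface would call `depacketize` once the tail flit arrives.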

To validate our NoC-based platform, we designed several IPs, including bridge DMA, ReLU, and Max Pooling engines, and integrated them with the NoC. In the experiments, we compare the performance of the accelerators when they communicate over the TileLink bus and over the OpenSoC Fabric NoC.
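For readers unfamiliar with the two accelerator functions, the following is a minimal software reference model, assuming a frame is a 2-D list of integers and that pooling uses a square window with no padding. The function names and data layout are illustrative assumptions; this is not the hardware design described in the thesis.

```python
from typing import List

Matrix = List[List[int]]


def relu(frame: Matrix) -> Matrix:
    """Rectified linear unit: replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in frame]


def max_pool(frame: Matrix, window: int, stride: int) -> Matrix:
    """Max pooling over square windows, sliding by `stride`, without padding."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    rows, cols = len(frame), len(frame[0])
    if window > rows or window > cols:
        raise ValueError("window does not fit inside the frame")
    out = []
    for r in range(0, rows - window + 1, stride):
        out_row = []
        for c in range(0, cols - window + 1, stride):
            # Largest value inside the window whose top-left corner is (r, c).
            out_row.append(max(frame[r + i][c + j]
                               for i in range(window)
                               for j in range(window)))
        out.append(out_row)
    return out
```

With a 3x3 frame, a window size of 2 and a stride of 1 yield a 2x2 output, while a window size of 3 and a stride of 1 yield a single value.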
1 Introduction
1.1 Thesis Organization
2 Background
2.1 Chisel
2.2 Rocket Chip
2.3 TileLink
2.3.1 TileLink Protocol Conformance Levels
2.3.2 TileLink Uncached Lightweight
2.4 OpenSoC Fabric
2.5 Convolutional Neural Networks (CNN)
2.5.1 Rectified Linear Unit (ReLU)
2.5.2 Max Pooling
3 Proposed Platforms with Distinct Interconnect - NoC And Bus
3.1 Accelerators Communication with Bus on RISC-V Rocket Chip Platform Overview
3.2 Bridge DMA
3.2.1 Operation Size Discussion
3.2.2 Efficiently Utilize Burst Property
3.3 Global Buffer And Fragmentor
3.3.1 Global Buffer
3.3.2 Fragmentor
3.4 Accelerator
3.4.1 Benefit of Possessing Master Port And Slave Simultaneously
3.5 Data Bus
3.6 Accelerators Communication with NoC on RISC-V Rocket Chip Platform Overview
4 NoC Implementation on Rocket Chip Platform
4.1 Approach for Connecting NoC (OpenSoC Fabric) to Rocket Chip Platform
4.2 Router And Mesh NoC
4.2.1 Router Architecture
4.2.2 Mesh NoC Construction And Router Revision
4.3 Injection Queue And Ejection Queue
4.3.1 Injection Queue
4.3.2 Ejection Queue
4.4 Flit Format
4.5 Network Interface (NI)
4.5.1 Master Network Interface (MNI)
4.5.2 Slave Network Interface (SNI)
4.6 Request Packet And Response Packet
5 Accelerator Implementation
5.1 ReLU Engine
5.1.1 ReLU Calculation Core
5.2 Max Pooling Engine
5.2.1 Implementation Idea for Max Pooling Calculation
5.2.2 Max Pooling Calculation Architecture
5.3 Blocking Feature
6 Experiment Result And Verification
6.1 Latency Analysis
6.1.1 Max Pooling Calculation Latency Measurement
6.1.2 ReLU Latency Measurement
6.1.3 DMA Operation Latency Measurement
6.2 Single Frame Experiment
6.2.1 Experimental Environment
6.2.2 Experiment Result
6.2.2.1 Max Pooling Parameter versus Execution Time
6.2.2.2 Accelerator Version Comparison for Window Size = 2; Stride = 1 and Window Size = 3; Stride = 1
6.3 Multi-core versus Multi-frame Experiment
6.3.1 Experimental Environment
6.3.2 Total Execution Time Result
6.4 DMA Function Verification
7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
(The full text has not been authorized for public release.)