Author: Huang, Ping-Li
Thesis title: A Spike-Based Convolution Neural Network (SCNN) Accelerator with Reduced On-Chip Memory Data Flow and Data sparse operation
Advisor: Tang, Kea-Tiong
Oral defense committee: Liu, Ren-Shuo
Lu, Chih-Cheng
Keywords: CNN; SNN; Accelerators; Digital Circuits; Sparsity
Abstract: Artificial intelligence networks have risen to prominence because of their many applications, such as image recognition and speech recognition. Deploying these applications on edge devices, however, requires higher energy efficiency to fit within device constraints. Spiking neural networks (SNNs) are considered potential candidates because their computational characteristics reduce multiplication operations: they require only addition and shift operations. Applying this approach to convolutional neural networks (CNNs) can lower the power consumed by computation, with adders implementing the accumulation and shifters replacing the nonlinear operations. This hybrid network is known as a Spiking-CNN (SCNN).

However, achieving higher computational speed often requires a large memory to store the weights and features, which in turn costs area and power. This work proposes an SCNN dataflow that reduces the on-chip memory required, offers a second, hybrid dataflow that also reduces on-chip memory, and designs zero-skipping to exploit the high sparsity of SCNNs, which lowers the power required for computation. Together, these techniques reduce the total on-chip memory required and improve energy efficiency. The accelerator reaches 104.76 TOPS/W on the CIFAR-10 dataset and, compared with other published spiking convolutional neural network accelerators in the same application, requires the least on-chip memory.
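To make the add-and-shift arithmetic and the zero-skipping idea concrete, the following is a minimal software sketch of one time step of an integer spiking neuron. It is an illustration under stated assumptions, not the circuit described in the thesis: the function name `if_neuron_step`, the parameters `threshold` and `leak_shift`, the subtract-on-fire reset, and the use of the shifter as a power-of-two leak are all hypothetical choices made for this example.

```python
def if_neuron_step(membrane, spikes, weights, threshold=64, leak_shift=3):
    """One time step of a hypothetical integer spiking neuron.

    Assumptions of this sketch (not taken from the thesis):
    - inputs are binary spikes, so weight * spike reduces to a conditional add;
    - the leak is a right shift (membrane loses 1/2**leak_shift of its value);
    - the neuron resets by subtracting the threshold when it fires.
    Returns (new_membrane, output_spike, skipped_operations).
    """
    skipped = 0
    for spike, weight in zip(spikes, weights):
        # Zero-skipping: a zero spike or a zero weight contributes nothing,
        # so the accumulate is not performed at all.
        if spike == 0 or weight == 0:
            skipped += 1
            continue
        membrane += weight  # addition only, no multiplier needed

    # Shift-based leak: subtract membrane / 2**leak_shift.
    membrane -= membrane >> leak_shift

    if membrane >= threshold:
        return membrane - threshold, 1, skipped
    return membrane, 0, skipped


# Usage: five inputs, of which three are skipped (two zero spikes, one zero weight).
state, fired, skipped = if_neuron_step(0, [1, 0, 1, 1, 0], [30, 99, 0, 50, 7])
print(state, fired, skipped)
```

In this toy run only two of the five accumulations are actually executed, which is the kind of saving that zero-skipping targets when spike activity and weights are sparse.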
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Chapter Overview
Chapter 2 Literature Review
2.1 Neural Network Accelerators
2.1.1 Deep Convolutional Neural Network Accelerator Architectures
2.1.2 Data Movement and Reuse
2.2 Spiking Convolutional Neural Network Accelerators
2.2.1 Spiking Neural Networks
2.2.2 Spiking Convolutional Neural Networks
2.3 Research Motivation and Challenges
Chapter 3 Spiking Convolutional Neural Network Dataflow
3.1 Spiking Convolutional Neural Network Dataflows
3.1.1 Layer-First Dataflow
3.1.2 Time-Step-First Dataflow
3.2 Hybrid Dataflow
Chapter 4 Spiking Convolutional Neural Network Accelerator Design
4.1 Spiking Computation Behavior and the Composite Dataflow
4.1.1 Spiking Neuron Model and Circuit
4.1.2 Composite Dataflow Architecture and Selection Mechanism
4.2 Sparse Data Operation
4.2.1 Nonzero Weight Filtering
4.2.2 Processing Element Skipping Mechanism
4.3 Processing Element and Memory Array Configuration
4.4 Accelerator Architecture
Chapter 5 Experimental Results and Discussion
5.1 Simulation Results
5.1.1 Processing Element Results
5.1.2 Dataflow Optimization Results
5.2 Comparison with Other State-of-the-Art Works
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
