Author (Chinese): 郭源芳
Author (English): Kuo, Yuan-Fang
Title (Chinese): 適用於單圖像去模糊神經網路硬體加速之模型策劃與具硬體效能優勢的推論架構設計
Title (English): Hardware-Oriented CNN Modeling and Hardware-Optimized Inference Architecture Design for Real-time Single Image Deblurring Acceleration
Advisor (Chinese): 黃朝宗
Advisor (English): Huang, Chao-Tsung
Committee members (Chinese): 呂仁碩, 賴永康
Committee members (English): Liu, Ren-Shuo; Lai, Yeong-Kang
Degree: Master's
University: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering (電機工程學系)
Student ID: 109061578
Year of publication (ROC): 111 (2022)
Graduating academic year: 111
Language: English
Pages: 56
Keywords (Chinese): 單圖像去模糊應用, 卷積神經網路, 硬體加速, 影像處理
Keywords (English): Single image deblurring; Convolutional Neural Network (CNN); Image processing; Inference architecture; Hardware acceleration
In recent years, convolutional neural networks (CNNs) have achieved remarkable results in single image blind-motion deblurring, since they learn directly from large-scale data how to restore a sharp image from a blurred one, without relying on any information about the blur kernel. However, these CNN-based deblurring methods usually come with huge parameter counts, high computational complexity, and complicated model structures, which makes them difficult to deploy on resource-limited edge devices. Therefore, co-design of the model structure and the hardware architecture is crucial for realizing real-time, high-quality image deblurring on edge accelerators.
The goal of this thesis is to find a deblurring model structure that is suitable for hardware acceleration and to design an efficient inference architecture, so as to realize real-time deblurring on resource-limited hardware. First, we conduct extensive hardware-oriented model training experiments based on the state-of-the-art model structure, MPRNet, to simplify its complex structure and its large parameter count and computational cost. Based on the experimental results, we construct EDERNet, an encoder-decoder model structure composed of expansion-reduction blocks, which achieves a better trade-off between deblurring quality and computational complexity than MPRNet. Specifically, at a comparable parameter count, EDERNet improves PSNR on the GoPro dataset by 0.52 dB while saving 48.88% of the computational complexity. EDERNet is therefore more suitable for hardware acceleration while still delivering high-quality deblurring results.
Next, when implementing EDERNet on a hardware accelerator that uses a block-based inference flow, different computing methodologies lead to different demands on computing units, memory area, and DRAM bandwidth. Therefore, based on the computing architecture of eCNN, we propose an architecture that further reduces the hardware resource burden when implementing encoder-decoder model structures. More specifically, we first compare the hardware resources required by existing mechanisms for handling the pixels where the convolutions of adjacent blocks overlap. The analysis shows that, compared with the recompute mechanism, applying the reuse mechanism to the encoder-decoder model structure reduces complexity and bandwidth by 95.02% and 89.02%, respectively, at an extra memory area overhead of only 6.64%. In addition, we design two methods to further optimize computational complexity and bandwidth usage. First, Interleave Zero-Padding performs boundary padding once every two convolutional layers; compared with the initial padding method, the increase in intrinsic complexity caused by padding drops from 95.55% to 0.62%. Second, Feature Map Collection virtually enlarges the memory buffer and thus further reduces bandwidth by 47%.
In summary, implementation results in TSMC 40 nm process technology show that realizing EDERNet with the hardware-optimized inference architecture achieves Full-HD single image deblurring at 27 frames per second, 4.03 times the throughput of the original eCNN inference architecture, while DRAM bandwidth usage is reduced by 6.46 times with almost no additional area overhead.
Recently, convolutional neural networks (CNNs) have demonstrated outstanding performance in single image blind-motion deblurring; they directly learn restoration functions from large-scale data without any information about blur kernels. However, these CNN-based approaches are usually hard to apply to edge devices due to their large parameter size, high computational complexity, and complicated model structures. Therefore, co-design of the model structure and the hardware architecture is crucial for realizing real-time and high-quality image deblurring on edge accelerators.
The aim of this thesis is to find a hardware-friendly model structure and design an efficient inference architecture to achieve real-time edge deblurring. To begin with, we construct EDERNet, a single-stage encoder-decoder model structure with expansion-reduction blocks, which originates from comprehensive hardware-oriented CNN modeling experiments based on the state-of-the-art model, MPRNet. The training results show that EDERNet achieves a better performance-complexity trade-off than MPRNet-small, a smaller version of MPRNet with the same model size as our EDERNet. Specifically, EDERNet increases PSNR on the GoPro dataset by 0.52 dB with a 48.88% complexity reduction, and is thus more suitable for hardware acceleration.
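The quoted complexity saving reduces to multiply-accumulate (MAC) counting over convolutional layers. The following is a minimal sketch of that accounting; the layer shapes are hypothetical, and the bottleneck pattern shown is the classic 1×1/3×3/1×1 arrangement rather than EDERNet's exact expansion-reduction block:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k

# Hypothetical shapes, only to illustrate how structural choices move the MAC total.
full_3x3 = conv_macs(270, 480, 64, 64, 3)          # one plain 3x3 conv at width 64
bottleneck = (conv_macs(270, 480, 64, 32, 1)       # 1x1 reduce to width 32
              + conv_macs(270, 480, 32, 32, 3)     # 3x3 conv at the narrow width
              + conv_macs(270, 480, 32, 64, 1))    # 1x1 restore to width 64
print(f"bottleneck / plain = {bottleneck / full_3x3:.2%}")
```

With these (made-up) shapes the bottleneck costs about 36% of the plain layer's MACs; the thesis's 48.88% saving is the analogous ratio computed over the whole network.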
Furthermore, when it comes to implementing EDERNet on block-based edge accelerators, different computing methodologies result in different hardware performance in terms of computational complexity, on-chip memory, and off-chip bandwidth. Therefore, we propose a hardware-optimized inference architecture based on eCNN to reduce the implementation overhead. To be more specific, we first compare the overhead of existing mechanisms for handling overlapped features between adjacent blocks. The analytical results indicate that applying the reuse mechanism to the encoder-decoder model structure reduces complexity and bandwidth by 95.02% and 89.02%, respectively, with only 6.64% area overhead compared to the recompute mechanism. In addition, we devise two methods to further optimize complexity and bandwidth consumption. First, Interleave Zero-Padding performs padding once every two convolutional layers; the increase in intrinsic complexity caused by padding is thereby reduced from 95.55% to 0.62% compared to the initial padding method. Second, Feature Map Collection implicitly enlarges the block buffer size, which further reduces bandwidth by 47%.
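The recompute-versus-reuse gap follows from simple halo arithmetic: without reuse, every layer in a block's pipeline must also produce the overlapped pixels that the receptive fields of later layers reach into. A pure-Python sketch, with the block size, depth, and kernel size chosen arbitrarily for illustration:

```python
def recompute_overhead(block, layers, k=3):
    """Fractional extra work of the recompute mechanism over reuse, for a chain
    of `layers` k x k convolutions ending in a block x block output tile.
    Layer i must output a tile enlarged by the halo of the (layers - i) layers after it."""
    halo = k - 1
    recompute = sum((block + halo * (layers - i)) ** 2 for i in range(1, layers + 1))
    reuse = layers * block ** 2  # with reuse, each layer only computes block^2 fresh pixels
    return recompute / reuse - 1.0

# Arbitrary example: 32x32 output tiles through a 20-layer chain of 3x3 convolutions.
print(f"recompute overhead: {recompute_overhead(32, 20):.0%}")
```

The deeper the layer chain and the smaller the tile, the worse recompute fares, which is why reuse wins so decisively on a deep encoder-decoder structure.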
In conclusion, implementation results based on TSMC 40 nm process technology show that implementing EDERNet with our hardware-optimized inference architecture achieves Full-HD single image deblurring at 27 fps, 4.03 times the throughput of the original eCNN inference architecture, while off-chip bandwidth consumption is reduced by 6.46 times with negligible area overhead.
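As a sanity check on the headline numbers (assuming Full-HD means 1920×1080), the quoted throughput and speedup unpack as:

```python
width, height, fps = 1920, 1080, 27       # Full-HD at 27 frames per second
pixel_rate = width * height * fps         # deblurred pixels per second
baseline_fps = fps / 4.03                 # implied throughput of the original eCNN flow
print(f"{pixel_rate / 1e6:.1f} Mpixel/s; eCNN baseline ≈ {baseline_fps:.1f} fps")
```

That is roughly 56 Mpixel/s of deblurred output, against an implied baseline of about 6.7 fps for the unmodified eCNN inference flow.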
Abstract (Chinese) i
Abstract iii
Acknowledgements v
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Single Image Deblurring . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Common Approaches of CNN-based Deblurring Method . . . . . . . . . . . 4
1.2.3 Block-based Inference Flow . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Challenge of Deploying SOTA Deblurring Model to Edge Accelerator 11
2.1 Target SOTA Model Architecture - MPRNet . . . . . . . . . . . . . . . . 11
2.2 Target Edge Accelerator - eCNN . . . . . . . . . . . . . . . . . . . . . 14
2.3 Analysis of Challenge . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Hardware-oriented CNN Modeling 17
3.1 Simplify MPRNet by Ablation Study . . . . . . . . . . . . . . . . . . . 17
3.1.1 Experiment Setting . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Model Planning under Limited Hardware Resources . . . . . . . . . . . . 21
3.2.1 Baseline Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Scaling Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Main Processing Block . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.4 Model Depth and Width . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.5 Skip Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Proposed EDERNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4 Inference Architecture for Encoder-decoder Model Structure 33
4.1 Complexity-aware Inference Flow Analysis . . . . . . . . . . . . . . . . 33
4.2 Hardware-optimized Inference Architecture Design . . . . . . . . . . . . 36
4.2.1 Interleave Zero Padding . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.2 Feature Map Collection . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.1 System Specification . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.2 Inference Datapath Architecture . . . . . . . . . . . . . . . . . . . . 44
4.3.3 Analysis of Performance . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Conclusion and Discussion 51
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
[1] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 257–265.
[2] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia, “Scale-recurrent network for deep image deblurring,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8174–8182.
[3] Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Björn Stenger, Wei Liu, and Hongdong Li, “Deblurring by realistic blurring,” CoRR, vol. abs/2004.01860, 2020.
[4] Hongguang Zhang, Yuchao Dai, Hongdong Li, and Piotr Koniusz, “Deep stacked hierarchical multi-patch network for image deblurring,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5971–5979.
[5] Maitreya Suin, Kuldeep Purohit, and A. N. Rajagopalan, “Spatially-attentive patch-hierarchical network for adaptive motion deblurring,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 3603–3612.
[6] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao, “Multi-stage progressive image restoration,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14816–14826.
[7] Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko, “Rethinking coarse-to-fine approach in single image deblurring,” CoRR, vol. abs/2108.05054, 2021.
[8] Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, and Li-De Chen, “eCNN: A block-based and highly-parallel CNN accelerator for edge inference,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, New York, NY, USA, 2019, MICRO ’52, pp. 182–195, Association for Computing Machinery.
[9] Koen Goetschalckx and Marian Verhelst, “DepFiN: A 12nm, 3.8TOPS depth-first CNN processor for high res. image processing,” in 2021 Symposium on VLSI Circuits, 2021, pp. 1–2.
[10] Weisheng Dong, Lei Zhang, Guangming Shi, and Xiaolin Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1838–1857, 2011.
[11] Kaiming He, Jian Sun, and Xiaoou Tang, “Single image haze removal using dark channel prior,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341–2353, 2011.
[12] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” CoRR, vol. abs/1711.07064, 2017.
[13] Hongyun Gao, Xin Tao, Xiaoyong Shen, and Jiaya Jia, “Dynamic scene deblurring with parameter selective sharing and nested skip connections,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3843–3851.
[14] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” CoRR, vol. abs/1505.04597, 2015.
[15] Fisher Yu and Vladlen Koltun, “Multi-scale context aggregation by dilated convolutions,” in International Conference on Learning Representations (ICLR), May 2016.
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[17] Dongwon Park, Dong Un Kang, Jisoo Kim, and Se Young Chun, “Multi-temporal recurrent neural networks for progressive non-uniform single image deblurring with incremental temporal training,” in Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, Eds., Cham, 2020, pp. 327–343, Springer International Publishing.
[18] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu, “Residual dense network for image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 7, pp. 2480–2495, 2021.
[19] Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder, “Fused-layer cnn accelerators,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.
[20] Juhyoung Lee, Jinsu Lee, and Hoi-Jun Yoo, “SRNPU: An energy-efficient CNN-based super-resolution processor with tile-based selective super-resolution in mobile devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 10, no. 3, pp. 320–334, 2020.
[21] Ilya Loshchilov and Frank Hutter, “SGDR: stochastic gradient descent with restarts,” CoRR, vol. abs/1608.03983, 2016.
[22] Jian Sun, Wenfei Cao, Zongben Xu, and Jean Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 769–777.
[23] Tae Hyun Kim and Kyoung Mu Lee, “Segmentation-free dynamic scene deblurring,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2766–2773.
[24] Y.-C. Ding et al., “A 4.6-8.3 TOPS/W 1.2-4.9 TOPS CNN-based computational imaging processor with overlapped stripe inference achieving 4K Ultra-HD 30fps,” European Solid-State Circuits Conference (ESSCIRC), 2022, accepted.
[25] Philipp Gysel, Mohammad Motamedi, and Soheil Ghiasi, “Hardware-oriented approximation of convolutional neural networks,” CoRR, vol. abs/1604.03168, 2016.

 
 
 
 