Author (Chinese): 鍾子綮
Author (English): Zhong, Zi-Qi
Title (Chinese): 使用Vivado高階合成實現Farneback光流演算法的高效能FPGA實作
Title (English): A High-Performance FPGA Implementation of Farneback Optical Flow Algorithm with Vivado High Level Synthesis
Advisor (Chinese): 劉靖家
Advisor (English): Liou, Jing-Jia
Committee Members: 黃稚存, 呂仁碩
Degree: Master
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Student ID: 104061519
Year of Publication (ROC calendar): 107 (2018)
Academic Year of Graduation: 106
Language: English
Number of Pages: 83
Keywords (Chinese): 光流演算法, 高階合成, 可程式化邏輯閘陣列
Keywords (English): Optical Flow, High Level Synthesis, FPGA
Compared with other conventional optical flow algorithms, the Farneback optical flow algorithm estimates object motion more accurately. This accuracy, however, comes at the cost of high computational complexity, which makes real-time computation in pure software difficult on an embedded system. In this thesis, we therefore design hardware accelerators and use hardware/software co-operation to accelerate the Farneback optical flow algorithm, implemented on a Xilinx FPGA.
We use the Vivado high-level synthesis (HLS) tool provided by Xilinx to design the hardware accelerators, moving the slow parts of the software computation into hardware. We first analyze the data flow of the algorithm, implement a pixel-level pipeline design, identify the parts that can be parallelized, and use parallelization to balance the throughput of every hardware stage in the pipeline. We also experiment with the optimization directives provided by Vivado HLS (loop pipeline/unroll) to find the best trade-off between hardware resource usage and speedup.
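As a rough illustration of what a pixel-level pipeline with these HLS directives can look like, the following Vivado HLS C++ sketch pipelines a 1-D filter loop at one pixel per cycle and unrolls the inner window operations. The function, constants, and bit widths (blur_row, WIDTH, KSIZE) are illustrative assumptions, not code from the thesis.

```cpp
// Illustrative Vivado HLS sketch (not the thesis code): a 1-D filter over one
// image row, pipelined at one pixel per cycle with the inner loops unrolled.
#include <ap_int.h>

const int WIDTH = 160;   // assumed frame width (the thesis example uses 160x120 frames)
const int KSIZE = 5;     // assumed filter window size

void blur_row(const ap_uint<8> in[WIDTH], ap_uint<20> out[WIDTH],
              const ap_uint<8> coeff[KSIZE]) {
    ap_uint<8> window[KSIZE];
#pragma HLS ARRAY_PARTITION variable=window complete   // keep the window in registers, not BRAM
    for (int k = 0; k < KSIZE; k++) window[k] = 0;

ROW_LOOP:
    for (int x = 0; x < WIDTH; x++) {
#pragma HLS PIPELINE II=1       // pixel-level pipeline: accept one new pixel every cycle
        // Shift the sliding window by one pixel.
        for (int k = KSIZE - 1; k > 0; k--) {
#pragma HLS UNROLL              // fully unrolled so the shift completes in one cycle
            window[k] = window[k - 1];
        }
        window[0] = in[x];

        // Multiply-accumulate across the window, unrolled into parallel hardware.
        ap_uint<20> acc = 0;
        for (int k = 0; k < KSIZE; k++) {
#pragma HLS UNROLL
            acc += window[k] * coeff[k];
        }
        out[x] = acc;
    }
}
```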
In addition, we propose a novel backtracking data flow that resolves the problems of the original data flow, enabling a longer pixel-level pipeline that further increases hardware throughput while reducing hardware resource usage. For the hardware design, we provide both floating-point and fixed-point implementations.
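For the fixed-point and floating-point variants mentioned above, one common way to keep a single code base in Vivado HLS is to switch the arithmetic type behind a typedef, as in the minimal sketch below. The `flow_t` name, the `USE_FIXED_POINT` macro, and the 32-bit/16-bit format are assumptions for illustration, not the formats used in the thesis.

```cpp
// Illustrative type selection for the fixed-point and floating-point designs
// (the actual bit widths chosen in the thesis are not reproduced here).
#include <ap_fixed.h>

#ifdef USE_FIXED_POINT
// Assumed format: 32-bit signed fixed point with 16 integer and 16 fractional bits.
typedef ap_fixed<32, 16> flow_t;
#else
typedef float flow_t;    // single-precision floating-point variant
#endif

// One flow vector produced by the displacement-estimation stage.
struct FlowVec {
    flow_t dx;
    flow_t dy;
};
```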
Finally, we propose a hardware/software co-accelerated system that uses multiple hardware accelerators and multiple software threads to implement a frame-level pipeline, further increasing frame throughput. On our current Xilinx FPGA, taking 120*160 frames as an example, the system achieves a 36.6x speedup over the software implementation and meets the real-time requirement. If a larger FPGA were available, we could achieve a 176x speedup over software.
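As a rough sketch of how multiple software threads can drive multiple accelerator instances to form a frame-level pipeline, the C++ below distributes consecutive frame pairs round-robin across accelerators. `run_accelerator` is a hypothetical stand-in for the real driver call and the data structures are assumptions; none of this is taken from the thesis code.

```cpp
// Illustrative frame-level pipeline (not the thesis code): each thread owns one
// accelerator instance, so different frame pairs are processed concurrently.
#include <cstdint>
#include <thread>
#include <vector>

struct Frame { std::vector<uint8_t> pixels; int width = 0, height = 0; };
struct Flow  { std::vector<float> dx, dy; };

// Hypothetical blocking call into accelerator instance `accel_id`.
// In a real system this would hand the frame pair to the FPGA and wait for the result.
Flow run_accelerator(int accel_id, const Frame& prev, const Frame& curr) {
    (void)accel_id; (void)prev; (void)curr;
    return Flow{};   // stub: replace with the actual driver call
}

void process_sequence(const std::vector<Frame>& frames, std::vector<Flow>& flows,
                      int num_accels) {
    flows.resize(frames.empty() ? 0 : frames.size() - 1);
    std::vector<std::thread> workers;
    for (int id = 0; id < num_accels; id++) {
        // Frame pairs are assigned round-robin, so up to num_accels pairs are in flight.
        workers.emplace_back([&, id] {
            for (std::size_t i = id; i + 1 < frames.size(); i += num_accels) {
                flows[i] = run_accelerator(id, frames[i], frames[i + 1]);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```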
The Farneback optical flow algorithm can estimate the movements of objects more accurately than traditional approaches, but its high complexity makes real-time software implementation difficult, especially on an embedded system. In this thesis, we develop a hardware accelerator for Farneback optical flow on Xilinx FPGAs. We adopt the Vivado high-level synthesis tool to implement the algorithm with a pixel-level pipeline and use a parallel design to balance the throughput of each pipeline stage. In addition, a novel backtracking data flow is proposed to enable a longer pixel-level pipeline for higher throughput. Lastly, we use multiple accelerators and threads to implement a frame-level pipeline for even higher system performance. Our FPGA implementation of the hardware-accelerated system achieves a 36.6x speedup over the software implementation (with a 160*120 frame size). If a larger FPGA were available, the system is estimated to achieve a 176x speedup.
Table of Contents:
1 Introduction
1.1 Thesis Organization
2 Background
2.1 Farneback Optical Flow Algorithm
2.1.1 Polynomial Expansion
2.1.2 Displacement Estimation
2.2 Vivado High Level Synthesis
2.3 References of FPGA Implementation
3 Accelerator Architecture Design of Farneback Optical Flow
3.1 Basic Architecture of Hierarchical Level Pipeline Design
3.2 Balance Each Stage of Pipeline by Parallel Design and HLS Directives
3.3 Specialized Buffer for Image Processing Pipeline Design
3.3.1 Line buffer
3.3.2 Dispatcher
3.4 Architecture Design of Independent IP
3.4.1 PolynomialExpansion
3.4.2 UpdateMatrices
3.4.3 GaussianBlurUpdateFlow
3.4.4 MedianBlurFlow
3.5 Longer Pipeline Design According to The Original Data Flow
3.6 Longer Pipeline Design According to Backtracking Data Flow
4 System Architecture Design of Farneback Optical Flow
4.1 System Architecture
4.2 Two Stages Frame Level Pipeline
4.3 Four Stages Frame Level Pipeline
5 Verification and Experiments
5.1 Experimental Environment and Benchmark
5.2 Speedup and Hardware Resources Analysis
5.2.1 No Design Implementation
5.2.1.1 Fixed Point
5.2.1.2 Float Point
5.2.2 PolynomialExpansion Hardware Implementation
5.2.2.1 Fixed Point
5.2.2.2 Float Point
5.2.3 UpdateMatrices Hardware Implementation
5.2.3.1 Fixed Point
5.2.3.2 Float Point
5.2.4 GaussianBlurUpdateFlow Hardware Implementation
5.2.4.1 Fixed Point
5.2.4.2 Float Point
5.2.5 MedianBlurFlow Hardware Implementation
5.2.5.1 Fixed Point
5.2.5.2 Float Point
5.2.6 Longer Pipeline Design Implementation According to Original Data Flow
5.2.6.1 Fixed Point
5.2.6.2 Float Point
5.2.7 Longer Pipeline Design Implementation According to Backtracking Data Flow
5.2.7.1 Fixed Point
5.2.7.2 Float Point
5.3 The Full System Speedup and The Resources Analysis
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
A The FPGA Specification
(Full text not authorized for public access)