Author (English): Zhong, Zi-Qi
Thesis title (English): A High-Performance FPGA Implementation of Farneback Optical Flow Algorithm with Vivado High Level Synthesis
Advisor (English): Liou, Jing-Jia
Keywords (English): Optical Flow; High Level Synthesis; FPGA
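However, its high accuracy comes with high complexity, which makes it difficult for a pure software implementation on an embedded system to run in real time [...]
We use the Vivado high-level synthesis tool provided by Xilinx to design the hardware accelerator, moving the parts that run slowly in software into hardware. We first analyze the algorithm's data flow, then implement a pixel-level pipeline and identify the parts that can be parallelized, using parallel design to balance the throughput of each hardware stage in the pipeline. We also experiment with the directives offered by the Vivado high-level synthesis tool (loop pipeline/unroll) to find the best trade-off between hardware usage and speedup.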
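In addition, we propose a novel backtracking data flow that resolves the problems of the original data flow, allowing a longer pixel-level pipeline that further raises hardware throughput while reducing hardware resource usage. On the hardware side, we provide both floating-point and fixed-point designs.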
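The trade-off between a floating-point and a fixed-point design can be illustrated with a small quantization model. This is only a sketch: the 16-bit word with 8 fractional bits and the helper names `to_fixed`, `to_float`, and `fixed_mul` are assumptions chosen for illustration, not the word lengths or functions used in the thesis.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize x to a signed fixed-point integer, saturating on overflow.

    The word length (16) and fraction length (8) are illustrative assumptions.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a Python float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted right once to return to the original format. The shift
    rounds toward negative infinity.
    """
    return (a * b) >> frac_bits


# Values that fit the format exactly survive a multiply unchanged.
product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))

# Values that do not fit pick up a bounded rounding error.
error = abs(to_float(to_fixed(0.1)) - 0.1)

# Out-of-range values saturate instead of wrapping around.
saturated = to_fixed(1000.0)
```

A fixed-point multiply of this kind reduces to an integer multiply and a shift, which is why fixed-point designs typically use fewer FPGA resources than floating-point ones, at the cost of a rounding error of at most half a least-significant bit per conversion.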
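[...] threads to implement a frame-level pipeline, further raising frame throughput. On our current Xilinx FPGA, taking a 120*160 image as an example, the design achieves a 36.6x speedup over software and meets the real-time requirement. If a larger FPGA were available, we could achieve a 176x speedup over software.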
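A frame-level pipeline driven by threads can be sketched with one worker thread per stage and a bounded queue between neighbouring stages. This is a software-only illustration under stated assumptions: the functions `stage`, `run_pipeline`, `poly_stage`, and `flow_stage` are hypothetical names, and the two placeholder stages merely tag a string where a real system would hand the frame to a hardware accelerator.

```python
import queue
import threading


def stage(work, inbox, outbox):
    """Run one pipeline stage: take a frame, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is None:           # sentinel: no more frames
            outbox.put(None)       # forward the sentinel downstream
            return
        outbox.put(work(item))


def run_pipeline(frames, works):
    """Push frames through the stages in `works`, one thread per stage."""
    # Small bounded queues keep a fast stage from running far ahead.
    queues = [queue.Queue(maxsize=2) for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(works)
    ]
    results = []

    def drain():
        """Collect finished frames until the sentinel arrives."""
        while True:
            item = queues[-1].get()
            if item is None:
                return
            results.append(item)

    collector = threading.Thread(target=drain)
    for t in threads + [collector]:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(None)
    for t in threads + [collector]:
        t.join()
    return results


# Placeholder stages (hypothetical): each would drive one accelerator.
def poly_stage(frame):
    return frame + ":poly"


def flow_stage(frame):
    return frame + ":flow"


outputs = run_pipeline(["f0", "f1", "f2"], [poly_stage, flow_stage])
```

While one thread works on frame `f1` in the first stage, the next thread can already process `f0` in the second stage, so in steady state the frame rate is set by the slowest stage rather than by the sum of all stages. Frame order is preserved because each stage is a single thread reading from a FIFO queue.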
The Farneback optical flow algorithm estimates object motion more accurately than traditional approaches, but its high complexity makes a real-time software implementation difficult, especially on an embedded system. In this thesis, we develop a hardware accelerator for Farneback optical flow on Xilinx FPGAs. We use the Vivado high-level synthesis tool to implement the algorithm as a pixel-level pipeline and use parallel design to balance the throughput of each pipeline stage. In addition, we propose a novel backtracking data flow that enables a longer pixel-level pipeline and thus higher throughput. Finally, we use multiple accelerators and threads to implement a frame-level pipeline for even higher system performance. Our FPGA implementation achieves a 36.6x speedup over the software implementation (measured with a 160*120 frame size). If a larger FPGA is available, the system is estimated to achieve a 176x speedup.
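The idea of balancing stage throughput with parallel design can be illustrated with a simple model. This is a sketch under assumptions, not the thesis design: the stage names are borrowed from the table of contents, while the per-pixel cycle counts and the helpers `pipeline_interval` and `balance` are hypothetical.

```python
from math import ceil


def pipeline_interval(stage_cycles, copies):
    """Cycles between successive pixels leaving the pipeline.

    Each stage needs stage_cycles[name] cycles per pixel; running
    copies[name] parallel instances divides its effective interval.
    The slowest stage sets the interval of the whole pipeline.
    """
    return max(c / copies.get(s, 1) for s, c in stage_cycles.items())


def balance(stage_cycles, target_interval):
    """Smallest number of parallel copies per stage that meets the target."""
    return {s: ceil(c / target_interval) for s, c in stage_cycles.items()}


# Hypothetical per-pixel costs in cycles (illustrative numbers only).
stages = {
    "PolynomialExpansion": 4,
    "UpdateMatrices": 2,
    "GaussianBlurUpdateFlow": 8,
    "MedianBlurFlow": 1,
}

unbalanced = pipeline_interval(stages, {})       # limited by the 8-cycle stage
copies = balance(stages, target_interval=2)      # replicate only the slow stages
balanced = pipeline_interval(stages, copies)
```

In this model only the slow stages are replicated, so hardware is spent where it raises throughput and not on stages that already keep up. That is the kind of resource-versus-speedup trade-off the abstract describes.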
1 Introduction
1.1 Thesis Organization
2 Background
2.1 Farneback Optical Flow Algorithm
2.1.1 Polynomial Expansion
2.1.2 Displacement Estimation
2.2 Vivado High Level Synthesis
2.3 References of FPGA Implementation
3 Accelerator Architecture Design of Farneback Optical Flow
3.1 Basic Architecture of Hierarchical Level Pipeline Design
3.2 Balance Each Stage of Pipeline by Parallel Design and HLS Directives
3.3 Specialized Buffer for Image Processing Pipeline Design
3.3.1 Line buffer
3.3.2 Dispatcher
3.4 Architecture Design of Independent IP
3.4.1 PolynomialExpansion
3.4.2 UpdateMatrices
3.4.3 GaussianBlurUpdateFlow
3.4.4 MedianBlurFlow
3.5 Longer Pipeline Design According to The Original Data Flow
3.6 Longer Pipeline Design According to Backtracking Data Flow
4 System Architecture Design of Farneback Optical Flow
4.1 System Architecture
4.2 Two Stages Frame Level Pipeline
4.3 Four Stages Frame Level Pipeline
5 Verification and Experiments
5.1 Experimental Environment and Benchmark
5.2 Speedup and Hardware Resources Analysis
5.2.1 No Design Implementation
    Fixed Point
    Float Point
5.2.2 PolynomialExpansion Hardware Implementation
    Fixed Point
    Float Point
5.2.3 UpdateMatrices Hardware Implementation
    Fixed Point
    Float Point
5.2.4 GaussianBlurUpdateFlow Hardware Implementation
    Fixed Point
    Float Point
5.2.5 MedianBlurFlow Hardware Implementation
    Fixed Point
    Float Point
5.2.6 Longer Pipeline Design Implementation According to Original Data Flow
    Fixed Point
    Float Point
5.2.7 Longer Pipeline Design Implementation According to Backtracking Data Flow
    Fixed Point
    Float Point
5.3 The Full System Speedup and The Resources Analysis
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
A The FPGA Specification