Author: Wei-Chun Chang
Thesis title: Assisted Design Optimization using High-Level Synthesis Flow
Advisor: Huang, Chih Tsun
Oral defense committee: Liou, Jing Jia; Huang, Juinn Dar
Keywords: High-level synthesis; Memory; Design Space Exploration
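[…] accurate estimates of simulation time, power consumption, and area. Pre-register-transfer-level (pre-RTL) […]
yet without having to generate RTL code. However, as designs grow more complex, whether RTL or […]
combinations of loop unrolling and memory partition. First, we […]
to obtain the dynamic data dependence graph (DDDG). We then analyze the DDDG to find the memory
partition parameters. Conventional data-distribution methods are mainly block, cyclic, and
block-cyclic, but these may create performance bottlenecks. We therefore also propose […]
memory partition. At the end of the flow, we use a high-level synthesis tool to […] that we want to explore […]
Existing high-level synthesis tools, such as Vivado HLS, accept programs written in C/C++/SystemC
and produce different designs according to different parameters. However, the coding style they
require is very restrictive. Therefore, to improve memory partition, loop unrolling, […]
Experimental data show that we can greatly reduce simulation time. Our memory partition method also […]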
Current research on accelerator design mainly relies on register-transfer level (RTL)-based
flows to obtain accurate timing, power, and area estimates. Pre-RTL tools such as
Aladdin [1] can also provide approximate estimates without generating RTL code.
However, design space exploration of large or complex designs remains a time-consuming
process even with RTL or pre-RTL tools.
In this thesis, we propose an assisted design flow that efficiently reduces the number of
search points in design space exploration with a pre-synthesis tool, considering
micro-architectural factors such as loop unrolling and memory partitioning. First, we use
Aladdin [1] to quickly explore the unrolling factor without considering memory partitioning
and to generate dynamic data dependence graphs (DDDGs). After an unrolling factor is chosen,
the DDDG is analyzed to explore the memory partitioning. Conventional memory partitioning
schemes are mainly block, cyclic, or block-cyclic. Because memory partitioning strongly
affects performance, it can become the performance bottleneck. Our flow therefore includes a
memory-remapping methodology that modifies the source code to place data better across
memory partitions, based on the DDDG. Finally, we use a high-level synthesis tool to generate
RTL code and obtain accelerator designs together with their performance, area, and power figures.
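
The three conventional schemes named above differ only in how an array index is mapped to a memory bank. The following is a minimal Python sketch of that mapping, not the thesis's remapping method: the function names, the array size, and the bank count are all hypothetical, and the worst per-bank load is used as a rough proxy for port contention when an unrolled loop issues several accesses in the same cycle.

```python
from collections import Counter


def block_bank(index, size, banks):
    """Block scheme: each bank holds one contiguous chunk of ceil(size / banks) elements."""
    chunk = -(-size // banks)  # ceiling division
    return index // chunk


def cyclic_bank(index, banks):
    """Cyclic scheme: consecutive elements rotate through the banks."""
    return index % banks


def block_cyclic_bank(index, block, banks):
    """Block-cyclic scheme: blocks of `block` elements rotate through the banks."""
    return (index // block) % banks


def worst_bank_load(indices, bank_of):
    """Largest number of same-cycle accesses that fall into a single bank."""
    return max(Counter(bank_of(i) for i in indices).values())


SIZE, BANKS = 16, 4          # illustrative values only
unit_stride = [0, 1, 2, 3]   # four unrolled copies reading a[i], a[i+1], ...
stride_four = [0, 4, 8, 12]  # four unrolled copies reading a[4*i], ...

for name, accesses in (("unit stride", unit_stride), ("stride 4", stride_four)):
    print(name,
          worst_bank_load(accesses, lambda i: block_bank(i, SIZE, BANKS)),
          worst_bank_load(accesses, lambda i: cyclic_bank(i, BANKS)),
          worst_bank_load(accesses, lambda i: block_cyclic_bank(i, 2, BANKS)))
```

With these assumed values, the unit-stride accesses all land in one bank under the block scheme, while the stride-4 accesses all land in one bank under the cyclic scheme. No single fixed scheme suits both patterns, which is the kind of bottleneck that motivates choosing the data placement from the observed access pattern.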
Existing high-level synthesis (HLS) tools, such as Vivado HLS, can generate different
architectures for an application from C/C++/SystemC code by applying different user
configurations. However, the supported coding style is quite restrictive. Therefore, we
provide three patch methods, which improve the memory partitioning, loop unrolling, and
input buffering of the high-level hardware description, respectively.
Experimental results show that our flow dramatically reduces simulation time. Our
memory-remapping methodology also improves the performance of the design with an
optimized number of BRAMs.
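
One way to see why exploring unrolling first and partitioning second needs fewer evaluations than a joint sweep is to count the candidate points. The sketch below is a back-of-the-envelope illustration only: the candidate lists are hypothetical and are not taken from the thesis's experiments, and the function names are invented for this example.

```python
def exhaustive_points(unroll_factors, schemes, partition_factors):
    """Joint sweep: every unroll factor paired with every partition setting."""
    return len(unroll_factors) * len(schemes) * len(partition_factors)


def staged_points(unroll_factors, schemes, partition_factors):
    """Staged sweep: unrolling first, then partitioning for the one chosen factor."""
    return len(unroll_factors) + len(schemes) * len(partition_factors)


unroll_factors = [1, 2, 4, 8, 16]              # hypothetical candidates
schemes = ["block", "cyclic", "block-cyclic"]
partition_factors = [1, 2, 4, 8, 16]           # hypothetical candidates

print(exhaustive_points(unroll_factors, schemes, partition_factors))
print(staged_points(unroll_factors, schemes, partition_factors))
```

The joint sweep grows with the product of the candidate counts, whereas the staged sweep grows with their sum. The trade-off is that a staged search is not guaranteed to find the jointly optimal combination.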
1 Introduction
2 Preliminary
2.1 Introduction to Aladdin
2.1.1 Framework
2.1.2 Dynamic Data Dependency Graphs
2.2 H.-T. Tsai’s Work
3 Proposed Methodology
3.1 Proposed Flow
3.1.1 Specification
3.1.2 Unroll Exploration
3.1.3 Partition Exploration
3.1.4 Data Remapping
3.1.5 New Data Remapping
3.1.6 Case Study - Sobel Gy
3.1.7 Code Modification
3.1.8 HLS & Select
4 Experiment
4.1 Experiment Setup
4.2 Experiment Result of Our Remapping Approach
4.3 Experiment Result of Our Proposed Flow
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
[1] Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks, “Aladdin: A Pre-RTL, Power-Performance
Accelerator Simulator Enabling Large Design Space Exploration of Customized
Architectures,” in ACM/IEEE International Symposium on Computer Architecture
(ISCA), pp. 97-108, Jun. 2014.
[2] J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic Memory Partitioning and Scheduling
for Throughput and Power Optimization,” in ACM, 2011.
[3] Y. Wang, P. Zhang, X. Cheng, and J. Cong, “An Integrated and Automated Memory
Optimization Flow for FPGA Behavioral Synthesis,” in ASPDAC, 2012.
[4] P. Li, Y. Wang, P. Zhang, G. Luo, T. Wang, and J. Cong, “Memory Partitioning and
Scheduling Co-optimization in Behavioral Synthesis,” in ICCAD, 2012.
[5] Y. Wang, P. Li, P. Zhang, C. Zhang, and J. Cong, “Memory Partitioning for Multidimensional
Arrays in High-Level Synthesis,” in DAC, 2013.
[6] Y. Wang, P. Li, and J. Cong, “Theory and Algorithm for Generalized Memory Partitioning
in High-Level Synthesis,” in FPGA, 2014.
[7] P. Li, P. Zhang, L.-N. Pouchet, and J. Cong, “Resource-Aware Throughput Optimization
for High-Level Synthesis,” in FPGA, 2015.
[8] M. Li, P. Zhang, C. Zhu, H. Jia, X. Xie, J. Cong, and W. Gao, “High Efficiency VLSI
Implementation of an Edge-directed Video Up-scaler Using High Level Synthesis,” in
IEEE International Conference on Consumer Electronics (ICCE), 2015.
[9] C.-T. Huang and H.-T. Tsai, “Performance Optimization of Accelerators using C-based
High-Level Synthesis Flow.”
[10] B. Reagen, R. Adolf, Y. S. Shao, G.-Y. Wei, and D. Brooks,
“MachSuite: Benchmarks for Accelerator Design and Customized Architectures,” in
IEEE International Symposium on Workload Characterization (IISWC), 2014.
[11] B. Carrion Schafer and A. Mahapatra, “S2CBench: Synthesizable SystemC Benchmark
Suite for High-Level Synthesis,” in IEEE Embedded Systems Letters, vol. 6, no. 3,
2014.
[12] J. Cong, V. Sarkar, G. Reinman, and A. Bui, “Customizable Domain Specific Computing,”
in IEEE Design & Test of Computers, 2011.
[13] Available: http://accelerator.eecs.harvard.edu/isca14tutorial/isca2014-tutorial-cadbenchmarks.pdf