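Author (Chinese): 吳昕益
Author (English): Wu, Hsin-I
Thesis title (Chinese): 協同行為導向之系統模擬平台應用於平行運作系統設計、除錯
Thesis title (English): A Virtualization-Assisted Full-System Simulation Approach for the Verification of System Inter-Component Interactions
Advisor (Chinese): 蔡仁松
Advisor (English): Tsay, Ren-Song
Committee members (Chinese): 邱瀞德, 李哲榮, 蘇培陞, 蘇泓萌
Committee members (English): Chiu, Ching-Te; Lee, Che-Rung; Su, Pei-Sheng; Su, Hung-Meng
Degree: Doctoral
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Student ID: 101062812
Year of publication (ROC calendar): 109
Academic year of graduation: 108
Language: Chinese
Number of pages: 64
Keywords (Chinese): 模型建構, 效能分析, 模擬, 虛擬化
Keywords (English): Modeling, Performance analysis, Simulation, Virtualization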
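Full-system simulation is crucial to embedded system design verification. Its flexibility and early availability make it particularly well suited to exploring and verifying system behavior in the early design stages. However, traditional simulation acceleration methods usually run into difficulties with performance, accuracy, or scalability. This work therefore proposes the VIRA (VIRtualization-Assisted) approach, which builds a fast, accurate, and scalable full-system simulator to overcome these difficulties. To increase simulation performance, VIRA executes hardware-assisted components (HACs) on host hardware devices to obtain native-execution performance. To ensure accuracy and correct data dependencies, this work proposes a deterministic timing model that includes bus contention delay. To improve extensibility, VIRA integrates software-modeled components (SMCs) to support newly added functions and uses a fast data pass-through mechanism to reduce the communication overhead between simulated components. We validate the proposed virtualization-assisted full-system simulation by implementing the technique on a commercially available SoC (System-on-Chip) board. Experimental results show that, in addition to producing correct inter-component interaction results, the simulator runs 58 to 625 times faster than a commercial functional simulator.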
This thesis proposes VIRA (VIRtualization-Assisted), a full-system simulation approach that achieves near-real-time performance through hardware acceleration based on virtualization techniques. Traditional acceleration approaches generally cannot capture inter-component interactions because the simulation progress of each component is unpredictable. VIRA leverages an existing hardware virtualization framework and introduces three key implementation techniques to achieve fast and accurate full-system simulation. First, it uses the trap mechanism of the virtualization framework to intercept inter-component interactions precisely, without checking every data access, while maintaining the deterministic chronological order of those interactions. Second, VIRA provides accurate system performance estimation for early system-level design by integrating component timing models, interrupt effects, and bus contention analysis. Third, VIRA achieves near-real-time performance by executing software- and hardware-simulated components on the same host machine, which minimizes the overhead of inter-component data exchange. We implement the proposed approach on a virtualization-enabled off-the-shelf System-on-Chip board to demonstrate its effectiveness. The experiments show that VIRA always produces deterministic results while running 58~625 times faster than a commercial tool, and that its system performance estimates are within 3~6% of real systems. Moreover, the deterministic full-system simulator incurs only 2~57% overhead compared with ideal native execution on the same host hardware devices.
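The idea of keeping inter-component interactions in a deterministic chronological order, with a bus contention delay added on top, can be illustrated with a small toy model. The sketch below is not the thesis implementation: the names `SyncEvent` and `replay_in_sim_order`, the single shared bus, and the fixed `bus_latency` are all assumptions made for illustration. It only shows that committing trapped accesses by simulated time (with the component id as a tie-breaker) yields the same result regardless of the order in which the host happened to observe them.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class SyncEvent:
    """A trapped inter-component access (hypothetical record layout)."""
    sim_time: int       # component-local simulated time of the access
    component_id: int   # tie-breaker so equal timestamps order deterministically
    action: str = field(compare=False)


def replay_in_sim_order(events, bus_latency=4):
    """Commit accesses in simulated-time order, serialized on one shared bus.

    Returns tuples of (component_id, action, start_time, contention_delay).
    """
    heap = list(events)
    heapq.heapify(heap)
    bus_free_at = 0
    committed = []
    while heap:
        ev = heapq.heappop(heap)
        # An access that arrives while the bus is busy waits until it is free.
        start = max(ev.sim_time, bus_free_at)
        bus_free_at = start + bus_latency
        committed.append((ev.component_id, ev.action, start, start - ev.sim_time))
    return committed


# Events listed in the (arbitrary) order the host observed them.
host_order = [
    SyncEvent(10, 1, "write x"),
    SyncEvent(5, 0, "read x"),
    SyncEvent(6, 2, "read y"),
]
for entry in replay_in_sim_order(host_order):
    print(entry)
```

A real simulator cannot sort events after the fact as this replay does; it has to hold a component at its synchronization point until no earlier access can still arrive. The toy model only conveys why ordering by simulated time rather than host arrival time makes the outcome reproducible.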
Abstract (Chinese)
Abstract
Contents
I. Introduction
II. Related Work
2.1 Software-Based Simulation Acceleration
2.2 Hardware-Based Simulation Acceleration
2.3 High Abstraction Modeling Approaches
III. The Virtualization-Assisted Approach
3.1 Hardware Annotation
3.2 Data-Dependency-based Synchronization
3.3 Runtime Operation Timing Calculation
3.4 Contention-Aware Timing Model
IV. Implementation
4.1 Support Both HACs and SMCs
4.2 Integrate HACs and SMCs through Fast Data Path
4.3 Intercept Synchronization Points
4.4 Identify SDAs
4.5 VIRA Simulator Architecture
4.6 VIRA Full Simulation Flow
V. Experimental Results
5.1 Performance Comparison
5.2 Full System Simulations Considering Bus Contention Effect
5.3 Full System Performance Estimation
VI. Conclusion
References