
Detailed Record

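Author (Chinese): 張子逸
Author (English): Chang, Tzu-Yi
Thesis title (Chinese): 在指令層級仿真器進行快速且可組態的時脈近似時間估算
Thesis title (English): Configurable Fast Cycle-approximate Timing Estimation for Instruction-level Emulators
Advisor (Chinese): 金仲達
Advisor (English): King, Chung-Ta
Oral examination committee (Chinese): 陳添福, 李哲榮
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062520
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 104
Language: English, Chinese
Number of pages: 45
Keywords (Chinese): 時間估算; 功能仿真器; 動態二進制轉譯
Keywords (English): Timing Estimation; Functional Emulator; Dynamic Binary Translation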
Timing estimation plays an important role in designing today's microprocessors. To explore the huge design space of processors, timing estimation must run very fast while maintaining sufficient accuracy. Cycle-accurate simulators such as Gem5 produce very accurate timing information but are too slow. A viable alternative is to couple a functional simulator or emulator with appropriate timing models, each of which estimates the timing of one processor component, e.g. the pipeline, the branch predictor, or the cache. The problem is how to integrate the timing data from these models to produce an accurate overall estimate. On the one hand, processor components interact with each other in real machines, so the timing models should also interact for accurate timing estimation. On the other hand, the timing models should be as independent as possible, so that they can be plugged in for configurability and their timing data can be reused for fast simulation. In this thesis, we present a configurable, fast, cycle-approximate timing estimation method for instruction-level emulators that addresses this issue. We extend QEMU, a functional emulator based on dynamic binary translation, so that timing models of different processor components can be plugged in to provide cycle-approximate timing information, while a Timing Record Caching technique keeps simulation fast. The latter requires the pipeline timing of each basic block to be fixed and independent of the other models. We show how this can be done while still maintaining timing accuracy. Finally, our tool also provides instrumentation and profiling capabilities to help designers analyze designs and find bottlenecks.
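The abstract's split between a fixed per-basic-block pipeline estimate and separately evaluated component models can be illustrated with a small sketch. This is not the thesis's implementation: the class names, the per-opcode latencies, the miss and misprediction penalties, and the additive way penalties are combined are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class BasicBlock:
    pc: int                      # guest address of the first instruction
    opcodes: Tuple[str, ...]     # instruction classes in the block


@dataclass
class BlockEvent:
    """Run-time facts about one execution of a block (hypothetical type)."""
    load_addrs: List[int]
    branch_taken: bool


# Assumed latencies, not taken from the thesis.
ASSUMED_LATENCY = {"alu": 1, "load": 2, "store": 1, "branch": 1}


def pipeline_model(block: BasicBlock) -> int:
    """Static model: depends only on the block's instructions."""
    return sum(ASSUMED_LATENCY[op] for op in block.opcodes)


class ToyCacheModel:
    """Dynamic model: charges an assumed penalty for each first-touched line."""

    def __init__(self, line_bytes: int = 32, miss_penalty: int = 10) -> None:
        self.line_bytes = line_bytes
        self.miss_penalty = miss_penalty
        self.lines: Set[int] = set()

    def penalty(self, block: BasicBlock, event: BlockEvent) -> int:
        cycles = 0
        for addr in event.load_addrs:
            line = addr // self.line_bytes
            if line not in self.lines:
                self.lines.add(line)
                cycles += self.miss_penalty
        return cycles


class ToyBranchModel:
    """Dynamic model: last-outcome predictor with an assumed penalty."""

    def __init__(self, mispredict_penalty: int = 3) -> None:
        self.mispredict_penalty = mispredict_penalty
        self.last_outcome: Dict[int, bool] = {}

    def penalty(self, block: BasicBlock, event: BlockEvent) -> int:
        predicted = self.last_outcome.get(block.pc, False)
        self.last_outcome[block.pc] = event.branch_taken
        return self.mispredict_penalty if predicted != event.branch_taken else 0


class TimingEstimator:
    def __init__(self, dynamic_models) -> None:
        self.dynamic_models = list(dynamic_models)   # pluggable models
        self.record_cache: Dict[int, int] = {}       # block pc -> static cycles
        self.static_evaluations = 0
        self.total_cycles = 0

    def execute_block(self, block: BasicBlock, event: BlockEvent) -> int:
        static = self.record_cache.get(block.pc)
        if static is None:
            # Evaluate the pipeline model once, then reuse the record.
            static = pipeline_model(block)
            self.record_cache[block.pc] = static
            self.static_evaluations += 1
        dynamic = sum(m.penalty(block, event) for m in self.dynamic_models)
        self.total_cycles += static + dynamic
        return static + dynamic


def run_demo() -> Tuple[int, int]:
    loop = BasicBlock(pc=0x1000, opcodes=("alu", "load", "branch"))
    est = TimingEstimator([ToyCacheModel(), ToyBranchModel()])
    events = [
        BlockEvent(load_addrs=[0x100], branch_taken=True),
        BlockEvent(load_addrs=[0x104], branch_taken=True),
        BlockEvent(load_addrs=[0x120], branch_taken=False),
    ]
    for ev in events:
        est.execute_block(loop, ev)
    return est.total_cycles, est.static_evaluations
```

In this sketch the loop block runs three times, but the pipeline model is evaluated only on the first execution; later executions reuse the cached record and pay only for the cache and branch models. Swapping in a different model means passing a different object with a `penalty` method, which is one simple way to read the "pluggable" requirement.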
1 Introduction
2 Background
2.1 Dynamic Binary Translation
2.2 Tiny Code Generation and Code Cache
2.3 Helper Function
3 Proposed Methodology
3.1 Configurable Simulator
3.2 Timing Record Caching
4 Implementation
4.1 Execution Time of Instructions
4.2 Static Timing Model
4.3 Dynamic Timing Model
4.3.1 Cache Model
4.3.2 Branch Prediction Model
4.4 Instrumentation and Profiling
5 Experiment and Results
5.1 Experiment Environment
5.2 Experiment Results
5.2.1 Timing Validation
5.2.2 Performance Improvement
5.2.3 Instrumentation and Profiling
6 Related Works
7 Conclusion and Future Work
A Observation of the Processor Behavior
[1] Fabrice Bellard, “QEMU, a fast and portable dynamic translator”, in USENIX Annual Technical Conference, FREENIX Track, 2005, pp. 41–46.
[2] Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper, “The Mälardalen WCET benchmarks: Past, present and future”, in OASIcs-OpenAccess Series in Informatics. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2010, vol. 15.
[3] Tran Van Dung, Ittetsu Taniguchi, Takuji Hieda, and Hiroyuki Tomiyama, “Function profiling for embedded software by utilizing QEMU and analyzer tool”, in 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2013, pp. 1251–1254.
[4] Bojan Mihajlović, Željko Žilić, and Warren J Gross, “Dynamically instrumenting the QEMU emulator for Linux process trace generation with the GDB debugger”, ACM Transactions on Embedded Computing Systems (TECS), vol. 13, no. 5s, pp. 167, 2014.
[5] Xin Tong and Andreas Moshovos, “Qtrace: a framework for customizable full system instrumentation”, in Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on. IEEE, 2015, pp. 245–255.
[6] Anderson Luiz Sartor, Ulisses Brisolara Corrêa, and Antonio Carlos Schneider Beck Filho, “Androprof: A profiling tool for the Android platform”, in 2013 III Brazilian Symposium on Computing Systems Engineering. IEEE, 2013, pp. 23–28.
[7] Tzu-Hsiang Su, Wei-Shan Wu, Chen-Te Chou, Yuan-Chun Cheng, Meng-Ting Tsai, and Tien-Fu Chen, “Accelerating full-system simulation and app analysis through focused multi-granularity profiling”, in Electronic System Level Synthesis Conference (ESLsyn), Proceedings of the 2014. IEEE, 2014, pp. 1–6.
[8] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R Hower, Tushar Krishna, Somayeh Sardashti, et al., “The gem5 simulator”, ACM SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1–7, 2011.
[9] Nathan L Binkert, Ronald G Dreslinski, Lisa R Hsu, Kevin T Lim, Ali G Saidi, and Steven K Reinhardt, “The M5 simulator: Modeling networked systems”, IEEE Micro, vol. 26, no. 4, pp. 52–60, 2006.
[10] Milo MK Martin, Daniel J Sorin, Bradford M Beckmann, Michael R Marty, Min Xu, Alaa R Alameldeen, Kevin E Moore, Mark D Hill, and David A Wood, “Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset”, ACM SIGARCH Computer Architecture News, vol. 33, no. 4, pp. 92–99, 2005.
[11] Igor Böhm, Björn Franke, and Nigel Topham, “Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator”, in Embedded Computer Systems (SAMOS), 2010 International Conference on. IEEE, 2010, pp. 1–10.
[12] Ho Young Kim and Tag Gon Kim, “Performance simulation modeling for fast evaluation of pipelined scalar processor by evaluation reuse”, in Proceedings of the 42nd annual Design Automation Conference. ACM, 2005, pp. 341–344.
[13] Yan Luo, Ying Li, Xinyu Yuan, and Rong Yin, “Qsim: Framework for cycle-accurate simulation on out-of-order processors based on QEMU”, in Instrumentation, Measurement, Computer, Communication and Control (IMCCC), 2012 Second International Conference on. IEEE, 2012, pp. 1010–1015.
[14] David Thach, Yutaka Tamiya, Shin’ya Kuwamura, and Atsushi Ike, “Fast cycle estimation methodology for instruction-level emulator”, in 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2012, pp. 248–251.
(Full text is restricted to internal viewing)