|
[1] Wu, M. H., Wang, P. C., Fu, C. Y., and Tsay, R. S. “A Distributed Timing Synchronization Technique for Parallel Multi-Core Instruction-Set Simulation.” In ACM Transactions on Embedded Computing Systems. no. 54. 2013.
[2] Cai, L., & Gajski, D. “Transaction level modeling: an overview.” In Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis. pp. 19-24. 2003.
[3] Bellard, F. “QEMU, a Fast and Portable Dynamic Translator.” In USENIX Annual Technical Conference. pp. 41-46. 2005.
[4] Khaligh, R. S., & Radetzki, M. “A dynamic load balancing method for parallel simulation of accuracy adaptive TLMs.” In Specification & Design Languages. pp. 1-6. 2010.
[5] Chen, J., Annavaram, M., & Dubois, M. “SlackSim: a platform for parallel simulations of CMPs on CMPs.” In ACM SIGARCH Computer Architecture News. pp. 20-29. 2009.
[6] Moy, M. “Parallel programming with SystemC for loosely timed models: a non-intrusive approach.” In Proceedings of the Conference on Design, Automation and Test in Europe. pp. 9-14. 2013.
[7] Weinstock, J. H., Schumacher, C., Leupers, R., Ascheid, G., & Tosoratto, L. “Time-decoupled parallel SystemC simulation.” In Proceedings of the Conference on Design, Automation and Test in Europe. pp. 1-4. 2014.
[8] Vinco, S., Chatterjee, D., Bertacco, V., & Fummi, F. “SAGA: SystemC acceleration on GPU architectures.” In Proceedings of the Design Automation Conference. pp. 115-120. 2012.
[9] Sinha, R., Prakash, A., & Patel, H. D. “Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs.” In Design Automation Conference Asia and South Pacific. pp. 455-460. 2012.
[10] Nakamura, Y., Hosokawa, K., Kuroda, I., Yoshikawa, K., & Yoshimura, T. “A fast hardware/software co-verification method for system-on-a-chip by using a C/C++ simulator and FPGA emulator with shared register communication.” In Proceedings of the Design Automation Conference. pp. 299-304. 2004.
[11] Chung, E. S., Nurvitadhi, E., Hoe, J. C., Falsafi, B., & Mai, K. “PROToFLEX: FPGA-accelerated hybrid functional simulator.” In Parallel and Distributed Processing Symposium. pp. 1-6. 2007.
[12] Chiou, D., Sunwoo, D., Kim, J., Patil, N. A., Reinhart, W., Johnson, D. E., & Angepat, H. “FPGA-accelerated simulation technologies (FAST): Fast, full-system, cycle-accurate simulators.” In Proceedings of the International Symposium on Microarchitecture. pp. 249-261. 2007.
[13] Tan, Z., Waterman, A., Avizienis, R., Lee, Y., Cook, H., Patterson, D., & Asanović, K. “RAMP gold: an FPGA-based architecture simulator for multiprocessors.” In Proceedings of the Design Automation Conference. pp. 463-468. 2010.
[14] Dall, C., & Nieh, J. “KVM/ARM: the design and implementation of the Linux ARM hypervisor.” In ACM SIGARCH Computer Architecture News. pp. 333-348. 2014.
[15] Erdfelt, J., & Drake, D. LibUSB Homepage. http://www.libusb.org.
[16] Woo, S. C., Ohara, M., Torrie, E., Singh, J. P., & Gupta, A. “The SPLASH-2 programs: Characterization and methodological considerations.” In ACM SIGARCH Computer Architecture News. pp. 24-36. 1995.
[17] https://www.96boards.org/product/rock960/
[18] Russell, R. “virtio: towards a de-facto standard for virtual I/O devices.” In ACM SIGOPS Operating Systems Review. pp. 95-103. 2008.
[19] Chandran, P., Chandra, J., Simon, B. P., & Ravi, D. “Parallelizing SystemC kernel for fast hardware simulation on SMP machines.” In Proceedings of the ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation. pp. 80-87. 2009.
[20] Raghav, S., Marongiu, A., Pinto, C., Atienza, D., Ruggiero, M., & Benini, L. “Full-system simulation of many-core heterogeneous SOCs using GPU and QEMU semihosting.” In Proceedings of the Workshop on General Purpose Processing with Graphics Processing Units. pp. 101-109. 2012.
[21] Pellauer, M., Adler, M., Kinsy, M., Parashar, A., & Emer, J. “HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing.” In High Performance Computer Architecture International Symposium. pp. 406-417. 2011.
[22] Tan, Z., Waterman, A., Cook, H., Bird, S., Asanović, K., & Patterson, D. “A case for FAME: FPGA architecture model execution.” In ACM SIGARCH Computer Architecture News. pp. 290-301. 2010.
[23] Mukherjee, S. S., Reinhardt, S. K., Falsafi, B., Litzkow, M., Hill, M. D., Wood, D. A., Huss-Lederman, S., & Larus, J. R. “Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator.” In IEEE Concurrency. pp. 12-20. 2000.
[24] Kivity, A., Kamay, Y., Laor, D., Lublin, U., & Liguori, A. “kvm: the Linux virtual machine monitor.” In Proceedings of the Linux Symposium. pp. 225-230. 2007.
[25] Khaligh, R. S., & Radetzki, M. “Efficient parallel transaction level simulation by exploiting temporal decoupling.” In Analysis, Architectures and Modelling of Embedded Systems. pp. 149-158. 2009.
[26] Matteo Monchiero, Jung Ho Ahn, Ayose Falcón, Daniel Ortega, and Paolo Faraboschi. “How to simulate 1000 cores.” In ACM SIGARCH Computer Architecture News 37, no. 2. pp. 10-19. 2009.
[27] Rodman, N. “ARM FastModels–Virtual Platforms for Embedded Software Development.” In Information Quarterly Magazine. pp. 33-36. 2008.
[28] Lo, Chen Kang, and Ren Song Tsay. “Automatic generation of Cycle Accurate and Cycle Count Accurate transaction level bus models from a formal model.” In Design Automation Conference Asia and South Pacific. pp. 558-563. 2009.
[29] Pasricha, S., Dutt, N., & Ben-Romdhane, M. “Fast exploration of bus-based communication architectures at the CCATB abstraction.” In ACM Transactions on Embedded Computing Systems (TECS). 2008.
[30] Caldari, M., Conti, M., Coppola, M., Curaba, S., Pieralisi, L., & Turchetti, C. “Transaction-level models for AMBA bus architecture using SystemC 2.0.” In Proceedings of the conference on Design, Automation and Test in Europe: Designers' Forum-Volume 2. p. 20026. 2003.
[31] Radetzki, M., & Khaligh, R. S. “Modelling Alternatives for Cycle Approximate Bus TLMs.” In FDL. pp. 74-79. 2007.
[32] Rosén, J., Neikter, C. F., Eles, P., Peng, Z., Burgio, P., & Benini, L. “Bus access design for combined worst and average case execution time optimization of predictable real-time applications on multiprocessor systems-on-chip.” In Real-Time and Embedded Technology and Applications Symposium (RTAS). pp. 291-301. 2011.
[33] Mao-Lin Li, Chen-Kang Lo, Li-Chun Chen, Hong-Jie Huang, Jen-Chieh Yeh, and Ren-Song Tsay. “A Formal Full Bus TLM Modeling for Fast and Accurate Contention Analysis.” In the 17th Workshop on Synthesis And System Integration of Mixed Information technologies. 2012.
[34] Hwang, Y., Abdi, S., & Gajski, D. “Cycle-approximate retargetable performance estimation at the transaction level.” In Proceedings of the conference on Design, automation and test in Europe. pp. 3-8. 2008.
[35] Schirrmeister, F., Benchorin, S., & Thoen, F. “Using virtual platforms for pre-silicon software development.” In White paper, Synopsys. 2008.
[36] Wang, Z., Liu, R., Chen, Y., Wu, X., Chen, H., Zhang, W., & Zang, B. “COREMU: a scalable and portable parallel full-system emulator.” In ACM SIGPLAN Notices, 46(8). pp. 213-222. 2011.
[37] Crockett, L. H., Elliot, R. A., Enderwitz, M. A., & Stewart, R. W. “The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 All Programmable SoC.” In Strathclyde Academic Media. 2014.
[38] Bammi, J. R., Kruijtzer, W., Lavagno, L., Harcourt, E., & Lazarescu, M. T. “Software performance estimation strategies in a system-level design tool.” In Proceedings of the eighth international workshop on Hardware/software codesign. pp. 82-86. 2000.
[39] Popek, G. J., & Goldberg, R. P. “Formal requirements for virtualizable third generation architectures.” In Communications of the ACM. pp. 412-421. 1974.
[40] Ding, J. H., Chang, P. C., Hsu, W. C., & Chung, Y. C. “PQEMU: A parallel system emulator based on QEMU.” In Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference. pp. 276-283. 2011.
[41] Hong, D. Y., Hsu, C. C., Yew, P. C., Wu, J. J., Hsu, W. C., Liu, P., & Chung, Y. C. “HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores.” In Proceedings of the Tenth International Symposium on Code Generation and Optimization. pp. 104-113. 2012.
[42] Bringmann, O., Ecker, W., Gerstlauer, A., Goyal, A., Mueller-Gritschneder, D., Sasidharan, P., & Singh, S. “The next generation of virtual prototyping: Ultra-fast yet accurate simulation of HW/SW systems.” In Proceedings of the Design, Automation & Test in Europe Conference. pp. 1698-1707. 2015.
[43] Vinco, S., Guarnieri, V., & Fummi, F. “Code Manipulation for Virtual Platform Integration.” In IEEE Transactions on Computers, 65(9). pp. 2694-2708. 2016.
[44] Sandberg, A., Nikoleris, N., Carlson, T. E., Hagersten, E., Kaxiras, S., & Black-Schaffer, D. “Full speed ahead: Detailed architectural simulation at near-native speed.” In Workload Characterization International Symposium. pp. 183-192. 2015.
[45] R. E. Wunderlich, T. F. Wenisch, B. Falsafi, and J. C. Hoe. “SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling.” In Proc. International Symposium on Computer Architecture (ISCA). pp. 84–95. 2003.
[46] E. Perelman, G. Hamerly, and B. Calder. “Picking Statistically Valid and Early Simulation Points.” In the International Symposium on Parallel Architecture and Compilation Techniques. 2003.
[47] Sugerman, J., Venkitachalam, G., & Lim, B. H. “Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor.” In USENIX Annual Technical Conference, General Track. pp. 1-14. 2001.
[48] Lamport, L. “How to make a multiprocessor computer that correctly executes multiprocess programs.” In IEEE Transactions on Computers, (9). pp. 690-691. 1979.
[49] Chen, S. Y., Chen, C. H., & Tsay, R. S. “An activity-sensitive contention delay model for highly efficient deterministic full-system simulations.” In Design, Automation and Test in Europe Conference and Exhibition. pp. 1-6. 2014.
[50] Zukerman, M. “Introduction to queueing theory and stochastic teletraffic models.” In arXiv preprint arXiv:1307.2968. 2013.
[51] Fritts, J. E., Steiling, F. W., & Tucek, J. A. “Mediabench II video: expediting the next generation of video systems research.” In Embedded Processors for Multimedia and Communications II (Vol. 5683). pp. 79-94. 2005.
[52] x265 [Online]. Available: http://x265.org
[53] Fan-Wei Yu, Bo-Han Zeng, Yu-Hung Huang, Hsin-I Wu, Che-Rung Lee, and Ren-Song Tsay. “A Critical-Section-Level Timing Synchronization Approach for Deterministic Multi-Core Instruction-Set Simulations.” In Design, Automation and Test in Europe Conference and Exhibition. 2013.
[54] Jones, M. T. “Linux initial RAM disk (initrd) overview.” In IBM developerWorks, Linux, Technical library. 2006.
[55] Schirner, G., & Domer, R. “Result-oriented modeling—A novel technique for fast and accurate TLM.” In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. pp. 1688-1699. 2007.
[56] Wu, M. H., Wang, P. C., Fu, C. Y., and Tsay, R. S. “A Distributed Timing Synchronization Technique for Parallel Multi-Core Instruction-Set Simulation.” In ACM Transactions on Embedded Computing Systems. 2013.
[57] Wu, H. I., Chen, C. K., Lu, T. Y., & Tsay, R. S. “A highly efficient full-system virtual prototype based on virtualization-assisted approach.” In Design, Automation & Test in Europe Conference & Exhibition. 2018.
[58] Iqbal, S. M. Z., Liang, Y., & Grahn, H. “Parmibench-an open-source benchmark for embedded multiprocessor systems.” In IEEE Computer Architecture Letters, 9(2). pp. 45-48. 2010.
[59] http://cubieboard.org/model/
[60] Karandikar, S., Mao, H., Kim, D., Biancolin, D., Amid, A., Lee, D., & Huang, Q. “FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud.” In Proceedings of the 45th Annual International Symposium on Computer Architecture. pp. 29-42. 2018.
|