|
[1] Hidehiro Fujiwara, Shunsuke Okumura, Yusuke Iguchi, Hiroki Noguchi, Hiroshi Kawaguchi, and Masahiko Yoshimoto. A dependable sram with 7t/14t memory cells. In ieice transactions on electronics, pages 423{432, 2009. [2] A. Jain and C. Lin. Back to the future: Leveraging belady's algorithm for improved cache replacement. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 78{89, 2016. [3] Changkyu Kim, Doug Burger, and Stephen W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In Kourosh Gharachorloo, editor, Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), San Jose, California, USA, October 5-9, 2002., pages 211{222. ACM Press, 2002. [4] D. Liu. Asip (application specific instruction-set processors) design. In 2009 IEEE 8th International Conference on ASIC, pages 16{16, 2009. [5] John L. Henning. Spec cpu2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34(4):1{17, September 2006. [6] Scott Beamer, Krste Asanovic, and David A. Patterson. The GAP benchmark suite. CoRR, abs/1508.03619, 2015. [7] Bibiche Geuskens and Kenneth Rose. Modeling microprocessor performance. Springer Science & Business Media, 2012. [8] Naveen Verma and Anantha P. Chandrakasan. A 256 kb 65 nm 8t subthreshold sram employing sense-amplifier redundancy. In IEEE Journal of Solid-State Circuits, pages 141{149, 2008. [9] Ming-Hsien Tu, Jihi-Yu Lin, Ming-Chien Tsai, Chien-Yu Lu, Yuh-Jiun Lin, Meng-Hsueh Wang, Huan-Shun Huang, Kuen-Di Lee, Wei-Chiang (Willis) Shih, Shyh-Jye Jou, and Ching-Te Chuang. A single-ended disturb-free 9t subthreshold sram with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing. In IEEE Journal of Solid-State Circuits, pages 1469{1482, 2012. [10] Ik Joon Chang, Jae-Joon Kim, Sang Phill Park, and Kaushik Roy. A 32kb 10t sub-threshold SRAM array with bit-interleaving and differential read scheme in 90nm CMOS. In 2008 IEEE International Solid-State Circuits Conference, ISSCC 2008, Digest of Technical Papers, San Francisco, CA, USA, February 3-7, 2008, pages 388-389. IEEE, 2008. [11] Yi-Wei Chiu, Yu-Hao Hu, Ming-Hsien Tu, Jun-Kai Zhao, Yuan-Hua Chu, Shyh-Jye Jou, and Ching-Te Chuang. 40 nm bit-interleaving 12t subthreshold SRAM with data-aware write-assist. IEEE Trans. on Circuits and Systems, 61-I(9):2578{2585, 2014. [12] C.D. Moore, S.J. Keller, and A.J. Martin. Ultra-low-power variation-tolerant radiation-hardened cache design, December 10 2013. US Patent 8,605,516. [13] John L Hennessy and David A Patterson. Computer architecture: a quantitative approach. Elsevier, 2017. [14] J. Lira, C. Molina, and A. Gonz´lez. Hk-nuca: Boosting data searches in dynamic non-uniform cache architectures for chip multiprocessors. In 2011 IEEE International Parallel Distributed Processing Symposium, pages 419{430, May 2011. [15] Anurag Mukkara, Nathan Beckmann, and Daniel Sanchez. Whirlpool: Improving dynamic cache management with static data classification. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '16, page 113{127, New York, NY, USA, 2016. Association for Computing Machinery. [16] N. Beckmann, P. Tsai, and D. Sanchez. Scaling distributed cache hierarchies through computation and data co-scheduling. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 538{550, Feb 2015. [17] Ashok B. Mehta. SystemVerilog Assertions and Functional Coverage: Guide to Language, Methodology and Applications. Springer Publishing Company, Incorporated, 2nd edition, 2016. [18] Kevin Reick, Pia N. Sanda, Scott B. Swaney, Jeffrey W. Kellington, Michael J. Mack, Michael S. Floyd, and Daniel Henderson. Fault-tolerant design of the IBM power6 microprocessor. IEEE Micro, 28(2):30{38, 2008. [19] Moinuddin K. Qureshi and Zeshan Chishti. Operating secded-based caches at ultra-low voltage with FLAIR. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, June 24-27, 2013, pages 1{11. IEEE Computer Society, 2013. [20] Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Wei Wu, and Shih-Lien Lu. Improving cache lifetime reliability at ultra-low voltages. In David H. Albonesi, Margaret Martonosi, David I. August, and Jose F. Martinez, editors, 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), December 12-16, 2009, New York, New York, USA, pages 89{99. ACM, 2009. [21] Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu, Chris Wilkerson, and Shih-Lien Lu. Energy-efficient cache design using variable-strength error-correcting codes. In Ravi Iyer, Qing Yang, and Antonio Gonzalez, editors, 38th International Symposium on Computer Architecture (ISCA 2011), June 4-8, 2011, San Jose, CA, USA, pages 461{472. ACM, 2011. [22] Meilin Zhang, Vladimir Stojanovic, and Paul Ampadu. Reliable ultra-low-voltage cache design for many-core systems. IEEE Trans. on Circuits and Systems, 59-II(12):858{862, 2012. [23] Arup Chakraborty, Houman Homayoun, Amin Khajeh, Nikil Dutt, Ahmed M. Eltawil, and Fadi J. Kurdahi. E < MC2: less energy through multi-copy cache. In Vinod Kathail, Reid Tatge, and Rajeev Barua, editors, Proceedings of the 2010 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2010, Scottsdale, AZ, USA, October 24-29, 2010, pages 237{246. ACM, 2010. [24] Gulay Yalcin, Azam Seyedi, Osman S. Unsal, and Adrian Cristal. Flexicache: Highly reliable and low power cache under supply voltage scaling. In Gonzalo Hernandez, Carlos Jaime Barrios Hernandez, Gilberto Diaz, Carlos Garcia Garino, Sergio Nes-machnow, Tomas Perez-Acle, Mario A. Storti, and Mariano Vazquez, editors, High Performance Computing - First HPCLATAM - CLCAR Latin American Joint Conference, CARLA 2014, Valparaiso, Chile, October 20-22, 2014. Proceedings, volume 485 of Communications in Computer and Information Science, pages 173{190. Springer, 2014. [25] Jaume Abella, Javier Carretero, Pedro Chaparro, Xavier Vera, and Antonio Gonzalez. Low vccmin fault-tolerant cache with highly predictable performance. In David H. Albonesi, Margaret Martonosi, David I. August, and Jose F. Martinez, editors, 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), December 12-16, 2009, New York, New York, USA, pages 111{121. ACM, 2009. [26] Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke. Archipelago: A polymorphic cache design for enabling robust near-threshold operation. In International Symposium on High Performance Computer Architecture, pages 539{550, 2011. [27] Tayyeb Mahmood, Soontae Kim, and Seokin Hong. Macho: a failure model-oriented adaptive cache architecture to enable near-threshold voltage scaling. In International Symposium on High Performance Computer Architecture, pages 532{541, 2013. [28] Avesta Sasan, Houman Homayoun, Ahmed M. Eltawil, and Fadi Kurdahi. Inquisitive defect cache: A means of combating manufacturing induced process variation. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1597{1609, 2011. [29] Farrukh Hijaz, Qingchuan Shi, and Omer Khan. A private level-1 cache architecture to exploit the latency and capacity tradeoffs in multicores operating at near-threshold voltages. In 2013 IEEE 31st International Conference on Computer Design, ICCD 2013, Asheville, NC, USA, October 6-9, 2013, pages 85{92. IEEE Computer Society, 2013. [30] Chun Liu, A. Sivasubramaniam, M. Kandemir, and M. J. Irwin. Enhancing l2 organization for cmps with a center cell. In Proceedings 20th IEEE International Parallel Distributed Processing Symposium, pages 10 pp.{, April 2006. [31] Z. Guz, I. Keidar, A. Kolodny, and U. Weiser. Nahalal: Cache organization for chip multiprocessors. IEEE Computer Architecture Letters, 6(1):21{24, Jan 2007. [32] P. Tsai, N. Beckmann, and D. Sanchez. Jenga: Software-defined cache hierarchies. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pages 652{665, June 2017. [33] Z. Radovic and E. Hagersten. Efficient synchronization for nonuniform communication architectures. In SC '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 13{13, Nov 2002. [34] Liqun Cheng, N. Muralimanohar, K. Ramani, R. Balasubramonian, and J. B. Carter. Interconnect-aware coherence protocols for chip multiprocessors. In 33rd International Symposium on Computer Architecture (ISCA'06), pages 339{351, June 2006. [35] E. Bolotin, Z. Guz, I. Cidon, R. Ginosar, and A. Kolodny. The power of priority: Noc based distributed cache coherency. In First International Symposium on Networks-on-Chip (NOCS'07), pages 117{126, May 2007. [36] D. S. Gracia, G. Dimitrakopoulos, T. M. Arnal, M. G. H. Katevenis, and V. V. Yufera. Lp-nuca: Networks-in-cache for high-performance low-power embedded processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(8):1510{1523, Aug 2012. [37] S. K. Sadasivam, B. W. Thompto, R. Kalla, and W. J. Starke. Ibm power9 processor architecture. IEEE Micro, 37(2):40{51, Mar 2017. [38] Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36., pages 55{66, Dec 2003. [39] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. A nuca substrate for exible cmp cache sharing. IEEE Transactions on Parallel and Distributed Systems, 18(8):1028{1040, Aug 2007. [40] Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, page 111{122, USA, 2004. IEEE Computer Society. [41] Thomas Y. Yeh and Glenn Reinman. Fast and fair: data-stream quality of service. In Thomas M. Conte, Paolo Faraboschi, William H. Mangione-Smith, and Walid A. Najjar, editors, Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2005, San Francisco, California, USA, September 24-27, 2005, pages 237{248. ACM, 2005. [42] Javier Lira, Carlos Molina, and Antonio Gonzalez. Lru-pea: A smart replacement policy for non-uniform cache architectures on chip multiprocessors. 2009 IEEE International Conference on Computer Design, pages 275{281, 2009. [43] P. Tsai, N. Beckmann, and D. Sanchez. Nexus: A new approach to replication in distributed shared caches. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 166{179, Sep. 2017. [44] Qianqian Wu and Zhenzhou Ji. A reuse-degree based locality classifier for locality-aware data replication. IEEE Access, PP:1{1, 12 2019. [45] J. Merino, V. Puente, and J. A. Gregorio. Esp-nuca: A low-cost adaptive non-uniform cache architecture. In HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture, pages 1{10, Jan 2010. [46] S. Akioka, F. Li, K. Malkowski, P. Raghavan, M. Kandemir, and M. J. Irwin. Ring data location prediction scheme for non-uniform cache architectures. In 2008 IEEE International Conference on Computer Design, pages 693{698, Oct 2008. [47] S. Das and H. K. Kapoor. Towards a better cache utilization by selective data storage for cmp last level caches. In 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID), pages 92{97, Jan 2016. [48] S. Chen, L. Huang, and S. Li. An address remapping algorithm to reduce power consumption in noc-based chip-multiprocessors. In 2016 International SoC Design Conference (ISOCC), pages 209{210, Oct 2016. [49] K. Shyam and R. Govindarajan. An array allocation scheme for energy reduction in partitioned memory architectures. In Shriram Krishnamurthi and Martin Odersky, editors, Compiler Construction, pages 32{47, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. [50] Shounak Chakraborty and Hemangee K. Kapoor. Performance linked dynamic cache tuning: A static energy reduction approach in tiled cmps. Microprocessors and Microsystems, 52:221 { 235, 2017. [51] Y. Wang, L. Zhang, Y. Han, H. Li, and X. Li. Data remapping for static nuca in degradable chip multiprocessors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(5):879{892, May 2015. [52] A. Tretter, P. Kumar, and L. Thiele. Interleaved multi-bank scratchpad memories: A probabilistic description of access con icts. In 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1{6, June 2015. [53] Y. Ding and W. Zhang. Wcet analysis of static nuca caches. In 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC), pages 1{6, Dec 2014. [54] B. Ramakrishna Rau. Pseudo-randomly interleaved memory. In IN PROCEEDINGS OF THE 18TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, pages 74{83, 1991. [55] Andre Seznec. Bank-interleaved cache or memory indexing does not require euclidean division. In 11th Annual Workshop on Duplicating, Deconstructing and Debunking, Portland, United States, June 2015. [56] Christophe Giacomotto, Mandeep Singh, Milena Vratonjic, and Vojin G. Oklobdz-ija. Energy efficiency of power-gating in low-power clocked storage elements. In 18th International Workshop on Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation - Volume 5349, PATMOS 2008, pages 268{276, Berlin, Heidelberg, 2009. Springer-Verlag. [57] Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, and Myoung-soo Jung. Enhancing computation-to-core assignment with physical location information. SIGPLAN Not., 53(4):312{327, June 2018. [58] S. Cho and L. Jin. Managing distributed, shared l2 caches through os-level page allocation. In 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pages 455{468, Dec 2006. [59] Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr., and Joel S. Emer. High performance cache replacement using re-reference interval prediction (RRIP). In Andre Seznec, Uri C. Weiser, and Ronny Ronen, editors, 37th International Symposium on Computer Architecture (ISCA 2010), June 19-23, 2010, Saint-Malo, France, pages 60{71. ACM, 2010. [60] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78{101, 1966. [61] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. An optimality proof of the lru-K page replacement algorithm. J. ACM, 46(1):92{112, 1999. [62] John L. Henning. SPEC CPU2000: measuring CPU performance in the new millennium. IEEE Computer, 33(7):28{35, 2000. [63] Hussein Al-Zoubi, Aleksandar Milenkovic, and Milena Milenkovic. Performance evaluation of cache replacement policies for the spec cpu2000 benchmark suite. In Proceedings of the 42Nd Annual Southeast Regional Conference, ACM-SE 42, pages 267{272, New York, NY, USA, 2004. ACM. [64] Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer. Adaptive insertion policies for high performance caching. In Proceedings of the 34th Annual International Symposium on Computer Architecture, ISCA '07, pages 381{391, New York, NY, USA, 2007. ACM. [65] C. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. C. Steely, and J. Emer. Ship: Signature-based hit predictor for high performance caching. In 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 430{441, Dec 2011. [66] Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin. Applying deep learning to the cache replacement problem. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '52, page 413{425, New York, NY, USA, 2019. Association for Computing Machinery. [67] David Molnar, Matt Piotrowski, David Schultz, and David Wagner. The program counter security model: Automatic detection and removal of controlow side channel attacks. In Dong Ho Won and Seungjoo Kim, editors, Information Security and Cryptology - ICISC 2005, pages 156{168, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. [68] Santosh D. Chede and Kishore D. Kulat. Design overview of processor based implantable pacemaker. JCP, 3(8):49{57, 2008. [69] Sparsh Mittal. A survey of techniques for improving energy efficiency in embedded computing systems. IJCAET, 6(4):440{459, 2014. [70] Li Shang, Li-Shiuan Peh, and Niraj K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In The Ninth International Symposium on High-Performance Computer Architecture, pages 91{102, 2003. [71] Asit K. Mishra, Reetuparna Das, Soumya Eachempati, Ravi Iyer, N. Vijaykrishnan, and Chita R. Das. A case for dynamic frequency tuning in on-chip networks. In Proceedings of the 42Nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 292{303, New York, NY, USA, 2009. ACM. [72] Evert Seevinck, Frans J. List, and Jan Lohstroh. Static-noise margin analysis of mos sram cells. In IEEE Journal of Solid-State Circuits, pages 748{754, 1987. [73] Grigorios Magklis, Michael L. Scott, Greg Semeraro, David H. Albonesi, and Steven Dropsho. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor. In Proceedings of the 30th Annual International Symposium on Computer Architecture, ISCA '03, pages 14{27, New York, NY, USA, 2003. ACM. [74] Canturk Isci, Gilberto Contreras, and Margaret Martonosi. Live, runtime phase monitoring and prediction on real systems with application to dynamic power management. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pages 359{370, Washington, DC, USA, 2006. IEEE Computer Society. [75] Gaurav Dhiman and Tajana Simunic Rosing. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In Diana Marculescu, Anand Raghunathan, Ali Keshavarzi, and Vijaykrishnan Narayanan, editors, Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007, Portland, OR, USA, August 27-29, 2007, pages 207{212. ACM, 2007. [76] Xi Chen, Zheng Xu, Hyungjun Kim, Paul Gratz, Jiang Hu, Michael Kishinevsky, and Umit Ogras. In-network monitoring and control policy for dvfs of cmp networks-on-chip and last level caches. ACM Trans. Des. Autom. Electron. Syst., 18(4):47:1{47:21, October 2013. [77] Christian Poellabauer, Leo Singleton, and Karsten Schwan. Feedback-based dynamic voltage and frequency scaling for memory-bound real-time applications. In Real Time and Embedded Technology and Applications Symposium, pages 234{243, 2005. [78] Xing Fu, Khairul Kabir, and Xiaorui Wang. Cache-aware utilization control for energy efficiency in multi-core real-time systems. In Karl-Erik Arzen, editor, 23rd Euromicro Conference on Real-Time Systems, ECRTS 2011, Porto, Portugal, 5-8 July, 2011, pages 102{111. IEEE Computer Society, 2011. [79] Toshikazu Suzuki, Yoshinobu Yamagami, Ichiro Hatanaka, Akinori Shibayama, Hironori Akamatsu, and Hiroyuki Yamauchi. A sub-0.5-v operating embedded sram featuring a multi-bit-error-immune hidden-ecc scheme. In IEEE Journal of Solid-State Circuits, pages 152{160, 2006. [80] Toshikazu Suzuki, Yoshinobu Yamagami, Ichiro Hatanaka, Akinori Shibayama, Hironori Akamatsu, and Hiroyuki Yamauchi. Fault-containment in cache memories for tmr redundant processor systems. In IEEE Transactions on Computers, pages 386{397, 2002. [81] Shunsuke Okumura, Shusuke Yoshimoto, Kosuke Yamaguchi, Yohei Nakata, Hiroshi Kawaguchi, and Masahiko Yoshimoto. 7t SRAM enabling low-energy simultaneous block copy. In Jacqueline Snyder, Rakesh Patel, and Tom Andre, editors, IEEE Custom Integrated Circuits Conference, CICC 2010, San Jose, California, USA, 19-22 September, 2010, Proceedings, pages 1{4. IEEE, 2010. [82] Jinwook Jung, Yohei Nakata, Shunsuke Okumura, Hiroshi Kawaguchi, and Masahiko Yoshimoto. Reconfiguring cache associativity: Adaptive cache design for wide-range reliable low-voltage operation using 7t/14t SRAM. IEICE Transactions, 96-C(4):528{537, 2013. [83] Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, and Krste Asanovic. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Avi Mendelson, editor, The 40th Annual International Symposium on Computer Architecture, ISCA'13, Tel-Aviv, Israel, June 23-27, 2013, pages 308{319. ACM, 2013. [84] Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann, Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E. Moore, Mark D. Hill, and David A. Wood. Multifacet's general execution-driven multiprocessor simulator (gems) toolset. In Computer Architecture News, 2005. [85] Radu David, Paul Bogdan, and Radu Marculescu. Dynamic power management for multicores: Case study using the intel scc. In International Conference on VLSI and System-on-Chip, pages 147{152, 2012. [86] Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pages 65{76, Washington, DC, USA, 2010. IEEE Computer Society. [87] Andrea Bartolini, Matteo Cacciari, Andrea Tilli, Luca Benini, and Matthias Gries. A virtual platform environment for exploring power, thermal and reliability management control strategies in high-performance multicores. In Proceedings of the 20th Symposium on Great Lakes Symposium on VLSI, GLSVLSI '10, pages 311{316, New York, NY, USA, 2010. ACM. [88] Yohei Nakata, Shunsuke Okumura, Hiroshi Kawaguchi, and Masahiko Yoshimoto. 0.5-v operation variation-aware word-enhancing cache architecture using 7t/14t hybrid sram. In Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED '10, pages 219{224, New York, NY, USA, 2010. ACM. [89] A. Boro, B. Thomas, S. R. Ahamed, and G. Trivedi. Fpga implementation of a dedicated processor for temperature prediction. In 2016 International Conference on Accessibility to Digital World (ICADW), pages 21{26, Dec 2016. [90] Moustafa Alzantot, Yingnan Wang, Zhengshuang Ren, and Mani B. Srivastava. Rstensor ow: GPU enabled tensor ow for deep learning on commodity android devices. In Proceedings of the 1st International Workshop on Embedded and Mobile Deep Learning (Deep Learning for Mobile Systems and Applications), EMDL@MobiSys 2017, Niagara Falls, NY, USA, June 23, 2017, pages 7{12. ACM, 2017. [91] H. Ramchoun, M. A. Janati Idrissi, Y. Ghanou, and M. Ettaouil. Multilayer perceptron: Architecture optimization and training with mixed activation functions. In Proceedings of the 2Nd International Conference on Big Data, Cloud and Applications, BDCA'17, pages 71:1{71:6, New York, NY, USA, 2017. ACM. [92] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278{2324, Nov 1998. [93] J. R. Diamond, D. S. Fussell, and S. W. Keckler. Arbitrary modulus indexing. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 140{152, Dec 2014. [94] Yaming Yin, Shuming Chen, and Xiao Hu. Input buffer planning for network-on-chip router design. In 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), volume 13, pages V13{201{V13{204, 2010. [95] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1{7, August 2011. [96] Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu. Understanding the interactions of workloads and DRAM types: A comprehensive experimental study. CoRR, abs/1902.07609, 2019. [97] Bakiri Mohammed, Christophe Guyeux, Jean-Francois Couchot, and Abdelkrim Oud-jida. Survey on hardware implementation of random number generators on fpga: Theory and experimental analyses. Computer Science Review, 27:135{153, 02 2018. [98] Himanshu Bhatnagar. Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and PrimeTime. Kluwer Academic Publishers, Norwell, MA, USA, 2nd edition, 2002. [99] TSMC. 0.13-micron technology | taiwan semiconductor manufacturing company. https://www.tsmc.com/english/dedicatedFoundry/technology/0.13um. htm, 2019. Accessed: 2019-06-12. [100] Shrut Patel. Cache-implementation. https://github.com/shrut1996/ Cache-Implementation, 2017. [101] Millind Mittal. Computer system and method of allocating cache memories in a multilevel cache hierarchy utilizing a locality hint within an instruction, October 27 1998. US Patent 5,829,025. [102] DavidWonnacott. Achieving scalable locality with time skewing. International Journal of Parallel Programming, 30:2002, 1999. [103] Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, and Ibrahim Hur. Near-side prefetch throttling: Adaptive prefetching for high-performance many-core processors. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, PACT '18, pages 28:1{28:11, New York, NY, USA, 2018. ACM. [104] ARM. Arm information center. http://infocenter.arm.com/help/index.jsp, 2019. Accessed: 2019-05-23. [105] Andes. Andes technology corporation. http://www.andestech.com/en/homepage/, 2019. Accessed: 2019-06-13. [106] RISC-V International Archive. Risc v cores and socs. https://github.com/riscv/ riscv-wiki/wiki/RISC-V-Cores-and-SoCs, 2019. Accessed: 2019-05-23. [107] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In 27th USENIX Security Symposium (USENIX Security 18), 2018. [108] Paul Kocher, Jann Horn, Anders Fogh, , Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In 40th IEEE Symposium on Security and Privacy (S&P'19), 2019. |