[1] International Technology Roadmap for Semiconductors (ITRS) 2.0 Executive Report, 2015. https://goo.gl/HVUwRj.
[2] B. O'Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih. Combining policy gradient and Q-learning. In International Conference on Learning Representations, ICLR '17, 2017.
[3] Y.-G. Chen, W.-Y. Wen, T. Wang, Y. Shi, and S.-C. Chang. Q-learning based dynamic voltage scaling for designs with graceful degradation. In Proceedings of the 2015 International Symposium on Physical Design (ISPD), 2015.
[4] H. Hantao, P. D. S. Manoj, D. Xu, H. Yu, and Z. Hao. Reinforcement learning based self-adaptive voltage-swing adjustment of 2.5D I/Os for many-core microprocessor and memory communication. In Proceedings of the 2014 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '14, pages 224–229, 2014.
[5] D.-C. Juan and D. Marculescu. Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors. In 2012 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), pages 97–102, 2012.
[6] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[7] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 661–670. ACM, 2010.
[8] Y. Li. Deep reinforcement learning: An overview. CoRR, abs/1701.07274, 2017.
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[10] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[11] O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans. Bridging the gap between value and policy based reinforcement learning. In Advances in Neural Information Processing Systems, 2017.
[12] I. Osband, C. Blundell, A. Pritzel, and B. Van Roy. Deep exploration via bootstrapped DQN. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4026–4034. Curran Associates, Inc., 2016.
[13] J. Peters and S. Schaal. Policy gradient methods for robotics. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2219–2225, 2006.
[14] S. Roy, M. Choudhury, R. Puri, and D. Z. Pan. Towards optimal performance-area trade-off in adders by synthesis of parallel prefix structures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 33(10):1517–1530, 2014.
[15] H. Shen, J. Lu, and Q. Qiu. Learning based DVFS for simultaneous temperature, performance and energy management. In Thirteenth International Symposium on Quality Electronic Design (ISQED), pages 747–754, 2012.
[16] B. C. Stadie, S. Levine, and P. Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. CoRR, abs/1507.00814, 2015.
[17] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, MA, 1998.
[18] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS '99, pages 1057–1063. MIT Press, 1999.
[19] Synopsys. Design Compiler User Guide, 2010. http://eclass.uth.gr/eclass/modules/document/index.php?course=MHX303&download=/5346dc69nktr/5346dc86FWh3.pdf.
[20] H. Tang, R. Houthooft, D. Foote, A. Stooke, O. X. Chen, Y. Duan, J. Schulman, F. De Turck, and P. Abbeel. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 2750–2759, 2017.
[21] K. Tanigawa and T. Hironaka. Design consideration for reconfigurable processor DS-HIE: trade-off between performance and chip area. In SoC Design Conference (ISOCC), 2011 International, pages 187–190. IEEE, 2011.
[22] G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI '15, pages 1806–1812. AAAI Press, 2015.
[23] D. Xu, N. Yu, H. Huang, P. D. S. Manoj, and H. Yu. Q-learning-based voltage-swing tuning and compensation for 2.5-D memory-logic integration. IEEE Design & Test, 35(2):91–99, April 2018.
[24] R. Ye and Q. Xu. Learning-based power management for multi-core processors via idle period manipulation. In Asia and South Pacific Design Automation Conference (ASP-DAC), pages 115–120, 2012.
[25] D. Zeng, K. Liu, S. Lai, G. Zhou, and J. Zhao. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2335–2344, 2014.
[26] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.