Levin, Esther, Roberto Pieraccini, and Wieland Eckert. "Using Markov decision process for learning dialogue strategies." Acoustics, Speech and Signal Processing (1998)
Jurčíček, Filip, et al. "Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems." INTERSPEECH (2010)
Bhatnagar, Shalabh, Richard S. Sutton, Mohammad Ghavamzadeh, and Mark Lee. "Natural Actor–Critic Algorithms." Automatica 45.11 (2009)
Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992)
Janssen, Christian P., and Wayne D. Gray. "When, What, and How Much to Reward in Reinforcement Learning‐Based Models of Cognition." Cognitive science 36.2 (2012)
Jurčíček, Filip, Blaise Thomson, and Steve Young. "Reinforcement learning for parameter estimation in statistical spoken dialogue systems." Computer Speech & Language 26.3 (2012)
Syed, Umar Ali. "Reinforcement learning without rewards." Princeton University (2010)
Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." Proceedings of the Twenty-First International Conference on Machine Learning. ACM (2004)
Jurčíček, Filip, Blaise Thomson, and Steve Young. "Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs." ACM Transactions on Speech and Language Processing (TSLP) 7.3 (2011)
Chen, Guan-Yi, Edward Chao-Chun Kao, and Von-Wun Soo. "Learning Interrogation Strategies while Considering Deceptions in Detective Interactive Stories." AIIDE (2013)
Arora, Sanjeev, Elad Hazan, and Satyen Kale. "The Multiplicative Weights Update Method: a Meta-Algorithm and Applications." Theory of Computing 8.1 (2012)