Author: Chen, Hsu-Chen
Thesis title (Chinese): 利用預測性報酬來學習資訊檢索對話的產生
Thesis title (English): Learning To Generate Information Retrieval Dialogues Using Prediction Rewards
Advisor: Soo, Von-Wun
Committee members: Shih, Wei-Kuan; Chen, Hwann-Tzong
Keywords: Information retrieval dialogues; Reinforcement learning; Reward; Prediction

Reinforcement learning has been used in dialogue systems for several years. However, most research focuses on improving efficiency and accuracy by developing new learning algorithms. Some papers discuss learning performance with respect to a reward schema, but do not address how rewards can be designed automatically in a more general way. In fact, designing a reward schema for reinforcement learning is not a trivial task. Some problem domains have no inherent costs and rewards, so designers must devise a reward schema that provides appropriate rewards and costs during the problem-solving process in order to guide the learning system toward the optimal action policy.
In our information retrieval dialogue system for restaurants, we wish to train the system to generate dialogues that quickly identify users’ true intentions and respond with the correct information that users want. To this end, we propose a model that predicts users’ preferences. The prediction is based on data dependency: how much a dialogue action may reduce the quantity of information under retrieval, and how likely, in terms of probability, the action is to be effective in finding the information. In this paper, we demonstrate a new method for designing a reward schema, based on this prediction model, that can automatically provide rewards for reinforcement learning.
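
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formula, so the function `prediction_reward` below is a hypothetical stand-in: it assumes the reward is the probability that an action is effective multiplied by the fraction of candidate records it eliminates. The state and action names, the candidate counts, and the learning parameters are likewise illustrative assumptions, not values from the thesis. The update itself is the standard tabular Q-learning rule.

```python
from collections import defaultdict


def prediction_reward(n_before, n_after, p_effective):
    """Hypothetical reward (an assumption, not the thesis's formula):
    probability the action is effective times the fraction of
    candidate records it eliminates."""
    if n_before <= 0:
        return 0.0
    reduction = (n_before - n_after) / n_before
    return p_effective * reduction


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Illustrative dialogue actions and states (hypothetical names).
actions = ["ask_cuisine", "ask_price", "recommend"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Asking about cuisine is assumed to narrow 40 candidates to 10
# and to be effective with probability 0.8.
r = prediction_reward(n_before=40, n_after=10, p_effective=0.8)
new_q = q_update(q, "start", "ask_cuisine", r, "cuisine_known", actions)
print(r, new_q)
```

Under these assumptions, an action that narrows the candidate set more, or is more likely to be effective, receives a larger reward automatically, without a hand-written reward table.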
1 Introduction
1.1 Background
1.2 Motivation
1.3 Related work
1.4 Goals of research
2 A restaurant information retrieval dialogue system
2.1 System architecture
2.2 Database contents
2.3 System states and actions
2.4 Q-learning algorithm
3 A prediction model for the reward schema
3.1 A prediction model
3.2 Algorithm for constructing a prediction model
3.3 The rationales in giving rewards
3.4 A Reward schema
4 Evaluation
4.1 Experiments
4.2 Experiment 1
4.3 Experiment 2
4.4 Experiment 3
5 Conclusion and Future work
5.1 Conclusion
5.2 Future work