
Detailed Record

Author (Chinese): 陳旭晨
Author (English): Chen, Hsu-Chen
Title (Chinese): 利用預測性報酬來學習資訊檢索對話的產生
Title (English): Learning To Generate Information Retrieval Dialogues Using Prediction Rewards
Advisor (Chinese): 蘇豐文
Advisor (English): Soo, Von-Wun
Committee members (Chinese): 石維寬、陳煥宗
Committee members (English): Shih, Wei-Kuan; Chen, Hwann-Tzong
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 101062514
Publication year (ROC calendar): 103 (2014)
Graduation academic year: 102
Language: English
Pages: 30
Keywords (Chinese): 資訊檢索對話、增強式學習、報酬、預測
Keywords (English): Information retrieval dialogues; Reinforcement learning; Reward; Prediction
Reinforcement learning has been applied to dialogue systems for many years, but most of the literature focuses on developing new algorithms that improve efficiency and accuracy. A few papers discuss how a reward schema affects learning performance, yet none of them designs a way to assign rewards automatically. In reinforcement learning, designing a good reward schema is not a simple task. In some problem domains there is no ready-made reward schema that can be plugged directly into reinforcement learning; the designer must painstakingly devise an ideal reward schema before the system can learn the optimal dialogue policy.
In a restaurant information-retrieval dialogue system, we want to train a system that quickly identifies the user's intent and takes the appropriate dialogue actions. We build a prediction model to achieve this goal. The idea behind the prediction is to exploit data dependency, computing probabilistically how much information a dialogue action can prune from the retrieval and how effectively it can locate the needed information. In this thesis we propose a new method that automates the reward schema based on this prediction model.

Reinforcement learning has been used in dialogue systems for several years. However, most research focuses on improving efficiency and accuracy by developing new learning algorithms. Some papers discuss learning performance with respect to a reward schema, but do not address how rewards can be designed automatically in a more general way. In fact, designing a reward schema for reinforcement learning is not a trivial task. Some problem domains do not come with specific costs and rewards, so the designers must conceive a reward schema that provides proper rewards and costs during the problem-solving process in order to guide the learning system to acquire the optimal policy for actions.
In our information retrieval dialogue system for restaurants, we wish to train the system to generate proper dialogues that identify users' true intentions quickly and respond with the correct information that users want. We propose a prediction model of user preference to help with this. The idea of prediction is based on data dependency: how much a dialogue action may reduce the quantity of information under retrieval, and how likely it is, in terms of probability, to find the desired information. In this thesis, we demonstrate a new method for designing a reward schema, based on this prediction model, that automatically provides rewards for reinforcement learning.
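The data-dependency idea in the abstract — rewarding a dialogue action by how much it is predicted to narrow the retrieval — can be illustrated with a toy sketch. The restaurant table, attribute names, and the exact reward definition (expected entropy reduction over the candidate set) below are assumptions for illustration, not necessarily the thesis's actual model; the Q-learning update is the standard one from Watkins and Dayan:

```python
import math
from collections import Counter

# Hypothetical toy restaurant database: each row maps attributes to values.
RESTAURANTS = [
    {"cuisine": "thai", "price": "low"},
    {"cuisine": "thai", "price": "high"},
    {"cuisine": "italian", "price": "low"},
    {"cuisine": "italian", "price": "high"},
    {"cuisine": "japanese", "price": "low"},
]

def entropy(candidates):
    """Shannon entropy (bits) of a uniform distribution over the candidates."""
    n = len(candidates)
    return math.log2(n) if n > 0 else 0.0

def expected_reduction(candidates, attribute):
    """Predicted entropy drop from asking about `attribute`, averaging over
    the user's possible answers weighted by their frequency in the data."""
    counts = Counter(r[attribute] for r in candidates)
    total = len(candidates)
    expected_after = sum(
        (c / total) * math.log2(c) for c in counts.values() if c > 0
    )
    return entropy(candidates) - expected_after

# The reward for a dialogue action is its predicted information reduction,
# so no hand-tuned reward table is needed.
reward = expected_reduction(RESTAURANTS, "cuisine")

# Standard Q-learning update using the predicted reward.
ALPHA, GAMMA = 0.1, 0.9
Q = {}  # (state, action) -> value

def q_update(s, a, r, s_next, next_actions):
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)
```

Under this reading, asking about cuisine earns a larger reward than asking about price on the toy data, because cuisine splits the five candidates more evenly and thus prunes more of the retrieval on average.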
Abstract (Chinese)
Abstract (English)
1 Introduction
1.1 Background
1.2 Motivation
1.3 Related work
1.4 Goals of research
2 A restaurant information retrieval dialogue system
2.1 System architecture
2.2 Database contents
2.3 System states and actions
2.4 Q-learning algorithm
3 A prediction model for the reward schema
3.1 A prediction model
3.2 Algorithm for constructing a prediction model
3.3 The rationales in giving rewards
3.4 A reward schema
4 Evaluation
4.1 Experiments
4.2 Experiment 1
4.3 Experiment 2
4.4 Experiment 3
5 Conclusion and Future work
5.1 Conclusion
5.2 Future work
References
Levin, Esther, Roberto Pieraccini, and Wieland Eckert. "Using Markov decision process for learning dialogue strategies." Acoustics, Speech and Signal Processing (1998)

Jurčíček, Filip, et al. "Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems." INTERSPEECH (2010)

Bhatnagar, Shalabh, Richard S. Sutton, Mohammad Ghavamzadeh, and Mark Lee. "Natural actor–critic algorithms." Automatica 45.11 (2009)

Watkins, Christopher J. C. H., and Peter Dayan. "Q-learning." Machine Learning 8.3-4 (1992)

Janssen, Christian P., and Wayne D. Gray. "When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition." Cognitive Science 36.2 (2012)

Jurčíček, Filip, Blaise Thomson, and Steve Young. "Reinforcement learning for parameter estimation in statistical spoken dialogue systems." Computer Speech & Language 26.3 (2012)

Syed, Umar Ali. "Reinforcement learning without rewards." Princeton University (2010)

Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." Proceedings of the Twenty-First International Conference on Machine Learning. ACM (2004)

Jurčíček, Filip, Blaise Thomson, and Steve Young. "Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs." ACM Transactions on Speech and Language Processing (TSLP) 7.3 (2011)

Chen, Guan-Yi, Edward Chao-Chun Kao, and Von-Wun Soo. "Learning Interrogation Strategies while Considering Deceptions in Detective Interactive Stories." AIIDE (2013)

Arora, Sanjeev, Elad Hazan, and Satyen Kale. "The Multiplicative Weights Update Method: a Meta-Algorithm and Applications." Theory of Computing 8.1 (2012)