Author: Kuo, Yu-Lun
Thesis title: Research and Implementation of Retrieval-based FAQ Question Answering System Using q-Q Semantic Similarity Matching-application for Software Technical Support FAQ
Advisor: Yeh, Wei-Chang
Oral defense committee: Liang, Yun-Chia; Lai, Chyh-Ming
Keywords: Frequently Asked Questions; Question and Answering; Sentence Bidirectional Encoder Representations from Transformers; Semantic Similarity
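Users of a company's Frequently Asked Questions (FAQ) have typically been limited to keyword search, and must then go through the many keyword matches one by one to find an entry that fits their intent, which makes searching inefficient. This study aims to improve FAQ retrieval and matching by replacing traditional keyword search with question answering, a query style that describes the user's intent more completely. The system retrieves and filters the FAQ entries most similar in meaning to the user's question and returns them, improving the efficiency with which users can resolve problems on their own through the FAQ service.
Taking a software development FAQ as an example, this study builds an FAQ question answering system based on semantic similarity matching for a domain-specific, cross-lingual FAQ knowledge base that mixes English, Traditional Chinese, and Simplified Chinese. Building such a system has previously required expertise in natural language processing and training on massive language corpora to obtain useful results. This system instead focuses on saving the time needed to collect and train on large amounts of data, so that an FAQ question answering system can be deployed quickly. The questions in the company website's FAQ are converted into feature vectors that carry semantic information, and the vector distance between each of them and the user's question serves as the basis for judging semantic similarity. Besides implementing the system with the Sentence Bidirectional Encoder Representations from Transformers (SBERT) model, this study establishes similar-question expansion rules as an optimization: during question answering, suitable user questions and their answers are captured and added to the retrieval database as similar phrasings of the standard questions for use in the next round of retrieval, continually enriching the variety of ways each question is expressed.
The results show that, over 91 rounds of actual question answering tests in the case study, the proposed system raised answer accuracy from 89.01% to 94.51% compared with using the SBERT model alone, and Mean Reciprocal Rank (MRR) also improved by 5.3%.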
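The retrieval and expansion steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: `toy_embed` is a hypothetical bag-of-words stand-in for an SBERT encoder, `VOCAB`, `FaqIndex`, and the FAQ ids are invented for the example, and cosine similarity is assumed as the vector comparison (the abstract only says "vector distance").

```python
import math

# Hypothetical toy vocabulary; a real system would use SBERT sentence embeddings.
VOCAB = ["install", "license", "error", "update", "key", "crash"]


def toy_embed(text):
    """Stand-in encoder: counts vocabulary words (NOT an SBERT model)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class FaqIndex:
    """Stores one or more question phrasings per FAQ entry as vectors."""

    def __init__(self, embed):
        self.embed = embed
        self.entries = []  # list of (vector, faq_id)

    def add(self, faq_id, question):
        self.entries.append((self.embed(question), faq_id))

    def search(self, query, top_k=3):
        """Rank FAQ entries by their best-matching phrasing."""
        q = self.embed(query)
        best = {}
        for vec, faq_id in self.entries:
            score = cosine(q, vec)
            if faq_id not in best or score > best[faq_id]:
                best[faq_id] = score
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]

    def expand(self, faq_id, user_question):
        """Similar-question expansion: keep a confirmed user phrasing for later rounds."""
        self.add(faq_id, user_question)


index = FaqIndex(toy_embed)
index.add("faq-install", "how to install the software")
index.add("faq-license", "where to enter the license key")
print(index.search("license key invalid"))

# After a user confirms that "crash when i install" was answered by faq-install,
# the phrasing is stored so that later, similar questions can match it.
index.expand("faq-install", "crash when i install")
print(index.search("crash on launch"))
```

Keeping several phrasings per FAQ entry and scoring an entry by its best phrasing is one simple way to let expansion improve recall without retraining the encoder; how the thesis decides which user questions are "suitable" for expansion is not stated here.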
In the past, users searching a company's frequently asked questions (FAQ) had to rely on keyword search and then go through the many results matching the keyword one by one. This is inefficient, and users may end up turning to a live customer service center, which adds to the company's staffing load. This research aims to improve this inefficient search method. By improving how FAQ entries are retrieved and matched, it replaces keyword search with question answering, which describes the user's intent more comprehensively.
This study takes a cross-lingual software development FAQ as an example and builds a retrieval-based question answering system that uses semantic similarity matching over an FAQ database with mixed English, Traditional Chinese, and Simplified Chinese descriptions. In the past, building such a system required familiarity with natural language processing and training on a massive corpus to obtain effective results. This system instead focuses on saving the time spent on large-scale data collection and training, so that the FAQ question answering system can be deployed quickly. We not only implement an FAQ question answering system based on semantic similarity matching but also establish question expansion rules as an optimization, continuously enriching the diversity of questions in the retrieval database. The results show that, compared with using the Sentence Bidirectional Encoder Representations from Transformers (SBERT) model alone, the proposed system improved accuracy by 5.5 percentage points and mean reciprocal rank (MRR) by 5.3%.
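For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and MRR are commonly computed from the rank at which the correct FAQ entry appears in each test round. It assumes accuracy means the correct entry is ranked first, which the abstract does not state, and the function names and sample ranks are invented for illustration. The counts 81 and 86 in the last lines are inferred, not reported: they are the only whole numbers of correct answers out of 91 rounds that round to 89.01% and 94.51%.

```python
def accuracy_at_1(ranks):
    """Fraction of rounds where the correct FAQ entry was ranked first.

    ranks: 1-based rank of the correct entry per round, or None if not retrieved.
    """
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all rounds; a round with no correct hit contributes 0."""
    return sum((1.0 / r) if r else 0.0 for r in ranks) / len(ranks)


# Invented sample: four rounds, correct entry at rank 1, 2, not found, and 1.
sample_ranks = [1, 2, None, 1]
print(accuracy_at_1(sample_ranks))
print(mean_reciprocal_rank(sample_ranks))

# Inferred counts (an assumption): 81/91 and 86/91 match the reported percentages.
print(round(81 / 91 * 100, 2), round(86 / 91 * 100, 2))
```

Under that reading, the gain corresponds to five more questions answered correctly out of 91.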
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Structure
Chapter 2. Literature Review
2.1 Overview of FAQ Retrieval Modes
2.2 Text Similarity
2.3 The BERT Model
2.4 The SBERT Model
2.5 Summary of the Literature Review
Chapter 3. Research Method
3.1 Dataset
3.2 System Design
3.3 System Processing Procedure
3.3.1 Flow1: SBERT-based Similar q-Q Retrieval Flow
3.3.2 Flow2: Similar-Question Expansion Flow for the Retrieval Database
Chapter 4. Results and Analysis
Chapter 5. Conclusion
5.1 Research Contributions
5.2 Future Work
References

