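Author (Chinese): 郭于綸
Author (English): Kuo, Yu-Lun
Thesis title (Chinese): 基於語義相似度匹配的FAQ問答系統實作研究-以軟體技術支援常見問答集為例
Thesis title (English): Research and Implementation of Retrieval-based FAQ Question Answering System Using q-Q Semantic Similarity Matching-application for Software Technical Support FAQ
Advisor (Chinese): 葉維彰
Advisor (English): Yeh, Wei-Chang
Committee members (Chinese): 梁韵嘉; 賴智明
Committee members (English): Liang, Yun-Chia; Lai, Chyh-Ming
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: In-service Master's Program, Department of Industrial Engineering and Engineering Management
Student ID: 109036501
Year of publication (ROC calendar): 111 (2022)
Academic year of graduation: 110
Language: Chinese
Number of pages: 36
Keywords (Chinese): 常見問答集 (frequently asked questions); 問答式查詢 (question-answering query); 基於變換器的語句雙向編碼器表示技術 (Sentence Bidirectional Encoder Representations from Transformers); 語義相似度 (semantic similarity)
Keywords (English): Frequently Asked Questions; Question and Answering; Sentence Bidirectional Encoder Representations from Transformers; Semantic Similarity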
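  Users of enterprise frequently asked questions (FAQ) collections have typically been limited to keyword search, and must then check the many keyword-matched results one by one for an entry that fits the intent of their question, which makes searching inefficient. This study aims to improve FAQ retrieval and matching by replacing the traditional keyword search mode with question-answering queries (Question Answering), which describe the user's intent more completely, and by quickly retrieving and filtering the FAQ entries with the highest semantic similarity to the user's question, so that users can solve problems through FAQ self-service more efficiently and more effectively.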
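  Taking a software technology development FAQ as its case, this study builds a semantic-similarity-matching FAQ question answering system for a domain-specific, cross-lingual FAQ knowledge base that mixes English, Traditional Chinese, and Simplified Chinese. Building such a system has previously required expertise in natural language processing and training on massive language corpora to obtain effective results. This system focuses instead on saving the time needed for large-scale data collection and training, so that an FAQ question answering system can be deployed quickly: the questions in the FAQ collection on the company website are converted into feature vectors that carry semantic information, and the vector distance between each of them and the user's question serves as the basis for judging semantic similarity. In addition to implementing the system with the Sentence Bidirectional Encoder Representations from Transformers (SBERT) model, the study establishes similar-question expansion rules as an optimization: during question answering, suitable user questions and answers are captured as similar-question expansions of the standard questions in the retrieval database for use in the next round of retrieval, continually enriching the variety of phrasings for the same question.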
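The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the three-dimensional vectors are made-up stand-ins for SBERT sentence embeddings, the names `cosine_similarity` and `rank_faq` and the FAQ ids are hypothetical, and cosine similarity is assumed as the measure because the abstract only speaks of vector distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_faq(query_vec: Vector, faq_vecs: Dict[str, Vector],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored FAQ question against the query and keep the best top_k."""
    scored = [(faq_id, cosine_similarity(query_vec, vec))
              for faq_id, vec in faq_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: in a real system these would come from a sentence
# encoder such as SBERT, computed once for each FAQ question and stored.
faq_vecs = {
    "faq_install": [0.9, 0.1, 0.0],
    "faq_license": [0.1, 0.9, 0.1],
    "faq_crash": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedded user question

results = rank_faq(query_vec, faq_vecs, top_k=2)
for faq_id, score in results:
    print(f"{faq_id}: {score:.3f}")
```

Because the FAQ vectors are computed ahead of time, only the user's question has to be encoded at query time, which fits the abstract's emphasis on quick deployment without further training.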
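  The results show that, over 91 rounds of actual question-answering tests in the case study, the proposed system raised answer accuracy from 89.01% to 94.51% compared with using the SBERT model alone, and mean reciprocal rank (MRR) also improved by 5.3%.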
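The two reported metrics can be computed as in the sketch below. It assumes that accuracy means the share of rounds whose top-ranked answer was correct and that MRR averages the reciprocal of the rank of the first correct answer, counting 0 when none was returned. The counts 81 and 86 are an inference from the reported percentages over 91 rounds, not figures stated in the abstract, and the rank list is a toy example.

```python
from typing import Optional, Sequence


def accuracy(correct: int, total: int) -> float:
    """Fraction of rounds answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all rounds.

    Each entry is the 1-based rank of the first correct answer in that round,
    or None when no correct answer was returned (contributes 0).
    """
    if not first_correct_ranks:
        raise ValueError("at least one round is required")
    total = 0.0
    for rank in first_correct_ranks:
        if rank is None:
            continue
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_correct_ranks)


# Inferred counts (assumption): 81/91 and 86/91 reproduce the reported figures.
baseline_pct = round(100 * accuracy(81, 91), 2)
proposed_pct = round(100 * accuracy(86, 91), 2)
print(baseline_pct, proposed_pct)

# Toy ranks for four rounds, not data from the thesis.
toy_mrr = mean_reciprocal_rank([1, 2, None, 4])
print(toy_mrr)
```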
Previously, users searching a company's frequently asked questions (FAQ) had to rely on keyword search and then go through the many keyword-matched results one by one. This is inefficient, and users may end up turning to a live customer service center, which adds to the company's staffing load. This research aims to improve that search method. By improving how FAQ entries are retrieved and matched, it replaces keyword search with question answering, which describes the user's intention more completely.
This study takes a cross-language software technology development FAQ as an example and builds a retrieval-based question answering system that uses semantic similarity matching over an FAQ database with mixed English, Traditional Chinese, and Simplified Chinese descriptions. Previously, building such a system required familiarity with natural language processing and a massive training corpus to obtain effective results. This system focuses on saving the time spent on large-scale data collection and training, so that the FAQ question answering system can be deployed quickly. We not only implement an FAQ question answering system that uses semantic similarity matching but also establish question expansion rules as an optimization, continuously enriching the diversity of questions in the retrieval database. The results show that, compared with using the Sentence Bidirectional Encoder Representations from Transformers (SBERT) model alone, the proposed system improved accuracy by 5.5 percentage points and mean reciprocal rank (MRR) by 5.3%.
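The question expansion idea can be illustrated with a small sketch. The abstract does not spell out the actual expansion rules, so everything here is an assumption made for illustration: the function name `expand_similar_questions`, the `confirmed` flag standing for "the returned answer was judged suitable", the duplicate check, and the `max_variants` cap are all hypothetical.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial duplicates are detected."""
    return " ".join(text.split()).casefold()


def expand_similar_questions(index: Dict[str, List[str]], faq_id: str,
                             user_question: str, confirmed: bool,
                             max_variants: int = 20) -> bool:
    """Store user_question as a new phrasing of faq_id; return True if added.

    Hypothetical rule: add only when the answer was confirmed, the FAQ entry
    exists, the phrasing is new, and the entry is below max_variants.
    """
    if not confirmed or faq_id not in index:
        return False
    variants = index[faq_id]
    if len(variants) >= max_variants:
        return False
    key = normalize(user_question)
    if not key or key in {normalize(v) for v in variants}:
        return False
    variants.append(user_question.strip())
    return True


# Retrieval database: each standard question id maps to its known phrasings.
index = {"faq_install": ["How do I install the SDK?"]}

added_first = expand_similar_questions(index, "faq_install", "SDK 安裝失敗怎麼辦", True)
added_again = expand_similar_questions(index, "faq_install", "  sdk 安裝失敗怎麼辦 ", True)
added_unconfirmed = expand_similar_questions(index, "faq_install", "license error", False)
print(index["faq_install"])
```

In the next round of retrieval, each stored phrasing would be embedded and matched like the standard question, so a later query worded like an earlier user's question has a closer neighbor to match against.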
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Structure
Chapter 2. Literature Review
2.1 Introduction to FAQ Retrieval Modes
2.2 Text Similarity
2.3 BERT Model
2.4 SBERT Model
2.5 Summary of the Literature Review
Chapter 3. Research Method
3.1 Dataset
3.2 System Design
3.3 System Processing Procedure
3.3.1 Flow1: SBERT-based similar q-Q retrieval flow
3.3.2 Flow2: Similar-question expansion flow for the retrieval database
Chapter 4. Results and Analysis
Chapter 5. Conclusion
5.1 Research Contributions
5.2 Future Work
References

(The full text will be open for external viewing after 2027-07-06.)