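Author (Chinese): 徐浩軒
Author (English): Hsu, Hao-Hsuan
Title (Chinese): 小十:基於機器學習與自然語言處理的智慧答題機器人
Title (English): Xiao-Shih: An Intelligent Question Answering Bot With Machine Learning and Natural Language Processing
Advisor (Chinese): 黃能富
Advisor (English): Huang, Nen-Fu
Oral Defense Committee (Chinese): 許健平; 韓永楷; 陳俊良; 張耀中; 張宏義
Oral Defense Committee (English): Sheu, Jang-Ping; Hon, Wing-Kai; Chen, Jiann-Liang; Chang, Yao-Chung; Chang, Hong-Yi
Degree: Doctoral (Ph.D.)
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 101065503
Publication Year: 112 (ROC calendar; 2023)
Graduation Academic Year: 111 (ROC calendar)
Language: English
Number of Pages: 79
Keywords (Chinese): 磨課師; 答題機器人; 自然語言處理; 機器學習; 尋找重複問題; 迷思概念發掘
Keywords (English): MOOCs; question answering bot; natural language processing; machine learning; finding duplicate questions; misconception discovery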
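To reduce the delay and burden that instructors and teaching assistants face in answering the large volume of learner questions in Massive Open Online Courses (MOOCs), and to help learners solve their problems in real time, this dissertation uses forum data from two online courses, “Python for Data Science” and “Introduction to Computer Networks,” to propose natural language processing (NLP)-based methods for misconception discovery and for finding duplicate questions. The latter method is named spreading question similarity (SQS). The dissertation further applies machine learning (ML) to build a question answering bot, Xiao-Shih (小十). In Xiao-Shih's subtask of finding duplicate questions, the main challenge is identifying duplicate questions that are worded differently. This dissertation therefore incorporates similar words into the similarity computation and implements a keyword network generator (KN-generator), which uses a word2vec model to predict word similarity and automatically builds a keyword network. SQS then iteratively locates all keywords of a question in this keyword network and includes their similar words in the computation. Experiments show that the recall of the proposed SQS algorithm is substantially higher than that of Google BERT, an advanced neural language model. Trained on features including SQS and BERT, Xiao-Shih achieved near-perfect correct rates of 1.0 and 0.979 on the test data of the two experimental courses, respectively. In addition, the proposed self-enrichment design lets Xiao-Shih expand its knowledge base by acquiring an external question-answering knowledge base, which was shown to increase its answer rate. Ultimately, Xiao-Shih has become the state-of-the-art question answering bot in the field of MOOCs.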
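The keyword-spreading idea behind SQS can be illustrated with a minimal sketch. It assumes that word2vec similarity scores are already available as (word, word, score) triples, that an edge joins two words whose score reaches a hypothetical cut-off of 0.7, that a question's keywords are simply its (already tokenized) words that appear in the network, and that the expanded keyword sets of two questions are compared with Jaccard similarity. These choices, the function names, and the toy data are illustrative assumptions, not the dissertation's actual SQS implementation.

```python
from collections import deque


def build_keyword_network(similar_pairs, threshold=0.7):
    """Build an undirected keyword network.

    `similar_pairs` holds (word_a, word_b, score) triples. In the dissertation the
    scores come from a word2vec model; here they are a hypothetical stand-in, and
    the 0.7 cut-off is an assumed value.
    """
    network = {}
    for a, b, score in similar_pairs:
        if score >= threshold:
            network.setdefault(a, set()).add(b)
            network.setdefault(b, set()).add(a)
    return network


def spread(tokens, network, max_hops=1):
    """Expand a question's keywords with similar words reachable within max_hops.

    Assumption: a token counts as a keyword only if it is a node in the network.
    """
    expanded = {t for t in tokens if t in network}
    queue = deque((word, 0) for word in expanded)
    while queue:  # breadth-first search bounded by max_hops
        word, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in network[word]:
            if neighbour not in expanded:
                expanded.add(neighbour)
                queue.append((neighbour, hops + 1))
    return expanded


def spreading_similarity(q1_tokens, q2_tokens, network, max_hops=1):
    """Jaccard similarity of the two expanded keyword sets (assumed scoring)."""
    a = spread(q1_tokens, network, max_hops)
    b = spread(q2_tokens, network, max_hops)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    net = build_keyword_network([
        ("list", "array", 0.82),
        ("append", "add", 0.75),
        ("list", "tuple", 0.55),  # below the cut-off, so no edge is created
    ])
    q1 = ["how", "append", "list"]
    q2 = ["add", "element", "array"]
    print(spreading_similarity(q1, q2, net))              # spreading enabled
    print(spreading_similarity(q1, q2, net, max_hops=0))  # keywords only
```

With these toy triples the two questions share no surface words, yet they score 1.0 because “append”/“add” and “list”/“array” are linked in the network; with `max_hops=0` (no spreading) the score falls to 0.0. This mirrors, in a simplified form, why spreading over similar words helps recall on differently worded duplicates.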
This dissertation developed an intelligent question answering (QA) bot named Xiao-Shih to solve learners’ problems in real time and to reduce instructors’ burden of answering numerous questions in massive open online courses (MOOCs). Xiao-Shih has three building blocks: data collection, question retrieval, and answer selection. In data collection, Xiao-Shih integrated internal QA pairs from MOOCs with external QA pairs from a community-based question answering (cQA) site to enrich its knowledge base. In question retrieval, the main challenge is identifying duplicate questions expressed in different words. This dissertation therefore proposed a novel approach named spreading question similarity (SQS), driven by deep learning-based keyword networks. Experiments showed that the recall of SQS is substantially higher than that of Google BERT, an advanced neural language model, and that SQS outperforms BERT in accuracy above a prediction probability threshold of 0.8. In answer selection, Xiao-Shih was trained with machine learning and achieved near-perfect correct rates of 1.0 and 0.979 on two experimental courses, “Python for Data Science” and “Introduction to Computer Networks.” Finally, Xiao-Shih outperforms Jill Watson, a noted QA bot for MOOCs, in answer rate. Xiao-Shih is not only the first QA bot for Chinese-based MOOCs but also the state of the art in the field of MOOCs.
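The answer rate and correct rate mentioned above trade off against each other through a confidence threshold. The sketch below assumes, for illustration only, that the bot answers a question only when its predicted probability reaches the threshold, that answer rate is answered questions divided by all questions, and that correct rate is correct answers divided by answered questions. The `Prediction` type, the function name, and the sample values are hypothetical and are not taken from the dissertation's data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One test question scored by the bot (hypothetical structure)."""
    probability: float  # model confidence that the retrieved answer is correct
    is_correct: bool    # ground-truth label for that answer


def answer_and_correct_rate(predictions, threshold=0.8):
    """Return (answer_rate, correct_rate) under an assumed answering policy.

    The bot answers only when probability >= threshold.
    answer_rate  = answered questions / all questions
    correct_rate = correct answers / answered questions
    """
    if not predictions:
        return 0.0, 0.0
    answered = [p for p in predictions if p.probability >= threshold]
    answer_rate = len(answered) / len(predictions)
    if not answered:
        return answer_rate, 0.0
    correct_rate = sum(p.is_correct for p in answered) / len(answered)
    return answer_rate, correct_rate


if __name__ == "__main__":
    sample = [
        Prediction(0.95, True),
        Prediction(0.85, True),
        Prediction(0.82, False),
        Prediction(0.60, True),
    ]
    for t in (0.8, 0.9):
        ar, cr = answer_and_correct_rate(sample, t)
        print(f"threshold={t}: answer rate={ar:.2f}, correct rate={cr:.2f}")
```

On this made-up sample, raising the threshold from 0.8 to 0.9 lowers the answer rate from 0.75 to 0.25 while raising the correct rate from about 0.67 to 1.0, which is the kind of precision–recall trade-off a deployed QA bot has to tune.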
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Background
1.2. Motivation
1.2.1. Identifying Learners’ Misconceptions
1.2.2. Relieving Instructors’ Efforts in Answering Questions
1.2.3. Solving Learners’ Problems Immediately
Chapter 2 Related Work
2.1. Misconception Discovery and Keyword Network Construction
2.2. Question Answering Systems
2.3. Question Retrieval
2.4. Answer Selection
2.5. Evaluation of Question Answering Bot
Chapter 3 Misconception Discovery
3.1. Data Description
3.2. Misconception Discovery in Forums With NLP
3.2.1. Tokenization
3.2.2. Discovering Keywords of Misconceptions
3.2.3. Filtering out and Visualizing Keywords of Misconceptions
3.2.4. Generating Keyword Networks
Chapter 4 Xiao-Shih 1.0: A QA Bot With ML
4.1. Data Description
4.2. KN-generator
4.3. Question Retrieval With SQS
4.3.1. Implementation of SQS
4.3.2. Evaluation of SQS
4.3.3. Comparisons of BERT and SQS
4.4. Building Xiao-Shih 1.0: Answer Selection With ML
4.4.1. Building Xiao-Shih 0.1 With SQS
4.4.2. Building Xiao-Shih 1.0 With ML
4.4.3. Building Robust Model With K-fold Cross-validation
4.5. Evaluating Xiao-Shih 1.0
4.5.1. Answer Rate and Correct Rate of Xiao-Shih 1.0
4.5.2. Robustness of Xiao-Shih 1.0
4.5.3. Effectiveness of SQS
Chapter 5 Xiao-Shih 2.0: A Self-enriched QA Bot
5.1. System Architecture
5.2. Data Description
5.3. Building Xiao-Shih 2.0
5.3.1. Self-enriched Mechanism
5.3.2. Training and Validating Xiao-Shih 2.0 With Optimization
5.3.3. Feature Importance of Xiao-Shih 2.0
5.3.4. Trade-off Between Precision and Recall
5.4. Evaluating Xiao-Shih 2.0
5.4.1. Final Test
5.4.2. Comparisons With Jill Watson
Chapter 6 Conclusion and Future Works
Bibliography
Appendix A: Publications
Appendix B: Examples of Duplicate Questions