
Detailed Record

Author (Chinese): 林憶姍
Author (English): Lin, Yi-Shan
Title (Chinese): 使用基於生成式AI之智慧工程文件摘要系統以達成流程優化之成果
Title (English): Using an Intelligent Engineering Document Summarization System Based on Generative AI to Achieve Process Optimization
Advisor (Chinese): 張瑞芬
Advisor (English): Trappey, Amy J. C.
Committee members (Chinese): 陳冠宇、何佩勳
Committee members (English): Chen, Guan-Yu; He, Pei-Xun
Degree: Master's
University: National Tsing Hua University
Department: Industrial Engineering and Engineering Management
Student ID: 111034602
Publication year (ROC): 113 (2024)
Graduation academic year: 112
Language: Chinese
Number of pages: 75
Keywords (Chinese): 關鍵字提取、自然語言處理、自動化摘要
Keywords (English): Keyword Extraction; Natural Language Processing (NLP); Automated Summarization
Abstract (Chinese):
工程邀標書(Request for Quotation, RFQ)為各式工程進行招標時,發布給具製造能力之潛在投標廠商的文件,內容包含了項目要求、技術規範、合約條款、預算等重要資訊,大型變壓器製造等高度客製化工程產業,經常使用RFQ文件作為進行詢報價的工具。RFQ蘊含之資訊內容繁瑣,有許多不可疏漏之關鍵資訊,且因高度客製化產業之顧客規範的需求複雜度高,需耗費工程技術專業人員許多的心力及時間閱讀並截取重點,以進行成本估算而後產出最佳報價。但招標通常有時間的限制,要在短時間內完成整體報價流程極具難度,且產品設計之報價多是透過專業人員的經驗法則而得,每位人員判斷基準皆有所不同,會產生報價結果不一致的狀況。因此,本研究期望能透過RFQ文件的自動摘要系統,讓報價人員可以更快速地將變壓器關鍵規格表填答完畢,並提出較一致且合理的報價,藉此提高企業的資源使用率及資訊準確性。本研究利用變壓器規格的關鍵字,提取出RFQ全文中含有關鍵字的重要句子,作為RFQ訓練集之摘要,並在經由資料集的驗證後,獲得約2,252筆前案RFQ文件集與規格原文、摘要對照之資料組合作為模型的完整訓練集(Training datasets),並對Transformers架構的PEGASUS自然語言處理模型進行微調(Fine-tune),提升模型產出結果和語法的精準度。接下來,利用規格表填答率與ROUGE摘要品質評估方法進行文本內容及摘要的比對和評估,並經由工程領域專家的驗證及電力變壓器規格表的完整性,來確認摘要結果的正確性和有效性,以組成一產生工程文件專用之自動化摘要生成流程。最後,本研究使用新RFQ進行整體流程的驗證,期望可增加成本評估與報價的流暢度。
Abstract (English):
The Request for Quotation (RFQ) is a document issued to potential manufacturing bidders when engineering projects are opened for bidding. It contains crucial information such as project requirements, technical specifications, contract terms, and budget. In highly customized industries, such as the manufacturing of large power transformers, RFQs are commonly used to solicit price quotations. The information within an RFQ is intricate, containing numerous critical details that must not be overlooked. Because customer specifications in such industries are highly complex, reading RFQs and extracting their key information demand significant effort and time from engineering professionals. Bidding deadlines are often tight, making it difficult to complete the entire quotation process within a short timeframe. Moreover, the pricing of product designs typically relies on the experiential judgment of individual professionals, whose differing assessment criteria lead to inconsistent quotation results. Therefore, this research develops an automatic summarization system for RFQ documents that enables quotation professionals to quickly fill in the key transformer specifications and produce more consistent and reasonable quotations, improving the enterprise's resource utilization and information accuracy. The study extracts the important sentences of each RFQ that contain transformer-specification keywords to serve as its reference summary. After validation of the dataset, approximately 2,252 pairs of previous RFQ documents with their original specifications and corresponding summaries form the complete training set. The PEGASUS natural language processing model, built on the Transformer architecture, is fine-tuned on this set to improve the precision of the model's output and syntax.
The generated summaries and source texts are then compared and evaluated using the specification-sheet completion rate and the ROUGE summary quality evaluation method. Verification by engineering domain experts and confirmation of the completeness of the power transformer specification sheets establish the correctness and effectiveness of the summary results, yielding an automated summarization workflow tailored to engineering documents. Finally, the research validates the entire workflow on a new RFQ, with the aim of streamlining cost estimation and quoting.
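As a rough illustration of the pipeline the abstract describes — keyword-based construction of reference summaries, a compression-rate check, and ROUGE scoring — the following minimal Python sketch may help. The keyword list, toy RFQ text, and helper names are illustrative assumptions, not the thesis code; the actual system fine-tunes PEGASUS on full RFQ documents rather than using these toy functions.

```python
from collections import Counter

def extract_summary(text, keywords):
    """Keep every sentence that mentions at least one specification keyword."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if any(k.lower() in s.lower() for k in keywords)]
    return ". ".join(kept) + ("." if kept else "")

def compression_rate(source, summary):
    """Summary length relative to the source document (lower = more compressed)."""
    return len(summary) / max(len(source), 1)

def rouge1_f1(candidate, reference):
    """Minimal ROUGE-1 F1: clipped unigram overlap (after Lin, 2004)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy RFQ fragment with hypothetical specification keywords.
rfq = ("The transformer shall be rated 161 kV and 60 MVA. "
       "Bidders shall attend a kickoff meeting. "
       "Impedance voltage shall not exceed 12 percent.")
keywords = ["kV", "MVA", "impedance"]
reference = extract_summary(rfq, keywords)
```

A fine-tuned PEGASUS model would then be trained on (RFQ text, reference summary) pairs, and its generated summaries scored against the references with ROUGE and the specification-sheet completion rate.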
Abstract (Chinese) I
Abstract (English) II
List of Figures VII
List of Tables VIII
Chapter 1. Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 1
1.3 Research Framework 2
Chapter 2. Literature Review 3
2.1 Power Transformers 3
2.1.1 Power Transformer Structure and Manufacturing Process 5
2.2 Engineering Requests for Quotation 8
2.3 Natural Language Processing: The Transformer Encoder-Decoder Architecture 8
2.3.1 PEGASUS 10
2.3.2 BERT 10
2.3.3 GPT 10
2.4 Automated Summarization 11
2.4.1 Common Automated Summarization Methods 14
2.5 Summary Quality Evaluation 14
Chapter 3. Methodology 16
3.1 Data Preprocessing 17
3.2 Keyword-Based Training-Set Extraction 17
3.3 Retention Rate and Compression Rate 18
3.4 ROUGE Summary Quality Evaluation Metrics 18
3.5 Application of the PEGASUS Model 20
3.5.1 Model Performance 22
3.5.2 Word Embeddings 23
3.5.3 PEGASUS-X 23
3.5.4 Model Training 24
3.5.5 Relationship and Differences between PEGASUS and Word2Vec 24
3.6 Summary Generation 26
Chapter 4. Case Study and Model Validation 27
4.1 Dataset 27
4.2 Summary Quality Evaluation and Model Validation 30
4.2.1 Specification-Sheet Completion and Compression Rates on the Training Set 33
4.2.2 Specification-Sheet Completion Rates of the Summaries 34
4.2.3 ROUGE Summary Quality Evaluation Results 36
4.2.4 Summary Compression Rate Results 37
4.3 Computer-Aided Results 38
Chapter 5. Conclusion 40
References 41
Appendix A – Sample RFQ Document 47
Appendix B – Paper-Based Transformer Specification Sheet 65
Appendix C – Electronic Transformer Specification Sheet 66
Appendix D – PEGASUS Training-Set Specification Retention and Compression Rates 69
Appendix E – Sample PEGASUS Summarization Results 71
1. Alamry, H., & Alyousef, A. (2016). Electrical power transformer.
2. Arvind, D., Khushdeep, S., & Deepak, K. (2008, April). Condition monitoring of power transformer: A review. In 2008 IEEE/PES Transmission and Distribution Conference and Exposition (pp. 1-6). IEEE.
3. Atabansi, C. C., Nie, J., Liu, H., Song, Q., Yan, L., & Zhou, X. (2023). A survey of Transformer applications for histopathological image analysis: New developments and future directions. BioMedical Engineering OnLine, 22(1), 96.
4. Azuma, D., Ito, N., & Ohta, M. (2020). Recent progress in Fe-based amorphous and nanocrystalline soft magnetic materials. Journal of Magnetism and Magnetic Materials, 501, 166373.
5. Bhavya, K., Rafeeque, P. C., & Murali, R. Automatic Text Summarizing System Using Reinforcement Learning Technique.
6. Chongsuntornsri, A., & Sornil, O. (2006, October). An automatic Thai text summarization using topic sensitive PageRank. In 2006 International Symposium on Communications and Information Technologies (pp. 547-552). IEEE.
7. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
8. Dhivyaa, C. R., Nithya, K., Janani, T., Kumar, K. S., & Prashanth, N. (2022, January). Transliteration based generative pre-trained transformer 2 model for Tamil text summarization. In 2022 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1-6). IEEE.
9. Genest, P. E., & Lapalme, G. (2011, June). Framework for abstractive summarization using text-to-text generation. In Proceedings of the workshop on monolingual text-to-text generation (pp. 64-73).
10. Genest, P. E., & Lapalme, G. (2012, July). Fully abstractive approach to guided summarization. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 354-358).
11. Gupta, M. S. (1980). Georg Simon Ohm and Ohm's law. IEEE Transactions on Education, 23(3), 156-162.
12. Hernández, A., & Amigó, J. M. (2021). Attention mechanisms and their applications to complex systems. Entropy, 23(3), 283.
13. Hovy, E., & Lin, C. Y. (1998, October). Automated text summarization and the SUMMARIST system. In TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998 (pp. 197-214).
14. Ibrahim, K., Sharkawy, R. M., Temraz, H. K., & Salama, M. M. A. (2022). Reliability calculations based on an enhanced transformer life expectancy model. Ain Shams Engineering Journal, 13(4), 101661.
15. Iwendi, C., Ponnan, S., Munirathinam, R., Srinivasan, K., & Chang, C. Y. (2019). An efficient and unique TF/IDF algorithmic model-based data analysis for handling applications with big data streaming. Electronics, 8(11), 1331.
16. Kryściński, W., Keskar, N. S., McCann, B., Xiong, C., & Socher, R. (2019). Neural text summarization: A critical evaluation. arXiv preprint arXiv:1908.08960.
17. Kulkarni, S. V., & Khaparde, S. A. (2004). Transformer engineering: Design and practice (Vol. 25). CRC Press.
18. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188-1196). PMLR.
19. Lee, C. S., Jian, Z. W., & Huang, L. K. (2005). A fuzzy ontology and its application to news summarization. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 35(5), 859-880.
20. Lee, J., Dang, H., Uzuner, O., & Henry, S. (2021, June). MNLP at MEDIQA 2021: fine-tuning Pegasus for consumer health question summarization. In Proceedings of the 20th Workshop on Biomedical Language Processing (pp. 320-327).
21. Liao, R., Liang, S., Sun, C., Yang, L., & Sun, H. (2010). A comparative study of thermal aging of transformer insulation paper impregnated in natural ester and in mineral oil. European Transactions on Electrical Power, 20(4), 518-533.
22. Lin, C. Y. (2004, July). Rouge: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
23. Liu, M., Wang, Z., & Wang, L. (2021, February). Automatic Chinese Text Summarization for Emergency Domain. In Journal of Physics: Conference Series (Vol. 1754, No. 1, p. 012213). IOP Publishing.
24. Liu, W., Yang, C., & Zhou, X. (2018). A network quotation framework for customised parts through rough requests. International Journal of Computer Integrated Manufacturing, 31(12), 1220-1234.
25. Matsumoto, H., Shibako, Y., Shiihara, Y., Nagata, R., & Neba, Y. (2018). Three-phase lines to single-phase coil planar contactless power transformer. IEEE Transactions on Industrial Electronics, 65(4), 2904-2914.
26. Meier, H., Völker, O., & Funke, B. (2011). Industrial product-service systems (IPS 2) Paradigm shift by mutually determined products and services. The International Journal of Advanced Manufacturing Technology, 52, 1175-1191.
27. Mihalcea, R., & Tarau, P. (2004, July). Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404-411).
28. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
29. Mishra, D., Baral, A., & Chakravorti, S. (2023). Reliable Assessment of Oil-Paper Insulation Used in Power Transformer Using Concise Dielectric Response Measurement. IEEE Transactions on Dielectrics and Electrical Insulation.
30. Mridha, M. F., Lima, A. A., Nur, K., Das, S. C., Hasan, M., & Kabir, M. M. (2021). A survey of automatic text summarization: Progress, process and challenges. IEEE Access, 9, 156043-156070.
31. Nair, V., Katariya, N., Amatriain, X., Valmianski, I., & Kannan, A. (2021). Adding more data does not always help: A study in medical conversation summarization with PEGASUS. arXiv preprint arXiv:2111.07564.
32. Nenkova, A., Maskey, S., & Liu, Y. (2011). Automatic summarization. Foundations and Trends® in Information Retrieval, 5(2–3), 103-233.
33. Nenkova, A., Passonneau, R., & McKeown, K. (2007). The pyramid method: Incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing (TSLP), 4(2), 4-es.
34. Phang, J., Zhao, Y., & Liu, P. J. (2022). Investigating efficiently extending transformers for long input summarization. arXiv preprint arXiv:2208.04347.
35. Prasad, C., Kallimani, J. S., Harekal, D., & Sharma, N. (2020, October). Automatic Text Summarization Model using Seq2Seq Technique. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC) (pp. 599-604). IEEE.
36. Qaiser, S., & Ali, R. (2018). Text mining: use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications, 181(1), 25-29.
37. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
38. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
39. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), 5485-5551.
40. Ronan, E. R., Sudhoff, S. D., Glover, S. F., & Galloway, D. L. (2002). A power electronic-based distribution transformer. IEEE Transactions on Power Delivery, 17(2), 537-543.
41. Rothe, S., Narayan, S., & Severyn, A. (2020). Leveraging pre-trained checkpoints for sequence generation tasks. Transactions of the Association for Computational Linguistics, 8, 264-280.
42. Saggion, H., & Poibeau, T. (2013). Automatic text summarization: Past, present and future. Multi-source, multilingual information extraction and summarization, 3-21.
43. Septyani, H. I., Arifianto, I., & Purnomoadi, A. P. (2011, July). High voltage transformer bushing problems. In Proceedings of the 2011 International Conference on Electrical Engineering and Informatics (pp. 1-4). IEEE.
44. Sha, Y., Zhou, Y., Nie, D., Wu, Z., & Deng, J. (2014). A study on electric conduction of transformer oil. IEEE Transactions on Dielectrics and Electrical Insulation, 21(3), 1061-1069.
45. Srividya, K., Bommuluri, S. K., Asapu, V. V. V. K., Illa, T. R., Basa, V. R., & Chatradi, R. V. S. (2022, December). A Hybrid Approach for Automatic Text Summarization and Translation based On Luhn, Pegasus, and Textrank Algorithms. In 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON) (pp. 1-8). IEEE.
46. Su, Y., Xiang, H., Xie, H., Yu, Y., Dong, S., Yang, Z., & Zhao, N. (2020). Application of BERT to enable gene classification based on clinical evidence. BioMed Research International, 2020.
47. Takajo, S., Ito, T., Omura, T., & Okabe, S. (2017). Loss and noise analysis of transformer comprising grooved grain-oriented silicon steel. IEEE Transactions on Magnetics, 53(9), 1-6.
48. Tanaka, H., Kinoshita, A., Kobayakawa, T., Kumano, T., & Kato, N. (2009, August). Syntax-driven sentence revision for broadcast news summarization. In Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+ Sum 2009) (pp. 39-47).
49. Thiviyanathan, V. A., Ker, P. J., Leong, Y. S., Abdullah, F., Ismail, A., & Jamaludin, M. Z. (2022). Power transformer insulation system: A review on the reactions, fault detection, challenges and future prospects. Alexandria Engineering Journal, 61(10), 7697-7713.
50. Trappey, A. J. C., Trappey, C., & Govindarajan, U. H. (2019, October). Knowledge extraction of RfQ engineering documents for smart manufacturing. In 22nd International Conference on Advances in Materials and Processing Technologies (AMPT 2019).
51. Trappey, A. J., Chang, A. C., Trappey, C. V., & Chien, J. Y. C. (2022). Intelligent RFQ summarization using natural language processing, text mining, and machine learning techniques. Journal of Global Information Management (JGIM), 30(1), 1-26.
52. Trappey, A. J., Trappey, C. V., Chao, M. H., & Wu, C. T. (2022). VR-enabled engineering consultation chatbot for integrated and intelligent manufacturing services. Journal of Industrial Information Integration, 26, 100331.
53. Trappey, A. J., Trappey, C. V., Chao, M. H., Hong, N. J., & Wu, C. T. (2021). A vr-enabled chatbot supporting design and manufacturing of large and complex power transformers. Electronics, 11(1), 87.
54. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
55. Verberne, S., Krahmer, E., Wubben, S., & van den Bosch, A. (2020). Query-based summarization of discussion threads. Natural Language Engineering, 26(1), 3-29.
56. Verma, P., & Om, H. (2019). MCRMR: Maximum coverage and relevancy with minimal redundancy based multi-document summarization. Expert Systems with Applications, 120, 43-56.
57. Wan, Z., & Beil, D. R. (2009). RFQ auctions with supplier qualification screening. Operations Research, 57(4), 934-949.
58. Wang, W. C. (2018). 以多語系自然語言理解與機器學習為基之智慧型專利摘要系統 [Intelligent patent summarization system incorporating multiple natural language understanding and machine learning capability]. Master's thesis, Department of Industrial Engineering and Engineering Management, National Tsing Hua University, 1-84.
59. Wei, Y., & Ding, Y. (2023). Application of Text Rank Algorithm Fused with LDA in Information Extraction Model. IEEE Access.
60. Wu, N., Green, B., Ben, X., & O'Banion, S. (2020). Deep transformer models for time series forecasting: The influenza prevalence case. arXiv preprint arXiv:2001.08317.
61. Xu, W., Li, C., Lee, M., & Zhang, C. (2020). Multi-task learning for abstractive text summarization with key information guide network. EURASIP Journal on Advances in Signal Processing, 2020(1), 1-11.
62. Yang, T. H., Lu, C. C., & Hsu, W. L. (2021, November). More than extracting "important" sentences: The application of PEGASUS. In 2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI) (pp. 131-134). IEEE.
63. You, H., Ye, Y., Zhou, T., Zhu, Q., & Du, J. (2023). Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT. arXiv preprint arXiv:2304.11018.
64. Zhang, J., Zhao, Y., Saleh, M., & Liu, P. (2020, November). PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning (pp. 11328-11339). PMLR.
65. Zhang, M., Zhou, G., Yu, W., Huang, N., & Liu, W. (2022). A comprehensive survey of abstractive text summarization based on deep learning. Computational intelligence and neuroscience, 2022.
66. Zhao, F., Li, X., Gao, Y., Li, Y., Feng, Z., & Zhang, C. (2022). Multi-layer features ablation of BERT model and its application in stock trend prediction. Expert Systems with Applications, 207, 117958.
67. Zheng, J., & Fischer, M. (2023). BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval. arXiv preprint arXiv:2304.09333.
68. 余駿. (2006). 本體論為基之智慧型專利文件自動摘要方法論研究 [A novel methodology for automated ontology-based patent document summarization]. Master's thesis, Department of Industrial Engineering and Engineering Management, National Tsing Hua University.
69. 高豪伸. (2005). 應用關鍵辭彙辨識技術與測量重要資訊密度之文件自動摘要系統 [A document summarization system using key-phrase recognition and significant information density]. Master's thesis, Department of Industrial Engineering and Engineering Management, National Tsing Hua University.
70. 張簡宇傑. (2020). 基於文字探勘之智慧工程文件摘要系統 [Engineering document summarization system using text mining methods]. Master's thesis, Department of Industrial Engineering and Engineering Management, National Tsing Hua University.
(The full text will be available for external access after 2027/07/09.)