Multi-document Context Relationship Analysis - A Case Study of Product Related Documents

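Author (Chinese):	朱寧敏
Author (English):	Chu, Ning Min
Thesis title (Chinese):	以文件內容為基礎之多文件脈絡關係分析-以產品相關文件分析為例
Thesis title (English):	Multi-document Context Relationship Analysis - A Case Study of Product Related Documents
Advisor (Chinese):	侯建良
Advisor (English):	Hou, Jiang Liang
Oral defense committee (Chinese):	吳建瑋、廖崇碩
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Industrial Engineering and Engineering Management
Student ID:	103034604
Year of publication (ROC calendar):	105 (2016)
Academic year of graduation:	104
Language:	Chinese
Number of pages:	228
Keywords (Chinese):	文件脈絡關係、文件類別判定、閱讀內容建議
Keywords (English):	Document Context Relationship, Classification, Reading Recommendation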

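When information seekers search the Internet for documents, search engines typically rank the results by their relevance to the query or by how often other users have clicked them. The ranking of matching documents therefore ignores their context relationship (that is, it does not consider the order in which the contents of the documents build on one another). As a result, information seekers cannot read the documents in a reasonable sequence from basic to advanced, and they may spend more time understanding the content or run into comprehension difficulties while reading.
To address this problem, this research first collects documents of various types from the Internet through search engines, classifies the collected documents, and extracts the feature points of each document. It then generalizes the distinguishing characteristics of each document category from the extraction results. Based on this analysis, the research develops a "Document Context Relationship Analysis" methodology comprising three stages: "Document Characteristic Extraction", "Document Category Determination", and "Document Context Ordering". The Document Characteristic Extraction stage extracts feature points from the content of each document returned by a search engine. The Document Category Determination stage then assigns each target document to a category by matching its extracted characteristics against the generalized distinguishing characteristics of each category. Finally, the Document Context Ordering stage sorts the documents of each category by reading sequence from basic to advanced and presents the result visually, showing the context relationships among the documents so that readers can conveniently choose which of the retrieved documents to read.
With this method, information seekers who have retrieved the documents they need can obtain a reasonable ordering from a large set of documents and read them in sequence from basic to advanced, reducing the time spent on understanding document content and resolving difficulties. The method can in turn provide recommended reading content for different audiences, as well as suggestions on the context of the learning process.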
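
The three stages described above can be illustrated with a minimal Python sketch. The abstract does not specify the algorithms used in the thesis, so everything here is an assumption made for illustration: term frequencies stand in for feature points, cosine similarity against hand-written category profiles stands in for category determination, and a made-up reading level per category stands in for the basic-to-advanced ordering. The category names, indicative terms, and levels in `CATEGORY_PROFILES` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical category profiles (not from the thesis): indicative terms per
# category and an assumed reading level, where a lower level is read earlier.
CATEGORY_PROFILES = {
    "introduction": {"terms": {"overview": 1.0, "basics": 1.0, "introduction": 1.0}, "level": 1},
    "specification": {"terms": {"specification": 1.0, "dimensions": 1.0, "voltage": 1.0}, "level": 2},
    "troubleshooting": {"terms": {"error": 1.0, "repair": 1.0, "firmware": 1.0}, "level": 3},
}


def extract_features(text):
    """Stage 1 (assumed form): term frequencies as the document's feature points."""
    tokens = [token.strip(".,;:!?()").lower() for token in text.split()]
    return Counter(token for token in tokens if token)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(features):
    """Stage 2 (assumed form): choose the category with the most similar profile."""
    return max(
        CATEGORY_PROFILES,
        key=lambda name: cosine(features, CATEGORY_PROFILES[name]["terms"]),
    )


def order_documents(docs):
    """Stage 3 (assumed form): sort documents by the reading level of their category."""
    labeled = [(name, classify(extract_features(text))) for name, text in docs.items()]
    return sorted(labeled, key=lambda item: CATEGORY_PROFILES[item[1]]["level"])


docs = {
    "doc_c": "Firmware error codes and repair steps",
    "doc_a": "Product overview and basics: an introduction",
    "doc_b": "Specification: dimensions and voltage",
}
reading_order = order_documents(docs)
for name, category in reading_order:
    print(name, category)
```

In this sketch a document that matches no profile falls back to the first category, and documents within the same category keep their input order; a fuller treatment would need an explicit rule for both cases.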

When one searches the Internet for documents by keyword, the related documents are ranked by their correlation with the specified keywords and by their click rates. That is, the context relationship between the related documents is not used to determine the rank. As a result, readers have to spend more time understanding the document contents or face difficulties in understanding them. To solve this problem, this research analyzes a large number of documents and generalizes the relationship between document characteristics and document categories. On the basis of the analysis results, this research develops a model for context relationship analysis of multiple documents. With the proposed model, the characteristics and categories of documents can be identified using determinant vectors. Finally, the documents can be sorted and their context relationship displayed visually for reading. Overall, the research helps readers obtain a reasonable, visualized ranking of documents and read the documents in an appropriate sequence.

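Table of Contents

Abstract (Chinese)
ABSTRACT
Table of Contents
List of Figures
List of Tables
Chapter 1. Research Background
1.1 Research Motivation and Objectives
1.2 Research Procedure
1.3 Research Positioning
Chapter 2. Literature Review
2.1 Document Characteristic Extraction
2.1.1 Extracting Document Characteristics by Qualitative Properties
2.1.2 Extracting Document Characteristics by Quantitative Properties
2.1.3 Extracting Document Characteristics by Qualitative and Quantitative Properties
2.2 Document Classification
2.2.1 Determining Document Categories with Supervised Methods
2.2.2 Determining Document Categories with Semi-supervised Methods
2.2.3 Determining Document Categories with Unsupervised Methods
2.3 Document Ranking
2.3.1 Document Ranking Models Based on Query Term Characteristics
2.3.2 Document Ranking Models Based on Document Characteristics
2.3.3 Document Ranking Models Based on Information Seeker Characteristics
2.4 Summary
Chapter 3. Content-Based Multi-document Context Relationship Analysis Model
3.1 Analysis of Existing Document Content
3.1.1 Clarifying Document Feature Points and Document Categories
3.1.2 Analysis of the Relationship between Feature Points and Document Categories
3.2 Document Characteristic Extraction
3.3 Document Category Determination
3.4 Document Context Ordering
3.5 Summary
Chapter 4. System Planning and Architecture
4.1 Core System Architecture
4.2 System Functional Architecture
4.3 Data Model Definition
4.4 System Operation Flows
4.4.1 System Function Operation Flow
4.4.2 System Data Transfer Flow
4.5 System Development Tools
Chapter 5. System Performance Verification and Analysis
5.1 Overview of System Operation
5.2 Description of the System Verification Approach
5.3 Analysis of System Verification Results
Chapter 6. Conclusions and Future Work
6.1 Thesis Summary
6.2 Future Work
References
Appendix A. Preparatory Work for the Analysis of Existing Document Content
Appendix B. Description of System Functions
Appendix C. Performance Verification Results of the Model and System for Each Cycle of the Second Phase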
