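Author (Chinese): 鄒承翰
Author (English): Zou, Chen-Han
Thesis title (Chinese): 以非監督式機器學習為基發展智慧知識本體建構方法論
Thesis title (English): Using un-supervised machine learning approach to generate knowledge ontology for patent analytics
Advisor (Chinese): 張瑞芬
Advisor (English): Trappey, Amy J.C.
Oral defense committee (Chinese): 邱銘傳; 劉建良
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Industrial Engineering and Engineering Management
Student ID: 105034528
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation: 106
Language: English
Number of pages: 78
Keywords (Chinese): 專利分群; 潛在狄利克雷分配; 本體論
Keywords (English): patent clustering; Latent Dirichlet Allocation; ontology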
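Technology is advancing rapidly, and the flourishing of the Internet of Things (IoT) and cyber-physical systems (CPS) has brought about the fourth industrial revolution (Industry 4.0). As IoT and related technologies spread, they are being applied across a wide range of industries and are creating many innovative applications and business opportunities; retail is one such industry. Smart retailing uses many integrated IoT technologies, for example supply chain integration that digitizes and interconnects production processes (parts, machines, equipment, and so on), warehousing and logistics, and services. Technological progress has also changed consumers' purchasing habits. To develop successfully and compete globally, enterprises therefore need to keep track of fast-growing domain patents and the trends in their key technical innovations. This research develops an intelligent methodology for exploring intellectual property (IP) portfolio topics, covering dynamic updating of patent topics and construction of domain ontology knowledge maps. It also develops a computer-aided IP topic exploration platform to support decisions on R&D IP strategy. Patents are retrieved from global patent databases, and clustering algorithms group the large volume of patent documents as an intelligent text-clustering preprocessing step. Building on the extensive academic work on Latent Dirichlet Allocation (LDA) for topic identification and text similarity analysis, this research applies this unsupervised machine learning AI method to train and build domain patent topic models, discovering the topics and the key technical terms under each topic. Going a step further, the research explores the important key technical terms and their relationships in order to construct an ontology map. The map visualizes the domain's technology and function knowledge, serves as a reference for related industries, and helps them track global patent portfolios and competitive strategies dynamically. Smart retailing is the target domain whose ontology is examined.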
Global patenting activity has grown phenomenally in recent decades, driven by rapid technology development and by enterprises seeking protection for their technical innovations. This research aims to develop a novel methodology for "intelligent intellectual property (IP) topic e-discovery" that tracks the key topics of technology development dynamically and automatically. The methodology is built into a computer-supported IP topic e-discovery platform to support R&D planning and IP strategy. Related patents are retrieved from paid or free patent databases, e.g., the Derwent Innovation Index platform and the USPTO. The first step uses clustering methods to separate the patents into key groups. Then Latent Dirichlet Allocation (LDA), an unsupervised machine learning (ML) approach, is investigated and complemented with other methods. Topic models are constructed, and the key technical terms under each topic are discovered. Finally, important key terms are extracted and used for ontology construction, based on the fundamental concept of hierarchical LDA. Related technical and functional terms can be visualized on ontology maps (i.e., domain knowledge maps) to help enterprises analyze technology development trends in the patent portfolios of a target domain.
Chinese Abstract
Abstract
Acknowledgements
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Framework and Procedure
2. Literature Review
2.1 Patent Clustering
2.2 Latent Dirichlet Allocation (LDA)
2.2.1 Perplexity
2.3 Concept of ontology
3. Methodology
3.1 Patents search
3.2 Patent document preprocessing
3.3 Clustering
3.3.1 K means
3.3.2 Hierarchical clustering
3.3.3 Clustering validation
3.4 LDA parameter training and topic modeling
3.4.1 LDA parameters training and topic modeling
3.4.2 Perplexity calculation
3.5 Ontology construction
4. Case Analysis
4.1 Patent search strategy – Smart retailing case
4.2 Patent clustering – Smart retailing case
4.2.1 Clustering results validation
4.2.2 Cluster naming
4.3 LDA topic modeling and result output – Smart retailing case
4.4 Ontology construction – Smart retailing case
4.5 Case 2: Smart machinery
4.6 Case conclusions
5. Conclusions and Future Work
References