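Author (Chinese): 蔡東峻
Author (English): Tsai, Tung-Chun
Thesis title (Chinese): 考慮同義字詞之智慧專利搜索系統
Thesis title (English): A smart patent search system considering synonymous words and phrases
Advisor (Chinese): 張瑞芬
Advisor (English): Trappey, Amy J. C.
Committee members (Chinese): 吳政隆; 施翠倚
Committee members (English): Wu, Jheng-Long; Shih, Tsui-Yii
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 106034520
Year of publication (ROC calendar): 108 (2019)
Academic year of graduation: 107
Language: Chinese
Number of pages: 83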
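Keywords (translated from Chinese): word-level synonym extraction; phrase-level synonym extraction; machine learning; feature (pattern-based) extraction; bootstrapping; solar power
Keywords (English): synonym extraction; machine learning; pattern-based extraction; bootstrapping; solar power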
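A patent is an exclusive right to a patented technology that the government grants to the patentee for a statutory period, protecting the rights of the inventor. For that reason, a company developing a new product must carry out a complete and precise patent search so that the product does not face an infringement lawsuit after launch. However, key technical terms are not standardized across patents. This study therefore designs a smart search method that builds lookup tables of synonymous words and synonymous phrases in advance, so that a technical term can be found even when it goes by many different names. The method is intended to be more efficient and more complete than traditional exact matching, and to help companies conduct more thorough patent analysis and avoid potential intellectual property litigation or threats to their reputation. The word-level synonym dictionary is built with a pattern-based (feature extraction) method and the phrase-level synonym dictionary with a machine learning method, using an English dictionary and an online encyclopedia as sources of training data. Once built, the two dictionaries are validated and analyzed in two different ways to examine their coverage and discriminative ability. Finally, a case study on solar power patents examines the effectiveness of the proposed smart patent search system: whether the number of patent words and phrases it finds is sufficient, and whether the synonyms of the key technical terms in the patent documents are actually found. With this synonym-aware smart patent search system, the study aims to help companies and patentees carry out more thorough patent searches, protect their rights, and avoid unnecessary litigation threats and losses.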
Patents grant the patentee exclusive rights to a technology for a limited period of time. Before developing a product, a company has to carry out a patent search, known as freedom-to-operate analysis, to find prior-art patents related to the product under development. Patent documents contain many synonyms, and traditional search methods do not detect them well. To support a complete freedom-to-operate analysis, this study constructs a word-level dictionary using a pattern-based method and a phrase-level dictionary using a machine learning approach. In the validation phase, two different methods are used to test the word-level dictionary and the phrase-level dictionary. In the case study, patents in the solar power domain are collected to examine the effectiveness of the proposed smart patent search system. We check whether the number of patent words found by the system is sufficient and whether the key technical words of the patents are found. This study aims to help enterprises and patentees carry out more complete patent searches with this system, protecting their rights and avoiding unnecessary litigation and losses.
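The idea of searching with a synonym table instead of exact matching can be illustrated with a minimal sketch. It is not the system described in the thesis: the table entries, the function names `expand_query` and `matches`, and the Boolean query syntax are all assumptions made for illustration.

```python
import re

# Hypothetical synonym table; the entries are illustrative and not taken from the thesis.
SYNONYMS = {
    "solar cell": ["photovoltaic cell", "PV cell"],
    "panel": ["module"],
}


def variants_of(term, synonyms):
    """Return the term itself followed by any synonyms listed for it."""
    return [term] + synonyms.get(term.lower(), [])


def expand_query(terms, synonyms):
    """Build a Boolean query: variants are OR-ed within a group, groups are AND-ed."""
    groups = []
    for term in terms:
        quoted = ['"%s"' % v for v in variants_of(term, synonyms)]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


def matches(text, terms, synonyms):
    """Return True if every term, or one of its synonyms, occurs in the text."""
    lowered = text.lower()
    for term in terms:
        found = any(
            re.search(r"\b" + re.escape(v.lower()) + r"\b", lowered)
            for v in variants_of(term, synonyms)
        )
        if not found:
            return False
    return True


claim = "A photovoltaic cell mounted on a module frame."
print(expand_query(["solar cell", "panel"], SYNONYMS))
print(matches(claim, ["solar cell", "panel"], SYNONYMS))  # synonym-aware match
print(matches(claim, ["solar cell", "panel"], {}))        # exact match only
```

With an empty table the second `matches` call falls back to exact matching and misses the claim, which is the gap the synonym dictionaries are meant to close.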
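Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background and Motivation
1.2 Research Scope and Objectives
1.3 Research Methods and Procedure
2. Literature Review
2.1 Definition of Patents and Patent Infringement
2.2 Word Embedding
2.2.1 One-Hot Encoding
2.2.2 Term Frequency-Inverse Document Frequency
2.2.3 Word2vec
2.3 Naive Bayes Classifier
2.3.1 Bayes' Theorem
2.3.2 Naive Bayes Classification
2.4 Regular Expressions
2.5 Synonym Extraction
2.5.1 Word-Level Synonym Extraction
2.5.2 Phrase-Level Synonym Extraction
2.6 Corpora
2.6.1 The Online Plain Text English Dictionary
2.6.2 Wikipedia
2.7 Traditional Patent Search Methods
2.8 Ontology of Synonym Extraction
3. Methodology
3.1 Constructing the Word-Level Synonym Dictionary
3.2 Constructing the Phrase-Level Synonym Dictionary
3.2.1 Data Preprocessing and Structure Decomposition
3.2.2 Collecting Positive Training Data
3.2.3 Collecting Negative Training Data
3.2.4 Training and Classification
3.3 Validation and Analysis
3.3.1 Validation of the Word-Level Synonym Dictionary
3.3.2 Validation of the Phrase-Level Synonym Dictionary
3.3.3 Statistics on the Datasets and Dictionaries
4. Case Study
4.1 Patents in the Solar Power Domain
4.1.1 Comparison of the Retrieval Capability of the Word-Level Dictionary, the Phrase-Level Dictionary, and ALWC
4.1.2 Importance of the Retrieved Synonyms
4.1.3 Expanded Search Queries and Analysis of Results
4.2 Invitations to Tender in the Transformer Domain
5. Conclusion
References
APPENDIX A
APPENDIX B
APPENDIX C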
[1] Ba, J., & Caruana, R. (2014). Do deep nets really need to be deep? In Advances in Neural Information Processing Systems, 2654-2662.
[2] Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of machine learning research, 3(Feb), 1137-1155.
[3] Bøhn, C., & Nørvåg, K. (2010, April). Extracting named entities and synonyms from Wikipedia. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference. IEEE, Perth, Western Australia, 1300-1307.
[4] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. Retrieved from https://arxiv.org/abs/1607.04606
[5] Bunescu, R., & Paşca, M. (2006, April). Using encyclopedic knowledge for named entity disambiguation. In 11th conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, 9-16.
[6] Christlein, V., & Maier, A. (2018, April). Encoding CNN Activations for Writer Recognition. In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, Vienna, Austria, 169-174.
[7] Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L., & Schmidhuber, J. (2011, July). Flexible, high performance convolutional neural networks for image classification. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, Barcelona, Spain, 1237-1242.
[8] Cui, X., Potok, T. E., & Palathingal, P. (2005, June). Document clustering using particle swarm optimization. In Swarm Intelligence Symposium, IEEE, Pasadena, CA, 185-191.
[9] Domingos, P., & Pazzani, M. (1996, July). Beyond independence: Conditions for the optimality of the simple bayesian classifier. In Proc. 13th Intl. Conf. Machine Learning, 105-112.
[10] Dominguez, S., Campoy, P., & Baeza, C. (2000). On automated trademark search techniques. Pattern Recognition and Applications. IOS Press, Amsterdam, Netherlands, 126-133.
[11] Fu, Z., Sun, X., Linge, N., & Zhou, L. (2014). Achieving effective cloud search services: multi-keyword ranked search over encrypted cloud data supporting synonym query. IEEE Transactions on Consumer Electronics, 60(1), 164-172.
[12] Garofalakis, M. N., Rastogi, R., & Shim, K. (1999, September). SPIRIT: Sequential pattern mining with regular expression constraints. In VLDB, 7-10.
[13] Hearst, M. A. (1992, August). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on Computational linguistics, 2. Association for Computational Linguistics, 539-545.
[14] Hinton, G. E. (1986, August). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society, 1-12.
[15] Honnibal, M., & Johnson, M. (2015). An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1373-1378.
[16] Hu, F., Shao, Z., & Ruan, T. (2015). Self-Supervised Synonym Extraction from the Web. Journal of Information Science and Engineering, 31(3), 1133-1148.
[17] Jagannatha, A., Chen, J., & Yu, H. (2015, September). Mining and ranking biomedical synonym candidates from Wikipedia. In Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, Lisbon, Portugal, 142-151.
[18] Krizhanovsky, A. (2006). Synonym search in wikipedia: Synarcher. Retrieved from https://arxiv.org/abs/cs/0606097
[19] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097-1105.
[20] Le, Q., & Mikolov, T. (2014, January). Distributed representations of sentences and documents. In International Conference on Machine Learning, Beijing, China, 1188-1196.
[21] Lee, C., Song, B., & Park, Y. (2013). How to assess patent infringement risks: a semantic patent claim analysis using dependency relationships. Technology Analysis & Strategic Management, 25(1), 23-38.
[22] McCallum, A., & Nigam, K. (1998, July). A comparison of event models for naive Bayes text classification. In AAAI-98 workshop on learning for text categorization, 41-48.
[23] Mehling, H., Schossig, P., & Kalz, D. (2009). Latent heat storage in buildings. Storing heat and cold in a compact and demand-oriented manner; Latentwaermespeicher in Gebaeuden. Waerme und Kaelte kompakt und bedarfsgerecht speichern, 1-20.
[24] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. Retrieved from https://arxiv.org/abs/1301.3781
[25] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. Retrieved from https://arxiv.org/abs/1301.3781
[26] Nguyen, A., Yosinski, J., & Clune, J. (2015, June). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 427-436.
[27] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct), 2825-2830.
[28] Powers, D. M. (2011). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37-63.
[29] Riloff, E., & Jones, R. (1999, July). Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI/IAAI, 474-479.
[30] Rosen, K. H. (2011). Discrete mathematics and its applications. New York: McGraw-Hill.
[31] Salton, G., Fox, E. A., & Wu, H. (1983). Extended Boolean information retrieval. Communications of the ACM, 26(11), 1022-1036.
[32] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. Retrieved from https://arxiv.org/abs/1409.1556
[33] Surden, H. (2014). Machine learning and law. Wash. L. Rev., 89, 87-115.
[34] Trappey, A. J., Chen, P. P., Trappey, C. V., & Ma, L. (2019). A Machine Learning Approach for Solar Power Technology Review and Patent Evolution Analysis. Applied Sciences, 9(7), 1478.
[35] Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics, 384-394.
[36] Wang, C., Cao, L., & Zhou, B. (2015, June). Medical synonym extraction with concept space models. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 989-995.
[37] Wang, T., & Hirst, G. (2012). Exploring patterns in dictionary definitions for synonym extraction. Natural Language Engineering, 18(3), 313-342.
[38] Weber, R. (1999, June). Intelligent jurisprudence research: a new concept. In Proceedings of the 7th international conference on Artificial intelligence and law, ACM, 164-172.
[39] Wiebe, J., & Riloff, E. (2005, February). Creating subjective and objective sentence classifiers from unannotated texts. In International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Berlin, Germany.
[40] Wilson, T., Wiebe, J., & Hoffmann, P. (2005, October). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, Vancouver, Canada, 347-354.
[41] Wu, F., & Weld, D. S. (2010, July). Open information extraction using Wikipedia. In Proceedings of the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Uppsala, Sweden, 118-127.
[42] Wu, H., & Zhou, M. (2003, July). Optimizing synonym extraction using monolingual and bilingual resources. In Proceedings of the Second International Workshop on Paraphrasing. Association for Computational Linguistics, Sapporo, Japan, 72-79.
[43] Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. The Journal of High Technology Management Research, 15(1), 37-50.
[44] Yoon, J., & Kim, K. (2011). Identifying rapidly evolving technological trends for R&D planning using SAO-based semantic patent networks. Scientometrics, 88(1), 213-228.
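[45] 宋皇志 (2017). A preliminary study on the application of artificial intelligence in patent search [in Chinese]. 全國律師, 21(10), 27-37.
[46] Intellectual Property Office, Ministry of Economic Affairs (Taiwan). Chinese-English glossary query system for domestic patent technical terms [in Chinese]. Retrieved from https://paterm.tipo.gov.tw/IPOTechTerm/doIPOTechTermIndexTerm.do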
 
 
 
 