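Integrating POS Tagging and Knowledge Graphs for Product Labeling - National Tsing Hua University Electronic Theses and Dissertations Full-Text System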


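Author (Chinese):	李享
Author (English):	Lee, Hsiang
Thesis title (Chinese):	結合詞性標記與知識圖譜的產品標籤技術
Thesis title (English):	Integrating POS Tagging and Knowledge Graphs for Product Labeling
Advisor (Chinese):	陳宜欣
Advisor (English):	Chen, Yi-Shin
Committee members (Chinese):	洪智傑、彭文志
Committee members (English):	Hung, Chih-Chieh; Peng, Wen-Chih
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Computer Science
Student ID:	111062699
Year of publication (ROC calendar):	113
Academic year of graduation:	112
Language:	English
Number of pages:	42
Keywords (Chinese):	命名實體識別、電子商務產品標籤、知識圖譜、特徵增強
Keywords (English):	Named Entity Recognition (NER); E-commerce Product Labeling; Knowledge Graphs; Feature Augmentation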

To meet the demand for low-latency services on modern e-commerce platforms, our framework employs relatively small models to minimize overall inference time. Our methodology consists of three stages: Fine-Grained Integration, Feature Extraction, and Label Prediction. In the Fine-Grained Integration stage, we extract product features and integrate them using a rule-based approach. In the Feature Extraction stage, we use POS tagging and knowledge graphs to enhance feature representation. Finally, in the Label Prediction stage, we reconstruct and classify product names after feature augmentation. This framework significantly improves the accuracy and efficiency of e-commerce product category classification. By leveraging small models and rule-based integration, we address the unique challenges of Chinese NER tasks, offering a robust solution for product labeling in e-commerce.

1 Introduction
2 Related Work
2.1 Flat Named Entity Recognition
2.2 Text Encoding
2.3 Knowledge Graph
3 Methodology
3.1 Overview
3.2 Data Availability Identification and Recovering
3.3 Fine-Grained Integration Phase
3.3.1 Named Entity Recognition
3.3.2 Word Segmentation
3.3.3 Entity Integration Determination
3.3.4 Feature Diversity Required Score
3.4 Label Prediction Phase
3.4.1 Feature Augmentation
3.4.2 Label Classification
4 Experiment
4.1 Experiment Setup
4.2 Datasets
4.3 Experiment Result
5 Conclusion
References

[1] Tareq Al-Moslmi, Marc Gallofré Ocaña, Andreas L Opdahl, and Csaba Veres. Named entity extraction for knowledge graphs: A literature overview. IEEE Access, 8:32862–32881, 2020.
[2] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[3] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, and Ziqing Yang. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:3504–3514, 2021.
[4] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. Pre-training with whole word masking for chinese BERT. CoRR, abs/1906.08101, 2019.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[6] Jason Fries, Sen Wu, Alex Ratner, and Christopher Ré. SwellShark: A generative model for biomedical named entity recognition without labeled data. arXiv preprint arXiv:1704.06360, 2017.
[7] Qizhen He, Liang Wu, Yida Yin, and Heming Cai. Knowledge-graph augmented word representations for named entity recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7919–7926, 2020.
[8] Xuming Hu, Yong Jiang, Aiwei Liu, Zhongqiang Huang, Pengjun Xie, Fei Huang, Lijie Wen, and Philip S. Yu. Entity-to-text based data augmentation for various named entity recognition tasks, 2023.
[9] Michael P LaValley. Logistic regression. Circulation, 117(18):2395–2399, 2008.
[10] Xiaonan Li, Hang Yan, Xipeng Qiu, and Xuanjing Huang. FLAT: Chinese NER using flat-lattice transformer. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6836–6842, Online, July 2020. Association for Computational Linguistics.
[11] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019.
[12] Zhangxun Liu, Conghui Zhu, and Tiejun Zhao. Chinese named entity recognition with a sequence labeling approach: based on characters, or based on words? In International Conference on Intelligent Computing, pages 634–640. Springer, 2010.
[13] David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26, 2007.
[14] Hoang-Van Nguyen, Francesco Gelli, and Soujanya Poria. DOZEN: Cross-domain zero shot named entity recognition with knowledge graph. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1642–1646, 2021.
[15] L. F. Rau. Extracting company names from text. In [1991] Proceedings. The Seventh IEEE Conference on Artificial Intelligence Application, volume i, pages 29–32, 1991.
[16] Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333–389, April 2009.
[17] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[18] K. Sparck Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: development and comparative experiments: Part 2. Information Processing & Management, 36(6):809–840, 2000.
[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017.
[20] Yaqing Wang, Quanming Yao, James T Kwok, and Lionel M Ni. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys (CSUR), 53(3):1–34, 2020.
[21] Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Product knowledge graph embedding for e-commerce. In Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM ’20, pages 672–680, New York, NY, USA, 2020. Association for Computing Machinery.
