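Author (Chinese): 忻嘉欣
Author (romanized): Xin, Jia-Xin
Thesis title (Chinese): 基於得分記錄與賽後報導辨識籃球比賽的亮點
Thesis title (English): Recognizing Highlights of Basketball Games Based on Box Scores and Postgame Reports
Advisor (Chinese): 蘇豐文
Advisor (romanized): SOO, VON-WUN
Oral defense committee (Chinese): 陳宜欣; 王俊堯
Oral defense committee (romanized): CHEN, YI-SHIN; WANG, CHUN-YAO
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 103065468
Year of publication (ROC calendar): 107 (2018)
Academic year of graduation (ROC calendar): 106
Language: English
Number of pages: 45
Keywords (Chinese): 機器學習; 多標籤分類器; 信息提取; 自然語言處理
Keywords (English): Machine Learning; Multi-label Classifier; Information Extraction; Natural Language Processing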
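Abstract (translated from the Chinese): In this era of artificial intelligence, intelligent programs have taken over a great many laborious tasks. In sports news writing, however, basketball websites in both Chinese and English still rely on comparatively inefficient manual writing. To address the long wait between the end of a game and the appearance of a game summary, this study builds a system based on multi-label classification that automatically predicts game highlights and can generate a box score with the key figures marked immediately after the game. The study also proposes a new prototype that combines the strengths of game reports and box scores; surveys confirmed that it earned higher reader satisfaction than either existing game reports or existing box scores. Moreover, readers could hardly tell the predicted highlights apart from highlights extracted from human-written reports.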
Abstract (English): In the era of artificial intelligence, a large number of tasks are handled by intelligent programs. This is not yet the case for sports news writing: on both Chinese and English websites, postgame reports or summaries often become available only about an hour after the game. To address this issue, this work uses multi-label classifiers, namely Binary Relevance (BR), Probabilistic Classifier Chains (PCC), and Random k-Labelsets (RAkEL), to build a model that predicts which figures in the Box Score are likely to be highlighted. The selected models achieve a Hamming Score as high as 96.7% and an exact match rate as high as 83.2%. According to the results of two surveys, the proposed prototype, which visualizes the information of both news reports and the Box Score immediately after the final buzzer, received the highest satisfaction from readers. In addition, readers could not clearly distinguish the highlights predicted by our models from the highlights extracted from the news reports.
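The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that the Hamming Score is the fraction of individual label slots predicted correctly (that is, 1 minus the Hamming loss) and that the exact match rate is the fraction of examples whose whole label vector is predicted correctly; the record itself does not spell out these definitions. The function names and the toy label matrices are hypothetical and are not taken from the thesis data.

```python
from typing import Sequence


def hamming_score(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots predicted correctly (assumed: 1 - Hamming loss)."""
    total = 0
    correct = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += int(t == p)
    return correct / total


def exact_match(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of examples whose entire label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Hypothetical toy data: 3 box-score rows, 4 candidate "highlight" labels each.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 0, 0]]

print(hamming_score(y_true, y_pred))  # 10 of 12 label slots agree
print(exact_match(y_true, y_pred))    # 1 of 3 rows matches completely
```

Under these definitions the exact match rate can never exceed the Hamming Score, because a single wrong label invalidates the whole row; this is consistent with the gap between the 96.7% and 83.2% figures reported above.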
Abstract
Abstract (in Chinese)
Acknowledgement
Table of Contents
Table of Figures
Table of Equations
Table of Lists
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Recent machine learning applications on automatic generation
    2.1.1 Narrative Science
    2.1.2 Dreamwriter
  2.2 Machine learning studies on sports field
  2.3 Related studies on multi-label classification
    2.3.1 Formal definition of multi-label classification
    2.3.2 Binary Relevance (BR) Methods
    2.3.3 Label Powerset (LP) Methods
    2.3.4 Classifier Chains (CC) Methods
    2.3.5 Why using multi-label classifiers
Chapter 3 Methodology
  3.1 Data collection
  3.2 Data Preprocessing
    3.2.1 Modifying content in Box Score tables
    3.2.2 Obtaining Labels from the game reports
    3.2.3 Data description
  3.3 Multi-label Classifier
    3.3.1 BR Methods
    3.3.2 RAkEL Methods
    3.3.3 PCC Methods
  3.4 Evaluation Metrics for multi-label classification
    3.4.1 Label-based metrics
    3.4.2 Example-based metrics
Chapter 4 Experimental Results
  4.1 Hypothesis
  4.2 Models and evaluation metrics
    4.2.1 Experiment I
    4.2.2 Experiment II
    4.2.3 Experiment III
  4.3 Surveys
    4.4.1 Survey on usability of the proposed prototype
    4.4.2 Survey on quality of the prediction
Chapter 5 Conclusion and Limitation
Reference
(The full text of this thesis has not been authorized for public access.)