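Author (Chinese): 陳麗光
Author (English): Chen, Li-Kuang
Thesis title (Chinese): 學術文章文步的辨識與分類
Thesis title (English): Identification and Classification of Rhetorical Function Expressions in Academic Articles
Advisor (Chinese): 張俊盛
Advisor (English): Chang, Jason S.
Oral examination committee (Chinese): 蕭若綺, 張智星, 高照明
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 109065704
Year of publication (ROC calendar): 112 (2023)
Academic year of graduation (ROC calendar): 111
Language: English
Number of pages: 61
Keywords (Chinese): 修辭功能, 序列標記, 文步, 電腦輔助英語學習 (rhetorical function, sequence labelling, move, computer-assisted English learning)
Keywords (English): rhetorical function, sequence labelling, move, computer-assisted English learning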
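Abstract (translated from Chinese): Rhetorical function expressions (RFEs, also called "move" expressions) play a crucial role in building persuasive arguments in academic writing, yet existing learning resources are limited and are usually presented without context. In this thesis we present a dataset of RFEs in academic texts, together with two models that, given an academic article, respectively identify and classify its RFEs, so as to give learners more intuitive writing assistance. We annotate RFEs in an existing dataset semi-automatically by using an existing set of academic phrases, and we study the effects of three annotation strategies. Using this dataset, we train a sequence-labelling model that identifies RFEs in academic writing and achieves fairly good results under automatic evaluation metrics. We further show that, in a multi-task setting, using RFE tagging as an auxiliary task effectively improves the accuracy of rhetorical category classification. This work provides a complete workflow design for RFE data annotation, training, and evaluation, and establishes baselines for future research.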
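The semi-automatic annotation step (marking RFEs by matching an existing set of academic phrases) can be pictured as scanning tokenized sentences for known phrases and writing each match out as BIO tags. The following is a minimal sketch under stated assumptions: the three-entry `PHRASES` list, the function `tag_sentence`, the `B-RFE`/`I-RFE`/`O` tag names, and whitespace tokenization are hypothetical illustrations, not the thesis's pattern inventory (Appendix A) or any of its three annotation strategies.

```python
from typing import List, Sequence

# Hypothetical phrase inventory (lower-cased token lists) for illustration only;
# the thesis uses its own patterns, listed in its Appendix A.
PHRASES: List[List[str]] = [
    ["in", "this", "paper", "we", "propose"],
    ["the", "results", "suggest", "that"],
    ["previous", "studies", "have", "shown"],
]


def tag_sentence(tokens: Sequence[str], phrases: Sequence[Sequence[str]]) -> List[str]:
    """Tag phrase occurrences as B-RFE/I-RFE and all other tokens as O.

    Matching is case-insensitive, scans left to right, and prefers the longest
    phrase at each position; tokens inside a match are not reused later.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Skip empty phrases and try longer phrases first.
    ordered = sorted((list(p) for p in phrases if p), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for phrase in ordered:
            n = len(phrase)
            if lowered[i:i + n] == phrase:
                tags[i] = "B-RFE"
                for j in range(i + 1, i + n):
                    tags[j] = "I-RFE"
                i += n  # jump past the matched phrase
                break
        else:
            # No phrase starts at this token; move on.
            i += 1
    return tags
```

For example, `tag_sentence("In this paper we propose a new tagger .".split(), PHRASES)` tags the first five tokens `B-RFE, I-RFE, I-RFE, I-RFE, I-RFE` and the remaining four `O`. Exact string matching like this misses inflected or discontinuous variants of a phrase, so it is at best a starting point for building annotations.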
Rhetorical Function Expressions (RFEs) play a crucial role in constructing persuasive arguments, but existing learning resources provide a limited number of examples, which are often presented without context. We present an RFE-annotated dataset extracted from scholarly texts and a model that identifies RFEs in a given academic article.
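Identifying RFEs with a sequence tagger means the model emits one tag per token, and the tagged spans then have to be read back out of the tag sequence. The sketch below, assuming a single `RFE` tag type in a BIO scheme, decodes tags into token spans and scores predictions with exact-match span precision, recall, and F1. This record does not say which automatic metrics the thesis uses (see its Section 4.2), so span-level F1 is shown only as one common choice; `bio_to_spans` and `span_prf` are hypothetical names.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Collect B-/I- runs into spans; assumes one tag type, so labels after the prefix are ignored."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        # A B- tag, or a stray I- tag with no open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_prf(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1, micro-averaged over all sentences."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g: Set[Span] = set(bio_to_spans(g_tags))
        p: Set[Span] = set(bio_to_spans(p_tags))
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a gold sequence `B-RFE I-RFE O B-RFE O` and a prediction `B-RFE I-RFE O O O`, the prediction recovers one of the two gold spans exactly, giving precision 1.0, recall 0.5, and F1 of about 0.67. Treating a stray `I-RFE` as the start of a new span is a lenient choice; some evaluators in a strict mode reject such spans instead.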
Using a set of existing rhetorical phrases, we develop an RFE dataset semi-heuristically and investigate the effectiveness of three annotation strategies. We train and evaluate a sequence-tagging model for identifying RFEs in scholarly writing, which shows promising results under automatic evaluation. We also demonstrate that using RFE tagging as an auxiliary task effectively improves rhetorical category classification in a multi-task setting. Our work offers a comprehensive workflow design for RFE data annotation, training, and evaluation, and provides baselines for future research.
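Using RFE tagging as an auxiliary task typically means sharing one encoder between a sentence-level head that predicts the rhetorical category and a token-level head that predicts RFE tags, then training on a weighted sum of the two losses, L = L_cls + λ · L_tag. The plain-Python sketch below computes only that combined loss from precomputed logits; the weight `aux_weight` (λ) with its default of 0.5, the token-averaged tagging loss, and the function names are assumptions for illustration, since this record does not describe the thesis's exact architecture or weighting.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax of the target class, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(
    sent_logits: Sequence[float],
    sent_label: int,
    token_logits: Sequence[Sequence[float]],
    token_labels: Sequence[int],
    aux_weight: float = 0.5,  # hypothetical lambda for the auxiliary tagging loss
) -> float:
    """Main sentence-classification loss plus a weighted, token-averaged RFE-tagging loss."""
    main = cross_entropy(sent_logits, sent_label)
    aux_terms: List[float] = [cross_entropy(l, y) for l, y in zip(token_logits, token_labels)]
    aux = sum(aux_terms) / len(aux_terms) if aux_terms else 0.0
    return main + aux_weight * aux
```

With `aux_weight=0.0` the objective reduces to plain classification, which is the single-task baseline to compare against. In a real model these logits would come from a framework such as PyTorch, where `torch.nn.functional.cross_entropy` computes the same per-example quantity.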
Abstract (Chinese)
Acknowledgements (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
3.1 The System Pipeline
3.1.1 Annotating Rhetorical Function Expressions
3.1.2 Learning to Identify RFEs
3.1.3 Classification with RFE-Tagging as an Auxiliary Task
3.1.4 Inference
4 Experiments
4.1 Data Labelling and Preprocessing
4.1.1 Datasets
4.1.2 Tagging Rhetorical Function Expressions in Sentences
4.2 Evaluation Metrics
4.3 Experimental Settings
5 Results and Discussion
5.1 Sequence Tagging
5.2 Classification
6 Conclusion
Bibliography
A Complete List of Patterns
A.1 Rhetorical Categories of Patterns
A.2 Patterns
A.2.1 Formulaic Patterns
A.2.2 Agentivity Patterns
A.3 Lexicons
A.3.1 Visualization of Relation Between Tag Types and Sentence Labels
A.3.2 Visualization of Relation Between Tag Types and Sentence Labels
B Human Evaluation Details
C Precision and Recall Scores of Classification Models