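Author (Chinese): 張宜榛
Author (English): Chang, Yi-Chen
Thesis title (Chinese): 利用預訓練語言模型改進針對英文文章不同面向之自動評分 (literally, "Using Pre-Trained Language Models to Improve Automated Scoring of Different Aspects of English Essays")
Thesis title (English): Harnessing the Power of Pre-Trained Language Models for Automated Essay Trait Scoring
Advisors (Chinese): 張俊盛, 郭柏志
Advisors (English): Chang, Jason S.; Kuo, Po-Chih
Oral examination committee (Chinese): 高宏宇, 顏安孜
Oral examination committee (English): Kao, Hung-Yu; Yen, An-Zi
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Institute of Information Systems and Applications (資訊系統與應用研究所)
Student ID: 109065703
Year of publication (ROC calendar): 112 (2023)
Academic year of graduation (ROC calendar): 111
Language: English
Number of pages: 52
Keywords (Chinese): 文章自動評分; 文字分析; 特徵工程; Transformer架構的預訓練語言模型
Keywords (English): Automated essay scoring; Text analysis; Feature engineering; Transformer-based pre-trained language models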
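Existing automated essay scoring systems mainly focus on an essay's holistic score and seldom evaluate the different aspects of writing. To provide a more comprehensive analysis of writing, we propose essay scoring models that target specific traits, such as organization and sentence fluency. Our research uses Transformer-based pre-trained language models and extracts and incorporates a variety of handcrafted features. In addition, we study the feasibility of grading multiple aspects within a single model. According to the experimental results, using a pre-trained language model improves both single-trait and multi-trait scoring. Overall, although including additional handcrafted features does not significantly improve the results, adopting multi-task learning yields superior performance in trait scoring.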
Existing automated essay scoring systems predominantly focus on holistic scores, which limits how thoroughly they can evaluate writing. To provide a more comprehensive analysis of writing, we present essay grading models that emphasize specific traits such as organization and sentence fluency. Our research uses transformer-based pre-trained language models and integrates a varying number of handcrafted features. Additionally, we investigate the feasibility of grading multiple traits within a single model. Based on the experimental results, using a pre-trained language model improves both single-trait and multi-trait scoring outcomes. Overall, employing multi-task learning yields superior performance for trait scoring, while the inclusion of additional handcrafted features does not significantly enhance the results.
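The idea of scoring several traits in one model from a shared essay representation plus handcrafted features can be illustrated with a minimal sketch. This record does not describe the thesis's actual architecture, feature set, or loss, so everything here is an assumption made for illustration: the three surface features, the linear per-trait heads, the mean-squared-error multi-task loss, and the fixed `encoder_vec` that stands in for a pooled embedding from a pre-trained language model.

```python
import re


def handcrafted_features(essay):
    """Toy surface features (hypothetical choices, not the thesis's feature set)."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = len(words)
    n_sentences = max(len(sentences), 1)
    avg_sentence_len = n_words / n_sentences
    type_token_ratio = len({w.lower() for w in words}) / max(n_words, 1)
    return [float(n_words), avg_sentence_len, type_token_ratio]


def dot(weights, values):
    """Plain dot product of two equal-length sequences."""
    return sum(w * v for w, v in zip(weights, values))


def score_traits(encoder_vec, features, heads):
    """Concatenate the shared inputs, then apply one linear head per trait."""
    shared = list(encoder_vec) + list(features)
    return {trait: dot(w, shared) + b for trait, (w, b) in heads.items()}


def multitask_loss(pred, gold):
    """Average the per-trait squared errors so all traits are trained jointly."""
    return sum((pred[t] - gold[t]) ** 2 for t in gold) / len(gold)


essay = "Cats sleep a lot. Dogs play outside!"
feats = handcrafted_features(essay)

# Stand-in for a pooled embedding from a pre-trained language model (assumption).
encoder_vec = [0.5, -0.25]

# Hypothetical, hand-set head weights: (weights over shared vector, bias).
heads = {
    "organization": ([1.0, 0.0, 0.0, 0.0, 0.0], 0.0),
    "sentence_fluency": ([0.0, 0.0, 0.0, 1.0, 0.0], 0.0),
}

pred = score_traits(encoder_vec, feats, heads)
loss = multitask_loss(pred, {"organization": 1.0, "sentence_fluency": 4.0})
print(feats, pred, loss)
```

In a real system the head weights would be learned jointly with the encoder, and scaling the handcrafted features before concatenation would likely matter; the sketch only shows how a single shared input can feed several trait-specific outputs under one combined loss.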
Abstract
Abstract (Chinese)
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
4 Experiments
5 Results
6 Discussion
7 Human Study
8 Conclusion
References
Appendices