Detailed Record

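Author (Chinese): 周樂儀
Author (English): CHOW, YVONNE LORK-YEE
Thesis title (Chinese): 基於字義和發音預測馬來西亞華裔姓名之年齡
Thesis title (English): Age Estimation based on Character Meaning and Pronunciation Using Ethnic-Chinese Malaysian Names
Advisor (Chinese): 陳宜欣
Advisor (English): Chen, Yi-Shin
Oral examination committee (Chinese): 彭文志, 賴郁雯
Oral examination committee (English): Peng, Wen-Chih; Lai, Yu-Wen
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 106065710
Publication year (ROC calendar): 109 (2020)
Academic year of graduation: 108
Language: English
Number of pages: 52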
Keywords (Chinese): 年齡預測; 字義; 發音; 馬來西亞華裔姓名
Keywords (English): Age Estimation; Character Meaning; Pronunciation; Ethnic-Chinese; Malaysian names
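In our daily lives, people of different ages often have different personalities, preferences, or behaviors because their life experiences differ. Unfortunately, because of privacy settings, users' age information is hard to collect, so we investigate other useful information that may be related to age, such as names. One previous study analyzed the Chinese names of Taiwanese people and predicted age from features commonly used in naming in Chinese culture. Because that study targeted only a specific country and language, the purpose of this study is to explore how well the age prediction model generalizes to different languages and countries. Our experimental results show that features based on the character meaning and pronunciation of a name can be used to predict the age group of that name.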
In daily life, people of different ages tend to have different personalities, preferences, and behaviors because of their differing life experiences. Unfortunately, because of privacy settings, users' age information is difficult to collect, so we look into other useful information that might be related to age, such as names. A previous study focused on estimating the age interval of Taiwanese names: drawing on naming conventions in Taiwanese culture, it extracted features from names to predict age. Because that work addressed only one country and language, the objective of this research is to explore the generalisability of the age prediction model to a different language and a different country. The experimental results indicate that a name itself carries a lot of meaning, and that this meaning can be used as a feature to predict the age associated with a name.
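The general idea in the abstract, predicting an age interval from features of a name, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the romanized syllable features, the `pronunciation_features` helper, the naive Bayes classifier (a stand-in, not necessarily the classifier used in the thesis), and the toy names and interval labels, which are invented placeholders and do not reflect real naming trends or the thesis's datasets.

```python
from collections import Counter, defaultdict
import math


def pronunciation_features(given_name):
    """Hypothetical feature extractor: lowercase romanized syllables
    plus adjacent-syllable bigrams (not the thesis's feature set)."""
    syllables = [s.lower() for s in given_name.replace("-", " ").split()]
    bigrams = ["+".join(pair) for pair in zip(syllables, syllables[1:])]
    return syllables + bigrams


class NaiveBayesAgeClassifier:
    """Multinomial naive Bayes with add-alpha smoothing over name features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for feature in pronunciation_features(name):
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for feature in pronunciation_features(name):
                count = self.feature_counts[label][feature]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder data: names and interval labels are NOT real observations.
TOY_NAMES = ["Mei Ling", "Mei Fong", "Ah Ling", "Jia Xuan", "Zi Xuan", "Jia Hui"]
TOY_LABELS = ["interval_A"] * 3 + ["interval_B"] * 3

model = NaiveBayesAgeClassifier().fit(TOY_NAMES, TOY_LABELS)
print(model.predict("Mei Ling"))
print(model.predict("Zi Xuan"))
```

In a setting closer to the one the abstract describes, the syllable features would presumably be complemented by character-meaning features, and the labels would be age intervals from collected name data rather than placeholders.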
1 Introduction
2 Related Work
2.1 The Role of Age and Gender
2.2 Age Prediction
2.3 Relation beyond Words
2.4 Word Pronunciation
3 Methodology
3.1 Thesis statement
3.2 Overview
3.3 Data Pre-processing
3.3.1 Name Filtering
3.3.2 Family Name Segmentation
3.4 Given Name Features Extraction
3.4.1 Pronunciation Feature
3.4.2 Word Embedding Feature
3.4.3 Word Radical Feature
3.4.4 Fortune-Telling Feature
3.4.5 Gender Prediction Feature
3.4.6 Phase I: Age-interval Classifier
3.4.7 Phase II: Cross-border Learning on Malaysian name
4 Data Collection
4.1 Taiwanese Dataset Collection
4.1.1 Taiwanese Student Name Crowdsourcing
4.1.2 Collecting Name of Taiwanese Public Figure
4.2 Malaysia Dataset Collection
4.2.1 Malaysian Student Name Crowdsourcing
5 Experiment and Results
5.1 Experimental Setup
5.1.1 Datasets
5.1.2 Evaluation method
5.2 Experiments
5.3 Experimental Results
5.3.1 Gender Feature Classification
5.3.2 Age-interval Classifier
6 Conclusion and Future Work
References
 
 
 
 