
Detailed Record

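Author (Chinese): 鐘淳笙
Author (English): Chung, Chun-Sheng
Thesis title (Chinese): 於社群網路辨別藥物效果之自我陳述
Thesis title (English): Identifying Self-Reports of Drug Effects on Social Media Platform
Advisor (Chinese): 陳宜欣
Advisor (English): Chen, Yi-Shin
Committee members (Chinese): 彭文志; 賴郁雯
Committee members (English): Peng, Wen-Chih; Lai, Yu-Wen
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 105062520
Publication year (ROC calendar): 108 (2019)
Graduation academic year: 108
Language: English
Number of pages: 58
Keywords (Chinese): 自然語言處理; 社群媒體; 藥物警戒; 機器學習
Keywords (English): Natural Language Processing; Social Media; Pharmacovigilance; Machine Learning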
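Abstract (Chinese version, translated):
Understanding the positive and negative effects of drugs is crucial to individual and public well-being, but existing spontaneous reporting systems offer drug users little incentive, which may limit the number of self-reports collected. To compensate, social media, and online forums in particular, contain abundant self-reports of drug effects written by drug users and can serve as a potential supplementary data source. However, the freedom with which users write and the diversity of discussion topics make it quite challenging to extract correct self-reports from these posts and to identify their types. This study proposes a new process that extracts more potential self-report candidates and filters noise effectively and, to examine the effect of different techniques, presents two separate text classification methods: Sentimental Skip-Gram Pattern, which incorporates opinion information into traditional text features, and Multi-Target BERT, which guides the attention mechanism of the upstream language model with an additional training target. Experiments on a dataset collected by the author from online forums and on other benchmark datasets indicate that the report candidate extraction process and Multi-Target BERT perform well.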
Abstract (English):
It is crucial to understand the potential therapeutic and adverse effects of drugs, but in current spontaneous reporting systems the lack of incentive for patients may reduce the number of reports submitted. To fill this gap, online social media can be exploited as an alternative source of self-reports of drug effects written directly by patients. However, the complexity and variety of user-written texts make the detection and classification of such self-reports very challenging.
This work introduces a new process that captures more potential self-reports and filters out unrelated texts effectively and, to evaluate different techniques, proposes two separate text classification methods: Sentimental Skip-Gram Pattern, which embeds sentiment information into traditional text features, and Multi-Target BERT, a novel text classifier that directs the attention mechanism of the upstream language model with an extra training target.
Experiments conducted on an original dataset and several benchmark datasets show that the candidate extraction process and Multi-Target BERT perform well.
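To make the idea of candidate extraction more concrete, the following is a minimal sketch, not the thesis's implementation. This record does not describe how the steps work, so everything here is an assumption: a "sentence bi-gram" is taken to mean a window of two consecutive sentences, a candidate is any window that mentions both members of a known drug-symptom pair, and fuzzy matching is approximated with a character-level similarity ratio from Python's standard `difflib`. The function names, the example post, and the 0.85 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def sentence_bigrams(text):
    """Split text into sentences and return windows of two consecutive sentences.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 2:
        return sentences
    return [sentences[i] + " " + sentences[i + 1] for i in range(len(sentences) - 1)]


def fuzzy_contains(segment, term, threshold=0.85):
    """Return True if some word n-gram in the segment is similar enough to the term.

    The threshold is a hypothetical value chosen for illustration only.
    """
    words = re.findall(r"[a-z]+", segment.lower())
    n = len(term.split())
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        if SequenceMatcher(None, candidate, term.lower()).ratio() >= threshold:
            return True
    return False


def extract_candidates(text, pairs, threshold=0.85):
    """Return (segment, drug, symptom) for every window mentioning both terms of a pair."""
    candidates = []
    for segment in sentence_bigrams(text):
        for drug, symptom in pairs:
            if fuzzy_contains(segment, drug, threshold) and fuzzy_contains(segment, symptom, threshold):
                candidates.append((segment, drug, symptom))
    return candidates


# Hypothetical forum post with a misspelled drug name.
post = "I started sertralin last week. Since then my nausea has been awful. My dog is fine."
for segment, drug, symptom in extract_candidates(post, [("sertraline", "nausea")]):
    print(drug, symptom, "|", segment)
```

In this sketch only the first two-sentence window is kept, because it is the only one that mentions both the (misspelled) drug and the symptom. The retained windows would then be passed to a classifier that decides whether they are genuine self-reports.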
1 Introduction
2 Related Work
2.1 Drug-Symptom Causal Inference with Spontaneous Reports
2.2 Studies of Drug and Symptoms in Natural Texts
2.3 Pre-trained Language Models for words and texts
3 Methodology
3.1 Overview
3.2 Self-Report Candidate Extraction
3.2.1 Overview
3.2.2 Sentence Bi-gram Text Segmentation
3.2.3 Drug-Symptom Pair Construction
3.2.4 Fuzzy Matching
3.2.5 Drug and Symptom Dropout
3.3 Enhancing Traditional Classifier: Sentimental Skip-Gram Pattern
3.3.1 Overview
3.3.2 Enriching Skip-gram Features with Sentimental Information
3.3.3 Feature Extraction and SVM Setup
3.4 Attention-Based Neural Classifier: Multi-Target BERT
3.4.1 Overview
3.4.2 Input Text Preprocessing
3.4.3 Drug/Symptom Masking
3.4.4 Further Pre-training BERT
3.4.5 Relation Classification Head
3.4.6 Drug/Symptom Identification Head
3.4.7 Loss Value Weighting
3.4.8 Loss Function and Training Optimizer
4 Experiments & Results
4.1 Data Collection
4.1.1 Data Source Selection
4.1.2 Text Collection and Sentence Bi-Gram Extraction
4.1.3 Drug-Symptom Pairs
4.1.4 Drug-Symptom Segment Extraction
4.1.5 Data Annotation
4.2 Data Analysis
4.2.1 Vocabulary Analysis
4.2.2 Impact of Sentence Bi-Gram Splitting to Data Distribution
4.3 Drug Effect Experience Classification Results
4.3.1 Sentimental Skip-Gram Pattern Performance
4.3.2 Multi-Target BERT Structure Performance
4.3.3 Performance Comparison of BERT-Based Models with Different Pre-trained Weights
4.3.4 Multi-Target BERT Case Study
4.4 Multi-Target BERT Experiments on other Biomedical Relation Extraction Tasks
4.4.1 Experimental Setup
5 Conclusions and Future Work
References
(The full text of this thesis has not been authorized for public access.)