
Detailed Record

Author (Chinese): 陳俊穎
Author (English): Chen, Chun-Ying
Title (Chinese): 利用電子病歷資料庫預測腦中風復發
Title (English): Predicting Ischemic Stroke Recurrence using Electronic Health Record
Advisor (Chinese): 謝文萍
Advisor (English): Hsieh, Wen-Ping
Committee members: 林華君, 張國軒
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Statistics
Student ID: 106024510
Year of publication (ROC era): 108 (2019)
Graduating academic year: 107
Language: English
Number of pages: 62
Keywords (Chinese, translated): Electronic Health Records; Stroke Recurrence; Machine Learning; Random Forest; Recurrent Neural Network; Attention Mechanism
Keywords (English): Electronic Health Record; Ischemic Stroke Recurrence; Machine Learning; Random Forest; Gated Recurrent Unit; Attention Mechanism
Abstract (Chinese, translated): Stroke recurrence has long been an important research problem in health care, and many previous studies have developed models and scores to predict the likelihood of stroke recurrence after hospitalization. The feature set used in this study includes features not used in the previous literature, such as laboratory test data, medical expenses, and unstructured textual diagnosis variables. We performed feature engineering and preprocessing on the raw features and built several machine learning models, including (1) logistic regression, (2) random forest, and (3) an attention-based GRU neural network, to predict the probability of stroke recurrence, and compared the predictive performance of the different models. Compared with previous studies, our models perform well in terms of area under the receiver operating characteristic curve (area under ROC curve). In addition, this study uses (1) feature importance, (2) partial dependence plots, and (3) attention visualization to identify the most predictive features and the direction and magnitude of their influence on the predictions.
Abstract (English): Ischemic stroke recurrence has always been a serious problem in health care, and various models and scores have been developed to predict the risk of stroke recurrence after hospitalization. Instead of modeling stroke recurrence using only clinical risk factors, as past studies did, we use a feature set that contains features not previously used in the literature, such as laboratory tests, hospitalization fees, and unstructured medical notes. We performed feature engineering and preprocessing on the raw features and constructed multiple machine learning models, including (1) logistic regression, (2) random forest, and (3) an attention-based GRU neural network, to model the probability of stroke recurrence, and report the predictive performance of each model. Evaluated by the area under the receiver operating characteristic curve, our models compare favorably with previous studies. Moreover, we apply (1) permutation feature importance, (2) partial dependence plots, and (3) attention visualization to avoid black-box prediction and provide insight into which features are most predictive and how they affect the predictions.
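The tabular part of the pipeline the abstract describes — train logistic regression and random forest classifiers, compare them by area under the ROC curve, then rank features with permutation importance — can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis code; the scikit-learn APIs are real, but the data, class imbalance, and model settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered EHR features (labs, fees, word flags);
# weights=[0.85] makes the positive class rare, as recurrence events are.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUC uses predicted probabilities of the positive class, not hard labels.
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

# Permutation feature importance on held-out data: shuffle one column at a
# time and measure the drop in AUC (model-agnostic interpretation).
imp = permutation_importance(models["random_forest"], X_te, y_te,
                             scoring="roc_auc", n_repeats=10, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:5]
print("top features by permutation importance:", top.tolist())
```

Computing the importance on a held-out set rather than the training set avoids rewarding features the model has merely memorized.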
Contents
1 Introduction 5
2 Methods 8
2.1 Analysis workflow 8
2.2 Neural network 9
2.3 GRU 11
2.4 Attention mechanism 13
2.5 Binary word feature generation 14
2.6 Word2vec text embedding 15
2.7 Logistic regression 15
2.8 Random forest 16
2.9 Permutation feature importance 18
2.10 Partial dependence plot 18
3 Results 19
3.1 Data 19
3.1.1 Data collection and filtering 19
3.1.2 Data partition 21
3.1.3 Data features 21
3.2 Data preprocessing 22
3.2.1 Missing values 22
3.2.2 One-hot encoding 22
3.2.3 Feature scaling 22
3.2.4 Text preprocessing 23
3.3 Feature engineering 23
3.4 Model performance 25
3.4.1 Conventional classification models 26
3.4.2 Different subsets of training data points 27
3.4.3 Neural network model for text data 28
3.4.4 Ensemble model 31
3.4.5 Model performance comparison 32
3.5 Model interpretation 33
3.5.1 Features extracted by random forest 33
3.5.2 Interpretation of neural network model on text data 40
4 Discussion 45
References 47
Appendix A Feature list before engineering 51
Appendix B Other partial dependence plots 53