Application of the Disease Prediction in Insurance: Disease Prediction Model by Machine Learning


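Author (Chinese):	劉亭利
Author (English):	Liou, Ting-Li
Thesis title (Chinese):	疾病預測於保險上之應用:以機器學習的方法建立疾病預測模型
Thesis title (English):	Application of the Disease Prediction in Insurance: Disease Prediction Model by Machine Learning
Advisor (Chinese):	韓傳祥
Advisor (English):	Han, Chuan-Hsiang
Oral defense committee (Chinese):	黃能富、丁台怡
Oral defense committee (English):	Huang, Nen-Fu; Ding, Tai-Yi
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Quantitative Finance
Student ID:	106071513
Year of publication (ROC calendar):	109
Graduation academic year:	108
Language:	Chinese
Pages:	32
Keywords (Chinese):	機器學習、心臟病預測、保險、監督式學習、分類演算法、特徵重要性
Keywords (English):	machine learning, heart disease prediction, insurance, supervised learning, classification algorithms, feature importance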

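This thesis uses machine learning to build disease prediction models, taking heart disease as the target disease. Within the general framework of disease prediction, it builds a short-term and a medium-term heart disease prediction model and examines the value of each, separately and combined, for insurance applications, with the aim of addressing pain points in the insurance industry. Recall is the primary evaluation criterion. In the short-term model, logistic regression reaches a recall of 85.71%, which rises to 88.09% after feature selection. The most important short-term indicators are found to be chest pain type, exercise-induced angina, the number of major vessels seen by fluoroscopy, defect type, the slope of the peak exercise ST segment, and sex. In the medium-term model, SMOTE combined with Random Forest reaches a recall of 68.57%, which rises to 70.86% after feature selection. Age, smoking status, sex, average number of cigarettes smoked per day, history of stroke, and hypertension are found to matter most for medium-term prediction. The thesis expects short-term prediction to help with claims handling and insurance fraud, and medium-term prediction to improve sales, underwriting, and claims operations. Finally, combining the two models supports more fine-grained policy planning for customers, adding value for insurers and improving customer experience.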

This paper uses machine learning algorithms to build disease prediction models, with heart disease as the target disease. Under the general framework of disease prediction, it establishes short-term and medium-term prediction models for heart disease and discusses the value of the two models in the insurance domain, separately and jointly, with the aim of addressing the industry's pain points. In the short-term model, the logistic regression classifier achieves the best recall (85.71%), and feature selection raises it to 88.09%. The short-term model also shows that chest pain type, exercise-induced angina, number of major vessels colored by fluoroscopy, thallium scan result (defect type), slope of the peak exercise ST segment, and sex are the most important indicators. In the medium-term model, SMOTE combined with a Random Forest classifier achieves the best recall (68.57%), and feature selection raises it to 70.86%. Age, current smoking status, sex, number of cigarettes consumed per day, prevalent stroke, and prevalent hypertension are the most important features for medium-term prediction. The paper expects short-term prediction to address pain points in insurance claims and fraud, and medium-term prediction to optimize and improve the marketing, underwriting, and claims business of the insurance industry. Finally, the two models together can support more refined policy planning for customers, bringing additional value to insurance companies and enhancing customer experience.
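Because the abstract ranks models by recall, it may help to see how recall relates to the other criteria listed in the thesis outline (accuracy, specificity, F-score). The following is a minimal sketch in plain Python, not the author's code; the function names and the toy labels are illustrative assumptions, with 1 meaning "has heart disease" and 0 meaning "does not".

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy
    fn: int  # diseased, predicted healthy (the costly miss)


def confusion_counts(y_true, y_pred):
    """Count the four outcomes for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def recall(c):
    """Share of truly diseased cases that the model catches (sensitivity)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def specificity(c):
    """Share of truly healthy cases that the model clears."""
    return c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0


def accuracy(c):
    """Share of all cases classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f_score(c, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    r = recall(c)
    denom = beta ** 2 * precision + r
    return (1 + beta ** 2) * precision * r / denom if denom else 0.0


# Toy example (made-up labels, not data from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
```

In the toy example three of the four diseased cases are caught, so recall is 3/4 = 0.75 while accuracy is 0.8. Prioritizing recall makes sense when a missed case (a false negative) is more costly than a false alarm.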

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Review
2.1 Algorithms
2.2 Challenges of Medical Data
2.3 Handling Imbalanced Data
2.4 Evaluation Criteria
Chapter 3 Methodology
3.1 Datasets
3.1.1 Short-Term
3.1.2 Medium-Term
3.2 Sampling Methods
3.2.1 Oversampling
3.2.2 Undersampling
3.2.3 SMOTE (Synthetic Minority Over-sampling Technique)
3.3 Models
3.3.1 Decision Tree
3.3.2 Logistic Regression
3.3.3 SVM (Support Vector Machine)
3.3.4 Random Forest
3.3.5 XGBoost (Extreme Gradient Boosting) and LightGBM
3.3.6 ANN (Artificial Neural Network)
3.4 Evaluation Criteria
3.4.1 Accuracy
3.4.2 Recall (Sensitivity)
3.4.3 Specificity
3.4.4 ROC Curve and AUC
3.4.5 F-score
Chapter 4 Experimental Results and Discussion
4.1 Short-Term
4.1.1 Data Preprocessing
4.1.2 Model Results
4.2 Medium-Term
4.2.1 Data Preprocessing
4.2.2 Model Results
4.2.2.1 Undersampling + Algorithms
4.2.2.2 Oversampling + Algorithms
4.2.2.3 SMOTE + Algorithms
4.3 Discussion
4.3.1 Feature Importance
4.3.2 Comparison with Other Studies
4.3.3 Applications in Insurance
Chapter 5 Conclusion
References
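
The SMOTE step named in the abstract and in Section 3.2.3 of the outline creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below illustrates that idea in plain Python under stated assumptions: the function name `smote`, its parameters, and the toy points are hypothetical, base samples are drawn at random rather than oversampled at a fixed per-sample rate as in the original algorithm, and all features are treated as numeric. It is not the implementation used in the thesis, which this record does not describe.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    k: number of nearest minority neighbors considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base sample at random (simplification, see lead-in).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # The new point lies on the segment between base and the chosen neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (made-up points, not data from the thesis).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Every synthetic point is a convex combination of two real minority samples, so it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as recall are still measured on untouched data.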

