Author (Chinese): 彭婕盈
Author (English): Peng, Chieh-Ying
Title (Chinese): 運用特徵篩選於醫療分類問題之比較研究
Title (English): A Comparative Study of Utilizing Feature Selection in Medical Classification Problems
Advisor (Chinese): 蘇朝墩
Advisor (English): Su, Chao-Ton
Oral Examination Committee: 蕭宇翔, 許俊欽, 薛友仁
Degree: Master's
University: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 111034543
Year of Publication (ROC calendar): 113 (2024)
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 51
Keywords (Chinese): 機器學習、特徵篩選、過濾法、包裝法、嵌入法、混合式特徵篩選、精神障礙、乳癌
Keywords (English): Machine Learning, Feature Selection, Filter Method, Wrapper Method, Embedded Method, Hybrid Feature Selection, Mental Disorders, Breast Cancer
In the era of big data, effectively processing massive amounts of data has become a key issue. Advances in machine learning and artificial intelligence offer solutions, but redundant features in the data can degrade model performance. Feature selection has therefore become an important step: its goal is to pick the most important subset of features from the original feature set, improving model performance and interpretability. Common feature selection approaches include the filter method, the wrapper method, the embedded method, and hybrid feature selection; choosing a suitable method requires considering the characteristics of the data, the requirements of the model, and the available computational resources. Appropriate feature selection can simplify the model and improve predictive performance, supporting better-informed decisions and predictions.
This study investigates whether feature selection methods can improve the performance of classification models. The methods examined include a filter method (the Pearson correlation coefficient), a wrapper method (recursive feature elimination), embedded methods (Lasso, Ridge, ElasticNet), and hybrid feature selection (Pearson correlation combined with recursive feature elimination). Important features are selected first, and the effectiveness of each selection method is then evaluated through the performance of classification models (AdaBoost, XGBoost, KNN, SVM, and BPNN). Experiments on a mental disorder dataset and a breast cancer dataset show that ElasticNet performs best among all feature selection methods and improves classification performance; in particular, the combination of ElasticNet and BPNN achieves excellent accuracy and F1 scores. These results confirm that feature selection is an important step in model training; future work could focus on improving the generality of feature selection methods so they can be applied across more domains.
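The hybrid filter-plus-wrapper approach described above can be sketched as follows: first keep features whose absolute Pearson correlation with the class label exceeds a threshold, then refine that subset with recursive feature elimination (RFE). This is a minimal illustration, not the thesis's actual configuration — the dataset (scikit-learn's built-in breast cancer data), the 0.2 correlation threshold, the logistic-regression base estimator, and the target of 10 features are all assumed for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the base estimator converge

# Filter step: Pearson correlation of each feature with the class label,
# keeping features above an illustrative |r| > 0.2 threshold.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
X_filtered = X[:, np.abs(corr) > 0.2]

# Wrapper step: RFE repeatedly drops the weakest feature until 10 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X_filtered, y)
X_selected = X_filtered[:, rfe.support_]

print(X.shape[1], X_filtered.shape[1], X_selected.shape[1])
```

The filter step is cheap and prunes clearly irrelevant features; the wrapper step then spends its model-fitting budget only on the survivors, which is the motivation for combining the two.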
In the era of big data, handling large datasets effectively is crucial. While machine learning and AI offer solutions, redundant features can hinder model performance. Feature selection approaches such as filter, wrapper, embedded, and hybrid methods aim to enhance model efficiency and interpretability by selecting the most important subset of features. Choosing the right method depends on data characteristics, model needs, and resources. Proper feature selection streamlines models, boosts predictive accuracy, and aids decision-making.
This study investigates the impact of feature selection on classification model performance. Methods used include Pearson correlation, recursive feature elimination, Lasso, Ridge, ElasticNet, and a hybrid approach. Using mental disorder and breast cancer datasets, ElasticNet emerged as the best feature selection method, significantly enhancing model performance. Notably, ElasticNet combined with BPNN achieved outstanding accuracy and F1 scores. The findings highlight the importance of feature selection in model training and suggest further research to enhance the generalizability of these methods across different domains.
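The best-performing combination reported above — ElasticNet as an embedded selector followed by a backpropagation neural network — can be sketched like this. Features whose ElasticNet coefficients shrink to zero are discarded, and a small network is trained on the rest. The dataset and all hyperparameters (alpha, l1_ratio, hidden layer size, split ratio) are illustrative assumptions, not the thesis's actual settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNet
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Embedded step: ElasticNet mixes L1 and L2 penalties; the L1 part drives
# some coefficients exactly to zero, so selection falls out of the fit.
selector = SelectFromModel(ElasticNet(alpha=0.05, l1_ratio=0.5), threshold=1e-5)
selector.fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Classification step: a small backpropagation network on the reduced set.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X_tr_sel, y_tr)
pred = clf.predict(X_te_sel)
print(f"accuracy={accuracy_score(y_te, pred):.3f}, f1={f1_score(y_te, pred):.3f}")
```

Because the selection is a by-product of fitting one regularized model, embedded methods avoid the repeated refitting that wrapper methods require, which is one reason they scale well.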
Table of Contents
Table of Contents V
List of Figures VII
List of Tables VIII
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Objectives 2
1.3 Research Framework 2
Chapter 2 Literature Review 4
2.1 Feature Selection 4
2.1.1 Filter Method 4
2.1.2 Wrapper Method 5
2.1.3 Embedded Method 6
2.1.4 Hybrid Feature Selection 7
2.2 Classification Models 9
Chapter 3 Research Methodology 10
3.1 Research Process 10
3.2 Feature Selection 11
3.2.1 Filter Method 13
3.2.2 Wrapper Method 14
3.2.3 Hybrid Feature Selection 15
3.2.4 Embedded Method 16
3.2.4.1 Lasso 16
3.2.4.2 Ridge 17
3.2.4.3 ElasticNet 17
3.3 Classification Models 18
3.3.1 Adaptive Boosting 18
3.3.2 Extreme Gradient Boosting 19
3.3.3 K-Nearest Neighbors 20
3.3.4 Support Vector Machine 20
3.3.5 Backpropagation Neural Network 20
3.4 Model Evaluation Metrics 22
Chapter 4 Case Study 24
4.1 Datasets 24
4.1.1 Mental Disorders 24
4.1.2 Breast Cancer 26
4.2 Experimental Results 27
4.2.1 Feature Selection 27
4.2.2 Model Prediction Results 31
4.3 Discussion of Results 42
Chapter 5 Conclusions and Suggestions 45
5.1 Summary 45
5.2 Future Research Directions 46
References 47
