
Detailed record

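Author (Chinese): 許季庭
Author (English): Hsu, Chi-Ting
Thesis title (Chinese): 應用混合式過濾器模型及多目標簡化群體演算法解決基因型態資料之基因選取問題
Thesis title (English): Appling hybrid filter model and multi-objective simplified swarm optimization for gene selection problem using gene expression data
Advisor (Chinese): 葉維彰
Advisor (English): Yeh, Wei-Chang
Oral defense committee (Chinese): 黃佳玲; 劉淑範
Oral defense committee (English): Huang, Chia-Ling; Liu, Shu-Fan
Degree: Master's
University: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 104034517
Year of publication (ROC calendar): 106 (2017 CE)
Academic year of graduation: 105
Language: English
Number of pages: 51
Keywords (Chinese): 簡化群體演算法; 基因選取; 多目標規劃; 層級分析法; 支撐向量機
Keywords (English): Simplified swarm optimization (SSO); Gene selection; Multi-objective optimization; Analytic hierarchy process (AHP); Support vector machine (SVM)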
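Abstract (translated from the Chinese): In recent decades, both the volume and the dimensionality of data have grown steadily. Because feature selection can reduce noise and identify discriminative features when handling large datasets, it has attracted increasing attention. Feature selection is now widely applied in many fields. In medicine, for example, medical personnel can use it to find the features most relevant to cancer and to detect signs of the disease; in this setting, feature selection is also called gene selection. In gene selection, the usual aim is to maximize classification accuracy while selecting as few genes as possible, so the problem is inherently bi-objective. To reflect this, this study proposes a multi-objective gene selection method that combines several filter models with multi-objective simplified swarm optimization. The method first builds a hierarchical structure containing five filter models and uses the analytic hierarchy process to decide which genes to remove. Once most irrelevant genes have been removed, multi-objective simplified swarm optimization, together with a support vector machine and leave-one-out cross-validation, evaluates the gene subsets produced during the iterations and finds the best one. In the experiments, ten cancer-related datasets are used as test data and the proposed method is compared with methods from the literature; the results show that the proposed method attains higher classification accuracy while selecting fewer genes.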
In recent decades, feature selection has become an important issue because it can reduce noise and find discriminative features in high-dimensional data. Feature selection is now widely applied in many research fields, such as cancer-related medical research. In the context of finding discriminative genes related to cancer, feature selection is also called gene selection. Through gene selection, medical personnel can detect the symptoms of cancer at an early stage. In general, a gene selection problem requires maximizing classification accuracy and minimizing the number of selected genes simultaneously. To stay closer to this reality, this research develops a gene selection method with a multi-objective design and tests it on 10 benchmark gene expression datasets. The proposed method is called hybrid filter models and multi-objective simplified swarm optimization (HF-MOSSO). In HF-MOSSO, an AHP structure with five filter models is first constructed to remove most irrelevant genes. After the irrelevant genes have been removed, a multi-objective simplified swarm optimization (MOSSO) is applied, using an SVM with leave-one-out cross-validation (LOOCV) as the evaluator of gene-subset performance. In the experiments, the proposed HF-MOSSO is compared with several previous methods from the literature, and the results show that it obtains better accuracy while choosing fewer genes.
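The two objectives named in the abstract (higher classification accuracy, fewer selected genes) can be compared through Pareto dominance. The following is only a minimal sketch of that comparison, not the thesis's MOSSO procedure: the function names `dominates` and `pareto_front` and the sample (accuracy, gene count) pairs are hypothetical and invented purely for illustration.

```python
from typing import List, Tuple

# A candidate gene subset is summarized as (accuracy, number of selected genes).
# In the thesis the accuracy would come from an SVM with LOOCV; here it is
# simply a number supplied by the caller (assumption for illustration).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    (higher accuracy, fewer genes) and strictly better on at least one."""
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (accuracy, gene count) pairs; these are not results from the thesis.
subsets = [(0.95, 12), (0.95, 8), (0.90, 5), (0.97, 20), (0.89, 6)]
front = pareto_front(subsets)
```

In this made-up example, `(0.95, 12)` drops out because `(0.95, 8)` reaches the same accuracy with fewer genes, and `(0.89, 6)` drops out because `(0.90, 5)` is better on both objectives. The three remaining subsets are mutually non-dominated trade-offs.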
LIST OF CONTENTS
ACKNOWLEDGEMENTS
CHINESE ABSTRACT
ABSTRACT
LIST OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 BACKGROUND AND MOTIVATION
1.2 RESEARCH FRAMEWORK
CHAPTER 2 LITERATURE REVIEW
2.1 FEATURE SELECTION MODEL
2.2 GENE SELECTION
2.3 META-HEURISTIC ALGORITHMS IN GENE SELECTION
2.4 MULTI-OBJECTIVE OPTIMIZATIONS
CHAPTER 3 METHODOLOGY
3.1 ANALYTIC HIERARCHY PROCESS
3.2 RANDOM FOREST
3.3 ANALYSIS OF VARIANCE (ANOVA)
3.4 ENTROPY BASED FILTER FEATURE SELECTION METHODS
3.5 FILTER FEATURE SELECTION BASED ON CRAMER’S V
3.6 SIMPLIFIED SWARM OPTIMIZATION (SSO)
3.7 SUPPORT VECTOR MACHINE
CHAPTER 4 THE PROPOSED ALGORITHM
4.1 THE AHP STRUCTURE
4.2 NOTATIONS
4.3 FEASIBLE SEARCH SPACE REDUCTION
4.4 FITNESS FUNCTION
4.5 MULTI-OBJECTIVE SIMPLIFIED SWARM OPTIMIZATION (MOSSO)
4.6 THE PROPOSED HF-MOSSO
CHAPTER 5 EXPERIMENTAL RESULTS
5.1 EXPERIMENT DATA
5.2 WEIGHTS OF FIVE FILTER MODELS
5.3 DESIGN OF EXPERIMENTS
5.4 COMPARED TO PREVIOUS METHODS
5.5 COMPARED TO ORIGINAL SSO
CHAPTER 6 CONCLUSIONS
6.1 DISCUSSIONS AND CONCLUSION
REFERENCES



 
 
 
 