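Author (Chinese): 李彥瑾
Author (English): Lee, Yen Chin
Title (Chinese): 應用混合式數據引力分類演算法於基因表現資料分類問題之研究 (A Study on Applying a Hybrid Data Gravitation-Based Classification Algorithm to Gene Expression Data Classification)
Title (English): A hybrid data gravitation based classification algorithm applied to gene expression data
Advisor (Chinese): 葉維彰
Advisor (English): Yeh, Wei Chang
Oral examination committee (Chinese): 黃佳玲, 魏上佳
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 103034702
Year of publication (ROC calendar): 106 (2017)
Academic year of graduation (ROC calendar): 105 (2016-2017)
Language: English
Number of pages: 44
Keywords (Chinese): data gravitation-based classification; improved simplified swarm optimization; gene expression data; gene selection; analysis of variance
Keywords (English): Data Gravitation based Classification; Simplified swarm optimization (SSO); Gene selection; ANOVA; Gene expression data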
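Abstract (Chinese): Classification holds an important place in machine learning and data mining research. The basic idea of classification is to use a machine learning algorithm to build a classifier from training data and then use that classifier to predict the classes of test data. Several classification algorithms are now widely applied to everyday problems such as spam detection, handwriting recognition, and bioinformatics, and applying machine learning to bioinformatics has drawn particular attention in recent years. Gene expression data play an indispensable role in diagnosing and preventing disease, and may even lead to the discovery of treatments, so their importance keeps growing. However, the characteristics of gene expression data make classification very challenging, mainly because the number of samples is limited while the number of features is very large.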
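In the past few years, researchers have proposed the data gravitation-based classification (DGC) algorithm, inspired by Newton's law of universal gravitation. DGC includes a mechanism for training feature weights, so a search algorithm can look for the feature weights that give the best classification accuracy. In this study, we design a DGC-based classification model for the classification of gene expression data. First, we use analysis of variance (ANOVA) as a feature filter to remove redundant genes. Next, the remaining genes are used to train our DGC model, and improved simplified swarm optimization (iSSO) is used to optimize the feature weights. Finally, we compare the proposed algorithm with methods from the previous literature; the results show that the proposed algorithm handles the classification of gene expression data effectively.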
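The ANOVA filtering step described above can be illustrated with a minimal pure-Python sketch. It assumes a one-way ANOVA F statistic computed per gene across the class labels and keeps the `top_n` genes with the largest F. The function names `anova_f` and `anova_filter` and the top-n cutoff rule are hypothetical; the record does not state which threshold the thesis actually used.

```python
import math
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene.

    F = (SSB / (C - 1)) / (SSW / (N - C)), where SSB is the between-class
    sum of squares, SSW the within-class sum of squares, N the number of
    samples, and C the number of classes (assumes C >= 2 and N > C).
    """
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    n, c = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    ssb = sum(len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items())
    ssw = sum((v - means[k]) ** 2 for k, g in groups.items() for v in g)
    if ssw == 0:
        # Perfectly separated (or constant) gene: avoid division by zero.
        return math.inf if ssb > 0 else 0.0
    return (ssb / (c - 1)) / (ssw / (n - c))


def anova_filter(X, y, top_n):
    """Rank genes (columns of X) by F and return the top_n column indices
    together with the F score of every gene."""
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return ranked[:top_n], scores
```

Because every gene in one dataset shares the same degrees of freedom, ranking by F gives the same order as ranking by p-value, so a p-value cutoff would select a prefix of the same ranking.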
Abstract (English): One important application of gene expression profiling technology in medicine is supporting clinical decisions, in the form of disease diagnosis and prediction of clinical outcomes in response to treatment. Disease prediction and diagnosis have become popular topics in machine learning, and the classification of gene expression data has attracted considerable research interest in recent years. The main challenges in classifying gene expression data are the small number of samples and the high dimensionality of each sample.
The data gravitation-based classification (DGC) model is a novel classification algorithm that performs well on many classification problems. DGC also has a property that matters for gene expression data classification: a feature weighting procedure that measures the importance of each feature by assigning it a weight. In this study, we design a classifier based on the basic DGC model, named k-DGC, for gene selection and classification of gene expression data. We first use ANOVA as a filter to quickly reduce the number of genes, then apply the proposed k-DGC model, which builds on the concept of k-nearest neighbors (KNN), and use the improved simplified swarm optimization algorithm (iSSO) to optimize the feature weights. Leave-one-out cross-validation (LOOCV) serves as the evaluator of the k-DGC model. We compared k-DGC with previous research on ten gene expression datasets from GEMS. Experimental results show that our method is effective for gene expression data classification problems.
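The interplay of a KNN-style gravitation rule, swarm-based weight search, and LOOCV evaluation can be sketched in pure Python. This is an illustration under stated assumptions, not the thesis's implementation. The k-DGC rule here is a hypothetical reading of "DGC based on KNN": each class attracts a test sample with the summed gravitation 1/r^2 of its k nearest members under a feature-weighted Euclidean distance, with unit mass per sample. The weight search uses the basic SSO stepwise update (keep, copy personal best, copy global best, or redraw at random) rather than iSSO, whose update rule differs. LOOCV accuracy is the fitness. All function names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def weighted_dist2(a, b, w):
    """Squared feature-weighted Euclidean distance: sum_i w_i * (a_i - b_i)^2."""
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def kdgc_predict(x, train_X, train_y, w, k=3, eps=1e-12):
    """Hypothetical k-DGC rule: return the class whose k nearest members
    exert the largest summed gravitation 1/r^2 on x (unit masses)."""
    by_class = defaultdict(list)
    for xi, yi in zip(train_X, train_y):
        by_class[yi].append(weighted_dist2(x, xi, w))
    best_label, best_force = None, -1.0
    for label, d2s in by_class.items():
        nearest = sorted(d2s)[:k]
        force = sum(1.0 / (d2 + eps) for d2 in nearest)  # eps avoids 1/0
        if force > best_force:
            best_label, best_force = label, force
    return best_label


def loocv_accuracy(X, y, w, k=3):
    """Leave-one-out cross-validation: classify each sample using all others."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if kdgc_predict(X[i], train_X, train_y, w, k) == y[i]:
            correct += 1
    return correct / len(X)


def sso_optimize_weights(X, y, k=3, n_particles=10, n_iter=30,
                         cw=0.1, cp=0.4, cg=0.7, seed=0):
    """Basic SSO (not iSSO) search for feature weights in [0, 1).

    For each weight, draw r ~ U(0, 1): keep it if r < cw, copy the
    particle's best if r < cp, copy the global best if r < cg, otherwise
    redraw it at random. Fitness is LOOCV accuracy.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [loocv_accuracy(X, y, p, k) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i, particle in enumerate(swarm):
            for d in range(dim):
                r = rng.random()
                if r < cw:
                    pass  # keep the current value
                elif r < cp:
                    particle[d] = pbest[i][d]
                elif r < cg:
                    particle[d] = gbest[d]
                else:
                    particle[d] = rng.random()
            fit = loocv_accuracy(X, y, particle, k)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = particle[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = particle[:], fit
    return gbest, gbest_fit
```

On a toy dataset where only the first feature separates the classes, weights such as `[1.0, 0.0]` give perfect LOOCV accuracy, and the swarm search is expected to push the noisy feature's weight toward values that do not hurt accuracy. Using LOOCV as the fitness is expensive (one full pass per candidate), which is tolerable here only because gene expression datasets have few samples.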
Table of contents:
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Framework
Chapter 2 Literature Review
2.1 Data Gravitation-based Classification Model
2.2 Feature selection and classification for gene expression data
Chapter 3 Methodology
3.1 ANOVA
3.2 DGC model
3.3 Improved Simplified Swarm Optimization (iSSO)
Chapter 4 The Proposed Algorithm
4.1 The proposed k-DGC model
4.2 Feature weight optimization
4.3 Procedures of k-DGC
Chapter 5 Example and Computational Result
5.1 Experiment data
5.2 Design of experiment and Parameter settings
5.3 Algorithms selected for study
Chapter 6 Conclusions and Future Research
6.1 Conclusions
6.2 Future Research
REFERENCES