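Author (Chinese): 林承鋐
Author (English): Lin, Cheng Hon
Thesis title (Chinese): 運用基因表達規劃法於支持向量機的規則萃取
Thesis title (English): Rule Extraction from Support Vector Machines by Gene Expression Programming
Advisors (Chinese): 蘇朝墩; 陳衍成
Advisors (English): Su, Chao Ton; Chen, Yan Cheng
Oral examination committee (Chinese): 蕭宇翔; 陳麗妃
Oral examination committee (English): Hsiao, Yu Hsiang; Chen, Li Fei
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 103034521
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation (ROC calendar): 104
Language: Chinese
Number of pages: 67
Keywords (Chinese): 支持向量機; 基因表達規劃法; 類別不平衡; 分類問題; 規則萃取
Keywords (English): Support Vector Machines; Gene Expression Programming; Class imbalance; Classification; Rule extraction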
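In recent years, data mining techniques have been widely applied to classification and prediction problems. Although data mining methods can uncover information of interest, they still face many challenges, including class-imbalanced data, large data volumes, data that mix continuous and discrete attributes, and limited interpretability, all of which make computation time-consuming and degrade classification and prediction performance. The support vector machine (SVM) is currently the most popular classifier and rests on a mathematical foundation in optimization theory. Its greatest weakness, however, is that the decision boundary it produces takes the form of a complex mathematical model whose meaning is hard to understand. A rule extraction algorithm for SVMs is therefore needed to improve their interpretability.

This study integrates SVMs with gene expression programming (GEP). It uses the support vectors (SVs) on the SVM decision boundary to provide information about candidate solutions, combines this with GEP's biologically inspired genetic operators such as mutation and crossover, and applies feature selection to generate better rules, thereby strengthening the interpretability of the SVM black-box model. Several performance measures (such as accuracy, sensitivity, and specificity) are used to evaluate the classification performance of SVM + GEP, which is compared with the original SVM, the C5.0 decision tree, and rough sets. In the analysis of UCI datasets, SVM + GEP outperformed the other three methods, confirming its strong classification performance. In the real-world case study, the important attributes that the proposed method identified through feature selection also yielded classification results far better than those of the C5.0 decision tree and rough sets, indicating that the attributes it selects are of greater reference value. The proposed SVM + GEP method was successfully applied to both the UCI datasets and the real-world case.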
In recent years, data mining techniques have been widely used to solve classification and prediction problems. Although data mining methods can unearth information of interest, they face numerous challenges, including class imbalance, large data volumes, mixed continuous and discrete attributes, and a lack of explanatory ability. These issues can lead to time-consuming computation and lower classification and prediction performance. Support vector machines (SVMs) are the most popular classifiers and rest on a mathematical foundation in optimization theory. However, their main disadvantage is that the decision boundaries are represented as complex mathematical formulations, which makes their implications difficult to comprehend. Therefore, a rule extraction algorithm for SVMs is needed to enhance their explanatory ability.

This study proposes an integration of SVMs and gene expression programming (GEP). It uses the support vectors (SVs) on the SVM decision boundaries as the possible solution space of the GEP chromosomes to extract rules that enhance the explanatory ability of the SVM black-box model. Three performance metrics (accuracy, sensitivity, and specificity) were adopted to evaluate the classification performance of the proposed method and of other classifiers. On UCI datasets, SVM + GEP outperformed the other three classifiers (the original SVM, the C5.0 decision tree, and rough sets), which verifies the method's excellent classification performance. In the case study, the proposed method not only greatly outperformed the other rule learners in classification performance but also selected important attributes that can serve as strategic references for management decision making. The proposed SVM + GEP method has been successfully applied to both UCI datasets and a real-world case.
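The three metrics named in the abstract can be illustrated with a short sketch. This record does not give the thesis's exact definitions, so the code below assumes the standard binary-classification definitions based on the confusion matrix. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumes the standard definitions:
      accuracy    = (TP + TN) / (TP + TN + FP + FN)
      sensitivity = TP / (TP + FN)   (true positive rate)
      specificity = TN / (TN + FP)   (true negative rate)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn

    # Return NaN when a denominator is zero (e.g. no positive samples).
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical class-imbalanced sample: 2 positives and 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this imbalanced sample the accuracy is high even though only half of the positive cases are detected, which is one reason to report sensitivity and specificity alongside accuracy when classes are imbalanced.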
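Abstract (Chinese)
Abstract (English)
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Structure
Chapter 2 Literature Review
2.1 Data Mining
2.2 Mining Techniques
2.2.1 Decision Trees
2.2.2 Rough Sets
2.2.3 Genetic Algorithms
2.2.4 Gene Expression Programming
2.2.5 Support Vector Machines
2.3 Rule Extraction from Support Vector Machines
Chapter 3 Research Method
3.1 Basic Concept
3.2 The SVM + GEP Model
Chapter 4 Data Analysis
4.1 Data
4.2 Performance Measures
4.3 Experimental Platform and Techniques
4.4 Building the Classification Models
4.4.1 Building the SVM Classification and Prediction Model
4.4.2 Building the SVM + GEP Classification and Prediction Model
4.4.3 Building the Rough Set and C5.0 Decision Tree Classification and Prediction Models
4.4.4 Performance Evaluation
4.4.5 Rule Comparison
Chapter 5 Case Study
5.1 Case Background
5.2 Data Collection
5.3 Preliminary Analysis
5.4 Performance Evaluation of the Classification Methods
5.5 Attribute Selection
Chapter 6 Conclusion
6.1 Conclusions
6.2 Directions for Future Research
References
Appendix 1 Questionnaire Items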
(The full text of this thesis has not been authorized for public release.)