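Author (Chinese): 賴智明
Author (English): Lai, Chyh Ming
Title (English): Applying Simplified Swarm Optimization for Solving Clustering Problem
Title (Chinese): 應用簡群最佳化演算法求解資料分群問題
Advisor (Chinese): 葉維彰
Advisor (English): Yeh, Wei Chang
Oral defense committee (Chinese): 孟昭宇; 楊國隆; 劉培林; 張桂琥
Degree: Doctoral
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 101034803
Year of publication (ROC calendar): 105
Academic year of graduation: 104
Language: English
Number of pages: 99
Keywords (translated from Chinese): data clustering; simplified swarm optimization; principal component analysis; random sampling; K-means clustering; K-harmonic-means clustering
Keywords (English): data clustering; K-means; K-harmonic-means; simplified swarm optimization; principal component analysis; random sampling technique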
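Data clustering, widely applied across many fields, is a method that groups data into clusters according to a specific measure. K-means (KM) and K-harmonic-means (KHM) are common and fundamental clustering tools because they are simple and efficient. However, both KM and KHM have inherent flaws that degrade their effectiveness or efficiency. This thesis applies simplified swarm optimization to propose two clustering algorithms, one addressing the flaws of KM and the other those of KHM, thereby improving the effectiveness and efficiency of KM and KHM in clustering.
Furthermore, advances in information technology and the rise of the internet have ushered in the era of big data, and many existing clustering algorithms, including KM and KHM, cannot process large datasets within a reasonable time. To improve the efficiency of the proposed clustering algorithm on large datasets, this study combines principal component analysis with a new random sampling technique to develop a new clustering framework. The framework works by simultaneously reducing the dimensionality of the dataset and the rate at which data points are accessed, which substantially improves the efficiency of the clustering algorithm.
To validate the proposed algorithms and framework, all experiments use real-world datasets, and the results are compared with those reported in the literature.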
Data clustering is commonly employed in many disciplines. The aim of clustering is to partition a set of data into clusters such that objects within the same cluster are similar to one another and dissimilar to objects in other clusters. K-means (KM) and K-harmonic-means (KHM) are two common and fundamental clustering methods because of their simplicity and efficiency. However, both suffer from inherent drawbacks that degrade their effectiveness or efficiency. This study presents two novel algorithms based on simplified swarm optimization to deal with the drawbacks of KM and KHM, respectively.
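As a rough illustration of what the two methods optimize, the sketch below computes the commonly used KM objective (sum of squared distances to the nearest center) and the commonly used KHM objective (per point, the harmonic mean of the p-th power distances to all centers). It is not taken from the thesis: the function names, the default `p=2.0`, and the `eps` guard against division by zero are assumptions made for this example.

```python
import math


def km_objective(points, centers):
    """K-means objective: squared distance from each point to its nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)


def khm_objective(points, centers, p=2.0, eps=1e-12):
    """K-harmonic-means objective.

    For each point, take the harmonic mean of d(x, c)^p over all k centers,
    i.e. k / sum_j(1 / d(x, c_j)^p), then sum over points.
    `p` and `eps` are illustrative defaults, not values from the thesis.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        # eps keeps the reciprocal finite when a point coincides with a center
        inv_sum = sum(1.0 / max(math.dist(x, c) ** p, eps) for c in centers)
        total += k / inv_sum
    return total
```

Calling both functions on the same points and centers shows the difference in character: KM counts only the nearest center for each point, whereas KHM lets every center contribute, which is the usual explanation for why KHM tends to be less sensitive to initialization.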
In addition, with the advance of internet and information technologies, data sizes are increasing explosively, and many existing clustering approaches, including KM and KHM, are inefficient on large-size problems. To address this, we propose a clustering framework that combines principal component analysis with a novel random sampling technique in a single procedure to increase the scalability of the proposed clustering algorithm.
To empirically evaluate the performance of the proposed methods, all experiments are conducted on real-world datasets, and the corresponding results are compared with recent works in the literature.
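The general idea of pairing dimensionality reduction with sampling can be sketched as follows. This is only a minimal illustration under stated assumptions, not the framework proposed in the thesis: plain one-shot random sampling stands in for the thesis's rolling random sampling, ordinary Lloyd-style K-means stands in for the SSO-based clustering algorithm, and principal directions are obtained by power iteration with deflation. All function names and parameters (`m`, `sample_size`, `iters`, `seed`) are hypothetical.

```python
import math
import random


def center(data):
    """Subtract the per-feature mean from every row."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in data]


def top_components(centered, m, iters=200, seed=0):
    """First m principal directions via power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _ in range(m):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate: remove the variance explained by the direction just found.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps


def project(centered, comps):
    """Coordinates of each row in the space spanned by the chosen components."""
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in centered]


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means (a stand-in, not the thesis's algorithm)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in points:
            groups[nearest(x, centers)].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def cluster_large(data, k, m=1, sample_size=30, seed=0):
    """Reduce to m PCs, fit centers on a random sample, then label every point."""
    centered = center(data)
    reduced = project(centered, top_components(centered, m, seed=seed))
    rng = random.Random(seed)
    sample = rng.sample(reduced, min(sample_size, len(reduced)))
    centers = kmeans(sample, k, seed=seed)
    return [nearest(x, centers) for x in reduced]
```

In this sketch the iterative part of the clustering touches only the sampled points in the reduced space, and the full dataset is visited once at the end to assign labels; that is the sense in which both the dimensionality and the number of data points accessed per iteration are reduced.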
Chinese Abstract
Abstract
Acknowledgement
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Motivation
1.1.1 The drawbacks of K-means-type method
1.1.2 The drawbacks of K-harmonic-means-type method
1.1.3 Inefficiency on large-size problem
1.2 Objective and methodology
1.2.1 Objective for KM-type problem
1.2.2 Objective for KHM-type problem
1.2.3 Objective for clustering on large-size problem
1.3 Framework and organization
Chapter 2. Methodology
2.1 Clustering problem
2.2 K-means clustering
2.3 K-harmonic-means clustering
2.4 Taguchi method
2.5 Simplified swarm optimization
2.6 Principal component analysis
Chapter 3. Improved SSO for KM-type problem
3.1 Proposed methods
3.1.1 SSO clustering algorithm
3.1.2 Variable vibrating search (VVS)
3.1.3 Rapid centralized strategy (RCS)
3.1.4 The overall procedure of proposed method
3.2 Experiment results and discussion
3.2.1 Datasets
3.2.2 Parameter settings
3.2.3 Ex-1: Evaluation of RCS
3.2.4 Ex-2: Comparing ISSOKM with existing algorithms
3.2.5 Statistical analysis
3.3 Summary
Chapter 4. Improved SSO for KHM-type problem
4.1 Proposed methods
4.1.1 Initial population
4.1.2 Minimum movement strategy (MMS)
4.1.3 The overall procedure of proposed method
4.2 Experiment results and discussion
4.2.1 Datasets
4.2.2 Ex-1: Parameter settings
4.2.3 Ex-2: Comparing ISSOKHM with existing algorithms
4.2.4 Statistical analysis
4.3 Summary
Chapter 5. A framework for large-size problem
5.1 Empirical analysis of ISSOKM efficiency
5.2 Proposed framework
5.2.1 The first m PCs determination
5.2.2 Rolling random sampling (RRS)
5.2.3 The overall procedure of proposed framework
5.3 Experiment results and discussion
5.3.1 Datasets
5.3.2 Ex-1: parameter setting for Nsub and Nsam
5.3.3 Ex-2: comparing ISSOKMPR with existing algorithms
5.3.4 Time complexity of proposed algorithm
5.3.5 Statistical analysis
5.4 Summary
Chapter 6. Conclusion and future work
Reference
 
 
 
 