Author (English): Lai, Chyh Ming
Thesis title: Applying Simplified Swarm Optimization for Solving Clustering Problem
Advisor (English): Yeh, Wei Chang
Keywords (English): data clustering; K-means; K-harmonic-means; simplified swarm optimization; principal component analysis; random sampling technique
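Data clustering is widely applied in many fields; it groups data into clusters according to a specific measure. K-means (KM) and K-harmonic-means (KHM) are common and fundamental clustering tools because they are simple and efficient. However, both KM and KHM have inherent drawbacks that lead to poor effectiveness or efficiency. This thesis applies simplified swarm optimization to propose two clustering algorithms, one addressing the drawbacks of KM and the other those of KHM, thereby improving the effectiveness and efficiency of KM and KHM in clustering.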
Data clustering is commonly employed in many disciplines. The aim of clustering is to partition a set of data into clusters such that objects within the same cluster are similar to one another and dissimilar to objects in other clusters. K-means (KM) and K-harmonic-means (KHM) are two common and fundamental clustering methods because of their simplicity and efficiency. However, both have inherent drawbacks that degrade their effectiveness or efficiency. This study presents two novel algorithms based on simplified swarm optimization to deal with the drawbacks of KM and KHM, respectively.
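For readers unfamiliar with the two baseline methods, the following minimal sketch evaluates the standard KM and KHM objective functions as they are usually defined in the clustering literature. It is only an illustration: the function names, the exponent parameter p, the guard value eps, and the toy data are assumptions made here and are not taken from the thesis.

```python
import math

def km_objective(points, centers):
    """KM objective: sum of squared Euclidean distances to the nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """KHM objective: for each point, the harmonic mean of its p-th power
    distances to all centers, summed over all points."""
    k = len(centers)
    total = 0.0
    for x in points:
        # eps guards against division by zero when a point coincides with a center
        inv_sum = sum(1.0 / max(math.dist(x, c) ** p, eps) for c in centers)
        total += k / inv_sum
    return total

# Toy data (hypothetical): two tight groups and one center per group
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

km_value = km_objective(points, centers)
khm_value = khm_objective(points, centers)
```

KM lets each point contribute only through its nearest center, whereas KHM lets every center contribute to every point through the harmonic mean.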
In addition, with the advance of internet and information technologies, data sizes are increasing explosively, and many existing clustering approaches, including KM and KHM, are inefficient on large-size problems. To address this, we propose a clustering framework that combines principal component analysis with a novel random sampling technique in a single procedure to increase the scalability of the proposed clustering algorithm.
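The general idea of reducing dimensionality first and then clustering on a sample can be sketched as follows. This is a generic illustration under stated assumptions, not the framework proposed in the thesis: it keeps only the first principal component (found by power iteration), draws one plain random sample, and uses ordinary Lloyd-style K-means in place of the SSO-based algorithm. All function names, parameters, and data are hypothetical.

```python
import math
import random

def first_pc(points, iters=200):
    """Approximate the first principal component by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, v):
    """Score of each point along direction v after centering."""
    return [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]

def nearest(x, centers):
    """Index of the center closest to the scalar x."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))

def kmeans_1d(values, k, rng, iters=50):
    """Plain Lloyd-style K-means on scalars (stand-in for the SSO-based method)."""
    centers = rng.sample(values, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in values:
            groups[nearest(x, centers)].append(x)
        # Keep the old center if a group happens to be empty
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

rng = random.Random(0)
# Hypothetical data: two well-separated groups of 200 points each
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
data += [(rng.gauss(10, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]

mean, v = first_pc(data)
scores = project(data, mean, v)

sample = rng.sample(scores, 40)           # cluster on a small random sample only
centers = kmeans_1d(sample, 2, rng)
labels = [nearest(s, centers) for s in scores]  # then label the full dataset
```

The expensive iterative step touches only the sample; the full dataset is visited once for projection and once for the final assignment.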
To empirically evaluate the performance of the proposed methods, all experiments are conducted on real-world datasets, and the results are compared with those of recent works in the literature.
Chinese Abstract
Abstract
Acknowledgement
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Motivation
1.1.1 The drawbacks of K-means-type method
1.1.2 The drawbacks of K-harmonic-means-type method
1.1.3 Inefficiency on large-size problem
1.2 Objective and methodology
1.2.1 Objective for KM-type problem
1.2.2 Objective for KHM-type problem
1.2.3 Objective for clustering on large-size problem
1.3 Framework and organization
Chapter 2. Methodology
2.1 Clustering problem
2.2 K-means clustering
2.3 K-harmonic-means clustering
2.4 Taguchi method
2.5 Simplified swarm optimization
2.6 Principal component analysis
Chapter 3. Improved SSO for KM-type problem
3.1 Proposed methods
3.1.1 SSO clustering algorithm
3.1.2 Variable vibrating search (VVS)
3.1.3 Rapid centralized strategy (RCS)
3.1.4 The overall procedure of proposed method
3.2 Experiment results and discussion
3.2.1 Datasets
3.2.2 Parameter settings
3.2.3 Ex-1: Evaluation of RCS
3.2.4 Ex-2: Comparing ISSOKM with existing algorithms
3.2.5 Statistical analysis
3.3 Summary
Chapter 4. Improved SSO for KHM-type problem
4.1 Proposed methods
4.1.1 Initial population
4.1.2 Minimum movement strategy (MMS)
4.1.3 The overall procedure of proposed method
4.2 Experiment results and discussion
4.2.1 Datasets
4.2.2 Ex-1: Parameter settings
4.2.3 Ex-2: Comparing ISSOKHM with existing algorithms
4.2.4 Statistical analysis
4.3 Summary
Chapter 5. A framework for large-size problem
5.1 Empirical analysis of ISSOKM efficiency
5.2 Proposed framework
5.2.1 The first m PCs determination
5.2.2 Rolling random sampling (RRS)
5.2.3 The overall procedure of proposed framework
5.3 Experiment results and discussion
5.3.1 Datasets
5.3.2 Ex-1: Parameter setting for Nsub and Nsam
5.3.3 Ex-2: Comparing ISSOKMPR with existing algorithms
5.3.4 Time complexity of proposed algorithm
5.3.5 Statistical analysis
5.4 Summary
Chapter 6. Conclusion and future work
Reference
