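Author (Chinese): 劉禎宇
Author (English): Liu, Chen-Yu
Title (Chinese): 利用樹狀結構在密度導向分群的動態演算法設計
Title (English): Dynamic Algorithms for Density-based Clustering via Tree Structure
Advisor (Chinese): 廖崇碩
Advisor (English): Liao, Chung-Shou
Oral Defense Committee (Chinese): 謝孫源, 彭勝龍
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 106034508
Publication Year (ROC calendar): 108 (2019)
Graduation Academic Year (ROC calendar): 107
Language: Chinese
Number of Pages: 23
Keywords (Chinese): 動態演算法; 群集分析; 樹狀結構 (dynamic algorithm; cluster analysis; tree structure)
Keywords (English): Dynamic algorithm; Clustering; Tree structure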
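Data mining is currently a very popular field [2]. Analyzing data makes it easier to interpret the information hidden in it. However, the amount of available data keeps growing over time, and traditional statistical methods can no longer cope well with massive data. Data also change ever faster. Dynamically updating large-scale data by reprocessing all of the data after every update would take a great deal of time. It is therefore more efficient to process only the local changes in the data. This makes dynamic algorithms that can efficiently handle dynamic data updates particularly important [3], [6].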
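In data mining, clustering algorithms are among the most important unsupervised learning algorithms [13]. They have been widely discussed in the literature, especially in statistics, where large amounts of data are partitioned into groups by computing the similarity between data points. Clustering large amounts of data, however, poses a major challenge: under dynamic updates, repeatedly recomputing the clustering becomes very time-consuming. To address the clustering of large, dynamic datasets, this study applies dynamic-model design to clustering algorithms. The goal is to improve their efficiency when computing dynamically on massive data.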
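In this study, we mainly investigate density-based clustering, which is currently among the most popular approaches. It includes well-known algorithms such as DBSCAN and OPTICS [9], [12], and especially the recent and widely discussed Density Peaks Clustering (DP) [1]. DP rests on two characteristics. First, a cluster center has a higher local density than the other data points in its neighborhood. Second, a cluster center lies at a relatively large distance from other points of higher local density. DP outperforms almost all other traditional clustering algorithms in experiments, but there is still room for improvement, particularly in handling dynamic data updates.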
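For reference, the two characteristics above correspond to the quantities ρ (local density) and δ (distance to denser points) defined by Rodriguez and Laio [1]. The block below states them with the cutoff kernel of [1]. Here d_{ij} is the distance between points i and j and d_c is a cutoff distance. This is a sketch of the original DP definitions only; SDP and DSDP may define local density differently.

```latex
\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c),
\qquad
\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}

\delta_i = \begin{cases}
  \min_{j \,:\, \rho_j > \rho_i} d_{ij}, & \text{if some } j \text{ has } \rho_j > \rho_i \\
  \max_{j} d_{ij}, & \text{otherwise (the point of highest density)}
\end{cases}

\gamma_i = \rho_i \, \delta_i
```

Cluster centers are the points where both ρ_i and δ_i are large. The product γ_i is one way to rank them.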
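An improved version of DP is Seed-and-extension-based Density Peaks clustering (SDP). SDP improves the quality of the cluster centers by changing how they are selected, and it also improves the accuracy of the clustering result. The main goal of this study is to generalize SDP by developing a dynamic model, called Dynamic Seed-and-extension-based Density Peaks clustering (DSDP). In DSDP we consider both the insertion and the deletion of data points, and we use a tree structure to speed up updates of the clustering result. Finally, we demonstrate the performance of DSDP both theoretically and experimentally.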
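Section 3.1.1 of the thesis covers local-density updates; its exact procedure is not reproduced in this record. The sketch below assumes DP's cutoff-kernel density and shows why an update is local. An insertion or deletion changes ρ only for the updated point and its neighbors within d_c, so the densities of all other points need not be recomputed. The class `DynamicDensity` and its methods are hypothetical names, not the thesis's implementation.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class DynamicDensity:
    """Hypothetical helper: keeps cutoff-kernel densities (rho) current under
    insertions and deletions by touching only points within d_c of the update."""

    def __init__(self, d_c: float) -> None:
        self.d_c = d_c
        self.points: Dict[int, Point] = {}
        self.rho: Dict[int, int] = {}

    def _neighbors(self, p: Point, exclude: int) -> List[int]:
        # Linear scan for clarity; a spatial index could replace it.
        return [q_id for q_id, q in self.points.items()
                if q_id != exclude and math.dist(p, q) < self.d_c]

    def insert(self, p_id: int, p: Point) -> List[int]:
        """Add a new point p_id (assumed not yet present) at p.
        Returns the existing points whose rho changed."""
        nbrs = self._neighbors(p, exclude=p_id)
        self.points[p_id] = p
        self.rho[p_id] = len(nbrs)
        for q_id in nbrs:
            self.rho[q_id] += 1  # each neighbor gains one point within d_c
        return nbrs

    def delete(self, p_id: int) -> List[int]:
        """Remove point p_id; return the remaining points whose rho changed."""
        p = self.points.pop(p_id)
        del self.rho[p_id]
        nbrs = self._neighbors(p, exclude=p_id)
        for q_id in nbrs:
            self.rho[q_id] -= 1  # each neighbor loses one point within d_c
        return nbrs
```

The linear neighbor scan keeps the sketch short. When neighbor queries dominate, a grid or k-d tree would be the natural replacement.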
Data mining has become one of the most popular research topics in recent decades [2]. Data analysis techniques make it easier to discover the patterns hidden in data. With new technological advances, datasets are becoming enormous, and traditional statistical methods can no longer handle such big data well. Moreover, data are updated frequently, so reprocessing all data points after every update is time-consuming when updates occur at a large scale. It is therefore more efficient to process only the local changes in the data. This makes dynamic algorithms that can handle such updates increasingly important [3], [6].
Cluster analysis is one of the most common unsupervised learning techniques in machine learning [13]. It has been widely discussed in the literature, especially in statistical data analysis, where the goal is to partition large amounts of data into classes according to their similarity. To address clustering under massive data updates, we develop a dynamic mechanism for a clustering algorithm that improves its efficiency.
In this study, we focus on density-based clustering approaches such as DBSCAN and OPTICS [9], [12]. In particular, we consider Density Peaks clustering (DP), a recently proposed, state-of-the-art density-based algorithm [1]. DP characterizes cluster centers as points that meet two conditions: they have a higher local density than their neighbors, and they lie relatively far from any point of higher density. DP outperforms almost all previous clustering algorithms in practice, but there is still room for improvement, especially in handling dynamic updates.
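The DP procedure described above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions:

- Density uses the cutoff kernel.
- The number of centers is chosen by the caller.
- Centers are picked by the largest γ = ρ·δ, as suggested in [1].

It is not SDP's seed-and-extension center selection, and the function name `density_peaks` is hypothetical. Each non-center point links to its nearest denser point, so the links form a tree rooted at the densest point. Cluster labels then propagate down that tree.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_peaks(points: Sequence[Point], d_c: float, n_centers: int) -> List[int]:
    """Minimal DP sketch: cutoff-kernel rho, delta distances, top-gamma centers,
    and label propagation along nearest-higher-density links."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points strictly closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Order points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that precedes i in the density order
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: largest distance to any point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: the n_centers points with the largest gamma = rho * delta.
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = set(by_gamma[:n_centers])
    centers.add(order[0])  # the densest point has no parent, so it must be a center

    labels = [-1] * n
    for cluster_id, c in enumerate(sorted(centers)):
        labels[c] = cluster_id
    # In density order, every parent is labeled before its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

The all-pairs distance matrix makes this O(n²) in time and memory. That is the cost a dynamic method tries to avoid paying again after each update.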
Seed-and-extension-based Density Peaks clustering (SDP) improves DP in both the precision of the output clusters and the quality of the selected centers. We design a dynamic model for SDP, called Dynamic Seed-and-extension-based Density Peaks clustering (DSDP), which supports both point insertion and deletion. We use a tree structure to select cluster centers and build the corresponding clusters simultaneously, and we present an index-based approach to update the required parameters. Finally, we demonstrate the effectiveness of DSDP from both theoretical and practical perspectives.
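This record does not spell out DSDP's tree structure, which Section 3.2 of the thesis describes. The sketch below is an illustration under stated assumptions of why a tree keeps updates local. In DP-style assignment, each non-center point inherits the label of its parent. When an update changes one point's parent, only that point's subtree needs relabeling. The class `AssignmentTree` is hypothetical and is not the thesis's data structure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class AssignmentTree:
    """Hypothetical sketch: each non-center point links to a parent (in DP, its
    nearest higher-density neighbor) and inherits that parent's cluster label."""

    def __init__(self) -> None:
        self.parent: Dict[int, Optional[int]] = {}
        self.children: DefaultDict[int, List[int]] = defaultdict(list)
        self.label: Dict[int, int] = {}

    def add_center(self, p: int, cluster: int) -> None:
        self.parent[p] = None
        self.label[p] = cluster

    def attach(self, p: int, new_parent: int) -> List[int]:
        """Link p (new or existing) under new_parent and relabel only p's subtree.
        Assumes new_parent is not inside p's subtree, which holds when parents
        always have strictly higher density."""
        old = self.parent.get(p)
        if old is not None:
            self.children[old].remove(p)
        self.parent[p] = new_parent
        self.children[new_parent].append(p)
        return self._relabel(p, self.label[new_parent])

    def _relabel(self, root: int, cluster: int) -> List[int]:
        # Iterative DFS over the subtree rooted at `root`; returns touched points.
        touched, stack = [], [root]
        while stack:
            u = stack.pop()
            self.label[u] = cluster
            touched.append(u)
            stack.extend(self.children[u])
        return touched
```

The returned list shows that the work is proportional to the size of the moved subtree rather than to the whole dataset.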
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Density Peaks Clustering (DP)
2.2 Seed-and-extension-based Density Peaks clustering (SDP)
Chapter 3 Methodology
3.1 Dynamic Parameter Updates
3.1.1 Local Density Update
3.1.2 CHD Update
3.2 Clustering Result Update
Chapter 4 Experimental Results
4.1 Experimental Comparison
4.2 Parameter Sensitivity
Chapter 5 Conclusions and Future Work
References
[1] A. Rodriguez, A. Laio. (2014). Clustering by fast search and find of density peaks. Science, 344, pp. 1492-1496.
[2] P. Berkhin. (2006). A survey of clustering data mining techniques. In J. Kogan, C. Nicholas, M. Teboulle (eds.), Grouping Multidimensional Data. Springer, Berlin, Heidelberg, pp. 25-71.
[3] E. Lughofer. (2012). A dynamic split-and-merge approach for evolving cluster models. Evolving Systems, pp. 135-151.
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, pp. 2825-2830.
[5] G. Karypis, E.-H. Han, V. Kumar. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8), pp. 68-75.
[6] J. Gao, L. Zhao, Z. Chen, P. Li, H. Xu, Y. Hu. (2016). ICFS: An improved fast search and find of density peaks clustering algorithm. In IEEE 14th Intl Conf on DASC, 14th Intl Conf on PiCom, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress.
[7] J. MacQueen. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. University of California Press, Berkeley, Calif., pp. 281-297.
[8] L. Kaufman, P. J. Rousseeuw. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
[9] M. Ankerst, M. M. Breunig, H.-P. Kriegel, J. Sander. (1999). OPTICS: Ordering points to identify the clustering structure. In Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'99), pp. 49-60.
[10] M. de Berg, A. Gunawan, M. Roeloffzen. (2017). Faster DBScan and HDBScan in low-dimensional Euclidean spaces. In 28th International Symposium on Algorithms and Computation (ISAAC 2017).
[11] M. Du, S. Ding, H. Jia. (2016). Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowledge-Based Systems, 99, pp. 135-145.
[12] M. Ester, H.-P. Kriegel, J. Sander, X. Xu. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pp. 226-231.
[13] S. Aghabozorgi, A. Seyed Shirkhorshidi, T. Y. Wah. (2015). Time-series clustering – A decade review. Information Systems, 53, pp. 16-38.
 
 
 
 