Author: Huang, Ching-Chu
Thesis title: Generalized Modularity Embedding
Advisor: Chang, Cheng-Shang
Oral examination committee: Lee, Duan-Shin; Lin, Hwa-Chun
Keywords: Network Embedding; Modularity; Community Detection
In this thesis, we provide a unified framework for the network embedding problem. The embedding problem aims to map nodes that are similar to each other in a high-dimensional space to vectors in a Euclidean space that are close to each other, thereby learning low-dimensional representations of the nodes in a network. The embedding problem is ill-posed, since people may hold different opinions about what kind of nodes are "similar". Here we introduce a probabilistic framework that samples a network randomly, which extends modularity to generalized modularity obtained from the sampled graph. Based on sampled graphs, the notion of modularity can be explicitly defined. By applying different sampling methods, generalized modularity can easily adopt different viewpoints of similarity when solving the embedding problem. Like the embedding problem, the community detection problem is also ill-posed. In community detection, one looks for partitions with large modularity values in order to preserve the community structure. Our derivation shows that, under certain constraints, the network embedding problem and the community detection problem can be formulated in similar forms. Generalized modularity thus brings a variety of viewpoints and captures different network characteristics through different sampling methods. This thesis provides a flexible and effective network embedding method based on generalized modularity. Experiments on real datasets illustrate the effectiveness of our approach.
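As a minimal sketch of the sampled-graph idea, assuming uniform edge sampling on an undirected graph (under which generalized modularity reduces to Newman's modularity), the code below builds the covariance matrix q(v, w) = p(v, w) - p_V(v) p_W(w), scores a partition by summing q(v, w) over node pairs in the same cluster, and embeds nodes with the top eigenvectors of the covariance matrix. The function names (covariance_matrix, generalized_modularity, modularity_embedding), the toy graph, and the orthonormality constraint X^T X = I used for the embedding are illustrative assumptions, not necessarily the thesis's exact formulation.

```python
import numpy as np


def covariance_matrix(A):
    """Covariance matrix of a sampled graph, assuming uniform edge sampling.

    p(v, w) = A[v, w] / sum(A), p_V = row marginal, p_W = column marginal,
    q(v, w) = p(v, w) - p_V(v) * p_W(w).
    """
    A = np.asarray(A, dtype=float)
    P = A / A.sum()              # bivariate distribution p(v, w); sums to 1
    p_V = P.sum(axis=1)          # marginal of the first endpoint
    p_W = P.sum(axis=0)          # marginal of the second endpoint
    return P - np.outer(p_V, p_W)


def generalized_modularity(Q, labels):
    """Sum of q(v, w) over all ordered pairs (v, w) that share a cluster label."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float(Q[same].sum())


def modularity_embedding(Q, dim):
    """Rows are node vectors: the top-`dim` eigenvectors of the symmetrized Q.

    They maximize trace(X^T Q X) subject to X^T X = I (an assumed constraint).
    """
    S = (Q + Q.T) / 2            # x^T Q x == x^T S x, and S is symmetric
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order]


# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for v, w in edges:
    A[v, w] = A[w, v] = 1.0

Q = covariance_matrix(A)
print(round(generalized_modularity(Q, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571
print(np.sign(modularity_embedding(Q, 1)[:, 0]))  # one sign per triangle
```

On this toy graph the two-triangle split scores 5/14, which matches Newman's modularity for the same partition, and the sign pattern of the one-dimensional embedding separates the two triangles. Both the partition score and the embedding are read off the same covariance matrix, which is the kind of link between community detection and embedding the abstract refers to; other sampling methods would change only how P is built.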
Contents
List of Figures
1 Introduction
2 The probabilistic framework for generalized modularity embedding
2.1 The generalized modularity of a sampled graph
2.2 Community detection
2.3 Modularity embedding
3 Extensions and applications of generalized modularity embedding
3.1 Modularity embedding for data points in a semi-metric space
3.2 Dimensionality reduction and connections to PCA
3.3 The Laplacian eigenmaps as a special case of generalized modularity embedding
4 Nonnegative embedding
4.1 A softmax embedding/clustering algorithm
4.2 Adding an annealing parameter
4.3 Hardmax decision: assigning point u to the cluster with the largest positive expected covariance
5 Experiments
5.1 Settings
5.1.1 Datasets
5.1.2 Baseline algorithms
5.1.3 Parameters
5.1.4 Evaluation metrics
5.2 Results and analysis
5.2.1 Performance on Six Clusters
5.2.2 Performance on SBM
5.2.3 Performance on Amazon Network
5.2.4 Performance on Flickr Network
6 Conclusion and future work
